GPU | VRAM | Price (€) | Bandwidth (TB/s) | TFLOP16 | €/GB | €/TB/s | €/TFLOP16 |
---|---|---|---|---|---|---|---|
NVIDIA H200 NVL | 141GB | 36284 | 4.89 | 1671 | 257 | 7423 | 21 |
NVIDIA RTX PRO 6000 Blackwell | 96GB | 8450 | 1.79 | 126.0 | 88 | 4720 | 67 |
NVIDIA RTX 5090 | 32GB | 2299 | 1.79 | 104.8 | 71 | 1284 | 22 |
AMD RADEON 9070XT | 16GB | 665 | 0.6446 | 97.32 | 41 | 1031 | 7 |
AMD RADEON 9070 | 16GB | 619 | 0.6446 | 72.25 | 38 | 960 | 8.5 |
AMD RADEON 9060XT | 16GB | 382 | 0.3223 | 51.28 | 23 | 1186 | 7.45 |
This post is part “hear me out” and part asking for advice.
Looking at the table above AI gpus are a pure scam, and it would make much more sense to (atleast looking at this) to use gaming gpus instead, either trough a frankenstein of pcie switches or high bandwith network.
so my question is if somebody has build a similar setup and what their experience has been. And what the expected overhead performance hit is and if it can be made up for by having just way more raw peformance for the same price.
What do I need to run Kimi? Does it have apple silicon compatible releases? It seems promising.
Depends. You’re in luck, as someone made a DWQ (which is the most optimal way to run it on Macs, and should work in LM Studio): https://huggingface.co/mlx-community/Kimi-Dev-72B-4bit-DWQ/tree/main
It’s chonky though. The weights alone are like 40GB, so assume 50GB of VRAM allocation for some context. I’m not sure what Macs that equates to… 96GB? Can the 64GB can allocate enough?
Otherwise, the requirement is basically a 5090. You can stuff it into 32GB as an exl3.
Note that it is going to be slow on Macs, being a dense 72B model.