How to use GPUs over multiple computers for local AI?

marauding_gibberish142@lemmy.dbzer0.com · 3 days ago


TootGuitar@sh.itjust.works · edit-2 3 days ago

This is false: Mistral Small 24B at q4_K_M quantization is 15GB. q8 is 26GB. A 3090/4090/5090 with 24GB or two cards with 16GB each (I recommend the 4060 Ti 16GB) will work fine with this model, and will work in a single computer. Like others have said, 10GbE will be a huge bottleneck, plus it’s simply not necessary to distribute a 24B model across multiple machines.

marauding_gibberish142@lemmy.dbzer0.com · 3 days ago

Thank you, but which consumer motherboard + CPU combo is giving me 32 lanes of PCIe Gen 4 neatly divided into 2 x16 slots for me to put 2 GPUs in? I only asked this question because I was going to buy used computers and stuff a GPU in each.

Your point about networking is valid, and I’ll be hesitant to invest in 25GbE right now.

CondorWonder@lemmy.ca · edit-2 3 days ago

You don’t need the cards to have full bandwidth; the only time it will matter is when you’re loading the models onto the card. You need a motherboard with x16 slots, but even x4 connections would be good enough. Running the model doesn’t need a lot of bandwidth. Remember, you only load the model once and then reuse it.

An x4 PCIe Gen 4 slot has a theoretical transfer rate of ~7.9 GB/s (after encoding overhead), and an x16 slot has ~31.5 GB/s, so disk I/O is likely your limit even for an x4 slot.

Edit: the overhead was already included in the calculations.
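
For anyone who wants to check the slot figures, here is a rough sketch of the arithmetic. It assumes the standard PCIe Gen 4 numbers (16 GT/s per lane, 128b/130b encoding) and uses the ~15 GB q4_K_M size quoted earlier in the thread. The function names are made up for illustration, and the result is a theoretical ceiling, not a measured speed. It prints both decimal GB/s and binary GiB/s, since the two are easy to mix up.

```python
# PCIe Gen 4 signals at 16 GT/s per lane and uses 128b/130b line encoding,
# so 128 of every 130 bits on the wire carry payload.
GEN4_TRANSFERS_PER_SECOND = 16e9   # per lane, one bit per transfer
ENCODING_EFFICIENCY = 128 / 130


def pcie_gen4_bandwidth(lanes: int) -> float:
    """Theoretical one-direction bandwidth in bytes per second."""
    return lanes * GEN4_TRANSFERS_PER_SECOND * ENCODING_EFFICIENCY / 8


def load_seconds(model_bytes: float, lanes: int) -> float:
    """Best-case time to copy a model over the link (ignores disk speed)."""
    return model_bytes / pcie_gen4_bandwidth(lanes)


MODEL_BYTES = 15e9  # assumption: the ~15 GB q4_K_M figure from this thread

for lanes in (4, 16):
    bw = pcie_gen4_bandwidth(lanes)
    print(
        f"x{lanes}: {bw / 1e9:.2f} GB/s ({bw / 2**30:.2f} GiB/s), "
        f"~{load_seconds(MODEL_BYTES, lanes):.1f} s to copy 15 GB"
    )
```

Even at x4 the copy takes only a couple of seconds in the best case, which is why the slot width only matters at load time.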

marauding_gibberish142@lemmy.dbzer0.com · 3 days ago

I see. That solves a lot of the headaches I imagined I would have. Thank you so much for clearing that up.