The problem is simple: consumer motherboards don’t have that many PCIe slots, and consumer CPUs don’t have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea is to buy 3-4 computers for cheap, slot a GPU into each of them, and run them in tandem. I imagine this will require some sort of agent running on each node, with the nodes connected over a 10GbE network. I can get a 10GbE network running for this project.

Does Ollama or any other local AI project support this? Getting a server motherboard with a CPU is going to get expensive very quickly, and this would be a great alternative.
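For what it's worth, llama.cpp ships an RPC backend that roughly matches this design: a small server process runs on each GPU node, and the head node splits the model's layers across them over the network. A rough sketch of the setup, with placeholder IP addresses and assuming CUDA GPUs (the exact cmake flags and binary names may differ between llama.cpp versions, so check the current RPC docs):

```shell
# On each worker node: build llama.cpp with the RPC backend enabled
cmake -B build -DGGML_CUDA=ON -DGGML_RPC=ON
cmake --build build --config Release

# Start the RPC server on each worker, listening on the 10GbE interface
./build/bin/rpc-server --host 0.0.0.0 --port 50052

# On the head node: point llama-cli at all workers; the model's layers
# are distributed across them (IPs below are placeholders)
./build/bin/llama-cli -m ./model.gguf -p "Hello" \
    --rpc 192.168.10.2:50052,192.168.10.3:50052,192.168.10.4:50052
```

Note that per-token latency becomes network-bound with this approach, so even 10GbE will be noticeably slower than having all the GPUs on one board.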

Thanks

  • Natanox@discuss.tchncs.de · 3 days ago

    Depends on which GPU you compare it with, what model you use, what kind of RAM it has to work with, etcetera. NPUs are purpose-built chips, after all. Unfortunately the whole tech is still very young, so we'll have to wait for projects like Ollama to introduce native support before an apples-to-apples comparison is possible. The raw numbers do look promising, however.