The problem is simple: consumer motherboards don’t have that many PCIe slots, and consumer CPUs don’t have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea is to buy 3-4 cheap computers, slot a GPU into each of them, and run them in tandem. I imagine this will require some sort of agent running on each node, with the nodes connected over a 10GbE network. I can get a 10GbE network running for this project.
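
To make the "agent on each node" part concrete, here's roughly what I'm picturing: a hypothetical sketch in plain stdlib Python (the port number and the nvidia-smi probing are just assumptions on my part). It only lets a coordinator discover each node's GPU over the network; actually splitting one model across the pooled VRAM is the hard part, which is why I'm asking.

```python
#!/usr/bin/env python3
"""Rough sketch of the per-node agent idea (hypothetical, not a real project).

Each cheap box runs this agent; a coordinator on the 10GbE network asks every
node what GPU memory it has. Actually splitting one model's layers across the
nodes is the part this sketch doesn't solve.
"""
import json
import socket
import subprocess
import sys
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

AGENT_PORT = 11500  # arbitrary port made up for this sketch


def local_vram_mib() -> int:
    """Total VRAM of this node's GPU(s) via nvidia-smi (0 if it fails)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
        return sum(int(line) for line in out.stdout.split())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return 0


class AgentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Report this node's identity and GPU memory to the coordinator.
        info = {"host": socket.gethostname(), "vram_mib": local_vram_mib()}
        body = json.dumps(info).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_agent():
    # Runs on each GPU box, listening on the LAN.
    HTTPServer(("0.0.0.0", AGENT_PORT), AgentHandler).serve_forever()


def run_coordinator(nodes):
    # Ask every node agent over the 10GbE link what it can contribute.
    total = 0
    for node in nodes:
        with urllib.request.urlopen(f"http://{node}:{AGENT_PORT}/", timeout=5) as r:
            info = json.load(r)
        print(f"{info['host']}: {info['vram_mib']} MiB VRAM")
        total += info["vram_mib"]
    print(f"pooled VRAM across nodes: {total} MiB")


if __name__ == "__main__":
    if len(sys.argv) > 1:
        run_coordinator(sys.argv[1:])   # e.g. python3 agent.py node1 node2 node3
    else:
        run_agent()                     # run with no args on each GPU box
```

So each box would run the agent, and the coordinator gets pointed at the node hostnames over the 10GbE link. The missing piece is whatever actually shards the model across those GPUs.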

Does Ollama or any other local AI project support this? Getting a server motherboard with a CPU is going to get expensive very quickly, but this would be a great alternative.

Thanks

  • Natanox@discuss.tchncs.de · 3 days ago

    Maybe take a look at systems with the newer AMD SoCs first. They utilize the system's RAM and come with a proper NPU; once ollama or mistral.rs support those, they might give you sufficient performance for your needs at way lower cost (incl. power consumption). Depending on how NPU support gets implemented, it might even become possible to use the NPU and GPU in tandem, which would probably enable pretty powerful models to be run on consumer-grade hardware at reasonable speed.

      • Natanox@discuss.tchncs.de · 3 days ago

        Depends on which GPU you compare it with, what model you use, what kind of RAM it has to work with, et cetera. NPUs are purpose-built chips, after all. Unfortunately the whole tech is still very young, so we'll have to wait for stuff like ollama to introduce native support before we get an apples-to-apples comparison. The raw numbers do look promising, though.

    • just_another_person@lemmy.world · 3 days ago

      It wouldn’t even matter. OP doesn’t understand how any of this works, and is instead just running rampant calling everything bullshit 😂