makeasnek@lemmy.ml to AI@lemmy.ml · English · 10 months ago
LLM ASICs on USB sticks?
cross-posted to: [email protected]
Source: nostr https://snort.social/nevent1qqsg9c49el0uvn262eq8j3ukqx5jvxzrgcvajcxp23dgru3acfsjqdgzyprqcf0xst760qet2tglytfay2e3wmvh9asdehpjztkceyh0s5r9cqcyqqqqqqgt7uh3n Paper: https://arxiv.org/abs/2406.02528
Mike1576218@lemmy.ml · 10 months ago
llama2 gguf with 2-bit quantisation only needs ~5 GB VRAM; 8 bits need >9 GB. Anything in between is possible. There are even 1.5-bit and 1-bit options (not gguf, AFAIK). Generally, fewer bits means worse results, though.
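To put rough numbers on the bits-to-VRAM relationship described in that comment, here is a minimal sketch (not from the thread) that estimates only the weight-storage term for a hypothetical 7B-parameter model; actual memory use is higher once the KV cache, quantization block overhead, and runtime buffers are added, which is why the figures quoted above exceed these raw estimates.

```python
# Rough VRAM estimate for storing quantized weights at different bit widths.
# Assumes a hypothetical 7B-parameter model and counts ONLY the weight tensor;
# KV cache and runtime overhead are deliberately ignored.

def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GB needed just to hold the quantized weights."""
    return n_params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

if __name__ == "__main__":
    for bits in (1, 1.5, 2, 4, 8, 16):
        print(f"{bits:>4} bits/weight -> ~{weight_vram_gb(7e9, bits):.2f} GB (weights only)")
```

The sketch illustrates the linear trade-off the commenter points to: halving the bits per weight roughly halves the memory footprint of the weights, at the cost of output quality.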