

some linux users dream of having their grandma run linux so they never have to look at windows or macos ever again
shockingly enough, corporations always act in their own self-interest
it’s additional rules for the subreddit on top of the site-wide rules for all of reddit
Q4 will give you like 98% of the quality of Q8 at like twice the speed, plus room for much longer context lengths.
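Rough napkin math on why (a sketch; ~8.5 and ~4.8 bits per weight are typical GGUF figures for Q8_0 and Q4_K_M, and this ignores the KV cache and runtime overhead):

```python
# Approximate weight memory for a 32B-parameter model
# at two common GGUF quantization levels.
PARAMS = 32e9

def weight_gib(bits_per_weight: float) -> float:
    """params * bits / 8, converted to GiB."""
    return PARAMS * bits_per_weight / 8 / 2**30

print(f"Q8_0:   ~{weight_gib(8.5):.0f} GiB")  # ~32 GiB
print(f"Q4_K_M: ~{weight_gib(4.8):.0f} GiB")  # ~18 GiB
```

The ~14 GiB you save is what buys you the extra GPU layers and context.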
If you don’t need the full context length, you can try loading the model with a shorter one; the KV cache takes less VRAM that way, so you can fit more layers on the GPU, which makes it faster.
And you can usually configure your inference engine to keep the model loaded at all times, so you’re not losing so much time when you first start it up.
Ollama attempts to dynamically load the right context length for your request, but in my experience that just results in really inconsistent and long times to first token.
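That said, you can pin both yourself per request. A minimal sketch against Ollama’s documented REST API (model name and num_ctx are just examples, size them for your VRAM):

```python
import requests

# Fix the context length and keep the model resident (Ollama REST API).
# keep_alive=-1 means "keep loaded indefinitely" instead of the 5 min default.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:32b",          # example model
        "messages": [{"role": "user", "content": "hello"}],
        "options": {"num_ctx": 8192},    # fixed context, no dynamic guessing
        "keep_alive": -1,                # stay loaded between requests
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```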
The nice thing about vLLM is that your model is always loaded, so you don’t have to worry about that. But then again, it needs much more VRAM.
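For comparison, a minimal vLLM sketch using its offline `LLM` API (the model name is just an example; capping `max_model_len` is the main lever for shrinking its VRAM reservation):

```python
from vllm import LLM, SamplingParams

# vLLM grabs (most of) the GPU up front and keeps the model loaded.
# A smaller max_model_len means a smaller KV-cache reservation.
llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-AWQ",  # example 4-bit AWQ build
    max_model_len=8192,
    gpu_memory_utilization=0.90,
)

out = llm.generate(["hello"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```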
In my experience, anything similar to qwen-2.5:32B comes closest to gpt-4o. I think it should run on your setup. The 14b model is alright too, but definitely inferior. Mistral Small 3 also seems really good. Anything smaller is usually really dumb, and I doubt it would work for you.
You could probably run some larger 70b models at a snail’s pace too.
Try the DeepSeek R1 Qwen 32b distill, something like deepseek-r1:32b-qwen-distill-q4_K_M (name on ollama), or some fine-tune of it. It’ll be by far the smartest model you can run.
There are various fine-tunes that remove some of the censorship (ablated/abliterated) or are optimized for RP, which might do better for your use case. But I personally haven’t used them, so I can’t promise anything.
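If you want to try the distill itself, here’s a minimal sketch with the ollama Python client (assuming `pip install ollama` and the tag above; R1-style models print their chain of thought in a `<think>` block before the answer):

```python
import ollama

# Tag from above; roughly a 20 GB download at Q4_K_M.
MODEL = "deepseek-r1:32b-qwen-distill-q4_K_M"

ollama.pull(MODEL)  # one-time download

resp = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
print(resp["message"]["content"])  # answer follows the <think>...</think> block
```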
when you’re in a glowing cum dripping from the ceiling competition and yours is fastest so you win
I don’t know, feddit.nl is pretty chill. I always see everything and barely anything objectionable
True, but the newest mistral model is already pretty great
That’s why I avoid wrestlers
Suspicious lack of Qubes. Who do you work for??? The CIA? China? The Rwandan National Intelligence and Security Agency?
no tonley fritto down lowed, butte emaity lie sensed a swell
let’s just rebrand instances to superleddits and communities to subleddits 🤣
“Reddit but you can block the part that annoys you”
does anyone know why this sudden uptick?
why would I want to stream myself peeing??
Every Linux user has to go through a period of compulsive distro hopping. Don’t worry, eventually you’ll grow tired of it and just settle on one workhorse distro.
construction workers all across Siberia sigh in relief