

This is honestly awesome! I’d been thinking about a similar setup for a long time but wasn’t sure how to pull it off. This seems like exactly the setup I was looking for. Thank you!
I think cops are just getting fucked over by dealers so they don’t know the real prices
You can deploy a Cloudflare Worker that exposes an API endpoint with an SQLite DB, completely for free and without doing any maintenance. I don’t think the DB is encrypted end-to-end, so it wouldn’t be my first choice if privacy is a concern. There’s a bit of a learning curve with all the UI bloat, but once you’ve figured it out it’s a very hassle-free solution.
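If it helps, the Worker itself is tiny. A rough sketch of what it looks like with D1 (Cloudflare’s SQLite offering), assuming a binding named `DB` and a made-up `items` table:

```ts
// Cloudflare Worker exposing a read-only JSON endpoint backed by a D1 (SQLite) database.
// "DB" is the D1 binding configured in wrangler.toml; "items" is a placeholder table.
// The D1Database type comes from @cloudflare/workers-types.
export interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    if (url.pathname === "/api/items") {
      // Prepared statement against the bound SQLite database
      const { results } = await env.DB.prepare("SELECT * FROM items LIMIT 50").all();
      return Response.json(results);
    }
    return new Response("Not found", { status: 404 });
  },
};
```

Once the binding is set up, deploying is a single `wrangler deploy`.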
I’ve read a lot about using a VPS with a reverse proxy, but I’m kind of a noob in that area. How exactly does that protect my machine? Couldn’t an attacker with access to the VPS still harm my local machine? Currently I’m just using a WireGuard tunnel to log into my server. From what I understand, you’d tunnel the service from the VPS to the home server, and then you could watch via the VPS’s URL, right?
And do I understand correctly that, since we’re using the reverse proxy, the attack surface exposed just by someone finding the domain would be limited to the web interface of e.g. Jellyfin?
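Just to check my mental model, I’d imagine the VPS side looking something like this (pure guesswork on my part; the domain, certificate paths and the 10.0.0.2 WireGuard address are all made up):

```nginx
# On the VPS: terminate TLS publicly, forward to Jellyfin on the home server
# over the WireGuard tunnel (10.0.0.2 = home server's WireGuard IP,
# 8096 = Jellyfin's default HTTP port)
server {
    listen 443 ssl;
    server_name media.example.com;

    ssl_certificate     /etc/letsencrypt/live/media.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/media.example.com/privkey.pem;

    location / {
        proxy_pass http://10.0.0.2:8096;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Is that roughly the right shape, or am I off?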
Sorry for the chaotic & potentially stupid questions; I’m really just a confused beginner in this area.
Shit just works as usual
OP, that’s a killer list of books you’ve read, and IMO you have a point. To all the people saying that you’d be alienated watching old movies, that method acting is important, and that the special effects of the last 20 years are what make the difference: idk, it really depends on what you’re looking for.
Hitchcock movies, the stuff with Humphrey Bogart or Marlon Brando, even the super racist Italo Westerns, the very old Kubrick stuff: that’s all great cinema.
I’m as left-wing as it gets, but I also get very alienated by the “diversity” and “feminism” of modern Hollywood & Netflix cinema. It’s the same type of diversity and feminism that exists in the corporate world: diversity in terms of ethnicity and sexuality, but only within one class. To me it’s a fictional world in the same way the old movies are, just made by a different bunch of people living in their own world.
There’s still some good cinema and good shows out there every now and then, but to think old movies can’t compete with modern TV & cinema just because they’re old is a very simplistic take.
https://hotio.dev/containers/qbittorrent/
Why don’t you use the hotio container? That already has it baked in.
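For reference, the compose file is roughly this shape (typed from memory, so double-check the exact env var names against the docs linked above; paths and networks are examples):

```yaml
# hotio qbittorrent with the built-in WireGuard VPN enabled
services:
  qbittorrent:
    image: ghcr.io/hotio/qbittorrent
    container_name: qbittorrent
    cap_add:
      - NET_ADMIN                      # needed for the in-container VPN
    ports:
      - "8080:8080"                    # web UI
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - VPN_ENABLED=true
      - VPN_CONF=wg0                   # expects /config/wireguard/wg0.conf
      - VPN_PROVIDER=generic
      - VPN_LAN_NETWORK=192.168.1.0/24 # your LAN, so the web UI stays reachable
    volumes:
      - /docker/appdata/qbittorrent:/config
      - /data/torrents:/data/torrents
```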
Thanks for the reply, still reading here. Yeah, thanks to the comments and some benchmarks I read, I’ve abandoned the idea of getting an Apple machine; it’s just too slow.
I was hoping to test Qwen 32B or Llama 70B with longer contexts, which is why the Apple seemed appealing.
Congrats on being that guy
You’re aware that there’s the OpenAI API library, right? https://github.com/openai/openai-python
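Point it at whatever OpenAI-compatible server you run locally and you’re done. The official Node client (`openai` on npm) has the same surface as the Python one linked above; here’s a rough TypeScript sketch (the URL assumes Ollama’s default endpoint, and the model name is a placeholder):

```ts
import OpenAI from "openai";

// Any OpenAI-compatible server works, e.g. llama.cpp's server or Ollama.
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1", // Ollama's OpenAI-compatible endpoint
  apiKey: "not-needed-locally",         // the client just requires some string
});

const response = await client.chat.completions.create({
  model: "llama3.1", // placeholder: whatever model your server has loaded
  messages: [{ role: "user", content: "Explain WireGuard in one sentence." }],
});

console.log(response.choices[0].message.content);
```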
It’s really nothing fancy, especially on Lemmy, where like 99% of people are software engineers…
Are you drunk?
Yeah, I found some stats now, and indeed you’re gonna wait something like an hour for prompt processing if you throw 80-100k tokens at a powerful model (that works out to well under 30 tokens/s of prompt processing). With APIs that kinda works instantly. Not surprising, but just to give a comparison. Bummer.
Thanks! I hadn’t thought of YouTube at all, but it’s super helpful. I guess that’ll help me decide whether the extra RAM is worth it, considering that inference will be much slower if I don’t go NVIDIA.
Yeah, I was thinking about running something like Code Qwen 72B, which apparently requires 145GB of RAM for the full model. But if it’s super slow, especially with large contexts, and I can only run small models at acceptable speed anyway, it may be worth going NVIDIA for CUDA alone.
Meh, ofc I don’t.
Thanks, that’s very helpful! Will look into that type of build
I understand what you’re saying, but I come to this community because I like getting more input, hearing about other people’s experiences, and potentially learning about things I didn’t know about. I wouldn’t ask in this community specifically if I didn’t want to optimize my setup as much as I can.
Interesting, is there any kind of model you could run at reasonable speed?
I guess it could amortize over time, but if the usability sucks, that may make it not worth it. OTOH I really don’t want to send my data to any company.
I’d honestly be open to that, but wouldn’t an AMD setup take up a lot of space, draw a lot of power and be loud?
It seems like, in terms of price and speed, the Macs suck compared to the other options, but if you don’t have a lot of space and don’t want to hear an airplane engine constantly, I’m wondering if there are any options.
I think you’re gonna need to implement function calling in some way. Rolling it yourself is kinda complicated to do in a non-annoying way, but there must be assistant frameworks that you can connect to your local LLM instead of an external API.
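Very roughly, the OpenAI-style flavor of it looks like this (sketch only: the local URL, model name and `get_weather` tool are all made up, and your local server/model has to actually support tool calls):

```ts
import OpenAI from "openai";

// OpenAI-compatible client pointed at a local server (placeholder URL).
const client = new OpenAI({ baseURL: "http://localhost:11434/v1", apiKey: "unused" });

// 1. Declare the functions the model is allowed to "call".
const tools = [
  {
    type: "function" as const,
    function: {
      name: "get_weather", // hypothetical example tool
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "llama3.1", // placeholder
  messages: [{ role: "user", content: "What's the weather in Berlin?" }],
  tools,
});

// 2. Instead of prose, the model returns a structured call; your code executes
//    the real function and sends the result back in a follow-up message.
const call = response.choices[0].message.tool_calls?.[0];
if (call && call.type === "function") {
  console.log(call.function.name, call.function.arguments); // e.g. get_weather {"city":"Berlin"}
}
```

The annoying part is the loop around it: executing the call, appending a `tool`-role message with the result, and re-asking the model, which is exactly the bit the assistant frameworks wrap for you.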