• 15 Posts
  • 89 Comments
Joined 2 years ago
Cake day: July 1st, 2023

  • Hi! So here's the rundown.

    You are going to need to be willing to learn how computer program services send text messages to each other over open ports, how to call an API from a programming script, and slowly piece together how to work with Ollama's external API tool-calling functions. Here's the documentation.

    Essentially you need to

    1. Learn how the Ollama external API works: how to send it text data from a basic Python program over an open port and receive data back to put into a text file.

    2. Learn how to make that Python program pull weather and time data from OpenWeather.

    3. Learn how to feed that weather and time data into Ollama on an open port as part of a tool-calling function. A tool call is a fancy system prompt that tells the model how to interface with the data in a well-defined, parameterized way: you say a keyword like "get weather", it sends a request to your Python program to fetch data from OpenWeather, and the result comes back in a form the LLM is instructed to process. (There's a rough sketch of the whole loop after the example link below.)

    example: https://medium.com/@ismalinggazein/openai-function-calling-integrating-external-weather-api-6935e5d701d3
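
    Here's a rough idea of what those three pieces look like wired together. This is only a minimal sketch, assuming Ollama's /api/chat endpoint on its default port and the OpenWeatherMap current-weather endpoint; the model name, city, and API key are placeholders you'd swap for your own setup.

    ```python
    # Minimal sketch: Ollama tool calling + OpenWeather, per the steps above.
    # Assumes Ollama is running locally on its default port (11434) and you
    # have an OpenWeatherMap API key. Model name and key are placeholders.
    import requests

    OLLAMA_URL = "http://localhost:11434/api/chat"
    OPENWEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"
    OPENWEATHER_KEY = "YOUR_API_KEY"

    def get_weather(city: str) -> str:
        """Step 2: pull current weather for a city from OpenWeather."""
        r = requests.get(OPENWEATHER_URL,
                         params={"q": city, "appid": OPENWEATHER_KEY, "units": "metric"})
        r.raise_for_status()
        data = r.json()
        return f"{data['weather'][0]['description']}, {data['main']['temp']} C"

    # Step 3: describe the tool so the model knows when and how to call it.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in London right now?"}]

    # Step 1: send text data to Ollama over the open port, get structured data back.
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3.1",          # any tool-capable model you have pulled
        "messages": messages,
        "tools": tools,
        "stream": False,
    }).json()

    # If the model decided to call the tool, run it and hand the result back.
    for call in resp["message"].get("tool_calls", []):
        if call["function"]["name"] == "get_weather":
            result = get_weather(call["function"]["arguments"]["city"])
            messages.append(resp["message"])
            messages.append({"role": "tool", "content": result})

    final = requests.post(OLLAMA_URL, json={
        "model": "llama3.1", "messages": messages, "stream": False,
    }).json()
    print(final["message"]["content"])
    ```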

    Unless you are already a programmer who works with sending and receiving data over the internet to be processed, this is a non-trivial task that requires a lot of experimentation and getting your hands dirty with ports and coding languages. I'm currently getting ready to delve into this myself, so I know it can all feel overwhelming. Hope this helps.


  • This meme was originally made for the [email protected] community in an attempt to give a super-niche manga artist fan space a little bit of engagement. I crossposted it here as an afterthought and didn't expect to be brigaded so hard by armchair memologists over the objective definition and location of the funny.

    You're absolutely correct that you need to have read the Blame! manga to really enjoy the reference on this one, and even if you did, it's not that deep. Not too much thought went into it; I was high as shit, just pasting icons with the 'Linux chad big energy beam, Windows/Microsoft wojak bad guys it's fired at' idea. I'm personally okay with not every one of my memes being super accessible or community bangers; I had fun making this and putting the template together. If you get the humor or like the template, great. If you don't, oh well, downvote, say 'where funny', and move on with your life, because I'm not wasting my time explaining what a gravitational beam emitter is to snobs who don't care in the first place.











  • SmokeyDope@lemmy.world to Selfhosted@lemmy.world · lightweight blog ?

    Would something like this interest you? Gemtext formatted to HTML is about as lightweight as it gets. There's lots of automatic Gemtext blog software on GitHub that also formats and mirrors an HTML copy. Whenever a news article gets rendered to Gemtext through Newswaffle it shrinks the page by about 95-99% while keeping the text intact. Let me know if you want some more information on Gemini stuff.
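
    For a sense of why it stays so small: Gemtext is line-based, so turning it into HTML only takes a handful of rules. Below is a hypothetical minimal converter (skipping preformatted blocks and quote lines) just to illustrate the idea, not any particular generator from GitHub.

    ```python
    # Hypothetical minimal Gemtext -> HTML converter, to show how little work
    # the format needs. Ignores preformatted blocks and quote lines.
    import html

    def gemtext_to_html(gemtext: str) -> str:
        out, in_list = [], False
        for line in gemtext.splitlines():
            if line.startswith("* "):                      # list item
                if not in_list:
                    out.append("<ul>")
                    in_list = True
                out.append(f"<li>{html.escape(line[2:])}</li>")
                continue
            if in_list:                                    # close an open list
                out.append("</ul>")
                in_list = False
            if line.startswith("###"):
                out.append(f"<h3>{html.escape(line[3:].strip())}</h3>")
            elif line.startswith("##"):
                out.append(f"<h2>{html.escape(line[2:].strip())}</h2>")
            elif line.startswith("#"):
                out.append(f"<h1>{html.escape(line[1:].strip())}</h1>")
            elif line.startswith("=>"):                    # link line
                parts = line[2:].strip().split(maxsplit=1)
                url = parts[0] if parts else ""
                label = parts[1] if len(parts) > 1 else url
                out.append(f'<p><a href="{html.escape(url)}">{html.escape(label)}</a></p>')
            elif line.strip():                             # plain paragraph
                out.append(f"<p>{html.escape(line)}</p>")
        if in_list:
            out.append("</ul>")
        return "\n".join(out)

    print(gemtext_to_html("# My blog\n=> gemini://example.org a link\n* a list item"))
    ```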





    The community consensus is that bandwidth is key. An extra 300 GB/s is a big bonus.
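
    A rough back-of-envelope for why bandwidth dominates: every generated token has to stream more or less the whole model through memory, so bandwidth divided by model size gives a ceiling on tokens per second. The numbers below are purely illustrative, not benchmarks.

    ```python
    # Back-of-envelope: tokens/s ceiling ~ memory bandwidth / model size,
    # since each generated token reads (roughly) all the weights once.
    # Illustrative numbers only.
    def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
        return bandwidth_gb_s / model_size_gb

    for bw in (100, 400):   # GB/s: a baseline vs. an extra 300 GB/s on top
        print(f"{bw} GB/s -> ~{max_tokens_per_second(bw, 20):.0f} t/s ceiling on a ~20 GB model")
    ```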

    You're taking a small amount of risk with second hand, but generally laptops will last a good long while, and an extra thousand or so off for a better-performing product is a decent deal IMO. I've been using my second-hand laptop for over a decade now, so I would probably be biased toward taking the dice roll on used after testing it out in person.

    I hope it works out for you! Man, I wish I could get my rig upgraded on the cheap, but the GPU market is insane. I wonder how the NVIDIA DIGITS or AMD Strix will compare to a Mac?






  • I've tried the official DeepSeek R1 distill of Qwen 2.5 14B and a few unofficial Mistrals trained on R1 CoT. They are indeed pretty amazing, and I found myself switching between a general-purpose model and a thinking model regularly before this released.

    DeepHermes is a thinking-model family with R1-distill CoT that you can toggle between standard short output and spending a few thousand tokens thinking about a solution. I found that pure thinking models are fantastic for certain kinds of problem-solving questions, but awful at following roleplay scenarios or complex personality types. DeepHermes lets you have your cake and eat it too by making CoT optional while keeping regular system-prompt capabilities.
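
    Since the toggle is just a system prompt, flipping between the two modes from a script looks roughly like this. This is a sketch assuming an OpenAI-compatible /v1/chat/completions endpoint (such as the one kobold.cpp exposes); the actual deep-thinking prompt text comes from the DeepHermes model card, so the string below is only a placeholder.

    ```python
    # Sketch of toggling DeepHermes between normal replies and long CoT.
    # Assumes an OpenAI-compatible endpoint; URL and prompt strings are
    # placeholders -- use the system prompt from the DeepHermes model card
    # to actually activate deep thinking.
    import requests

    API_URL = "http://localhost:5001/v1/chat/completions"

    THINKING_PROMPT = "You are a deep thinking AI... enclose your reasoning in <think></think> tags."  # placeholder
    NORMAL_PROMPT = "You are a helpful assistant."

    def ask(question: str, think: bool) -> str:
        resp = requests.post(API_URL, json={
            "messages": [
                {"role": "system", "content": THINKING_PROMPT if think else NORMAL_PROMPT},
                {"role": "user", "content": question},
            ],
            "max_tokens": 4096 if think else 512,  # leave room for the CoT when enabled
        }).json()
        return resp["choices"][0]["message"]["content"]

    print(ask("How many days are there between March 3 and July 9?", think=False))  # quick answer
    print(ask("How many days are there between March 3 and July 9?", think=True))   # spends tokens reasoning first
    ```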

    The thousands of tokens spent thinking can get time-consuming when you're only getting 3 t/s on the larger 24B models. Its abilities are impressive even if it takes 300 seconds to fully think out a problem at 2.5 t/s.

    That's why I'm so happy that DeepHermes 8B is pretty intelligent with CoT enabled: I can fit a thinking model entirely in VRAM, and it's not dumb as rocks in its knowledge base either. I'm getting 15-20 t/s with the 8B instead of 2.5-3 t/s partially offloading a larger model.
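
    For anyone wondering how the "fits entirely in VRAM" math works out, a common rule of thumb is parameters times bytes per weight for the quant, plus a little overhead for the KV cache and buffers. The figures below are rough estimates, not measurements.

    ```python
    # Rough VRAM estimate: params * bytes-per-weight + overhead (KV cache, buffers).
    # Real usage varies with quant format and context length; illustrative only.
    def approx_vram_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 1.5) -> float:
        return params_billion * bits_per_weight / 8 + overhead_gb

    for params in (8, 24):
        print(f"{params}B @ ~4.5 bits/weight: ~{approx_vram_gb(params, 4.5):.1f} GB")
    # The 8B comfortably fits on an 8 GB card; the 24B has to spill into system RAM.
    ```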



  • I just spent a good few hours optimizing my LLM rig: disabling the graphical interface to squeeze 150 MB of VRAM back from Xorg, setting the program's CPU niceness to the highest priority, and tweaking settings to find memory limits.

    I was able to increase the token speed by about half a token per second while doubling the context size. I don't have the budget for any big VRAM upgrade, so I'm trying to make the most of what I've got.

    I have two desktop computers. One has better RAM, CPU, and overclocking but a worse GPU. The other has a better GPU but worse RAM and CPU, and no overclocking. I'm contemplating whether it's worth swapping GPUs to really make the most of the available hardware. It's been years since I took apart a PC and I'm scared of doing something wrong and damaging everything. I dunno if it's worth the time, effort, and risk for the squeeze.

    Otherwise I'm loving my self-hosted LLM hobby. I've been very into learning computers and ML for the past year. Crazy advancements, exciting stuff.


  • Cool, Page Assist looks neat, I'll have to check it out sometime. My LLM engine is kobold.cpp, and I just use OpenWebUI in the browser to connect.

    Sorry, I don't really have good suggestions for you beyond just trying some of the more popular 1-4B models in a very high quant, if not full FP8, and seeing which works best for your use case.

    Llama 4B, Mistral 4B, Phi-3-mini, TinyLLM 1.5B, Qwen2 1.5B, etc. I assume you want a model with a large context size and good comprehension skills to summarize YouTube transcripts and webpage articles? At least I think that's what the add-on you mentioned said its purpose was.

    So look for models with those things over ones that try to specialize in a little bit of domain knowledge.





  • DeepHermes 24B's CoT thought patterns feel about on par with the official R1 distills I've tried. It's important to note, though, that my experience is limited to the DeepSeek R1 NeMo 12B distill, as that's what fits nice and fast on my card.

    All the R1-distill thought-process internal-monologue humanisms are there: "let me write that down", "if I remember correctly", "oh, but wait, that doesn't sound right, let's try again". The multiple "but wait, what if"s before ending the thought to examine multiple sides are there too. It spends about 2-5k tokens thinking. It tends to stay on track and catch minor mistakes or hallucinations.

    Compared to the unofficial Mistral 24B distills this is top tier for sure. I think it's toe to toe with the CognitiveComputations Dolphin 24B R1 distill, and it's just a preview.