• 0 Posts
  • 186 Comments
Joined 1 year ago
cake
Cake day: June 19th, 2023

help-circle
  • Using Ollama to try a couple of models right now for an idea. I’ve tried to run Llama 3.2 and Qwen 2.5 3b, both of which fits my 3050 6G’s VRAM. I’ve also tried for fun to use Qwen 2.5 32b, which fits in my RAM (I’ve got 128G) but it was only able to reply a couple of tokens per second, thereby making it very much a non-interactive experience. Will need to explore the response time piece a bit further to see if there are ways I can lean on larger models with longer delays still.












  • If memory serves, 175B parameters is for the GPT3 model, not even the 3.5 model that caught the world by surprise; and they have not disclosed parameter space for GPT4, 4o, and o1 yet. If memory also serves, 3 was primarily English, and had only a relatively small set of words (I think 50K or something to that effect) it was considering as next token candidates. Now that it is able to work in multiple languages and multi modal, the parameter space must be much much larger.

    The amount of things it can do now is incredible, but our perceived incremental improvements on LLM will probably slow down (due to the pace fitting to the predicted lines in log space)… until the next big thing (neural nets > expert systems > deep learning > LLM > ???). Such an exciting time we’re in!

    Edit: found it. Roughly 50K tokens for input output embedding, in GPT3. 3Blue1Brown has a really good explanation here for anyone interested: https://youtu.be/wjZofJX0v4M





  • 4o does perform web searches, give summaries from a couple of pages, and include the link to those pages when prompted properly.

    However, as most people know, first couple results doesn’t always tell the full picture and further actual researches are required… but, most “AI assistant” (also including things like those voice assistants in speakers) users tends to take the first response as fact…

    ¯\_(ツ)_/¯



  • Google did not make RCS; RCS is made by GSM consortium as succession of SMS, Google extended it to add some extra features such as end to end encryption (but only when messages are routed through their servers).

    China mandated 5G sold in China must support RCS, hence why Apple added support for this. Since Google is basically banned in China, you can pretty much bet RCS going into/out of China is going to be unencrypted.

    So you’re basically stuck between getting inferior unencrypted messages, or routing everything through Google.

    Avoid RCS like the plague.



  • It is easier to think of the SSL termination in legs.

    1. Client to Cloudflare; if you’re behind orange cloud, you get this for free, don’t turn orange cloud off unless you want to have direct exposure.
    2. Cloudflare to your sever; use their origin cert, this is easiest and secure. You can even get one made specific so your subdomains, or wildcard of your subdomain. Unless you have specific compliance needs, you shouldn’t need to turn this off, and you don’t need to roll your own cert.
    3. Your reverse proxy to your apps; honestly, it’s already on your machine, you can do self signed cert if it really bothers you, but at the end of the day, probably not worth the hassle.

    If, however, you want to directly expose your service without orange cloud (running a game server on the same subdomain for example), then you’d disable the orange cloud and do Let’s Encrypt or deploy your own certificate on your reverse proxy.