cross-posted from: https://lemmygrad.ml/post/9899994

wake up

open twitter to catch up

see deepseek did it again

(and as a reminder, DeepSeek-R1 only came out in January, so it’s been less than 12 months since their last bombshell)

One more graph:

What this all means

Traditional AI models are trained with a reward for a correct final answer: produce the expected answer, win points, be incentivized to produce it more often. This has a major flaw: a correct answer does not guarantee correct reasoning. A model can guess, use a shortcut, or follow flawed logic and still output the right answer. This approach completely fails for tasks like theorem proving, where the process is the product. DeepSeekMath-V2 tackles this with a novel self-verifying reasoning framework:

  • The Generator: one part of the model generates mathematical proofs and solutions.
  • The Verifier: another part acts as the critic, checking every step of the reasoning for logical rigor and correctness.
  • The Loop: if the verifier finds a flaw, it provides feedback and the generator revises the proof. This creates a co-evolution cycle where both components push each other to become smarter (see the sketch after this list).
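To make the loop concrete, here is a minimal sketch of the control flow in Python. This is not DeepSeek’s actual training code: the `Verdict` type, the callables, and the round budget are all hypothetical stand-ins for calls into the generator and verifier parts of the model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a generator-verifier refinement loop (not DeepSeek's
# actual code). The generator, verifier, and reviser are passed in as plain
# callables so the control flow runs with any stub implementations.

@dataclass
class Verdict:
    is_rigorous: bool  # did the verifier accept every step?
    feedback: str      # critique the generator uses to revise

def prove(
    problem: str,
    generate: Callable[[str], str],          # drafts an initial proof
    verify: Callable[[str, str], Verdict],   # checks each step for rigor
    revise: Callable[[str, str, str], str],  # rewrites the proof using feedback
    max_rounds: int = 8,                     # hypothetical refinement budget
) -> str:
    proof = generate(problem)
    for _ in range(max_rounds):
        verdict = verify(problem, proof)
        if verdict.is_rigorous:              # no flaws found: accept the proof
            return proof
        proof = revise(problem, proof, verdict.feedback)
    return proof                             # best attempt within the budget
```

The sketch only shows the inference-time refinement loop; in DeepSeek’s framing both roles live in the same model and push each other to improve during training, which is where the co-evolution comes from.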

This new approach lets the model achieve record-breaking performance. As you can see from the charts above, it scores second place on ProofBench-Advanced, just behind Gemini. But Gemini isn’t open-source; DeepSeekMath-V2 is.

The model weights are available on Huggingface under an Apache 2.0 license: https://huggingface.co/deepseek-ai/DeepSeek-Math-V2.

This means researchers, developers, and enthusiasts around the world can download, study, and build upon this model right now. They can fine-tune or modify the model to fit their needs and research, which promises a lot of exciting math discoveries soon. I predict (on no basis, mind you) that this will help solve computing problems first, whether practical or theoretical.
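For anyone who wants to poke at it, a minimal sketch of pulling the weights down with the Hugging Face `transformers` library might look like this. The exact model class, dtype setting, prompt, and whether the repo needs `trust_remote_code` are assumptions on my part; check the model card for the recommended usage.

```python
# Minimal sketch of loading the open weights and asking for a proof.
# The model class, dtype, and prompt format are assumptions; see the model card
# at https://huggingface.co/deepseek-ai/DeepSeek-Math-V2 before running this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Math-V2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Fine-tuning for your own research then just means plugging these same weights into whatever training setup you already use.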

Beyond the math itself, the self-verification mechanism is a crucial step towards building AI systems whose reasoning we can trust, which is vital for applications such as scientific research, formal verification, and safety-critical systems. It also shows that ‘verification-driven’ training is a viable and powerful alternative to the ‘answer-driven’ method that has dominated until now.

  • JustSo [she/her, any]@hexbear.net
    Open weights models are not open source.

    Open source models are like this: https://github.com/huggingface/smollm with pre-training and training data provided, training methods described in reproducible detail and of course weights for people to play with without needing to re-do the work unnecessarily.

    Also party-sicko me when China keeps leapfrogging US tech at the one thing the Americans are focused on, and they don’t even need to fuck up their entire economy to do it.

    • ☆ Yσɠƚԋσʂ ☆@lemmygrad.ml (OP)

      The fact that a small team like DeepSeek can produce top-tier models on a shoestring budget exposes the US AI industry as a giant scam. If a handful of researchers can advance the entire field with groundbreaking papers, it raises the question of where the trillions of dollars spent in the US are actually going. There is clearly no meaningful connection between the astronomical spending and tangible, productive output.