The article discusses the recent disruption in the generative AI industry caused by DeepSeek, a Chinese AI company. Here are the key points:
- DeepSeek has introduced AI models that are competitive with OpenAI’s but significantly more efficient and cheaper to run.
- This development challenges the prevailing narrative that AI models must be expensive and require massive infrastructure investments.
- DeepSeek’s models are open-source and can be run locally on modest hardware, unlike OpenAI’s closed and resource-intensive models.
- The company’s V3 model is competitive with OpenAI’s GPT-4 and Anthropic’s Claude, while being 53 times cheaper to run.
- DeepSeek’s R1 model competes with OpenAI’s reasoning model (o1) at a fraction of the cost.
- The company has also released an image generation model that reportedly outperforms StableDiffusion and DALL-E 3.
- DeepSeek’s approach has raised questions about the massive investments made by tech giants in AI infrastructure.
- There are concerns about DeepSeek’s funding sources and potential Chinese state involvement, though these remain speculative.
- The article suggests that OpenAI and Anthropic may have been less incentivized to pursue efficiency due to their abundant funding and lack of profitability pressure.
- This development could potentially reshape the AI industry, challenging the dominance of well-funded Western tech companies.
- Nvidia will sell a lot fewer chips.
Thanks DeepSeek.
Hey, can you summarize the core argument of this for me? Specifically with an eye to its broader implications for the AI sector and the technology industry as it stands today.
I’ll analyze the core argument of this piece and its broader implications.
The article’s central argument is that DeepSeek’s recent breakthrough has exposed fundamental flaws in how Western tech companies, particularly OpenAI and Anthropic, have approached AI development. The author contends that these companies weren’t actually innovating but rather perpetuating what they call a “Rot Economy” - a system where companies burn through massive amounts of capital while creating an illusion of inevitable progress and necessary scale.
The key revelation isn’t just that DeepSeek created more efficient models, but that Western AI companies apparently never seriously tried to do so. The author argues this wasn’t because they couldn’t, but because they had no incentive to - they were operating in a system that rewarded spending and scale rather than efficiency and innovation. They had convinced investors and the market that the only path forward was through ever-larger models requiring ever-more expensive infrastructure.
The broader implications about the current state of the technology industry are particularly damning:
- The modern tech industry, particularly in AI, has become fundamentally anti-innovative. Rather than solving real problems efficiently, it’s oriented around maintaining narratives that justify continued capital accumulation.
- Silicon Valley has lost its “scrappy builder” spirit. Instead of finding clever solutions to constraints (as DeepSeek did), major companies simply throw money and resources at problems.
- The venture capital and tech media ecosystem has become complicit in this dynamic, uncritically accepting narratives about necessary scale and inevitable progress while failing to ask basic questions about efficiency and practical value.
The author suggests this represents a deeper crisis in American tech innovation - the industry has become dominated by what they call “management consultants that lead teams of engineers to do things based on vibes” rather than genuine technological problem-solving. This has created an environment where companies compete not on actual innovation or efficiency, but on their ability to raise and spend capital while maintaining compelling narratives about future potential.
The piece concludes that much of the current AI boom has been essentially a confidence game - not because the technology doesn’t work, but because the companies developing it have been more focused on maintaining narratives that justify their continued fundraising than on creating genuinely valuable and sustainable products. DeepSeek’s achievement matters not just because it created more efficient models, but because it demonstrated that the emperor had no clothes - that the massive infrastructure and spending of Western AI companies wasn’t actually necessary for technological progress.
This suggests a potential inflection point for the tech industry, where the narrative-driven, capital-intensive model of innovation may be reaching its limits, potentially forcing a return to more genuine technological problem-solving and efficiency-driven development.
My AI summarizer is superior to your AI summarizer. 😃
Even more interesting than this whole kerfuffle are the conversations taking place around it. They look like they’re driven by AIs with different agendas, trying to discredit and ridicule whatever opponent they identify rather than find out what’s really going on.
Have the AI Wars started already?
PR has sprung into action - Fortune articles, Guardian articles, all running damage limitation.
I wish more people would hold Sam Altman’s feet to the fire, hold him to some semblance of accountability. Because the man has made an entire career of failing upwards, from launching a short-lived startup that imploded, to suddenly becoming president of Y Combinator, to suddenly being worth billions of dollars, and literally paying people in the third world (with monopoly money, of course) for their eyeballs.
Oh and there’s the whole thing where he might have molested his kid sister, which is always seemingly glossed over
Even Ed Zitron, who isn’t afraid to go after someone (see his articles about the guy who destroyed Google search), seems to handle Sam with kid gloves.
This is one more time when I wonder how the hell people like Altman make it to adulthood. If I acted that way I would have been punched in the head SO MANY TIMES. I don’t get it. Do these cunts live in bunkers or something? Why don’t people call them out to their face or simply punch with all the force you can get through your body? Fucking insane
Lol, Ed Zitron is very paralleled.
He’s pessimistic and cynical to the point of being conspiratorial and delusional.
He’s someone to listen to when you want to hear someone go on an unhinged rant about the tech industry, not someone you listen to when you want to actually understand how it works.
I mean, look at this trash article: he spends 5000 words saying effectively nothing. Instead of just linking to pre-existing, better-written articles, he rehashes everything in a snarky tone while skipping over some of the most important points (like training through distillation).
He didn’t skip anything, my dude. You can say it’s saying nothing; I guess the Nvidia stock price doesn’t reflect anything at all. Fucking AI morons, your pillow won’t ever be sentient.
Lmfao, so now Lemmy thinks that the stock market is an arbiter of value and truth?
I’m sure you really thought that one through. You seem incapable of understanding nuance or thinking through your points, so maybe sit a few conversations out and reflect on the fact that you immediately defaulted to thinking of me as a moustache twirling villain.
Literally. All he said was China made an LLM with less. The end.
If that was all he said, I would have no issue with that. But no, he spent 10,000 words padding that sentiment out with as much tripe and snark as he could.
This seems unfairly dismissive of someone who’s proved themselves time and again. The article might not be about what you wish it was about but it’s insightful about the topic it covers.
Wanting a better world, and holding up a light to the current one to show the differences between what could be and what is, is not at all what “cynical” means. “Cynical” is the opposite of what you mean. “Pessimistic” or “negative” is definitely more apt, yes.
Also:
Now, you’ve likely seen or heard that DeepSeek “trained its latest model for $5.6 million,” and I want to be clear that any and all mentions of this number are estimates. In fact, the provenance of the “$5.58 million” number appears to be a citation of a post made by NVIDIA engineer Jim Fan in an article from the South China Morning Post, which links to another article from the South China Morning Post, which simply states that “DeepSeek V3 comes with 671 billion parameters and was trained in around two months at a cost of US$5.58 million” with no additional citations of any kind. As such, take them with a pinch of salt.
While there are some that have estimated the cost (DeepSeek’s V3 model was allegedly trained using 2048 NVIDIA H800 GPUs, according to its paper), as Ben Thompson of Stratechery made clear, the “$5.5 million” number only covers the literal training costs of the official training run (and this is made fairly clear in the paper!) of V3, meaning that any costs related to prior research or experiments on how to build the model were left out.
While it’s safe to say that DeepSeek’s models are cheaper to train, the actual costs — especially as DeepSeek doesn’t share its training data, which some might argue means its models are not really open source — are a little harder to guess at. Nevertheless, Thompson (who I, and a great deal of people in the tech industry, deeply respect) lays out in detail how the specific way that DeepSeek describes training its models suggests that it was working around the constrained memory of the NVIDIA GPUs sold to China (where NVIDIA is prevented by US export controls from selling its most capable hardware over fears they’ll help advance the country’s military development):
Here’s the thing: a huge number of the innovations I explained above are about overcoming the lack of memory bandwidth implied in using H800s instead of H100s. Moreover, if you actually did the math on the previous question, you would realize that DeepSeek actually had an excess of computing; that’s because DeepSeek actually programmed 20 of the 132 processing units on each H800 specifically to manage cross-chip communications. This is actually impossible to do in CUDA. DeepSeek engineers had to drop down to PTX, a low-level instruction set for Nvidia GPUs that is basically like assembly language. This is an insane level of optimization that only makes sense using H800s.
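For anyone who wants to sanity-check the numbers in the quoted passages, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (2048 H800 GPUs, "around two months", US$5.58 million, 20 of 132 processing units per GPU). The 60-day duration and the assumption that every GPU ran around the clock are mine, not the article's, so treat the result as an order-of-magnitude check and nothing more.

```python
# Back-of-the-envelope check of the quoted training figures.
# From the quoted text: 2048 H800 GPUs, "around two months", US$5.58 million,
# and 20 of 132 processing units per GPU reserved for communication.
# Assumption (not in the source): "two months" is taken as 60 days of
# continuous use on every GPU.

GPUS = 2048
DAYS = 60  # assumed, the source only says "around two months"
QUOTED_COST_USD = 5.58e6
COMM_UNITS = 20
TOTAL_UNITS = 132


def gpu_hours(gpus: int, days: float) -> float:
    """Total GPU-hours if every GPU runs around the clock."""
    return gpus * days * 24


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Cost per GPU-hour implied by a total cost and a GPU-hour count."""
    return cost_usd / hours


hours = gpu_hours(GPUS, DAYS)
rate = implied_hourly_rate(QUOTED_COST_USD, hours)
comm_share = COMM_UNITS / TOTAL_UNITS

print(f"GPU-hours: {hours:,.0f}")
print(f"Implied rate: ${rate:.2f} per GPU-hour")
print(f"Share of units on communication: {comm_share:.1%}")
```

Under those assumptions the quoted cost works out to a bit under $2 per GPU-hour, which reads like a rental-style price for the final run only. That fits Thompson's point that the figure leaves out prior research and experiments. The last line shows that roughly 15% of each GPU's processing units were set aside for cross-chip communication, per the quote.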
Tell me: What should I be reading, instead, if I want to understand the details of this sort of thing, instead of that type of unhinged, pointless, totally uninformative rant about the tech industry?
Wanting a better world, and holding up a light to the current one to show the differences between what could be and what is, is not at all what “cynical” means. “Cynical” is the opposite of what you mean. “Pessimistic” or “negative” is definitely more apt, yes.
No, I said cynical and I meant cynical.
I don’t care that he criticizes the tech industry, I care that he feels the innate need to portray everyone in it as moustache twirling villains, rather than normal people caught up in the same capitalist systems and pressures as everyone else.
Even here, he spends all the article focusing on rumours about Chinese researchers making novel ways to outperform OpenAI and the like, and just makes a dismissive joke about the accusations that they effectively trained their model using OpenAI’s model. Regardless of whether or not you agree with the morality of ignoring copyright to copy a copier, it’s an incredibly important point because that is not a replicable strategy for actually creating new models. But rather than address that in any way, he dismisses it in a paragraph to spend another couple thousand words trying to dunk on the western tech industry in the snarkiest tone possible.
But it’s not just that “they effectively trained their model using OpenAI’s model”. The point Ed goes on to make is why hasn’t OpenAI done the same thing? The marvel of DeepSeek is how much more efficient it is, whereas Big Tech keeps insisting that they need ever bigger data centers.
They HAVE done that. It’s one of the techniques they use to produce things like o1 mini models and the other mini models that run on device.
But that’s not a valid technique for creating new foundation models, just for creating refined versions of existing models. You would never have been able to create, for instance, an o1 model from ChatGPT 3.5 using distillation.
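Since "distillation" keeps coming up, here is a minimal sketch of the classic soft-label form of it, where a student model is trained to match a teacher's output distribution. The logits and the three-token vocabulary are hypothetical, and nothing in this thread establishes what any lab mentioned here actually does. With only API access to a model you would presumably train on its generated text rather than its logits, which is a different setup from this one.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T**2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, teacher))        # student identical to teacher
print(distillation_loss(teacher, close_student))  # student roughly agrees
print(distillation_loss(teacher, far_student))    # student disagrees
```

The loss is zero exactly when the student reproduces the teacher's distribution, so the training target is the teacher itself. That is the intuition behind the claim above that distillation refines or compresses an existing model rather than producing a more capable one, though the sketch only illustrates the objective and does not prove the claim.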
Look up the definition of the word cynical. It means, more or less, asserting that no one is motivated by sincere integrity. Accusing some specific people of lacking integrity, while holding up others as good examples of integrity that everyone should aspire to, is the opposite of cynicism.
He doesn’t address very much the idea that DeepSeek “distilled” their model from OpenAI’s model and others specifically because that is just a rumor with very minimal evidence for it.
OpenAI has reportedly found “evidence” that DeepSeek used OpenAI’s models to train its rivals, according to the Financial Times, although it failed to make any formal allegations, though it did say that using ChatGPT to train a competing model violates its terms of service. David Sacks, the investor and Trump Administration AI and Crypto czar, says “it’s possible” that this occurred, although he failed to provide evidence.
Personally, I genuinely want OpenAI to point a finger at DeepSeek and accuse it of IP theft, purely for the hypocrisy factor. This is a company that exists purely from the wholesale industrial larceny of content produced by individual creators and internet users, and now it’s worried about a rival pilfering its own goods?
Cry more, Altman, you nasty little worm.
The “rumors” you say he discusses about novel ways the Chinese researchers found to outperform OpenAI are based on an extremely detailed look at their paper and their code, as interpreted by experts. The thing you’re upset he doesn’t discuss is based on rumors. He doesn’t discuss it, except to note that it’s just a rumor but would be funny if it’s true, because he is not doing what you accuse him of.
If you’re upset that he was mean to Sam Altman, so much so that you simply don’t care if he also goes deep into a lot of important details and cares about integrity enough to hate a lot on people who don’t have it, then say so. The things you are accusing him of doing are not true, though, and pretty easy to disprove if you can look honestly at his work.
Look up the definition of the word cynical. It means, more or less, asserting that no one is motivated by sincere integrity. Accusing some specific people of lacking integrity, while holding up others as good examples of integrity that everyone should aspire to, is the opposite of cynicism.
Yeah, I know the definition of the word, and I meant what I said. Stop trying to think I said something else because you disagree.
He is incredibly cynical.
He thinks everyone in the tech industry is a moustache twirling villain and always ascribes malice where incompetence would do. Like I said, he’s who you listen to when you want to hear someone go on an unhinged rant about everyone being evil, not someone with an accurate view of human nature or motivations.
He doesn’t address very much the idea that DeepSeek “distilled” their model from OpenAI’s model and others specifically because that is just a rumor with very minimal evidence for it.
There is very minimal evidence for literally EVERYTHING he writes about in this article. The whole talk of them working around the GPU restrictions also has incredibly minimal evidence and is just a rumour.
Once again, his motivation is not informing you, it’s dunking on the tech industry. It’s literally his entire persona and career.
The “rumors” you say he discusses about novel ways the Chinese researchers found to outperform OpenAI are based on an extremely detailed look at their paper and their code, as interpreted by experts.
No, they’re not. He just portrays it that way because that makes the tech industry sound bad. We flat out do not know how they trained DeepSeek’s model.
Once again, I don’t care that he’s mean to any tech titan, I care that he’s misinforming people because it’s the easiest path to dunking on an industry that he has a preexisting vendetta against.
He thinks everyone in the tech industry is a moustache twirling villain and always ascribes malice where incompetence would do.
Here’s him talking about people from the tech industry:
Nevertheless, Thompson (who I, and a great deal of people in the tech industry, deeply respect)
Every single article I’ve read about Gomes’ tenure at Google spoke of a man deeply ingrained in the foundation of one of the most important technologies ever made, who had dedicated decades to maintaining a product with a — to quote Gomes himself — “guiding light of serving the user and using technology to do that.”
Back to quoting you:
There is very minimal evidence for literally EVERYTHING he writes about in this article. The whole talk of them working around the GPU restrictions also has incredibly minimal evidence and is just a rumour.
We flat out do not know how they trained DeepSeek’s model.
Correct. We do not know the training data, which makes it silly to decide that it is definitely cribbed from OpenAI’s model. What we do know is how the code works, because it is open and they wrote a paper. What would you consider “evidence,” if not the actual code and then a highly detailed explanation from the authors about how it works, and then some independent testing and interpretation by known experts? Do you want it carved on a golden tablet or something?
I think I’m done with this conversation. You seem very committed to simply repeating your point of view at me. You’ve done that, so I think we can go our separate ways.
Picking out random people to lionize too much while you demonize literally everyone else, is still being cynical.
Correct. We do not know the training data, which makes it silly to decide that it is definitely cribbed from OpenAI’s model. What we do know is how the code works, because it is open and they wrote a paper. What would you consider “evidence,” if not the actual code and then a highly detailed explanation from the authors about how it works, and then some independent testing and interpretation by known experts? Do you want it carved on a golden tablet or something?
Because the paper does not prove what DeepSeek is claiming. The paper outlines a number of clever techniques that might help to improve efficiency, but most researchers are still incredibly skeptical that they would add up to a full order of magnitude less compute power required for training.
Until someone else uses DeepSeek’s techniques to openly train a comparable model off non-distilled data, we have no reason to believe their method is replicable.
Extraordinary claims require extraordinary evidence (or really just concrete, replicable evidence), and we don’t have that, at least not yet.