I work in higher education making online courses. It’s really stressing everyone out.
Stressing out in what way? For the viability of your job being lost to this ai bullshit? For the outcomes of students who will just try to chatGPcheaT their way through everything?
Likely both. I used to be involved in creating educational material for employers. First, voiceover artists were replaced with shockingly low quality AI models. This was about two years ago. Training prices didn’t drop and no employers complained.
The industry experts we’d pay for consultation were increasingly replaced with ChatGPT queries. Information was sometimes wrong but the employers purchasing these trainings would catch and correct it (for free) in the proofing process. Prices stayed the same, employers still didn’t complain.
After launching trainings, we’d monitor engagement. For relatively simple questions that anyone who paid attention would be able to answer immediately, the average response time was initially about 2-3 minutes, then about 60-90s for subsequent questions. They were likely finding ChatGPT and using it to answer the questions. We shared these findings and, you guessed it, employers didn’t care.
AI is probably the worst invention since the atom bomb.
Congrats you are now the old person saying “young bad”
BRO, what an opinion. Love how out of touch and sarcastic you are. Really adds to the “I am not aware of the damage tech billionaires are vomiting out into society” vibe.
If the goal of school was to learn then people wouldn’t cheat the learning process.
When the goal is a piece of paper that makes you more likely to be employed then people will find the easiest path to get the piece of paper.
Congrats! You are now the tech bro implying “ChatGPT good”
Bet the squirt of dopamine you got hitting send was orgasmic
That’s a clean gotcha
Same goes for you
I’m sorry you don’t like being reminded you are old.
My point is that it’s a somewhat outdated skill, and these kids have enough to figure out without the encumbrance of a paper dictionary. Most of my kids have never used one before, and yes, I can show them how to use it, but it’s not a functional testing accommodation. Testing accommodations should not include learning skills that are only tangentially related, especially not when there is a reasonable alternative.
Going to have generations of people unable to think analytically or creatively, and just as bad, entering fields that require a real detailed knowledge of the subject and they don’t. Going to see a lot of fuck ups in engineering, medicine, etc because of people faking it.
Why do you need to learn reams of facts when you can get an answer in a fraction of a second? Seems pointless anyway.
Lmao. I’m guessing you don’t work in any of those fields. Got some bad news for ya bud. It’s been that way for decades. Probably centuries.
Don’t tell them it applies pretty damn perfectly to the journalists and online commentators who heavily shape their worldview, even indirectly (because even if you don’t believe it, your homies will, and you get peer pressured), because they’ll go into a loop.
I am having flashbacks to the scene in Idiocracy where the doctor is talking about his wife.
She’s a pilot now.
…or will be in 480 years.
We’ve been needing to rework education for years now anyway. At least this will force the teachers to change & adapt, whether they like it or not.
Oh no, maybe teachers will have to put effort into their students beyond assigning homework that an AI can do.
Do you believe that the point of such assignments is because the teacher desires to read a couple dozen nigh-identical essays on the topic at hand?
This is my point exactly. They don’t desire that. Nor should they. And so they shouldn’t do that.
Can you conceive of other motivations?
Oh, do your regional school districts let teachers design their own curriculum?
My point exactly. You won’t find good teachers in public education.
You will, they’re just overworked and undersupported.
Yes, they might have been good teachers if they hadn’t decided to support a mediocre institution that prevents teaching.
So they should go into private education so only the rich or lucky can get a good education?
Or do you think that teachers are the ones directing public education policy?
Or are you saying that somehow not participating in a flawed system will somehow fix it?
The entire purpose behind policies that hinder quality education is to drive skilled educators away. To choose not to participate is the best way to expedite the goals of those who benefit from poor quality education.
We cannot keep on with this horrible system that does nothing but torture kids and teachers alike. It’s not working. It’s clear it’s not working. Change will never come from the top because, like you said, they will never voluntarily change it. So it must come from the bottom.
So in the meantime we should just abandon students to the people fucking up the system?
One can’t just snap their fingers and make everything better, reality does not work that way.
You know that this is a global forum, right?
The article is specifically about the USA. That’s what I’m talking about too.
Is it that uniformly bad? I guess the exceptions to the rule stand out starkly then.
When I asked him why he had gone through so much trouble to get to an Ivy League university only to off-load all of the learning to a robot, he said, “It’s the best place to meet your co-founder and your wife.”
Yikes.
The fact people can’t even use their own common sense on Twitter without using AI for context shows we are in a scary place. AI is not some all knowing magic 8 ball and puts out a ton of misinformation.
I’m thinking the only way people will be able to do schoolwork without cheating now is going to be to make them sit in a monitored room and finish it there.
I really hope I’m dead before we have androids.
How is this kind of testing relevant anymore? Isn’t it creating an unrealistic situation, given the brave new world of AI everywhere?
Because it tests what you actually retained, not what you can convince an AI to tell you.
But what good is that if AI can do it anyway?
That is the crux of the issue.
Years ago the same thing was said about calculators, then graphing calculators. I had to drop a stat class and take it again later because the dinosaur didn’t want me to use a graphing calculator. I have ADD (undiagnosed at the time) and the calculator was a big win for me.
Naturally they were all full of shit.
But this? This is different. AI is currently as good as a graphing calculator for some engineering tasks, horrible for some others, excellent at still others. It will get better over time. And what happens when it’s awesome at everything?
What is the use of being the smartest human when you’re easily outclassed by a machine?
If we get fully automated yadda yadda, do many of us turn into mush-brained idiots who sit around posting all day? Everyone retires and builds Adirondack chairs and sips mint juleps and whatever? (That would be pretty sweet. But how to get there without mass starvation and unrest?)
Alternately, do we have to do a Butlerian Jihad to get rid of it, and threaten execution to anyone who tries to bring it back… only to ensure we have capitalism and poverty forever?
These are the questions. You have to zoom out to see them.
but what good is that if AI can do it anyway?
It can’t. It just fucking can’t. We’re all pretending it does, but it fundamentally can’t.
Creative thinking is still a long way beyond reasoning as well. We’re not close yet.
It can and it has done creative mathematical proof work. Nothing spectacular, but at least on par with a mathematics grad student.
Specialized AI like that is not what most people know as AI. When most people say AI, they’re referring to LLMs.
Specialized AI, like that showcased, is still decades away from generalized creative thinking. You can’t ask it to do a science experiment within a class because it just can’t. It’s only built for math proofs.
Again, my argument isn’t that it will never exist.
Just that it’s so far off it’d be like trying to regulate smart phone laws in the 90s. We would have only had pipe dreams as to what the tech could be, never mind its broader social context.
So talk to me when it can, in the case of this thread, deliver clinically validated ways of teaching. We’re still decades from that.
Show me a human that can do it.
The faulty logic was supported by a previous study from 2019
This directly applies to the human journalist. Studies on other models from 6 years ago are pretty much irrelevant, and this one apparently tested very small distilled ones that you can run on consumer hardware at home (Llama3 8B lol).
Anyway, this study seems trash if its conclusion is that small and fine-tuned models failing to account for human misdirection (user compliance includes not suspecting intentionally wrong prompts) somehow means “no evidence of formal reasoning”. That phrase means using formal logic and formal operations, not reasoning in general; we use informal reasoning for the vast majority of what we do daily, and we also rely on “sophisticated pattern matching” lmao, it’s called cognitive heuristics. Kahneman won the Nobel prize for recognizing type 1 and type 2 thinking in humans.
Why don’t you go repeat the experiment yourself on huggingface (accounts are free, over ten models to test, actually many are the same ones the study used) and see what actually happens? Try it on model chains that have a reasoning model like R1 and Qwant and just see for yourself and report back. It would be intellectually honest to verify things since we’re talking about critical thinking in here.
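If you do want to run that check, here’s a minimal sketch using the huggingface_hub client. The token placeholder, the model ID, and the distractor-style prompt are illustrative choices on my part, not taken from the study, so swap in whatever hosted reasoning model you can actually access:

```python
# Minimal sketch: send a GSM-style word problem containing an irrelevant
# "distractor" clause to a hosted model and eyeball the chain of thought.
# Assumes a free Hugging Face account/token; the model ID is illustrative.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # your token here

# Correct answer: 44 + 58 + 88 = 190; the "smaller than average" clause is noise.
prompt = (
    "Oliver picks 44 kiwis on Friday, 58 on Saturday, and on Sunday he picks "
    "double what he picked on Friday, but five of them were a bit smaller "
    "than average. How many kiwis does Oliver have?"
)

response = client.chat_completion(
    model="deepseek-ai/DeepSeek-R1",  # illustrative; any hosted reasoning model
    messages=[{"role": "user", "content": prompt}],
    max_tokens=2048,
)
# Check whether the model wrongly subtracts the five smaller kiwis.
print(response.choices[0].message.content)
```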
Oh, add a control group here, a comparison with average human performance, to see what the really funny but hidden part is. Pro-tip: CS STEMlords catastrophically suck when larping as cognitive scientists.
So you say I should be intellectually honest by doing the experiment myself, then say that my experiment is going to be shit anyways? Sure… That’s also intellectually honest.
Here’s the thing.
My education is in physics, not CS. I know enough to know what I try isn’t going to be really valid.
But unless you have peer-reviewed research to show otherwise, I would take your home-grown experiment to be as valid as mine.
And here’s experimental verification that humans lack formal reasoning when sentences don’t precisely spell it out for them: all the models they tested except the chatGPT4 and o1 variants are 27B and below, all the way down to Phi-3, which is an SLM, a small language model with only 3.8B parameters. ChatGPT4 has 1.8T parameters.
1.8 trillion > 3.8 billion
ChatGPT4’s performance difference (accuracy drop) from the regular benchmarks was a whopping -0.3, versus Mistral 7B’s -9.2 drop.
Yes there were massive differences. No, they didn’t show significance because they barely did any real stats. The models I suggested you try for yourself are not included in the test and the ones they did use are known to have significant limitations. Intellectual honesty would require reading the actual “study” though instead of doubling down.
Maybe consider the possibility that:
a. STEMlords in general may know how to do benchmarks, but not cognitive-testing-style testing or how to use statistical methods from that field.
b. This study is an example of the “I’m just messing around trying to confuse LLMs with sneaky prompts instead of doing real research because I need a publication without work” type of study, equivalent to students making chatGPT do their homework.
c. 3.8B models = the size in bytes is between 1.8 and 2.2 gigabytes (see the back-of-envelope sketch below).
d. Not that “peer review” is required for criticism lol, but uh, that’s a preprint on arxiv; the “study” itself hasn’t been peer reviewed or properly published anywhere (how many months are there between October 2024 and May 2025?).
e. Showing some qualitative difference between quantitatively different things without showing p and using weights is garbage statistics.
f. You can try the experiment yourself, because the models I suggested have visible Chain of Thought and you’ll see if and over what they get confused.
g. When there are graded performance differences, with several models reliably not getting confused more than half the time, but you say “fundamentally can’t reason”, you may be fundamentally misunderstanding what the word means.
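On point c, a back-of-envelope sketch of why a ~2 GB download implies a quantized copy rather than full-precision weights (assuming size ≈ parameters × bytes per weight, ignoring overhead):

```python
# Rough on-disk sizes for a 3.8B-parameter model under common weight formats.
params = 3.8e9

for fmt, bytes_per_weight in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gb = params * bytes_per_weight / 1024**3
    print(f"{fmt}: ~{gb:.1f} GB")

# fp16: ~7.1 GB, int8: ~3.5 GB, int4: ~1.8 GB
# So "1.8 and 2.2 gigabytes" matches a 4-bit quantized file, not the full model.
```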
Need more clarifications instead of reading the study or performing basic fun experiments? At least be intellectually curious or something.
It’s already capable of doing a lot, and there is reason to expect it will get better over time. If we stick our fingers in our ears and pretend that’s not possible, we will not be prepared.
If you read up on it, it’s capable of very little beneath the surface of what it appears to be.
Show me one that is well studied, like clinical trial levels, then we’ll talk.
We’re decades away at this point.
My overall point is that it’s just as meaningless to talk about now as it was in the 90s. Because we can’t conceive of what a functioning product will be, never mind its context in a greater society. When we have it, we can discuss it then, as we’ll have something tangible to discuss. But where we’ll be in decades is hard to regulate now.
AlphaFold. We’re not decades away. We’re years away at worst.
If you want to compare a calculator to an LLM, you could at least reasonably expect the calculator result to be accurate.
Why? Because you put trust into the producers of said calculators not to fuck it up. Or because you trust others to vet those machines. Or are you personally validating them? Unless you’re disassembling those calculators and inspecting their chipsets, you’re just putting your trust in someone else and claiming “this magic box is more trustworthy”.
A combination of personal vetting via analyzing output and the vetting of others. For instance, the Pentium calculation error was in the news. Otherwise, calculation by computer processor is understood and the technology is acceptable to be used for cases involving human lives.
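For what it’s worth, the widely circulated test for that Pentium FDIV bug was a single division anyone could run, which is exactly the kind of output-vetting I mean (a minimal sketch; the operands are the well-known public test case):

```python
# The classic 1994 Pentium FDIV test case: on a flawed chip the division
# came back visibly wrong, so the residual below was far from zero.
x, y = 4195835.0, 3145727.0
residual = x - (x / y) * y
print(residual)  # ~0 on a correct FPU; famously ~256 on a flawed Pentium
```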
In contrast, there are several documented cases in the news where LLMs have been incorrect, to the point where I don’t need personal vetting. No one is anywhere close to stating that LLMs can be used in cases involving human lives.
How exactly do you think those instances got into the news in the first place? I’ll give you a hint: people ARE vetting them and reporting when they’re fucking up. It is a bias, plain and simple. People are absolutely using AI in cases involving humans.
https://www.nytimes.com/2025/03/20/well/ai-drug-repurposing.html
https://www.advamed.org/2024/09/20/health-care-ai-is-already-saving-lives/
https://humanprogress.org/doctors-told-him-he-was-going-to-die-then-ai-saved-his-life/
Your opinions are simply biased and ill-informed. This is only going to grow and become a larger and larger dataset. Just like the auto-driving taxis: everyone likes to shit on them while completely ignoring the truth and statistics, all while acting like THIS MOMENT RIGHT NOW is the best they’re ever going to get.
It often is. I’ve got a lot of use out of it.
Because if you don’t know how to tell when the AI succeeded, you can’t use it.
To know when it succeeded, you must know the topic.
The calculator is predictable and verifiable. An LLM is not.
I’m not sure what you’re implying. I’ve used it to solve problems that would’ve taken days to figure out on my own, and my solutions might not have been as good.
I can tell whether it succeeded because its solutions either work, or they don’t. The problems I’m using it on have that property.
The problem is offloading critical thinking to a blackbox of questionably motivated design. Did you use it to solve problems or did you use it to find a sufficient approximation of a solution? If you can’t deduce why the given solution works then it is literally unknowable if your problem is solved, you’re just putting faith in an algorithm.
There are also political reasons we’ll never get luxury gay space communism from it. General Ai is the wet dream of every authoritarian: an unverifiable, omnipresent, first line source of truth that will shift the narrative to whatever you need.
The brain is a muscle and critical thinking is trained through practice; not thinking will never be a shortcut for thinking.
That says more about you.
There are a lot of cases where you cannot know if it worked unless you have expertise.
This still seems too simplistic. You say you can’t know whether it’s right unless you know the topic, but that’s not a binary condition. I don’t think anyone “knows” a complex topic to its absolute limits. That would mean they had learned everything about it that could be learned, and there would be no possibility of there being anything else in the universe for them to learn about it.
An LLM can help fill in gaps, and you can use what you already know as well as credible resources (e.g., textbooks) to vet its answer, just as you would use the same knowledge to vet your own theories. You can verify its work the same way you’d verify your own. The value is that it may add information or some part of a solution that you wouldn’t have. The risk is that it misunderstands something, but that risk exists for your own theories as well.
This approach requires skepticism. The risk would be that the person using it isn’t sufficiently skeptical, which is the same problem as relying too much on their own opinions or those of another person.
For example, someone studying statistics for the first time would want to vet any non-trivial answer against the textbook or the professor rather than assuming the answer is correct. Answer comes from themself, the student in the next row, or an LLM, doesn’t matter.
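As a minimal sketch of that kind of vetting (the data and the “LLM answer” here are invented for illustration), the student can recompute the textbook formula directly instead of taking anyone’s word for it:

```python
# Vet an LLM's statistics answer by recomputing it from the textbook formula.
from statistics import stdev  # sample standard deviation (n - 1 denominator)

data = [12, 15, 11, 14, 18, 13]
llm_claimed = 2.64  # whatever answer the LLM (or classmate) gave

computed = stdev(data)
print(f"computed: {computed:.2f}, claimed: {llm_claimed}")
# Prints computed: 2.48 -- the claim disagrees, so don't accept it on faith.
```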
What do you think testing is for? It’s to show what you know/have learned
Education and learning are two different things. School tests are to repeat back what has been educated to you. Meaningful learning tends to be internally motivated and AI is unlikely to fulfill that aspect.
This is the purpose of essay questions.
Produce an army of people who rely on corporate products to stay alive. What can go wrong?
I teach at a community college. I see a lot of AI nonsense in my assignments.
So much so that I’m considering blue book exams for the fall.
For anyone who is also not from the US:
A blue book exam is a type of test administered at many post-secondary schools in the United States. Blue book exams typically include one or more essays or short-answer questions. Sometimes the instructor will provide students with a list of possible essay topics prior to the test itself and will then choose one or let the student choose from two or more topics that appear on the test.
EDIT, as an extra to solve the mystery:
Butler University in Indianapolis was the first to introduce exam blue books, which first appeared in the late 1920s. They were given a blue color because Butler’s school colors are blue and white; therefore they were named “blue books”.
Importantly it is hand written, no computers.
Biggest issue is that kids’ handwriting often sucks. That’s not a new problem but it’s a problem with handwritten work.
Man, the US has a handwriting problem. It sucks sooo much. In other countries it seems to be only doctors, but in the US? Fucking everyone.
Give them typewriters
There is test-taking software that locks out all other functions during the essay-writing period. Obviously, damn near anything is hackable, but it’s non-trivial, unlike asking ChatGPT to write your essay for you in the style of a B+ high school student. There is some concern about students who learn differently or compose less efficiently, but even as father to such a student, I’m getting to the point where I’m not sure what’s left to do other than sandbox “exploitable” graded work in a controlled environment.
I love this idea.
Speaking from a life of dyspraxia - no, not everyone with sucky handwriting is lazy, many of us would spend 95% of our capacity on making the writing legible and be challenged to learn the actual topic as a result.
This is why we have accommodations offices at colleges.
No problem giving an alternative for those who need it.
In the 1980s that wasn’t really a thing. Besides, it taught me a valuable skill: I partnered with someone who was good at taking notes and I was good at paying attention without taking any notes - she, too, had a problem understanding what she was writing down while writing it down, but took beautiful copies of the lecture. So, afterwards we’d get together and I’d explain her notes to her - which helped me to cement the concepts in my head, at least long enough to get through the exam, and she got her notes explained.
keep at it. it is worth the pain.
Computers with some encyclopedia, but no GPTs are fine, no?
If a kid can write and train a mini-GPT trainable on that encyclopedia, then maybe they deserve the mark for desperation and ingenuity and being a fucking new Leonardo.
GPTs are fine, if you learn to disrespect their output and fix it before presenting it as your own.
Actually, taught that way, GPT may be a tool for teaching critical thinking - if the professors aren’t too lazy to mark down the garbage output.
A lot of people “have trouble getting started” - in all kinds of endeavors. Once you get them rolling, they can see the pattern and do it for themselves next time. If the AI glop gets lucky and copies a decent argument from beginning to end (something I’ve seen it fail spectacularly at many times), then that can help jumpstart people who are stuck, but only if they can recognize when it’s just a bunch of glop.
Really, it would be better for them to read a bunch of samples themselves (which is what the AI does) and hopefully pick up the pattern. What I think is a horrible approach is to sit in a lecture hall and listen to a little guy down front drone in a monotone about the theory of what you are supposed to do, then try to synthesize what is expected from the fragments you understood. Samples to work from are much more efficient.
Open book and calculators would seem reasonable. No communication or searching devices.
No communication - of course, but about search - I don’t think having a Wikipedia snapshot with search is bad.
Oh, I hate that. You have a list of subjects and prepare for them as well as you can, then get one you know and one you don’t. Start with the one you don’t know and you run out of time (or mood) to finish the one you know: you get something shitty. Go the other way around, do the one you know first, and you get interrupted just as you’ve probably remembered something about the one you don’t: you get something shitty either way.
I have a friend who has taught Online university writing for the past 10 years. Her students are now just about 100% using AI - her goal isn’t to get them to stop, it’s to get them to recognize what garbage writing is and how to fix it so it isn’t garbage anymore.
I teach Philosophy.
I need them to think for themselves, which just isn’t happening if they turn in work that isn’t theirs.
So, I’m pretty harsh on anyone using AI. Even if it’s for a discussion post, I’m reporting it to the Academic Integrity office.
Fair distinction. Arguably, writing isn’t about thinking.
her goal isn’t to get them to stop, it’s to get them to recognize what garbage writing is and how to fix it so it isn’t garbage anymore.
Sadly, that may be the best we can hope for.
her goal isn’t to get them to stop, it’s to get them to recognize what garbage writing is and how to fix it so it isn’t garbage anymore.
I wish English teachers did this instead of… Whatever TF they’re doing instead.
This is something they should’ve been doing all along. Long before the invention of LLMs or computers.
This is the inevitable result of “No Child Left Behind” linking school funding to how students performed on standardized tests. American schools haven’t been about education for the last 20+ years. They are about getting as much funding as possible.
American schools haven’t been about education for the last 20+ years. They are about getting as much funding as possible.
Not just American schools; all the way back to Leonardo da Vinci and beyond, it has been all about the funding.
😮💨😮💨
Honest question: how do we measure critical thinking and creativity in students?
If we’re going to claim that education is being destroyed (and show we’re better than our great^n grandparents complaining about the printing press), I think we should try to have actual data instead of these think-pieces and anecdata from teachers. Every other technology that the kids were using had think-pieces and anecdata.
As far as I can tell, the strongest data is wrt literacy and numeracy, and both of those are dropping in line with downward trends that predate AI, am I wrong? We’re also still seeing kids from lockdown, which seems like a much more obvious ‘oh, that’s a problem’ than the AI stuff.
Honest question: how do we measure critical thinking and creativity in students?
The only serious method of evaluating critical thinking and creativity is through peer evaluation. But that’s a subjective scale thick with implicit bias, not a clean and logical discrete answer. It’s also not something you can really see in the moment, because true creativity and critical thinking will inevitably produce heterodox views and beliefs.
Only by individuals challenging and outperforming the status quo do you see the fruits of a critical and creative labor force. In the moment, these folks just look like outliers who haven’t absorbed the received orthodoxy. And a lot of them are. You’ll get your share of Elizabeth Holmeses and Sam Altmans alongside your Vincent van Goghs and Nikola Teslas.
I think we should try to have actual data instead of these think-pieces and anecdata from teachers.
I agree that we’re flush with think-pieces. Incidentally, the NYT Op-Ed section has doubled in size over the last few years.
But that’s sort of the rub. You can’t get a well-defined answer to the question “Is Our Children Creative-ing?” because we only properly know it by the fruits of the system. Comically easy to walk into a school with a creative writing course and scream about how this or that student is doing creativity wrong. Comically easy to claim a school is Marxist or Fascist or too Pro/Anti-Religion or too banal and mainstream by singling out a few anecdotes in order to curtail the whole system.
The fundamental argument is that this kind of liberal arts education is wasteful. The output isn’t steady and measurable. The quality of the work isn’t easily defined as above or below the median. It doesn’t yield real, consistent, tangible economic value. So we need to abolish it in order to become more efficient.
And that’s what we’re creating. A society that is laser-focused on making economic numbers go up, without stopping to ask whether a larger GDP actually benefits anyone living in the country where all this fiscal labor is performed.
I think it’s fine for this to be poorly defined; what I want is something aligned with reality beyond op-eds. Qualitative evidence isn’t bad, but I think it needs to be aggregated instead of anecdoted. Humans are real bad at judging how the kids are doing (complaints like the OP are older than liberal education, no?); I don’t want to continue the pattern. A bunch of old people worrying too much about students not reading Shakespeare in classes is how we got the cancel culture moral panic - I’d rather learn from that mistake.
A handful of thoughts: There are longitudinal studies that interview kids at intervals; are any of these getting real weird swings? Some kids got AI earlier; are they much different from similar peers without it? Where are the broad interviews/story collections from the kids? Are they worried? How would they describe their use and their peers’ use of AI?
A bunch of old people worrying too much about students not reading Shakespeare in classes is how we got the cancel culture moral panic - I’d rather learn from that mistake.
The “old people complaining about Shakespeare” was the thin end of the wedge intended to defund and dismantle public education. But the leverage comes from large groups of people who are sold the notion that children are just born dumb or smart and education has no material benefit.
A lot of this isn’t about teaching styles. It’s about public funding of education and the neo-confederate dream of a return to ethnic segregation.
There are longitudinal studies that interview kids at intervals; are any of these getting real weird swings?
A lot of these studies come out of public sector federal and state education departments that have been targeted by anti-public education lobbying groups. So what used to be a wealth of public research into the benefits of education has dried up significantly over the last generation.
What we get instead is a profit-motivated push for standardized testing, lionized by firms that directly benefit from public sector purchasing of test prep and testing services. And these tend to come via private think-tanks with ties back to firms invested in bulk privatization of education. So good luck in your research, but be careful when you see something from CATO or The Gates Foundation, particularly in light of the fact that more reliable and objective data has been deliberately purged from public records.