- cross-posted to:
- [email protected]
This seems to be based on a racist assumption. Why is speaking improper English labelled as “African American English”? I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech, to compare to the assumptions made for the African American English version.
Speaking with slang / incorrect grammar is of course, in general, inversely correlated with education level and/or preference for shorthand forms of speech over writing/speaking the full grammatically correct form. The LLM is saying speaking in slang = stupid/lazy.
The researcher is labelling slang as specifically African American speak, therefore interpreting the LLM response as assuming African Americans are stupid/lazy.
This [the article?] seems to be based on a racist assumption.
No, it isn’t based on an assumption. The written features that were analysed are associated with AAE. From the article:
- use of invariant ‘be’ for habitual aspect;
- use of ‘finna’ as a marker of the immediate future;
- use of (unstressed) ‘been’ for SAE [standard American English] ‘has been’ or ‘have been’ (present perfects);
- absence of the copula ‘is’ and ‘are’ for present-tense verbs;
- use of ‘ain’t’ as a general preverbal negator;
- orthographic realization of word-final ‘ing’ as ‘in’;
- use of invariant ‘stay’ for intensified habitual aspect; and
- absence of inflection in the third-person singular present tense.
Why is speaking improper English labelled as “African American english”?.
Flip the question - why are those features associated with AAE labelled “improper English”?
I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech
The article tackles this: “Furthermore, we present experiments involving texts in other dialects (such as Appalachian English) as well as noisy texts, showing that these stereotypes cannot be adequately explained as either a general dismissive attitude towards text written in a dialect or as a general dismissive attitude towards deviations from SAE”
I always love when cough “educated” people (usually just what they like to say when they mean “not black”) go on about how “black people don’t speak proper English!” because certain vowels can be dropped here or there, grammar shifts, the works. Most of us have heard AAE (also maybe heard it called “Ebonics” if you’re a little older) at one point or another, and likely don’t have an issue understanding what anyone is saying. A few things that skew more metaphorical or slang words might slip by but you get the gist.
That’s the point of language. Convey information. If the information is conveyed, then language has done its job. Yay language.
If anyone wants to continue saying “it’s not PROPER English” well… I have bad news for you. Neither is any other modern form of English. So many words have been borrowed, or stolen, sentence structures have changed, entire words change meaning. And that’s just in the last 100 years.
English is an amalgamation of many different root languages, and has so many borrowed words and phrases, along with nearly every other modern language, can any of them still be said to be “proper”?
When I think of the difference between “proper English” and “improper English” I’m reminded of My Fair Lady. “The rain in Spain stays mainly on the plain” Eliza vs Henry Higgins (or 'enry 'iggins if you’re feeling improper)
I agree with most of what you said so I’ll focus only on a specific point, OK?
That’s the point of language. Convey information. If the information is conveyed, then language has done its job. Yay language.
There’s another point of language, besides conveying information, that is relevant here: identity. Different people speak different varieties because this allows them to express “this is who I am, I speak like my peers”.
That’s why AAE speakers use their varieties in the first place - it’s a way to tell the world “I’m black, this is who I am, I identify myself with other black people”. And it’s also why those varieties get such a stigma - because in the USA, they want people to feel bad for being who they are, if they’re black. (NB: I’m not from the USA, but this is so fucking obvious that even an external observer gets it.)
“educated” people (usually just what they like to say when they mean “not black”)
Please don’t be racist. Education level is unconnected to race.
Please don’t call other people racist, unless slander is your thing.
The whole point of that was to highlight how racist people like to call someone “educated” when what they really mean is “wow you sound white”.
It was mocking the exact kind of person you seem to believe I am.
Habitual be is one of the greatest gifts any language has ever received and we are all richer for it.
Of course there is a proper English, as defined by the standard grammatical rules of the English language. A dialect is a variation upon that. I am not saying “black people don’t speak proper English”. There are plenty of black people who speak proper English, the same as there are plenty of white people who speak proper English and plenty of white people who speak dialects. I am saying that any and all dialects are not formal English, by definition of what the grammatical rules of the language are.
What is the governing body of your alleged “proper English”?
You do understand that English has grammatical rules, right? That’s not up for debate. It’s not controversial to say that standard English would be following the standard spellings and grammatical rules for the language. If you genuinely don’t know what those are, then please buy a dictionary or hire an English teacher. You could also look up what the rules are for the English language exam required to become a citizen. These are taught in schools around the world. About the only way the rules vary is in whether you are learning American English or British English. Each of those two has the backing of a nation state.
I have a degree in linguistics. The most important thing it taught me is that there is a widely believed fiction, almost like a religion, underlying prescriptivist grammar. For the sake of social advancement, if you have both the means and the talent, it’s generally necessary to learn a list of arbitrary but extremely complicated prestige markers for your language, to earn the approval of the self-appointed priestly caste of grammarians, in order to rub shoulders with the rich and powerful. An overly complex shibboleth.
It’s a mechanism to oppress the lower classes while maintaining the pretense of pure meritocracy, by declaring arbitrarily that the dialect which is already spoken and written in the homes of the upper class children is proper, and all other dialects are improper, then implying that the “failure” of lower class children to acquire the prestige markers is an intellectual shortcoming, rather than the absence of privilege.
Can you buy books and hire tutors to learn these prestige markers? Of course. Is there general agreement among members of this cult about what their own rules are? Sure. If you choose not to use them, is your English “improper”? Absolutely not. It’s different but equal, as long as your meaning is clear. I would wager that more than 90% of people do not go even one day without saying or writing some example of “improper” English, which is nevertheless understood perfectly well by the recipient. Successful transmission of the message is the only true test of linguistic legitimacy. Everything else is performative.
By the way, while it doesn’t change much about this more fundamental basis for my opinion that “standard English” is an offensive fiction, neither British nor American English actually has the backing of a nation state. This is in contrast to, for example, French, which does. According to this article on language regulators, “The English language has never had a formal regulator anywhere, outside of private productions such as the Oxford English Dictionary.” Hence my rhetorical question to you earlier: Who is the governing body? There is none.
I would argue there is a distinction between the prestige markers you’re talking about and the more general grammatical rules that are followed. You can use grammatically correct English without subscribing to the overcomplexity that is, yes, possible.
A dialect can work locally, but there is a reason why many who have travelled abroad find themselves deliberately softening their dialect and shifting closer to proper English / Queen’s English / whatever we’re calling it. It’s because their dialect is hard to understand for people who are not used to it. My girlfriend always comments on how my dialect comes out of the woodwork when I’m back home with family.
Saying “I be lit” or “gimme a pint guvna” just will not be understood outside of the area or cultural group that it is spoken in. Therefore for business and/or travel you need a more standard form of English that everyone can understand. That’s not some overly complex cultural dance we play to keep the powerful powerful, that’s just basic requirements of communication.
There is no set of standard grammatical rules for any language. There are current standards for existing dialects, and they change all the time. The strict and steadfast rules of a Londoner’s English are different from those of a Bostonian’s English or Californian’s English. And go back fifty years and those rules in all of those places were different still. Your prescriptivist nonsense is not based on material reality, and you are using it to justify nothing short of your racist prejudices.
Did they test jive?
No, only grammar.
I don’t know or hang around with many black people, but I do hear all of the stuff pointed out here on the regular any time I see a group of rednecks at the local farm supply.
Plus, internet meme culture has vastly changed the language landscape where, for example, phrases like “you don’t think it be like it is, but it do” are used by people from all walks of life.
A lot of AAE features are actually shared with Dixie English as spoken by non-black people. So I’m not surprised that you hear “rednecks” using a few of them.
The association between those features and African-American speakers is still there, though. If you see someone on the internet saying stuff like “I be working”, the typical person won’t picture a redneck, they’re going to picture a black person, you know?
The internet does seem to have changed the language landscape a fair bit, but I think that those features slowly leaking into the speech of non-AAE speakers is more about social changes than just tech.
Really good reply, thanks for the effort you put in. It’s good to see they did compare with other dialects. It’s interesting that the same bias was not seen.
I would still disagree with the statement that AAE could be considered equally proper to textbook, grammatically correct according to the Oxford English dictionary (or the American equivalent). A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.
Sorry beforehand for the wall of text.
I would still disagree with the statement that AAE could be considered equally proper to textbook, grammatically correct according to the Oxford English dictionary (or the American equivalent).
The reason why AAE is considered less acceptable than SAE (Standard American English) is not “within” the AAE varieties. It’s solely social factors - people point to “he is working” and say “this is right”, then they point at “he working” and say “this is wrong”.
Dictionaries are only part of that. We (people in general) assign authoritativeness to them to dictate what the standard is supposed to be, but that authority is not intrinsic either. For example, if people decided en masse to ditch the Oxford English Dictionary, it would suddenly stop being a reference for what’s “correct” vs. “wrong” English.
A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.
Emphasis mine. That’s incorrect.
There are multiple definitions of dialect. Plenty focus on mutual intelligibility - if speakers of two varieties can communicate just fine, their varieties are a dialect of the same language, independently of what you consider standard.
The nearest to what you’re saying would be the definitions that treat the standard as an Ausbau variety, with the dialects being the varieties “roofed” by that standard but not having undergone the same process themselves.
However, not even in the latter does the dialect need to be “an adaptation” of the standard. Sometimes both originated independently from the same source, like French (standard) and Norman (dialect), both from Late Latin; sometimes the standard itself is an “adaptation” of a dialect, like Standard Italian (basically a spin-off of the Tuscan dialect). And sometimes the standard was formed from multiple dialects, like Standard German.
Focusing on AAE, it’s disputed where it comes from, but it’s certainly not from SAE. Some claim that it’s a divergent form of Dixie English, some claim that it’s a decreolised creole, but in neither case is the origin SAE; they simply developed side by side.
Why is speaking improper English labelled as “African American english”?.
Oh no, you’re in the picture. It’s a real dialect, just as valid as what they speak on the BBC, which I’m guessing is itself different from how you speak.
To be clear, I don’t think you meant to be unkind here. I’m not trying to make you feel bad.
Yeah it turns out when your entire tech industry is dominated by cishet white techbros and the entire foundation of their education and the production of such models is based on that then you get racist as fuck outcomes from any given algorithm that is a product of that same set of normative standards.
If you have the time I highly recommend reading Palo Alto by Malcolm Harris, it’s a great primer on how all this shit got started and why we should frankly just burn Silicon Valley to the ground.
Although nonstandard English and pidgins often demonstrate the same level of nuance and complexity as standard English, it’s very common for there to be negative stereotypes. One has to wonder whether the LLMs generated from (stolen en masse) written output say as much about us as they do about their creators.
Pretty much. It was trained on human writing, and then people are all surprised when it has human biases.
An LLM needs to evaluate and modify the preliminary output before actually sending it. In the context of a human mind that’s called thinking before opening your mouth.
Who among us couldn’t benefit from a little more of that?
Humans aren’t always very good at that, and LLMs were trained on stuff written by humans, so here we are.
Exciting new product from the tech industry: Fruit from the poisoned tree!
LLMs are racist… Pay us 59.99 in 3 easy payments to find out how! I love paywalled articles.
And don’t worry, the people that did the research and wrote the article, and the person that reviewed the article aren’t going to see a single cent of it.
shit goes in, shit comes out
Would you like the opportunity to explain why African American English is “shit” and comparable to racism?
People be downvoting things for no reason 😑
No, the ‘shit’ is the prejudice in the training data that claims negative stereotypes about people who speak in African American English.
i meant shit as in racist internet writings, which the llms are taught with
Ohh sorry. Like the model was trained on bad inputs
<- the input
Crap, I left my $199 yearly subscription info inside my butler’s Lamborghini. Could your personal valet sky-write your login credentials for nature.com above my Tuscan estate? Specifically, above the Eastern alpaca pens—this Murano glass monocle of mine isn’t a bi-focal. Cheers.
Brilliant, ol’ sport! There’s a mallet and horse waiting for you at West Egg this weekend—I simply won’t take no for an answer.
Okay this has to be a new hexbear site tagline
Excuse me, but it’s only 3.90 for each issue…
Of course I get my money’s worth by reading every single one
The actual scientific article is open-access: https://www.nature.com/articles/s41586-024-07856-5
References weren’t paywalled, so I assume this is the paper in question:
Hofmann, V., Kalluri, P.R., Jurafsky, D. et al. AI generates covertly racist decisions about people based on their dialect. Nature (2024).
Abstract
Hundreds of millions of people now interact with language models, with uses ranging from help with writing [1,2] to informing hiring decisions [3]. However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans [4,5,6,7]. Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement [8,9]. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models’ overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.
Thanks, and yes, you’re correct