17 May 2026

Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S. (2021) ‘On the dangers of stochastic parrots: Can language models be too big?’, Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 610–623. doi: 10.1145/3442188.3445922.

Bender, Gebru, McMillan-Major and Shmitchell’s ‘On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?’ offers a decisive critique of the dominant trajectory in natural language processing, arguing that the pursuit of ever larger language models produces environmental, epistemic, social and political harms that cannot be justified by benchmark performance alone. The article’s central proposition is that scale is not a neutral technical achievement: it intensifies existing asymmetries in computation, data ownership, linguistic representation and social power. Large language models are trained on vast Internet-derived corpora whose apparent diversity conceals deep exclusions, since the web overrepresents hegemonic voices while marginalised communities are unevenly present, misrepresented, harassed into silence or filtered out through crude dataset-cleaning practices. The authors therefore challenge the assumption that “more data” automatically produces better or fairer systems, showing instead that uncurated scale generates documentation debt, entrenches historical bias and obscures accountability. Their environmental argument is equally significant. Training and deploying very large models requires enormous energy and financial resources, yet the ecological burden is often borne by communities least likely to benefit from English-centred language technologies. This makes model scaling not merely a question of efficiency but one of distributive justice: its costs and benefits are unevenly allocated across race, geography, class and language. The paper’s most influential concept, the stochastic parrot, names a system that can produce fluent and apparently coherent language by statistically recombining patterns from training data, without communicative intention, grounded understanding or responsibility for meaning. This distinction between linguistic form and meaning is crucial.
The authors argue that language models do not perform genuine natural language understanding; rather, they manipulate form in ways that human readers are predisposed to interpret as meaningful. This creates serious risks when synthetic text is deployed at scale, because biased, abusive, misleading or extremist language may be amplified while appearing authoritative or socially situated. The paper further warns that model outputs can reinforce stereotypes, automate discrimination, support disinformation, enable extremist recruitment, expose memorised private information and misdirect research away from more accountable approaches. Its case against indiscriminate scaling is therefore not anti-technology, but a demand for careful, situated and justice-oriented design. The authors recommend assessing environmental costs before development, curating and documenting datasets, engaging stakeholders through value-sensitive design, conducting pre-mortem risk analysis, and pursuing research directions beyond larger models and artificial leaderboards. Ultimately, the article reframes language technology as a socio-technical system embedded in material infrastructures, political economies and human interpretive practices. Its conclusion is clear: the future of NLP should not be governed by size, speed and competitive spectacle, but by accountability, sustainability, linguistic justice and a rigorous understanding of what language models can and cannot do.