
Large language models: challenges and opportunities

Oct 9, 2024 | AI Safety

Large language models (LLMs) have revolutionised natural language processing (NLP) with their ability to generate human-like text at an unprecedented scale. These models enable advancements in fields ranging from customer service to education and content generation. However, the widespread adoption of LLMs also introduces significant challenges, including environmental impacts, biases in training data, and ethical concerns.

The article “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” by Bender et al. outlines these concerns, highlighting the drawbacks associated with LLMs. While the authors raise important points, they also overlook some broader perspectives, particularly around the potential benefits of LLMs. In this article, I critique their work, offering a balanced view of the opportunities and challenges of LLMs.

Setting the context

The article by Bender et al. was submitted to the ACM Conference on Fairness, Accountability, and Transparency (FAccT) in late 2020, published in 2021, and became widely known because of the controversy surrounding its authors, particularly Timnit Gebru's dismissal from Google. The paper critiques LLMs across three main dimensions: their environmental and financial costs, their inherent biases, and their potential to mislead due to a lack of true comprehension.

Bender et al. emphasise that the size and complexity of LLMs, which require billions of parameters, result in significant environmental and financial burdens. They argue that the outputs of these models reflect the biases present in their training data, which is often sourced from the internet and therefore carries gender, racial, and socio-economic biases. Additionally, the paper critiques the notion that LLMs truly “understand” language, instead labelling them as “stochastic parrots” that merely mimic language patterns without comprehension.

The article calls for more responsible approaches to NLP research, including energy-efficient models, careful dataset curation, and greater attention to ethical concerns.

While Bender et al. make valuable points, there are areas that could benefit from further analysis. In some instances, their critique conflates personal opinions with empirical evidence, and the paper does not sufficiently explore contrary perspectives, particularly concerning the potential advantages of LLMs.

Environmental and financial costs

Bender et al. rightly point out the significant environmental and financial costs associated with LLMs, including the large carbon footprint generated by training these models. They advocate for energy-efficient AI and suggest that LLMs disproportionately benefit privileged communities while marginalising others.
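To make the scale of these costs concrete, a rough back-of-envelope estimate helps. The sketch below uses the common approximation of roughly 6 FLOPs per parameter per training token; the throughput, power, and utilisation figures are illustrative assumptions of mine, not values reported by Bender et al.

```python
def training_energy_kwh(params: float, tokens: float,
                        gpu_flops: float = 300e12,   # assumed sustained FLOP/s per GPU
                        gpu_power_kw: float = 0.4,   # assumed average power draw per GPU (kW)
                        utilisation: float = 0.5) -> float:
    """Rough energy estimate: total FLOPs / effective throughput, times power."""
    total_flops = 6 * params * tokens              # ~6 FLOPs per parameter per token
    gpu_seconds = total_flops / (gpu_flops * utilisation)
    return gpu_seconds / 3600 * gpu_power_kw       # kilowatt-hours

# Example: a 175-billion-parameter model trained on 300 billion tokens
print(f"{training_energy_kwh(175e9, 300e9):,.0f} kWh")
```

Even with these generous assumptions, the estimate lands in the hundreds of thousands of kilowatt-hours for a single training run, which supports the authors' point that the costs are far from trivial.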

However, the assertion that larger models are inherently more resource-intensive is an oversimplification. Research has shown that larger models such as the Switch Transformer can actually be more computationally efficient, because their sparse mixture-of-experts architecture activates only a fraction of their parameters for any given input, offsetting some of their size-related costs. The relationship between model size and environmental impact is therefore not as straightforward as Bender et al. suggest: some smaller, dense models can be less efficient and, as a result, more wasteful in terms of computational resources.
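To see why parameter count and per-token compute can come apart, here is a minimal NumPy sketch of the top-1 expert routing used by the Switch Transformer. The dimensions are toy values, and real implementations add load balancing and many other details omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts = 16, 8
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))

def switch_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to a single expert (top-1 routing)."""
    logits = x @ router                      # (tokens, n_experts)
    chosen = logits.argmax(axis=-1)          # one expert per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = x[i] @ experts[e]           # only 1/n_experts of the weights touched
    return out

tokens = rng.standard_normal((4, d_model))
print(switch_layer(tokens).shape)            # (4, 16)
```

The layer holds eight experts' worth of parameters, but each token multiplies against only one of them, so total parameter count grows without a proportional growth in per-token compute.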

On the financial side, LLMs also offer considerable benefits, such as improving efficiency in customer service, automating repetitive tasks, and driving innovation in various industries. These economic gains need to be considered alongside the environmental costs when assessing the overall impact of LLMs.

Training data issues

The authors highlight the biases inherent in the training data used for LLMs, particularly biases related to race, gender, and other protected characteristics. They argue that internet data does not provide a representative view of human experience, thus amplifying biases in the models trained on such data.

While this is a valid concern, the internet’s demographics have changed significantly over time. In many developed countries, internet access is now widespread across various racial, gender, and socio-economic groups, which mitigates some of the biases in online data. The authors’ claim that online platforms overrepresent harmful ideologies like white supremacy and misogyny, while plausible, lacks sufficient empirical support in the paper.

Moreover, addressing these biases by eliminating offensive content from training data presents its own challenges. For example, a model trained without exposure to offensive language may be unable to recognise or respond appropriately to harmful content in real-world applications, such as detecting toxic language in online forums. Thus, LLMs need to reflect the reality of human language, including its negative aspects, to be effective in certain tasks like content moderation.
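A toy example makes this trade-off concrete. In the sketch below, which uses scikit-learn and a four-sentence dataset invented purely for illustration, a classifier trained on scrubbed data has never seen the offensive words it is later asked to flag:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = ["you are wonderful", "have a great day",
         "you are an idiot", "I hate you"]
labels = [0, 0, 1, 1]                        # 1 = toxic

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(train, labels)
print(clf.predict(["you are an idiot"]))     # [1] -- seen in training, flagged

# Retrain on a "sanitised" version with the offensive wording removed.
scrubbed = ["you are wonderful", "have a great day",
            "that was unkind", "please stop"]
clf2 = make_pipeline(CountVectorizer(), LogisticRegression())
clf2.fit(scrubbed, labels)
print(clf2.predict(["you are an idiot"]))    # likely [0] -- the offensive words are out-of-vocabulary
```

The sanitised model has no representation of the slur-laden input at all, so it can only fall back on the benign words it does recognise.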

LLMs as stochastic parrots

Bender et al. describe large language models as “stochastic parrots” that produce fluent text without truly understanding the meaning behind it. They argue that these models cannot grasp the nuances of human communication, making their outputs potentially misleading.
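The metaphor itself is easy to illustrate. The toy bigram model below, built on a made-up corpus, produces fluent-looking word sequences purely by sampling from co-occurrence statistics; nothing resembling meaning is involved.

```python
import random
from collections import defaultdict

corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat chased the dog .").split()

# Record which words follow which in the training text.
follows = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1].append(w2)

def parrot(start: str, length: int = 8) -> str:
    """Generate text by repeatedly sampling an observed next word."""
    words = [start]
    for _ in range(length):
        nxt = follows.get(words[-1])
        if not nxt:
            break
        words.append(random.choice(nxt))     # sample in proportion to observed frequency
    return " ".join(words)

print(parrot("the"))   # fluent-looking output, no comprehension involved
```

Bender et al.'s claim is that LLMs are this mechanism scaled up enormously: far better statistics, but still statistics.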

While it is true that LLMs do not “understand” language in the way humans do, they are capable of generating text that is contextually appropriate and meaningful based on their training data. For example, AI chatbots can answer questions effectively, and AI translators can facilitate communication across languages. Although these systems do not have true comprehension, their ability to mimic human communication patterns allows them to be useful in many practical applications.
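For instance, a few lines are enough to run an off-the-shelf translation model. This sketch assumes the Hugging Face transformers library and the small t5-small checkpoint; any sequence-to-sequence translation model would illustrate the same point.

```python
from transformers import pipeline

# Load a small pretrained English-to-French translation model.
translator = pipeline("translation_en_to_fr", model="t5-small")
result = translator("The meeting is postponed until Thursday.")
print(result[0]["translation_text"])
```

Whether or not anything here "understands" French, the output is useful to someone who needs the sentence translated.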

Recent advances in models like GPT-4 challenge the view that LLMs are merely stochastic parrots. While these models still lack full linguistic comprehension, their ability to generate coherent and contextually relevant text suggests that they possess more than superficial pattern-matching capabilities. Dismissing their potential on the basis of earlier models' limitations may underestimate what future systems can do.

Cheap AI labour

One critical issue that Bender et al. neglect is the role of cheap labour in training LLMs. Many of these models rely on human-in-the-loop processes, where low-paid workers, often from developing countries, perform tasks like data annotation and feedback collection. These workers play a crucial role in refining and improving the models, yet they are often undercompensated for their contributions.
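Much of this work takes the form of preference judgements between model responses, which are later used to train a reward model. The record structure below is an illustrative assumption of what such an annotation might look like, not any provider's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One human judgement comparing two model responses (illustrative fields)."""
    prompt: str
    response_a: str
    response_b: str
    preferred: str       # "a" or "b", chosen by the annotator
    annotator_id: str

record = PreferenceRecord(
    prompt="Explain photosynthesis to a child.",
    response_a="Plants eat sunlight to make food.",
    response_b="Photosynthesis converts light energy into chemical energy.",
    preferred="a",
    annotator_id="worker-0042",
)
print(record.preferred)
```

Each record represents a small unit of human judgement, and models are refined on millions of them, which is precisely why the working conditions and pay of the people producing them matter.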

This practice raises ethical concerns about digital labour exploitation, as the workers who help improve these advanced technologies do not receive fair compensation or recognition. Addressing this issue requires a more equitable approach to labour in AI development, ensuring that workers are paid fairly and that their role in the process is acknowledged.

A way forward

AI realism spectrum

In discussions around AI, two polarised views often emerge. On one end are the AI sceptics, who dismiss LLMs as mere tools incapable of genuine intelligence. On the other are the AI enthusiasts, who believe that LLMs represent the dawn of artificial general intelligence (AGI). In between lies a more pragmatic perspective, AI realism, which acknowledges both the limitations and the transformative potential of these models.

Bender et al. lean toward the sceptic side of the spectrum, focusing heavily on the risks and limitations of LLMs. A more balanced approach would recognise the immense progress these models have made while remaining cautious about their shortcomings.

Does it matter if LLMs “understand”?

A key question raised by Bender et al. is whether it matters that LLMs do not truly understand language. From a practical standpoint, the answer depends on the context. In many applications, the functionality of LLMs—whether answering questions, generating content, or translating text—may be more important than their “understanding” of the material.

Human education systems often emphasise memorisation and pattern recognition, mirroring how LLMs function. While deep understanding is valuable, there are many tasks where simply recalling and applying information is sufficient, and LLMs can excel in these areas. Thus, while comprehension is important for certain applications, the current capabilities of LLMs are sufficient for many practical uses.

LLMs as tools, not replacements

Rather than viewing LLMs as potential replacements for human intelligence, it is more constructive to see them as tools that can augment human capabilities. LLMs are well-suited to automating routine tasks, processing large volumes of information, and providing new insights based on data patterns. However, they lack the creativity, empathy, and ethical judgment that are uniquely human traits.

When used appropriately, LLMs can serve as valuable collaborators, enhancing human productivity and allowing individuals to focus on higher-order cognitive tasks.

Addressing challenges

While LLMs offer tremendous opportunities, their challenges cannot be ignored. These include environmental concerns, training data biases, labour exploitation, and the potential spread of misinformation. Tackling these issues requires ongoing research, ethical frameworks, and industry-wide collaboration to ensure that LLMs are developed and deployed responsibly.

LLMs and AGI

While LLMs represent significant advancements in AI, they are not a direct path to AGI. Achieving AGI will likely require the integration of various AI systems that specialise in different tasks, rather than relying solely on large language models. Progress toward AGI will be incremental, with LLMs playing one role in a much broader AI ecosystem.

The challenges posed by LLMs, including environmental impacts, biases, and ethical concerns, are significant but not insurmountable. By adopting a balanced approach that recognises both the limitations and the opportunities of these models, we can harness their potential for positive impact while mitigating the associated risks.

References

1. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). *On the dangers of stochastic parrots: Can language models be too big?* In *Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency* (pp. 610-623).

2. Crawford, K. (2021). *The atlas of AI: Power, politics, and the planetary costs of artificial intelligence*. Yale University Press.

3. Fedus, W., Zoph, B., & Shazeer, N. (2022). *Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity*. Journal of Machine Learning Research, 23, 5232-5270.

4. Goldberg, Y. (2021). *A criticism of “On the Dangers of Stochastic Parrots: Can Language Models be Too Big.”* Retrieved from [https://gist.github.com/yoavg/9fc9be2f98b47c189a513573d902fb27](https://gist.github.com/yoavg/9fc9be2f98b47c189a513573d902fb27)

5. Dhar, P. (2020). *The carbon impact of artificial intelligence*. Nature Machine Intelligence, 2(8), 423–425.

6. Patel, R., & Pavlick, E. (2021). *Mapping language models to grounded conceptual spaces*. In *International Conference on Learning Representations*.

7. Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). *A survey on bias and fairness in machine learning*. ACM Computing Surveys, 54(6), 1–35.