Understanding the Role of AI Text-to-Image Models in the Perpetuation of Social Biases

This article is based on the thesis for Istituto Marangoni’s Master in Digital Art Direction course, presented on March 6th, 2024. It is intentionally simplified for easy comprehension; check out the full paper for the academic text.

Let’s start with the basics

A text-to-image model is an artificial intelligence system trained to take a textual description and generate an image from it.

How it works

A dataset of images paired with text descriptions and tags is fed into a machine learning algorithm.

The algorithm analyzes the data and identifies patterns between words and images, learning to make predictions and ultimately producing a model capable of generating images from text.
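To make the inference step concrete, here is a minimal sketch using the open-source diffusers library and a publicly released Stable Diffusion checkpoint (the checkpoint name is only an example); the training step behind it requires far more data and compute than anything shown here.

```python
# A minimal sketch of text-to-image inference, assuming the open-source
# `diffusers` library and an example Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image model (the checkpoint id is just an example).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The text prompt is the only input; everything else the model "knows"
# comes from the image-text pairs it was trained on.
image = pipe("a portrait photo of a software developer").images[0]
image.save("developer.png")
```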

Do the outputs of text-to-image models exhibit social biases?

Models learn everything from the data you feed them, including undesired attributes and correlations.

Cho, Zala and Bansal (2022) tested text-to-image models for social biases. They found that some models depict male figures 1.6 to 2.4 times more often when prompted with terms referring to professions, except for stereotypical “women’s jobs” like nurse, secretary, or flight attendant.

Another study showed that text-to-image models disproportionately depict White figures for professions like "software developer," despite real-world diversity statistics: the tested models displayed White figures about 99% of the time, while only 56% of software developers in the US identify as White. It’s a fair comparison to make, considering the models are trained mostly on American data. This indicates that the tested models not only reflect but also amplify societal disparities (Bianchi et al., 2023).
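To give a feel for how such audits work, below is a simplified sketch of the counting idea, not the exact protocol of Cho et al. (2022) or Bianchi et al. (2023): generate many images per profession prompt, label the perceived gender of each figure, and compare the resulting ratios. The pipeline argument stands in for a model like the one in the earlier snippet, and classify_perceived_gender is a hypothetical helper standing in for whatever automated classifier or human annotation an auditor would actually use.

```python
# A simplified sketch of the counting idea behind bias audits, not the exact
# protocol of the cited studies. `classify_perceived_gender` is a hypothetical
# stand-in for the attribute classifier (or human annotation) an auditor uses.
from collections import Counter

PROFESSIONS = ["software developer", "nurse", "CEO", "flight attendant"]
SAMPLES_PER_PROMPT = 100

def audit(pipe, classify_perceived_gender):
    for job in PROFESSIONS:
        counts = Counter()
        for _ in range(SAMPLES_PER_PROMPT):
            image = pipe(f"a photo of a {job}").images[0]
            counts[classify_perceived_gender(image)] += 1
        male, female = counts["male"], counts["female"]
        ratio = male / female if female else float("inf")
        print(f"{job}: {male} male / {female} female (ratio {ratio:.1f}x)")
```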

Text-to-image models are evidently biased.

FINDING #1

Why is this tech biased to begin with?

There’s a saying in data science: “garbage in, garbage out”. If an algorithm is fed faulty data, the outcome will be flawed.

All-purpose text-to-image models require vast amounts of data, often sourced from the internet with minimal curation. For instance, LAION is an organization that freely provides datasets of over 5 billion image-text pairs for research and experimentation purposes. They acknowledge the presence of inadequate, disturbing, uncurated images and discourage their use in commercial products (Beaumont, 2022). Despite these warnings, datasets like these have trained leading models such as Midjourney, DALL-E, and Stable Diffusion.

On the other hand, algorithms can introduce bias through… well, through a whole range of technical mechanisms that are hard to explain simply. But essentially, their inner workings affect the weighting of classes such as “male”, “female”, “Asian”, “Black” or “Caucasian”. Although that weighting can be coded, it raises an important concern: we have no social consensus on what ethical fairness is. Mathematical fairness may seem appropriate, like a 50/50 split between male and female figures, but it isn’t the right solution in every context.
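As a toy illustration of what “weighting classes” can look like (my own simplification, not the internals of any particular model): training examples are re-weighted so that each group appears according to a chosen target distribution. The arithmetic is trivial; deciding what the target should be, a 50/50 split, census statistics, or something else, is the unresolved ethical question.

```python
# Toy illustration of class re-weighting (a simplification, not the internals
# of any specific model): each training example gets a sampling weight so that
# groups appear according to a chosen target distribution.
from collections import Counter

def class_weights(labels, target):
    """labels: one group label per training example.
    target: the desired share of each group, e.g. {"male": 0.5, "female": 0.5}."""
    observed = Counter(labels)
    total = len(labels)
    # Weight = desired share / observed share. The math is easy; choosing
    # `target` (50/50? real-world statistics? something else?) is the ethical
    # decision the text refers to, and there is no universal right answer.
    return {g: (target[g] * total) / n for g, n in observed.items()}

weights = class_weights(
    ["male"] * 80 + ["female"] * 20,
    target={"male": 0.5, "female": 0.5},
)
print(weights)  # {'male': 0.625, 'female': 2.5}
```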

So, how is this impactful?

In May 2023, an AI-generated photo of an explosion at the Pentagon caused confusion and panic on social media, spreading through verified news accounts and even affecting the stock market (Haddad, 2023).

A benchmark study revealed a 38.7% misclassification rate by humans when distinguishing between AI-generated and real photographs (Lu et al., 2023).

Over 50 million users have generated 15.5 billion images in just 18 months, a volume that took photographers 150 years to produce (Valyaeva, 2023), so AI tools are expected to flood media sooner rather than later. Bear in mind that, currently, there is no legislation prohibiting the distribution or use of biased AI models.

Having biased tools play this role in society means perpetuating harmful social narratives that marginalize, discriminate against, and misrepresent social groups.

AI-generated image of a Pentagon explosion, spread as fake news.

Haddad, M. (2023). Fake Pentagon explosion photo goes viral: How to spot an AI image. [online] Aljazeera. [Accessed 6 Jan. 2024].

Text-to-image models will impact visual culture substantially.

FINDING #2

Let’s understand the risks.

To comprehend the risk of AI text-to-image models, I went on to study the relationship between biases, beliefs, and images.

Beliefs, often perceived as purely subjective, have effects that cognitive science can measure. Take the placebo effect, which has been studied for centuries: in Gu et al.’s (2015) study on smokers, a test group smoking placebo cigarettes without nicotine activated the same brain regions as the group that smoked real cigarettes. Beliefs can trigger physiological responses, which shows that their influence on behavior is scientifically observable.

In terms of forming and maintaining beliefs, philosophical theories emphasize the role of evidence. Different models explain it differently, but they all agree that what we experience from our environment is evidence that can either endorse or reject an existing belief. Generally speaking, evidence can be anything we experience cognitively: what we see, hear, taste, feel, read, think, say or discuss.

Let’s make that clear: exposure to images, and to the narratives they reinforce, informs our beliefs.

So consider every image we encounter that shows or tells us that Black people aren’t educated or law-abiding; that beautiful women are exclusively White, young, slim, and present themselves as available; that leaders are old White men; that Muslims are terrorists; that developing countries, and their people, are precarious and uncivilized; or that happy families must be heteronormative. Every one of those images is a chance to shape and sustain beliefs in the viewer, beliefs that affect behavior, even to the point of discrimination or the violation of a social group’s or individual’s rights. That is the power of images and representation.

Images, as evidence, either endorse or reject existing beliefs.

FINDING #3

It’s no coincidence that

In 2023, Black suspects in the U.S. were

2.6 times

more likely to be killed by law enforcement than White suspects (Levin, 2024).

For centuries, social institutions have supported the fallacious narrative that Black people are uneducated and unreliable: from treating enslaved people as commodities, as non-humans lacking autonomy, during the colonial era, to 19th-century advertising that sustained stereotypes of Black people as “dirty”, “lazy”, or “unintelligent”. Even in the 2000s, TV news covered crime in Black communities disproportionately to their representation in other aspects of life (Entman and Rojecki, 2001). This negative representation of Black people in visual culture contributes to sustaining the beliefs that lead to their disproportionate deaths.

Colonial period advertisement featuring slaves for sale. 

Cream of Wheat advertising poster (1921).

Advert for Pears' Soap (19th century).

Advert for Lautz Bros. & Co.’s Stearine Soap (ca. 1870–1900).

It’s no coincidence that

Working women worldwide earn on average

17% less

per hour than men, while also facing the glass ceiling (Haan, 2023).

For centuries, visual culture has portrayed female bodies as objects of desire, subjects to be looked at through the male gaze rather than individuals with agency and autonomy (Berger, 1972).

Lauzen’s (2013) analysis of top-grossing films revealed disparities in on-screen gender representation, with male characters depicted in work settings and pursuing work-related goals more frequently than female characters. This kind of media-sustained stereotype implies that women are neither as competent as male professionals nor fit to occupy leadership positions. The representation of women as objects lacking autonomy contributes to sustaining beliefs that lead to their marginalization and unfair treatment as professionals.

Jean Auguste Dominique Ingres’ La Grande Odalisque, 1814.

Kate Moss for Liu Jo Fall/Winter 2012 womenswear ad campaign, photographed by Inez & Vinoodh.

If we let biased AI systems impact visual culture, they will perpetuate harm to underprivileged social groups.

FINDING #4

Is this whole thing unavoidable?

Thankfully, no. There are indications that different stakeholders are working towards fairer AI. First off, biases are widely acknowledged: developers often refer to them in their release notes and in the media, and are consistently making more effort to mitigate them from a technical perspective.

Emerging legislation, such as the EU AI Act provisionally agreed in December 2023, signals a more regulated future with increased supervision over the development and distribution of these systems. Although legislation moves slowly, the discussions are ongoing.

Organizations are developing frameworks, such as guidelines and checklists, that help teams address ethical concerns during development without significant investment.

There are hopeful signs that these systems won’t persist in their current state.

FINDING #5

And what can we do? Those of us who create images professionally, or who as academics train the talent of the future, have the chance to pave the way for a future where technology helps us build fairer social narratives. As users and active participants in the conversation, we must acknowledge the faults of these systems and be critical of how our work becomes part of visual culture. We should leverage technology to prevent unfairness, not to uphold it.

Thank you for caring about these things.

For further information check out the complete thesis.

Reference list

Beaumont, R. (2022). LAION-5B: A new era of open large-scale multi-modal datasets. [online] laion.ai. Available at: https://laion.ai/blog/laion-5b/.

Berger, J. (1972). Ways of Seeing. London: British Broadcasting Corporation and Penguin Books.

Bianchi, F., Kalluri, P., Durmus, E., Ladhak, F., Cheng, M., Nozza, D., Hashimoto, T., Jurafsky, D., Zou, J. and Caliskan, A. (2023). Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale. 2023 ACM Conference on Fairness, Accountability, and Transparency, [online] pp.1493–1504. doi:https://doi.org/10.1145/3593013.3594095.

Cho, J., Zala, A. and Bansal, M. (2022). DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models. arXiv (Cornell University). [online] doi:https://doi.org/10.48550/arxiv.2202.04053.

Entman, R.M. and Rojecki, A. (2001). The Black Image in the White Mind. Media and Race in America. doi:https://doi.org/10.7208/chicago/9780226210773.001.0001.

Gu, X., Lohrenz, T., Salas, R., Baldwin, P.R., Soltani, A., Kirk, U., Cinciripini, P.M. and Montague, P.R. (2015). Belief about nicotine selectively modulates value and reward prediction error signals in smokers. Proceedings of the National Academy of Sciences, [online] 112(8), pp.2539–2544. doi:https://doi.org/10.1073/pnas.1416639112.

Haan, K. (2023). 52 Gender Pay Gap Statistics In 2023 – Forbes Advisor. [online] Forbes. Available at: https://www.forbes.com/advisor/business/gender-pay-gap-statistics/ [Accessed 3 Jan. 2024].

Haddad, M. (2023). Fake Pentagon explosion photo goes viral: How to spot an AI image. [online] Aljazeera. Available at: https://www.aljazeera.com/news/2023/5/23/fake-pentagon-explosion-photo-goes-viral-how-to-spot-an-ai-image [Accessed 6 Jan. 2024].

Lauzen, M. (2013). It’s a Man’s (Celluloid) World: On-Screen Representations of Female Characters in the Top 100 Films of 2013. [online] The Center for the Study of Women in Television and Film. Available at: https://womenintvfilm.sdsu.edu/files/2013_It%27s_a_Man%27s_World_Report.pdf [Accessed 13 Jan. 2024].

Lauzen, M. (2023). Boxed In: Women On Screen and Behind the Scenes on Broadcast and Streaming Television in 2022-23. [online] The Center for the Study of Women in Television and Film. Available at: https://womenintvfilm.sdsu.edu/wp-content/uploads/2023/10/2022-23-Boxed-In-Report.pdf [Accessed 14 Jan. 2024].

Levin, S. (2024). 2023 saw record killings by US police. Who is most affected? The Guardian. [online] 8 Jan. Available at: https://www.theguardian.com/us-news/2024/jan/08/2023-us-police-violence-increase-record-deadliest-year-decade [Accessed 17 Jan. 2024].

Lu, Z., Huang, D., Bai, L., Liu, X., Qu, J. and Ouyang, W. (2023). Seeing is not always believing: A Quantitative Study on Human Perception of AI-Generated Images. arXiv (Cornell University). [online] doi:https://doi.org/10.48550/arXiv.2304.13023.

Valyaeva, A. (2023). AI Image Statistics: How Much Content Was Created by AI. [online] Everypixel Journal. Available at: https://journal.everypixel.com/ai-image-statistics [Accessed 28 Dec. 2023].