
Data Justice

This guide introduces researchers to the concept of data justice and its connections to data literacy and data equity, and provides guidance for incorporating data justice into their research projects.

What is data justice?

Data justice is a set of ideas that addresses the way in which people are represented and/or harmed through their data being made visible to others, or conversely, excluded from the public eye, sites of power, and the process of decision-making. It confronts and challenges structural biases in the ways that we think about, collect, steward, and use data.

Data are not neutral

Data are not and have never been neutral; the ways that we collect, represent, steward, and hide or make visible data about people and communities reflect bias. Critical media scholars have written about this from a variety of perspectives. In Algorithms of Oppression (2018), Safiya Noble explores the ways that biased training datasets reinforce racism in search engine results, and in Data Feminism (2020), Catherine D'Ignazio and Lauren Klein discuss how missing and murdered women in Mexico are rendered invisible by the government's refusal to collect data on them.


UN Women's 2013 ad campaign "Women Should" highlights the sexism of search engine results, which Safiya Noble points out are underpinned by biased training data in "Algorithms of Oppression". Source: UN Women.

Several scholars and communities have enumerated principles of data justice that can help us explore and challenge where power is held within systems that collect and hold data, and bring the rights of people and communities to the forefront.

Linnet Taylor defines data justice as "fairness in the way people are made visible, represented, and treated as a result of their production of digital data" (Taylor, 2017). She identifies three pillars of data justice: 

  • (in)visibility: people have the right to representation through their data, but also have a right to informational privacy.
  • technological (dis)engagement: people have the freedom to choose not to engage with certain technologies, not to be represented in commercial databases, and not to be forced into data markets.
  • non-discrimination: people have the right to identify and challenge bias in data-driven decision making, and the freedom not to be discriminated against as a result of their data.

The Coalition of Communities of Color addresses the ways that power imbalances in data practices have harmed entire communities, defining data justice as an approach that intentionally "redresses ways of collecting and disseminating data that have invisibilized and harmed historically marginalized communities" (Research and Data Justice, n.d.). They too identify three pillars of data justice, stating that data justice should:

  • make visible community-driven needs, challenges, and strengths;
  • be representative of community; and
  • treat data in ways that promote community self-determination.

As researchers and scholars, we have a responsibility to and an impact on the communities we conduct research in, with, and about. Our collection, usage, and storage of data about them affects the way that they are represented in the scholarly record and how policymakers enact laws that affect them at all levels of government. This guide will examine the ways that structural imbalances in data practices are affecting research and education, and suggest ways to intentionally practice data justice principles in one's own research. 

Coalition of Communities of Color. (n.d.). Research and data justice. https://www.coalitioncommunitiescolor.org/-why-research-data-justice

D'Ignazio, C., & Klein, L. F. (2020). Data feminism. The MIT Press.

Noble, S. U. (2018). Algorithms of oppression: How search engines reinforce racism. New York University Press.

Taylor, L. (2017). What is data justice? The case for connecting digital rights and freedoms globally. Big Data & Society, 4(2), 2053951717736335.

This guide uses content and formatting adapted from the University of British Columbia Library's "Citation Justice" guide and the University of Maryland University Libraries' "Diversity, Equity, and Inclusion in Research" guide.

Data justice in research and education

Big data is increasingly a topic of research and is used in almost every discipline. Lindebaum, Moser, and Islam discuss the ways in which borrowing corporate data hurts theorizing in management research, while Yu and Fang celebrate the accessibility of modeling data for urban studies. However, data technologies are also making their way onto campuses and into the lives of student communities in ways that are not always welcome.

Consider, for example, how some instructors have been compelled by university or departmental policy to adopt surveillance software that sacrifices privacy in the name of academic integrity. These remote-proctoring tools, including Proctorio, Honorlock, and ProctorU, claim to reduce cheating, but they invade privacy, enable discrimination, and reduce accessibility. They have been pulled from several universities; UIUC discontinued its use of Proctorio in response to student outrage. Many complaints allege that the software is discriminatory, perpetuating racial bias through its facial detection algorithms, and that it represents an unprecedented invasion of privacy.

In another bizarre example, facial recognition software was discovered installed in a vending machine on the University of Waterloo's campus, monitoring and capturing students' data without their knowledge. No one using the vending machine, or even walking by, knew about the data collection until a student noticed a revealing error message.

We should be aware of the impact of datafication, the rampant collection of data that aims to transform aspects of our lives into quantifiable units that can be measured and sold, on our scholarly lives. The collection of data by and in the educational sphere can give us new topics to study and increase our efficiency, but it also quantifies irreducible aspects of our lives and can lead to generalizations, unfair biases, and, in the worst cases, outright discrimination.

The very first step to incorporating equity into your data and the way you interact with it is to acknowledge that data are not objective or neutral. From collection to use and reuse, every step is filtered through each researcher's preconceived notions about what should be counted and how it should be counted. The following questions and considerations from UMD's Diversity, Equity, and Inclusion in Research guide will help you think through the ways you interact with data throughout the research process.

  • When you collect data from marginalized/under-represented groups, you may not have a second chance. Losing data is always bad, but breaking trust with these communities has long-term repercussions for both you and them. How will you protect the data of the individuals from whom you collect?
  • Are there issues of translation or context in your data where a non-expert could come away with the wrong conclusion? If so, are these issues addressed in supplementary documentation (e.g., a code book, data dictionary, or readme file) that provides adequate cross-walks, translations, or context?
  • Sharing and publishing data, especially when those data represent people who may take issue with how they are presented or how their data are reused, requires a clear understanding of the terms under which the data were collected. Did you make it clear that you are obligated to share the data with the larger research community? Did you present an option to flag certain variables that the studied populations may feel comfortable giving to you, but might not want made freely available? Are the data unnecessarily precise in terms of geographic location or indirect identifiers? (See the sketch after this list for one way to coarsen a dataset before sharing.)
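
One concrete way to act on the last two questions is to prepare a separate, coarsened copy of a dataset for open sharing while keeping the full-precision version under restricted access. The sketch below is a minimal illustration in Python using pandas, not a prescription: the column names (latitude, longitude, age, name, email, street_address) and the "sensitive_" prefix are hypothetical placeholders, and the appropriate generalizations depend entirely on your data, your consent agreements, and your disciplinary norms.

```python
# A minimal, hypothetical sketch: producing a coarsened, shareable copy of a
# dataset. All column names below are illustrative assumptions, not a standard.
import pandas as pd


def prepare_public_copy(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the data that is safer to share openly."""
    public = df.copy()

    # Coarsen location: rounding coordinates to 2 decimal places keeps them
    # accurate only to roughly a kilometre, rather than to a single household.
    for col in ("latitude", "longitude"):
        if col in public.columns:
            public[col] = public[col].round(2)

    # Drop direct identifiers that are not needed for reuse.
    public = public.drop(columns=["name", "email", "street_address"],
                         errors="ignore")

    # Generalize an indirect identifier: report age in 10-year bands
    # (e.g. 37 -> 30) instead of exact age.
    if "age" in public.columns:
        public["age"] = (public["age"] // 10) * 10

    # Hold back variables participants agreed to share with you but not with
    # the world; here they are flagged with a hypothetical "sensitive_" prefix.
    restricted = [c for c in public.columns if c.startswith("sensitive_")]
    public = public.drop(columns=restricted)

    return public
```

Whatever approach you take, document any coarsening, banding, or withheld variables in your data dictionary or readme so that reusers understand what was changed and why, and deposit the full-precision version under controlled access if your consent agreements allow it.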

Think about consent and ownership of data early in order to conduct research responsibly; your research participants will thank you!