Data justice is a research framework that aims to integrate social justice concerns into research about data and algorithms. [1] It is an overarching term that is used to describe the implications of activism, policy, research, and governance under conditions of datafication. Data justice investigates how the development of smart technologies involves social and political considerations that may disproportionately impact marginalized groups. [2] This research examines how justice concerns play a role in the way people are made visible, represented and treated through the production and use of their data. [3] As an umbrella term, data justice investigates the impacts of how data is used, as well as responses to data harms and misuse. [4]
Data justice originated as a response to concerns about datafication, the practice of turning behaviour into specific data points through the use of easily accessible information and communications technology. [5] [6] [7] A key element in this process involves the feeding of data to users in the form of advertising and specific content recommendations. [8] In response to the increased use of behavioural data, critics questioned how this information was used and sold in the data economy. [9] [10] [11] A critique of datafication emerged as a sub-criticism of how surveillance may or may not be employed to track and modify human behaviour. [12] [13] [14] [15] Shoshana Zuboff argues that these practices have become central to the modern economy, creating the condition of surveillance capitalism. [14] This process is enabled by the existence of a platform economy, where infrastructures are developed to enable the mass collection of data. [16] [13]
Data justice sees technology as embedded within a political economy that involves a series of social and political decisions related to monetization and datafication. [17] [18] Within these practices, there has also been an increase in public-private interface use, where technology is developed for simultaneous use in the public and private sectors. [3] Critical perspectives on the data economy gained prominence following the Snowden leaks and the Cambridge Analytica data scandal, with critics focussing on the purpose of datafication and big data practices. [10] As a research practice, data justice examines how the shift to data-centric infrastructures may or may not create unequal social conditions based on class, race, socioeconomic status, and/or gender. This involves investigating the intersectional consequences of the increased use of automated decision-making systems. [15] [19]
Data justice grows out of practices in information studies that consider information a beneficial resource that should be distributed equally. [20] This involves analyzing the justices and injustices relating to datafication through a consideration of how technology can enable or hinder fairness and equity. [1] [2] This perspective aims to determine ethical pathways for the continued development of information technology. [3]
Data justice work is concerned with integrating social justice into data science. This focus attempts to balance distributive justice and structural justice to understand how data can play a role in upholding or challenging systemic injustices. [21] Distributive justice involves arbitrating opposing claims through the issuing of material goods, political power, and/or social and political rights. [22] [23] [24] Structural justice reflects the degree to which efforts towards self-determination and self-development are enabled or constrained by institutional conditions and values. [25] [26] Data justice attempts to balance both of these practices under a research agenda that affirms the need for equal distribution while acknowledging the presence of systemic injustices within datafication. [21] Conducting data justice research involves situating technology, data and algorithms as having the potential to reinforce systemic forms of oppression. [27] [28]
Data justice was designed with the goal of developing alternatives to unequal participation in datafication by encouraging the use of data technology as a tool to fulfill aims of social justice and political participation. [29] To accomplish this goal, data justice examines the impacts of institutional uses of data, while also responding to potential data harms and misuses. [4] This brings research from data ethics, algorithmic governance and social justice together under one framework. A focus on justice enables social movements to build alliances with algorithms to pursue social justice objectives. [1]
Data justice presumes that data is collected under pre-existing conditions related to power. [15] [30] [31] This perspective draws from an account of the historical imperatives of data collection, in which data was collected and used as a tool to consolidate knowledge about certain populations. [15] The collection and use of historical data has been cited as exploitative for groups such as Indigenous communities, populations from the Global South and Black people in North America due to unequal power distributions between researchers and subjects. [32] [33] [34] The prominence of data universalism has been presented as a key factor that ignores the presence of power imbalances in data collection and use. [15]
There are current calls to re-examine the role power plays in the production and use of data in these areas that have been historically exploitative. [35] [16] Data justice encourages this re-examination, as it sees data and technology as part of longer histories of structural inequality and systemic violence. [36] By investigating the presence of data discrimination within algorithms and datasets, data justice aims to understand the role that data play in the production of meaning. [20] This process involves an examination of how the unequal distribution of power in data collection and deployment may lead to the reinforcement of certain discursive frames over others. [27] As such, data justice rejects technological utopianism and data universalism in favour of a framework that accounts for how technology may or may not be used to reflect a dominant system of power. [37]
Data justice research often involves a specific focus on intersectionality as it relates to the way data is used and collected in ways that are informed by direct experiences and systemic values. [15] Catherine D’Ignazio and Lauren Klein employ intersectionality within their definition of data feminism, both as a way to examine how unequal power relations may be replicated with data and as a commitment to developing data practices that are oriented around feminist principles of accountability and equity. [15] This perspective builds from the theory of intersectionality as a mode of critique that examines how individuals are impacted by dominant and interlocking systems of power within society. [38] [39] It investigates how one person can hold multiple race, gender, class, and ability identities, leading to different experiences of systemic oppression. There is a strong body of research that investigates how artificial intelligence systems and Big Data can exacerbate this systemic oppression, bias and inequality. [40] [41] [33] A data justice perspective argues that data is not objective or neutral, but can often enact systemic bias that worsens inequalities.
Data agency involves connecting agency and democratic processes to data organization. [42] A focus on agency connects to a focus on power. When there is an increasing reliance on data to complete everyday activities, these data practices acquire more power. [5] This increase in power has led scholars to discuss the importance of citizen agency in relation to data structures. [8] [42] Agency involves the way an individual or collective may take action based on a reflection about the conditions of the world around them. [43] Data agency draws from theories of agency to understand how citizens can become involved in the everyday processes of datafication, to further supplement the role played by tech companies and government organizations. [44]
A key element of data agency involves integrating data literacy practices into citizen participation initiatives. This perspective argues that in order to engage citizens with data in meaningful ways, data justice work must help citizens understand and reflect on the collection of their data and the implications that come with it. [45] [46] A focus on literacy demonstrates that algorithmic transparency may not provide those without a technical background with a full understanding, limiting the potential for citizen accountability to be fully realized. [47]
Projects working to build data literacy and agency include:
Data activism involves civic engagement as an alternative to opaque datafication. [29] [50] It is characterized by mobilizations against existing data uses and practices. Data activism involves both reactive and proactive responses to the use of data. Proactive responses occur when activists appropriate open data practices to promote social justice and broaden participation in decision making. [51] A reactive approach aims to challenge the presence of algorithmic control that may be utilized by government agencies and corporations. [51] Both of these approaches emphasize the important structural role of intermediary organizations as channels through which citizens can provide feedback and organize collectively. [31]
Examples of data activism include:
De-westernization promotes the inclusion of a decolonial lens in data studies to acknowledge the importance of including perspectives from the Global South. [55] [1] By including specific and varied information from the Global South, it is argued that data can more accurately reflect the needs and experiences of marginalized groups. This perspective reflects the claim that data gathering has historically existed as a form of oppression in the Global South. [34] [35]
By integrating the perspectives of historically marginalized communities, data justice works to empower communities to use data in safe and ethical ways. Practices such as counter-mapping in Indigenous communities and working with trusted community partners demonstrate how de-westernization can work within data justice research projects. [34] [30] [32] Throughout the 1970s, Indigenous groups in Canada utilized counter-mapping, alongside visual and oral media, in formal land claim negotiations as evidence of territorial sovereignty. [32] [56] Recently, projects such as the Decolonial Atlas Project have engaged in counter-mapping to demonstrate a wide range of sociopolitical issues using data. The Big Data From the South initiative calls for a de-westernization of critical data studies through a research agenda that integrates the perspectives of marginalized communities into data studies. The initiative is oriented around five conceptual operations that relate to data justice practices: [55]
Data justice involves a broad and interdisciplinary research agenda encompassing various perspectives in the disciplines of critical data studies, health, international development, machine learning, and public policy. These research agendas are united under a commitment to investigating the role that politics and power play in the collection and use of data. [3] Data justice research can take many forms, and has been conceptualized using different frameworks, reflecting the broad nature of the term.
Richard Heeks and Satyarupa Shekhar developed a framework for data justice research that aims to understand the different levels at which data can interact with social justice goals. [31] Their framework encourages researchers to analyze datafication through five different levels of critique:
Linnet Taylor depicts data justice research as a project that can integrate three “pillars”. [3] This framework was developed in response to the three original conceptions of data justice: as a concern for governance, as a matter of distributive justice, and as a way to connect to social justice organizations. [21] [59] [12] The three pillars of data justice combine these approaches under a set of key concepts:
Ben Green proposed four steps for data scientists to engage with data justice research. [60] This perspective argues that data ethics efforts are ill-equipped to generate data science that avoids social harms and promotes social justice. For data scientists to recognize themselves as political actors, Green argues that researchers must complete four stages:
Groups currently working within this framework include Black in AI, LatinX in AI, Queer in AI, and Women in Machine Learning, which seek to integrate the perspectives of marginalized communities into AI development.
Data stewardship is an oversight practice relating to the governance of data. It involves developing practices according to the principles of FAIR data use. [61] As a data justice practice, data stewardship involves a focus on participation and community engagement by rejecting opaque practices of data collection, storage and sharing. [62] It builds from the idea that public participation forms a ladder with increasing control at each level: inform, consult, involve, collaborate and empower. [62] [63] Integrating citizen participation into data stewardship enables citizens to gain insight into whether the integration of data activism may provide increasing levels of accountability and control. [64] In this framework, members of a datafied system are able to access and understand their data, enabling them to improve upon it as necessary. [34] The integration of participatory data stewardship can complement existing legal and rights-based approaches to improve digital literacy and agency. [65]