Grant Program
Research Grants
Grantee Name
Center for Democracy and Technology
Grant Start Date
1 November 2023
Grant End Date
30 April 2025
Amount Funded
$500,000.00
City
Washington
Country
United States
Region
Global
RESEARCH QUESTION
The main research question of the project is: How do content moderation systems operate in indigenous and other language contexts of the Majority World? To address this, the project explores three related sub-questions: (1) How are content moderation systems designed and implemented for social media users in South America, Sub-Saharan Africa, and South and Southeast Asia? (2) How do these systems function across other online services, such as e-commerce, dating apps, and live streaming, in these regions? (3) How are automated tools integrated into content moderation systems for select indigenous and other languages of these regions? Across all three questions, the research examines the degree of transparency, accountability, access to information, and privacy that the services afford to users through their content moderation systems.
WHY IS THIS RESEARCH IMPORTANT?
While there is a growing body of research on content moderation systems and their impacts, much of it approaches the problem through a Western lens. Another challenge in content moderation research is the opacity of the technologies that support these systems. Given the scale of user-generated content globally, many online services rely on machine learning tools to help detect and analyze content, but these tools perform markedly worse in "low-resource" languages, for which little training data exists, and low-resource languages are disproportionately indigenous or spoken in Majority World countries. There is little or no research that examines how online services use automated tools such as NLP models or LLMs in their content moderation systems for Majority World and indigenous language contexts, and how they manage these tools' limitations.
The underlying problem addressed by the research is the historical and ongoing colonization of non-European peoples, cultures, and languages. Online services may be replicating or perpetuating the historical harms associated with the subjugation of indigenous and Majority World languages. Researching the potential impacts of content moderation systems in these contexts matters because the findings can inform policies and other solutions for designing more equitable systems, and can empower local and indigenous groups to run their own user-generated content services that are better tailored to their communities' needs. The goal is to produce detailed reports that provide actionable recommendations for improving content moderation practices in the Majority World.
METHODOLOGY
The research focuses on three categories of online services: social media, dating apps, and live streaming platforms, examining them across three regions: South America, Sub-Saharan Africa, and South and Southeast Asia. A comparative case study analysis is used to understand how content moderation systems are designed and implemented across different language contexts and services in these regions. The study involves selecting specific language contexts, with a focus on “low-resource” languages that have significant speaker populations.
The goal is to examine content moderation systems with a view to protecting the rights of the users of those online services. To that end, the research assesses how much transparency, accountability, access to information, and privacy the services afford users through their content moderation systems. This involves examining the key components of each service's content moderation system, including the roles of both human moderators and automated systems (e.g., machine learning models such as NLP tools and LLMs).
Local research partners are engaged to help identify relevant "low-resource" language contexts, specific language communities, and online services; to support data collection; and to review findings. These partners are universities and civil society organizations that conduct research and advocacy on content moderation issues in their respective regions, and that either work in or with relevant language communities or are connected to organizations that do. Data collection methods include semi-structured interviews, focus groups, and virtual roundtables with various stakeholders, including end users, online service representatives, third parties who develop and sell content moderation tools to online services, and NLP/LLM researchers. The analysis explores key aspects of these content moderation systems, including content definition, detection, evaluation, enforcement, appeals, and user education.