Paper Title: Dark citizen science

Author(s) and Year: James Riley and Will Mason-Wilkes (2024)

Journal: Public Understanding of Science (Open Access)

TL;DR: The paper explores how certain practices can resemble citizen science while actually engaging participants in scientific tasks without transparency, informed consent, or benefits for the public. To distinguish the two, the authors compare a traditional citizen science project, Galaxy Zoo, with a "dark" citizen science project, reCAPTCHA. They also draw attention to a broader discussion about the ethics of unpaid work and about how citizen science outputs are used to train large-scale AI models.

Why I chose this paper: I am a researcher in astronomy and AI, so it was interesting for me to see the other, non-obvious ways these fields interact. I also found the comparison of the two case studies to be a novel take on Internet tasks we engage in daily.

The Background

The 5Ps of Citizen Science

Citizen science is a concept familiar to many: think "crowdsourced" galaxy classification, air and water quality monitoring, and flora and wildlife censuses. Citizen science is known to improve public engagement with and perception of science, and it sometimes serves as a necessary step toward groundbreaking discoveries. For example, it allowed scientists to challenge a century-old belief that the shape of a galaxy's spiral arms depends on how large its central bulge is.

Defining citizen science is no easy task, though; it is instead best understood through its characteristics, the “Five Ps”:

1. Purpose: citizen science generates new knowledge;

2. Process: citizens are central to the production of that knowledge;

3. Perceptibility: everyone involved in citizen science can clearly identify the goal of the project;

4. Power: citizen science is voluntary and open; and

5. Public Effect: citizen science produces knowledge that otherwise could not have been created.

Some public technoscientific practices resemble citizen science, but a closer look reveals that participants are usually not aware they are involved in data analysis benefiting tech businesses. These practices, dubbed “dark citizen science” by the authors, remain under-researched, despite more people engaging in them than in traditional citizen science. Spoiler alert: You are taking part, too.

The Research Question

Are You a Robot?

The authors explore how certain science-society interactions operate outside of the traditional citizen science frameworks. They do so by investigating two central themes:

1. How does dark citizen science differ from genuine citizen science in terms of perceptibility, power, and public effect?

2. What do these differences imply for what counts as citizen science, and for how this kind of work affects the labor market?

The Methods

Get in Citizen, We’re Going Classifying

The authors use a comparative case study methodology, looking at two Internet-based categorization processes through the lens of the "5Ps" framework: a traditional citizen science project, Galaxy Zoo, and a dark citizen science project, Google's reCAPTCHA.

Galaxy Zoo is a classification tool for images of galaxies. It was created to help scientists analyze the growing volumes of astronomical data. Galaxy Zoo's participants are asked to classify galaxies by their visible features, such as their shape. The project is hosted on Zooniverse, which hosts many other citizen science projects, ranging from the natural sciences and medicine to the arts and humanities.

Like Galaxy Zoo, Google also relies on user input for classification, particularly for its reCAPTCHA feature. A CAPTCHA is a test used to verify whether a user trying to access an online service is human. Chances are, you did one today! Examples of CAPTCHAs include transcribing symbols or words, selecting images belonging to a certain class, and completing puzzles. The first version of Google's reCAPTCHA required users to decipher distorted text, while the second introduced checkboxes or image-based challenges to verify that users are human.

The current third version runs in the background, displaying no challenge if the user is deemed "low risk" and presenting one to a "high risk" user. However, in addition to serving a security purpose, reCAPTCHA has a hidden back-end: users' responses are used to transcribe data that computers can't easily process (v1) or to label data (v2 and v3), building datasets used to solve machine learning problems for profit.

The Results

I’m not a Robot

The authors argue that the first two of the 5Ps, purpose and process (knowledge creation or data collection through citizen labor), are the same for citizen science and dark citizen science. The other three Ps (perceptibility, power, and public effect) set these two paradigms apart. In particular, traditional citizen science is carried out for the public good and is itself a public good, as it makes possible research that can't (practically) be done without the involvement of the public, all while being open, voluntary, and equitable.

This contrasts with dark citizen science, which lacks a positive public effect: its benefits are distributed unequally, with tech companies profiting instead of the public. Additionally, the purpose of dark citizen science is imperceptible to participants. Users take part involuntarily and on unequal terms, tipping the balance of power in favor of the tech industry.

By drawing comparisons between "dark" and "light" citizen science, the authors do not imply that genuine citizen science is without problems. Even celebrated traditional projects have a controversial side: they replace paid scientific jobs with free citizen labor. The crowdsourced classifications are then used as inputs to train AI models, gradually reducing the need for both scientific and citizen contributions as the models improve. Galaxy Zoo's model, for example, is already capable of classifying a galaxy with 99% accuracy.

The Impact

Putting the Citizens Back into Citizen Science

Not all hope is lost for dark citizen science: it can be transformed into genuine citizen science. The researchers argue that making the process of knowledge creation explicit, obtaining user consent, and offering an opt-out option would shift reCAPTCHA and similar projects from dark to traditional citizen science.

By introducing the term "dark citizen science," the authors encourage scholars of science and society to critically examine some of today's technoscientific practices, ultimately questioning what should count as citizen science at all. Rather than treating citizen science as a fixed category, the authors see it as a flexible phenomenon, shaped by power, visibility, and economic context. These reflections are a starting point for a broader and much-needed ethical discussion of the socio-economic impact of doing scientific work without pay, generating profits for big tech companies, and replacing human contributions in science.

Future research on (dark) citizen science could focus on interactions between science and society on other online platforms and explore more complex citizen science paradigms, for example by considering whether citizen science is active or passive. An example of passive citizen science is the public posting wildlife images on social media, which scientists later use to build scientific datasets.

My key takeaway: The "5Ps" framework can be used by online platforms as a checklist to assess whether a citizen science pipeline is genuine citizen science, and private companies should aim to be more transparent when engaging the public in classification tasks. As participants in (dark) citizen science, each of us should be mindful of who benefits from it and consider the consequences of free work for the labor market.
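To make the checklist idea concrete, here is a minimal Python sketch. It is a hypothetical illustration, not a tool from the paper: the `FivePs` class and its `failed` and `classify` methods are invented here, and each P is reduced to a single yes/no answer, which greatly simplifies the authors' qualitative analysis. Labeling a project "dark" whenever it keeps purpose and process but fails any of the other three Ps is also an assumption. The two example projects encode the comparison reported in the Results.

```python
from dataclasses import dataclass, fields


@dataclass
class FivePs:
    """Hypothetical checklist: one yes/no answer per 'P' for a single project."""
    purpose: bool         # Does the project generate new knowledge?
    process: bool         # Are citizens central to producing that knowledge?
    perceptibility: bool  # Can everyone involved clearly identify the project's goal?
    power: bool           # Is participation voluntary and open?
    public_effect: bool   # Does it produce knowledge that could not otherwise have been created?

    def failed(self) -> list[str]:
        # Names of the Ps this project does not satisfy, in the order declared above.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def classify(self) -> str:
        if not self.failed():
            return "genuine citizen science"
        # Assumption: keeping purpose and process while failing any other P counts as "dark".
        if self.purpose and self.process:
            return "dark citizen science"
        return "not citizen science"


# The two case studies, following the paper's comparison as summarized in the Results.
galaxy_zoo = FivePs(purpose=True, process=True, perceptibility=True, power=True, public_effect=True)
recaptcha = FivePs(purpose=True, process=True, perceptibility=False, power=False, public_effect=False)

for name, project in [("Galaxy Zoo", galaxy_zoo), ("reCAPTCHA", recaptcha)]:
    print(f"{name}: {project.classify()}; fails {project.failed()}")
```

Run as is, it reports Galaxy Zoo as genuine citizen science and reCAPTCHA as dark citizen science failing perceptibility, power, and public effect. Boolean answers make the structure of the argument easy to see, but a real assessment would still need the kind of case-by-case analysis the authors carry out.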

—

Written by Mykyta ‘Nik’ Kliapets

Edited by Mariella Mestres-Villanueva and Madeline Fisher

Featured image credit: Otto Rascon at Pexels (CC0 license from Creative Commons)