How to remember the human: Recommendations for ethical Reddit research

Casey Fiesler
Apr 11, 2024
“The reddit website” by Alpha Photo on Flickr (CC-BY-NC 2.0)

The first rule of etiquette on the social platform Reddit is “Remember the human.” This rule should apply to researchers, too.

It can be easy for people commenting online to forget that there are actual humans on the other end of the screen, and that online interactions can have an impact in the “real world.” Similarly, for the countless researchers collecting data from Reddit and other public websites, it can be easy to forget that this “data” comes from lived experiences of actual humans who might be impacted by its inclusion and dissemination in scientific research. We (myself, Michael Zimmer, Nicholas Proferes, Sarah Gilbert, and Naiyan Jones) have been thinking a lot about this problem.

In 2020, we read over 700 papers that used Reddit as a data source. Of those, fewer than 200 included any mention (generously defined) of research ethics. Based on inclusion criteria, we conducted additional qualitative, thematic analysis on 134 papers to understand which ethical considerations researchers raise and which ethical decisions they make. The result, published in early 2024, is a description and set of recommendations based on current practices.

First, we should note that institutional ethics review boards, such as IRBs at U.S. universities, typically consider research that collects public data (which covers most Reddit content) to be outside of their purview. This is because, in the U.S., IRBs govern “human subjects research,” which federal regulation 45 CFR §46.102 defines as work that involves either direct interaction with humans or the collection of data that is both personally identifiable and private. For a number of the papers in our dataset, this was the extent of their ethical discussion: the data is public, and therefore the researchers were not required to undergo ethics review. However, other papers engaged in a deeper discussion of their ethical obligations even if the research did not meet the criteria for formal review.

Other ethical considerations and decisions researchers highlighted related to issues such as consent, privacy, and risk mitigation. These included, for example: anonymization of usernames (e.g., use of pseudonyms), obfuscation or fabrication of quotes, analysis reported in the aggregate, ethical decisions based on the sensitive nature of the subreddit or content, content excluded on ethical grounds (for example, a researcher mentioning they explicitly did not collect certain types of content), possible misuse or unanticipated consequences of the research, risk versus benefit analysis of research, and discussion of whether user consent was obtained.
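For readers who want to see what two of these practices can look like in code, here is a minimal Python sketch of username pseudonymization and aggregate reporting. It is an illustration under stated assumptions, not the method used in any of the papers we reviewed: the function names, the sample records, and the secret key are all hypothetical, and keyed hashing alone does not make data anonymous, because quoted text can still be searched and traced back to its author.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key; in practice, generate it randomly and keep it
# private (or destroy it after analysis so pseudonyms cannot be re-derived).
SECRET_KEY = b"replace-with-a-private-random-key"


def pseudonymize(username: str, secret_key: bytes) -> str:
    """Map a username to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:10]


def aggregate_by_subreddit(posts: list[dict]) -> Counter:
    """Count posts per subreddit so results are reported in the aggregate."""
    return Counter(post["subreddit"] for post in posts)


# Made-up sample records, for illustration only.
posts = [
    {"author": "example_user_a", "subreddit": "r/example_one"},
    {"author": "example_user_b", "subreddit": "r/example_one"},
    {"author": "example_user_a", "subreddit": "r/example_two"},
]

# Replace each username with its pseudonym before any further analysis.
pseudonymized = [
    {"author": pseudonymize(p["author"], SECRET_KEY), "subreddit": p["subreddit"]}
    for p in posts
]

counts = aggregate_by_subreddit(pseudonymized)
```

The same username always maps to the same pseudonym, so per-user analysis still works, while the original handle never appears in the analysis dataset. Whether this is enough depends on the context of the community and the sensitivity of the content.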

Drawing from our analysis of this corpus of papers, as well as existing literature on research ethics and best practices, we make the following recommendations:

  • Regardless of how “public” data is, researchers should always consider the fundamental research principles of respect for persons, beneficence, and justice when making decisions about data collection and dissemination.
  • Remember that ethical review is about more than just “checking the box” that the research protocol was screened by a required ethics review body, and it is important to document ethical decisions in the publication itself.
  • “Publicness” is not the only ethical context that matters. Researchers should take a contextual approach to the ethics of research design rather than considering only whether content is public (e.g., the context of the community, the sensitivity of the content, the goals of the research). Researchers should learn the norms of a community, which can help them understand the community's contextual expectations for information flow.
  • Researchers should consider not just what the potential risks and consequences are (including thinking beyond intended outcomes), but also what steps might be taken to mitigate negative outcomes.
  • In considering the impact on and benefits to a community, researchers should share back with the community when they can — while also taking into account that some research findings might be harmful or put the researcher at risk.
  • Researchers should holistically consider harms and benefits as they manifest to a community, and not just individuals.
  • Researchers should consider explicitly including community members in their work, such as through methods like participatory/action research and collaborative ethnography, reaching out to moderators about data collection where appropriate, and reflecting on their own positionality relative to the population, even when conducting quantitative research.

Our work shows that many internet researchers have made positive steps to ensure attention to research ethics becomes more prominent in our practices and publications, and we offer these recommendations as a continuation of those efforts. We believe that by “remembering the human” Reddit researchers can continue to engage in important research in an ethical manner.

For more details on this study, our findings, and our recommendations, see our paper published in the January 2024 GROUP issue (Volume 8) of Proceedings of the ACM on Human-Computer Interaction. This work will also be presented at the ACM SIGCHI Conference on Supporting Group Work in January 2025. It was conducted as part of the broader NSF-funded PERVADE project.

Casey Fiesler, Michael Zimmer, Nicholas Proferes, Sarah Gilbert, and Naiyan Jones. 2024. “Remember the Human: A Systematic Review of Ethical Considerations in Reddit Research.” Proceedings of the ACM on Human-Computer Interaction 8, GROUP, Article 5 (January 2024), 33 pages. https://doi.org/10.1145/3633070

Casey Fiesler

Faculty in Information Science at CU Boulder. Technology ethics, social computing, women in tech, science communication. www.caseyfiesler.com