2 Motivation for reproducible science

beginner

basics

Summary

In this chapter, you will learn about typical arguments in the Open Science debate and why some people do or do not engage in a reproducible workflow.

Learning Objectives

💡 You can argue why some researchers do not engage in reproducible research.
💡 You can argue why some researchers do engage in reproducible research.
💡 You can argue how Open Science practices influence classical incentives in a research career.

Reproducibility is a big topic. We spent a lot of time and work writing a book about it, and you will probably spend a lot of time learning about it. If you are asking “What for? Why should I do reproducible research?”, that is a completely reasonable question. We dedicated a whole chapter to it because describing the different facets of reproducibility is very important to us. Most of the arguments we point out are not restricted to reproducibility but apply to Open Science as a whole.

2.1 Reproducibility: Simple and unnecessary?

After the definition of reproducibility in the previous chapter, you might have one of two thoughts. Either you think that getting the same results from the same data and code must be common practice and super simple. Just to ruin your day, you can search for a paper with public code and data and try to reproduce the results from the article. While you are doing this (or wisely not doing it), you may start to realize how many problems can occur when trying to reproduce scientific results. The amount of time it takes to make your own research reproducible, and to prepare solutions for the many problems that can occur, can give you a headache.

Or you think that spending your time on making your research reproducible is simply not worth it. If you want to succeed as a researcher (PhD, postdoc) and get a professorship at a university, you need to publish in journals with high impact factors, find significant results to get into such journals, and receive a lot of citations from other researchers (see Note 2.1). Where is the time for reproducibility? Isn’t the reward system of science directly opposed to good research practices? If you do not publish your code, errors that happen to confirm your hypotheses, whether intentional or not, can go undetected. You can selectively report outcomes that confirm your hypotheses and climb the career ladder. From this point of view, making your research reproducible and spending time with this learning resource adds nothing of value to your research career.

Note 2.1: Evaluation of selection criteria in appointment procedures in psychology

In a study by Abele-Brehm and Bühner (2016), about 1,450 researchers in psychology (members of the German Psychological Association) answered an online questionnaire in which they rated the desired relevance of 41 selection criteria as well as their actual relevance in hiring procedures in psychology (66% of the respondents had served on a professorship hiring committee at least once). While some qualitative indicators, like the fit of the research profile to the appointing institute (Rank 2) and the quality of the research presentation (Rank 3), were also deemed relevant, several of the actually relevant indicators were quantitative: the number of peer-reviewed publications (Rank 1), the number of publications (Rank 4), the volume of third-party funding acquired to date (Rank 5), and the number of first-author publications (Rank 6). Notably, “indicators of research transparency” ranked last among the 41 selection criteria.

Arguments against Open Science practices

time consuming
not important for climbing the career-ladder
possible data misuse of shared data
against innovation-driven research
against business model of private research companies

But there is also a different narrative:

2.2 Reproducibility: Hard and worth it

Goodhart’s Law: When a measure becomes a target, it ceases to be a good measure. Goodhart (1984)

Figure 2.1: “Goodhart’s Law” by xkcd.com (Randall Munroe) (License: CC BY-NC 2.5; reused without modifications)

Let’s take another perspective. First of all, ask yourself: What is your primary goal as a researcher? Is it climbing the career ladder by publishing in journals with high impact factors, conducting high-quality research, or something else? Does one of these goals exclude the other? Why do you think so? Focusing on performing well on measures that are meant to quantify research quality (like the impact factor and the h-index) can make you lose sight of the original objective for which these measures were constructed: the quality of research itself (see Figure 2.1).

A short anecdote on believing in most citations = best

For the introductory section of this book, we were searching for the prevalence of reproducible and non-reproducible psychological research. We typed the search string (reproducib* and psycholog*).ti into ‘Web of Science’. The database found 64 articles matching the search string. Four of them had a “Highly cited paper” badge. We clicked on those first, expecting them to be the most important articles. We fell into the citation trap. In the end, when we had screened all 64 articles, two of the highly cited papers and two of the not highly cited papers proved relevant to our research question. As this example illustrates, citations can be used as a measure, but one cannot rely on them alone to decide which papers are worthwhile.

In a highly debated blog post, Tal Yarkoni stated that, as researchers, we are not helpless prisoners of the reward system of science. We are an active part of our respective scientific communities. We have the ability to shape our work and craft our jobs in the direction of reproducibility (which can also improve mental well-being and job performance (Bakker, Demerouti, and Sanz-Vergel 2023)). We have to take responsibility for our actions, whether we want to or not, and we cannot blame the incentives of science to escape responsibility for our own work (Yarkoni 2018). Apart from that, reproducibility makes knowledge and scientific processes easier for everyone to access (The Turing Way Community 2022).

Beyond responsibility, McKiernan et al. (2016) argue that Open Science practices (e.g. reproducible data and code) can even enhance your visibility as a researcher and give you a head start on the academic job market. When you publish the code and data of your published articles, others can work with your data. They may rerun your analysis, try to answer different research questions with your data, or apply new methods to it. Conducting meta-analyses in particular becomes simpler if your data and code are available and reproducible. This could increase your citations compared to not making your code and data publicly available (McKiernan et al. 2016). Further down the line, you could be cited in important review articles, which also increases your visibility as a researcher. You can also benefit the other way around: if there is an open data set or calculation that suits your research question, you can reuse it and save a lot of time on data acquisition.

Furthermore, making your research reproducible makes your research more efficient from the second project on. There is no doubt that making your first project reproducible is more time-consuming than ignoring reproducibility. However, imagine you submitted a paper, in whatever discipline, and your research leads you to conclude that a slight change of the experimental setup or the statistical analyses would deepen the understanding of your topic. For example, you conducted a Stroop task (Stroop 1935) to investigate a factor that explains the Stroop effect. You did your research, published a paper, and now you want to investigate whether your findings also apply to the emotional Stroop task (Williams, Mathews, and MacLeod 1996). If you set up your first study reproducibly, you have easy access to the experiment: all the code that constitutes it is already there. You just have to make minor changes to convert your initial Stroop task into an emotional Stroop task. Thus, reproducibility saves you a lot of programming time. The analysis pipelines to test your hypotheses would, of course, also already exist. This example shows how you can save a lot of time from your second project on.

Note

You can even save a lot of time on your first project when you build your work on previous openly shared work. For example, the initial study may not have been conducted by you, but shared in a reproducibility archive such as osf.io.

The example also illustrates that reproducibility and Open Science practices foster collaboration among researchers (Poldrack 2019). With openly available experimental materials, you can collaborate with others who are also interested in your topic and conduct well-powered multilab studies. Or the other way around: If one laboratory in a multilab study does not share the project materials in a way that the other labs can reproduce them, how easy will collaboration in this project be? A nice side benefit might be that you get job opportunities from other labs you collaborated with.

But how do you convince your supervisor to give you the time to make your research reproducible? Supervisors need their research and jobs to be funded by various organisations (such as the Horizon Europe funding pool of the European Commission). These organisations now require open access practices, including a call for reproducible research (Horizon Europe is one example). That means your supervisor would lose funding opportunities if they do not engage with you in Open Science practices and reproducible research.

Arguments for Open Science practices

scientific integrity
make knowledge public
visibility as researcher
- more citations
- more citations in high-impact articles
faster research output
collaboration
job opportunities
funding opportunities

2.3 Reproducibility pays off in the long run

Throughout this book, you will learn about project management, folder structure, version control, Docker environments, coding practices and more. All these tools are intended to improve your reproducibility, but at first they may slow down your current research process. In the long run, you will save a lot of time once you get used to the proposed way of working. Maybe you remember your first research project from your undergraduate program. Do you still have access to it? Can you reproduce what you did there? Do you even understand what you did there? In your research career (and also outside of research) you will have to deal with much larger projects. It is very important to keep your projects as tidy as possible, for your colleagues and collaborators and for future you.

2.4 Extent of reproducibility

As you will learn throughout this book, reproducibility is not a binary construct. There is no checklist waiting for you to tick off the open bullet points. It’s not that you either do or don’t do reproducible research. You can think of reproducibility more as a scale (see Figure 2.2) and you are shifting your reproducibility upwards by applying the tools you learn in this online resource.

Figure 2.2: “Reproducibility Scale” by Heidi Seibold and Rabea Müller and The Digital Research Academy Community and The BERD Academy (License: CC-BY 4.0; reused without modifications)

So let’s get started on improving your reproducibility.

2.5 Resources

If you want to dive deeper into the discussion of the value of reproducibility and Open Science and how to change behavior, here are some resources: