Science as distributed open-source knowledge development

Digital Total at University of Hamburg

License: CC BY 4.0

October 10, 2023

About

Me

🔬 Position: I am a Postdoctoral Research Scientist in the Research Group “Mechanisms of Learning & Change” at the Institute of Psychology at the University of Hamburg

🎓 Education: BSc Psychology & MSc Cognitive Neuroscience (Technische Universität Dresden), PhD Cognitive Neuroscience (Freie Universität Berlin)

🔗 Contact: You can connect with me via email, Twitter, Mastodon, GitHub or LinkedIn

ℹ️ Info: Find out more about my work on my website, Google Scholar and ORCiD

This presentation

💻 Slides: https://lennartwittkuhn.com/digital-total

Source: https://github.com/lnnrtwttkhn/digital-total

📦 Software: Open & reproducible slides built with Quarto and deployed to GitHub Pages using GitHub Actions

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Contact: I am happy for any feedback or suggestions in person at this event, via email or GitHub Issues. Thanks!

Agenda

  1. Digital Research & Digital Teaching
  2. Science as distributed open-source knowledge development

Research on “Mechanisms of Learning and Change”

How does the brain use past experience to guide future decisions?


Figure taken from Lake et al. (2016): “Building machines that learn and think like people”

Digital Research

Castellum: Digital, privacy-compliant participant management system

Faculty of Psychology & Human Movement Science

“We optimize the digital recruitment of study participants. We have access to a broad participant database through which participants can be recruited according to specific criteria.” (see Digital Strategy, 2022)

  • free to use, digital, open-source
  • GDPR-compliant data protection & security
  • developed at the Max Planck Society
  • growing international user community

Castellum UHH Task Force

MRI Total: Transparent and reproducible MRI data processing

1. Neuroimaging Data Collection

2. Standardization of human neuroimaging data

Brain Imaging Data Structure (BIDS, Gorgolewski et al., 2016)
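
As a minimal sketch of what this standardization means in practice, the snippet below builds the relative path of a functional MRI file from BIDS-style key-value entities (`sub-`, `ses-`, `task-`, `run-`). The function name `bids_bold_path` and the example labels are hypothetical and only for illustration; a real project would validate its dataset against the BIDS specification rather than rely on a helper like this.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_bold_path(subject: str, task: str, session: Optional[str] = None,
                   run: Optional[int] = None) -> PurePosixPath:
    """Build the relative path of a functional (BOLD) image, BIDS-style.

    Hypothetical helper for illustration; it covers only a few entities.
    """
    # Entities are key-value pairs joined by underscores, in a fixed order.
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities) + "_bold.nii.gz"

    # Directory levels: subject, optional session, then the data type "func".
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / "func" / filename


example = bids_bold_path(subject="01", task="rest", session="01", run=1)
print(example)  # sub-01/ses-01/func/sub-01_ses-01_task-rest_run-1_bold.nii.gz
```

Because every file name encodes the same entities in the same order, automated tools can locate and process data without project-specific configuration.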

3. Automated MRI data quality control & processing

  • High-performance computing (using Hummel)
  • Distributed data management (using DataLad)
  • Data storage on UHH’s Object Storage and RDR
  • Containerized computational environments

Digital Teaching

Teaching: Reproducible & FAIR open educational resources (OERs)

Source: National Library of Medicine (NIH), see also Wilkinson et al. (2016)

Source: Wiegers & Gelder (2019) (License: CC BY 4.0) via Garcia et al. (2020)

  • Findable / Accessible: Ensure long-term preservation and get a persistent identifier (e.g., DOI) via data repositories, journal articles or OER registries
  • Interoperable: Use plain-text formats (e.g., Markdown) or commonly used formats (e.g., PowerPoint)
  • Reusable: Add documentation, metadata and share under an open license (e.g., Creative Commons licenses)

See Plomp & Wittkuhn (2023) for an approach using Quarto & Git (Slides)

Digital Literacy: A course on “Version Control of Code and Data”

Summary: A hands-on seminar about version control of code and data using Git with curated online materials, interactive discussions, quizzes and exercises, targeted at (aspiring) researchers in Psychology & Neuroscience

Why we need version control …

“notFinal.doc” by Jorge Cham (phdcomics.com)

What is version control?

“Version control is a systematic approach to record changes in a set of files, over time. This allows you and your collaborators to track the history, see what changed, and recall specific versions.” (Turing Way)
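
To make the three ideas in this definition concrete (track the history, see what changed, recall specific versions), here is a toy sketch in Python. It is not how Git works internally, and the class `TinyHistory` and its method names are hypothetical; it only mimics the concepts for a single text file held in memory.

```python
import difflib
import hashlib


class TinyHistory:
    """Toy in-memory history for one text file (illustration only, not Git)."""

    def __init__(self):
        self.versions = []  # list of (version_id, message, content)

    def commit(self, content: str, message: str) -> str:
        # Identify each version by a short hash of its position and content.
        payload = f"{len(self.versions)}:{content}".encode("utf-8")
        version_id = hashlib.sha1(payload).hexdigest()[:7]
        self.versions.append((version_id, message, content))
        return version_id

    def log(self):
        # Track the history: every recorded version with its message.
        return [(vid, msg) for vid, msg, _ in self.versions]

    def checkout(self, version_id: str) -> str:
        # Recall a specific version by its identifier.
        for vid, _, content in self.versions:
            if vid == version_id:
                return content
        raise KeyError(version_id)

    def diff(self, old_id: str, new_id: str):
        # See what changed between two versions, line by line.
        old = self.checkout(old_id).splitlines()
        new = self.checkout(new_id).splitlines()
        return list(difflib.unified_diff(
            old, new, fromfile=old_id, tofile=new_id, lineterm=""))


history = TinyHistory()
first = history.commit("alpha = 0.05\n", "Add significance threshold")
second = history.commit("alpha = 0.01\n", "Use stricter threshold")

print(history.log())
print("\n".join(history.diff(first, second)))
print(history.checkout(first))
```

Real version control systems add what this sketch leaves out: many files, branches, merging and sharing the history with collaborators.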

Science as distributed open-source knowledge development

Computational Reproducibility

“… when the same analysis steps performed on the same dataset consistently produce the same answer.” 1

by Scriberia for The Turing Way Community (2022) (Link, CC BY 4.0)
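
One way to check this definition in code is to run the same analysis twice on the same data and compare a fingerprint of the results. The sketch below is an illustration under stated assumptions: the data values, the function names `bootstrap_mean` and `fingerprint`, and the seed are made up for the example. It uses only the Python standard library.

```python
import hashlib
import json
import random
import statistics


def bootstrap_mean(data, n_resamples=1000, seed=2023):
    """Bootstrap the mean of `data` with a fixed random seed."""
    rng = random.Random(seed)  # local generator, independent of global state
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, same size as the original data.
        sample = [rng.choice(data) for _ in data]
        means.append(statistics.fmean(sample))
    return {"mean": statistics.fmean(means), "sd": statistics.stdev(means)}


def fingerprint(result) -> str:
    """Hash a result dictionary so two runs can be compared exactly."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]  # made-up example values
first_run = fingerprint(bootstrap_mean(data))
second_run = fingerprint(bootstrap_mean(data))

# Same data, same analysis steps, same seed: the fingerprints match.
print(first_run == second_run)  # True
```

The fingerprints match within one software environment. Across different software versions the result may differ, which is one reason the computational environment needs to be recorded as well.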

The problem

  • more than half of research is not reproducible 2
    • research data, code, software and materials are often not available “upon reasonable [sic] request”
    • if resources are shared, they are often incomplete
  • 90% of surveyed researchers agree there is a “reproducibility crisis” (N = 1576) 3

Why?

  • computational reproducibility is hard
  • researchers lack training
  • incentives are not (yet) aligned 4
  • “natural selection of bad science” 5

“… accumulated evidence indicates that there is substantial room for improvement with regard to research practices to maximize the efficiency of the research community’s use of the public’s financial investment.” (Munafò et al., 2017)

We need a professional toolkit for digital scientific outputs!

Science as distributed open-source knowledge development 6

How can we do better science?

The long-term challenges are largely non-technical

  • open-source, avoiding commercial vendor lock-in
  • adopting new practices and upgrading workflows
  • moving towards a “culture of reproducibility” 7
  • changing incentives, policies & funding schemes

Technical solutions already exist!

  • Version control of digital research outputs (e.g., Git, DataLad)
  • Integration with flexible infrastructure (e.g., GitLab)
  • Systematic contributions & review (e.g., pull / merge requests)
  • Automated integration & deployment (e.g., CI/CD)
  • Reproducible computational environments (e.g., Docker)
  • Transparent execution and build systems (e.g., GNU Make)
  • Project communication next to code & data (e.g., Issues)

Summary

Digital Research

  1. Castellum: We are setting up a digital, privacy-compliant participant management system
  2. MRI Total: We are working on an automated and reproducible pipeline for MRI data processing & quality control

Digital Teaching

  1. FAIR & Reproducible Teaching: We are developing workflows for FAIR, reproducible & open educational resources
  2. Digital & Data Literacy: We are teaching version control to the next generation of researchers

Science as distributed open-source knowledge development

  1. We need a professional toolkit for digital research outputs, inspired by open-source software development

References

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454. https://doi.org/10.1038/533452a
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger, S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S., Zaneva, M., & Brown, N. J. L. (2023). What’s in a Badge? A Computational Reproducibility Investigation of the Open Data Badge Policy in One Issue of Psychological Science. Psychological Science, 34(4), 512–522. https://doi.org/10.1177/09567976221140828
Esteban, O., Birman, D., Schaer, M., Koyejo, O. O., Poldrack, R. A., & Gorgolewski, K. J. (2017). MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites. PLOS ONE, 12(9), e0184661. https://doi.org/10.1371/journal.pone.0184661
Esteban, O., Markiewicz, C. J., Blair, R. W., Moodie, C. A., Isik, A. I., Erramuzpe, A., Kent, J. D., Goncalves, M., DuPre, E., Snyder, M., Oya, H., Ghosh, S. S., Wright, J., Durnez, J., Poldrack, R. A., & Gorgolewski, K. J. (2018). fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods, 16(1), 111–116. https://doi.org/10.1038/s41592-018-0235-4
Garcia, L., Batut, B., Burke, M. L., Kuzak, M., Psomopoulos, F., Arcila, R., Attwood, T. K., Beard, N., Carvalho-Silva, D., Dimopoulos, A. C., Angel, V. D. del, Dumontier, M., Gurwitz, K. T., Krause, R., McQuilton, P., Le Pera, L., Morgan, S. L., Rauste, P., Via, A., … Palagi, P. M. (2020). Ten simple rules for making training materials FAIR. PLOS Computational Biology, 16(5), e1007854. https://doi.org/10.1371/journal.pcbi.1007854
Gorgolewski, K. J., Auer, T., Calhoun, V. D., Craddock, R. C., Das, S., Duff, E. P., Flandin, G., Ghosh, S. S., Glatard, T., Halchenko, Y. O., Handwerker, D. A., Hanke, M., Keator, D., Li, X., Michael, Z., Maumet, C., Nichols, B. N., Nichols, T. E., Pellman, J., … Poldrack, R. A. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1). https://doi.org/10.1038/sdata.2016.44
Hardwicke, T. E., Bohn, M., MacDonald, K., Hembacher, E., Nuijten, M. B., Peloquin, B. N., deMayo, B. E., Long, B., Yoon, E. J., & Frank, M. C. (2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: an observational study. Royal Society Open Science, 8(1). https://doi.org/10.1098/rsos.201494
Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2016). Building machines that learn and think like people. Behavioral and Brain Sciences, 40. https://doi.org/10.1017/s0140525x16001837
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1). https://doi.org/10.1038/s41562-016-0021
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of Open Data and Computational Reproducibility in Registered Reports in Psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872
Plomp, E., & Wittkuhn, L. (2023). Reproducible and FAIR teaching materials. Zenodo. https://doi.org/10.5281/ZENODO.8296951
Poldrack, R. A. (2019). The Costs of Reproducibility. Neuron, 101(1), 11–14. https://doi.org/10.1016/j.neuron.2018.11.030
Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384. https://doi.org/10.1098/rsos.160384
The Turing Way Community. (2022). The Turing Way: A handbook for reproducible, ethical and collaborative research. Zenodo. https://doi.org/10.5281/zenodo.3233853
Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. https://doi.org/10.1037/0003-066x.61.7.726
Wiegers, L., & Gelder, C. W. G. van. (2019). Illustration for "ten simple rules for making training materials FAIR". https://doi.org/10.5281/ZENODO.3593258
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Silva Santos, L. B. da, Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1). https://doi.org/10.1038/sdata.2016.18

Thank you!

💻 Slides: https://lennartwittkuhn.com/digital-total

Source: https://github.com/lnnrtwttkhn/digital-total

📦 Software: Reproducible slides built with Quarto and deployed to GitHub Pages using GitHub Actions (details in the Quarto docs)

🖲️ DOI: 10.25592/uhhfdm.13467

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Feedback: In person at this event, via email or GitHub Issues

Reproducibility Crisis

N = 1576; Baker (2016), Nature

Footnotes

  1. The Turing Way Community (2022), see “Guide on Reproducible Research”

  2. for example, in Psychology: Crüwell et al. (2023); Hardwicke et al. (2021); Obels et al. (2020); Wicherts et al. (2006)

  3. see Baker (2016), Nature

  4. see, e.g., Poldrack (2019)

  5. see Smaldino & McElreath (2016)

  6. inspired by Richard McElreath’s “Science as Amateur Software Development” (2023)

  7. see “Towards a culture of computational reproducibility” by Russ Poldrack, Stanford University