Project & Data Organization

Research Data Management for Psychology and Neuroscience
Course at University of Hamburg, RTG 2753: Emotional Learning and Memory
License: CC BY 4.0

Schedule

Day  Date        Time           Title
1    2026-02-05  09:30 - 10:00  Welcome and Introduction to Research Data Management
1    2026-02-05  10:00 - 11:00  Project & Data Organization
1    2026-02-05  11:00 - 12:00  Data Management Plans (DMPs)
1    2026-02-05  12:00 - 13:00  Lunch Break
1    2026-02-05  13:00 - 14:00  Command Line
1    2026-02-05  14:00 - 15:00  Best practices for rectangular data
1    2026-02-05  15:00 - 16:30  Brain Imaging Data Structure (BIDS)

This session: Project & Data Organization

Objectives

💡 You understand the importance of well-structured data organization for research.
💡 You can design logical and intuitive folder structures.
💡 You can apply file naming best practices and unique identifiers.
💡 You understand ISO 8601 timestamps and proper sorting methods.
💡 You can choose appropriate file formats for preservation.
💡 You can implement effective document versioning strategies.
💡 You understand ASCII/UTF-8 encoding advantages for text files.
💡 You can identify and solve common file organization problems.

1 Folder structure

Research data organization basics

FAIR principles:

  • Clear folder structure makes data findable and interoperable
  • Plan your structure beforehand to avoid later renaming
  • Consistency is key for data reuse

Documentation matters:

  • Make conventions intuitive
  • Document everything in a README file
  • Think of your future self: “Why did I do this?”

Folder structure

Planning your folder structure

Before you start:

  • Decide how to arrange files and folders early
  • Consider your data and documentation structure

Common trade-offs:

  • Files per folder vs. folder depth
  • Intuitive names vs. strict conventions
  • Structure by processing level, access, or file size

Folder Structure Best Practices

  • Avoid deeply nested folders (SubSubSubSubFolders)
  • Limit files per folder for easy browsing
  • Folders with well-named files (e.g., Image003.tif) stay browsable with more items
  • Thousands of files slow down file explorers

Keep File Paths Short

  • Operating systems and tools limit path length (e.g., 260 characters by default on Windows)
  • Long paths can cause sync and backup errors
  • Copies placed inside other folders (e.g., backups) may have even longer paths

Examples:

  • ❌ X:/Projects/Microscopy_Project/Microscopy_Projects_2024/October_2024/RawData_October2024/Microscopy_RawData_Image003.tif
  • ✅ X:/Projects/Microscopy/2024-10/RawData/Image003.tif
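
A quick way to see how close a path is to a length limit is to measure it, both as stored and after it has been copied into a backup location. The sketch below is only an illustration: the limit of 260 characters is an assumption based on the classic Windows default, and the names `MAX_PATH_LENGTH`, `is_path_too_long`, `nested_copy`, and the backup folder `D:/Backups/2024/Server_X` are hypothetical.

```python
MAX_PATH_LENGTH = 260  # assumption: classic Windows default, adjust per system


def is_path_too_long(path: str, limit: int = MAX_PATH_LENGTH) -> bool:
    """Return True if the path string is at least `limit` characters long."""
    return len(path) >= limit


def nested_copy(path: str, backup_root: str) -> str:
    """Simulate copying a path into a backup folder, which lengthens it."""
    # Drop the drive prefix (e.g. "X:/") and append the rest to the backup root.
    relative = path.split(":/", 1)[-1]
    return f"{backup_root.rstrip('/')}/{relative}"


long_path = (
    "X:/Projects/Microscopy_Project/Microscopy_Projects_2024/October_2024/"
    "RawData_October2024/Microscopy_RawData_Image003.tif"
)
short_path = "X:/Projects/Microscopy/2024-10/RawData/Image003.tif"

for path in (long_path, short_path):
    copy = nested_copy(path, "D:/Backups/2024/Server_X")  # hypothetical backup folder
    # Print original length, length of the copy, and whether the copy is too long.
    print(len(path), len(copy), is_path_too_long(copy))
```

Neither example path exceeds the assumed limit on its own, but the redundant naming in the first one uses up more than twice as many characters before any backup prefix is added.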

Consider access control

  • Structure folders by team member permissions
  • Plan for different access levels

Examples

Organized by file type

Dataset
├── DataTables
│   ├── 1_Raw
│   └── 2_Processed
└── Figures
    ├── Figure1.tif
    ├── Figure2a.tif
    └── Figure2b.tif

Adapted from https://datadryad.org/stash/best_practices#organize

Organized by analysis

Dataset
├── Figure1
│   ├── 1_Raw
│   ├── 2_Processed
│   └── Figure1.tif
└── Figure2
    ├── 1_Raw
    ├── 2_Processed
    ├── Figure2a.tif
    └── Figure2b.tif

Adapted from https://datadryad.org/stash/best_practices#organize

Project folder structure

Project_Folder
├── 1_Project_Management
│   ├── Finance
│   ├── Proposals
│   └── Reports
├── 2_Ethics_and_Governance
│   ├── Consent_Forms
│   └── Ethical_Approvals
├── 3_Dissemination
│   ├── Presentations
│   ├── Publications
│   └── Publicity
└── Experiment_01
    ├── Data
    ├── Data_Analysis
    ├── Inputs
    └── Outputs

Adapted from Suse Prejawa (2021, https://hdl.handle.net/21.11116/0000-0008-662A-7)
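
A structure like this can be created reproducibly from a short script instead of by hand. The following is a minimal sketch using Python's standard `pathlib` module; the folder names are copied from the tree above, while `PROJECT_FOLDERS` and `create_project` are hypothetical names chosen for this example.

```python
import tempfile
from pathlib import Path

# Leaf folders of the project tree; parent folders are created implicitly.
PROJECT_FOLDERS = [
    "1_Project_Management/Finance",
    "1_Project_Management/Proposals",
    "1_Project_Management/Reports",
    "2_Ethics_and_Governance/Consent_Forms",
    "2_Ethics_and_Governance/Ethical_Approvals",
    "3_Dissemination/Presentations",
    "3_Dissemination/Publications",
    "3_Dissemination/Publicity",
    "Experiment_01/Data",
    "Experiment_01/Data_Analysis",
    "Experiment_01/Inputs",
    "Experiment_01/Outputs",
]


def create_project(root: Path) -> list[Path]:
    """Create all project folders below `root` and return their paths."""
    created = []
    for relative in PROJECT_FOLDERS:
        folder = root / relative
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Project_Folder"
        for folder in create_project(root):
            print(folder.relative_to(root))
```

Keeping the folder list in one place also documents the intended structure, which can then be referenced from the README.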

2 File and folder names

Purpose of File Names

Good file names serve three purposes:

  1. Always: Uniquely identify files within a folder
  2. Often: Describe content clearly: README.txt, MeetingProtocol.docx, Temperature_RawData.tab
  3. Sometimes: Enable logical sorting: 1_RawData, 2_PreProcessed, 3_Processed, 4_Combined

Naming Principles

Key guidelines:

  • Same rules for folders and files (except file extensions)
  • Make names concise and intuitive
  • Help users choose the right file quickly

When names aren’t clear:

  • Avoid cryptic names like XYZ123
  • Document your naming convention in a README file
  • Explain the logic behind your choices

Characters to Avoid in File Names

Never use:

  • Non-ASCII characters: öäüßµαδ°±•€→☺É
  • Whitespace: File 1.txt (causes batch processing issues)
  • Windows forbidden: \/:*?"<>|
  • Problematic symbols: ,;()[]{} etc.

Safe characters only:

  • Letters: A-Z, a-z
  • Numbers: 0-9
  • Symbols: _ - .
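
These rules are easy to check automatically. The sketch below tests a name against the safe character set listed above and offers a simple clean-up; `is_safe_name` and `make_safe` are hypothetical helper names, and replacing unsafe characters with an underscore is just one possible choice (it does not transliterate characters such as ö to oe).

```python
import re

# Only the characters listed as safe: letters, digits, underscore, hyphen, dot.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")
UNSAFE_RUN = re.compile(r"[^A-Za-z0-9_.-]+")


def is_safe_name(name: str) -> bool:
    """Return True if the whole name consists of safe characters only."""
    return SAFE_NAME.fullmatch(name) is not None


def make_safe(name: str, replacement: str = "_") -> str:
    """Replace every run of unsafe characters with `replacement`."""
    return UNSAFE_RUN.sub(replacement, name)


for candidate in ("Temperature_RawData.tab", "File 1.txt", "Notes(draft).txt"):
    print(candidate, is_safe_name(candidate), make_safe(candidate))
```

Running such a check before sharing a dataset catches problematic names early, when renaming is still cheap.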

File Extension and Naming Rules

File extensions:

  • Use dots only before file extensions: Notes.txt
  • Avoid starting names with dots or underscores: .git, _quarto.yml
  • Such names are conventionally reserved for hidden or special configuration files

Unique names:

  • Make all names unique within a folder
  • Avoid case-only differences: hello.txt vs Hello.txt
  • Prevents cross-platform issues (Linux treats them as two files; Windows and macOS typically do not)
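
Case-only clashes are hard to spot by eye in a long listing. As a rough sketch, names can be grouped by their case-folded form to reveal them; `find_case_collisions` is a hypothetical helper, and the file names are only examples.

```python
from collections import defaultdict


def find_case_collisions(names: list[str]) -> list[list[str]]:
    """Return groups of names that differ only in upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more thorough lower() intended for caseless comparison.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


names = ["hello.txt", "Hello.txt", "README.md", "notes.md"]
print(find_case_collisions(names))
```

To check a real folder, the list of names could be taken from `os.listdir()` on a case-sensitive file system.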

Ordering and Numbers

Use leading zeros for proper sorting:

✅ With leading zeros: Scan01.csv, Scan02.csv, Scan03.csv, Scan10.csv, Scan11.csv

❌ Without leading zeros: Scan1.csv, Scan10.csv, Scan11.csv, Scan2.csv, Scan3.csv

Result: Files sort in logical order, not alphabetical chaos!
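
The effect can be reproduced with plain string sorting, which is what most file listings use. In this small sketch, `scan_name` is a hypothetical helper and the width of two digits is an assumption; pick a width that fits the largest number you expect.

```python
def scan_name(number: int, width: int = 2) -> str:
    """Build a file name with a zero-padded number, e.g. Scan03.csv."""
    return f"Scan{number:0{width}d}.csv"


numbers = (1, 2, 3, 10, 11)

# Plain string sorting compares character by character, not by numeric value.
unpadded = sorted(f"Scan{n}.csv" for n in numbers)
padded = sorted(scan_name(n) for n in numbers)

print(unpadded)  # ['Scan1.csv', 'Scan10.csv', 'Scan11.csv', 'Scan2.csv', 'Scan3.csv']
print(padded)    # ['Scan01.csv', 'Scan02.csv', 'Scan03.csv', 'Scan10.csv', 'Scan11.csv']
```

With more than 99 files, two digits are no longer enough, so it is worth deciding the width before data collection starts.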

Date Formats (ISO 8601)

Format: Year-Month-Day (biggest to smallest)

Examples:

  • ❌ 13Jan2024, 21April2021, 03122025
  • ✅ 2021-04-21, 2024-01-13, 2025-12-03
  • ✅ 20210421, 20240113, 20251203

With time:

  • 20210421T0345 (April 21, 2021 at 03:45)
  • 20240113T1730 (January 13, 2024 at 17:30)
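
Most programming languages can produce these formats directly, which avoids typing dates by hand. The sketch below uses Python's standard `datetime` module; `iso_date` and `iso_timestamp` are hypothetical helper names, and the `_survey.csv` suffix is only an example.

```python
from datetime import date, datetime


def iso_date(d: date, basic: bool = False) -> str:
    """Format a date as 2021-04-21 (extended) or 20210421 (basic)."""
    return d.strftime("%Y%m%d") if basic else d.isoformat()


def iso_timestamp(dt: datetime) -> str:
    """Format date and time compactly, e.g. 20210421T0345."""
    return dt.strftime("%Y%m%dT%H%M")


dates = [date(2025, 12, 3), date(2021, 4, 21), date(2024, 1, 13)]

# Because the year comes first, alphabetical order equals chronological order.
names = sorted(f"{iso_date(d)}_survey.csv" for d in dates)
print(names)
print(iso_timestamp(datetime(2024, 1, 13, 17, 30)))  # 20240113T1730
```

The same sorting property does not hold for formats such as 13Jan2024, where the day or month name comes first.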

File naming best practices

Include relevant info, but:

  • Don’t make file names your metadata storage
  • Keep names under 32 characters
  • Remember total path length limits

Stability matters:

  • Avoid renaming files once shared
  • Others may have referenced the file path

Document everything:

  • Create a README file explaining your naming system
  • Include acronym meanings and organization logic
  • Store README at the top level

Naming convention styles

Choose one style and stick to it:

  • Kebab-case: The-quick-brown-fox.txt
  • CamelCase: TheQuickBrownFox.txt
  • Snake_case: The_quick_brown_fox.txt

❌ Avoid spaces: The quick brown fox.txt

Naming trade-offs and versioning

Common compromises:

  • Informative vs. short names
  • Specific vs. flexible folder names
  • Names may become outdated over time

File versioning options:

  • Manual: Use naming conventions
  • Automated: Use version control systems (Git)
  • Goal: Track changes and enable rollbacks

Manual versioning best practices

Numbering systems:

  • Consecutive: Handbook_v3.pdf
  • Major vs. minor: v1-1, v1-2 or 1a, 1b
  • Date-based: Handbook_v20240725.pdf

Useful qualifiers:

  • ✅ raw, processed, draft, internal
  • ❌ Avoid “final” (leads to final_final_really_final.docx)

Document your system:

  • Explain your versioning convention
  • Track essential changes between versions
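
Manual versioning is less error-prone when the next file name is derived rather than typed. The following sketch covers the consecutive and the date-based scheme; `next_version` and `dated_version` are hypothetical helpers, and the `_v` separator is an assumption taken from the Handbook examples above. Note that `next_version` is meant for consecutive numbers only and expects a bare file name without folders.

```python
import re
from datetime import date
from pathlib import Path

# Matches stems such as "Handbook_v3" and captures the name and the number.
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<number>\d+)$")


def next_version(filename: str) -> str:
    """Return the file name with its consecutive version number increased by one."""
    path = Path(filename)
    match = VERSION_PATTERN.match(path.stem)
    if match is None:
        # Assumption for this sketch: unversioned files start at v1.
        return f"{path.stem}_v1{path.suffix}"
    number = int(match["number"]) + 1
    width = len(match["number"])  # keep any leading zeros, e.g. v09 -> v10
    return f"{match['stem']}_v{number:0{width}d}{path.suffix}"


def dated_version(filename: str, day: date) -> str:
    """Return the file name with a date-based version, e.g. Handbook_v20240725.pdf."""
    path = Path(filename)
    return f"{path.stem}_v{day.strftime('%Y%m%d')}{path.suffix}"


print(next_version("Handbook_v3.pdf"))                      # Handbook_v4.pdf
print(dated_version("Handbook.pdf", date(2024, 7, 25)))     # Handbook_v20240725.pdf
```

For anything beyond a handful of documents, a version control system such as Git handles this bookkeeping automatically.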

3 File formats

Choosing file formats

Why format matters:

  • File extensions (.txt, .csv) signal the file format, which defines how the data is structured
  • Good formats keep data interoperable
  • Enable reading with multiple software tools

Key requirements:

  • Clear, documented structure
  • Publicly available specifications
  • Future-proof for long-term preservation

Rule of thumb: Choose open formats, avoid proprietary ones

Format selection criteria

Ideal properties:

  • Human-readable with simple text editors
  • Compatible with multiple programs
  • Easy to understand and use
  • Small file size and good performance

Reality check:

  • Trade-offs are common
  • Binary files: better performance, harder to read
  • CSV files: worse performance, better for preservation

Avoid proprietary formats:

  • Often lack proper documentation
  • May require expensive commercial software
  • Risk becoming unreadable if company disappears
  • May contain hidden sensitive information

Format notes

PDF considerations:

  • Originally proprietary (Adobe), now an open ISO standard (ISO 32000) and widely used
  • Use PDF/A for archival purposes
  • Great for fixed documentation
  • Difficult to edit or extract data from

Spreadsheet files:

  • Colorful formatting looks nice but causes problems
  • Don’t store important info in formatting alone
  • Rule of thumb: treat .xlsx and .ods files as not machine-readable
  • Use .csv for data exchange instead
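
One way to avoid information that lives only in formatting is to turn it into an explicit column. The sketch below writes and re-reads a tiny table with Python's standard `csv` module. All values, the column names (`participant_id`, `sleep_hours`, `excluded`), and the helper `to_csv` are made up for illustration; the `excluded` column stands in for what might otherwise be a red cell background.

```python
import csv
import io

# Illustrative rows only: "excluded" replaces a color code with an explicit value.
rows = [
    {"participant_id": "sub-001", "sleep_hours": "7.5", "excluded": "no"},
    {"participant_id": "sub-002", "sleep_hours": "6.0", "excluded": "yes"},
]


def to_csv(records: list[dict[str, str]]) -> str:
    """Serialize a list of uniform dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)

# Reading the text back yields the same records, with no formatting to lose.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(parsed == rows)
```

The meaning of each column and its allowed values still belongs in a codebook or README, since a CSV file carries no such documentation itself.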

ASCII: the gold standard

Why ASCII rocks:

  • One byte = one visible character
  • Readable by any text editor or software
  • Works in Excel, Word, browsers (size permitting)
  • Maximum compatibility across systems

ASCII characters only:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Modern alternative: UTF-8

  • Supports international characters: ü, €, ☺
  • ASCII files are automatically valid UTF-8
  • Best of both worlds!
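
The relationship between the two encodings can be verified in a few lines. This sketch uses only built-in string methods; the sample strings and the helper `utf8_length` are hypothetical and chosen just for illustration.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes needed to store `text` as UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "Temperature_RawData"

# Pure ASCII text produces identical bytes in both encodings.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True
print(ascii_text.isascii())  # True

# Non-ASCII characters need more than one byte each in UTF-8.
for character in ("a", "ü", "€", "☺"):
    print(character, utf8_length(character))
```

This is also why a file that is declared as UTF-8 but contains only ASCII characters opens correctly in older, ASCII-only tools.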

4 Exercises

Exercise 1

Exercise 1: Project template comparison

Task: Compare the following project templates and discuss advantages and disadvantages of each approach.

Turing Way template

.
├── LICENSE
├── README.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── data
│   ├── processed      # Final, canonical datasets
│   └── raw            # Original, immutable data
├── docs               # Sphinx documentation
├── models             # Trained models, predictions, summaries
├── notebooks          # Jupyter notebooks (numbered)
├── reports            # Generated analysis (HTML, PDF, LaTeX)
│   └── figures        # Generated graphics and figures
├── project_management # Meeting notes, planning resources
└── src                # Source code
    ├── data           # Scripts to download/generate data
    ├── models         # Scripts to train models
    └── visualisation  # Scripts for visualizations

Repository Structure Template by The Turing Way. Used under the CC BY 4.0 license. Reused without any modifications.

Heidi Seibold template

.
├── README.md
├── analysis            # All things data analysis
│   └── src             # Functions and source files
├── comm
│   ├── internal_comm   # Internal communication, meeting notes
│   └── journal_comm    # Communication with journal, peer review
├── data
│   ├── data_clean      # Clean version of data
│   └── data_raw        # Raw data (don't touch)
├── dissemination
│   ├── manuscripts
│   ├── posters
│   └── presentations
├── documentation       # Data management plan, etc.
└── misc                # Miscellaneous files

Research Project Template by Heidi Seibold. No license specified. Reused without modifications.

analysistemplates template

.
├── 01_Data
│   ├── 01_Raw
│   └── 02_Clean
├── 02_Analysis
│   ├── 01_Scripts
│   ├── 02_Results
│   ├── 03_Figures
│   └── 04_Tables
├── 03_Manuscript
│   ├── 01_Text
│   └── 02_Final_figures
├── 04_Presentation
├── 05_Misc
├── 06_Analysis_for_publication    # Optional
├── README.md
├── .gitignore                     # Optional
└── renv                           # Optional

analysis template packages by Jonas Hagenbeck. Used under the MIT License. Reused without any modifications.

WORCS template

File/Folder    Description                      Usage
_pkgdown.yml   YAML for package website         do not edit
DESCRIPTION    R-package DESCRIPTION            do not edit
LICENSE.md     Project license                  do not edit
README.md      Read this file to get started!   do not edit
README.Rmd     R-markdown source for readme.md  human editable
docs/          Package website                  machine-written
paper/         WORCS paper source files         human editable
R/             R-package source code            human editable
vignettes/     R-package vignettes              human editable

WORCS project structure by Van Lissa et al. (2021). Used under the GNU General Public License. No changes were made.

Exercise 2

Exercise 2: Design your project folder structure

Use command line tools for all tasks where applicable.

  1. Create a new directory called my-research-project in your home directory.
  2. Design and create a folder structure for your own or a fictional research project, e.g., one studying “Effects of Social Media on Sleep Patterns in College Students”. Consider:
    • Where will you store raw survey data?
    • Where will you keep processed analysis results?
    • Where will you organize your documentation?
    • Where will you store your analysis scripts?
  3. Create at least five folders that reflect good organization principles from the slides.
  4. Navigate through your structure and verify all directories exist.

Exercise 3

Exercise 3: File naming practice

Use command line tools for all tasks where applicable.

  1. In your project’s raw data folder, create the following files using good naming practices (remember: no spaces, use proper date formats, include leading zeros):
    • A survey data file from January 13, 2024
    • A survey data file from April 21, 2024
    • A survey data file from December 3, 2025
    • Sleep tracking data from participant 007
    • Sleep tracking data from participant 023
    • Sleep tracking data from participant 156
  2. List the files and verify they sort in logical order.
  3. Now create the bad versions of these filenames in a separate bad-examples folder:
    • Use spaces in names
    • Use inconsistent date formats
    • Omit leading zeros
    • Use special characters
  4. Compare how the two folders look when listed.

Exercise 4

Exercise 4: README documentation

Use command line tools for all tasks where applicable.

  1. Create a README.md file in your main project directory.
  2. Document the following in your README:
    • Brief project description
    • Explanation of your folder structure
    • Your file naming convention rules
    • Data collection dates and methods
    • Contact information
  3. Include at least one example of your naming convention with explanation.

Exercise 5

Exercise 5: File format decisions

Use command line tools for all tasks where applicable.

  1. Create a documentation folder in your project.
  2. In this folder, create files representing different types of documentation using appropriate file formats:
    • A data dictionary/codebook
    • A research protocol document
    • A list of participant information
    • Analysis notes
  3. Use the recommended file formats from the slides (.txt, .csv, .md, etc.).
  4. Add a comment in each file explaining why you chose that format.

Exercise 6

Exercise 6: Versioning practice

Use command line tools for all tasks where applicable.

  1. Create a file called analysis-script.R in your scripts folder.
  2. Add some sample R code (even if basic) to the file.
  3. Create 3 versions of this file using proper versioning conventions:
    • An initial draft version
    • A revised version with minor changes
    • A major revision with significant updates
  4. Practice both numbering and date-based versioning approaches.
  5. Document your versioning system in a versioning-notes.txt file.

5 References