Project & Data Organization

Research Data Management for Psychology and Neuroscience
Course at University of Hamburg, RTG 2753: Emotional Learning and Memory

Dr. Lennart Wittkuhn

lennart.wittkuhn@tutanota.com

Schedule

Day	Date	Time	Title
1	2026-02-05	09:30 - 10:00	Welcome and Introduction to Research Data Management
1	2026-02-05	10:00 - 11:00	Project & Data Organization
1	2026-02-05	11:00 - 12:00	Data Management Plans (DMPs)
1	2026-02-05	12:00 - 13:00	Lunch Break
1	2026-02-05	13:00 - 14:00	Command Line
1	2026-02-05	14:00 - 15:00	Best practices for rectangular data
1	2026-02-05	15:00 - 16:30	Brain Imaging Data Structure (BIDS)

This session: Project & Data Organization

Objectives

💡 You understand the importance of well-structured data organization for research.
💡 You can design logical and intuitive folder structures.
💡 You can apply file naming best practices and unique identifiers.
💡 You understand ISO 8601 timestamps and proper sorting methods.
💡 You can choose appropriate file formats for preservation.
💡 You can implement effective document versioning strategies.
💡 You understand ASCII/UTF-8 encoding advantages for text files.
💡 You can identify and solve common file organization problems.

1 Folder structure

Research data organization basics

FAIR principles:

Clear folder structure makes data findable and interoperable
Plan your structure beforehand to avoid later renaming
Consistency is key for data reuse

Documentation matters:

Make conventions intuitive
Document everything in a README file
Think of your future self: “Why did I do this?”

Folder structure

Planning your folder structure

Before you start:

Decide how to arrange files and folders early
Consider your data and documentation structure

Common trade-offs:

Files per folder vs. folder depth
Intuitive names vs. strict conventions
Structure by processing level, access, or file size

Folder Structure Best Practices

Avoid deeply nested folders (SubSubSubSubFolders)
Limit files per folder for easy browsing
Folders with well-named files (e.g., Image003.tif) can hold more items
Thousands of files slow down file explorers

Keep File Paths Short

Operating systems limit path length (e.g., 260 characters by default on Windows)
Long paths can cause sync and backup errors
Copies (e.g., in backup or sync folders) may have even longer paths

Examples:

❌ X:/Projects/Microscopy_Project/Microscopy_Projects_2024/October_2024/RawData_October2024/Microscopy_RawData_Image003.tif
✅ X:/Projects/Microscopy/2024-10/RawData/Image003.tif
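Path lengths can be checked with a few lines of code before syncing or backing up. The following is a minimal Python sketch, not a definitive tool: the function name find_long_paths is hypothetical, and the default limit of 255 characters is a conservative assumption that should be adjusted to your own system.

```python
def find_long_paths(paths, max_length=255):
    """Return the paths whose full text is longer than max_length characters."""
    return [p for p in paths if len(str(p)) > max_length]


# The two example paths from this slide.
long_path = (
    "X:/Projects/Microscopy_Project/Microscopy_Projects_2024/"
    "October_2024/RawData_October2024/Microscopy_RawData_Image003.tif"
)
short_path = "X:/Projects/Microscopy/2024-10/RawData/Image003.tif"

for path in (long_path, short_path):
    print(len(path), path)

# With an artificially strict limit of 100 characters, only the long path is flagged.
print(find_long_paths([long_path, short_path], max_length=100))
```

Redundant folder names more than double the path length here, and every extra nesting level adds to it.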

Consider access control

Structure folders by team member permissions
Plan for different access levels

Examples

Organized by file type

Dataset
├── DataTables
│   ├── 1_Raw
│   └── 2_Processed
└── Figures
    ├── Figure1.tif
    ├── Figure2a.tif
    └── Figure2b.tif

Adapted from https://datadryad.org/stash/best_practices#organize

Organized by analysis

Dataset
├── Figure1
│   ├── 1_Raw
│   ├── 2_Processed
│   └── Figure1.tif
└── Figure2
    ├── 1_Raw
    ├── 2_Processed
    ├── Figure2a.tif
    └── Figure2b.tif

Adapted from https://datadryad.org/stash/best_practices#organize

Project folder structure

Project_Folder
├── 1_Project_Management
│   ├── Finance
│   ├── Proposals
│   └── Reports
├── 2_Ethics_and_Governance
│   ├── Consent_Forms
│   └── Ethical_Approvals
├── 3_Dissemination
│   ├── Presentations
│   ├── Publications
│   └── Publicity
└── Experiment_01
    ├── Data
    ├── Data_Analysis
    ├── Inputs
    └── Outputs

Adapted from Suse Prejawa (2021, https://hdl.handle.net/21.11116/0000-0008-662A-7)
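A folder skeleton like the one above can be created by a script, so that every new project starts from the same structure. This is a minimal Python sketch under stated assumptions: the LAYOUT dictionary simply mirrors the example tree, the function name create_project and the README text are hypothetical, and the demo writes only to a temporary directory.

```python
import tempfile
from pathlib import Path

# Hypothetical layout mirroring the example tree; adapt the names to your project.
LAYOUT = {
    "1_Project_Management": ["Finance", "Proposals", "Reports"],
    "2_Ethics_and_Governance": ["Consent_Forms", "Ethical_Approvals"],
    "3_Dissemination": ["Presentations", "Publications", "Publicity"],
    "Experiment_01": ["Data", "Data_Analysis", "Inputs", "Outputs"],
}


def create_project(root, layout=LAYOUT):
    """Create the folder skeleton under root and add a top-level README."""
    root = Path(root)
    for folder, subfolders in layout.items():
        for sub in subfolders:
            # parents=True creates missing parent folders; exist_ok=True makes reruns safe.
            (root / folder / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text(
            "Describe the folder structure and naming conventions here.\n",
            encoding="utf-8",
        )
    return root


# Demo in a temporary directory so that nothing is written to a real project.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp) / "Project_Folder")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project))
```

Running the function twice does not overwrite anything, so it can also be used to add missing folders to an existing project.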

2 File and folder names

Purpose of File Names

Good file names serve three purposes:

Always: Uniquely identify files within a folder
Often: Describe content clearly: README.txt, MeetingProtocol.docx, Temperature_RawData.tab
Sometimes: Enable logical sorting: 1_RawData, 2_PreProcessed, 3_Processed, 4_Combined

Naming Principles

Key guidelines:

Same rules for folders and files (except file extensions)
Make names concise and intuitive
Help users choose the right file quickly

When names aren’t clear:

Avoid cryptic names like XYZ123
Document your naming convention in a README file
Explain the logic behind your choices

Characters to Avoid in File Names

Never use:

Non-ASCII characters: öäüßµαδ°±•€→☺É
Whitespace: File 1.txt (causes batch processing issues)
Windows forbidden: \/:*?"<>|
Problematic symbols: ,;()[]{} etc.

Safe characters only:

Letters: A-Z, a-z
Numbers: 0-9
Symbols: _ - .
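The safe character set above translates directly into a regular expression. The following Python sketch is one possible check, with hypothetical function names (is_safe_name, make_safe_name). It tests characters only and does not cover the other rules, such as names starting with a dot.

```python
import re

# Only letters, digits, underscore, hyphen, and dot are treated as safe.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def is_safe_name(name):
    """Return True if the name consists of safe characters only."""
    return SAFE_NAME.fullmatch(name) is not None


def make_safe_name(name):
    """Replace each run of unsafe characters with a single underscore."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", name)


for candidate in ["Temperature_RawData.tab", "File 1.txt", "Größe(µm).csv"]:
    print(candidate, is_safe_name(candidate), make_safe_name(candidate))
```

Automatic replacement loses information (for example, "ö" becomes "_"), so review suggested names before renaming anything.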

File Extension and Naming Rules

File extensions:

Use dots only before file extensions: Notes.txt
Avoid starting names with dots or underscores: .git, _quarto.yml
Such names are conventionally used for hidden or configuration files

Unique names:

Make all names unique within a folder
Avoid case-only differences: hello.txt vs Hello.txt
Prevents cross-platform issues: Linux treats hello.txt and Hello.txt as two files, Windows by default does not
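Case-only differences are easy to miss by eye but simple to detect in code. This minimal Python sketch groups names that become identical once letter case is ignored; the function name find_case_collisions is hypothetical.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only in letter case."""
    groups = defaultdict(list)
    for name in names:
        # casefold() is a more thorough lower() intended for caseless comparison.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(find_case_collisions(["hello.txt", "Hello.txt", "Notes.txt"]))
# → [['hello.txt', 'Hello.txt']]
```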

Ordering and Numbers

Use leading zeros for proper sorting:

✅ With leading zeros: Scan01.csv, Scan02.csv, Scan03.csv, Scan10.csv, Scan11.csv

❌ Without leading zeros: Scan1.csv, Scan10.csv, Scan11.csv, Scan2.csv, Scan3.csv

Result: Files sort in logical order, not alphabetical chaos!
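The effect of leading zeros can be reproduced with plain string sorting, which compares names character by character, much like alphabetical sorting in a file browser. A short Python sketch using the file names from this slide:

```python
numbers = [1, 2, 3, 10, 11]

# Without padding: "Scan10.csv" sorts before "Scan2.csv" because "1" < "2".
unpadded = sorted(f"Scan{n}.csv" for n in numbers)
print(unpadded)

# With padding: the format spec 02d pads each number to two digits.
padded = sorted(f"Scan{n:02d}.csv" for n in numbers)
print(padded)
```

Choose the padding width for the largest number you expect: two digits cover up to 99 files, three digits up to 999.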

Date Formats (ISO 8601)

Format: Year-Month-Day (biggest to smallest)

Examples:

❌ 13Jan2024, 21April2021, 03122025
✅ 2021-04-21, 2024-01-13, 2025-12-03
✅ 20210421, 20240113, 20251203

With time:

20210421T0345 (April 21, 2021 at 03:45)
20240113T1730 (January 13, 2024 at 17:30)
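Most programming languages can produce and parse these formats directly. The Python sketch below uses the standard datetime module; the variable names are chosen for illustration only.

```python
from datetime import datetime

moment = datetime(2021, 4, 21, 3, 45)

extended_date = moment.strftime("%Y-%m-%d")      # 2021-04-21
basic_date = moment.strftime("%Y%m%d")           # 20210421
basic_datetime = moment.strftime("%Y%m%dT%H%M")  # 20210421T0345
print(extended_date, basic_date, basic_datetime)

# Parsing a timestamp taken from a file name back into a datetime object.
parsed = datetime.strptime("20240113T1730", "%Y%m%dT%H%M")
print(parsed)

# Because the order is year-month-day, sorting as text is also chronological.
dates = ["2025-12-03", "2021-04-21", "2024-01-13"]
print(sorted(dates))
```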

File naming best practices

Include relevant info, but:

Don’t make file names your metadata storage
Keep names under 32 characters
Remember total path length limits

Stability matters:

Avoid renaming files once shared
Others may have referenced the file path

Document everything:

Create a README file explaining your naming system
Include acronym meanings and organization logic
Store README at the top level

Naming convention styles

Choose one style and stick to it:

Kebab-case: The-quick-brown-fox.txt
CamelCase: TheQuickBrownFox.txt
Snake_case: The_quick_brown_fox.txt

❌ Avoid spaces: The quick brown fox.txt

Naming trade-offs and versioning

Common compromises:

Informative vs. short names
Specific vs. flexible folder names
Names may become outdated over time

File versioning options:

Manual: Use naming conventions
Automated: Use version control systems (Git)
Goal: Track changes and enable rollbacks

Manual versioning best practices

Numbering systems:

Consecutive: Handbook_v3.pdf
Major vs. minor: v1-1, v1-2 or 1a, 1b
Date-based: Handbook_v20240725.pdf

Useful qualifiers:

✅ raw, processed, draft, internal
❌ Avoid “final” (leads to final_final_really_final.docx)

Document your system:

Explain your versioning convention
Track essential changes between versions
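Manual version names are less error-prone when a small helper generates them. The following Python sketch assumes a hypothetical convention in which the version is a _v suffix directly before the file extension, as in the Handbook examples above. The function names next_version and dated_version are hypothetical as well.

```python
import re
from datetime import date


def next_version(filename):
    """Increment a trailing _v<number> before the extension, e.g. _v3 becomes _v4."""
    match = re.fullmatch(r"(.+_v)(\d+)(\.[A-Za-z0-9]+)", filename)
    if match is None:
        raise ValueError(f"No version suffix found in {filename!r}")
    stem, number, extension = match.groups()
    # Keep the original zero padding, e.g. v09 becomes v10.
    return f"{stem}{int(number) + 1:0{len(number)}d}{extension}"


def dated_version(stem, extension, day):
    """Build a date-based version name using the ISO 8601 basic date format."""
    return f"{stem}_v{day:%Y%m%d}{extension}"


print(next_version("Handbook_v3.pdf"))                       # Handbook_v4.pdf
print(dated_version("Handbook", ".pdf", date(2024, 7, 25)))  # Handbook_v20240725.pdf
```

For anything beyond a handful of documents, a version control system such as Git handles this bookkeeping automatically.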

3 File formats

Choosing file formats

Why format matters:

File extensions (.txt, .csv) indicate the file format, which defines how the data is structured
Good formats keep data interoperable
Enable reading with multiple software tools

Key requirements:

Clear, documented structure
Publicly available specifications
Future-proof for long-term preservation

Rule of thumb: Choose open formats, avoid proprietary ones

Format selection criteria

Ideal properties:

Human-readable with simple text editors
Compatible with multiple programs
Easy to understand and use
Small file size and good performance

Reality check:

Trade-offs are common
Binary files: better performance, harder to read
CSV files: worse performance, better for preservation

Avoid proprietary formats:

Often lack proper documentation
May require expensive commercial software
Risk becoming unreadable if company disappears
May contain hidden sensitive information

Recommended file formats

Documentation:

Plain text (.txt), HTML, Markdown
PDF (PDF/A-1 for archiving)
Maybe: Rich Text Format (.rtf), OpenDocument (.odt), .docx

Tabular data:

Comma-separated values (.csv)
Tab-delimited (.tab)
Maybe: OpenDocument Spreadsheet (.ods), .xlsx

Structured data:

JSON, XML
NetCDF, HDF5

Images:

PNG, JPG

Format notes

PDF considerations:

Originally proprietary (Adobe), now an open ISO standard (ISO 32000) and widely used
Use PDF/A for archival purposes
Great for fixed documentation
Difficult to edit or extract data from

Spreadsheet files:

Colorful formatting looks nice but causes problems
Don’t store important info in formatting alone
Rule of thumb: treat .xlsx and .ods files as not reliably machine-readable
Use .csv for data exchange instead
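Information that a spreadsheet would carry as cell color can be stored as an explicit column instead. The Python sketch below writes and re-reads a small CSV table with the standard csv module. The participant IDs, values, and column names are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Made-up example rows: the "excluded" column replaces, say, red cell shading.
rows = [
    {"participant": "sub-01", "reaction_time_ms": 512, "excluded": "no"},
    {"participant": "sub-02", "reaction_time_ms": 1890, "excluded": "yes"},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["participant", "reaction_time_ms", "excluded"]
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back: every value arrives as a string, and nothing is lost.
restored = list(csv.DictReader(io.StringIO(csv_text)))
print(restored[1]["excluded"])
```

CSV stores no data types, so document units and value coding (here: "yes"/"no") in the README or a codebook.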

ASCII: the gold standard

Why ASCII rocks:

One byte = one visible character
Readable by any text editor or software
Works in Excel, Word, browsers (size permitting)
Maximum compatibility across systems

ASCII characters only:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Modern alternative: UTF-8

Supports international characters: ü, €, ☺
ASCII files are automatically valid UTF-8
Best of both worlds!
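The relationship between ASCII and UTF-8 can be verified in a few lines. This Python sketch is a small demonstration with example strings chosen for illustration; the helper name utf8_length is hypothetical.

```python
def utf8_length(text):
    """Number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "Temperature_RawData"

# ASCII text: one byte per character, and the same bytes are valid UTF-8.
ascii_bytes = ascii_text.encode("ascii")
print(len(ascii_text), len(ascii_bytes))
print(ascii_bytes.decode("utf-8") == ascii_text)

# Non-ASCII characters need more than one byte each in UTF-8.
for char in ["A", "ü", "€", "☺"]:
    print(char, char.isascii(), utf8_length(char))
```

When reading or writing text files in code, state the encoding explicitly (e.g., encoding="utf-8") instead of relying on the system default.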

4 Exercises

Exercise 1: Project template comparison

Task: Compare the following project templates and discuss advantages and disadvantages of each approach.

Turing Way template

.
├── LICENSE
├── README.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── data
│   ├── processed      # Final, canonical datasets
│   └── raw            # Original, immutable data
├── docs               # Sphinx documentation
├── models             # Trained models, predictions, summaries  
├── notebooks          # Jupyter notebooks (numbered)
├── reports            # Generated analysis (HTML, PDF, LaTeX)
│   └── figures        # Generated graphics and figures
├── project_management # Meeting notes, planning resources
└── src                # Source code
    ├── data           # Scripts to download/generate data
    ├── models         # Scripts to train models
    └── visualisation  # Scripts for visualizations

Repository Structure Template by The Turing Way. Used under the CC-BY 4.0 license. Reused without any modifications.

Heidi Seibold template

.
├── README.md
├── analysis            # All things data analysis
│   └── src             # Functions and source files
├── comm
│   ├── internal_comm   # Internal communication, meeting notes
│   └── journal_comm    # Communication with journal, peer review
├── data
│   ├── data_clean      # Clean version of data
│   └── data_raw        # Raw data (don't touch)
├── dissemination
│   ├── manuscripts
│   ├── posters
│   └── presentations
├── documentation       # Data management plan, etc.
└── misc                # Miscellaneous files

Research Project Template by Heidi Seibold. No license specified. Reused without modifications.

`analysistemplates` template

.
├── 01_Data
│   ├── 01_Raw
│   └── 02_Clean
├── 02_Analysis
│   ├── 01_Scripts
│   ├── 02_Results  
│   ├── 03_Figures
│   └── 04_Tables
├── 03_Manuscript
│   ├── 01_Text
│   └── 02_Final_figures
├── 04_Presentation
├── 05_Misc
├── 06_Analysis_for_publication    # Optional
├── README.md
├── .gitignore                     # Optional
└── renv                           # Optional

analysis template packages by Jonas Hagenbeck. Used under the MIT License. Reused without any modifications.

WORCS template

File/Folder	Description	Usage
_pkgdown.yml	YAML for package website	do not edit
DESCRIPTION	R-package DESCRIPTION	do not edit
LICENSE.md	Project license	do not edit
README.md	Read this file to get started!	do not edit
README.Rmd	R Markdown source for README.md	human editable
docs/	Package website	machine-written
paper/	WORCS paper source files	human editable
R/	R-package source code	human editable
vignettes/	R-package vignettes	human editable

WORCS project structure by Van Lissa et al. (2021). Used under the GNU General Public License. No changes were made.