Overview
Teaching: 10 min
Exercises: 0 min
Questions
- How should we organize files in a research project?
Objectives
- Get an overview of how to organize research projects
A project directory can look something like this:
project_name/
├── README.md             # overview of the project
├── data/                 # data files used in the project
│   ├── README.md         # describes where data came from
│   └── sub-folder/       # may contain subdirectories
├── processed_data/       # intermediate files from the analysis
├── manuscript/           # manuscript describing the results
├── results/              # results of the analysis (data, tables, figures)
├── src/                  # contains all code in the project
│   ├── LICENSE           # license for your code
│   ├── requirements.txt  # software requirements and dependencies
│   └── ...
└── doc/                  # documentation for your project
    ├── index.rst
    └── ...
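As a minimal sketch, the layout above could be scaffolded with a short Python script. The function name `create_project` and the `LAYOUT` mapping are hypothetical names chosen for illustration; only the directory and file names come from the tree above, and the elided `...` entries are not filled in.

```python
from pathlib import Path

# Directory and file names copied from the tree above.
# The mapping itself is an illustrative assumption, not part of the lesson.
LAYOUT = {
    "": ["README.md"],
    "data": ["README.md"],
    "processed_data": [],
    "manuscript": [],
    "results": [],
    "src": ["LICENSE", "requirements.txt"],
    "doc": ["index.rst"],
}


def create_project(root):
    """Create the project skeleton under `root`, keeping any existing files."""
    root = Path(root)
    for subdir, files in LAYOUT.items():
        directory = root / subdir
        directory.mkdir(parents=True, exist_ok=True)
        for name in files:
            # touch() creates an empty file and leaves existing content alone
            (directory / name).touch(exist_ok=True)
    return root


if __name__ == "__main__":
    create_project("project_name")
```

Running the script twice is harmless, since existing directories and files are left untouched.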
src/ or source/ directory
data/
.gitignore
processed_data/
$ git tag -a thesis-submitted -m "this is the submitted version of my thesis"
Discussion: How do you collaborate on writing academic papers?
- Are you using version control?
- How do you handle collaborative issues?
- How would you like it to work if you could decide?
Word count - an example project
Let’s look at an example project which follows the project structure guidelines given above.
- Since we’ll continue working with this repo, import it to your GitHub namespace by clicking “Use this template”, which generates a fresh repository from the template.
This project counts the frequency distribution of words in a given text, plots the results, and tests Zipf’s law. It has subdirectories for raw data, source files, documentation, processed data, and results, along with README and LICENSE files.
- What are the requirements.txt, Dockerfile, and Snakefile files for?
- Do you think this project is reproducible?
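To make the task of the example project concrete, here is a rough sketch of counting word frequencies and checking them against Zipf’s law, which predicts that the count of the word at rank r is roughly the top count divided by r. This is not the code from the example repository; the function names `word_frequencies` and `zipf_ratios` are hypothetical, and the tokenization is deliberately simplistic.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    # Simplistic tokenization (an assumption): lowercase runs of ASCII letters
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def zipf_ratios(frequencies):
    """Return count * rank / top_count for each rank.

    Under Zipf's law these ratios stay close to 1.0.
    """
    if not frequencies:
        return []
    top_count = frequencies[0][1]
    return [
        count * rank / top_count
        for rank, (_, count) in enumerate(frequencies, start=1)
    ]


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    freqs = word_frequencies(sample)
    for (word, count), ratio in zip(freqs, zipf_ratios(freqs)):
        print(f"{word:>5} {count:2d} {ratio:.2f}")
```

A text this short says nothing about Zipf’s law; the ratios only become meaningful on a long text such as a whole book.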
Key Points
An organized project directory structure can help with reproducibility.