Organize, clean, and back up your collected or created data.
What to do:
- Clean your dataset.
- Make sure you have documented a description of each variable
- Ensure that you have documented all your decisions related to file naming and version control
- Aim to create a dataset you would be happy to find if someone else had shared it
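One way to keep variable descriptions and cleaning checks together is a machine-readable data dictionary that also drives validation. The Python sketch below is a minimal illustration built on assumptions: the variable names, descriptions, types, ranges, and the helper names `DATA_DICTIONARY`, `write_data_dictionary` and `validate` are all hypothetical, and the data is assumed to be a CSV file with a header row and no line breaks inside fields.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable, with a plain-language
# description, the expected type and, where it makes sense, an allowed range.
DATA_DICTIONARY = {
    "participant_id": {"description": "Study-assigned participant code", "type": str},
    "age_years": {"description": "Age at enrolment, in whole years", "type": int, "min": 18, "max": 110},
    "weight_kg": {"description": "Body weight in kilograms", "type": float, "min": 20.0, "max": 300.0},
}


def write_data_dictionary(stream):
    """Write the data dictionary as CSV so it can be shared alongside the dataset."""
    writer = csv.writer(stream)
    writer.writerow(["variable", "description", "type", "min", "max"])
    for name, spec in DATA_DICTIONARY.items():
        writer.writerow([name, spec["description"], spec["type"].__name__,
                         spec.get("min", ""), spec.get("max", "")])


def validate(reader):
    """Check a csv.DictReader against DATA_DICTIONARY.

    Returns a list of (line_number, column, problem) tuples; an empty list
    means no problems were found. Line 1 is the header row.
    """
    problems = []
    header = reader.fieldnames or []
    # Every column in the data should be documented, and vice versa.
    for column in header:
        if column not in DATA_DICTIONARY:
            problems.append((1, column, "not documented in data dictionary"))
    for column in DATA_DICTIONARY:
        if column not in header:
            problems.append((1, column, "documented but absent from data"))

    # Assumes no field contains embedded newlines, so each record is one line.
    for line_number, row in enumerate(reader, start=2):
        for column, spec in DATA_DICTIONARY.items():
            if column not in header:
                continue
            raw = (row.get(column) or "").strip()
            if raw == "":
                problems.append((line_number, column, "missing value"))
                continue
            try:
                value = spec["type"](raw)
            except ValueError:
                problems.append((line_number, column, f"not a valid {spec['type'].__name__}"))
                continue
            if "min" in spec and value < spec["min"]:
                problems.append((line_number, column, "below allowed range"))
            if "max" in spec and value > spec["max"]:
                problems.append((line_number, column, "above allowed range"))
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "participant_id,age_years,weight_kg,notes\n"
        "P001,34,70.5,\n"
        "P002,,82.0,\n"
        "P003,250,abc,\n"
    )
    for problem in validate(csv.DictReader(sample)):
        print(problem)
```

On the sample above, the check reports the undocumented `notes` column, the missing age on line 3, and the out-of-range age and non-numeric weight on line 4. Keeping the dictionary in one place means the same file documents the variables and records the rules used during cleaning.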
Why do it:
- So that you and other researchers can successfully analyze the dataset(s) later.
How to do it:
- Preserve a copy of the raw data before you start cleaning and validating it
- Clean and validate the data. Tools such as OpenRefine can help with this
- Anonymize the data if and where necessary
- Ensure the variable names are clear. DDI (Data Documentation Initiative) is one international standard for data documentation; many others exist
- Document your version control, for example how file versions are named and what changed between them
- Maintain a reliable and consistent backup strategy. Consider the 3-2-1 rule:
  - Have at least 3 copies of your data
  - stored on 2 different media
  - with 1 backup kept off site
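Preserving the raw data and keeping several backup copies both depend on knowing that a copy has not changed. A checksum makes this checkable. The Python sketch below assumes the raw data is a single file; the function names `sha256_of`, `preserve_raw` and `verify_copies`, and the archive layout, are hypothetical. It copies the raw file into an archive folder, removes write permission from the copy, records a SHA-256 checksum next to it, and later reports any backup copy that is missing or no longer matches.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_raw(raw_file, archive_dir):
    """Copy a raw data file into an archive folder, make the copy read-only,
    and record its checksum in a sidecar file. Returns the checksum.

    Hypothetical helper: assumes the archive copy does not already exist.
    """
    raw_file = Path(raw_file)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / raw_file.name
    shutil.copy2(raw_file, target)  # copy2 also keeps file timestamps
    # Leave only read permission so the raw copy is not edited by accident.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    checksum = sha256_of(target)
    # "<checksum>  <filename>" is the line format used by sha256sum.
    sidecar = archive_dir / (raw_file.name + ".sha256")
    sidecar.write_text(f"{checksum}  {raw_file.name}\n")
    return checksum


def verify_copies(expected_checksum, copies):
    """Return the backup copies that are missing or whose contents differ
    from the expected checksum."""
    bad = []
    for copy in copies:
        copy = Path(copy)
        if not copy.is_file() or sha256_of(copy) != expected_checksum:
            bad.append(copy)
    return bad
```

Running `verify_copies` against each of your three copies on a regular schedule is one way to confirm the backups are still usable; a checksum cannot tell you which copy is correct, only that they disagree, which is why the checksum recorded at preservation time is worth keeping with the raw data. On systems with GNU coreutils, `sha256sum -c` run in the archive folder should also be able to read the sidecar file.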
Things to consider:
- How will you manage any ethical or privacy issues before analyzing the data?
- How will you securely store your cleaned (and potentially large) data before and after analysis?
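For the privacy question, and for the earlier step of anonymizing data where necessary, one common technique is to replace direct identifiers with keyed codes and drop fields that are not needed. The Python sketch below uses HMAC-SHA256 from the standard library. It is pseudonymization rather than full anonymization: anyone holding the key can re-link codes to known identifiers, and the remaining fields (age, location, rare combinations of answers) may still identify someone, so whether it is sufficient depends on your ethics approval and applicable privacy law. The field names, the 12-character code length, and the lower-casing of identifiers before hashing are all assumptions for illustration, as are the function names.

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a random 32-byte secret key. Store it separately from the data,
    with restricted access; losing it means codes cannot be regenerated."""
    return secrets.token_bytes(32)


def pseudonymize(identifier, key, length=12):
    """Replace a direct identifier with a stable keyed code (truncated HMAC-SHA256).

    The same identifier and key always give the same code, so records can still
    be linked across files. Without the key, a code cannot simply be recomputed
    from a guessed identifier. Lower-casing and stripping is an assumption that
    suits identifiers such as e-mail addresses.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:length]


def pseudonymize_records(records, id_field, drop_fields, key):
    """Return new records with id_field replaced by a code and drop_fields removed.
    The input records are left unchanged."""
    cleaned = []
    for record in records:
        new_record = {k: v for k, v in record.items() if k not in drop_fields}
        new_record[id_field] = pseudonymize(record[id_field], key)
        cleaned.append(new_record)
    return cleaned
```

Truncating the code to 12 hexadecimal characters keeps it readable while leaving collisions very unlikely for typical study sizes; keep the full digest if your dataset is very large.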