After the data are collected, a researcher's responsibility to be transparent and to document actions and decisions does not stop. After collection, data need to be stored, processed, and analyzed using statistics. Together, this is referred to as data management.

Once the data are collected, they need to be recorded. Measurements are stored in a computer file, usually in a data matrix where columns represent variables and rows represent participants. Such a data file is useless if it's unclear what the recorded data represent. If we have a column representing sex, for example, but we don't know whether a 0 means female or male, the information is useless.

Information about what the variables mean, or metadata, is stored in a code book. The code book specifies what property each variable measures, what the values mean, what the range of possible values is, and what values are used to denote that a participant did not supply the relevant information, referred to as missing values. Suppose we collect responses to ten items forming a depression questionnaire with three answer options. To correctly interpret these scores, we need to know that for each item the minimum possible value is 1, the maximum value is 3, and a missing response is denoted with the value 9. Without this information, we would not realize that something went wrong if we find a score of 5. We could have made, for example, an error entering the raw data into the computer.

Because data entry errors are always made, it's extremely important to always save the original data. By the original data, I mean the paper questionnaires filled out by participants, or the video material used for observational coding. Without the original data, we cannot check whether the data were entered into the file correctly. When data are entered manually, it's always a good idea to let someone else enter a random selection of the data again and check for consistency.
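The two checks described above can be sketched in code. This is a minimal, hypothetical illustration, not part of the original material: it assumes depression items scored 1 to 3 with 9 as the missing-value code, and made-up responses containing one entry error.

```python
# Code book entry for the depression items: valid range and missing code
# (hypothetical values matching the example in the text).
CODEBOOK = {"range": (1, 3), "missing": 9}

def check_scores(scores, codebook):
    """Return (position, value) pairs that are neither valid nor missing."""
    lo, hi = codebook["range"]
    missing = codebook["missing"]
    return [(i, v) for i, v in enumerate(scores)
            if v != missing and not (lo <= v <= hi)]

# One participant's responses to the ten items; the 5 is a data entry error.
responses = [1, 2, 3, 5, 9, 2, 1, 3, 2, 2]
print(check_scores(responses, CODEBOOK))  # → [(3, 5)]

# Double-entry check: a second person re-enters a random selection of cells
# from the paper questionnaires, and we compare against the first entry.
first_entry  = {7: 2, 21: 3, 34: 1}   # participant id -> score, first pass
second_entry = {7: 2, 21: 1, 34: 1}   # same cells, entered again
mismatches = {k for k in first_entry if first_entry[k] != second_entry[k]}
print(mismatches)  # → {21}: this record needs to be checked against paper
```

Without the code book, the 5 would just be another number; with it, the script flags the cell so it can be verified against the original paper questionnaire.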
If it turns out the original entry is inconsistent, then all the data need to be entered again, but more carefully this time, of course. The original data and instrument information are also necessary to check the code book. Sometimes a code book can be confusing or seem to be wrong. For example, when responses to an item are unexpectedly low, this could be a valid pattern, but it could also be an error in the code book. It's possible, for example, that the code book wrongly indicates an item is positively worded when in fact it's phrased negatively and therefore should be recoded.

The original data file is also very important and should be stored separately. By the original data file, I mean the file that contains the raw data as they were entered originally, before any manipulation was performed. Data manipulation refers to computations such as recoding and computing aggregate scores, like a sum score for depression. Another example is calculating age from date of birth. Without the original data file, we cannot check whether we made any errors in manipulating the data.

Suppose that for a negatively worded depression item, I change a score of 1 to a score of 3, and 3 to 1. And then I accidentally recode again, changing 3s back into 1s and 1s into 3s. I end up with negatively scored items without being aware of it, thinking they're scored positively. If I find unexpected results, I can check whether I made a recoding error simply by comparing against the original data file. That's why it's important.

Not only should we keep the original data and data file, we should also record any processing, selection, or computations we perform. Otherwise, we might not be able to reproduce the processed data that are used to formulate the final conclusions. For example, when I select a subset of my sample, say only people who completed the depression questionnaire within a month, then my results might be different from results obtained from the entire sample.
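The double-recoding mistake above can be made concrete with a short sketch. This is an illustrative example, assuming items scored 1 to 3 where reverse-coding maps 1 to 3 and 3 to 1; the data are made up.

```python
def reverse_code(score, missing=9):
    """Reverse-code a 1-3 item: 1 -> 3, 2 -> 2, 3 -> 1; missing stays missing."""
    return score if score == missing else 4 - score

original = [1, 3, 2, 3, 1]                         # raw scores, kept untouched
recoded = [reverse_code(s) for s in original]      # intended reverse-coding
double_recoded = [reverse_code(s) for s in recoded]  # accidental second recode

# The second recode silently undoes the first, so the working file again
# holds negatively scored items while I believe they are scored positively.
print(double_recoded == original)  # → True
```

Because reverse-coding is its own inverse, nothing in the working file looks obviously wrong; only a comparison against the stored original data file reveals that the items were recoded twice.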
If I don't record my selection criteria, then a year from now I will probably have forgotten the exact criteria and will not be able to reproduce my own results. The processing of data, for example recoding and computing sum scores, the selection of data, and the statistical analyses are all generally recorded in a syntax file. A syntax file is like a simple programming script that can be used to reproduce all the computations and statistical analyses at the push of a button. This is very useful for checking and replicating results, not just for other researchers, but also for the original researcher.
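A syntax file can be as simple as the sketch below. Everything here is hypothetical: the participant records, the 30-day completion criterion, and the five-item sum score are invented for illustration, but the point is that every selection and computation is written down and rerunnable.

```python
MISSING = 9  # missing-value code from the code book

# Raw data as entered (in practice, read from the original data file).
participants = [
    {"id": 1, "days_to_complete": 12, "items": [1, 2, 3, 2, 1]},
    {"id": 2, "days_to_complete": 45, "items": [3, 3, 2, 3, 3]},
    {"id": 3, "days_to_complete": 8,  "items": [2, 9, 1, 2, 2]},
]

# Step 1: selection criterion, recorded explicitly instead of remembered.
selected = [p for p in participants if p["days_to_complete"] <= 30]

# Step 2: compute a sum score; participants with a missing item get None.
def sum_score(items):
    return None if MISSING in items else sum(items)

results = {p["id"]: sum_score(p["items"]) for p in selected}
print(results)  # → {1: 9, 3: None}
```

Rerunning this script reproduces the selection and the sum scores exactly, so a year from now the criteria are still in the file rather than in anyone's memory.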