archives for april 2011

In the last step, we downloaded all of our data and deposited into directories that store this source data, backed it up, and write-protected the files. Now that we have done all of that, it is time to start working with the data! There is only one problem: almost inevitably, the data do not come neat, tidy, and ready to use. Often, the data contain major problems and need to be constructed in order to be usable. In this installment, I will write about managing files for cleaning, constructing and storing datasets.

There are several varieties of how the data are not completely usable. For many items in the ...