Posts tagged “workflow”

Although writing scripts is one of the most important skills for reproducible quantitative sociology, most of us pick it up informally from more experienced colleagues in graduate school or at work. Below are a few tips that I have learned from others, picked up on my own, or otherwise added to my arsenal of tricks. There are great resources out there, but I thought it would be helpful to pass along what I consider the most important ones.

  1. Include a brief header in all scripts indicating the name of the file, the creator, and a brief ...

In the last step, we downloaded all of our data, deposited it into directories that store the source data, backed it up, and write-protected the files. Now that we have done all of that, it is time to start working with the data! There is only one problem: almost inevitably, the data do not come neat, tidy, and ready to use. Often the data contain major problems and must be cleaned and constructed before they are usable. In this installment, I write about managing the files used to clean, construct, and store datasets.
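Write-protecting source data can be scripted so that it is applied consistently every time new files arrive. The snippet below is a minimal sketch in Python, assuming the source files live under a single directory; the function name `write_protect` and the example directory `data/source` are hypothetical, not part of any particular workflow described here. It clears the write bits on every file beneath that directory and leaves the directories themselves untouched.

```python
import stat
from pathlib import Path

def write_protect(directory):
    """Remove write permission from every file under `directory`, recursively.

    Returns the list of files whose permissions were changed.
    (Hypothetical helper, shown for illustration only.)
    """
    protected = []
    for path in Path(directory).rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            # Clear the owner, group, and other write bits; keep all other bits.
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            protected.append(path)
    return protected

# Example (hypothetical directory name):
# write_protect("data/source")
```

Leaving directories writable is a deliberate choice in this sketch: new raw files can still be added, but existing ones cannot be accidentally overwritten by a cleaning script.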

Data can fall short of usable in several ways. For many items in the data ...

After establishing where my root directory resides, it is time to actually get to work. As with any endeavor, success begins with laying a solid foundation, and in academic work that foundation is our data.
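One common way to make a root directory pay off is to define it once and build every other path relative to it, so that moving a project to a new machine means changing a single line. The snippet below is a minimal Python sketch of that idea, not the specific setup described in these posts; the environment variable `PROJECT_ROOT`, the fallback folder `my_project`, and the helper `project_path` are all hypothetical names.

```python
import os
from pathlib import Path

# Hypothetical root: read from the PROJECT_ROOT environment variable if set,
# otherwise fall back to a folder in the user's home directory.
ROOT = Path(os.environ.get("PROJECT_ROOT", Path.home() / "my_project"))

def project_path(*parts):
    """Build a path inside the project root from its relative parts."""
    return ROOT.joinpath(*parts)

# Every script refers to locations relative to ROOT, never by absolute path.
SOURCE_DATA = project_path("data", "source")
```

Because no script hard-codes an absolute path, the same code runs unchanged on any computer once `ROOT` points at the right place.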

The most fundamental skill for academic success is asking good questions and acquiring data to answer them. Yet in quantitative research, that skill is useless without the ability to manipulate data into formats capable of answering those questions. Data cleaning, construction, and manipulation constitute well over half of my work on major quantitative projects.

It should go without saying that there are many different types of data. I ...

In my last post, I explained the value of a directory structure: consistent file management structures a disciplined workflow that increases productivity. I fully grasped how much this matters only after graduate school, when I started a new job.

When I started my new job, I needed to move my files to a new computer. In transferring them, I realized that the work that followed my well-defined workflow transferred easily, while the work that didn't follow it did not.

The contrast between the ease with which I resumed the well-structured work and the difficulty of getting up to speed on the disorganized pieces ...

When I say that one of the most important things I did in graduate school was set up a directory structure and workflow for my files, I am not kidding. Reading theory, learning statistical methods, and writing literature reviews were all important. However, just as important, though not nearly as sexy, is setting up a file structure and working directory.

Trivial as it sounds, a well-designed directory structure not only provides a framework for files but also structures productive work.

Given how important it was to me, I will attempt to explain the directory structure that I developed. Let me begin by saying that I am not an ...