Basic Tips for Writing Statistical Scripts

While writing scripts is one of the most important skills for reproducible quantitative sociology, most people pick it up informally from more experienced colleagues in graduate school or at the workplace. Below are a few tips that I have learned from others or picked up on my own. There are great resources out there, but I thought it would be helpful to pass along the tips that I consider most important.

Include a brief header in all scripts indicating the name of the file, the creator, and a brief description of what the file does. The header that I use for my Stata .do files (which is by no means necessarily correct or standard) is:
```
// File name: <nameOfDoFile>.do
// File location: /ROOT/Data/MyData/DatasetConstruction
// Created by: Mike Bader
// Created on: 01 Jan 2012
// Description: This file makes the dataset myData.dta
```
The file location should be the full path to the script, starting from the ROOT directory. That way you will be able to find the file later.

One common problem among inexperienced and experienced users of statistical software alike is that they get absorbed in the immediate tasks and forget the larger picture. Issues arising from syntax, improper data structures, variable name confusion, etc. take precedence over correctly analyzing the data. To overcome this problem, I think that it is good practice (and I am forcing myself to do this more and more) to write out the logic of what you want the script to do before writing any code. Write this to-do list in the comments, then write the code that corresponds to each item below its heading (and include subheadings for minor steps). This not only keeps you on a logical path toward your analytical goal, but also comments your code automatically.
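
A minimal sketch of what such an outline might look like once the code is filled in. It assumes Stata's built-in auto dataset as a stand-in for your own data; the analysis and the variable weight1000 are hypothetical and only there to illustrate the layout:

```
// Goal: describe how car weight relates to fuel economy (illustrative only)

// 1. Load the data
sysuse auto, clear

// 2. Prepare the variables
// 2a. Convert weight from pounds to thousands of pounds
generate weight1000 = weight / 1000
label variable weight1000 "Weight (1000s of lbs.)"

// 3. Describe the variables used in the analysis
summarize mpg weight1000

// 4. Estimate the model
regress mpg weight1000
```

The numbered comments were the to-do list; the commands were added under each one afterward, so the outline doubles as the documentation.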

Name scripts with verbs to indicate what they do. I mentioned this when discussing the construction of dataset files: I try to always name the Stata script that builds a dataset make.do. For analysis, this could be analyze.do. Doing this lets you quickly identify what each file does, and if you include a header, you can open the script to get more detail.

Sometimes it is very helpful to subdivide tasks across multiple scripts; in fact, doing so is a principal tenet in some programming worlds. For statistical scripts there are pros and cons to doing this that don't apply in other kinds of programming, which I won't get into here. But if you do create a script that depends on other scripts, include a note at the top of the script so that you know what is required in order to run it.
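
As a rough sketch, a dependency note can sit in the header, optionally backed by a check that stops the script early. The "Requires" line is my own addition to the header shown earlier, not a standard field, and the sketch assumes that make.do creates myData.dta in the current working directory:

```
// File name: analyze.do
// Requires: make.do (must be run first; assumed to create myData.dta)
// Description: This file analyzes the dataset myData.dta

// Stop with a readable message if the required dataset is missing
capture confirm file "myData.dta"
if _rc != 0 {
    display as error "myData.dta not found: run make.do before analyze.do"
    exit 601
}

// Safe to load the data once the check has passed
use "myData.dta", clear
```

The check is optional; the note in the header is the part that matters when you come back to the script months later.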

I am sure that I will think of more, but I hope that these are helpful hints for those "just starting out in the business," as they say.

mike bader
