Programming in Stata is relatively straightforward, partly because its programming syntax is both powerful and simple. There are, however, a few minor annoyances in Stata's language, including the use of the backtick and apostrophe to indicate local macros (i.e., `localname'). Among these shortcomings, I would argue that the lack of anything like a list is one of the largest.

In most languages, you can store a list of items and refer to each item in the list by some sort of index. This is particularly helpful for repeating the same step over many items. Lists generally come in two flavors: lists whose items you refer to by their position, and lists whose items you refer to by a key (called hash maps, dictionaries, or associative arrays in computer science lingo). Stata's matrices can serve the first purpose, but they hold only numeric values, so using them becomes complicated if you want to store anything other than numbers.
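As a quick illustration of the first flavor, here is a minimal sketch of a Stata matrix used as a positional list. The matrix name vals and its values are arbitrary choices for the example, and because Stata matrices are numeric only, this trick works for numbers but not for strings.

```stata
* Store three numbers in a 1 x 3 row vector (the name vals is arbitrary)
matrix vals = (10, 20, 30)

* Retrieve an item by position: row 1, column 2 holds the second item
display vals[1, 2]

* colsof() reports how many items the "list" holds
display colsof(vals)
```

The first display shows 20 and the second shows 3.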

Although Stata's command language has no built-in hash maps, you can hack together something equivalent with what I will call "nested macros." The idea is that Stata works "inside-out" when it interprets macros. Let's start with a basic example to show what I mean:

. local a = 1
. local b1 = "test"
. di "`b`a''"
test

In this simple example, the local macro a is defined as the number 1. We then define a second local macro, b1, that holds the string test. The third line shows what I mean when I say that Stata interprets macros "inside-out." We ask Stata to display "`b`a''" (note that `a' refers to a local macro here). Stata interprets this in two steps:

  1. Because the local macro a is embedded inside the larger macro `b`a'', Stata first interprets `a' and replaces it with its value (i.e., 1).
  2. Stata now sees the command di "`b1'" and replaces `b1' with the value of the local macro b1 (i.e., test).

This is a rather esoteric example that serves a pedagogical purpose, but it is not particularly useful in day-to-day programming. More importantly, it doesn't do anything like what I promised, namely create something akin to a hash map.
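The same two-step expansion is, however, what lets an integer index pick items out of a set of numbered locals, which is about as close as the macro language gets to a positional list. A minimal sketch, where the macro names item1 to item3 and their values are purely hypothetical:

```stata
* Hypothetical "list": one numbered local macro per item
local item1 "apple"
local item2 "banana"
local item3 "cherry"
local nitems 3

forvalues i = 1/`nitems' {
    * The index i expands first (e.g., to 2); the resulting macro item2 then expands to its value
    display "Item `i' is `item`i''"
}
```

This prints "Item 1 is apple", "Item 2 is banana", and "Item 3 is cherry", one per line.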

Let's apply the same principle to something more useful. I will use an example that I come across frequently in my work. Often I have the same variable for different racial and ethnic subgroups (e.g., the median income for non-Latino blacks, whites, and Latinos). I want to complete some task for each of those variables without writing the same code over and over again. Let's say that I have the median income by race and I want to generate a new variable containing its natural log and then label that variable. I could write this out three separate times, or I can use a nested macro like I did above. Let's first look at what this would look like without nested macros:

. local races nhb nhw hsp 
. foreach race in `races' {
.   gen `race'mdhhinc_ln = ln(`race'mdhhinc)
.   }
. label var nhbmdhhinc_ln "Natural log of median hh income for non-Hispanic blacks"
. label var nhwmdhhinc_ln "Natural log of median hh income for non-Hispanic whites"
. label var hspmdhhinc_ln "Natural log of median hh income for Hispanics"

This code assumes that the dataset contains three variables, nhbmdhhinc, nhwmdhhinc, and hspmdhhinc, holding the median household incomes for blacks, whites, and Hispanics, respectively. The loop creates the logged variables, and then we label each new variable by hand. This final step is repetitive, prone to errors, and time-consuming. Let's use a nested macro to avoid the repetition, reduce the chance of errors, and save typing:

. local nhbName non-Hispanic black
. local nhwName non-Hispanic white
. local hspName Hispanic
. local races nhb nhw hsp 
. foreach race in `races' {
.   gen `race'mdhhinc_ln = ln(`race'mdhhinc)
.   label var `race'mdhhinc_ln "Natural log of median hh income for ``race'Name's"
.   }

In the second-to-last line, you can see that we now have a nested local macro. Stata first interprets the local macro race, which is set by the foreach loop. The interpreted value of race (i.e., nhb, nhw, or hsp) is then used to look up the appropriate name for that racial/ethnic group (e.g., `nhbName'). Now, you will notice that this version is actually one line longer than the original. With only one set of variables, the nested macro is no more efficient than writing the labels by hand, though it is potentially less error-prone. But now let's assume we want to create two sets of variables, one for median household income and one for median home value. Here the efficiency of nested macros starts to show (and we can use them for both the variables and the racial/ethnic groups):

. local mdhhincName median hh income
. local mdhmvalName median home value
. local varList mdhhinc mdhmval
.
. local nhbName non-Hispanic black
. local nhwName non-Hispanic white
. local hspName Hispanic
. local races nhb nhw hsp 
.
. foreach var in `varList' {
.   foreach race in `races' {
.       gen `race'`var'_ln = ln(`race'`var')
.       label var `race'`var'_ln "Natural log of ``var'Name' for ``race'Name's"
.       }
.   }

This code now creates and labels six variables efficiently and with far less room for typos. Labelling, while important for using the data, is a somewhat trivial application of this method, but it serves as a useful and practical example. You can also use nested macros to define appropriate outcomes (e.g., set the nested macro to the type of analysis to execute), to define the variables in a scale (e.g., set the nested macro to a list of items), and for a number of other applications.
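As an illustration of the scale case, the sketch below stores each scale's item list in a macro named after the scale and looks it up with a nested macro, much like the Name macros above. Everything here is hypothetical: the scale names dep and anx, the items q1 to q6, and the randomly generated toy data. The averaging uses egen's rowmean() function.

```stata
* Hypothetical toy data: 5 respondents answering items q1-q6 on a 1-5 scale
clear
set obs 5
set seed 12345
forvalues j = 1/6 {
    generate q`j' = 1 + floor(5*runiform())
}

* Each scale name maps to the list of items that make it up
local depItems q1 q2 q3
local anxItems q4 q5 q6
local scales dep anx

foreach scale in `scales' {
    * The scale name expands first, then the item list it points to
    egen `scale'_score = rowmean(``scale'Items')
    label variable `scale'_score "Mean of items: ``scale'Items'"
}
```

Adding a new scale then only requires one more Items macro and one more word in scales; the loop itself stays unchanged.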
