nesting stata macros, or hacking a hash map

June 6, 2011 in methods

Programming in Stata is relatively straightforward, partly because its syntax is both powerful and simple. There are, however, a few minor annoyances in Stata's language, including the use of the backtick and apostrophe to indicate local macros (i.e., `localname'). Among these shortcomings, I would argue that the lack of anything like a list in Stata's language is one of the largest.

In most languages, you can store a list of items and refer to an item in the list by some sort of index. This is particularly helpful for iterating over the same step multiple times. Lists generally come in two flavors: lists to which you ...

structuring work: data cleaning and construction, laying the foundation

April 16, 2011 in work & workflow

In the last step, we downloaded all of our data, deposited it into directories that store this source data, backed it up, and write-protected the files. Now that we have done all of that, it is time to start working with the data! There is only one problem: almost inevitably, the data do not come neat, tidy, and ready to use. Often, the data contain major problems and need to be constructed in order to be usable. In this installment, I will write about managing files for cleaning, constructing, and storing datasets.

There are several ways in which the data may not be completely usable. For many items in the ...

advice data-management research-process workflow

structuring work: data, the foundation of work

March 14, 2011 in work & workflow

After establishing where my root directory resides, it is time to actually get to work. As with any endeavor, success begins by laying a solid foundation, and with academic work that foundation is our data.

The most fundamental skill for academic success is asking good questions and acquiring data to answer those questions. Yet, in quantitative research, that skill is useless without the ability to manipulate data into formats that are capable of answering those good questions. Data cleaning, construction, and manipulation constitute well over half of my work on major quantitative projects.

It should go without saying, but there are many different types of data. I ...

advice data-management research-process workflow

structuring work: the root, where it all begins

February 11, 2011 in work & workflow

In my last post, I explained the value of a directory structure: consistent file management structures a disciplined workflow that increases productivity. The magnitude of its importance was a revelation that occurred largely after graduate school as the result of starting a new job.

When I moved to start my new job, I needed to move my files to my new computer. In transferring my files, I realized that the work that followed my well-defined workflow transferred easily, while the work that didn't follow the workflow did not.

The contrast between the ease with which I started the well-structured work and difficulty getting up to speed on disorganized pieces ...

advice data-management research-process workflow

structuring work

February 4, 2011 in work & workflow

When I say that one of the most important things that I did in graduate school was set up a directory structure and workflow for my files, I am not kidding. Reading theory, learning statistical methods, and writing literature reviews were all important. However, just as important -- though not nearly as sexy -- is setting up a file structure and working directory.

Despite how trivial it sounds, maintaining a well-designed directory structure not only provides a framework for files, it structures productive work.

Given how important it was for me, I will attempt to explain the directory structure that I developed. Let me begin by saying that I am ...

advice data-management research-process workflow

calculating simple power analyses

October 18, 2010 in methods

I am currently preparing a proposal for submission, and one piece of information that the agency suggests including is the power required to distinguish effects. This is obviously a perfectly reasonable piece of information to request; however, power calculations fall into that class of things that I know that I should know but I don't. It is one of those topics that every statistics book will tell you is important, but that each book either a) glosses over, or b) buries in such a deep background that it is impossible to follow what the authors are talking about. Additionally, power calculations are complicated enormously by the fact that sample designs can become very ...

research-design statistics

two presentations on neighborhood change

September 25, 2010 in metros & neighborhoods

This week I gave two presentations on my work exploring the consequences of neighborhood change for the evolution of contemporary metropolitan racial and ethnic segregation. The first, at the University of Pennsylvania Sociology Colloquium, focused slightly more on the substantive conclusions; the second, at the Quantitative Methods in the Social Science seminar series at Columbia University, focused more on the methodological components of the work.

I did not publish the slides for these talks because I will likely be giving the talk again (no spoilers!); however, feel free to contact me if you would like more information about them.

neighborhoods segregation

mapping moves

September 10, 2010 in miscellaneous

My friend and colleague, Danny Sheehan, was interviewed on WNYC's Brian Lehrer Show this week talking about a map he designed that tracked the flow of residential mobility among Brian Lehrer listeners. Among 1,600 entries, his was selected as one of the 15 featured, and he was one of two people interviewed live on air about their designs. You can see a video of his map here.

Since my research is about where people move, this is obviously of more than just passing interest to me, and Danny's visualization of moves is an incredibly helpful tool for detecting patterns of neighborhood change. I know this because Danny helped us with a project that ...

WNYC data-visualization residential-mobility

importing text files with variable names to stata

July 23, 2010 in metros & neighborhoods

I have come across a problem several times that has been relatively frustrating to deal with. I have data downloaded from a site (specifically the Census, which is why this comes up consistently) in which the first two lines contain the variable name and variable description, respectively. This is incredibly useful for documenting data. Instead of having to figure out what variable pct001001 means, I have the description of the variable right there.

The problem with data in this format is that Stata imports variables as string variables with the first observation being the variable description. I could pull the first two lines of the data ...

Stata data-management strings

investing in education for the long term

June 22, 2010 in miscellaneous

It is rare that I find myself in agreement with Stanley Fish. But in his most recent column, I think that he finds a truth so unalienable among teachers that it is impossible, as someone interested in teaching, to disagree with it. In his column, he discusses how disastrous a proposed Texas plan for higher education would be, if enacted, for the education of students.

No, it's not the Texas plan to teach elementary and high school students that Phyllis Schlafly is the second coming of George Washington. No, this plan involves the state's universities, particularly Texas A&M. Essentially, the plan wholeheartedly embraces the idea that ...

This-American-Life teaching