Photo credit: Stephen M. Scott

Blog

All entries categorized “Stata”

beckieball, or selecting on skill

Tuesday, Jan. 6th, 2015 2:35p.m.

Over the weekend, jeremy posted about beckieball, a "new sport sweeping the country." The purpose was to show how selecting on a set of characteristics changes the correlation between those characteristics among the people selected. This, as commenter Stuart Buck pointed out, is an example of Berkson's Paradox, and it relates to jeremy's post about height and the NBA.

Although he left several other exercises to the reader, I thought I would do a simpler one: recreate the code that he used to make his example. I did this a) because it was a semi-useful way to shake off the cobwebs of eggnog and yuletide, and b) because I think that it will come in handy for teaching someday.
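The selection effect is easy to see in a few lines of Stata. What follows is only a minimal sketch, not jeremy's code: it assumes that height and skill are independent standard-normal draws and that players are selected when the sum of the two clears an arbitrary cutoff of 2. The variable names, the seed, and the cutoff are all illustrative choices.

```stata
* Simulate two independent traits for a hypothetical population
clear
set seed 20150106
set obs 10000
generate height = rnormal()
generate skill  = rnormal()

* Assumed selection rule: only a high combined score makes the league
generate byte selected = (height + skill) > 2

* Full population: the correlation should be close to zero
correlate height skill

* Selected players only: the correlation turns negative
correlate height skill if selected
```

Because anyone who is selected despite being short must be unusually skilled (and vice versa), the two traits trade off against each other within the selected group even though they are unrelated in the population.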

tags: Stata, statistics categories: Programming & Statistics

Basic Tips for Writing Statistical Scripts

Sunday, July 17th, 2011 6:14p.m.

While writing scripts is one of the most important skills for reproducible quantitative sociology, the typical convention is to pick up those skills from more experienced colleagues in graduate school or at the workplace. Below are a few tips that I have learned from others, picked up on my own, or otherwise accumulated in my arsenal of tricks. There are great resources out there, but I thought it would be helpful to pass along the tips that I think are the most important.

tags: Stata, tips-n-tricks, workflow category: Programming

Nesting Stata Macros, or Hacking a Hash Map

Monday, June 6th, 2011 6:37p.m.

Programming in Stata is relatively straightforward, partly because the programming syntax is both powerful and simple. There are, however, a few minor annoyances in Stata's language, including the use of the backtick and apostrophe to delimit local macros (e.g., `` `localname' ``). Among these shortcomings, I would argue that the lack of anything like a list in Stata's language is one of the largest.

In most languages, you can store a list of items and refer to an item in the list by some sort of index. This is particularly helpful for iterating over the same step multiple times. Lists generally come in two flavors: lists in which you refer to an item by its position, and lists in which you refer to an item by a keyword (called hash maps in computer science lingo). Stata's matrices can be used for the first, though doing so becomes complicated if you want to store anything besides numbers.
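To make the keyword flavor concrete, here is a minimal sketch of how nested local macros can imitate a keyword lookup. The macro names (`color_apple`, `color_banana`) and the keys are hypothetical, and this is one plausible way to do it rather than necessarily the approach taken in the full post.

```stata
* Store each value in a local macro whose name embeds the key
local color_apple  "red"
local color_banana "yellow"

* Look values up by key: the inner macro (key) expands first,
* which builds the name of the outer macro to expand
foreach key in apple banana {
    display "`key' -> `color_`key''"
}
```

The trick relies on Stata expanding nested macro references from the inside out, so the key is substituted into the macro name before that name is itself expanded.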

tags: data-management, macros, Stata, tips-n-tricks category: Programming

Importing Text Files with Variable Names to Stata

Friday, July 23rd, 2010 1:17p.m.

I have come across a problem several times that has been relatively frustrating to deal with. I have data downloaded from a site (specifically the Census, which is why this comes up consistently) in which the first two lines of the data contain the variable name and the variable description, respectively. This is incredibly useful for documenting data. Rather than having to figure out what variable `pct001001` means, I have the description of the variable right there.

The problem with data in this format is that Stata imports the variables as string variables, with the first observation being the variable description. I could pull the first two lines out of the original dataset, transpose the rows and columns, save them in a separate text file, and then import the variable names and descriptions. However, managing two files makes it more likely that one gets lost, that I forget to send one of the files to a colleague working on the paper, or that any number of other problems arise from separating the two. Having one single file would be far superior, and that is what the code below is designed to accommodate.

Data available from the U.S. Census comes in the following format (data is clipped):

tags: data-management, Stata, strings category: Programming

Matching Substrings Entirely Within Stata

Sunday, May 2nd, 2010 7:59p.m.

At Orgtheory, Fabio asked about how to identify substrings within text fields in Stata. Although this is a seemingly simple problem, there is one big obstacle, as Gabriel Rossman points out: Stata string fields can only hold 244 characters of text. Since Fabio wants to use this field to analyze scientific abstracts, 244 characters is obviously insufficient.

Gabriel Rossman has posted a solution he has called `grepmerge` that uses the Unix program `grep` to search for strings in files. This is a great solution, but it comes with one large caveat: it cannot be used in a native Windows environment. This is because the `grep` command is only native to Unix-like systems (which include Linux and Apple computers). Therefore, I set out to find a solution that was a) platform-independent and b) internal to Stata (if possible).

Below is the solution that I developed. The solution, it turns out, is not to rely on Stata's string variables or string functions (both can only handle 244 characters), but instead to rely on Stata's local macros ("macros" are what other programming languages call "variables"; however, this would be confusing given that Stata also has variables, thus Stata calls them "macros"). The second key comes from the extended functions of Stata's macros, which supply much of Stata's programming functionality. There is no extended function that searches strings directly in the way that regexm() or strpos() do; however, there is an extended function to substitute within strings that will also provide a count of the number of substitutions made. Since all we really care about is the number of times a string would be substituted, if we know that the count of substitutions is greater than zero, we have the information that we need.
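As a rough illustration of the counting trick, the sketch below uses the `subinstr` extended macro function with its `count()` option. The sample text, the search term, and the macro names (`text`, `stripped`, `hits`) are made up for the example; the full post's code may be organized differently.

```stata
* A hypothetical piece of text held in a local macro, not a string variable
local text "networks shape careers, and networks shape ideas"

* Replace every occurrence of the search term with nothing and
* record how many replacements were made in the local macro hits
local stripped : subinstr local text "network" "", all count(local hits)

* A count above zero means the substring is present
if `hits' > 0 {
    display "substring found `hits' time(s)"
}
```

The substituted result in `stripped` is thrown away here; only the count matters for deciding whether the text contains the term.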

tags: gabriel-rossman, macros, orgtheory, Stata, strings category: Programming

Front Page

• Information about the purpose and topics of this blog can be found here.

Miscellany

• The views presented here are solely and entirely my own; they do not represent those of my colleagues, employer, or any funding agencies which may support me.
• The writing on this blog is covered by a Creative Commons License (described here). Feel free to distribute or re-post with a link to the original content provided that it is freely available to others.