Photo credit: Stephen M. Scott


All entries categorized “Programming”

Custom Homework Class in LaTeX

Tuesday, Jan. 24th, 2017 11:34a.m.

I have been using the exam class by Philip Hirschhorn in LaTeX to write out homework and tests for my class, The Epidemiology of Everyday Life. It allows me to write out problems with the ease of mathematical notation in LaTeX, provide solutions that can be turned on and off with a single line of code, and easily tally points on various pages. If you haven't used it and teach a stats-based course, I would definitely recommend checking it out.

I ran into a problem, however, because I am a procrastinating perfectionist. I have wanted all of my teaching handouts to have a similar look. I created a custom syllabus class to do that, and I really liked it. I wanted the homework to have the same look. And what better way to procrastinate than to try to figure out the arcane rules of LaTeX!?

  tags: LaTeX, teaching category: Programming

workflow directory structure that plays nicely with LaTeX

Monday, Jan. 2nd, 2017 10:51a.m.

I really like using LaTeX for writing, both for articles and for teaching. I find that it eliminates many of the hassles of using Word. I started trying to figure out how to write my own packages this past year and ran into many problems. One of those problems was figuring out a way to keep the .sty packages in a directory that I could easily access and that suited my workflow, rather than the one that the LaTeX directory structure enforced.

The LaTeX search path includes two variables, TEXMFHOME and TEXMFLOCAL, that determine where to store .sty files so that LaTeX can find them. If you follow the excellent instructions here, you can use the command-line program kpsewhich, which ships with the TeX distribution, to find where those paths are (for example, kpsewhich -var-value TEXMFHOME). For me they were at /Users/<USERNAME>/Library/texmf and /Users/<USERNAME>/Library/texmf-local, respectively.

  tags: LaTeX, linux, texmf category: Programming

beckieball, or selecting on skill

Tuesday, Jan. 6th, 2015 2:35p.m.

Over the weekend, jeremy posted about beckieball, a "new sport sweeping the country." The purpose was to show how selecting on characteristics changes the correlation between those characteristics among the people selected. This, as commenter Stuart Buck pointed out, is an example of Berkson's Paradox, though it also relates to jeremy's post about height and the NBA.

Although he left several other exercises to the reader, I thought I would do a simpler one: recreate the code that he used to make his example. I did this a) because it was a semi-useful way to shake the cobwebs from egg nog and yuletides, and b) because I think that it will come in handy teaching someday.
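The selection effect described above can be sketched in a few lines. This is a Python illustration, not the Stata code from jeremy's post, and every detail is an assumption made for the example: two independent standard-normal traits (here called "height" and "skill"), and a selection rule that keeps only people whose combined score clears a cutoff.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, cutoff=1.5, seed=1):
    """Return (correlation in everyone, correlation in the selected, number selected)."""
    rng = random.Random(seed)
    # Hypothetical traits: independent draws, so they are uncorrelated by construction
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Hypothetical selection rule: only a high combined score gets you in
    selected = [(h, s) for h, s in people if h + s > cutoff]
    r_all = pearson(*zip(*people))
    r_selected = pearson(*zip(*selected))
    return r_all, r_selected, len(selected)


r_all, r_selected, n_selected = simulate()
print(f"everyone: r = {r_all:.2f}")
print(f"selected: r = {r_selected:.2f} (n = {n_selected})")
```

In the full population the two traits are essentially uncorrelated, while among the selected they are clearly negatively correlated: anyone who got in with little of one trait must have had a lot of the other.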

  tags: Stata, statistics categories: Programming & Statistics

Importing Text Files with Variable Names to R

Wednesday, Nov. 26th, 2014 10:47a.m.

I am attempting to learn R. This is either a great thing or a terrible, terrible mistake while on the tenure clock. But, all the cool kids are doing it -- so even though they might also jump off a bridge, I'm going to jump into R.

The hardest part so far is doing things that now come as second nature to me in Stata. Although R's tools are much better in the long run, learning what types of objects different functions return makes for a very steep learning curve.

A while back (all posts are a while back now), I wrote a post describing how to import data with variable labels in Stata. The idea was that I could keep the variable labels with the data in a text file so that I could always figure out what the variables were. It doesn't require an extra codebook or additional files that could be lost.

I have now replicated that script for R, and it is below:

  tags: data-management, R, stata-to-R category: Programming

Basic Tips for Writing Statistical Scripts

Sunday, July 17th, 2011 6:14p.m.

While writing scripts is one of the most important skills for reproducible quantitative sociology, the typical convention is to pick up the skills from more experienced colleagues in graduate school or at the workplace. Below are a few tips that I have learned from others, picked up on my own, or otherwise accumulated in my arsenal of tricks. There are great resources out there, but I thought it would be helpful to pass along the ones I think are the most important.

  tags: Stata, tips-n-tricks, workflow category: Programming

Nesting Stata Macros, or Hacking a Hash Map

Monday, June 6th, 2011 6:37p.m.

Programming in Stata is relatively straightforward, partly because its syntax is both powerful and simple. There are, however, a few minor annoyances in Stata's language, including using the backtick and apostrophe to indicate local macros (i.e., `localname'). Among these shortcomings, I would argue that the lack of anything like a list in Stata's language is one of the largest.

In most languages, you can store a list of items and refer to an item in the list by some sort of index. This is particularly helpful for iterating over the same step multiple times. Lists generally come in two flavors: lists in which you refer to an item by its position, and lists in which you refer to an item by a keyword (called hash maps in computer science lingo). Stata's matrices can be used for the first, though doing so might become complicated if you want to do something besides storing basic numbers or strings.
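For readers who have not met the two flavors before, here is what they look like in a language that has both built in. This is a Python sketch for illustration only, not Stata code, and the variable names and labels are invented.

```python
# Flavor 1: a positional list, where items are retrieved by integer position
variables = ["income", "education", "age"]  # hypothetical variable names
second = variables[1]                       # positions start at 0, so this is "education"

# Flavor 2: a hash map (a dict in Python), where items are retrieved by keyword
labels = {
    "income": "Household income",           # hypothetical labels
    "education": "Years of schooling",
    "age": "Age in years",
}

# Iterating over the list and looking each item up in the hash map
for name in variables:
    print(f"{name}: {labels[name]}")
```

The loop at the end is the pattern that is awkward without a keyed structure: the same step repeated for each item, with the item's name used to look up something associated with it.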

  tags: data-management, macros, Stata, tips-n-tricks category: Programming

Calculating Simple Power Analyses

Monday, Oct. 18th, 2010 6:31p.m.

I am currently preparing a proposal for submission and one piece of information that the agency suggests is the power required to distinguish effects. This is obviously a perfectly reasonable piece of information to request; however, power calculations fall into that class of things that I know that I should know but I don't. It is one of those topics that every statistics book will tell you is important, but either a) glosses over the topic, or b) provides such a deep background that it is impossible to follow what the authors are talking about. Additionally, power calculations are complicated enormously by the fact that sample designs can become very complicated.

In contrast to this traditional treatment, Andrew Gelman and Jennifer Hill's book, Data Analysis Using Regression and Multilevel/Hierarchical Models, provides a very clear description of simple power analyses, which -- thankfully -- is all that I really need for this project. To make sure that I don't forget, I record below how to find the required sample size, n, for varying levels of between-group effect differences, Δ, at 80% power. The formula is relatively easy (see pp. 437-447 for more info): n = (5.6σ/Δ)^2. Therefore, if I measure change in units of standard deviations, sd, then I can estimate the sample size n for each unit of change.

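* start with an empty dataset
drop _all
* 41 evenly spaced effect sizes from 0 to 1, in standard-deviation units
range sd 0 1 41
* required sample size at 80% power (missing for the first row, where sd is 0)
gen n = (5.6/sd)^2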

I can then make a graph of the expected sample size required for a standard unit change using the command twoway line n sd; or, alternatively, just print a table of numbers using list.
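The same table can be produced outside Stata. The Python sketch below applies the formula n = (5.6σ/Δ)^2 directly. The multiplier() helper is an addition that rests on an assumption about where 5.6 comes from: two groups of equal size and a two-sided 5% test, so that the difference must be about 1.96 + 0.84 = 2.8 standard errors, and the standard error of the difference is 2σ/√n. Treat that helper as a plausibility check rather than a quotation from the book.

```python
from statistics import NormalDist


def multiplier(power=0.80, alpha=0.05):
    """Twice the sum of the two normal quantiles; roughly 5.6 at 80% power."""
    z = NormalDist()
    return 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power))


def sample_size(delta, sigma=1.0, k=5.6):
    """Required n to detect a between-group difference delta: (k * sigma / delta)^2."""
    return (k * sigma / delta) ** 2


print(f"multiplier at 80% power: {multiplier():.2f}")
print("delta (sd units)   n")
for tenths in range(1, 11):  # start at 0.1 to avoid dividing by zero
    delta = tenths / 10
    print(f"{delta:16.1f}   {sample_size(delta):.0f}")
```

Because n grows with the inverse square of Δ, halving the effect you want to detect quadruples the sample you need.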

  tags: research-design, statistics category: Programming

Importing Text Files with Variable Names to Stata

Friday, July 23rd, 2010 1:17p.m.

I have come across a problem several times that has been relatively frustrating to deal with. I have data that is downloaded from a site (specifically the Census, which is why this comes up consistently) in which the first two lines of the data contain the variable name and variable description, respectively. This is incredibly useful for documenting data. Rather than attempting to figure out what variable pct001001 means, the description of the variable is right there.

The problem with data in this format is that Stata imports variables as string variables with the first observation being the variable description. I could pull the first two lines of the data out of the original dataset, transpose the rows and columns, save them in a separate text file, and then import the variable names and descriptions. However, managing two files means that it is more likely that one gets lost or I forget to send one of the files to a colleague working on the paper, or any number of other problems that could be experienced by separating these two files. Having one single file would be far superior and that is what the code below is designed to accommodate.
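The single-file idea is easy to see in miniature. The sketch below is in Python, not Stata, and is not the code from this post; it only illustrates reading the first line as names, the second line as descriptions, and everything after as data. The sample file is invented, including its comma delimiter, its values, and the descriptions attached to each variable.

```python
import csv
import io

# Invented sample in the layout described above:
# line 1 = variable names, line 2 = variable descriptions, then the data rows.
SAMPLE = """geoid,pct001001
Geographic identifier (made up),Example description (made up)
01001,100
01003,250
"""


def read_labeled(handle):
    """Return (labels, rows) from a file whose first two lines are names and descriptions."""
    reader = csv.reader(handle)
    names = next(reader)            # first line: variable names
    descriptions = next(reader)     # second line: variable descriptions
    labels = dict(zip(names, descriptions))
    rows = [dict(zip(names, row)) for row in reader]  # remaining lines: data, kept as strings
    return labels, rows


labels, rows = read_labeled(io.StringIO(SAMPLE))
for name, description in labels.items():
    print(f"{name}: {description}")
print(f"{len(rows)} data rows")
```

Keeping the descriptions in a mapping keyed by variable name means they travel with the data and never end up as the first observation.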

Data available from the U.S. Census comes in the following format (data is clipped):

  tags: data-management, Stata, strings category: Programming

Matching Substrings Entirely Within Stata

Sunday, May 2nd, 2010 7:59p.m.

At Orgtheory, Fabio asked about how to identify substrings within text fields in Stata. Although this is a seemingly simple proposal, there is one big problem, as Gabriel Rossman points out: Stata string fields can only hold 244 characters of text. Since Fabio wants to use this field to analyze scientific abstracts, 244 characters is obviously insufficient.

Gabriel Rossman has posted a solution he has called grepmerge that uses the Unix program grep to search for strings in files. This is a great solution, but it comes with one large caveat: it cannot be used in a native Windows environment. This is because the grep command is only native to Unix-like systems (which include Linux and Apple computers). Therefore, I set out to find a solution that was a) platform-independent and b) internal to Stata (if possible).

Below is the solution that I developed. The solution, it turns out, is not to rely on Stata's string variables or string functions (both can only handle 244 characters), but instead to rely on Stata's local macros ("macros" are what other programming languages call "variables"; however, this would be confusing given that Stata also has variables, thus Stata calls them "macros"). The second key comes from the extended functions of Stata's macros, which supply much of Stata's programming functionality. There is no extended function that searches for strings directly in the way that regex() or strpos() do; however, there is an extended function to substitute within strings that will also provide a count of the number of substitutions made. Since all we really care about is the number of times a string would be substituted, if we know that the count of substitutions is greater than zero, we have the information that we need.
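The substitute-and-count trick is not specific to Stata. The sketch below shows the same logic in Python, using re.subn, which returns both the rewritten string and the number of substitutions made. It is an illustration of the idea, not the Stata solution from this post, and the sample abstract is invented. In Python you would normally just write needle in text; the roundabout version is shown only because it mirrors the approach described above.

```python
import re


def count_hits(text, needle):
    """Count occurrences of needle by substituting it away and keeping only the count."""
    # re.subn returns a tuple: (new_string, number_of_substitutions_made)
    _, n_subs = re.subn(re.escape(needle), "", text)
    return n_subs


def contains(text, needle):
    """The substring is present exactly when at least one substitution was made."""
    return count_hits(text, needle) > 0


# Invented example text standing in for a long scientific abstract
abstract = "We study network structure and network change among scientists."
print(count_hits(abstract, "network"))
print(contains(abstract, "grep"))
```

The rewritten string is thrown away; only the count matters, which is why the length of the text being searched stops being a constraint.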

  tags: gabriel-rossman, macros, orgtheory, Stata, strings category: Programming

Front Page


  • Information about the purpose and topics of this blog can be found here.
  • The views presented here are solely and entirely my own; they do not represent those of my colleagues, employer, or any funding agencies which may support me.
  • The writing on this blog is covered by a Creative Commons License (described here). Feel free to distribute or re-post with a link to the original content provided that it is freely available to others.