posts in methods

On Twitter, Phil Cohen asked how he might make a plot showing overlapping distributions:

I think that he was on the right track using transparency, but I am not sure that the color was exactly right. The plot reminded me of what Mike Bostock (my generation's Edward Tufte) did to make a population pyramid.

Phil was also working with another disadvantage: he's using Microsoft Excel. Excel (all Microsoft Office products, actually) renders the ...

R is more of a challenge than Stata on many fronts, one of which is basic data management.

I often find myself calculating the value of one observation given the value of an adjacent observation. For example, to assess a lagged effect, I would take the value from the preceding interval. Stata makes this really easy; R, not so much.

Here's what we would do in Stata:

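clear                            // start from an empty dataset
set obs 1000                     // create 1,000 empty observations
gen i = _n                       // observation index
gen val = round(runiform()*10)   // random integer from 0 to 10
gen lag = val[_n-1]              // value from the preceding observation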

The last command prints the note (1 missing value generated) because the first observation has no preceding observation to draw from. The first 10 observations look like this:

. list ...
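For readers who do not have Stata at hand, the same lag logic can be sketched in plain Python. This is only an illustration under assumptions: Python is not one of the languages this post compares, the seed is arbitrary, and the names val and lag simply mirror the Stata variables above.

```python
import random

random.seed(1)  # arbitrary seed, only so the sketch is repeatable

n = 1000
# Random integers from 0 to 10, mirroring round(runiform()*10)
val = [round(random.random() * 10) for _ in range(n)]

# Shift everything down one position; the first observation has no
# predecessor, so it gets None (the analogue of Stata's missing value).
lag = [None] + val[:-1]

# Show the first 10 observations as (i, val, lag), with i starting at 1
for i in range(10):
    print(i + 1, val[i], lag[i])
```

The slice val[:-1] drops the last value, so lag stays the same length as val and lines up row by row.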

Over the weekend, jeremy posted about beckieball, a "new sport sweeping the country." The purpose was to show how selecting on characteristics changes the correlation between those characteristics among the selected. This, as commenter Stuart Buck pointed out, is an example of Berkson's Paradox, and it also relates to jeremy's post about height and the NBA.

Although he left several other exercises to the reader, I thought I would do a simpler one: recreate the code that he used to make his example. I did this a) because it was a semi-useful way to shake the cobwebs from egg nog and yuletides, and b) because I think that it will come in ...
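As a rough illustration of the selection effect itself, here is a minimal Python sketch. It is not jeremy's code and not the recreation this post describes: the two traits (height and skill), the selection rule (their sum must exceed 2), the sample size, and the seed are all assumptions made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(2024)  # arbitrary seed
n = 20000

# Two hypothetical traits, drawn independently, so they are
# uncorrelated in the full population.
height = [rng.gauss(0, 1) for _ in range(n)]
skill = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical selection rule: only people whose combined score
# clears a threshold make it into the league.
selected = [(h, s) for h, s in zip(height, skill) if h + s > 2]

r_all = pearson(height, skill)
r_selected = pearson([h for h, _ in selected], [s for _, s in selected])

print(f"population: r = {r_all:.2f} (n = {n})")
print(f"selected:   r = {r_selected:.2f} (n = {len(selected)})")
```

Among the selected, a low value on one trait has to be offset by a high value on the other, which is what produces the negative correlation.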

I am attempting to learn R. This is either a great thing or a terrible, terrible mistake while on the tenure clock. But all the cool kids are doing it, so even though they might also jump off a bridge, I'm going to jump into R.

The hardest part so far is doing things that now come as second nature to me in Stata. Although R's tools are much better in the long run, learning what types of objects different functions return makes for a very steep learning curve.

A while back (all posts are a while back now), I wrote a post describing how to import ...

Programming in Stata is relatively straightforward, partly because its syntax is both powerful and simple. There are, however, a few minor annoyances in Stata's language, including the use of the backtick and apostrophe to indicate local macros (i.e., `localname'). Among these shortcomings, I would argue that the lack of anything like a list in Stata's language is one of the largest.

In most languages, you can store a list of items and refer to an item in the list by some sort of index. This is particularly helpful for iterating over the same step multiple times. Lists generally come in two flavors: lists to ...
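To make the idea of an indexed list concrete, here is a small sketch in Python, used only as a stand-in for "most languages." The variable names in the list are hypothetical.

```python
# A list of hypothetical variable names
variables = ["age", "income", "educ"]

# Refer to a single item by its index (Python counts from 0)
first = variables[0]

# Repeat the same step for every item, keeping track of the index
labels = [f"{i}: {name}" for i, name in enumerate(variables)]

print(first)
print(labels)
```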

I am currently preparing a proposal for submission, and one piece of information that the agency suggests including is the power required to distinguish effects. This is obviously a perfectly reasonable piece of information to request; however, power calculations fall into that class of things that I know I should know but don't. It is one of those topics that every statistics book tells you is important, but that each book either a) glosses over or b) buries in such deep background that it is impossible to follow what the authors are talking about. Additionally, power calculations are complicated enormously by the fact that sample designs can become ...
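For the simplest textbook case, the calculation can be sketched in a few lines of Python. This is a sketch under strong assumptions, not the approach this post goes on to take: it uses the normal approximation for a two-sided comparison of two group means with equal group sizes and simple random sampling, so it ignores the design complications mentioned above. The function name two_sample_power and the example inputs are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size is the standardized mean difference (difference in
    means divided by the common standard deviation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value, e.g. about 1.96
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    # Probability of landing in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Hypothetical inputs: a medium effect (0.5 SD) with 64 cases per group
power = two_sample_power(0.5, 64)
print(f"power = {power:.3f}")
```

Raising n_per_group or effect_size increases the power, which is the trade-off a funding agency usually wants to see spelled out.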

At Orgtheory, Fabio asked how to identify substrings within text fields in Stata. Although this is a seemingly simple problem, there is one big obstacle, as Gabriel Rossman points out: Stata string fields can only hold 244 characters of text. Since Fabio wants to use this field to analyze scientific abstracts, 244 characters is obviously insufficient.

Gabriel Rossman has posted a solution he has called grepmerge that uses the Linux-based program grep to search for strings in files. This is a great solution, but it comes with one large caveat: it cannot be used in a native Windows environment. This is because the grep command is only native ...
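The underlying task, finding which long texts contain a given string, can be sketched in Python, which runs on Windows as well as Linux. This is neither grepmerge nor the solution this post goes on to describe: the abstracts, their IDs, and the search term are hypothetical, and the filler text exists only to push each abstract past 244 characters.

```python
# Padding so that every hypothetical abstract exceeds 244 characters
filler = "Lorem ipsum dolor sit amet. " * 10  # 280 characters

# Hypothetical abstracts keyed by a made-up ID
abstracts = {
    "paper_a": filler + "We estimate diffusion in social networks.",
    "paper_b": filler + "We measure inequality in school funding.",
}


def contains(text, pattern):
    """Case-insensitive substring test."""
    return pattern.lower() in text.lower()


# IDs of the abstracts that mention the search term
matches = [key for key, text in abstracts.items()
           if contains(text, "social networks")]

print(matches)
```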