Over the weekend, jeremy posted about beckieball, a "new sport sweeping the country." The purpose was to show how selecting on characteristics changes the correlations among those characteristics within the selected group. This, as commenter Stuart Buck pointed out, is an example of Berkson's Paradox, and it relates to jeremy's post about height and the NBA.
Although he left several other exercises to the reader, I thought I would do a simpler one: recreate the code that he used to make his example. I did this a) because it was a semi-useful way to shake the cobwebs from egg nog and yuletides, and b) because I think that it will come in handy teaching someday.
The logic that jeremy uses is as follows:
- Draw a sample with a known correlation structure
- Develop a model to pick beckieball players based on measured characteristics
- Calculate correlations between characteristics among different samples
And here is the code:
// 1. CREATE CORRELATION MATRIX AND DRAW MULTIVARIATE NORMAL SAMPLE OF 10,000
matrix C = (1,.57,1,.57,0,1,.57,0,0,1)
drawnorm perform height skills desire, n(10000) corr(C) cstorage(lower) clear

// 2. CREATE VARIABLE PREDICTING BEING PICKED FOR BECKIEBALL
gen pick = .5*height + .5*skills + rnormal()*.5
gsort -pick

// 3. CALCULATE CORRELATION MATRICES OF POPULATIONS
// Total population
pwcorr perform-desire
// Premier League Beckieball
pwcorr perform-desire in 1/500
// Minor League Beckieball
pwcorr perform-desire in 501/1500
// Semipro Beckieball
pwcorr perform-desire in 1501/3000
// Overall correlation of characteristics and being picked
pwcorr
A couple of notes here. The matrix, C, is a lower-triangular matrix and is entered as a single row. The command drawnorm, which draws the multivariate sample, knows how to interpret this if the cstorage(lower) option is specified. Be careful when entering this directly in Stata: the command will clear the data currently in memory (see the clear option in there).
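For readers who find the single-row form hard to parse, here is a sketch of what should be the same matrix written out in full, with rows and columns in the order perform, height, skills, desire. This is my own unpacking, not something from jeremy's post; the seed value is arbitrary and is only there so the draw can be repeated. As with the code above, the // comments assume this is run from a do-file.

```stata
* The same correlation matrix, entered in full (rows separated by \)
* Order of rows/columns: perform, height, skills, desire
matrix C = (1, .57, .57, .57 \ .57, 1, 0, 0 \ .57, 0, 1, 0 \ .57, 0, 0, 1)
matrix list C

* Arbitrary seed (an assumption, not in the original) for a repeatable draw
set seed 12345

* With a full matrix, cstorage() can be left at its default
* Warning: clear still wipes whatever data are in memory
drawnorm perform height skills desire, n(10000) corr(C) clear
```

Written this way, it is easier to see that perform is correlated at .57 with each of the other three, while height, skills, and desire are uncorrelated with one another in the full population.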
One can see that being picked is positively correlated with height and skills and unrelated to desire (since scouts can't measure it, according to jeremy). But those correlations disappear when we look only at the players within each of the leagues, as he notes in his post.
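A quick way to line up the within-league comparisons side by side is to tag each row with its league and run the correlations by group. This is only a sketch: the league variable and its labels are names I made up, and it assumes the data are still sorted by gsort -pick from the code above. If Berkson's Paradox is at work, the height-skills correlation, which is zero by construction in the full sample, should look different within a league.

```stata
* Hypothetical league indicator; assumes data are still sorted by -pick
gen league = 4
replace league = 3 in 1501/3000
replace league = 2 in 501/1500
replace league = 1 in 1/500

* Labels are illustrative only
label define league 1 "Premier" 2 "Minor" 3 "Semipro" 4 "Not picked"
label values league league

* Correlation of height and skills in the full sample, then within each league
pwcorr height skills
bysort league: pwcorr height skills
```

Note that bysort re-sorts the data by league, so the in 1/500 style ranges from the original code should be run before this, or after another gsort -pick.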