I am attempting to learn R. This is either a great thing or a terrible, terrible mistake while on the tenure clock. But all the cool kids are doing it, so even though they might also jump off a bridge, I'm going to jump into R.

The hardest part so far is doing things that are now second nature to me in Stata. Although R's tools are much better in the long run, learning what types of objects different functions return makes for a steep learning curve.

A while back (all posts are a while back now), I wrote a post describing how to import data with variable labels in Stata. The idea was that I could keep the variable labels with the data in a text file so that I could always figure out what the variables were. It doesn't require an extra codebook or additional files that could be lost.

I have now replicated that script for R, and it is below:

# Read the whole file: line 1 holds the labels, line 2 the variable names
lines <- readLines('R10401937_SL160_withLabels.txt')
data.labels <- strsplit(lines[1],'\t')[[1]]
data.names <- strsplit(lines[2],'\t')[[1]]
# Write the data rows to a temporary file so read.delim can parse them
tmp <- tempfile()
writeLines(lines[-2:-1],tmp)
places <- read.delim(tmp,header=FALSE,col.names=data.names)
unlink(tmp)
# Name each label by its variable name
names(data.labels) <- data.names

Note the last line: this line turns the object data.labels into a look-up table. I can now type:

data.labels['p003001']

And it will return: "\"Total Population\"" (the escaped quotes appear because the labels are wrapped in double quotes in the source file). This is even more helpful than Stata's variable labeling convention.
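To try the look-up table without the original file, here is a minimal sketch that uses a made-up four-line sample in place of the real data. Only p003001 and its label come from the example above; the second column (p003002), its label, and the data values are hypothetical. It also reads from a character vector via `text=` rather than a temporary file, and strips the literal quotes from the labels, which the script above does not do.

```r
# Hypothetical sample: line 1 holds labels, line 2 holds variable names,
# and the remaining lines hold tab-separated data.
sample.lines <- c(
  '"Total Population"\t"Hypothetical Label"',
  'p003001\tp003002',
  '100\t40',
  '250\t90'
)

# Split the two header lines on tabs.
data.labels <- strsplit(sample.lines[1], '\t')[[1]]
data.names  <- strsplit(sample.lines[2], '\t')[[1]]

# Remove the literal double quotes so the labels print cleanly.
data.labels <- gsub('"', '', data.labels, fixed = TRUE)

# Parse the data lines directly from the character vector.
places <- read.delim(text = sample.lines[-(1:2)], header = FALSE,
                     col.names = data.names)

# Name each label by its variable name to get the look-up table.
names(data.labels) <- data.names

# Double brackets return the bare label without its name attached.
data.labels[['p003001']]
```

Using `[[ ]]` instead of `[ ]` drops the name from the result, which is handy when the label is going into a plot title or table header.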

UPDATE MAR 2, 2016: After learning a little bit more R, I realized that I could simplify what I had. Below is the code that I managed to make based on the data structure from Social Explorer:

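raw <- read.table('test/R11129600_SL140.txt')
# Row 1 holds the labels and row 2 the variable names
names.labels <- data.frame(varnames = as.vector(t(raw[2,])), labels = as.vector(t(raw[1,])))
# Drop the two header rows and name the columns
dt <- raw[-1:-2,]
colnames(dt) <- names.labels$varnames
rm(raw)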

Columns in the data frame dt will be factor variables (in R 4.0.0 and later, where stringsAsFactors defaults to FALSE, they will be character vectors instead). For quantitative analyses, those should be converted to numeric types. To do that, define a vector of column names or column indices and apply the conversion to each column:

varnames <- c('var1','var2')
# lapply keeps each column separate, so this also works for a single column
dt[varnames] <- lapply(dt[varnames],function(x){as.numeric(as.character(x))})
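The as.character() step in that conversion matters: calling as.numeric() directly on a factor returns the internal level codes, not the values you see printed. A small sketch with made-up values shows the difference:

```r
# Made-up values stored as a factor, as they might be after the import.
f <- factor(c('35', '10', '200'))

# Direct conversion returns each value's position among the sorted levels
# ('10', '200', '35'), so this gives 3 1 2.
codes <- as.numeric(f)

# Converting to character first recovers the actual numbers: 35 10 200.
values <- as.numeric(as.character(f))
```

If the columns are already character vectors, as.numeric(as.character(x)) still works, so the same conversion is safe either way.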
