R presents more of a challenge than Stata on many fronts, one of which is basic data management.

I often find myself calculating the value of one observation from the value of an adjacent observation. For example, to assess a lagged effect, I would take the value from the preceding interval. Stata makes this really easy, R not so much.

Here's what we would do in Stata:

set obs 1000
gen i = _n
gen val = round(runiform()*10)
gen lag = val[_n-1]

The last command prints the note (1 missing value generated) because the first observation has no preceding observation to take a value from. The first 10 observations look like this:

. list in 1/10

     +----------------+
     |  i   val   lag |
     |----------------|
  1. |  1     3     . |
  2. |  2     0     3 |
  3. |  3     6     0 |
  4. |  4     4     6 |
  5. |  5     0     4 |
     |----------------|
  6. |  6     9     0 |
  7. |  7     6     9 |
  8. |  8     5     6 |
  9. |  9     8     5 |
 10. | 10     5     8 |
     +----------------+

After my head left an indentation on my desk, I found a way to do this in R, though now I can't remember where. The solution is to use the head() function. Anyone who has completed an R tutorial will know it as a way to view the first few rows of your data. It turns out that head() takes an optional second parameter, n, that sets how many observations to return. We can use that to obtain the first N-1 values and assign them to rows 2 through N.

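# Simulate 1000 observations with integer values between 0 and 10
dt <- data.frame(i = 1:1000, val = round(runif(1000) * 10))
# Prepend NA, then take the first N-1 values of val as the lag
dt$lag <- c(NA, head(dt$val, nrow(dt) - 1))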

Remember how Stata told us that it created one missing value? We need to do that ourselves in R. In the second line, we combine that one NA with the first N-1 values of dt$val, which become the values of dt$lag for rows 2 through N.
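If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: the name lag_by and its k argument are made up for illustration and do not come from any package. It pads with k missing values and keeps the first N-k values, so k = 1 reproduces the line above.

```r
# Hypothetical helper (not from any package): shift x down by k positions,
# padding the top with NA, in the same way as the head() trick above.
lag_by <- function(x, k = 1) {
  stopifnot(k >= 0, k <= length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

set.seed(1)  # arbitrary seed so the example is repeatable
dt <- data.frame(i = 1:10, val = round(runif(10) * 10))
dt$lag1 <- lag_by(dt$val)      # same as c(NA, head(dt$val, nrow(dt) - 1))
dt$lag2 <- lag_by(dt$val, 2)   # value from two observations back
```

Writing head(x, length(x) - k) rather than head(x, -k) is deliberate: with k = 0, head(x, -0) would return an empty vector instead of x.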

And we end up with what we want (using head() for its typical purpose):

> head(dt)
  i val lag
1 1   7  NA
2 2   7   7
3 3   3   7
4 4   2   3
5 5   3   2
6 6   9   3
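The other adjacent value, the one Stata would give you with val[_n+1], can be had the same way by swapping head() for tail() and moving the NA to the end. Here is a minimal sketch using the first five values from the Stata listing above. The variable names lag_val and lead_val are just for illustration.

```r
# First five values of val from the Stata listing above
val <- c(3, 0, 6, 4, 0)
n <- length(val)

lag_val  <- c(NA, head(val, n - 1))   # previous value, like val[_n-1]
lead_val <- c(tail(val, n - 1), NA)   # next value, like val[_n+1]

data.frame(val, lag_val, lead_val)
```

Here the missing value lands in the last row, because the final observation has nothing after it.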
