R presents more of a challenge than Stata on many fronts, one of which is basic data management.

I often find myself calculating the value of one observation from the value of an adjacent observation. For example, to assess a lagged effect, I would take the value of the preceding interval. Stata makes this really easy; R, not so much.

Here's what we would do in Stata:

    set obs 1000
    gen i = _n
    gen val = round(runiform()*10)
    gen lag = val[_n-1]

The last command prints the note `(1 missing value generated)` because the first observation has no preceding observation to lag from. The first 10 observations look like this:

    . list in 1/10

         +----------------+
         |  i   val   lag |
         |----------------|
      1. |  1     3     . |
      2. |  2     0     3 |
      3. |  3     6     0 |
      4. |  4     4     6 |
      5. |  5     0     4 |
         |----------------|
      6. |  6     9     0 |
      7. |  7     6     9 |
      8. |  8     5     6 |
      9. |  9     8     5 |
     10. | 10     5     8 |
         +----------------+

After my head left an indentation on my desk, I came across a way to do this in R, though now I can't remember where. The solution is to use the `head()` function. Anyone who has completed an R tutorial will know this function as a way to view the first few rows of your data. It turns out that `head()` has an optional second parameter that lets you indicate how many observations you want to view. We can use that to obtain the first *N-1* values and assign them to rows 2 through *N*.

    dt <- data.frame(i = 1:1000, val = round(runif(1000) * 10))
    dt$lag <- c(NA, head(dt$val, nrow(dt) - 1))

Remember how Stata told us that it created one missing value? We need to do that ourselves in R. In the second line, we combine that one `NA` value with the first *N-1* values of `dt$val`, which become the lags for rows 2 through *N*.
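To see the mechanics on a small scale, here is a minimal sketch using a made-up five-element vector `x` (hypothetical, not part of the data above). It also shows that a negative second argument to `head()` drops elements from the end, which saves computing the length yourself.

```r
# Hypothetical toy vector, small enough to check by eye
x <- c(3, 0, 6, 4, 0)

# The first N-1 values, written two equivalent ways
first_vals <- head(x, length(x) - 1)   # 3 0 6 4
same_vals  <- head(x, -1)              # negative n drops the last element

# Prepend the NA that Stata generated for us automatically
lagged <- c(NA, first_vals)            # NA 3 0 6 4
```

The result has the same length as `x`, which is what lets it slot straight into a data frame column.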

And we end up with what we want (using `head()` for its typical purpose):

    > head(dt)
      i val lag
    1 1   7  NA
    2 2   7   7
    3 3   3   7
    4 4   2   3
    5 5   3   2
    6 6   9   3
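If you need this more than once, the same trick can be wrapped in a function. The sketch below is only one way to do it: `lag_k()` and `lead_k()` are hypothetical helper names of my own, not base R functions, and `toy` is a made-up data frame. They aim to give roughly what `val[_n-2]` and `val[_n+1]` would give in Stata, using `head()` for lags and its counterpart `tail()` for leads.

```r
# Hypothetical helper: shift x down by k positions, padding the top with NA
lag_k <- function(x, k = 1) {
  stopifnot(k >= 0, k <= length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

# Hypothetical mirror image: shift x up by k positions, padding the bottom
lead_k <- function(x, k = 1) {
  stopifnot(k >= 0, k <= length(x))
  c(tail(x, length(x) - k), rep(NA, k))
}

# Made-up example data, same shape as dt above but only 10 rows
toy <- data.frame(i = 1:10, val = round(runif(10) * 10))
toy$lag2  <- lag_k(toy$val, 2)   # value from two rows earlier
toy$lead1 <- lead_k(toy$val)     # value from the next row
```

Note that these helpers treat the whole column as one series; they do not restart the lag within groups.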
