Using Stata to Automate Summary Statistics in Longitudinal Data
by William Buchanan
Often, students find Stata to be a difficult program to use due to the command prompt interface; I know I did when I first started using the program. However, one of the greatest benefits of Stata is the flexibility that it gives you to run different statistical procedures and to automate your work. So, I wanted to provide you with some tips that can help you to generate the summary statistics that will be part of your dissertation.
The first problem that you may encounter is the “shape” of the data. Although it might make sense to people to create a single row of data for each subject, it doesn’t make much sense to statistical packages when they try to analyze longitudinal data. The first step is to get your data in the correct shape. For example, your data may look something like this:
SubjectID | Y12009 | Y12010 | Y12011 | X12009 | X12010 | X12011
1         | 75     | 85     | 100    | 5      | 10     | 12
2         | 10     | 35     | 50     | 8      | 9      | 13
3         | 50     | 62     | 76     | 1      | 3      | 7
In Stata, this is referred to as data that is “wide.” But, to do the analysis that you want to do, you need to get the data into “long” format like this:
SubjectID | Year | Y1  | X1
1         | 2009 | 75  | 5
1         | 2010 | 85  | 10
1         | 2011 | 100 | 12
2         | 2009 | 10  | 8
2         | 2010 | 35  | 9
2         | 2011 | 50  | 13
3         | 2009 | 50  | 1
3         | 2010 | 62  | 3
3         | 2011 | 76  | 7
This is a pretty simple transformation in this case. The following command would take the data from wide format and transform it into long format.
reshape long Y1 X1, i(SubjectID) j(Year)
Here, you are telling Stata that the variables beginning with Y1 or X1 need to be transformed, since they hold repeated measurements over time. The option i(SubjectID) names the variable that identifies each subject, and j(Year) names the new variable that will hold the suffix stripped from each variable name (2009, 2010, 2011), so each subject gets one row for every time that data was collected. Stata variable names are case-sensitive, so the name you give in j() must match the name you use in later commands. Once your data is formatted correctly, it is easy to automate your summary statistics using some simple Stata programming commands.
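If you would like to try the transformation before running it on your own data, the sketch below types in the small example table by hand and reshapes it. It assumes nothing beyond the three subjects shown above; the input block is only there to make the example self-contained.

```stata
* Rebuild the wide example table by hand (illustrative data only).
clear
input SubjectID Y12009 Y12010 Y12011 X12009 X12010 X12011
1 75 85 100 5 10 12
2 10 35  50 8  9 13
3 50 62  76 1  3  7
end

* Y1 and X1 are the stubs. The digits that follow each stub
* (2009, 2010, 2011) become the values of the new variable Year.
reshape long Y1 X1, i(SubjectID) j(Year)

* One row per subject per year: 3 subjects x 3 years = 9 rows.
list, sepby(SubjectID)
```

If you ever need the original layout back, reshape wide with the same stubs and options reverses the transformation.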
Stata has a few different commands for creating loops (forvalues, foreach, and while). Each repeats the same commands until it runs out of the values that you provide. With longitudinal data, the time variable (Year) gives you a natural set of values to loop over.
forvalues i = 2009/2011 {
    sum Y1 X1 if Year == `i'
}
The code above will run all of your summary statistics for your dependent and independent variables, one year at a time. It does this using the forvalues command to loop over the years, or any other sequence of numerical values, that you provide. The forvalues command creates a local macro (kind of like an abbreviation) that you can use inside the loop. In the example above, the macro is called i. The rest of the command tells Stata that your macro (i) takes the numerical values 2009 through 2011 in increments of 1, so on each pass through the loop it stands for 2009, then 2010, then 2011. The body of the forvalues loop begins with an opening curly bracket “{”, which must be on the same line as forvalues, and ends with a closing curly bracket “}” on a line of its own. The commands that you put inside of the brackets are the commands that Stata will loop over. In this case, we told Stata to run the sum command (short for summarize) to provide summary statistics for your Y1 and X1 variables if Year is equal to (==) the current value of your macro.
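To see the loop run from start to finish, here is a minimal sketch that types in the long-format example data and prints a heading before each year's output. The heading text is my own addition for readability. The last line shows tabstat as one loop-free way to get a similar table, in case that suits your write-up better.

```stata
* Long-format example data, typed in by hand (illustrative only).
clear
input SubjectID Year Y1 X1
1 2009  75  5
1 2010  85 10
1 2011 100 12
2 2009  10  8
2 2010  35  9
2 2011  50 13
3 2009  50  1
3 2010  62  3
3 2011  76  7
end

* Loop over the years; `i' is replaced by 2009, then 2010, then 2011.
forvalues i = 2009/2011 {
    display as text _newline "Summary statistics for `i'"
    summarize Y1 X1 if Year == `i'
}

* Alternative without a loop: one compact table broken out by Year.
tabstat Y1 X1, by(Year) statistics(n mean sd min max)
```

The loop is the more flexible choice when you want to run several commands per year, while tabstat is handy when all you need is a single table.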
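It’s important to know that when you use a local macro, its name needs to be enclosed between a left single quotation mark ` (the backtick, to the left of the 1 key on a US keyboard) and a right single quotation mark ' (the ordinary apostrophe) in order to work; in the example above you should see `i'. If you copy code from a word processor or a web page, check that these have not been converted to curly quotes, which Stata will not recognize. Macros can also be used elsewhere in the software and can help you to run your analysis more quickly. For example, you could create a local macro with all of your control variables in it: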
local controlvariables c1 c2 c3 c4 c5 c6 c7 c8 c9
You can then save time when running all of your different statistical models:
regress y1 x1 `controlvariables'
regress y1 x2 `controlvariables'
regress y1 x3 `controlvariables'
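The two ideas can also be combined: a foreach loop can step through the predictors while the local macro supplies the controls. The sketch below is only an illustration. It simulates a hypothetical dataset (y1, x1 to x3, c1 to c9 are made-up variables, and the seed and sample size are arbitrary) so that it runs on its own; with your data you would skip the simulation and keep the last two steps.

```stata
* Simulate a hypothetical dataset so the example is self-contained.
clear
set seed 12345
set obs 200
forvalues k = 1/9 {
    generate c`k' = rnormal()
}
foreach v in x1 x2 x3 {
    generate `v' = rnormal()
}
generate y1 = 1 + 0.5*x1 + c1 + rnormal()

* Store the control variables once in a local macro.
local controlvariables c1 c2 c3 c4 c5 c6 c7 c8 c9

* Run one model per predictor, reusing the same controls each time.
foreach x of varlist x1 x2 x3 {
    regress y1 `x' `controlvariables'
}
```

One thing to keep in mind is that a local macro only exists while the do-file (or the block of lines you ran together) is executing, so the local line and the commands that use it need to be run in the same go.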
I hope that these tips are helpful as you begin analyzing your data and as you move forward with your research. If you have any questions please feel free to contact me through the network and I can work with you to develop your understanding of the Stata software package.