Students often find Stata difficult to use because of its command-line interface; I know I did when I first started using the program. However, one of the greatest benefits of Stata is the flexibility it gives you to run different statistical procedures and to automate your work. So, I wanted to provide some tips that can help you generate the summary statistics that will be part of your dissertation.
The first problem that you may encounter is the “shape” of the data. Although it might make sense to people to create a single row of data for each subject, it doesn’t make much sense to statistical packages when they try to analyze longitudinal data. The first step is to get your data into the correct shape. For example, your data may look something like this:
SubjectID  Y12009  Y12010  Y12011  X12009  X12010  X12011
1          75      85      100     5       10      12
2          10      35      50      8       9       13
3          50      62      76      1       3       7
In Stata, this is referred to as data that is “wide.” But, to do the analysis that you want to do, you need to get the data into “long” format like this:
SubjectID  Year  Y1   X1
1          2009  75   5
1          2010  85   10
1          2011  100  12
2          2009  10   8
2          2010  35   9
2          2011  50   13
3          2009  50   1
3          2010  62   3
3          2011  76   7
This is a pretty simple transformation in this case. The following command would take the data from wide format and transform it into long format.
reshape long Y1 X1, i(SubjectID) j(Year)
Here, you are telling Stata that the variables whose names start with Y1 or X1 hold repeated measurements and need to be stacked. The option i(SubjectID) identifies each subject, and j(Year) names the new variable that stores the suffix (2009, 2010, 2011) taken from the end of each wide variable name, so every subject ends up with one row per year in which data was collected. Stata variable names are case-sensitive, so the name you give in j() must match the name you use in later commands. Once your data is formatted correctly, it is easy to automate your summary statistics using some simple Stata programming commands.
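If you want to try the transformation before touching your own data, the following is a minimal sketch that types in the three example subjects from the tables above and reshapes them. It assumes you are happy to clear whatever is currently in memory, so run it in a fresh session or after saving your work.

```stata
* Sketch only: rebuild the small wide example by hand.
* "clear" drops any data currently in memory.
clear
input SubjectID Y12009 Y12010 Y12011 X12009 X12010 X12011
1 75 85 100 5 10 12
2 10 35 50 8 9 13
3 50 62 76 1 3 7
end

* Y1 and X1 are the stubs; the trailing digits (2009-2011)
* become the values of the new variable Year.
reshape long Y1 X1, i(SubjectID) j(Year)

* Show the result, with a separator line between subjects.
list, sepby(SubjectID)
```

After the reshape, Stata prints a short report comparing the number of observations and variables before and after, which is a quick way to check that nothing was lost.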
There are a few different commands in Stata that are used to create loops. Each one repeats the same set of commands until it runs out of the values that you provide. For longitudinal data analysis, the time variable (Year) is a natural thing to loop over:
forvalues i = 2009/2011 {
    sum Y1 X1 if Year == `i'
}
The code above will run your summary stats for your dependent and independent variables, once for each year. It does this using the – forvalues – command to loop over the years, or any other range of numerical values, that you provide. The command forvalues creates a local macro (kind of like an abbreviation) that takes a new value on each pass through the loop. In the example above, the macro is called – i –. The rest of the command tells Stata that your macro (i) takes the numerical values 2009 through 2011 in increments of 1, so it stands for 2009, then 2010, then 2011. The forvalues loop begins with an opening curly bracket "{" on the same line as the command and ends with a closing curly bracket "}" on a line of its own. The commands that you put inside of the brackets are the commands that Stata will loop over. In this case, we told Stata to run the – sum – command (short for summarize) to provide summary statistics for your Y1 and X1 variables if the year is equal to (==) the current value of your macro.
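The forvalues approach works when the years are evenly spaced. If your waves are irregular (say, 2009, 2011, and 2014), one alternative, offered here as a sketch rather than the only way to do it, is to let Stata collect the distinct years with levelsof and loop over them with foreach. The macro names years and y are arbitrary choices for this illustration, and the data typed in below is just the long-format example from earlier.

```stata
* Sketch only: the long-format example data, typed in by hand.
clear
input SubjectID Year Y1 X1
1 2009 75 5
1 2010 85 10
1 2011 100 12
2 2009 10 8
2 2010 35 9
2 2011 50 13
3 2009 50 1
3 2010 62 3
3 2011 76 7
end

* Store the distinct values of Year in a local macro named years.
levelsof Year, local(years)

* Loop over whatever years are actually present in the data.
foreach y of local years {
    display "Summary statistics for `y'"
    summarize Y1 X1 if Year == `y'
}
```

Because the list of years comes from the data itself, the loop does not need to be edited when a new wave is added.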
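It’s important to know that when you use a local macro, its name needs to be enclosed by a left single quote (the backtick, `, usually found at the top left of the keyboard) and a right single quote (the ordinary straight apostrophe, ') in order to work; in the example above the macro i is written as `i'. Be careful when copying code from a web page or word processor, because curly “smart” quotes will cause an error. Macros can also be used elsewhere in the software and can help you to run your analysis more quickly. For example, you could create a local macro with all of your control variables in it: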
local controlvariables c1 c2 c3 c4 c5 c6 c7 c8 c9
So that you can save time when running all of your different statistical models:
regress y1 x1 `controlvariables'
regress y1 x2 `controlvariables'
regress y1 x3 `controlvariables'
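The two ideas, a macro of controls and a loop, can also be combined. The sketch below is only an illustration: it uses the auto dataset that ships with Stata as a stand-in for your own data, so price, mpg, headroom, trunk, weight, and length are that dataset's variables, not variables from the examples above, and the macro name controls is an arbitrary choice.

```stata
* Sketch only: the bundled auto dataset stands in for your own data.
sysuse auto, clear

* Control variables that appear in every model.
local controls weight length

* Fit one regression per predictor of interest, reusing the controls.
foreach x of varlist mpg headroom trunk {
    display "Model with predictor: `x'"
    regress price `x' `controls'
}
```

One caution: a local macro only exists while the do-file (or the selection of lines) that defined it is running. If you run the local line and the regress lines separately from the do-file editor, the macro will be empty by the time the regressions run, so execute them together.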
I hope that these tips are helpful as you begin analyzing your data and as you move forward with your research. If you have any questions please feel free to contact me through the network and I can work with you to develop your understanding of the Stata software package.
About the Author
William Buchanan received his PhD at TUI University where his dissertation research focused on the causal effects of poverty on musical achievement in the US. He makes use of geospatial analysis (i.e., using ArcGIS software to measure resource allocation, concentration, and geographical differences in resource availability), econometrics (e.g., instrumental variable methods, regression discontinuity designs, differences-in-differences estimators, and fixed-effects models), and structural equation models (e.g., path analysis of observed data and latent variable modeling) to conduct quasi-experimental data analysis with a national data set from the US Department of Education. His use of multi-disciplinary approaches to research allowed him to integrate theory and statistical methodology from economics, education, and psychology. He has also provided statistical consulting and research services to a wide variety of public school districts, businesses, and institutions of higher education.
William Buchanan is also an aspiring Stata programmer and has helped his clients automate their data analysis by providing custom-written Stata programs (i.e., .do and .ado files). He also uses SPSS 16, LISREL 8.8, HLM 6.0, R, G*Power 3.2, ArcGIS, and other software packages to perform data analysis. Additionally, he uses StatTransfer 10, so he is able to accept data files in almost any format and can provide data files in the format of your choice.
Specialties: music education, developmental psychology, neuropsychology, cognitive psychology, education, education policy, program evaluation, arts integration in education, econometric methods in educational research, contextual factors in education/developmental psychology, LaTeX typesetting software.
“He was patient and very thorough. I would highly recommend him to others in need of statistics help services. Received: statistical consulting, data analysis, interpretation of results, presentation of results.” Terrence Carter, Chief Learning Officer, AUSL