

ECON 4400 Project
Due July 29 by 5:00 pm. The paper must be uploaded to Carmen in .pdf format. All parts of your final paper must be typed. The paper should be double-spaced in 11- or 12-point font. There is no minimum number of pages, but there is a strict maximum of 10 pages, including the title page, tables, graphs, and appendices. Anything beyond 10 pages will not be graded.
The paper should include the following sections: introduction, data, empirical methodology, results, and conclusions.
Read the following instructions carefully.
Carrying Out an Empirical Project with R
Choosing a Topic
• Start with a general area or set of questions
• Make sure you are interested in the topic
• Use online services such as Google Scholar or EconLit to investigate past work on this topic
• Narrow down your topic to a specific question or issue to be investigated
• Work through the theory behind your question
Choosing Data
• You want data that include measures of the variables your theoretical model implies are important
• Investigate what types of data sets have been used in past literature
• Search for what other data sets are available (for example, ICPSR)
• Consider collecting your own data
Common Data Sources
• Labor, employment, and wages:
o Census (Current Population Survey): http://www.census.gov/cps
o BLS: http://bls.gov/data/
• Public health:
o CDC: http://www.cdc.gov/DataStatistics/
• Comprehensive (NBER): http://www.nber.org/data/
• Other: World Bank, etc.
• Data sets in the R package AER. These data are very limited, and I don’t recommend using them for your project; treat them only as a last resort. If you end up using these data sets for any reason, I may deduct up to 5% of your project grade.
Using the Data
• Download the data in an accessible format. Use an R data format (.RData or .rds) if available; if not, download an Excel, csv, or other delimited file and import it into R. Other formats, such as Stata and SAS files, can also be imported into R. Be cautious with these formats and check that your data were imported correctly.
• Create variables appropriate for your analysis, for example dummy variables from categorical variables or hourly wages from earnings and hours.
• Your data set may contain many variables you don’t need. It is good practice to create a new data frame that keeps only the variables of interest; a smaller data frame is easier to work with.
• Check the data for missing values, errors, outliers, etc.
• If you have data on multiple time periods (a panel), you can take a subset of the data at the particular date you want. You already know how to subset data in R using square brackets [ ].
• Report everything you did to prepare the data in your data section
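The data-preparation steps above can be sketched in R as follows. This is a minimal sketch on a small hypothetical wage extract: the column names (weekly_earn, weekly_hours, educ, region, year) and values are made up for illustration, and with a real file you would replace the inline text with read.csv("yourfile.csv"), readRDS() for an .rds file, or a package such as haven for Stata and SAS files.

```r
# Hypothetical panel extract (two years per person), read from inline text.
# With real data: raw <- read.csv("yourfile.csv")
raw <- read.csv(text = "id,year,weekly_earn,weekly_hours,educ,region
1,2018,800,40,12,South
1,2019,850,40,12,South
2,2018,1200,45,16,West
2,2019,NA,45,16,West
3,2018,600,30,12,Northeast
3,2019,650,32,14,Northeast")

# Check that the import worked: column types and first values
str(raw)

# Create variables for the analysis
raw$hrwage <- raw$weekly_earn / raw$weekly_hours   # hourly wage
raw$south  <- as.numeric(raw$region == "South")    # dummy from a categorical variable

# Keep only the variables of interest in a new data frame
dat <- raw[, c("id", "year", "hrwage", "educ", "south")]

# Look for missing values, errors, and outliers
colSums(is.na(dat))
summary(dat)

# Panel data: keep a single cross-section (here 2019) with square brackets
dat2019 <- dat[dat$year == 2019, ]

# Drop observations with missing values and record how many were dropped
n_before <- nrow(dat2019)
dat2019  <- dat2019[complete.cases(dat2019), ]
cat("Dropped", n_before - nrow(dat2019), "observation(s) with missing values\n")
```

Note that lm() also turns a factor, such as factor(region), into dummy variables automatically, leaving out one base category; creating dummies by hand as above makes the base category explicit.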
Estimating a Model
• Start with a model based on your theory (your baseline specification). This should include the variables of interest and the main control variables.
• Add other variables whose role is theoretically less clear, and test their significance (a t-test for a single variable and an F-test for jointly testing multiple variables).
• Check for functional form misspecification (plots, R-squared, etc.).
• Consider reasonable forms: interactions, quadratics, logs, etc.
• Don’t lose sight of your theory and the ceteris paribus interpretation: be careful about including variables that greatly alter the interpretation. For example, the effect of bedrooms on house price conditional on square footage is the effect of an extra bedroom within the same floor space, not the effect of a bigger house.
• Once you have a well-specified model, turn to the standard errors.
• Check for heteroskedasticity using graphs, such as residuals plotted against fitted values. Formal tests (for example, the Breusch-Pagan and White tests) are also available. It is always safe to use heteroskedasticity-robust standard errors and tests.
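The estimation steps above might look like this in R. This is a sketch on simulated, hypothetical housing data: the variable names (price, sqft, bedrooms, lotsize, age) and all coefficient values are invented, and the errors are made heteroskedastic on purpose so the residual plot has something to show. It fits a baseline model, adds less certain controls, runs a joint F-test with anova(), tries a quadratic and an interaction, and plots residuals against fitted values.

```r
set.seed(4400)  # arbitrary seed so the simulation is reproducible

# Simulated, hypothetical housing data (not real estimates). The error
# spread grows with sqft, so the data are heteroskedastic by construction.
n        <- 500
sqft     <- runif(n, 800, 3500)
bedrooms <- pmax(1, round(sqft / 700 + rnorm(n, 0, 0.7)))
lotsize  <- runif(n, 2000, 20000)
age      <- runif(n, 0, 80)
price    <- exp(11 + 0.0004 * sqft + 0.03 * bedrooms - 0.002 * age +
                rnorm(n, 0, 0.05 + sqft / 20000))
houses   <- data.frame(price, sqft, bedrooms, lotsize, age)

# 1. Baseline specification suggested by theory
m_base <- lm(log(price) ~ sqft + bedrooms, data = houses)

# 2. Add variables whose role is less clear; summary() gives a t-test per coefficient
m_ext <- lm(log(price) ~ sqft + bedrooms + lotsize + age, data = houses)
summary(m_ext)

# 3. F-test that lotsize and age are jointly zero (m_base is nested in m_ext)
f_test <- anova(m_base, m_ext)
f_test

# 4. Functional form: quadratic in sqft plus a sqft-by-age interaction
m_quad <- lm(log(price) ~ sqft + I(sqft^2) + bedrooms + age + sqft:age,
             data = houses)
summary(m_quad)$adj.r.squared

# 5. Residuals vs. fitted values: a fan shape suggests heteroskedasticity
plot(fitted(m_ext), resid(m_ext), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

The F-test from anova() assumes homoskedastic errors. For a robust joint test, the lmtest package's waldtest() accepts a vcov argument (for example sandwich::vcovHC), and lmtest::bptest() runs a Breusch-Pagan test.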
Reporting
• You need to report results for your baseline model and for alternative specifications (different functional forms, added control variables, etc.).
• Include the p-values for the significance tests of the variables in your model.
• We will use the stargazer package to produce well-formatted regression tables.
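One way to combine robust inference with a stargazer table is sketched below, on simulated, hypothetical wage data (lwage, educ, exper and their coefficients are invented). The HC1 heteroskedasticity-robust standard errors are computed by hand so the sketch runs with base R alone; the packaged route, sandwich::vcovHC(model, type = "HC1") with lmtest::coeftest(), should give the same standard errors. The table is only printed if stargazer is installed, and the robust standard errors and p-values are passed through its se and p arguments.

```r
set.seed(1)  # arbitrary seed

# Simulated, hypothetical wage data with heteroskedastic errors
n     <- 300
educ  <- sample(8:20, n, replace = TRUE)
exper <- sample(0:40, n, replace = TRUE)
lwage <- 0.5 + 0.08 * educ + 0.02 * exper + rnorm(n, 0, 0.2 + 0.02 * educ)
wages <- data.frame(lwage, educ, exper)

m1 <- lm(lwage ~ educ, data = wages)                       # baseline
m2 <- lm(lwage ~ educ + exper + I(exper^2), data = wages)  # added controls

# HC1 robust standard errors: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1 * n/(n-k)
robust_se <- function(model) {
  X <- model.matrix(model)
  e <- resid(model)
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))
  meat  <- crossprod(X * e)   # each row of X scaled by its residual
  V <- bread %*% meat %*% bread * n / (n - k)
  sqrt(diag(V))
}

# Robust two-sided p-values from the t distribution with n - k df
robust_p <- function(model) {
  t_stat <- coef(model) / robust_se(model)
  2 * pt(-abs(t_stat), df = df.residual(model))
}

se_list <- list(robust_se(m1), robust_se(m2))
p_list  <- list(robust_p(m1), robust_p(m2))
print(round(cbind(estimate = coef(m2), robust_se = se_list[[2]],
                  p_value = p_list[[2]]), 4))

# Regression table with robust standard errors and p-values
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(m1, m2, se = se_list, p = p_list, type = "text",
                       notes = "Heteroskedasticity-robust (HC1) standard errors")
}
```

For the paper you would typically switch type = "text" to "latex" or "html", depending on how you write it up.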
Validity
• Think about omitted variables that are not observed in the data
• Consider whether there is a simultaneous causality problem
• Consider how big a problem measurement error could cause in your model
• You need to mention these issues and their importance even if you cannot solve them.
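How big can measurement error be? Under the classical errors-in-variables assumptions (the observed regressor is x = x* + e, with e uncorrelated with x* and with the regression error u), the OLS slope converges to β1 · Var(x*) / (Var(x*) + Var(e)), so it is biased toward zero. The short simulation here uses made-up values (true slope 2, equal variances for x* and e), so the slope on the mismeasured regressor should be about half its true value. Classical error in the dependent variable, by contrast, only adds noise and leaves the slope roughly unbiased.

```r
set.seed(2)  # arbitrary seed
n <- 100000

# Hypothetical data-generating process: true slope = 2
x_star <- rnorm(n, 0, 1)          # true regressor, variance 1
u      <- rnorm(n, 0, 1)
y      <- 1 + 2 * x_star + u

# Classical measurement error in x, variance 1
x_obs <- x_star + rnorm(n, 0, 1)

b_true <- coef(lm(y ~ x_star))[["x_star"]]
b_obs  <- coef(lm(y ~ x_obs))[["x_obs"]]   # theory: 2 * 1 / (1 + 1) = 1
round(c(true_x = b_true, mismeasured_x = b_obs), 3)

# Classical measurement error in y only: slope stays near 2
y_obs <- y + rnorm(n, 0, 1)
b_y   <- coef(lm(y_obs ~ x_star))[["x_star"]]
round(b_y, 3)
```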
Interpreting your results
• Keep theory in mind when interpreting results
• Be careful to keep ceteris paribus in mind
• Keep in mind potential problems with your estimates, and be cautious when drawing conclusions
• You can often work out the likely direction of bias due to omitted variables, measurement error, or simultaneity
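For omitted variables, the standard two-regressor case shows how to sign the bias. Assuming the true model has regressors x1 and x2 and you regress y on x1 alone, the estimated slope satisfies:

```latex
\begin{align*}
\text{True model:}\quad & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \\
\text{Estimated:}\quad & \tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 \\
\text{Auxiliary:}\quad & \tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1 \\
\Rightarrow\quad & \mathrm{E}(\tilde{\beta}_1 \mid x_1, x_2) = \beta_1 + \beta_2 \tilde{\delta}_1, \\
& \operatorname{sign}(\text{bias}) = \operatorname{sign}(\beta_2) \cdot \operatorname{sign}\big(\operatorname{Corr}(x_1, x_2)\big)
\end{align*}
```

For instance, in the house price example, if square footage (x2) raises price (β2 > 0) and is positively correlated with bedrooms (x1), leaving it out would likely bias the bedrooms coefficient upward.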
Further issues
• Some problems are simply too hard to solve with the available data
• You may be able to approach the problem in several ways, but each approach could still have weaknesses
• Provide enough information for a reader to decide whether they find your results convincing or not
• Don’t worry if you don’t “prove” your theory
• If your results are unexpected, think carefully through potential biases.
• But if you have carefully specified your model and are confident your estimates are unbiased, then that’s just the way things are.