The local board of education has conducted a survey in an attempt to understand potential causes and
correlates of students’ math performance. They are particularly interested in a recent claim that high rates of
alcohol consumption in adolescents lead to failing math scores. Some parents stress that alcohol consumption
may be driving up absences, resulting in lower grades. Others see little evidence of a systematic problem,
suggesting that only some students or schools may be affected. Other parents think that alcohol’s effects are
reduced if young people have supportive families. Due to cost limitations, the board could survey only two of
the fifteen schools in their district, but they have collected a wide range of data on each student in their
Use the statistical techniques you’ve learned in class to evaluate the hypothesis that alcohol consumption
results in lower math grades, looking for evidence either to support this claim or to suggest an alternative
interpretation. Additionally, examine the claims and mechanisms suggested by parents, or some potential
hypotheses of your own. Summarize your findings in a report, and keep in mind that the board of education
is not composed of statistical experts: you’ll need to communicate your findings clearly without relying on
jargon. (Detailed results may be supplied in an appendix, if necessary.) Include in your summary any other
insights you think are helpful in understanding what can and cannot be determined (i.e. what is or is not
causal) from the available data.
HINT: think about what DAGs are implied by each of the parents’ questions, and what regressions you need
for each of those DAGs.
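
As a rough illustration of the hint, the sketch below maps three candidate DAGs to three regressions. It runs on synthetic data only, not on the survey file. The variable names (alcohol, absences, famsup, grade) and every effect size are assumptions made up for the example and are not taken from the codebook. The small ols helper is a hypothetical stand-in for a regression routine; in an .Rmd report the same three models would more naturally be fit with R's lm().

```python
import random

def ols(X, y):
    """Return least-squares coefficients by solving the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic students (assumed structure, NOT the survey data):
# alcohol raises absences, absences lower grades, and alcohol has an
# extra direct penalty only for students without family support.
random.seed(0)
data = []
for _ in range(5000):
    alcohol = random.randint(1, 5)   # hypothetical 1-5 consumption scale
    famsup = random.randint(0, 1)    # hypothetical 0/1 family support flag
    absences = 2.0 * alcohol + random.gauss(0, 1)
    grade = (15 - 0.5 * absences
             - 0.6 * alcohol * (1 - famsup) + random.gauss(0, 1))
    data.append((alcohol, famsup, absences, grade))

y = [g for _, _, _, g in data]

# DAG 1: alcohol -> grade, all paths combined.  grade ~ alcohol
total = ols([[1, a] for a, _, _, _ in data], y)

# DAG 2: alcohol -> absences -> grade (mediation).  grade ~ alcohol + absences
# The alcohol coefficient is now the part NOT carried by absences.
direct = ols([[1, a, ab] for a, _, ab, _ in data], y)

# DAG 3: family support changes alcohol's effect (moderation).
# grade ~ alcohol + famsup + alcohol:famsup
moderated = ols([[1, a, f, a * f] for a, f, _, _ in data], y)

print("total effect of alcohol:        ", round(total[1], 2))
print("alcohol, holding absences fixed:", round(direct[1], 2))
print("absences coefficient:           ", round(direct[2], 2))
print("alcohol slope, no family support:", round(moderated[1], 2))
print("change in slope with support:    ", round(moderated[3], 2))
```

By construction, the alcohol coefficient shrinks toward zero once absences enter the model, and the interaction term is positive. Recovering these patterns works here only because the data were simulated from a known DAG; with the observational survey data, the same regressions describe associations whose causal reading depends on which DAG is assumed.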
Your report should be no more than 2000 words of text and 3 figures, all of which should be publication-quality.
Only PDF or HTML files will be accepted (i.e., make sure to write in .Rmd or LaTeX). Please also submit
the COMMENTED code for your analyses. Anything that doesn’t fit in the text can and should go in the appendix.
File “student_alc_math.csv” contains the data, and “codebook.csv” contains the codebook.
I am looking for four key things in this assignment:
1. That you can translate hypotheses to DAGs and DAGs to regressions
2. That you can interpret those regressions correctly and explain their results
3. That you can design and produce high-quality data visualizations that each make one key point and
obey Tufte’s rules
4. That you can explain the causal limitations of your analyses.
All your code and all your writing must be YOUR OWN. You may discuss analysis strategies or data
visualization strategies with your classmates, but you may not share code.