Now that I have a fair bit of data on my students’ performance and participation in the class, I’ve been champing at the bit to start analyzing it. I figured this would have to wait until the end of the semester since my plate is pretty full with class prep and other responsibilities, but the other day over lunch, my friend Edward and I had a great idea: Why not combine the two and analyze the data during lecture? It would make a great introduction to multiple regression and hopefully teach students how to use their study time more efficiently.
I started by collecting even more data with yet another survey of the class:
On Tuesday night, while I waited for the data to come in, I prepared Wednesday’s lecture. I would start by showing the correlation between study hours and exam scores using bivariate regression. I expected the relationship to be weakly positive because it was confounded with prior background. That is, those students who had seen some or most of the material before (about half the class) would study less but still perform well on the exam. Then I would show them how to control for prior background and we’d see a decent estimate of the causal effect of studying and everyone would say “Oh! I should study more!”
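The confounding story I had in mind can be sketched with simulated data. This is a minimal illustration, not the class data: the variable names (`prior`, `study_hrs`, `score`), the sample size, and every number in the data-generating process are assumptions chosen only to mimic the expectation described above, with a true effect of one point per study hour.

```stata
* Hypothetical simulation: prior background confounds the study-score relationship.
* All names and parameter values here are illustrative assumptions.
clear
set seed 12345
set obs 1000

* About half the "class" has seen the material before
gen prior = runiform() < 0.5

* Students with prior background study less (3 fewer hours on average)
gen study_hrs = 6 - 3*prior + rnormal()

* True causal effect of studying: +1 point per hour; prior background: +3 points
gen score = 70 + 1*study_hrs + 3*prior + rnormal(0, 5)

* Bivariate regression: the slope on study_hrs is biased toward zero
reg score study_hrs

* Controlling for the confounder: the slope should land near the true value of 1
reg score study_hrs prior
```

Under these assumptions the first slope should come out well below 1 and the second close to it, which is the pattern I expected to show in class.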
Wednesday morning at 10am, I closed the survey with a 74% response rate and by 11am all my data was set up nicely. The lecture was completely ready–I just needed to run the regressions and drop them into my slides. Here’s what I got with the first one:
. reg midterm midterm_study_hrs
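      Source |       SS       df       MS              Number of obs =     107
-------------+------------------------------           F(  1,   105) =    0.62
       Model |  55.3687445     1  55.3687445           Prob > F      =  0.4343
    Residual |  9436.22004   105  89.8687623           R-squared     =  0.0058
-------------+------------------------------           Adj R-squared = -0.0036
       Total |  9491.58879   106  89.5432904           Root MSE      =  9.4799

------------------------------------------------------------------------------------
          midterm |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
------------------+-----------------------------------------------------------------
midterm_study_hrs |  -.2579158   .3285868    -0.78    0.434    -.9094427     .393611
            _cons |    81.5407   2.494516    32.69    0.000     76.59454    86.48687
------------------------------------------------------------------------------------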

Those of you familiar with regression will immediately see a problem. Each additional hour of studying is associated with a quarter-point drop in the average exam score. It’s not statistically significant, but still. I took a deep breath and controlled for prior exposure to the material:
. reg midterm midterm_study_hrs nostats
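      Source |       SS       df       MS              Number of obs =     107
-------------+------------------------------           F(  2,   104) =    0.63
       Model |  113.430728     2   56.715364           Prob > F      =  0.5352
    Residual |  9378.15806   104  90.1745967           R-squared     =  0.0120
-------------+------------------------------           Adj R-squared = -0.0071
       Total |  9491.58879   106  89.5432904           Root MSE      =   9.496

------------------------------------------------------------------------------------
          midterm |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
------------------+-----------------------------------------------------------------
midterm_study_hrs |  -.2160917   .3332468    -0.65    0.518    -.8769325    .4447492
          nostats |   2.363543   2.945505     0.80    0.424    -3.477503     8.20459
            _cons |   80.98032   2.594512    31.21    0.000     75.83531    86.12534
------------------------------------------------------------------------------------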

Uh oh. Studying still has a negative effect and having no background in statistics increases exam scores. Again, nothing’s significant, but God help me if my class interprets these as causal effects. The most interesting result I had was that students who filled out the survey scored a highly significant 5 points higher on the midterm than those who didn’t. After a few more deep breaths, I fleshed out my lecture and walked to the classroom to teach.
I thanked them for filling out the survey and showed them some summary statistics: for example, there was fairly wide variation in how much time they spent studying regularly, in how much they studied specifically for the midterm, and in how often they attended lecture. Then I showed them the results for the first regression and asked for a volunteer to interpret the coefficient on study hours. There was a moment of dead silence followed by nervous laughter. One student eventually stated that more studying was associated with lower exam scores, and there was more laughter.
That’s when we stepped back to look at the assumptions that underlie regression and together we identified which ones didn’t hold in this case. We spent the rest of the class extending the regression model and found that there was really nothing we could do with this data to get around the fundamental problem: unobserved confounders prevented us from plausibly estimating a causal effect of studying or lecture attendance. The model below includes everything I had, but the results of more parsimonious models look extremely similar:
. reg midterm i.q1 i.q2 i.q3 i.q4 i.q5
      Source |       SS       df       MS              Number of obs =     107
-------------+------------------------------           F( 13,    93) =    1.13
       Model |  1292.13126    13   99.394712           Prob > F      =  0.3467
    Residual |  8199.45753    93    88.16621           R-squared     =  0.1361
-------------+------------------------------           Adj R-squared =  0.0154
       Total |  9491.58879   106  89.5432904           Root MSE      =  9.3897

------------------------------------------------------------------------------------
          midterm |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
------------------+-----------------------------------------------------------------
            q1_bg |
             some |  -3.090785   3.193788    -0.97    0.336    -9.433014    3.251444
             none |  -2.547597   3.235462    -0.79    0.433    -8.972583    3.877389
                  |
      q2_lectures |
              1-5 |  -1.908923   7.591896    -0.25    0.802    -16.98492    13.16708
             6-10 |  -7.650552   7.231432    -1.06    0.293    -22.01074    6.709638
            11-12 |  -6.123254   7.188493    -0.85    0.397    -20.39818    8.151669
               13 |  -2.858527   7.210257    -0.40    0.693    -17.17667    11.45961
                  |
     q3_reg_study |
          3-4 hrs |  -5.663912   2.648944    -2.14    0.035    -10.92419   -.4036348
          5-6 hrs |   -5.11663   3.149536    -1.62    0.108    -11.37098    1.137725
           7+ hrs |  -5.584407   4.220263    -1.32    0.189    -13.96501    2.796198
                  |
 q4_midterm_study |
          4-8 hrs |   .2385502   2.822675     0.08    0.933    -5.366723    5.843823
           9+ hrs |   -.562619   2.874504    -0.20    0.845    -6.270815    5.145577
                  |
     q5_how_study |
    half and half |  -1.760089   2.395351    -0.73    0.464    -6.516782    2.996603
   mostly friends |  -6.778929   5.962874    -1.14    0.259    -18.62001    5.062157
                  |
            _cons |   92.27343   7.876745    11.71    0.000     76.63177    107.9151
------------------------------------------------------------------------------------

During the last twenty minutes of class we broke into groups of 5-6 to talk about what we could do to remedy the situation. My teaching assistants and I wandered the room and facilitated the discussion. The groups came up with some good ideas. We could try to measure and control for raw mathematical ability and how much sleep students got the night before the exam, since these were likely confounders. Another group thought we could collect the same measures before the final exam, then see how changes in exam scores correlate with changes in study habits. That group invented individual fixed effects and I’m definitely going to do what they suggested.
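For readers who want to see what that last suggestion might look like in practice, here is a rough sketch on simulated data. Nothing in it comes from my class: the variable names (`student_id`, `exam`, `ability`, `study_hrs`, `score`) and all parameter values are hypothetical, and it assumes two exams per student with a time-invariant unobserved ability that lowers study time and raises scores.

```stata
* Hypothetical two-exam panel: unobserved ability is fixed within each student.
* All names and parameter values are illustrative assumptions.
clear
set seed 2024
set obs 100
gen student_id = _n

* Unobserved confounder, constant across exams for a given student
gen ability = rnormal(0, 8)

* Two rows per student: exam 1 (midterm) and exam 2 (final)
expand 2
bysort student_id: gen exam = _n

* Higher-ability students study less; true effect of studying is +1 point per hour
gen study_hrs = max(0, 6 - 0.3*ability + rnormal(0, 2))
gen score = 75 + ability + 1*study_hrs + rnormal(0, 4)

* Pooled regression ignores ability, so the study_hrs slope is confounded
reg score study_hrs

* Individual fixed effects use only within-student changes between exams
xtset student_id exam
xtreg score study_hrs, fe
```

With only two exams per student, the fixed-effects estimate is equivalent to regressing the change in score on the change in study hours. It removes confounders that stay constant for a student, such as raw mathematical ability, but not ones that change between exams, such as sleep the night before.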
In hindsight, I wish I had introduced multiple regression with a case study or two where it worked well. At the same time, I think they learned a healthy respect for the difficulty of data analysis and the limitations of seemingly advanced methods.