Mark Urban-Lurain is the Associate Director for Engineering Education Research at Michigan State University. He’s also the Principal Investigator on an NSF-sponsored project developing methods and software for Automated Analysis of Constructed Responses. Open-ended questions force students to think differently than multiple choice questions do, but they are much harder to grade. In this episode we talk to Mark about how the project uses machine learning to evaluate and analyze free-text answers in order to shed new light on student understanding and misconceptions.
You can subscribe to the Teach Better Podcast through your favorite podcast app or simply subscribe through iTunes if you don’t have one yet.
0:00 ⏯ Intro
0:35 ⏯ Welcome Mark Urban-Lurain! Where pedagogy and technology meet at automated analysis of constructed responses. How Mark got started combining technology and teaching, and a long history at Michigan State University going back to instructional TV. Using mainframes, teletype machines, and statistics to improve teaching. People are sometimes more open with computers than with other humans.
4:30 ⏯ It’s not about the technology, it’s the learning, says a Spartan through-and-through. Starting to analyze short-answer questions with data from a biology professor by adapting a tool for analyzing survey responses. From biology to statistics, chemistry, chemical engineering, and now physiology. K-12 pedagogical content knowledge and middle-school science argumentation.
7:11 ⏯ Why free response vs. multiple choice? Students study differently, using higher-order strategies, when preparing for an open-response test, and they perform better on the multiple-choice questions as well. Getting more insight into the mix of right and wrong ideas about the topic. You can also discover misconceptions this way. How this is different from analyzing essays, as ETS does. An example from genetics which gets at a central concept in biology. The development of the questions is very labor-intensive. Students are very confused about this topic. Mark and his team have used additional interviews and questions to identify why, and this paved the way for instructional materials which actually improve performance on these questions.
14:46 ⏯ Another example question, this one from cellular biology: cellular respiration and energy from glucose. Explaining a phenomenon vs. demonstrating your knowledge of the discipline’s vocabulary. The software reads hundreds of correct and wrong answers and “learns” to recognize the differences. Using analytical vs. holistic scoring rubrics. The whole process requires a lot of work up front, but eventually the process takes only a few minutes to “understand” the differences between right and wrong answers.
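The idea of training on hundreds of scored example answers can be illustrated with a toy sketch. This is not the AACR project's actual algorithm, just a minimal nearest-centroid bag-of-words classifier in plain Python, with invented student responses (here about a classic plant-mass misconception) standing in for real scored data:

```python
from collections import Counter
import math

def bag_of_words(text):
    """Lowercase, strip simple punctuation, count word frequencies."""
    return Counter(text.lower().replace(".", "").replace(",", "").split())

def centroid(answers):
    """Combine word counts over a set of scored example answers."""
    total = Counter()
    for a in answers:
        total.update(bag_of_words(a))
    return total

def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def classify(answer, correct_examples, wrong_examples):
    """Label a new answer by which centroid its word profile is closer to."""
    bow = bag_of_words(answer)
    sim_right = cosine(bow, centroid(correct_examples))
    sim_wrong = cosine(bow, centroid(wrong_examples))
    return "correct" if sim_right >= sim_wrong else "incorrect"

# Invented toy training answers (real use would need hundreds of scored responses)
correct = ["the mass comes from carbon dioxide in the air",
           "carbon from co2 is fixed into glucose"]
wrong = ["the plant gets its mass from the soil",
         "nutrients in the soil build the plant body"]

print(classify("most of the mass comes from co2 fixed from the air",
               correct, wrong))  # prints "correct"
```

Working purely at the lexical level like this is why, as Mark notes later in the episode, the upfront work goes into collecting and expert-scoring many example answers rather than into hand-writing rules.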
19:19 ⏯ The computer is matching words and groups of words: It works at the lexical level. The goal is to get the same level of inter-rater reliability between the machine learning and experts as between experts: The kappa should be 0.80 or better. The wrong answers represent conceptual errors that can be addressed through instruction. Preliminary evidence suggests that the software can ignore things like poor language skills. The software is intended for formative assessment.
24:28 ⏯ This machine learning approach has limitations: It can’t, for instance, evaluate compare/contrast answers well. The wording of the question can be critical to getting the software to learn which answers are better. It can’t just be a vocab test. A first-year biology textbook has more vocabulary than a first-year French textbook. The vocabulary is shorthand for ideas.
27:57 ⏯ Instructors get the questions, put them in whatever LMS they’re using, administer it as a low-stakes homework assignment. The instructor loads a spreadsheet of the student responses into the AACR website, and the software gives feedback about the class as a whole. AACR is both an on-going NSF-funded research project, and also a web portal where faculty can use the software as a service. The AACR website can be found by Googling “AACR constructed response.” You can also just go to http://www.msu.edu/~aacr/
32:32 ⏯ Many professors are skeptical at first. Then they often simply want to know how many got the questions right. But after that, professors drill down to what students are NOT understanding, and the instructional strategies that can remedy the problems. AACR as faculty development is not about the technique: it’s about understanding what students DON’T know. One way to address student misconceptions is to drive the student to reach a contradiction: i.e., the Socratic method. Faculty have developed exercises for addressing common misconceptions. These are the kinds of issues that interest faculty, and faculty enjoy talking to peers about them.
38:11 ⏯ This is a research project, so they’re still learning. One next step is to develop heuristics to speed up the process of developing questions and ‘teaching’ the machine. Doug on research: “If you found it the first time, it would just be called ‘search.’” Doug heard this from Desika Narayanan who quoted his PhD advisor on The Pulse. Science as a process rather than a pile of facts.
43:09 ⏯ Thanks and signing off.