Mirjam Neelen & Paul A. Kirschner
We want people to learn better. To this end, schools are implementing new curricula, lesson plans, learning management systems, and so forth. Companies are delivering new training programmes, using virtual reality, or rolling out coaching trajectories. And researchers are carrying out cutting-edge research in which children or adults discuss more, think more deeply, or whatever. And now what? Were these interventions worth their weight in gold or in dung? And how do we even answer that question? The ‘normal’ answer could be: we measure whether they’ve learnt, but… we all know it’s not that simple.
When it comes to measuring learning, especially as it relates to research on teaching and education, there are at least two problems. First, there are those who claim that they have measured learning when they actually haven’t. Second, there are those who claim that you can’t measure learning at all because learning ‘happens in your head’; we can only measure behaviours or performance.
Problem 1: Measuring learning using poor proxies
Paul is Editor-in-Chief of the Journal of Computer Assisted Learning. The title says it all: it’s a journal that serves as a platform for researchers who study learning in which information and communication technologies play an important role. However, he regularly has to deal with submissions in which learning – in whatever shape or form – just hasn’t been measured. People submit studies that measure learner engagement or learner well-being, whether learners report having worked hard or found the lessons interesting, and/or what learners or their instructors say has been learnt and, if so, what and how much. Finally, he sees submissions that measure the extent to which learners enjoyed working with the learning intervention (apparently important in ICT studies).
These are all examples of poor proxies (substitutes) for learning. We have blogged about these before (see here) and of course there’s the original article by Robert Coe (2013). As a reminder, here’s Coe’s list of poor proxies for learning:

1. Students are busy: lots of work is done (especially written work)
2. Students are engaged, interested, motivated
3. Students are getting attention: feedback, explanations
4. Classroom is ordered, calm, under control
5. Curriculum has been ‘covered’ (i.e., presented to students in some form)
6. (At least some) students have supplied correct answers (whether or not they really understood them or could reproduce them independently)
Such studies raise the question – shamelessly stolen from David Merrill:
“And? Have they learnt anything?”
It’s probably not surprising that articles that use poor proxies for learning aren’t considered for review, let alone publication, in ‘his’ journal (and preferably also not in other research journals, although if this were really the case, this blog probably wouldn’t have been written).
Take a recent article on supporting a self-directed and discovery-based learning environment. In the study, the authors used an instructional design approach based on ‘constructivistic learning theories’ [their words] in order to implement various strategies to support learners. The researchers looked at learner engagement in the self-directed online learning environment. So far, so good.
The results, based on learners’ interactions and their commitment to the available learning modules, made it possible, according to the authors, to conceptualise a multimodal supportive strategy for self-directed discovery learning. They also claimed that their recommendations can serve as examples of how to support self-directed learning in blended learning environments. Huh?
We’re not being sarcastic here when we say that we think it’s great when learners are engaged and committed. That’s what we as teachers, instructional designers, and/or learning experience designers want. What we also, and possibly more importantly, want is for the learners to have learnt. And, thus, the question ‘Did they learn anything?’ remains. Why would anyone follow and/or implement these authors’ recommendations? Wouldn’t that be risky (and probably costly with respect to time, teacher/instructor effort, and student/learner effort) if we don’t know whether the perceived engagement/commitment contributed anything to actual learning?
For educational/learning researchers or anyone who wants or needs to leverage research results for whatever reason (hobby, practice, more research): It’s really fantastic if and when we try to study different things through a wide variety of lenses and/or try to do our work in an evidence-informed manner, but really, make sure you answer the question: And? Have learners learnt anything?
Also, for those of us who want or need to measure learning in whatever way, shape, or form in practice to prove impact: We’re not claiming it’s easy. Often, it’s actually really difficult. There are tons of variables, we’re often dealing with multiple systems with multiple types of data points, and this list of challenges goes on and on. This is a given. Just be real. Don’t claim you’ve measured learning if you actually haven’t. Don’t propagate or implement approaches or innovations if you don’t know if they have led to learning. And don’t claim that we don’t even have to try, which brings us to the second problem.
Problem 2: Claiming that we can’t measure learning
In contrast to those claiming that they’re measuring learning while they’re not, there are also people who claim that we can’t measure learning at all. They say that we can measure performance, behaviours, or knowledge retention, but that learning refers to the ‘thing’ that happens in our heads and is often a long, messy process. Therefore, they conclude, we shouldn’t claim that we can measure learning and, if we can’t measure it, we don’t need to try.
Of course, this is a bit of a semantic discussion. Simply claiming that you can’t measure learning is a) too generic and b) dangerous, because if we accept that we can’t measure learning, we’ll stop trying, and if we stop trying, we’ll forever remain a field that does things based on intuition, gut feelings, and beliefs. If you’re a reader of our blogs, then you know us a little bit by now and you understand that, in our view, this is extremely harmful for learners. It’s definitely possible to measure learning and performance outcomes, effectiveness, efficiency, and (in a research context) learning processes. Put all together: it’s possible to measure learning.
Here’s an example of an article that used eye tracking to study expertise reversal of the multimedia signalling effect (Richter & Scheiter, 2019). Multimedia signals highlight correspondences between text and pictures. This is supposed to support text-picture integration in our brains (learning!) from multimedia. Previous research suggests that learners with little prior knowledge benefit from multimedia signalling while learners with much prior knowledge don’t (hence ‘expertise reversal’). The researchers tracked learners’ eye movements while they used a digital textbook. The textbook had two versions: a basic version with mostly text signals (e.g., bold face) and an extended version with additional multimedia signals supporting text-picture integration (e.g., colour coding of corresponding text and picture elements). The researchers assessed learning outcomes as well as learners’ cognitive load and gaze behaviour. The results indeed showed that learners lacking prior knowledge were supported by the multimedia signalling while learners with prior knowledge weren’t (though it wasn’t a disadvantage either, so it’s only a partial expertise reversal effect). This study demonstrates that it’s possible to study the learning process using eye tracking and to relate that process to learning outcomes. It’s a nice example of how we can learn more about how people approach learning and how their approach impacts learning outcomes.
What to do?
Again, measuring learning isn’t easy and we need to define it well. Will Thalheimer (2018) developed the Learning-Transfer Evaluation Model (LTEM) and we feel it’s a helpful tool to a) consider what ‘kind of learning’ you’re actually measuring, or should be measuring, depending on the learning or performance objective, so that b) you avoid poor proxies for measuring learning and/or can be realistic about what you’re able to measure (sometimes, in real life, there are only so many possibilities, depending on context, time, stakeholders, systems in place, and so forth).
In Thalheimer’s words, the model is designed “to help organisations and learning professionals determine whether their evaluation methods are effective in providing valid feedback” (p. 11). Note that the LTEM is designed for both workplace learning and education settings.
| Tier | Description | Measures (some examples) |
| --- | --- | --- |
| 1-3 | These tiers are completely inadequate and therefore unacceptable for measuring learning. These are the ‘poor proxies’ tiers, so to speak. | Surveys, interviews, focus groups. |
| 4 | This is about knowledge retention (e.g., facts, terminology, meaningful constructs, including concepts, principles, generalisations, processes, and procedures) and can be measured immediately after training or with a delay. If knowledge retention is required, you also need to think about how to help learners remember long-term (e.g., spaced learning). | Multiple choice, multiple response, or open questions. |
| 5 | This measures the learner’s ability to make decisions. | Realistic (low- or high-fidelity) scenarios, asking learners to make realistic decisions based on their interpretation of the scenario information, or asking learners to make decisions in real-world situations. |
| 6 | This is about task competence, which means making the right decision in combination with taking the right action. | Present learners with realistic situations, have them evaluate those, enable them to make decisions, and have them take actions in line with those decisions. |
| 7 | This refers to actual demonstrated competence at work (i.e., learning transfer). | Select a relevant performance situation to target. |
| 8 | This is about ‘learning as a means to an end’. For example, “we train nurses not just so they transfer their skills to their work, but because we expect to get better patient outcomes. We train managers in leadership skills not just so they perform better as managers, but because we hope their teams will achieve better results” (p. 24). | “(a) consider learning outcomes that affect an array of stakeholders and their environs, (b) look for both benefits and harms, and (c) use rigorous methods to ensure we’re able to draw conclusions about the causal impact of the learning experience itself” (p. 25). |
Again, none of this is easy, but that shouldn’t be an excuse not to try! Let’s STOP pretending we measure learning when all we do is use poor proxies for learning, and let’s STOP saying that we can’t measure learning at all, because we can. As long as we properly define what we mean when we say ‘learning’, and as long as we acknowledge that we can measure at different levels, there are options. Only then can we answer the question: “…annnnd, did they learn anything?”
Coe, R. (2013). Improving education: A triumph of hope over experience. Inaugural lecture of Professor Robert Coe, Durham University, 18 June 2013. Essay version available at http://www.cem.org/attachments/publications/ImprovingEducation2013.pdf; video at https://vimeo.com/70471076
Richter, J., & Scheiter, K. (2019). Studying the expertise reversal of the multimedia signaling effect at a process level: evidence from eye tracking. Instructional Science, 47(6), 627-658.
Thalheimer, W. (2018). The Learning-Transfer Evaluation Model. Available at https://www.worklearning.com/wp-content/uploads/2018/02/Thalheimer-The-Learning-Transfer-Evaluation-Model-Report-for-LTEM-v11.pdf
 A proxy or proxy variable is a variable that is not in itself directly relevant, but that serves in place of an unobserved or not actually measured variable.
 For all examples, we have opted NOT to include references to the researchers, as the intention is not to ‘blame and shame’ but to give illustrative examples.
 Note that tier 3 is a special case within tiers 1-3 because there are useful things you can measure at the learner-perceptions level (although, we repeat, you’re NOT measuring learning). You can ask learners if they think they’ll have an opportunity to apply what they’ve learned on the job, if they feel they’ll have the right level of support back on the job, how they experienced a training, what they found difficult or easy, etcetera.
 Thalheimer (2018) discusses the difference between decision-making and competence extensively in his report. To put it simply, the difference is knowing how to do something well and actually doing it well. Thalheimer distinguishes between task competence (i.e., demonstrating competence to complete a task successfully in or right after training) and transfer (tier 7).
 Thalheimer uses two criteria for learning transfer: 1) people have had to previously engage in some sort of learning experience and 2) they have to use what they’ve learned on some other targeted performance situation (e.g., a job or hobby).