Do You Speak Python? Coding as a Second Language

Paul A. Kirschner & Mirjam Neelen


Should children learn to code[1], starting at a young age? We don’t really think so, but there are others who believe it to be absolutely critical. It really doesn’t matter very much what we or others think, as both sides of the discussion are offering nothing more and nothing less than opinions. As far as we know, there hasn’t been any real research on this topic, and until we have some evidence, no one is right or wrong.

But, no matter whether you believe that it’s important to teach children how to code or not, what does matter is that, if you teach children or young adults how to code, you need to do it well.

Like Tony Jenkins said back in 2002:

If computing educators are ever to truly develop a learning environment where all the students learn to program quickly and well, it is vital that an understanding of the difficulties and complexities faced by the students is developed. At the moment, the way in which programming is taught and learned is fundamentally broken (p 1).

And

Papers describe how visual hooks and props can be used to engage an audience, or how programming can be taught almost by subterfuge through the medium of a game. These are all fine ideas, but there is seldom any sign of any concrete evidence that these ways of teaching have any impact on learning (p 1).

Programming is usually seen as a STEM subject (Science, Technology, Engineering, Mathematics). For example, programming classes in college usually require advanced math courses as prerequisites (Prat et al., 2020). Interestingly, however, when we look at some studies on how people learn to code, it seems that coding actually shares more features with learning to speak and write a second (natural) language than it does with maths or engineering. Maybe, just maybe, there’s a reason that we speak of programming or coding languages?

Such languages have their own words and symbols and also have their own grammars and syntaxes. When you code, you create meaning by stringing the symbols of the language together in a rule-based manner (especially grammar and syntax).

For example, Denny et al.’s (2011) research shows that people learning how to code – in their case in Java – mostly struggle with the syntax. Their approach was to analyse the correctness of code submitted by learners as part of a drill and practice activity, in which they attempted to solve short Java programming exercises. ‘Drill and practice’ refers to problem-based exercises in which learners are presented with a description of a problem along with a corresponding method header. The learner then submits implementations of the method, which are then evaluated against a set of test cases (see student-authored question example – Figure 6 in the study itself).


Student-authored question example
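
To make the drill and practice set-up more concrete, here is a minimal sketch of how such a submission could be checked. Note the assumptions: the study used Java, while this sketch uses Python for brevity, and the exercise, the function name `larger`, the `grade` helper, and the test cases are all invented for illustration; none of them come from Denny et al. (2011).

```python
def grade(source, function_name, test_cases):
    """Classify one submission as 'syntax error', 'failed tests', or 'passed'."""
    try:
        # compile() parses the code without running it;
        # malformed code raises SyntaxError at this point.
        code = compile(source, "<submission>", "exec")
    except SyntaxError:
        return "syntax error"

    namespace = {}
    try:
        # Running the code defines the learner's function.
        # (A real grader would run untrusted code in a sandbox.)
        exec(code, namespace)
        function = namespace[function_name]
        for args, expected in test_cases:
            if function(*args) != expected:
                return "failed tests"
    except Exception:
        # Missing function or a crash while running counts as a failure.
        return "failed tests"
    return "passed"


# Hypothetical exercise: "Return the larger of two integers."
TEST_CASES = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]

correct = "def larger(a, b):\n    return a if a > b else b\n"
missing_colon = "def larger(a, b)\n    return a if a > b else b\n"
wrong_logic = "def larger(a, b):\n    return a\n"

for submission in (correct, missing_colon, wrong_logic):
    print(grade(submission, "larger", TEST_CASES))
# prints: passed, syntax error, failed tests (one per line)
```

The second submission never reaches the test cases: a single missing colon is enough to stop it, which is the kind of syntax barrier the study is about.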

Paul Denny and his colleagues looked at learner syntax errors and also analysed to what extent the frequency of errors was related to their overall performance. They found that weaker learners created source code with syntax errors in 73% of the cases and even the best learners did so around 50% of the time.

Another example is a recent study by Chantel Prat and her colleagues (2020) from the University of Washington. Their work shows that an aptitude for learning a second language is a stronger predictor for learning how to code than numeracy skills or fluid cognitive measures, such as fluid reasoning or working memory (the researchers included these because they’re known to relate to complex skill learning more generally).

In the study, people had to learn how to code a game in Python. The researchers say that they chose Python for a reason. Its philosophy aims to be ‘reader friendly’ and many of the ways this is accomplished have linguistic relevance. For example, Python uses indentation patterns that mimic paragraph style hierarchies instead of curly brackets. It also uses words (e.g., ‘not’ and ‘is’) to denote operations, instead of symbols that are used in other programming languages.
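
A small example of our own (not one from the study) shows what this looks like in practice. The function name `can_play` and its arguments are hypothetical; the comment shows roughly how the same check might look in a curly-bracket language.

```python
def can_play(player, banned_players):
    # Roughly what a curly-bracket language would write as:
    #   if (player != null && !banned.contains(player)) { return true; }
    # Python spells the operations out as words (is, not, and, in)
    # and uses indentation alone to show which lines belong together.
    if player is not None and player not in banned_players:
        return True
    return False


print(can_play("ann", ["bob"]))  # True
print(can_play("bob", ["bob"]))  # False
print(can_play(None, ["bob"]))   # False
```

Read aloud, the condition is close to an English sentence: "if player is not None and player not in banned players".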


The researchers assessed the learning outcomes in three different ways: 1) learning rate, 2) programming accuracy (creating a Rock, Paper, Scissors Game), and 3) declarative knowledge (accuracy on a 50-item multiple choice test, including questions around the general purpose of functions, semantic knowledge, and syntactic knowledge).

Prat and her colleagues found that language aptitude was the strongest predictor of learning rate when learning how to code in Python. Although general cognitive abilities and numeracy skills also predicted Python learning rate to some degree, each of these factors explained less variance than language aptitude did.

In a nutshell, they found that across the various learning outcomes, learners’ language aptitude, fluid reasoning and working memory, and resting-state brain activity were all more robust predictors of Python learning than numeracy.

The researchers think this is because learning to write code is similar to learning a second language: it relies on the ability to understand the vocabulary and grammar of that language, as well as how they interact to communicate ideas and intentions. In an interview in EurekAlert, Prat says that

Many barriers to programming, from prerequisite courses to stereotypes of what a good programmer looks like, are centred around the idea that programming relies heavily on math abilities, and that idea is not borne out in our data.

She also says that

This is the first study to link both the neural and cognitive predictors of natural language aptitude to individual differences in learning programming languages. We were able to explain over 70% of the variability in how quickly different people learn to program in Python, and only a small fraction of that amount was related to numeracy.

Another example: Felienne Hermans, Alaaeddin Swidan, and Efthimia Aivaloglou (2018) studied how we can improve coding instruction by borrowing from instructional approaches that are usually used in language learning. For example, research on how children learn to read shows that, when novice readers read a text out loud (oral reading or vocalisation), they understand the text better than when they read in silence. The idea is that vocalisation focuses attention on the text. Hermans and her colleagues studied whether reading code out loud supports learners in understanding the code as well. The first results are encouraging. Learners who read the code out loud remembered better what was in the code without losing out on their understanding of the code.
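
To give a feel for what reading code out loud might involve, here is a small sketch that turns a line of Python into words. This is our own illustration and an assumption throughout: the `vocalise` function and the spoken forms in `SPOKEN` are invented here and are not the vocalisation scheme used in the studies.

```python
import io
import tokenize

# Hypothetical spoken forms for a few symbols (our own choices).
SPOKEN = {
    "=": "is set to",
    "==": "equals",
    "+": "plus",
    "-": "minus",
    "(": "open bracket",
    ")": "close bracket",
    ":": "colon",
}

# Layout tokens that carry no spoken content in this simple sketch.
SKIPPED = {
    tokenize.NEWLINE,
    tokenize.NL,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
    tokenize.COMMENT,
}


def vocalise(source):
    """Return a word-by-word 'read aloud' version of the source code."""
    words = []
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if token.type in SKIPPED:
            continue
        # Symbols get a spoken form; names and numbers are read as written.
        words.append(SPOKEN.get(token.string, token.string))
    return " ".join(words)


print(vocalise("total = price + tax"))
# prints: total is set to price plus tax
```

Even this tiny example forces a choice (is `=` read as "is set to", "becomes", or "equals"?), which hints at why agreeing on how code should sound is a question in its own right.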

Hermans’ next step is the transfer from actual vocalisation to what is called ‘subvocalisation’ (reading out loud in your head). We see in language education that when learners get older, they trust their ‘inner voice’ more. We know that subvocalisation also improves understanding, even for proficient adult readers (e.g., Daneman & Newson, 1992; Prior, Fenwick, Saunders, Ouellette, O’Quinn, & Harvey, 2011; Slowiaczek & Clifton, 1980). Would this also be the case for learning how to code? That’s the question that Hermans will try to answer next.

If we gather more evidence that learning how to code is actually similar to learning a second language, and that numeracy and math skills are far less important than we currently think, this could be a breakthrough! It could lead to a completely different, evidence-informed approach to teaching coding, one in which people learn to code effectively, efficiently, and enjoyably!

Of course, the question of whether it’s useful to learn to code at a young(ish) age still remains.

References

Daneman, M., & Newson, M. (1992). Assessing the importance of subvocalization during normal silent reading. Reading and Writing, 4(1), 55–77. https://doi.org/10.1007/BF01027072

Denny, P., Luxton-Reilly, A., Tempero, E., & Hendrickx, J. (2011). Understanding the syntax barrier for novices. In ITiCSE ’11: Proceedings of the 16th annual joint conference on Innovation and technology in computer science education (pp. 208–212). Association for Computing Machinery. https://doi.org/10.1145/1999747.1999807

Hermans, F., Swidan, A., & Aivaloglou, E. (2018). Code phonology: An exploration into the vocalization of code. In ICPC ’18: Proceedings of the 26th Conference on Program Comprehension (pp. 308–311). Association for Computing Machinery. https://doi.org/10.1145/3196321.3196355

Jenkins, T. (2002, August). On the difficulty of learning to program. In Proceedings of the 3rd Annual Conference of the LTSN Centre for Information and Computer Sciences (Vol. 4, No. 2002, pp. 53-58).

Prat, C. S., Madhyastha, T. M., Mottarella, M. J., & Kuo, C. H. (2020). Relating natural language aptitude to individual differences in learning programming languages. Scientific Reports, 10, 3817. https://doi.org/10.1038/s41598-020-60661-8

Prior, S. M., Fenwick, K. D., Saunders, K. S., Ouellette, R., O’Quinn, C., & Harvey, S. (2011). Comprehension after oral and silent reading: Does grade level matter? Literacy Research and Instruction, 50(3), 183–194. https://doi.org/10.1080/19388071.2010.497202

Slowiaczek, M. L., & Clifton, C. (1980). Subvocalization and reading for meaning. Journal of Verbal Learning and Verbal Behavior, 19(5), 573–582. https://doi.org/10.1016/S0022-5371(80)90628-3

Swidan, A., & Hermans, F. (2019). The effect of reading code aloud on comprehension: An empirical study with school students. In CompEd ’19: Proceedings of the ACM Conference on Global Computing Education (pp. 178–184). Association for Computing Machinery. https://doi.org/10.1145/3300115.3309504

[1] In this blog, programming (or coding) refers to solving relatively straightforward problems as well as writing and evaluating the code.