Programming is also Teaching

by jottinge on August 20, 2021

Programming can be thought of as something that takes place as part of interacting with a culture – a culture with two very different audiences. One “audience” is the CPU, and the other audience is made of other programmers – and those other programmers are typically the ones who get ignored, or at least mistreated.

Programming has two goals.

One goal is to do something, of course: calculate an amortization table, present a list of updated feeds, snipe someone on eBay, or perhaps smash a human player’s army. This goal is aimed at a computing environment.

The other goal is – or should be – to transfer knowledge between programmers. This has a lot of benefits: it increases the number of people who understand a given piece of code, it frees a developer to do new things (since they’re no longer the only person who can maintain a given program or process), and it often yields better code – since showing Deanna your code gives her a chance to point out where your code can improve. Of course, this can be a two-edged sword, because Deanna may have biases that affect the code she writes (and therefore, what you might learn).

The CPU as an Audience

The CPU is an unforgiving target, but its very nature as a fixed entity (in most cases!) means it has well-known characteristics that can be compensated for and, in some cases, exploited.

The language in use has its own inflexible rules; an example can be seen in C, where the normal “starting point” for a program is a function with the signature “int main(int, char **)”. Of course, depending on your environment, you can circumvent that by writing “int _main(int, char **)”, and for some other environments, you’re not expected to write main() at all; you’re expected to write an event handler that the library-supplied main() calls when appropriate.

The point is simple, though: there are rules, and while exceptions exist, one can easily construct a valid decision tree determining exactly what will happen given a program’s object code. Any errors can be resolved by modifying the decision tree to fit what is actually happening (i.e., by correcting errors.)

This is crucially important; flight code, for example, can be and has to be validated by proving out the results of every statement. If the CPU were not strictly deterministic, this would be impossible and we’d all be hoping that our pilot really was paying close attention every moment of every flight.

High-level languages like C (and virtually every other language not called “assembler”) were designed to abstract the actual CPU from the programmer while preserving deterministic properties, with the abstractions growing in scale over time.

Virtual machines bend the rules somewhat, by offering just-in-time compilation; Sun’s VM, for example, examines the most common execution path in a given class and optimizes the resulting machine code to follow that path. If the common path changes later in the run, then it can (and will) recompile the bytecode to run in the most efficient way possible.

Adaptive just-in-time compilation means that what you write isn’t necessarily what executes. While it’s possible to predict exactly what happens in a VM during execution, the number of state transitions is enormous and not normally something your average bear would be willing to undertake.

Adaptive JIT also affects what kind of code you can write to yield efficient runtimes. More on this later; it’s pretty important.

The Programmer as an Audience

The other members of your coding team are the other readers of your code. Rather than reading object code like a CPU does, they read source, and it’s crucially important how you write that source – because you have to not only write it in such a way that the compiler can generate good object code, you have to write it in such a way that humans (including you!) can read it.

To understand how people understand code, we need to understand how people understand.

People tend to learn slowly. This doesn’t mean that they’re stupid; it only means that they’re human.

George Miller’s 1956 paper, “The Magical Number Seven, Plus or Minus Two,” described how people process information. What follows is a poor summary, and I recommend that you read the original paper if you’re interested in learning more.

Basically, people learn by integrating chunks of information. A chunk is a unit of information, which can be thought of as mirroring how a neuron works in the human brain.

People can generally integrate seven chunks at a time, plus or minus two depending on various circumstances.

Learning takes place when one takes the chunks one already understands and adds a new chunk of information such that the resulting set of information is cohesive. Thus, the “CPU as an Audience” heading above starts with simple, commonly-understood pieces of information (“CPUs are predictable,” “C programs start at this function”) and refines them by adding exceptions and alternatives. For some, the paragraphs on C’s starting points make up roughly four chunks – easily integrated by most programmers because the chunk count is low.

If the reader doesn’t know what C is, or doesn’t know what C function declarations mean or look like, those become new chunks to integrate, which may prove a barrier to learning.

Adoption of C++

Another example of chunking in action can be seen in the adoption of C++. Because of its similarity to C – in use by most programmers at the time – it was easily adopted. As it has grown in features, adding namespaces, templates, and other changes, adoption has slowed: not only is there more to understand in C++ than there was, but it’s now different enough from the “normal” language C that it requires a good bit more integration of new chunks than it did.

The result is that idiomatic C++ – where idioms are “the normal and correct way to express things” – is no longer familiar to C programmers. That’s not a bad thing – unless your goal is having your friendly neighborhood C programmer look at your C++ code.

It’s just harder, because there’s more to it and because it’s more different than it used to be.

Here’s the thing: people don’t really want to learn.

This is where things get hard: we have to realize that, on average, people really don’t want to learn all that much. We, as programmers, tend to enjoy learning some things, but in general people don’t want to learn that stop signs are no longer red, but are now flashing white; we want the way things were because they’re familiar. We want control over what we learn.

Our experiences become a chunk to integrate, and since learning is integration of chunks into a cohesive unit, new information can clash with our old information – which is often uncomfortable. Of course, experience can help integrate new information – so the fifth time you see a flashing white stop sign (instead of the octagonal red sign so many are familiar with), you will be more used to it and start seeing it as a stop sign and not something that’s just plain weird.

That said, it’s important to recognize that the larger the difference between what people need to know and what they already know, the harder it will be for them to integrate the new knowledge. If you use closures in front of someone who’s not familiar with anonymous blocks of executable code, you have to be ready for them to mutter that they prefer anonymous implementations of interfaces; named methods are good. It’s what they know. They’re familiar with the syntax. It’s safe.
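To make that gap concrete, here is a minimal Java sketch showing the same sort written both ways: once as an anonymous implementation of an interface, and once as a closure (a lambda). The class and method names (SortStyles, byLengthAnonymous, byLengthLambda) are made up for illustration; only Comparator and List.sort come from the standard library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortStyles {
    // Familiar style: an anonymous implementation of the Comparator interface.
    static List<String> byLengthAnonymous(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return sorted;
    }

    // Closure style: the same comparison expressed as a lambda.
    static List<String> byLengthLambda(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "fig", "banana");
        System.out.println(byLengthAnonymous(words)); // [fig, pear, banana]
        System.out.println(byLengthLambda(words));    // [fig, pear, banana]
    }
}
```

Both methods do the same work. The first simply has more familiar landmarks (a named interface, a named method) for a reader who hasn’t integrated closures yet.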

This is why “Hello, World” is so important for programmers. It allows coders to focus on fairly limited things; most programmers quickly understand the edit/compile/run cycle (which often has a “debug” phase or, lately, a “test” phase, thank the Maven) and “Hello, World” lets them focus on only how a language implements common tasks.

Think about it: you know what “Hello, World” does. It outputs “Hello, World.” Simple, straightforward, to the point. Therefore, you look for the text in the program, and everything else is programming language structure; it gives you an entry point to look for, a mechanism to output text, and some measure of encapsulated routines (assuming the language has such things, and let’s be real: any language you’re working with now has something like them.)
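As a small illustration, here is one way “Hello, World” might look in Java, with comments marking the three things described above. The greeting() helper is my own addition to show an encapsulated routine; the usual version prints the text directly from main.

```java
class HelloWorld {
    // Encapsulated routine: builds the text, separate from printing it.
    static String greeting() {
        return "Hello, World";
    }

    // Entry point: the JVM starts execution here.
    public static void main(String[] args) {
        // Output mechanism: write one line to standard output.
        System.out.println(greeting());
    }
}
```

Find the text, and everything around it is the language’s structure.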

This also gives programmers a means by which to judge how much work they actually have to do to do something really simple. The Windows/C version of “Hello, World,” as recommended by early programming manuals, was gigantic – in simple console-oriented C, it’s four lines or so, and with the Windows API, it turns into nearly seventy. This gives programmers an idea (for better or for worse) of what kind of effort simple tasks will require – even if, as in the case of Windows, a simple message actually has a lot of work to do. (In all fairness, any GUI “Hello World” has this problem.)

So how do you use how people learn to your advantage?

There’s a fairly small set of things to keep in mind when you’re trying to help people learn things quickly.

First, start from a known position – meaning what’s known to the other person. This may involve learning what the other person knows and the terminology they use; they may be using a common pattern, but call it something completely different than you do. Therefore, you need to establish a common ground, using terminology they know or by introducing terminology that will be useful in conversation.

Second, introduce changes slowly. This may be as simple as introducing the use of interfaces instead of concrete classes before diving into full-blown dependency injection and aspect-oriented programming, for example. Small changes give your audience a chance to “catch up” – to integrate what you’re showing them in small, easy-to-digest chunks.

Third, demonstrate idioms. If your language or approach has idiomatic processes that don’t leap off the page (e.g., what most closure-aware people consider idiomatic isn’t idiom at all to people who aren’t familiar with closures), you need to make sure your audience has a chance to see what the “experts” do – because chances are they won’t reach the idioms on their own, no matter how simple they seem to you.

Related to the previous point, try to stay as close to the audience as you can. Idioms are great, but if the person to whom you’re talking has no way to relate to what you’re showing them, there’s no way you’re going to give them anything to retain. Consider the Schwartzian Transform: it creates a map from a set, where the key is the sort field and the value is the element being sorted, sorts based on the keys, then creates a set in the order of the (now-sorted) keys. It uses a function to generate a sortable key in place, which could be a closure.

If your audience doesn’t understand maps well, or the audience is conditioned to think in a certain way (ASM, FORTH, COBOL, maybe even Java?) the Schwartzian Transform can look like black magic; Java programmers have a better chance, but even there it can look very odd because of the mechanism used to generate the sort keys. In Java, it’s not idiomatic, so you’d approach the Schwartzian Transform in small steps to make it easier for the audience to integrate.
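Here is a rough sketch of what “small steps” might look like in Java, with each stage of the transform written out as its own loop or statement. The names (SchwartzianSort, sortBy, keyOf) are hypothetical. Note one deliberate departure from the description above: the sketch pairs keys with elements in a list rather than a map, because a map would collapse elements that share a sort key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class SchwartzianSort {
    static <T, K extends Comparable<K>> List<T> sortBy(List<T> items, Function<T, K> keyOf) {
        // Step 1 (decorate): compute each key once and pair it with its element.
        List<Map.Entry<K, T>> decorated = new ArrayList<>();
        for (T item : items) {
            decorated.add(new SimpleEntry<>(keyOf.apply(item), item));
        }

        // Step 2 (sort): order the pairs by key only.
        decorated.sort(Map.Entry.comparingByKey());

        // Step 3 (undecorate): drop the keys, keeping the elements in sorted order.
        List<T> sorted = new ArrayList<>();
        for (Map.Entry<K, T> pair : decorated) {
            sorted.add(pair.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // The key function is passed as a method reference; a lambda works too.
        System.out.println(sortBy(List.of("banana", "fig", "pear"), String::length));
        // [fig, pear, banana]
    }
}
```

Written this way, each step is one chunk: build pairs, sort pairs, unwrap pairs. Once the audience is comfortable with that, collapsing the three steps into a single stream pipeline is a much smaller leap.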

Conclusion

Programming is not just telling your CPU what to do; it’s also teaching your fellow coders how you work, and learning from them how they work. It’s a collaborative effort that can yield excellent programs and efficient programmers, but can also yield confusing user interfaces and discouraged coders.

The difference is in whether the coding team is willing to take the time to code not just for the CPU or for the team, but for both the CPU and the team.

The CPU is usually easier to write for because it’s less judgemental and more predictable, but a CPU will never become great. It’s got a set of capabilities, fixed in place at manufacturing. It will always act the same way.

A team, though… a team of mediocre, inexperienced coders who work together and write for the benefit of the team has the capability to become a great team, and they can take that learning approach to create other great teams.

It all comes down to whether the team sees its work as simply writing code… or writing with the goal of both code and learning.
