Programming

Educators, generals, dieticians, psychologists, and parents program.
Armies, students, and some societies are programmed.
— ALAN J. PERLIS

from the foreword to Structure and Interpretation of Computer Programs
by Harold Abelson and Gerald Jay Sussman with Julie Sussman (1996)

Contents


Gene P. Buzzard

I think that it’s extraordinarily important that we in computer science keep the fun in computing. When it started out, it was an awful lot of fun.
— ALAN J. PERLIS

following the dedication of Structure and Interpretation of Computer Programs
“... in respect and admiration, to the spirit that lives in the computer.”
by Harold Abelson and Gerald Jay Sussman with Julie Sussman (1996)

Gene Buzzard was our chemistry teacher in high school. He was demanding and something of a curmudgeon, but I vividly remember his smile. He was a fun guy — the type that enjoys a pun, even a threadbare one. Mr. Buzzard, Dr. Buzzard as he was also known by the time I arrived on the scene in the 80s,1 really knew chemistry and how to teach it. Students admired him and worked hard in his classes.

Gene must have worn the same small set of indestructible polyester suits for decades. I think he often wore a tie, but I don’t remember for sure. It was never about the tie.

Occasionally Mr. Buzzard would play this trick on unsuspecting students. With a concerned look on his face he would suddenly inquire, “What’s matter?

Caught with their guards down, perplexed students would take the bait and assure Mr. Buzzard that “Nothing’s the matter.”

Wrong! Matter is anything that has mass and takes up space,” we were reminded.

The team work concept has, in part, allowed Snider High School to be regarded as one of the best public high schools in Indiana and, perhaps, the entire Midwest region. The fact that this has been accomplished in only sixteen years points out the potential involved in department cooperation and team teaching. Throughout this period the goal has remained the same: students should work to their maximum potential and we should not ask less of them. Teachers, as well, must do the same if the concept of team teaching is to remain a valid one and provide answers to many of today's problems.
— GENE P. BUZZARD, “How Can Science Survive in the High School?” (1984)
  1. In May 1983, Gene P. Buzzard was given an Honorary Doctorate in Secondary Science Education from Purdue University at West Lafayette, Indiana. This was the first award of its kind ever given by the University and was based on the recommendation of former students of Mr. Buzzard and faculty members at Purdue. Source: Journal of Chemical Education, Volume 61, Number 10, October 1984, p. 905.

Let’s talk about ...

[Intro: Punch it, (Hurb)]
Yo, I don’t think we should talk about this
(Come on, why not?)
People might misunderstand what we’re tryin’ to say, you know?
(No, but that’s a part of life)
Come on

[...]

Let’s talk about you and me
Let’s talk about all the good things
And the bad things that may be

[...]

Let’s tell it how it is, and how it could be
How it was, and of course, how it should be

— Salt-N-Pepa (1991)

Let’s talk about programming ...

Computer programming is the process of composing programs, which consist of data structures and sequences of operations (value transformations) that solve problems modeled by data.

Let’s talk about programming in general. Let’s also talk about some programming languages, so we can write programs we can execute. Let’s talk about writing programs that work. Let’s talk about expressing programs clearly and concisely. Let’s also talk about how, when confronted with alternate solutions to a problem, we can tell which solution is better, in some way or another, so we can make smart choices about which to use.

Whether we’re using English to talk about programming or discussing the syntax of a particular programming language, we have to be careful, because the languages and symbols and words and statements we employ can trip us up. Words in spoken languages may be ambiguous or have multiple meanings, which can lead to confusion and misunderstandings. On the other hand, the unambiguous meaning of words in computer languages is an opportunity for misuse and error if the one and only true meaning of a piece of code such as a word, an operator, an identifier, or a statement is not properly understood.

C’est la vie.

I need good definitions and good terminology, or else my discussion is just wishy-washy.
So I always try to choose a good name, and a good notation, whenever possible.
I also try to choose one that doesn’t conflict with other names.
— DONALD E. KNUTH, Companion to the Papers of Donald Knuth (2011)

“When I use a word,” Humpty Dumpty said, in rather a scornful tone,
“it means just what I choose it to mean — neither more nor less.”

“The question is,” said Alice,
“whether you can make words mean so many different things.”

“The question is,” said Humpty Dumpty,
“which is to be master — that”s all.”

— LEWIS CARROLL, Through the Looking-Glass (1872)
Lewis Carroll was fully aware of the profundity in Humpty Dumpty’s whimsical discourse on semantics. Humpty takes the point of view known in the Middle Ages as nominalism [...]

Even in logic and mathematics, where terms are usually more precise than in other subject matters, enormous confusion often results from a failure to realize that words men, “neither more nor less” than what they are intended to mean. [...]

[C]onfusion still persists in regard to the multivalued logics in which terms such as “and,” “not,” and “implies” have no common-sense or intuitive meaning; in fact, they have no meaning whatever other than that which is exactly defined by the matrix tables, which generate these “connective” terms. Once this is fully understood, most of the mystery surrounding these queer logics evaporates.

In mathematics equal amounts of energy have been dissipated in useless argumentation over the “meaning” of such phrases as “imaginary number,” “transfinite number,” and so on; useless because such words mean precisely what they are defined to mean: no more, no less.

On the other hand, if we wish to communicate accurately we are under a kind of moral obligation to avoid Humpty’s practice of giving private meanings to commonly used words.

— MARTIN GARDNER, The Annotated Alice (1960)

Problems

Half the battle is knowing what problem to solve.
— ALFRED V. AHO, JOHN E. HOPCROFT, & JEFFREY D. ULLMAN,
Data Structures and Algorithms (1983)
git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful [...] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
— LINUS TORVALDS, Message to Git mailing list (2006)
When using digital computers to solve a problem, programmers model the problem using abstract concepts — the notion of an integer or floating point number, a string of characters, truth values, a list of values, etc. The abstract concepts are then either represented precisely or are sufficiently approximated by patterns of finite numbers of bits (units of data that can assume one of two values) representing finite numbers of values that can appear in various storage areas of computers (memory locations, registers, on disk, in the cloud, ...).

The knapsack problem is a famous, well-studied problem. In one variation, you are given a knapsack with volume and weight limits. The challenge is to find a combination of items from a set of items, each with a weight, volume, and value, that you could put into the knapsack to maximize the combined value of the items in the sack without exceeding the sack’s weight or volume limits.

How, exactly, objects of various shapes can actually be stuffed into a sack notwithstanding, it’s fairly easy to model the knapsack problem.1 Numbers can be used to represent the value, weight, and volume of an item; and the numbers can be transformed (added together) to represent the total value, weight, and volume of items in the knapsack.

Modeling a problem can be very challenging if not impossible when the problem is something like develop a new recipe for chocolate chip cookies that nine out of ten people think is better than any chocolate chip cookie they have ever tasted before.

With an good model of a problem in hand and sensitivity to the limitations of the model, we can begin to think creatively about how to solve the problem by applying a series of operations that transform the values of the model.

[G]ame-playing is usually considered a waste of time and money. Perhaps this is a vestige of Puritanism: We are admonished that it is immoral to have fun while other people are suffering. We should rather be doing something, something useful, so that ... so that ... well, so that we can have some enjoyment in future years when we retire.
— DONALD E. KNUTH, “Are Toy Problems Useful?” (1977)
reprinted in Selected Papers on Computer Science (1996)
  1. As stated, the problem is not without potential pitfalls: Are we talking about selecting from among a few items or trillions of trillions of items? What are the ranges of values, weights, and volumes?

Algorithms

The notion of an algorithm is basic to all of computer programming,
so we should begin with a careful analysis of this concept.
— DONALD E. KNUTH, “The Art of Computer Programming”, Volume 1 (1968)

In section 1.1 of The Art of Computer Programming, Donald Knuth proposes a definition for the word “algorithm,” which he says is very similar to but a little different from a “recipe, process, method, technique, procedure, routine, rigmarole.1

According to Knuth, an algorithm is a finite set of rules that gives a sequence of operations for solving a specific type of problem; furthermore, an algorithm has five important features:
  1. Finiteness. An algorithm always terminates after a finite number of steps.
  2. Definiteness. Each step of an algorithm is precisely defined.
  3. Input. An algorithm has zero or more inputs: values that are given to it before the algorithm begins. The inputs to an algorithm provide a means of expressing a solution that applies to a class of problems rather than just one particular instance of the problem. For example, we can define an algorithm in terms of two input variables, x and y, that computes the value of dividing the value of x by the value of y. We certainly would prefer an algorithm that handles the general case rather than separate algorithms for every possible combination of two input values, even when the domain of possible input values is limited. That said, we might want one algorithm for computing the quotient of two integers and another for computing the quotient of two real numbers where at least one of the numbers is not a whole number.
  4. Output. An algorithm has one or more outputs: values that are related to the inputs.
  5. Effectiveness. One way to explain effectiveness is to say that it must be possible for a person using pencil and paper to perform each step or instruction (i.e. each operation in the sequence) exactly, in a finite amount of time. If a step involved expressing the exact value of an irrational number as a decimal expansion — the value of the number Pi or the square root of 2, for instance — then the operation would not be effective, because the decimal expansion of an irrational number cannot be expressed by a finite number of digits to the right of a decimal point. The following conditional instruction is another example of a noneffective step:
    Step 4. If every problem whose solution can be verified in polynomial time can also be solved in polynomial time, then go to step 1.
    Step 4 would not be effective until someone is able to show that there is an algorithm that determines whether every problem whose solution can be verified in polynomial time can be solved in polynomial time. This famous open problem in computer science is known as the P vs. NP problem. Here is another example:
    Step 1. Put on the ring the Hobbit wore. (Doing so will make you invisible.)
    Step 2. Take a cookie from the cookie jar.
    Step 1 is not effective, because no such ring exists. First, that particular ring was destroyed when Gollum fell into the fire with it at the Cracks of Doom, where the ring was forged. Second, The Lord of the Rings is a work of fiction. And third, rings that make people invisible do not exist. An effective alternative to Step 1 would be to wait until no one is looking.

Knuth points out that for an algorithm to be practical, not only must it be finite but it must be able to finish in a “reasonable” or very finite number of steps. An algorithm that requires some task to be performed a number of times equal to the largest integer ever named could in theory be finite but certainly wouldn’t be practical. (See Who Can Name the Bigger Number?)

Subproblems

You can't teach beginning programmers top-down design
because they don't know which way is up.
— CHARLES ANTHONY RICHARD HOARE
an “interesting quote

Since we’re talking about programming mainly in the sense of programming computers, not so much in the broader context of programming students, armies, societies, or anything else that might be programmable, one might wonder whether the example of trying to take a cookie from a cookie jar, which we used to explain how the steps of an algorithm must be effective, is germane.

These days, with a lot more than just talk about self-driving cars and autonomous weapon systems happening all around us, it seems plausible that a computer program could somehow be written to control a robot that could recognize when no one was watching and remove a cookie from a jar. Perhaps we can even imagine a more sophisticated program that could not only decide when no one was looking but could somehow predict whether anyone was likely to begin looking in the amount of time it would take for the robot under the program’s control to take a cookie out of the jar.

If we believe that an autonomous system capable of taking a cookie without being seen could be devised and controlled by a computer program, then not only can we see how the problem is relevant to our conversation about computer programming and algorithms, but we are also approaching a problem in a top-down manner.

When we do top-down programming, we specify operations like wait until no one is looking and take a cookie from the cookie jar that are themselves problems to be solved by algorithms, and we use the presumed solutions to those problems to create a solution to another problem — before solutions to the subproblems actually exist. When practicing bottom-up programming, we develop the solutions to subproblems problems first, and then combine those solutions to form solutions to larger problems.

Given a program that expresses a solution to a problem that includes solutions to many subproblems, one can’t necessarily tell whether the program was composed using a top-down or a bottom-up approach or some combination of the two. Unless programmed to do otherwise, beginning programmers tend to compose programs from the bottom up, perhaps because they don’t have the experience or confidence needed to go out on a limb and presume that the details of a solution to a sub-problem can be filled in later. Or maybe it’s just human nature to want to create something larger from smaller, finished pieces rather than creating the smaller pieces after you've already used them. Either way, it must certainly be a mistake to think that one approach is best for everyone all the time.

Everything should be built top-down, except the first time.
— ALAN J. PERLIS, “Epigrams on Programming”, #15 (1982)

Hansel and Gretel

In Once Upon an Algorithm (MIT Press, 2017), Martin Erwig uses the story of Hansel and Gretel to demonstrate an algorithm. In Hansel and Gretel, a famine has afflicted the region where the children live with their father and step-mother. The archetypical step-mother convinces Hansel and Gretel’s father to abandon the children in the forest so the adults will have more food for themselves. Hansel overhears his parents discussing the plot and devises an algorithm to solve the problem of finding a way out of the forest. When performed, Hansel’s algorithm solves the problem.

Hansel’s algorithm is composed of a sequence of simple steps, some of which involve literal steps. First, Hansel collects several pebbles. He surreptitiously drops the pebbles, one by one, as the family treks into the forest thereby forming a trail that Hansel and Gretel are able to follow back out of the forest after their parents are gone. When everything goes as planned, each step of Hansel’s algorithm is effective and transforms the state of various values: the number of pebbles in Hansel’s collection grows as he gathers pebbles and shrinks as he drops them; Hansel and Gretel's location changes as they move from pebble to pebble on their way into and out of the forest. The algorithm terminates when Hansel and Gretel return to the spot where Hansel dropped the first pebble.

For the sake of the story, the reader is asked to believe that Hansel had no problem collecting enough pebbles so that he wouldn’t run out too soon, that he could easily carry and conceal the pebbles, that it was easy to find the pebbles in the reverse order they were discarded, and so on. Programmers must pay close attention to these sorts of potential pitfalls when implementing algorithms.

  1. Other more-or-less similar characterizations of the notion of an algorithm exist besides the one provided by Knuth in The Art of Computer Programming.

Computational Methods

One fine winter’s day when Piglet was brushing away the snow in front of his house, he happened to look up, and there was Winnie-the-Pooh. Pooh was walking round and round in a circle, thinking of something else, and when Piglet called to him, he just went on walking.
— A. A. MILNE, Winnie-the-Pooh (1926)
A computational method is a procedure that might not terminate but otherwise has all of the characteristics of an algorithm.

Pooh and Piget Go Hunting and Nearly Catch a Woozle

In Chapter 3 of Winnie-the-Pooh by A. A. Milne, Pooh and Piglet perform the steps of a method that is very similar to the algorithm Hansel and Gretel used to find their way out of the forest. At first, Pooh is tracking a mysterious animal by following the paw prints it has left behind in the snow. “Do you think it’s a—a—a Woozle?” Piglet wonders. As Pooh and Piglet proceed, they notice that more animals appear to be joining up with the prey that Pooh was initially pursuing. The prospect of meeting up with a pack of beasts that seems to be increasing in size makes Piglet very nervous.

It turns out that Pooh was just following paw prints that he himself had made in the snow when he walked in a circle around a thicket. Pooh eventually realizes his mistake and terminates the hunt. Since the procedure that Pooh was performing — follow the tracks until you catch up with the creature — had the potential to continue indefinitely, at least in a world where we can imagine that snow never melts and a bear can walk in circles forever, we would call it a computational method rather than an algorithm, were we to use such terms to talk about Pooh and Piglet’s potentially perpetual procedure.


Programming Languages

The whole world spoke the same language, with the same vocabulary. Now, as people moved eastwards they found a valley in the land of Shinar where they settled. They said to one another, ‘Come, let us make bricks and bake them in the fire.’ For stone they used bricks, and for mortar they used bitumen. ‘Come,’ they said, ‘let us build ourselves a city and a tower with its top reaching heaven. Let us make a name for ourselves, so that we do not get scattered all over the world.’ Now Yahweh came down to see the city and the tower that the people had built. ‘So they are all a single people with a single language!’ said Yahweh. ‘This is only the start of their undertakings! Now nothing they plan to do will be beyond them. Come, let us go down and confuse their language there, so they cannot understand one another.’ Yahweh scattered them thence all over the world, and they stopped building the city. That is why it is called Babel, since there Yahweh confused the language of the whole world; and from there Yahweh scattered them all over the world.
— Genesis 11:1-9
(The New Jerusalem Bible)
A programming language is a means of precisely specifying a computational method. Programming languages give us the ability to create sequences of instructions that are unambiguous.

The authors of an article published on CodeLani.com, a website that dubs itself the computer language encyclopedia, estimated that there were between 500 and 2000 active, general purpose programming languages in the world in 2017. (See “How many programming languages are there in the world?”) Their estimate for all active computer languages — not just general purpose languages — was between 5000 and 25,000! The number of distinguishable programming languages explodes again if you count all the versions and variations of myriad languages, not to mention the actual tools needed to use a language.

Just trying to be clear about the language one is using — its name, feature set, and terms of use, to name a few characteristics that might matter — can be difficult. Sometimes a language (like Python, created by Guido van Rossum) is initially developed by one person: over time features are added, especially when a language becomes popular; more people participate in the development of the language; and naming systems emerge to differentiate between earlier and later versions of the language. Sometimes languages will merge to form a new, unified language, like the precursors to the the language now colloquially known as JavaScript did when the developers (web browser vendors) of those languages agreed to cooperate and formally refer to the new language as ECMAScript, a less than ideal name that has been criticized for sounding like a skin disease. Stewardship or ownership of a language may transfer hands, resulting in changes to long-term plans: this happened when Oracle Corporation acquired Sun Microsystems, the company that originally employed the creators of the Java programming language, which was initially called Oak before it was released to the public; in 2018, Oracle announced that customers who use Java SE (Standard Edition) for commercial purposes will have to start paying periodic subscription fees instead of a one-time perpetual license fee plus an annual support fee.

Not only are there myriad languages and often many versions of a given language, but the tools we actually use to work with programming languages vary too. For example, web browser vendors implement various subsets of ECMAScript in different versions of their browsers. The fact that JavaScript is present in so many different web browsers is one of the language’s advantages. The fact that JavaScript is implemented a little differently in so many web browsers is a problem that JavaScript developers must deal with.

Choosing

Buyer's remorse is the sense of regret after having made a purchase. It is frequently associated with the purchase of an expensive item such as a vehicle or real estate.
— Wikipedia, Buyer's remorse (accessed on 27 Oct 2018)

Whether you’re a teacher, an autodidact, an amateur, or a professional programmer, choosing a set of programming languages, libraries, and related tools can be a dicey decision. Often, there are many viable options and no obviously best one. Once you’ve committed yourself, there will always be opportunities to call your decision into question for one reason or another. All we can do is try to make smart choices and be prepared to ignore or respond to our critics, who, like the friends and long-lost relatives of a lottery winner, invariably come out of the woodwork.

Multilingualism

In like manner, the beginner who has learned a new language always translates it back into his mother tongue, but he assimilates the spirit of the new language and expresses himself freely in it only when he moves in it without recalling the old and when he forgets his native tongue.
— KARL MARX, Der 18te Brumaire des Louis Napoleon (1852)
translated by Saul K. Padover from the German edition of 1869

Someone who is only casually interested in computers and programming might be content learning just one programming language. But if you want to do more than just dabble in computer programming, sooner or later you’ll want to master more than one language.

Many programming projects require the use of more than one programming language, and few nontrivial projects involve only one. Over periods of time, professional programmers can expect to contribute to multiple projects of various kinds involving several languages. This is certainly true when the time period is a year or more, and it is often the case for periods as short as a week or less.

Programmers benefit from exposure to different kinds of programming languages. In More Programming Pearls, Confessions of a Coder (1988), the sequel to Programming Pearls (1986), published by Addison-Wesley, Jon Bentley presents an argument for multilingualism at the beginning of “Column 2: Associative Arrays”:

Anthropologists say that language has a profound effect on wold view. That observation, usually known as Whorf’s hypothesis, is often summarized as “Man’s thought is shaped by his tongue.”

Like most programmers, my computing thought is shaped by my Algol-like tongue. PL/1, C and Pascal look pretty much alike to programmers like me, and it’s not hard for us to translate such code into Cobol or Fortran. Our old, comfortable patterns can easily be expressed in these languages.

But other languages challenge the way we think about computing. We are amazed by Lispers as they work their magic with their S-expressions and recursion, by APL fans who model the universe as the outer product of a couple of long vectors, and by Snobol programmers who seem to turn any problem into a big string. We Algolish programmers may find it painful to study these foreign cultures, but the exposure usually yields insight.

Hardly any field of techniques for thinking is more fascinating than language.
— EDMUND C. BERKELEY, Giant Brains; or, Machines That Think (1949)

Learning and Teaching

It has often been said that a person does not really understand something until after teaching it to someone else. Actually, a person does not really understand something until after teaching it to a computer, i.e. expressing it as an algorithm. [...] Linguists thought that they understood languages, until they tried to explain languages to computers; they soon discovered how much more remains to be learned.
— DONALD E. KNUTH, Selected Papers on Computer Science (1996)
One way of learning by teaching, is the plastic platypus learning or platypus learning technique. This technique is based on evidence that show that teaching an inanimate object improves understanding and knowledge retention of a subject.
— Wikipedia, Learning by teaching (accessed on 2 Nov 2018)

Once you’ve selected a set of programming languages to learn, you’ll want a good team of teachers to go with them: books, websites, classes, workshops, parents, peers, siblings, colleagues, paid teachers.

Some teachers are tutorials designed for beginners; of those, some try to assist budding programmings who have no prior programming experience. Other teachers cater to programmers who have programming experience or a strong background in a closely related field like mathematics but are learning a new programming language from scratch. Other teachers, like language specifications and reference manuals, will be organized and written for programmers in need of a comprehensive, authoritative source of information — perhaps to resolve a heated debate between confident coders — rather than for someone trying to learn how to code. Some teachers won’t even teach you the particular programming language you are trying to learn, but will give you good problems for you to sink your teeth into. Some enlightened teachers even provide hints, answers, and difficulty ratings along with their problems.

Besides having good team of teachers, you will need to practice being a teacher yourself. You will have to use a programming language to write programs that “teach” a computer how to solve problems, because that’s the only way you’ll really learn to code.

Strive to teach readers — human readers — of your programs what it is you are trying to teach the computer to do. When you can do that well, you will feel confident that you are on your way to mastering the material you are trying to learn.

Programming

JavaScript

Java

The best book on programming for the layman is Alice in Wonderland,
but that's because it's the best book on anything for the layman.
— ALAN J. PERLIS

quoted by Ken Arnold, James Gosling, and David Holmes in
The Java Programming Language, 4th ed. (2006)
on p. 755, “Further Reading”


Programs

The only way to learn a new programming language is by writing programs in it.
The first program to write is the same for all languages:

Print the words
hello, world

This is the big hurdle; to leap over it you have to be able to create the program text some-
where, compile it successfully, load it, run it, and find out where your output went.
With these mechanical details mastered, everything else is comparatively easy.
— KERNIGHAN & RITCHIE, The C Programming Language (1988)
A computer program is a computational method expressed in a programming language.

To run or execute a program is to perform the steps of the program. A program that is “running” or being “executed” is a program that is being performed.

Computer programs can be very small and simple. Sometimes rather small programs are not so simple and must be written and explained very carefully. Clever programs can be small ones too.

Many computer programs are large and complex. Many of those are so large and complex that they could not possibly have been created by a single person. Teams of people are often required to maintain computer programs. Large programs can easily turn into big balls of mud.


Source Code

A program’s source code is the text that expresses the program and follows the rules of one or more programming languages.

Source code may include specially formatted sections of text known as comments that can be used to describe the program but do not affect what the program does when it runs. Comments structured in a very particular way may themselves be a type of code that serves as an input to a program that produces “pretty” program documentation as output.

Source code or just code may refer to the text of an entire program. Here's a short program written in the JavaScript programming language, for example:

/* The first program to write in JavaScript. */
(function() {
  "use strict"; // Opt out of sloppy mode.
  alert('hello, world\n');
})();
A code snippet is some source code that has been removed or snipped from the context of a larger program. Here is an example of a Java “snippet”:
System.out.print("hello, world\n");

And here is an entire program written in the Java programming language that includes the snippet:

public class FirstJavaProgram
{
    public static void main(String[] args)
    {
        System.out.print("hello, world\n");
    }
}


Documentation

Let us change our traditional attitude to the construction of programs.
Instead of imagining that our main task is to instruct a computer what to do,
let us concentrate rather on explaining to human beings what we want a computer to do.
— DONALD E. KNUTH, “Literate Programming” (1984)
Jon Bentley probably hit the nail on the head when he once was asked why literate programming hasn’t taken the whole world by storm. He observed that a small percentage of the world’s population is good at programming, and a small percentage is good at writing;
apparently I am asking everybody to be in both subsets.
— DONALD E. KNUTH, “Interview with Donald Knuth” (2008)
Program documentation is writing that explains or clarifies some aspect of a computer program. Self-documenting code refers to executable source code that is easy for someone to read and comprehend without any additional commentary.

Comments are one type of program documentation. Comments appear in a program’s source code. Comments can be terse notes that highlight or clarify small details, like the purpose or usage of a variable. Or they can be verbose commentary about major sections of a program or the program as a whole.

When program documentation and the parts of the program that the documentation refers to contradict each other, questions are raised about which represents the intended behavior of the program — the program documentation or the program’s actual instructions. Sometimes it is obvious which is wrong, but sometime’s it’s not. To avoid introducing contradictions when making changes to a program, programmers should remember that existing documentation may need to change too.

When should comments be added to code? Where should they be placed, exactly? How should they be formatted? What should they should consist of? These are tricky questions to answer, and opinions vary. Organization can make it easier to answer some of these questions by articulating policies and guidelines pertaining to documentation. Good answers will depend partly on how the code is intended to be used and who the anticipated audience is. Is the typical reader an expert who uses the language frequently, or is it more likely to be someone with very little coding experience?

“Throw-away” code that a programmer writes for fun or practice may not need the same sort of program documentation that code designed to be reused often by a large number of programmers requires. However, it can sometimes be difficult to predict how long a program will last or who all might use it in the future.

The need to explain source code via comments can be reduced through the use of carefully chosen names that suggest what a variable represents or what a subprogram does. But this type of built-in documentation does no good if people who need the information don’t have access to it. Program’s like Java’s javadoc utility translate information that appears in source code, including information that appears in structured comments, into documentation packages that can be published independently of the source code itself. The output of such tools can be no better than the writing that goes into them, of course.

The expense of uncaught errors made early in a design and implementation of a program accumulates over the program’s lifetime. Programmers can often catch errors and improve their programs simply by carefully explaining how a program works (or should work) to someone else.

Programmers are composers. The best programmers are ones who look out for the interests of both the users of their programs and the readers of their writing. Excellent programmers strive to write well in every language they use.


Compilers and Linkers

Programs called compilers take code for a program expressing a computational method in one language (the source language) as input and produce an equivalent computational method in another language (the target language) as output.

Compilers typically translate higher level languages into lower level languages. Usually the output produced by a compiler is not intended to be read or easily understood by humans. Rather, code produced as output of a compiler is usually intended to be read and processed by other programs or by a machine.

When we execute a compiler to translate the source code of a program into another language we say that we are “compiling” the source code of the program. Sometimes source code is translated directly into a language that a particular machine can process. Sometimes further processing is needed.

Programs that combine or “link” pieces of compiled code together into a program that can be executed by a machine are called linkers.

Interpreters

Sometimes code that has been compiled expresses a computational method in a language that cannot be executed directly by a machine but can be performed by a type of program called an interpreter, which carries out the steps of the computational method by executing statements that the interpreter understands, one after another.

The program that defines that computational method of the interpreter itself must be expressed in a language that a machine (or perhaps even another interpreter) can execute (or interpret).

Human-readable source code does not have to be compiled to be interpreted. For example, small snippets of JavaScript code or even large JavaScript programs, can be interpreted directly by JavaScript interpreters that are built into most if not all modern web browsers in widespread use today.


Virtual Machines

A virtual machine is a special case of an interpreter that performs the instructions of a compiled program intended for some sort of machine, either a physical computer or a mythical machine with its own so-called machine language.

Virtual machines have some distinct advantages and disadvantages.

If there are virtual machines available for all hardware platforms that a program may need to run on, then a compiler can be created that only has to be capable of translating programs into the language of the virtual machine. Furthermore, identical copies of a compiled program can be transferred to and executed on any physical machine that is hosting a suitable virtual machine. This simplifies the job of crafting a compiler, and it makes it easier for organizations to manage versions of programs that may need to run on various hardware platforms. The burden then shifts to the development and management of different versions of the virtual machine program itself.

A disadvantage of virtual machines is that programs generally cannot be executed as efficiently (i.e. as quickly) when run on a virtual machine as an equivalent program that has been compiled into the language of the physical machine that is running an instance of the virtual machine.

Knuth discusses some other advantages and disadvantages of programs like interpreters and virtual machines in section 1.4.3 of The Art of Computer Programming.


Subprograms

A subprogram is a program within a program. We say that a program invokes or calls a subprogram when it executes the instruction needed to get the subprogram started. A program that calls a subprogram may be referred to as the calling sequence, the calling program, or the caller of the subprogram.

Subprograms are often created to avoid duplicating instructions that may need to be performed multiple times at different places in a program. Subprograms can also be used to break up a large program into small pieces. When subprograms are viewed in terms of what problem they solve, rather than how they solve it, they can make it easier to understand a program that involves many components by hiding details that don’t necessarily need to be comprehended.

When a subprogram terminates, the execution of instructions is generally expected to resume at the instruction in the calling sequence that follows the instruction used to invoke the subprogram. Sometimes, however, a subprogram terminates “abnormally.” In Java and JavaScript, for example, abnormal termination happens when an error or exception is “thrown,” in which case control may be automatically transferred to an instruction sequence somewhere in the calling program designed to handle the situation.

A callback or call-after is an input to a subprogram that references another subprogram that the subprogram being invoked may transfer control to when it terminates.

The terms function, method, procedure, routine, subroutine, coroutine, and interpretive routine are nearly synonymous with ‘subprogram’. In fact, you will probably encounter one or more of those words more often than you encounter the word subprogram. We expect a function to always return some sort of value. When programmers familiar with today’s object-based or object-oriented programming languages hear the word method, they usually think of a subprogram associated with an object. In Volume 1 of The Art of Computer Programming, Knuth describes subroutines, coroutines, and interpretive routines. A subroutine is a special case of a coroutine, as Knuth explains in section 1.4.2. The Wikipedia article “Coroutine” (retrieved on 26 Oct 2018) contrasts coroutines with subroutines:

When subroutines are invoked, execution begins at the start, and once a subroutine exits, it is finished; an instance of a subroutine only returns once, and does not hold state between invocations. By contrast, coroutines can exit by calling other coroutines, which may later return to the point where they were invoked in the original coroutine; from the coroutine's point of view, it is not exiting but calling another coroutine.

Subprograms are written in terms of zero or more parameters. A parameter is a variable that refers to one of the inputs to a subprogram. The values of a subprogram’s parameters affect the subprogram’s actions and are subject to change from one execution of the subprogram to another. The variables that a subprogram is defined in terms of are sometimes called formal parameters.

Values that are assigned to a subprogram’s formal parameters when the subprogram is invoked are called arguments. An argument is an input to a subprogram. Sometimes arguments are called actual parameters.


Data Types

Understanding the distinctions among [types of data] is so important that we formally define the idea: a data type is a set of values and a set of operations defined on those values.
— SEDGEWICK & WAYNE, Computer Science: An Interdisciplinary Approach (2017)

A piece of data stored and manipulated by a digital computer is formed by a sequence of binary values or bits, which we often imagine to be some pattern of ones and zeros. For a series of ones and zeroes to have meaning, there must be a set of rules that govern how any particular pattern of bits should be interpreted and how they may be manipulated.

In the context of a computer program, a data type, or type, is a classification of patterns of bits and a set of operations. A type gives meaning to the patterns that belong to a given class and defines how those values (represented by patterns of bits) may be used as inputs to or may be produced as outputs of various algorithms.

For instance, a type called byte could be defined to be all possible 8-bit values interpreted as non-negative base 2 numbers with the operations addition (+), subtraction (-), multiplication (*), integer division (/), and remainder (%) defined such that each operation takes two 8-bit values of type byte as input and produces a single 8-bit value of type byte as output. A complete definition of the byte type would include a specification that determines the output produced for all possible combinations of inputs for each operation applicable to the type.


Primitive Types of Data

Programming languages like Java and JavaScript provide primitive types that can be used alone or combined to form other types of data.

Java’s Eight Primitive Types

The primitive types provided by the Java programming language are:

  • byte (8-bit unsigned integers)
  • short (16-bit signed integers)
  • int (32-bit signed integers)
  • long (64-bit signed integers)
  • float (32-bit floating point numbers)
  • double (64-bit floating point numbers)
  • char (16-bit single characters)
  • boolean (true or false)

NOTE: Java’s keyword void is not a type!

When we define a method in a Java program, we must specify the type of the value, if any, that is returned by the method when it terminates. If a method does not return a value, then this fact is signified by substituting the keyword void where the method’s return type would have otherwise appeared in the code. The keyword void is not a type in the Java programming language according to the language specification. The keyword void simply means that a method does not return a value.

JavaScript’s Six Primitive Types

The ECMAScript 2018 language specification defines seven data types. The six primitive types are:

  • Boolean
  • Null (For historical reasons, typeof null evaluates to "object".)
  • Undefined
  • Number (64-bit floating point numbers)
  • String
  • Symbol

(The seventh built-in JavaScipt data type is Object.)

Java’s Reference Types

The bit pattern of a primitive type of data encodes a value directly; end of story. Java’s non-primitive types, however, are reference types. The bit pattern of a reference type doesn't encode a value directly. Rather, the bit pattern of a reference type represents either a location (an address in memory) where the bits that encode the actual value (a sequence or collection of values) is stored, or the pattern represents the special literal value known as null, which means there is no value stored elsewhere in memory.

Java’s Array Type

An array of values (a sequence of values of some type) is a type of data (a reference type) that is built into the Java programming language. The language provides special syntax for indicating that a value is an array of some type of data, for specifying array literals, and for accessing the elements of an array.

Standard Java Types Defined by Classes

Examples of standard Java reference types that are defined by classes include Object, String, Number, various subclasses of Number including AtomicInteger, AtomicLong, BigDecimal, Byte, Double, etc.

User Defined Types

Languages like Java and JavaScript allow programmers to combine types to form new types.

User Defined Types Demo (Java)

Literals

A literal is a set of symbols used in a program to directly refer to a constant value as opposed to a location at which some value may be stored. Examples of typical literal values include

  • The single symbol 1 to represent an integer.
  • The four symbols 3.14 (the digit 3, a period, the digit 1, the digit 4) to represent a floating point number.
  • The three symbols 'a' (two single quotes and the letter a) to represent a single character.
  • The four symbols true (the letters t, r, u, and e) to represent a Boolean value.
  • The four symbols null (the letter n, u, l, and l) to represent the notion of no value.
  • The sixteen symbols "hello, world\n" to represent a string composed of 13 characters.

In contrast to variables, which can store any one of a class of fixed values, a given literal always represent the same immutable value. Literals may be used to initialize variables.

Identifiers, Reserved Words, and Keywords

An identifier is a name that may be used in a program to refer to an entity like a subroutine, a variable, a label, a type, a class, or a package.

A reserved word is a name that cannot be used as an identifier.

A keyword is a reserved word that has special meaning in a particular context.

Variables and Their Values

Programs manipulate values represented by by sequences of bits.

A variable consists of a storage location (a place where a value may be put) and an identifier that refers to that location. The “value of a variable” is the value that is stored in the place referred to by the variable’s identifier.

  • An initialized variable is a variable to which a particular value has been assigned.
  • An uninitialized variable is a variable that, at some point during the execution of a program, has no particular value in its storage location, because a value has not yet been put there by any instruction. (Every storage location must be associated with some sequence of bits, but what that sequence actually is may be unpredictable or undefined when a variable is uninitialized.)
  • A constant variable is a variable that, once initialized, will not allow its value to be changed.

Topics

  1. Executing Programs
    1. Terminology
      1. Invoking
      2. Calling
      3. Running
      4. Executing
    2. Processing machine code
    3. Interpreting code
  2. Programming languages
    1. Low-level languages
      1. Machine languages
      2. Assembly languages
    2. High-level languages
    3. Compiling code
    4. Linking code
  3. Working with source code
    1. Code editors
    2. Coding conventions
    3. Coding style
    4. Comments and documentation
    5. Documentation generators
    6. Documenting pre- and post-conditions
    7. Code beautification (prettyprint)
    8. Code obfuscation
    9. Debugging
  4. Some Fundamental Elements of Source Code
    1. Data, Values, and Literals
    2. Expressions
    3. References, Addresses, and Variables
    4. Names, Identifiers, and Scope
    5. Reserved words and Keywords
    6. Instructions, Commands, and Statements
  5. Input and Output
  6. Primitive Data Types
    1. Characters
    2. Strings
    3. Numbers
      1. Unsigned Integers
      2. Signed Integers
      3. Floating-point Numbers
    4. Boolean Values
    5. Type Conversion
      1. Implicit
      2. Explicit
  7. Variables
    1. Declaring Variables
    2. Initializing Variables
      1. Uninitialized Variables
      2. Default Values
      3. Inline Initialization
    3. Constant Variables
    4. Parameters
    5. Index Variables
  8. Operators and Operations
    1. Assignment Operator
    2. Comparison Operator
      1. Value Equivalence
      2. Instance Equivalence
      3. Type Equivalence
      4. Less Than
      5. Less Than or Equal
      6. Greater Than
      7. Greater Than or Equal
    3. Selection
    4. Mathematical Operations
      1. Addition
      2. Subtraction
      3. Multiplication
      4. Division
      5. Remainder
    5. String Concatenation
    6. Logical Operations
      1. Logical And
      2. Logical Or
      3. Logial Xor
      4. Logical Not
    7. Bitwise Operations
      1. Bitwise And
      2. Bitwise Or
      3. Bitwise Xor
      4. Bitwise Not
      5. Shift Operations
    8. Grouping Operator
    9. Operator Precedence
      1. MDN: JavaScript Operator Precedence
      2. Sedgewick/Wayne: Java Operator Precedence
  9. Control Flow
    1. Primitive control flow constructs
      1. Sequences of statements
      2. Labels and goto statements
      3. Subroutines
    2. Conditionals or Choice
      1. if condition then goto
      2. if condition then do
      3. if condition then do else do
      4. if condition then do else if condition ...
      5. Switch or case statements
    3. Loops
      1. condition-controlled
      2. collection-controlled
    4. Break Statements
      1. Unlabeled
      2. Labeled
  10. Subroutines
    1. Declaring
    2. Signatures
    3. Parameters
    4. Return type
    5. Defining
    6. Returning From
    7. Invoking
    8. Arguments
    9. Side Effects Of
    10. Types of Subroutines
      1. Functions
      2. Methods
      3. Coroutines
      4. Interpretive Routines
      5. Trace Routines
      6. Recursive subroutines
  11. Arrays
  12. Objects and Composite Data Types

Raising Chickens: Inputs, Outputs, and Side Effects

Donald Knuth tells us that algorithms are basic to all of computer programming. He also says that an algorithm has zero or more inputs and one output. Consider the following gambit.

Gambit C (Raise chickens).

Given some land, some chickens, some chicken feed, and a loan or a nice nest egg, a farmer tries to eke out a living.

  1. Sometimes chickens must fend for themselves.
  2. If the farmer has no chickens or goes bankrupt, the gambit terminates.1
  3. The farmer scatters chicken feed on the ground.
  4. Chickens scratch the ground, possibly damaging vegetation in the vicinity, and eat the feed.
  5. Chickens poop often, anywhere.
  6. Hens that are of age lay eggs daily in cozy places.
  7. The farmer collects most of the eggs most days.
  8. Sometimes a chicken dies.
  9. Sometimes chicks are produced as an output.2
  10. In the fall, the farmer gathers chicken manure and spreads it on his garden bed.
  11. The farmer eats eggs from the chickens and vegetables from the garden. (The farmer poops.)
  12. The farmer tries to sell eggs for a good price.
  13. The farmer buys more chicken feed if doing so makes cents.
  14. The cycle continues. (Go to step 1.)

What are the inputs to Gambit C? What are the outputs? Does Gambit C produce any side effects? How is a chicken itself like an algorithm?

NOTES

  1. Moral: Don’t put all your eggs in one basket. Learn how to farm, not just how to raise chickens.3
  2. Assuming a rooster provides the necessary inputs to the algorithm.
  3. In other words, if you want to be a programmer, don't learn a programming language: learn how to program and how to learn programming languages. The language you learn today will probably not be the language you’ll need to know how to use tomorrow.

Farmers and Programmers

Unsere ‘Reichen’ — das sind die Ärmsten! Der eigentliche Zweck alles Reichtums ist vergessen!
(“Our ‘rich people’ – those are the poorest! The real purpose of all wealth has been forgotten!“)1

— NIETZSCHE, Werke in drei Bänden (1966)

Good programmers are like good farmers. The average professional programmer might make more money these days than the average small farmer. But the best farmers, like the best programmers, are likely to be wealthy self-starters.2

Farmers develop effective routines. Feeding animals, milking cows, sheering sheep, and planting and harvesting crops are examples of jobs that farmers perform over and over again. Good farmers figure out efficient ways of performing important, repetitive tasks.

Farmers need tools that are well-suited to the problem at hand, and they need quite a few tools, because they must solve lots of problems. Sometimes a shovel is the right tool for digging a small hole, but a backhoe or an excavator might get a big job done faster. A tractor and mechanical baler might be needed to make tons of hay. Wire and bolt cutters come in handy. Farmers need tools that are easy to use, reliable, and not too expensive to replace or fix when they break.

Good farmers keep their tools organized. They need to be able to find a tool quickly when it’s needed. On a well-managed farm, everything has a place where it belongs, and everything is always in its place.

On the one hand, good farmers know and use standard techniques. On the other hand, they are often forced to devise creative solutions to problems. Farmers know where to focus their attention, because they can’t afford to waste time on things that are relatively unimportant. Many of the best farmers are heavily invested in fences, have a dog that keeps an eye out for trouble, and raise some free-range chickens that search for omnipresent bugs, which, if not too unwieldy, are promptly eliminated by said chickens when found.

The Essence of Programming

Programmers practice the art of composing algorithms from building blocks (basic values and operations) and controlling the sequential execution of algorithms. The execution of a computer program is, at its lowest level, the execution of a series of simple operations (algorithms) defined by the language of the machine a program runs on. These operations are applied, one after another, to simple types of inputs (also defined by the language of the machine) to produce outputs.

Programming languages allow programmers to combine very simple elemental values and operations to form composite values and additional algorithms, which in turn can be used to create more complex values and algorithms. This process of building information structures and algorithms out of components built from components can repeat itself many times.

Programs and programmers are limited by finite capacities of machines. The time required for an algorithm to complete is also often an important consideration and will depend partly on the speed of machines and partly on the ingenuity of algorithm designers. Some problems may prove to be impossible to solve in a reasonable amount of time under any circumstances. The ability of humans to comprehend the nature of extremely complex programs can also be a limiting factor.

High Level Programming Languages and Libraries

High level programming languages define primitive types of data such as numeric, Boolean, and alphabetic types. Data types determine how sequences of binary digits are used to represent information and are associated with operations or operators (i.e. algorithms) that take zero or more values of particular types as inputs and produce an output.

High level programming languages provide:
  • ways of executing algorithms, including ways of supplying input values to an algorithm and ways of accessing the output value produced;
  • ways of sequencing algorithms to be executed;
  • and ways of combining elements to form larger, more sophisticated algorithms.

Standard libraries of algorithms often accompany high level programming languages. Some algorithms in standard libraries provide access to hardware devices, enabling the programmer to receive inputs from and send outputs to users. Most library routines save the programmer from having to create frequently used algorithms from scratch.

Primitive and Built-in Types

The Java programming language provides eight primitive types.

Description Primitive Type Examples of Literals
8-bit signed two’s complement integer byte -128, -0, 0 (default), 0xa, 127
16-bit signed two's complement integer short
32-bit signed two's complement integer int
64-bit two's complement integer long
single-precision 32-bit IEEE 754 floating point float
double-precision 64-bit IEEE 754 floating point double
a single 16-bit Unicode character char '?', 'a', 'z', '\u0061', '\u007a'
either the value true or the value false boolean

SEE

  1. https://en.wikipedia.org/wiki/Primitive_data_type
  2. https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html

Operators

SEE

  1. https://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html

Case Study: Something for Nothing

Java

JavaScript

Case Study: DecimalValue

Java

JavaScript