About TOY

We shall now consider how we can design a very simple machine that will think. Let us call it Simon, because of its predecessor, Simple Simon. By designing Simon, we shall see how we can put together physical equipment for handling information in such a way as to get a very simple mechanical brain. At every point in the design of Simon, we shall make the simplest possible choice that will still give us a machine that: handles information, transfers information automatically from one part of the machine to another, and has control over the sequence of operations. Simon is so simple, in fact, that it could be built to fill up less space than a grocery-store box, about 4 cubic feet. If we know a little about electrical work, we will find it rather easy to make Simon. [...]

We can say that Simon has a mentality of 4. We mean not that Simon is age 4 but just the simple fact that Simon knows only 4 numbers and can do only 4 operations with them. But Simon can keep on doing those operations in all sorts of routines as long as Simon has instructions. We decide that Simon will know just 4 numbers, 0, 1, 2, 3, in order to keep our model mechanical brain very simiple.

— EDMUND C. BERKELEY, Giant Brains; or Machines that Think (1949)
To help you better understand the nature of computation on your computer, we introduce in this section TOY, an imaginary machine designed for this book that is very similar to the computers that first came into widespread use in the 1970s. We study it today because it also shares the essentially characteristics of the modern day microprocessors found in your mobile device and your computer and everywhere else, not to mention countless other computing devices developed in the intervening years. [...]

TOY demonstrates that simple computational models can perform useful and nontrivial calculations, and also serves as a reference point that can help you understand the basic characteristics of your own computer. One of the remarkable facts of the evolution of computation over the past several decades is that all computers share the same basic architecture, an approach that was widely adopted almost immediately after it was first articulated by John von Neumann in 1945.

— ROBERT SEDGEWICK & KEVIN WAYNE, Computer Science: An Interdisciplinary Approach (2017)
Although MMIX has 256 different opcodes, we will see that they fall into a few easily learned categories.
— DONALD E. KNUTH, The Art of Computer Programming, Volume 1,
Fascicle 1: MMIX A RISC Computer for the New Millennium
(2005)

A Computer that’s Just Right

In [Robert] Southey's tale, three anthropomorphic bears — “a little, small, wee bear, a middle-sized bear, and a great, huge bear” — live together in a house in the woods.
— English Wikipedia: Goldilocks and the Three Bears (accessed in September 2018)
In Chapter 6 of Computer Science: An Interdisciplinary Approach (Addison-Wesley, 2017), Robert Sedgewick and Kevin Wayne describe a mythical computer called TOY. TOY has four times as many operations as Edmund Callis Berkeley’s sublimely simple computer, Simon, and one sixteenth as many as Donald Ervin Knuth’s idealistic machine, MMIX. Hence, it’s a little bit easier to compose interesting machine language programs using TOY than it is with Simon. And it’s a little bit easier to learn how to write machine-language programs using TOY than it is with MMIX. So if you’ve got a hankering for tinkering with a machine language (many readers are no doubt wondering from whence such a hankering could arise), and if Simon seems unreasonably small and MMIX looms dauntingly large, then TOY might be just right for you.

Why Tinker with TOY?

Many readers are no doubt thinking, “Why does Knuth replace MIX by another machine instead of just sticking to a high-level programming language? Hardly anybody uses assemblers these days.” Such people are entitled to their opinions, and they need not bother reading the machine-language parts of my books. But the reasons for machine language that I gave in the preface to Volume 1, written in the early 1960s, remain valid today.
— DONALD E. KNUTH, The Art of Computer Programming, Volume 1,
Fascicle 1: MMIX A RISC Computer for the New Millennium
(2005)

On page 945 of Computer Science: An Interdisciplinary Approach, Robert Sedgewick and Kevin Wayne give three reasons for composing machine-language programs.

  • Machine-language programs are still often preferred in applications.
  • Viewing a computation at this lowest level can expose its essence.
  • You can better appreciate programs such as compilers that create machine-language programs for your computer.

On page ix of The Art of Computer Programming, Vol. 1, 3rd ed. (1997), Donald Knuth gives the following reasons for using a machine-oriented language in his books.
  1. A programmer is greatly influenced by the language in which programs are written; there is an overwhelming tendency to prefer constructions that are simplest in that language, rather than those that are best for the machine. By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality.
  2. The programs we require are, with a few exceptions, all rather short, so with a suitable computer there will be no trouble understanding the programs.
  3. High-level languages are inadequate for discussing important low-level details such as coroutine linkage, random number generation, multi-precision arithmetic, and many problems involving the efficient use of memory.
  4. A person who is more than casually interested in computers should be well schooled in machine language, since it is a fundamental part of a computer.
  5. New algebraic languages go in and out of fashion every five years or so, while I am trying to emphasize concepts that are timeless.

Students are entitled to their opinions about TOY, and they need not enroll in the classes I teach. (But I really hope enough do to keep me gainfully employed!)

Stored Program Computers

TOY, Simon, MMIX, a laptop computer, and the computer in your cellphone are examples of stored program computers. Unlike machines that store instructions using an array (i.e. a sequence or series) of jacks or sockets into which patch cords can be inserted to complete an electrical circuit, a stored program computer stores its instructions in an array of memory locations, each of which in turn holds an array of bits. (I’ll say more about bits in just a little bit.)

Stored program computers have processors for executing instructions, registers for efficiently working with data, memory locations for storing larger quantities of instructions and data than will fit in the computer’s comparatively small number of registers, external storage devices for storing even more values than will fit into higher-performing memory, and input/output (IO) devices. TOY is a virtual machine that simulates all the aforementioned features of a stored computer except for external storage.

Bits. The instructions that a processor executes and their corresponding inputs, outputs, and intermediate data values are represented by sequences of binary values called bits. The classical notion of a binary bit is a single unit of information that at any one time can assume one of two values. In contrast, a “quantum bit” or qubit, the so-called unit of information for these newfangled quantum computer that we’ve been hearing about in the news, can allegedly be in a “coherent superposition of two states” at the same time! (So I’ve been told.)

We often imagine an array of banal binary bits as a series or pattern of 1s and 0s. For example, the quantity
1110001110111111 (1)
is a typical pattern TOY might encounter.

Sometimes you’ll hear someone say that a bit is set or unset or that a bit is high or low. It doesn't matter what symbols or words we use to refer to the value of a bit as long as the language we use is precise and the value of any given bit is one of only two possible values. What really matters is how sequences of bits are interpreted.

A sequence of 8 bits is often called a byte or sometimes an octet. One thousand bytes is called a kilobyte, abbreviated KB. A million bytes is one megabyte or one MB, and so on with the prefixes Giga (billion), Tera (trillion), Peta (quadrillion), Exa (quintillion), etc. Four bits may be called a half-byte, a nibble, a tetrade, or a semi-octet.

Before we conclude this bit about bits, it might interest you to know that I evaluated the statement

(Math.floor(Math.random() * 65536)).toString(2); (2)
to produce the psuedo-random pattern of 1s and 0s shown in (1). If you’re reading this page online, you can try evaluating the expression yourself by clicking or tapping on the statement in (2).

Memory. The individual sequences of 1s and 0s stored in a computer’s memory locations are accessed via integral address values associated with each location. The time required to access the bits stored at any particular address is essentially constant. In other words, the time required to retrieve or store one unit of data from memory does not depend significantly on the address at which the bits that define the unit of data are stored.

The smallest address used to reference data in a computer’s memory is address zero (0). The largest possible address is usually the largest integer value that can be represented by a word of data, i.e. the value 2word-size − 1. The word ‘word’ can refer to closely related but somewhat different concepts. Here we are actually talking about what is sometimes referred to has a “hardware word”; hence, we’ll start using that phrase now in an attempt to be more clear about what we mean.

A hardware word is a fixed-sized piece of data (i.e. a fixed-length sequence of bits) handled as a unit of information by the central processing unit(s) of a computer. TOY’s simulated central processing unit or CPU processes instructions composed of 16 bits of data, so TOY’s hardware word size is 16. MMIX, the mythical computer that Donald Knuth describes in Fascicle 1: MMIX A RISC Computer for the New Millennium and in the third edition of Volume 1 of The Art of Computer Programming: Fundamental Algorithms, has a hardware word size of 64.

Since TOY’s (or TOY4’s, rather) hardware word size is 16, the largest memory address that TOY4 can access, in theory, is address number 65535 (base 10). In practice, TOY4 has only 256 distinct memory locations because TOY4’s program counter or PC — a special register that holds the address of the next instruction to be executed — holds only 8 bits of data. If TOY4’s PC held 16 bits instead of 8, then TOY4’s memory could be much larger than it is. TOY4’s last actual memory address is 255 in base 10, which is 28 − 1 or 11111111 in base 2 or #ff in base 16.

Processors. A computer processor is an electronic device that manipulates data values by executing instructions in rapid succession. The instructions a processor executes are very similar to math operations. They usually take as input zero or more encoded values stored in registers or in memory and produce as output an encoded value stored in a register or in memory. For example, TOY’s ADD instruction takes two input values (the addends) stored in one or two registers and produces an output value (the sum), which is also stored in a register.

Recall that a computer program is an algorithm or a computational method expressed in a programming language. And a computational method is like an algorithm except that it might not terminate.

The low-level programming language a processor understands is that processor’s machine language. A program written in machine code is a program expressed in a machine language. A machine language is a set of instructions or operations — identified by values called opcodes — and associated rules for specifying the locations of inputs that may be required to perform the operations and the locations of any outputs that may be produced. TOY’s machine language consists of sixteen operations and rules pertaining to a fixed-length (16 bit) format used to encode an operation, the location of the operation’s required inputs (if any), and the output produced (if any) by performing the operation.

A processor performs operations belonging to its machine language on data stored in registers or in memory. Values stored in registers can be loaded from and stored to memory. And values stored in memory can be obtained from and sent to other devices such as external data storage devices and IO devices.

Sometimes the goal of an instruction is the production of a side effect rather than an output. For example, executing TOY’s halt instruction causes TOY to stop executing instructions, assuming TOY’s simulated processor was running a program when it encountered the halt instruction. This is a side effect of performing the halt operation, not an encoded output. One might view the halt instruction as a case of an operation that produces no output, or one might consider the final state of the program counter — a special register dedicated to the task of storing the address of the next instruction — to be the output of the halt instruction. Either way, the main objective of the instruction is to halt the program. Usually, we don’t care what the final value of the program counter is.

Programs and languages. Programmers use — and therefore must learn — all sorts of different programming languages. Some programming languages are cryptic and tightly coupled to the hardware a program runs on, while other languages are designed to be easier for humans to read and write with and can be used to produce programs that will run on many different types of computers.

Machine languages and assembly languages (an assembly language being one step above machine code) are classes of low-level languages that are simple but rather cryptic and assume a particular hardware platform. Java and JavaScript are example of a languages that are easier for people to read and can be used to develop software that will run on many different types of machines. However, high-level languages are often much more complicated than low-level languages, especially when you take into account the myriad libraries and environs that accompany instances and versions of a given language.

Programs written in an assembly language or a high-level programming language like JavaScript, Java, or C must either be interpreted directly by another computer program, must be translated into an intermediate language that can be interpreted by some program, or must be translated into a machine language that some processor can understand. No matter what, one program or another involved in the interpretation or execution of any program you write must be expressed in machine code so that it can be executed by a physical processor.

Some computers are designed to support the execution of self-modifying code, i.e. programs that modify their own instructions. Most modern computers, however, expect to execute static programs that change only data values, not the instructions that define the program itself. That said, one program can modify or even produce another program, since computer instructions can themselves be treated as data values. Whether a sequence of bits is an instruction or data is a matter of perspective. Sophisticated operating system software is often responsible for managing the execution of programs and controlling what values a program can and cannot modify.

Instructions and data. A given sequence of bits can be interpreted as an instruction in one context and as data in another. A computer’s program counter (PC) contains either the memory address of the next instruction to be executed or the address of the instruction just executed, depending on the computer’s design. As previously mentioned, TOY’s PC contains the address of the instruction to be executed next. Hence, bits stored in TOY’s memory will be interpreted by TOY as an instruction when when the next instruction is executed if the value of the PC — also known as the instruction pointer (IP) or the instruction address register (IAR) — is the address of those bits.

An instruction tells a computer what operation to perform from the set of operations the computer knows — adding two numbers together, for example — as well as where to get or put data values that may be associated with the operation, such as where to get the addends and where to put the resulting sum, in the case of addition. Data, on the other hand, represents a value of some type, such an integer, or a symbol in an alphabet, or a color, or any other sort of discrete entity.

TOY processes instructions 16 bits or two bytes at a time. When reading a sequence of two bytes from left to right, the first four bits of a TOY instruction tell TOY what operation to perform. The remaining 12 bits usually encode additional information about how to perform the operation, although sometimes certain bits may be ignored. For example, when interpreted as an instruction, the sequence of 16 zeros (0000000000000000) tells TOY to halt or stop processing instructions. The sequences 0000000000000001 and 0000111111111111 also tell TOY to halt, because TOY ignores all but the first four bits when processing a halt instruction.

Theory and practice. The challenge of properly translating a program from one language into another is very interesting, and programs that do so are complex. But the complexities associated with the actual execution of a program involve much more than simply translating a computer program from a high-level language into an intermediate language or into machine code. Furthermore, the basic way in which instructions are executed as illustrated by TOY is much simpler and easier to understand than what actually happens in practice. The way TOY works captures the essence of how computers operate, but in practice lots of clever techniques are used to improve the performance of a computer.

A single processing unit (as opposed to multiple processors or a multi-core processor) executes instructions in series, one instruction after another. But often a program can be structured in such a way that it is possible to perform portions of the program in parallel on more than one processing unit at a time. This is called parallel computing and is a major field of study in computer science. Sometimes programmers write explicit instructions that tell a computer to execute parts of a program in parallel if possible. And sometimes computer hardware designers build support for parallel computing directly into the machine.

Even the execution of a single operation can be more complicated than it first appears. This is obviously true when a processor is being simulated in software, as is the case with a virtual computer like TOY. But you’ll also find that some processors perform certain operations by producing sequences of instructions called microcode for other processors or processor-like circuitry to execute. The execution of a computer program involves layers upon layers of abstraction, each of which hides interesting details.

Add techniques like caching and branch prediction to the mix and we’ve got a very complicated system on our hands. Not to mention the inherent complexity of algorithms that attempt to translate our own speech from one spoken language into another, or programs that strive to safely drive our cars — with us in them!

Numbers. When performing addition and subtraction, TOY interprets sequences of bits that represent numbers as base 2 numbers.