Peter Karp's Technical Blog: 2015

Wednesday, April 1, 2015

Teaching the Internet: One of the Big Ideas of Computer Science

The Internet is not only a ubiquitous tool in our society, but also an amazing intellectual and engineering achievement. Just as every person in our society should learn the basics of how electricity works, every person in our society should learn the basics of how the Internet works -- both because most people use it almost every day, and because there are some enchanting ideas behind the Internet. I believe its basic elements could be taught to middle school students in several days using the following approach.

What does it even mean to connect computers together using a network? By analogy, the telephone network allows us to connect any two phones in the world so that they can exchange audio streams. Despite the huge size and complexity of the telephone network, it's quite simple for users to establish a phone call without any knowledge of the underlying network. For example, to place a call, they don't need to know that the call must pass through a switching station in (for example) New Jersey. Similarly, users of any two computers on the Internet can establish a connection between their computers in which the computers exchange streams of bits (ones and zeros) that are represented as different voltage levels on the electrical cables connecting the computers. Again, the users can open a connection without specifying that their data must flow through a certain trans-Pacific cable.

But how does the fact that computers can exchange ones and zeros allow them to exchange web pages or cat videos? There are really three key aspects of understanding how computers represent data: the first is that ones and zeros are the basic currency of computers, the primitives with which all digital data are encoded; the second is that any integer can be represented as a sequence of ones and zeros by encoding it in base two (binary); the third is that digital data such as text, images, and audio are encoded within computers and computer networks by translating them into numbers (and then, into a stream of ones and zeros).

In sixth grade, my son learned how to view decimal numbers as numbers encoded using base ten: that is, to consider every digit in a decimal number as a multiplier for a power of ten (the "ones place", the "tens place", the "hundreds place", etc.). Once you grasp that notion for base ten, it's easy to change the base to another number, and so to teach how to encode numbers in base two. When a number is encoded in binary, it becomes a sequence of ones and zeros.
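To make the place-value idea concrete, here is a minimal Python sketch of converting a number to base two and back. The function names `to_binary` and `from_binary` are my own, chosen for illustration; Python's built-in `bin` does the same job, but spelling out the steps shows that binary works just like the "ones place" and "tens place", only with powers of two.

```python
def to_binary(n: int) -> str:
    """Return the base-two digits of a non-negative integer as a string."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next digit, right to left
        n //= 2                    # move on to the next power of two
    return "".join(reversed(digits))


def from_binary(bits: str) -> int:
    """Add up each digit times its power of two, like place value in base ten."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total


print(to_binary(13))        # 1101, that is 8 + 4 + 0 + 1
print(from_binary("1101"))  # 13
```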

The next step is to describe how to encode various types of data as numbers. Encoding text as numbers is simple: we assign a different number to every character. So we can encode the sequence of characters within a book or a web page by translating each character to its corresponding number. Images can be encoded as numbers by assigning a different number to every pixel in the image according to the brightness of the pixel (color images can be encoded by using three numbers for each pixel, one for the brightness of each primary color). Thus, an image becomes a long series of numbers (which are translated to an even longer series of bits). A video can be encoded as a sequence of images. Then it becomes possible to explain how many images and videos might fit into one gigabyte, and how the speed of an Internet connection affects the time required to download a cat video.
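The encoding steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any real file format: characters are mapped to their Unicode code points, each number is written as 8 bits, images are uncompressed with one byte per color channel, and a gigabyte is taken to be one billion bytes. All function names are hypothetical.

```python
def text_to_numbers(text: str) -> list:
    """Map each character to its Unicode code point (one number per character)."""
    return [ord(ch) for ch in text]


def numbers_to_bits(numbers: list, width: int = 8) -> str:
    """Write each number as a fixed-width binary string and join them together."""
    return "".join(format(n, f"0{width}b") for n in numbers)


def image_size_bytes(width_px: int, height_px: int, channels: int = 3) -> int:
    """Uncompressed size, assuming one byte (0-255) per color channel per pixel."""
    return width_px * height_px * channels


def download_seconds(size_bytes: int, megabits_per_second: float) -> float:
    """Transfer time for size_bytes over a link of the given speed (8 bits per byte)."""
    return size_bytes * 8 / (megabits_per_second * 1_000_000)


numbers = text_to_numbers("cat")
print(numbers)                   # [99, 97, 116]
print(numbers_to_bits(numbers))  # 011000110110000101110100

photo = image_size_bytes(1920, 1080)  # 6,220,800 bytes for one uncompressed frame
print(1_000_000_000 // photo)         # 160 such images fit in one gigabyte
print(download_seconds(photo, 10))    # roughly 5 seconds at 10 megabits per second
```

Real images and videos are compressed, so actual files are far smaller than these figures; the point is only that the arithmetic is within reach of a middle school student.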

Once we can explain how a cat video is encoded as a sequence of bits, we can explain how that sequence of bits can be segmented into packets, what packet switching is, and why packet switching facilitates sharing of network bandwidth, failure recovery, and management of network congestion. We can explain that links in the network can be constructed using a range of physical implementations, from Ethernet to optical fiber to Wi-Fi. We can explain how every computer has an address within the network, and the importance of routing algorithms in directing packets along the most efficient path through the network. We can explain that packets have headers that specify their destination address and that carry a hop limit, which prevents them from circulating in the network indefinitely.
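A toy model can make packets less abstract. The Python sketch below is a simplification I made up for illustration, not the real Internet Protocol packet layout: the `Packet` fields, the 4-byte payload size, and the address `192.0.2.7` (taken from a range reserved for documentation) are all assumptions. It shows data being split into numbered packets, a hop counter that causes a packet to be dropped when it runs out, and reassembly when packets arrive out of order.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    destination: str  # address of the receiving computer
    sequence: int     # position of this chunk within the original data
    hops_left: int    # reduced by one at each router; the packet is dropped at zero
    payload: bytes    # the slice of data this packet carries


def split_into_packets(data: bytes, destination: str,
                       payload_size: int = 4, max_hops: int = 8) -> list:
    """Cut data into fixed-size chunks and put a small header on each one."""
    return [
        Packet(destination, seq, max_hops, data[start:start + payload_size])
        for seq, start in enumerate(range(0, len(data), payload_size))
    ]


def forward(packet: Packet):
    """Simulate one router hop: return the packet with one fewer hop, or None if it expires."""
    if packet.hops_left <= 1:
        return None  # dropped, so it cannot circulate forever
    return Packet(packet.destination, packet.sequence,
                  packet.hops_left - 1, packet.payload)


def reassemble(packets) -> bytes:
    """Packets may arrive out of order, so sort by sequence number first."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.sequence))


packets = split_into_packets(b"cat video bits", "192.0.2.7")
print(len(packets))                   # 4
print(reassemble(reversed(packets)))  # b'cat video bits'
```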

We can also explain what the difference is between the Internet and the World Wide Web: the notion that the Internet speaks multiple languages, or protocols, to accomplish different tasks. Consider this analogy to a possible extension to the telephone system: imagine that we added one additional digit to every telephone number, where that digit specifies which of ten possible human languages the answerer should speak when they answer the phone. Similarly, every Internet connection specifies a port number that designates whether the computer that answers the connection should be prepared to receive an email message, or provide a web page, or initiate a Skype conversation, or should provide data for an interactive multi-user aerial combat game.
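The "extra digit" analogy maps onto a simple lookup table. The Python sketch below lists a few standard port assignments (25 for email delivery, 80 and 443 for the web); the function `describe_connection` and the address `192.0.2.7` are hypothetical, and a real computer decides what to do with a connection in its operating system and server software, not in a dictionary like this one.

```python
# A few well-known port numbers and the "language" spoken on each.
WELL_KNOWN_PORTS = {
    25: "email delivery (SMTP)",
    80: "web pages (HTTP)",
    443: "encrypted web pages (HTTPS)",
}


def describe_connection(address: str, port: int) -> str:
    """Say which protocol the answering computer should expect on this port."""
    service = WELL_KNOWN_PORTS.get(port, "an application-specific service")
    return f"{address}:{port} -> {service}"


print(describe_connection("192.0.2.7", 80))     # 192.0.2.7:80 -> web pages (HTTP)
print(describe_connection("192.0.2.7", 50000))  # 192.0.2.7:50000 -> an application-specific service
```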

The Internet is extremely complex, and I don't mean to imply that every aspect of its workings can be explained in a few days. But I believe that its essential aspects can be explained in a few days at the middle school or high school level. You don't have to be a computer scientist or a programmer to understand basic aspects of how the Internet works.

Tuesday, March 31, 2015

Teaching Computer Science without Teaching Coding

I'm all for teaching more people how to code, but let's be clear that it's possible to understand a lot about how computers work, and about computer science, without knowing how to program. And if we insist that people learn how to program before they learn how computers work, we will be setting up a barrier that will ultimately prevent most people from learning how computers work, because most people find coding boring and don't have the patience or motivation to learn all the tedious details required to get programs to work. Why not teach the interesting concepts first as a way of motivating people to learn to code?

I'll also note that people can -- and many people do -- learn how to program without learning grand ideas of computer science. Learning these grand ideas, like computational complexity theory, will make someone a better programmer. But many programmers never get that far.

I think we can teach basic aspects of how the Internet works to someone who does not know how to code. I also think we could explain basic aspects of computational complexity theory to someone who does not know how to program. For example, I think most people assume that all computational problems are linear in the size of the problem input. Imagine it takes 1 second for a computer to add up a million numbers. If you double the size of the problem, you will double the time required to compute the answer, right? Meaning that in this case, it would take about 2 seconds for the computer to add up two million numbers? That happens to be the correct answer for this computational problem. But the fact that some computational problems are worse than linear will astound most people, I think. For example, for some problems the time required to compute a solution grows as the square of the size of the problem, so doubling the input quadruples the time. It is also easy to explain examples of hard computational problems, like the Traveling Salesman problem.
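For readers who do code, the contrast can be sketched in Python by counting steps rather than timing anything. The "compare every pair" task is my own example of a quadratic problem, and the brute-force search is only the most naive way to attack the Traveling Salesman problem (better algorithms exist, though none is known that avoids rapid growth in the worst case). The function names and the small distance table are made up for illustration.

```python
from itertools import permutations
from math import factorial


def sum_steps(n: int) -> int:
    """Adding up n numbers takes about n additions: linear growth."""
    return n


def all_pairs_steps(n: int) -> int:
    """Comparing every item with every item takes n * n steps: quadratic growth."""
    return n * n


def tour_count(cities: int) -> int:
    """Orderings a brute-force search checks when the starting city is fixed."""
    return factorial(cities - 1)


def shortest_tour(distances: list) -> int:
    """Brute force: try every round trip that starts and ends at city 0."""
    n = len(distances)
    best = None
    for order in permutations(range(1, n)):
        route = (0,) + order + (0,)
        length = sum(distances[a][b] for a, b in zip(route, route[1:]))
        if best is None or length < best:
            best = length
    return best


# Doubling the input doubles linear work but quadruples quadratic work.
print(sum_steps(2_000_000) // sum_steps(1_000_000))              # 2
print(all_pairs_steps(2_000_000) // all_pairs_steps(1_000_000))  # 4

# Doubling the number of cities does far worse than that.
print(tour_count(10))  # 362880
print(tour_count(20))

# Hypothetical distances between four cities (symmetric table).
distances = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(shortest_tour(distances))  # 7
```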

So let's not think that learning to code is a prerequisite for understanding grand ideas of computer science, just as learning to be an auto mechanic is not a prerequisite for learning basic concepts of how cars work.

Sunday, March 29, 2015

Every Educated Person Should Learn the Grand Ideas of Computer Science

We take for granted that educated people should understand the grand ideas of science. For example, the Big Bang theory, the theory of evolution, the atomic theory of matter -- these ideas are major human achievements. They are also fun and beautiful ideas that are essential to understanding the world around us, and to participating in our increasingly technical society. Every high school student learns these ideas -- you don't have to earn an undergraduate degree in biology to learn the theory of evolution.

Unfortunately, high school students do not learn the grand ideas of computer science. I believe that most non-computer scientists do not know any of the grand ideas of computer science, nor could they enumerate the subfields of computer science. Most people equate computer science with programming, which is a bit like equating physics with electronics design. Computer science and physics are the sciences behind the technologies.

I posit that all high school students should learn the major results of computer science at a basic level: ideas such as computer networks, operating systems, development of programming languages and databases, artificial intelligence, computational complexity theory, and computer graphics. I believe that the major results from these subfields could be explained at the high school level in a month. I also believe they could be explained without teaching students how to code, and that the recent emphasis on teaching coding is a distraction from the bigger and more important notion of teaching the grand ideas of computer science to every member of our society.