Skip to content

Index Python Tu Ple Assignment


Chapter 10  Tuples

10.1  Tuples are immutable

A tuple1 is a sequence of values much like a list. The values stored in a tuple can be any type, and they are indexed by integers. The important difference is that tuples are immutable. Tuples are also comparable and hashable so we can sort lists of them and use tuples as key values in Python dictionaries.



Syntactically, a tuple is a comma-separated list of values: >>> t = 'a', 'b', 'c', 'd', 'e' Although it is not necessary, it is common to enclose tuples in parentheses to help us quickly identify tuples when we look at Python code:

>>> t = ('a', 'b', 'c', 'd', 'e') To create a tuple with a single element, you have to include the final comma:

>>> t1 = ('a',) >>> type(t1) <type 'tuple'> Without the comma Python treats as an expression with a string in parentheses that evaluates to a string: >>> t2 = ('a') >>> type(t2) <type 'str'> Another way to construct a tuple is the built-in function . With no argument, it creates an empty tuple:

>>> t = tuple() >>> print t () If the argument is a sequence (string, list or tuple), the result of the call to is a tuple with the elements of the sequence: >>> t = tuple('lupins') >>> print t ('l', 'u', 'p', 'i', 'n', 's') Because is the name of a constructor, you should avoid using it as a variable name.

Most list operators also work on tuples. The bracket operator indexes an element:

>>> t = ('a', 'b', 'c', 'd', 'e') >>> print t[0] 'a' And the slice operator selects a range of elements.

>>> print t[1:3] ('b', 'c') But if you try to modify one of the elements of the tuple, you get an error:

>>> t[0] = 'A' TypeError: object doesn't support item assignment You can't modify the elements of a tuple, but you can replace one tuple with another: >>> t = ('A',) + t[1:] >>> print t ('A', 'b', 'c', 'd', 'e')

10.2  Comparing tuples

The comparison operators work with tuples and other sequences; Python starts by comparing the first element from each sequence. If they are equal, it goes on to the next element, and so on, until it finds elements that differ. Subsequent elements are not considered (even if they are really big). >>> (0, 1, 2) < (0, 3, 4) True >>> (0, 1, 2000000) < (0, 3, 4) True The function works the same way. It sorts primarily by first element, but in the case of a tie, it sorts by second element, and so on.

This feature lends itself to a pattern called DSU for
Decorate
a sequence by building a list of tuples with one or more sort keys preceding the elements from the sequence,

Sort
the list of tuples using the Python built-in , and

Undecorate
by extracting the sorted elements of the sequence.
For example, suppose you have a list of words and you want to sort them from longest to shortest: txt = 'but soft what light in yonder window breaks' words = txt.split() t = list() for word in words: t.append((len(word), word)) t.sort(reverse=True) res = list() for length, word in t: res.append(word) print res The first loop builds a list of tuples, where each tuple is a word preceded by its length.

compares the first element, length, first, and only considers the second element to break ties. The keyword argument tells to go in decreasing order.



The second loop traverses the list of tuples and builds a list of words in descending order of length. So among the five character words, they are sorted in reverse alphabetical order. So "what" appears before "soft" in the following list.

The output of the program is as follows: ['yonder', 'window', 'breaks', 'light', 'what', 'soft', 'but', 'in'] Of course the line loses much of its poetic impact when turned into a Python list and sorted in descending word length order.

10.3  Tuple assignment

One of the unique syntactic features of the Python language is the ability to have a tuple on the left hand side of an assignment statement. This allows you to assign more than one variable at a time when the left hand side is a sequence.

In this example we have a two element list (which is a sequence) and assign the first and second elements of the sequence to the variables and in a single statement. >>> m = [ 'have', 'fun' ] >>> x, y = m >>> x 'have' >>> y 'fun' >>> It is not magic, Python roughly translates the tuple assignment syntax to be the following:2 >>> m = [ 'have', 'fun' ] >>> x = m[0] >>> y = m[1] >>> x 'have' >>> y 'fun' >>> Stylistically when we use a tuple on the left hand side of the assignment statement, we omit the parentheses, but the following is an equally valid syntax: >>> m = [ 'have', 'fun' ] >>> (x, y) = m >>> x 'have' >>> y 'fun' >>> A particularly clever application of tuple assignment allows us to swap the values of two variables in a single statement: >>> a, b = b, a Both sides of this statement are tuples, but the left side is a tuple of variables; the right side is a tuple of expressions. Each value on the right side is assigned to its respective variable on the left side. All the expressions on the right side are evaluated before any of the assignments.

The number of variables on the left and the number of values on the right have to be the same:

>>> a, b = 1, 2, 3 ValueError: too many values to unpack More generally, the right side can be any kind of sequence (string, list or tuple). For example, to split an email address into a user name and a domain, you could write:

>>> addr = '[email protected]' >>> uname, domain = addr.split('@') The return value from is a list with two elements; the first element is assigned to , the second to . >>> print uname monty >>> print domain python.org

10.4  Dictionaries and tuples

Dictionaries have a method called that returns a list of tuples, where each tuple is a key-value pair3. >>> d = {'a':10, 'b':1, 'c':22} >>> t = d.items() >>> print t [('a', 10), ('c', 22), ('b', 1)] As you should expect from a dictionary, the items are in no particular order.

However, since the list of tuples is a list, and tuples are comparable, we can now sort the list of tuples. Converting a dictionary to a list of tuples is a way for us to output the contents of a dictionary sorted by key: >>> d = {'a':10, 'b':1, 'c':22} >>> t = d.items() >>> t [('a', 10), ('c', 22), ('b', 1)] >>> t.sort() >>> t [('a', 10), ('b', 1), ('c', 22)] The new list is sorted in ascending alphabetical order by the key value.

10.5  Multiple assignment with dictionaries

Combining , tuple assignment and , you can see a nice code pattern for traversing the keys and values of a dictionary in a single loop: for key, val in d.items(): print val, key This loop has two iteration variables because returns a list of tuples and is a tuple assignment that successively iterates through each of the key/value pairs in the dictionary.

For each iteration through the loop, both and are advanced to the next key/value pair in the dictionary (still in hash order).

The output of this loop is: 10 a 22 c 1 b Again in hash key order (i.e. no particular order).

If we combine these two techniques, we can print out the contents of a dictionary sorted by the value stored in each key/value pair.

To do this, we first make a list of tuples where each tuple is . The method would give us a list of tuples--but this time we want to sort by value not key. Once we have constructed the list with the value/key tuples, it is a simple matter to sort the list in reverse order and print out the new, sorted list. >>> d = {'a':10, 'b':1, 'c':22} >>> l = list() >>> for key, val in d.items() : ... l.append( (val, key) ) ... >>> l [(10, 'a'), (22, 'c'), (1, 'b')] >>> l.sort(reverse=True) >>> l [(22, 'c'), (10, 'a'), (1, 'b')] >>> By carefully constructing the list of tuples to have the value as the first element of each tuple, we can sort the list of tuples and get our dictionary contents sorted by value.

10.6  The most common words

Coming back to our running example of the text from Romeo and Juliet Act 2, Scene 2, we can augment our program to use this technique to print the ten most common words in the text as follows: import string fhand = open('romeo-full.txt') counts = dict() for line in fhand: line = line.translate(None, string.punctuation) line = line.lower() words = line.split() for word in words: if word not in counts: counts[word] = 1 else: counts[word] += 1 # Sort the dictionary by value lst = list() for key, val in counts.items(): lst.append( (val, key) ) lst.sort(reverse=True) for key, val in lst[:10] : print key, val The first part of the program which reads the file and computes the dictionary that maps each word to the count of words in the document is unchanged. But instead of simply printing out and ending the program, we construct a list of tuples and then sort the list in reverse order.

Since the value is first, it will be used for the comparisons and if there is more than one tuple with the same value, it will look at the second element (the key) so tuples where the value is the same will be further sorted by the alphabetical order of the key.

At the end we write a nice which does a multiple assignment iteration and prints out the ten most common words by iterating through a slice of the list ().

So now the output finally looks like what we want for our word frequency analysis. 61 i 42 and 40 romeo 34 to 34 the 32 thou 32 juliet 30 that 29 my 24 thee The fact that this complex data parsing and analysis can be done with an easy-to-understand 19 line Python program is one reason why Python is a good choice as a language for exploring information.

10.7  Using tuples as keys in dictionaries

Because tuples are hashable and lists are not, if we want to create a composite key to use in a dictionary we must use a tuple as the key.

We would encounter a composite key if we wanted to create a telephone directory that maps from last-name, first-name pairs to telephone numbers. Assuming that we have defined the variables , and , we could write a dictionary assignment statement as follows: directory[last,first] = number The expression in brackets is a tuple. We could use tuple assignment in a loop to traverse this dictionary.

for last, first in directory: print first, last, directory[last,first] This loop traverses the keys in , which are tuples. It assigns the elements of each tuple to and , then prints the name and corresponding telephone number.

10.8  Sequences: strings, lists, and tuples--Oh My!

I have focused on lists of tuples, but almost all of the examples in this chapter also work with lists of lists, tuples of tuples, and tuples of lists. To avoid enumerating the possible combinations, it is sometimes easier to talk about sequences of sequences.

In many contexts, the different kinds of sequences (strings, lists and tuples) can be used interchangeably. So how and why do you choose one over the others?



To start with the obvious, strings are more limited than other sequences because the elements have to be characters. They are also immutable. If you need the ability to change the characters in a string (as opposed to creating a new string), you might want to use a list of characters instead.

Lists are more common than tuples, mostly because they are mutable. But there are a few cases where you might prefer tuples:
  1. In some contexts, like a statement, it is syntactically simpler to create a tuple than a list. In other contexts, you might prefer a list.

  2. If you want to use a sequence as a dictionary key, you have to use an immutable type like a tuple or string.

  3. If you are passing a sequence as an argument to a function, using tuples reduces the potential for unexpected behavior due to aliasing.
Because tuples are immutable, they don't provide methods like and , which modify existing lists. However Python provides the built-in functions and , which take any sequence as a parameter and return a new list with the same elements in a different order.



10.9  Debugging

Lists, dictionaries and tuples are known generically as data structures; in this chapter we are starting to see compound data structures, like lists of tuples, and dictionaries that contain tuples as keys and lists as values. Compound data structures are useful, but they are prone to what I call shape errors; that is, errors caused when a data structure has the wrong type, size or composition or perhaps you write some code and forget the shape of your data and introduce an error.

For example, if you are expecting a list with one integer and I give you a plain old integer (not in a list), it won't work.

When you are debugging a program, and especially if you are working on a hard bug, there are four things to try:
reading:
Examine your code, read it back to yourself, and check that it says what you meant to say.

running:
Experiment by making changes and running different versions. Often if you display the right thing at the right place in the program, the problem becomes obvious, but sometimes you have to spend some time to build scaffolding.

ruminating:
Take some time to think! What kind of error is it: syntax, runtime, semantic? What information can you get from the error messages, or from the output of the program? What kind of error could cause the problem you're seeing? What did you change last, before the problem appeared?

retreating:
At some point, the best thing to do is back off, undoing recent changes, until you get back to a program that works and that you understand. Then you can start rebuilding.
Beginning programmers sometimes get stuck on one of these activities and forget the others. Each activity comes with its own failure mode.



For example, reading your code might help if the problem is a typographical error, but not if the problem is a conceptual misunderstanding. If you don't understand what your program does, you can read it 100 times and never see the error, because the error is in your head.



Running experiments can help, especially if you run small, simple tests. But if you run experiments without thinking or reading your code, you might fall into a pattern I call "random walk programming," which is the process of making random changes until the program does the right thing. Needless to say, random walk programming can take a long time.



You have to take time to think. Debugging is like an experimental science. You should have at least one hypothesis about what the problem is. If there are two or more possibilities, try to think of a test that would eliminate one of them.

Taking a break helps with the thinking. So does talking. If you explain the problem to someone else (or even yourself), you will sometimes find the answer before you finish asking the question.

But even the best debugging techniques will fail if there are too many errors, or if the code you are trying to fix is too big and complicated. Sometimes the best option is to retreat, simplifying the program until you get to something that works and that you understand.

Beginning programmers are often reluctant to retreat because they can't stand to delete a line of code (even if it's wrong). If it makes you feel better, copy your program into another file before you start stripping it down. Then you can paste the pieces back in a little bit at a time.

Finding a hard bug requires reading, running, ruminating, and sometimes retreating. If you get stuck on one of these activities, try the others.

10.10  Glossary

comparable:
A type where one value can be checked to see if it is greater than, less than or equal to another value of the same type. Types which are comparable can be put in a list and sorted.

data structure:
A collection of related values, often organized in lists, dictionaries, tuples, etc.

DSU:
Abbreviation of "decorate-sort-undecorate," a pattern that involves building a list of tuples, sorting, and extracting part of the result.

gather:
The operation of assembling a variable-length argument tuple.

hashable:
A type that has a hash function. Immutable types like integers, floats and strings are hashable; mutable types like lists and dictionaries are not.

scatter:
The operation of treating a sequence as a list of arguments.

shape (of a data structure):
A summary of the type, size and composition of a data structure.

singleton:
A list (or other sequence) with a single element.

tuple:
An immutable sequence of elements.

tuple assignment:
An assignment with a sequence on the right side and a tuple of variables on the left. The right side is evaluated and then its elements are assigned to the variables on the left.

10.11  Exercises


Exercise 1   Revise a previous program as follows: Read and parse the "From" lines and pull out the addresses from the line. Count the number of messages from each person using a dictionary.

After all the data has been read print the person with the most commits by creating a list of (count, email) tuples from the dictionary and then sorting the list in reverse order and print out the person who has the most commits.
Sample Line: From [email protected] Sat Jan 5 09:14:16 2008 Enter a file name: mbox-short.txt [email protected] 5 Enter a file name: mbox.txt [email protected] 195
Exercise 2   This program counts the distribution of the hour of the day for each of the messages. You can pull the hour from the "From" line by finding the time string and then splitting that string into parts using the colon character. Once you have accumulated the counts for each hour, print out the counts, one per line, sorted by hour as shown below. Sample Execution: python timeofday.py Enter a file name: mbox-short.txt 04 3 06 1 07 1 09 2 10 3 11 6 14 1 15 2 16 4 17 2 18 1 19 1

Exercise 3   Write a program that reads a file and prints the letters in decreasing order of frequency. Your program should convert all the input to lower case and only count the letters a-z. Your program should not count spaces, digits, punctuation or anything other than the letters a-z. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at .



1
Fun fact: The word "tuple" comes from the names given to sequences of numbers of varying lengths: single, double, triple, quadruple, quituple, sextuple, septuple, etc.
2
Python does not translate the syntax literally. For example if you try this with a dictionary it will not work as might expect.
3
This behavior is slightly different in Python 3.0.

This is an older version of the book now known as Think Python. You might prefer to read a more recent version.

Chapter 9

Tuples

9.1 Mutability and tuples

So far, you have seen two compound types: strings, which are made up of characters; and lists, which are made up of elements of any type. One of the differences we noted is that the elements of a list can be modified, but the characters in a string cannot. In other words, strings are immutable and lists are mutable.

There is another type in Python called a tuple that is similar to a list except that it is immutable. Syntactically, a tuple is a comma-separated list of values:

Although it is not necessary, it is conventional to enclose tuples in parentheses:

To create a tuple with a single element, we have to include the final comma:

Without the comma, Python treats as a string in parentheses:

Syntax issues aside, the operations on tuples are the same as the operations on lists. The index operator selects an element from a tuple.

And the slice operator selects a range of elements.

But if we try to modify one of the elements of the tuple, we get an error:

Of course, even if we can't modify the elements of a tuple, we can replace it with a different tuple:

9.2 Tuple assignment

Once in a while, it is useful to swap the values of two variables. With conventional assignment statements, we have to use a temporary variable. For example, to swap and :

If we have to do this often, this approach becomes cumbersome. Python provides a form of tuple assignment that solves this problem neatly:

The left side is a tuple of variables; the right side is a tuple of values. Each value is assigned to its respective variable. All the expressions on the right side are evaluated before any of the assignments. This feature makes tuple assignment quite versatile.

Naturally, the number of variables on the left and the number of values on the right have to be the same:

9.3 Tuples as return values

Functions can return tuples as return values. For example, we could write a function that swaps two parameters:

Then we can assign the return value to a tuple with two variables:

In this case, there is no great advantage in making a function. In fact, there is a danger in trying to encapsulate , which is the following tempting mistake:

If we call this function like this:

then and are aliases for the same value. Changing inside makes refer to a different value, but it has no effect on in . Similarly, changing has no effect on .

This function runs without producing an error message, but it doesn't do what we intended. This is an example of a semantic error.

As an exercise, draw a state diagram for this function so that you can see why it doesn't work.

9.4 Random numbers

Most computer programs do the same thing every time they execute, so they are said to be deterministic. Determinism is usually a good thing, since we expect the same calculation to yield the same result. For some applications, though, we want the computer to be unpredictable. Games are an obvious example, but there are more.

Making a program truly nondeterministic turns out to be not so easy, but there are ways to make it at least seem nondeterministic. One of them is to generate random numbers and use them to determine the outcome of the program. Python provides a built-in function that generates pseudorandom numbers, which are not truly random in the mathematical sense, but for our purposes they will do.

The module contains a function called that returns a floating-point number between 0.0 and 1.0. Each time you call , you get the next number in a long series. To see a sample, run this loop:

To generate a random number between 0.0 and an upper bound like , multiply by .

As an exercise, generate a random number between and .
As an additional exercise, generate a random integer between and , including both end points.

9.5 List of random numbers

The first step is to generate a list of random values. takes an integer argument and returns a list of random numbers with the given length. It starts with a list of zeros. Each time through the loop, it replaces one of the elements with a random number. The return value is a reference to the complete list:

We'll test this function with a list of eight elements. For purposes of debugging, it is a good idea to start small.

The numbers generated by are supposed to be distributed uniformly, which means that every value is equally likely.

If we divide the range of possible values into equal-sized "buckets," and count the number of times a random value falls in each bucket, we should get roughly the same number in each.

We can test this theory by writing a program to divide the range into buckets and count the number of values in each.

9.6 Counting

A good approach to problems like this is to divide the problem into subproblems and look for subproblems that fit a computational pattern you have seen before.

In this case, we want to traverse a list of numbers and count the number of times a value falls in a given range. That sounds familiar. In Section 7.8, we wrote a program that traversed a string and counted the number of times a given letter appeared.

So, we can proceed by copying the old program and adapting it for the current problem. The original program was:

The first step is to replace with and with . That doesn't change the program; it just makes it more readable.

The second step is to change the test. We aren't interested in finding letters. We want to see if is between the given values and .

The last step is to encapsulate this code in a function called . The parameters are the list and the values and .

By copying and modifying an existing program, we were able to write this function quickly and save a lot of debugging time. This development plan is called pattern matching. If you find yourself working on a problem you have solved before, reuse the solution.

9.7 Many buckets

As the number of buckets increases, gets a little unwieldy. With two buckets, it's not bad:

But with four buckets it is getting cumbersome.

There are two problems. One is that we have to make up new variable names for each result. The other is that we have to compute the range for each bucket.

We'll solve the second problem first. If the number of buckets is , then the width of each bucket is .

We'll use a loop to compute the range of each bucket. The loop variable, , counts from 0 to :

To compute the low end of each bucket, we multiply the loop variable by the bucket width. The high end is just a away.

With , the output is:

You can confirm that each bucket is the same width, that they don't overlap, and that they cover the entire range from 0.0 to 1.0.

Now back to the first problem. We need a way to store eight integers, using the loop variable to indicate one at a time. By now you should be thinking, "List!"

We have to create the bucket list outside the loop, because we only want to do it once. Inside the loop, we'll call repeatedly and update the -eth element of the list:

With a list of 1000 values, this code produces this bucket list:

These numbers are fairly close to 125, which is what we expected. At least, they are close enough that we can believe the random number generator is working.

As an exercise, test this function with some longer lists, and see if the number of values in each bucket tends to level off.

9.8 A single-pass solution

Although this program works, it is not as efficient as it could be. Every time it calls , it traverses the entire list. As the number of buckets increases, that gets to be a lot of traversals.

It would be better to make a single pass through the list and compute for each value the index of the bucket in which it falls. Then we can increment the appropriate counter.

In the previous section we took an index, , and multiplied it by the to find the lower bound of a given bucket. Now we want to take a value in the range 0.0 to 1.0 and find the index of the bucket where it falls.

Since this problem is the inverse of the previous problem, we might guess that we should divide by instead of multiplying. That guess is correct.

Since , dividing by is the same as multiplying by . If we multiply a number in the range 0.0 to 1.0 by , we get a number in the range from 0.0 to . If we round that number to the next lower integer, we get exactly what we are looking for a bucket index:

We used the function to convert a floating-point number to an integer.

Is it possible for this calculation to produce an index that is out of range (either negative or greater than )?

A list like that contains counts of the number of values in each range is called a histogram.

As an exercise, write a function called that takes a list and a number of buckets as arguments and returns a histogram with the given number of buckets.

9.9 Glossary

immutable type
A type in which the elements cannot be modified. Assignments to elements or slices of immutable types cause an error.
mutable type
A data type in which the elements can be modified. All mutable types are compound types. Lists and dictionaries are mutable data types; strings and tuples are not.
tuple
A sequence type that is similar to a list except that it is immutable. Tuples can be used wherever an immutable type is required, such as a key in a dictionary.
tuple assignment
An assignment to all of the elements in a tuple using a single assignment statement. Tuple assignment occurs in parallel rather than in sequence, making it useful for swapping values.
deterministic
A program that does the same thing each time it is called.
pseudorandom
A sequence of numbers that appear to be random but that are actually the result of a deterministic computation.
histogram
A list of integers in which each element counts the number of times something happens.
pattern matching
A program development plan that involves identifying a familiar computational pattern and copying the solution to a similar problem.

This is an older version of the book now known as Think Python. You might prefer to read a more recent version.