A quick recap:
We’re cracking on well here, going through the Python course on Coursera as taught by Dr. Chuck (see footnotes for more info and links). We’ve looked at strings and how we can slice, dice and extract data from them using find and split functions. We’ve gone on to learn about files, and proceeded to open and read data from files, both as strings and as lists. We’ve been able to index lists using integer values from zero upwards. Then we’ve gone on to look at dictionaries, which are mini two-field databases of key/value pairs referenced using their keys. And all throughout we’ve been learning about and using various kinds of loops or iterations, conditional statements, and functions (both built-in and defined in-program).
And now for something completely different:
All these things were familiar to me from programming in school and as a maths undergrad (albeit a little rusty!). But now we come on to something completely different, something I hadn’t heard of before: tuples. I’ll call tuples the big brother of lists as they’re very similar to lists: they’re basically another type of collection of things.
Building a tuple from scratch:
We can build a tuple from scratch. Here we set the items in the tuple in a similar way as if we were creating items in a list, but instead of using square brackets, we’d use round brackets (parentheses):
newtuple = (3, 5, 7)
stringtuple = ('cakes', 'dogs', 'apples')
Items in the tuple can be either strings or numbers, and functions such as max and min can be applied to either kind of tuple:
print max(newtuple)
print min(stringtuple)
Just as we could with items in lists, or the keys or values or key/value pairs in a dictionary, we can iterate through all the items in a tuple using the for .. in construct.
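Here’s a minimal sketch of the for .. in construct applied to a tuple. It’s written with the print() function so that it also runs under Python 3, and the names collected and item are just my own illustrations:

```python
# Build a tuple with round brackets, then loop over its items in order.
stringtuple = ('cakes', 'dogs', 'apples')

collected = []
for item in stringtuple:
    print(item)
    collected.append(item.upper())

# max and min on a tuple of strings compare the items alphabetically.
print(max(stringtuple), min(stringtuple))
```

The loop visits the items in the order they were written, exactly as it would for a list.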
Lists vs. tuples:
We remember that lists are indexed using an integer value from zero upwards, e.g. listname[0] returns the first item in listname. Tuples are indexed in exactly the same way, so newtuple[0] returns 3, the first item in the tuple newtuple we built above.
Lists are mutable – meaning that the value assigned to a particular index/position can be changed (i.e. we can assign a new value to that index/position). Tuples are immutable – that is, once a value is assigned to that position/index within the tuple it cannot be reassigned with a new value.
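A small sketch of the difference, written in Python 3 style (the variable names are made up for the illustration). Assigning to a position in a list works, while the same assignment on a tuple raises a TypeError:

```python
numbers_list = [3, 5, 7]
numbers_tuple = (3, 5, 7)

numbers_list[0] = 99          # fine: lists are mutable

try:
    numbers_tuple[0] = 99     # not allowed: tuples are immutable
    reassigned = True
except TypeError as err:
    reassigned = False
    print('Cannot reassign:', err)

print(numbers_list)
print(numbers_tuple)          # still the original three items
```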
Because of their immutability (i.e. we cannot change them once established) tuples can be stored more simply than lists, so they generally take up a little less memory and are a little quicker to create. They’re especially useful for temporary variables which are likely to be used once and not needed again, or for values which do not need to change.
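One rough way to peek at the memory point is sys.getsizeof from the standard library. This is only a sketch: the byte counts are a detail of the standard CPython interpreter and vary between versions and machines, and they cover only the container itself, not the items inside it.

```python
import sys

as_list = [3, 5, 7]
as_tuple = (3, 5, 7)

# Size in bytes of the container objects themselves (CPython-specific).
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print('list bytes: ', list_size)
print('tuple bytes:', tuple_size)
```

On the interpreters I know of, the tuple comes out a little smaller than the list holding the same items.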
Lists can have their order changed in place using the sort() and reverse() methods, in which case each index number will (after sorting) refer to a different value. Lists remain lists upon sorting. Tuples have no sort() or reverse() methods. However, a tuple can be passed as an argument to the built-in sorted() function and the result will be a new list, with the same items as the tuple but in sorted order; the tuple itself is left unchanged. (The built-in reversed() also accepts a tuple, though it hands back an iterator that needs wrapping in list() to see the items.)
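To see this in action, here’s a short sketch in Python 3 style (names such as result and has_sort are mine, purely for illustration). It passes a tuple to the built-in sorted() and reversed() functions and checks what comes back:

```python
newtuple = (7, 3, 5)

result = sorted(newtuple)             # returns a new list
print(result, type(result))

backwards = list(reversed(newtuple))  # reversed() gives an iterator, so wrap it in list()
print(backwards)

print(newtuple)                       # the original tuple is unchanged

# Tuples simply don't have a sort() method of their own.
has_sort = hasattr(newtuple, 'sort')
print(has_sort)
```

If a tuple is needed again afterwards, tuple(result) converts the sorted list back.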
We can add new items to a list using the append() function, but we cannot append items to a tuple.
Functions applicable to lists and tuples:
Lists | append; count; extend; index; insert; pop; remove; reverse; sort |
Tuples | count; index |
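A quick sketch of the two tuple functions from the table, alongside append() on a list for contrast (Python 3 style; the sample data is invented for the example):

```python
letters = ('a', 'b', 'a', 'c')

how_many = letters.count('a')   # number of times 'a' appears
where = letters.index('c')      # position of the first 'c'
print(how_many, where)

shopping = ['cakes']
shopping.append('dogs')         # lists can grow
print(shopping)

# Tuples have no append() at all.
tuple_can_append = hasattr(letters, 'append')
print(tuple_can_append)
```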
Assignment statements:
Values can be assigned to a whole sequence of variables at once by using tuples in an assignment statement. Here the left-hand side of the assignment statement is a sequence of variables given as a tuple (i.e. within brackets and separated by commas) while the items on the right-hand side may be either constants (strings or numerics) or other variables, again given as a tuple. For example:
(x,y) = (1,5)
(word,value) = ('apples',1)
(a,b,c,d,e) = (5,7,'oranges',3,'pears')
jelly = 'fish'
(p,q,r) = (3,'gold',jelly)
In the above examples, y is 5; value is 1; e is ‘pears’; and r is ‘fish’ (where ‘fish’ is the value of the variable jelly).
One more point: we don’t need to include the brackets on the left-hand tuple; Python will automatically assign the values in the tuple at RHS to the variables at LHS in sequence. For example:
j,k = (3,'bananas')
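Here’s a small sketch of tuple assignment with and without the brackets, in Python 3 style. The last part goes slightly beyond the examples above: it shows what happens, as far as I understand it, when the number of variables doesn’t match the number of values (Python raises a ValueError). The name mismatch is just an illustration.

```python
# Brackets on the left-hand side are optional.
j, k = (3, 'bananas')

# They are optional on the right-hand side too.
jelly = 'fish'
p, q, r = 3, 'gold', jelly
print(j, k, p, q, r)

# The two sides must have the same number of items.
try:
    a, b = (1, 2, 3)
    mismatch = False
except ValueError as err:
    mismatch = True
    print('Could not unpack:', err)
```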
Building a list of tuples from a dictionary:
Now that looks a little familiar… this is exactly what we used when we turned items in a dictionary into a tuple using for word,count in words.items() : to loop through multiple pairs of dictionary items (see part 7 and also the tagging engine project). Actually, to be accurate, in this case we turned the dictionary items into a list containing multiple tuples, with each tuple being of two items (the key/value pairs from the dictionary). It looked like this:
# counting all words into a dictionary
words = dict()
for word in wordlist :
    words[word] = words.get(word,0) + 1
# counting most common words in dictionary
wordhigh = None ; counthigh = None
for word,count in words.items() :
    if wordhigh is None or count > counthigh :
        wordhigh = word
        counthigh = count
print 'Most common word and its count:', wordhigh, counthigh
The following line in the above example uses the items() function to split the key/value pairs (in this case, word and its count) in the dictionary called words, and loops through the word/count value pairs.
for word,count in words.items() :
Imagine that the dictionary contained the following items: apples: 1, dogs: 3, cakes: 5, peanuts: 0. (As you can see, I like dogs more than apples and I really like cakes, but I can’t stand peanuts!) If we print out the dictionary, it shows the following (remember a dictionary is contained in curly brackets, and the order will likely change because a dictionary has its own internal way of ordering items): {'peanuts': 0, 'cakes': 5, 'apples': 1, 'dogs': 3}
If we were to print out the iteration variables word, count in the loop above, we’d get an output like this:
peanuts 0
cakes 5
apples 1
dogs 3
Now if we were to print out the list of tuples we created using words.items(), we’d get a list of the pairs (the list is contained in square brackets, and each key/value pair, which is a two-item tuple, is contained in round brackets with a comma between its items): [('peanuts', 0), ('cakes', 5), ('apples', 1), ('dogs', 3)]. When referencing this list of tuples, the list indexing holds true, so listname[0] returns the first tuple in the list, which is ('peanuts', 0).
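A short sketch of indexing into the list of tuples, written for Python 3. Two assumptions to flag: in Python 3 items() no longer returns a list directly, so I wrap it in list(); and recent Python 3 versions keep the dictionary in the order the items were added, so the order printed differs from the Python 2 output shown above. The names pairs and first are mine.

```python
words = {'apples': 1, 'dogs': 3, 'cakes': 5, 'peanuts': 0}

# Turn the key/value pairs into a list of two-item tuples.
pairs = list(words.items())
print(pairs)

# List indexing picks out one tuple; tuple indexing picks out one item.
first = pairs[0]
print(first)        # the whole (key, value) tuple
print(first[0])     # just the key
print(first[1])     # just the value
```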
Uses of tuples – comparisons:
We can compare one tuple to another using comparison operators such as greater than, less than, equals, and so on.
tuplea = (0,1,5,7,65)
tupleb = (0,1,5,9,0)
print 'tuple a:', tuplea
print 'tuple b:', tupleb
if tuplea < tupleb :
    print 'tuple a is smaller than tuple b'
We have to be careful here because Python will compare the items in the tuples in succession, first items first, then each successive pair in sequence. While the items are equal (0 and 0, then 1 and 1, then 5 and 5), Python moves on to the next pair; it stops at the first pair that differ, and that single comparison decides the result. Here the 4th items differ, and 7 is less than 9, so the condition is True and the statement gets printed. The 5th items in the tuples (65 and 0) are not compared at all.
Comparisons can also be done on strings (based on alphabetic order) as well as numerics.
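To convince myself of how the item-by-item comparison works, here’s a sketch in Python 3 style. The helper first_difference is hypothetical (something I made up for the illustration, not part of Python or the course); it finds the position at which two tuples first differ, which is the position that decides the comparison.

```python
def first_difference(a, b):
    """Return the index of the first pair of items that differ, or None."""
    for position, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return position
    return None

tuplea = (0, 1, 5, 7, 65)
tupleb = (0, 1, 5, 9, 0)

decided_at = first_difference(tuplea, tupleb)
a_smaller = tuplea < tupleb
print(decided_at, a_smaller)

# Strings inside tuples are compared alphabetically in the same way.
fruit_smaller = ('apples', 'cakes') < ('apples', 'dogs')
print(fruit_smaller)
```

The decision is made at index 3 (the 4th items, 7 and 9), so the 65 and the 0 never get a look in.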
Uses of tuples – sorting by key:
Having created the list of tuples in the above example, we can then apply the sort() function to the list. This will sort all the tuples in the list in the order of the first item in each tuple, i.e. the key. Values remain with their referencing key, but a plain sort() will not order the tuples by value, because it looks at the first item in each tuple first (and only looks at the second item to break a tie).
listoftuples = words.items()
listoftuples.sort()
A shorter and quicker method for sorting the list of tuples created from the dictionary is to use the sorted() function with words.items() as its argument:
sortedlist = sorted(words.items())
We could even add the for .. in construct as well to loop through the sorted list of tuples (sorted by key) and do something with them:
for key,value in sorted(words.items()) :
    print key, value
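For anyone following along in Python 3, here’s a sketch of the same sort-by-key loop. I’m assuming Python 3 here, where print is a function and items() has no sort() of its own, so sorted() is the route to take; the name ordered is just for the example.

```python
words = {'apples': 1, 'dogs': 3, 'cakes': 5, 'peanuts': 0}

# sorted() returns a new list of (key, value) tuples, ordered by key.
ordered = sorted(words.items())

for key, value in ordered:
    print(key, value)
```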
Uses of tuples – sorting by value:
We saw in the tagging engine how we wanted to return the top n (e.g. top 3, top 5, etc.) most common key words. There is a pattern we can use with tuples to enable us to sort the list of tuples by value rather than by key, i.e. by making value the first item in each tuple thus allowing us to sort on that. We’ll want to sort it in descending order.
We can do so using the append() function to append values to a temporary list within a loop, but we can specify the argument (i.e. what it is we want to append) as a tuple of two items with value as the first item. This argument is denoted by (value,key) where key,value are the two variables within the iteration loop. The line of code required is listname.append( (value,key) ).
The following neat snippet shows exactly how to do this, and comes directly from Dr. Chuck’s course (see footnotes for links). Note that, where we have two or more tuples in the list with the same value, they are ordered firstly on the first item (value, numeric) and then on the second item (key, string, compared alphabetically). Because the sort here is descending, tied keys come out in reverse alphabetical order, for example: 3 dogs and then 3 cookies.
words = {'apples': 1, 'dogs': 3, 'cakes': 5, 'peanuts': 0, 'cookies': 3}
templist = list()
for key,value in words.items() :
    templist.append( (value,key) )
templist.sort(reverse=True)
print 'Sorted by value, descending:', templist
We can go further and print out those keys and values (e.g. words and their counts) for the top 10 (or top n) counts – e.g. the top n most common words in a text file. To do so we add another loop after sorting templist, but this time iterating through only the first n tuples in the list using:
n = 10
for value,key in templist[:n] :
    print key, value
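Pulling the pattern together, here’s a sketch that wraps the sort-by-value steps in a function, written in Python 3 style. The function name top_n and its parameters are my own invention rather than anything from the course; the body is just the append-sort-slice pattern from above.

```python
def top_n(counts, n):
    """Return the n (key, value) pairs with the highest values."""
    templist = list()
    for key, value in counts.items():
        templist.append((value, key))    # value first, so we sort on it
    templist.sort(reverse=True)          # descending order
    result = list()
    for value, key in templist[:n]:      # only the first n tuples
        result.append((key, value))      # flip back to (key, value)
    return result

words = {'apples': 1, 'dogs': 3, 'cakes': 5, 'peanuts': 0, 'cookies': 3}
top_three = top_n(words, 3)
for key, value in top_three:
    print(key, value)
```

If n is bigger than the number of items, the slice simply returns everything there is, so there’s no need for a special check.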
Uses of tuples – refining my projects:
This is incredibly powerful, and I can see how this can be used to make the Tagging Engine even better (and no doubt faster for larger files). But I can also easily see the usefulness of tuples for the aged debtor analysis program I began working on but parked because it seemed too complicated at the time. Here I can use a tuple as an argument to append combinations of items to a temporary list, in the required order. I can then sort and sum various parameters such as total debt, balance due in future, current, 30+ days overdue, etc. etc. I’ve got a lot more work to be getting on with for my Programming Projects now!!
Supersonic sorting:
The course does show us an even shorter version of sorting with lists of tuples, but this is making my brain hurt again!
words = {'apples': 1, 'dogs': 3, 'cakes': 5, 'peanuts': 0, 'cookies': 3}
print sorted( [ (value,key) for key,value in words.items() ] )
The square brackets denote something called list comprehension, which is a special Python syntax which constructs a dynamic list. (Note this dynamic list replaces what we called templist in the previous longer version above.) The dynamic list is a list consisting of tuples denoted by (value,key) and is created as a result of looping through the iteration variables key,value in words.items(), that is the successive pairs of values within the dictionary called words. The resulting list is then sorted – here in ascending order – using the sorted() function, and the whole thing is printed out.
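To check my understanding, here’s a sketch (Python 3 style, with my own names long_hand and short_hand) that builds the list both ways and confirms the long-hand loop and the list comprehension give the same result:

```python
words = {'apples': 1, 'dogs': 3, 'cakes': 5, 'peanuts': 0, 'cookies': 3}

# Long-hand: build a temporary list with a loop, then sort it.
templist = list()
for key, value in words.items():
    templist.append((value, key))
long_hand = sorted(templist)

# Short-hand: the list comprehension builds the same list in one line.
short_hand = sorted([(value, key) for key, value in words.items()])

print(short_hand)
print(long_hand == short_hand)
```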
Iterative development and debugging:
That last line of code is pretty advanced stuff, and even though I can understand it when it’s explained as well as Dr. Chuck explains it, it just reinforces to me how much more I’ve still got to learn.
But still, the benefit of learning and writing programs in a more long-hand and more structured way is that I can check each element as I write it to make sure it’s all working correctly (iterative development). And that lends itself to adding sensible debugging print statements at relevant points throughout the program too, to check everything’s working fine.
So doing things the long-hand way isn’t really an issue at this stage in the game. In fact, it really helps me to understand the algorithms being constructed, and the structures of the databases being built up. It also really helps me with both developing – and debugging – programs.
And even when I get frustrated with not knowing how best to do something, forcing me to get out my trusty sledgehammer again, or when I’m going round and round in circles trying and failing to work things out quickly and being forced to go off and read the documentation (darn it!), I do get there in the end. And all that practice must be helping me to get better too. So it’s all good.
I’m really happy I’m taking this course on Coursera and above all I’m absolutely thrilled that I discovered such an amazing thing as coding – something that I can do and I enjoy doing so much.
There may just be hope for me and my mid-life career-change yet!!
Read more like this:
This post follows on from earlier Coding 101 posts and records my responses and learnings from the highly-recommended Python programming book and Coursera specialisation by Charles Severance (see References below).
References:
Book: Python for Informatics: Exploring Information by Charles Severance
Course: Python Data Structures by Univ. of Michigan. Part of the Python for Everybody specialisation.