Iterables and Iterators
- iterable: an object that can be iterated over.
- iterator: an object that implements a next() method. Often used to wrap other objects and make them iterable.
The for statement lets you iterate over various types of collections:
    # over a list, you get a value
    for value in mylist:
        print value

    # over a dict, you get a key
    for key in mydict:
        print key

    # over a string, you get a character
    for char in mystring:
        print char

    # over a file, you get a line
    for line in open('myfile.txt'):
        print line

    # etc
These collections are iterable. We'll see how iterators are implicitly used to access the collections' items.
Many objects can be made into iterables, thanks to a simple protocol:
    # we have a collection we'd like to iterate over
    items = [1, 5, 8]

    # we get an iterator for the collection
    it = iter(items)

    # we get each element, one at a time
    e = it.next()

    # we do something with e
    print e        # outputs: 1
    e = it.next()
    print e        # outputs: 5
    e = it.next()
    print e        # outputs: 8
    e = it.next()  # raises a StopIteration exception
how the for loop works
The way we write it:
    for element in items:
        # do something with element
The way it works under the hood (pretty much):
    _it = iter(items)               # get iterator
    while True:
        try:
            element = _it.next()    # get next element
        except StopIteration:       # the iterator is exhausted
            break
        # do something with element
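The desugared loop above can be wrapped in a small helper to check that it behaves like a real for loop. This is only a sketch: manual_for is a hypothetical name, and it uses the built-in next(it) (available since Python 2.6 and equivalent to it.next() in Python 2) so that it runs unchanged on Python 2 and Python 3.

```python
def manual_for(iterable, action):
    """Emulate `for element in iterable: action(element)` by hand."""
    _it = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            element = next(_it)   # same as _it.next() in Python 2
        except StopIteration:     # the iterator is exhausted
            break
        action(element)           # the "loop body"


collected = []
manual_for([1, 5, 8], collected.append)
print(collected)  # [1, 5, 8]
```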
The short explanation of iter() is that it operates on objects that implement either the __iter__() or the __getitem__() method, and it returns an iterator.
    # lists implement __iter__() and can return an iterator themselves
    a = [5, 6, 9, 1]
    it = a.__iter__()
    it.next()         # outputs: 5
    it.next()         # outputs: 6

    # when calling iter() on a list it simply uses that iterator directly
    it = iter(a)
    it.next()         # outputs: 5

    # strings only implement __getitem__()
    a = "abcde"
    a.__getitem__(3)  # outputs: 'd'
    a.__getitem__(1)  # outputs: 'b'

    # iter() will have to "wrap" them with a third party iterator
    it = iter(a)      # returns a "wrapper" iterator object

    # this method of the iterator calls a.__getitem__(index) under the hood
    it.next()         # outputs: 'a'
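To see the two cases side by side, here is a small sketch. Letters is a made-up class that defines only __getitem__(); the checks suggest that iter() still hands back an iterator for it, while for a list iter() yields the same kind of iterator as calling __iter__() directly. A custom class is used instead of a string because Python 3 strings also define __iter__(), and the built-in next() is used so the snippet runs on Python 2.6+ as well as Python 3.

```python
class Letters(object):
    """Hypothetical sequence-like class: defines __getitem__() only."""
    def __init__(self, text):
        self.text = text

    def __getitem__(self, index):
        return self.text[index]   # raises IndexError past the end


letters = Letters("abc")
has_iter_method = hasattr(letters, "__iter__")   # no __iter__() here
wrapper = iter(letters)           # iter() builds a wrapper iterator
first = next(wrapper)             # calls letters.__getitem__(0)
second = next(wrapper)            # calls letters.__getitem__(1)

numbers = [5, 6, 9, 1]
list_it = iter(numbers)           # delegates to numbers.__iter__()
same_kind = type(list_it) is type(numbers.__iter__())

print("%s %s %s %s" % (has_iter_method, first, second, same_kind))
```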
creating iterable objects
Essentially 2 approaches:
- implementing the __getitem__() method on the class
    class DoubleChar(object):
        def __init__(self, seq):
            self.seq = seq

        def __getitem__(self, index):
            print "__getitem__ was called"
            return self.seq[index] * 2

    for i in DoubleChar('abc'):
        print i

    # outputs:
    # __getitem__ was called
    # aa
    # __getitem__ was called
    # bb
    # __getitem__ was called
    # cc
    # __getitem__ was called
__getitem__() was called a 4th time, but the extra self.seq[index] call raised an IndexError, which was implicitly suppressed in the for loop.
- implementing the __iter__() method and ensuring that the object it returns itself implements the next() method (i.e. is an iterator).
    class DoubleChar(object):
        def __init__(self, seq):
            self.counter = 0
            self.seq = seq
            self.length = len(seq)

        def __iter__(self):
            return self

        def next(self):
            # iter() needs this method to be defined
            # on whatever object is returned by __iter__()
            # (which in this case is this very object),
            # otherwise a TypeError will be raised.
            print "next() has been called"
            if self.counter >= self.length:
                raise StopIteration
            value = self.seq[self.counter]
            self.counter += 1
            return value * 2

    for i in DoubleChar('abc'):
        print i

    # outputs:
    # next() has been called
    # aa
    # next() has been called
    # bb
    # next() has been called
    # cc
    # next() has been called
In the previous examples, note how many times the messages "__getitem__ was called" and "next() has been called" were printed (four), as opposed to how many times the actual value itself was printed (three). This suggests that a for loop can be stopped by raising an IndexError inside __getitem__() or a StopIteration inside the iterator's next() method (see the earlier explanation of how a for loop actually works).
    class Max(object):
        def __init__(self, someiterable, max):
            self.iterable = someiterable
            self.max = max

        def __getitem__(self, index):
            if index >= self.max:
                raise IndexError("you asked for %s" % self.max)
            return self.iterable[index]

    for i in Max("woejfoeifwsjodf", 3):
        print i

    # outputs:
    # w
    # o
    # e
Note how the IndexError and the message within have been suppressed in the for loop.
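One way to observe this suppression directly is to drive the wrapper iterator by hand. In the sketch below, FirstN is a hypothetical stand-in for the class above, and the built-in next() is used so the code runs on Python 2.6+ and Python 3. The IndexError raised inside __getitem__() reaches the caller of next() as a StopIteration, which is the signal a for loop understands.

```python
class FirstN(object):
    """Hypothetical example: expose only the first `limit` items."""
    def __init__(self, seq, limit):
        self.seq = seq
        self.limit = limit

    def __getitem__(self, index):
        if index >= self.limit:
            raise IndexError("you asked for index %s" % index)
        return self.seq[index]


it = iter(FirstN("woejfoeifwsjodf", 3))
seen = [next(it), next(it), next(it)]   # ['w', 'o', 'e']

try:
    next(it)                            # __getitem__(3) raises IndexError
    outcome = "no exception"
except StopIteration:
    outcome = "StopIteration"           # what the caller actually sees
except IndexError:
    outcome = "IndexError"

print(seen)
print(outcome)
```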
which to use
As a rule of thumb, the presence of __getitem__() should give a semantic indication that random access is possible. Originally, __iter__() did not exist, and to create iterators people would write code that looked like this:
    class Stream(object):
        def __init__(self, url):
            self.buffer = connect(url)

        def __getitem__(self, index):
            chunk = self.buffer.download()
            if chunk == 'EOF':
                raise IndexError
            return chunk

    for i in Stream(someurl):
        print i
Note how the index parameter is completely ignored inside the __getitem__() method? This is misleading. The presence of __getitem__() gives the false impression that a Stream object can be accessed randomly, when in fact it can't. For that reason, the mechanisms associated with __iter__() and iterator.next() were added in Python 2.2. They allow for a more semantically accurate usage.
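As a sketch of what that could look like, the Stream example can be rewritten on top of the iterator protocol. FakeBuffer is a hypothetical in-memory stand-in for whatever connect(url) returns (the original does not show it), and the method is exposed under both names, next() for Python 2 and __next__() for Python 3, so the snippet runs on either.

```python
class FakeBuffer(object):
    """Hypothetical stand-in for the object returned by connect(url)."""
    def __init__(self, chunks):
        self.chunks = list(chunks)

    def download(self):
        # Hand out chunks in order, then the 'EOF' marker.
        return self.chunks.pop(0) if self.chunks else 'EOF'


class Stream(object):
    def __init__(self, buffer):
        self.buffer = buffer

    def __iter__(self):
        return self               # a Stream is its own iterator

    def next(self):               # Python 2 spelling
        chunk = self.buffer.download()
        if chunk == 'EOF':
            raise StopIteration   # no index involved, no IndexError
        return chunk

    __next__ = next               # Python 3 spelling


stream = Stream(FakeBuffer(['a', 'b', 'c']))
received = [chunk for chunk in stream]
print(received)  # ['a', 'b', 'c']
```

Because this Stream no longer defines __getitem__(), an expression such as stream[0] fails with a TypeError, which matches the fact that the data can only be read sequentially.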