Python Generators

Before you take on Python generators, make sure you understand iterators. Generators are a more convenient way of creating iterators. Like iterators, they're useful when you need to iterate over large data structures, because values are produced one at a time instead of being held in memory all at once.

Generator Functions

def countdown(number):
    while number > 0:
        yield number
        number -= 1

As soon as the yield keyword appears in the body of a function or method, that function becomes a generator function.
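
One way to check this, as a small sketch: the standard inspect module can tell a generator function apart from a regular one. The plain_countdown function below is a made-up counterpart added only for comparison; it is not part of the original example.

```python
import inspect

def countdown(number):
    # The presence of `yield` makes this a generator function.
    while number > 0:
        yield number
        number -= 1

def plain_countdown(number):
    # Hypothetical counterpart without `yield`: an ordinary function
    # that builds and returns the whole list at once.
    result = []
    while number > 0:
        result.append(number)
        number -= 1
    return result

is_gen = inspect.isgeneratorfunction(countdown)
is_plain_gen = inspect.isgeneratorfunction(plain_countdown)
```

Here is_gen is True and is_plain_gen is False, even though both functions describe the same sequence of numbers.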

Let's add some print statements to the function and run it to understand how it works.

def countdown(n):
    print "countdown was just called with %s" % n
    while n > 0:
        print "before yield n is worth %s" % n
        yield n
        n -= 1
        print "after yield n is worth %s" % n

The first thing you'll notice is that a generator function doesn't run its body when called; it returns a generator object instead.

countdown(3)
# outputs: <generator object countdown at 0xa4....>

Let's try that again, this time keeping a reference to the generator object. Note that nothing is printed yet.

c = countdown(3)

Second observation: calling next() on a generator object runs the body of the generator function until it reaches a yield statement, then pauses and hands back the yielded value.

c = countdown(3)

c.next()
# outputs:
    # countdown was just called with 3
    # before yield n is worth 3
    # 3

Third observation: calling next() again resumes execution where it left off and runs until the next yield statement.

c.next()
# outputs:
    # after yield n is worth 2
    # before yield n is worth 2
    # 2

Finally, when the generator function returns, the generator raises StopIteration implicitly, whereas a hand-written iterator has to raise it explicitly.

c.next()
# outputs:
    # after yield n is worth 1
    # before yield n is worth 1
    # 1

c.next()
# outputs:
    # after yield n is worth 0
# raises: StopIteration Exception
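
A for loop relies on this exception to know when to stop. As a rough sketch of what such a loop does behind the scenes, the snippet below drives the generator by hand. It uses the built-in next() function, which works on Python 2.6+ and Python 3, rather than the Python 2 only c.next() method; the seen list is just an illustrative name.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
seen = []
while True:
    try:
        value = next(c)    # resume the generator until its next yield
    except StopIteration:
        break              # the generator function has returned
    seen.append(value)
# seen is now [3, 2, 1]
```

Writing `for value in countdown(3): seen.append(value)` gives the same result without the explicit try/except.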

As these observations show, generators are a more convenient way of creating iterators: you don't have to implement the iterator protocol (__iter__(), next(), etc.) yourself.
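
To make the comparison concrete, here is a sketch of what a hand-written iterator equivalent to countdown() might look like. The Countdown class is a hypothetical illustration, not code from the original example. It defines __next__() for Python 3 and aliases it to next() for Python 2.

```python
class Countdown(object):
    """Hand-written iterator that mimics the countdown() generator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # explicit here, implicit in a generator
        value = self.n
        self.n -= 1
        return value

    next = __next__  # Python 2 spells the method next()


def countdown(n):
    while n > 0:
        yield n
        n -= 1

from_class = list(Countdown(3))
from_generator = list(countdown(3))
# both are [3, 2, 1]
```

The generator function expresses in four lines what the class needs three methods and explicit state handling to do.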

Generator Expressions

Remember how Python's list comprehensions create new lists:

doubled = [i * 2 for i in "abc"]
print doubled
# outputs: ['aa', 'bb', 'cc']

A similar expression, written with parentheses instead of square brackets, returns a generator instead:

doubled = (i * 2 for i in "abc")
print doubled
# outputs: <generator object <genexpr> at 0x32...>

Unlike a list comprehension, which builds a new list of values, a generator expression creates a generator object. That object doesn't hold a list of values; it holds an expression that produces them one at a time, on demand.

for i in doubled:
    print i
# outputs:
    # aa
    # bb
    # cc
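
The on-demand behaviour can be observed directly. In the sketch below, double() is a hypothetical helper that records each call in a calls list, so we can see when the work actually happens. The built-in next() is used so the snippet runs on Python 2.6+ and Python 3.

```python
calls = []

def double(s):
    # Hypothetical helper: records every call, then doubles its argument.
    calls.append(s)
    return s * 2

doubled = (double(i) for i in "abc")
calls_after_creation = list(calls)   # []: nothing has been computed yet

first = next(doubled)                # computes only the first value
calls_after_first = list(calls)      # ['a']

rest = list(doubled)                 # computes the remaining values
```

After the last line, first is 'aa', rest is ['bb', 'cc'], and calls is ['a', 'b', 'c'].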

The parentheses around a generator expression can be dropped when it is the sole argument of a function call:

sum(x*x for x in s)
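
As a short illustration, assuming s is a small list of numbers (the original leaves s undefined), the sketch below contrasts the sole-argument case with a call that takes a second argument, where the generator expression needs its own parentheses.

```python
s = [1, 2, 3]  # assumed sample data

# Sole argument: the call's own parentheses are enough.
total = sum(x * x for x in s)                  # 14

# With a second argument (the start value of sum()), the generator
# expression must be wrapped in its own parentheses.
total_from_ten = sum((x * x for x in s), 10)   # 24

# Writing sum(x * x for x in s, 10) instead is a SyntaxError.
```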

The general syntax of generator expressions:

(expression for e1 in s1 if cond1
            for e2 in s2 if cond2
            ...       
            for en in sn if condn)

# is equivalent to a generator function with this body:

for e1 in s1:
    if cond1:
        for e2 in s2:
            if cond2:
                ...
                for en in sn:
                    if condn:
                        yield expression
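
A concrete instance may help. The sketch below uses two made-up inputs, rows and cols, and builds the same pairs once with a generator expression that has two for clauses and once with the expanded generator function.

```python
rows = [1, 2, 3, 4]   # assumed sample data
cols = "ab"           # assumed sample data

# Two for clauses; the first one carries a condition.
pairs = ((r, c) for r in rows if r % 2 == 0
                for c in cols)

def pairs_function():
    # The same logic spelled out as a generator function.
    for r in rows:
        if r % 2 == 0:
            for c in cols:
                yield (r, c)

from_expression = list(pairs)
from_function = list(pairs_function())
# both are [(2, 'a'), (2, 'b'), (4, 'a'), (4, 'b')]
```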

Generator Objects

A generator object can be consumed only once: after it is exhausted, iterating over it again yields nothing, and there is no rewind() method.

To iterate again, call the generator function again: each call returns a fresh generator object.

c = countdown(3)
for i in c:
    print i

# to do this again call countdown() once more and start over

c = countdown(3)
for i in c:
    print i
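
The single-use behaviour is easy to verify. In this small sketch, the variable names first_pass, second_pass and third_pass are only illustrative.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
first_pass = list(c)     # [3, 2, 1]
second_pass = list(c)    # []: the generator is already exhausted

# A fresh generator object starts from the beginning again.
third_pass = list(countdown(3))   # [3, 2, 1]
```

Note that iterating over an exhausted generator does not raise an error; the loop simply runs zero times, which can hide bugs.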

You can prematurely terminate a generator with its close() method:

gen = (i*2 for i in "abcd")

gen.next()
# outputs: aa

gen.close()

gen.next()
# raises: StopIteration Exception
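
With a generator function, close() works by raising GeneratorExit inside the function at the point where it is paused, which gives it a chance to clean up. The sketch below wraps the countdown loop in try/finally to show this; the events list is a hypothetical stand-in for real cleanup work such as closing a file. The built-in next() is used so the snippet runs on Python 2.6+ and Python 3.

```python
events = []

def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        # Runs when the generator finishes or is closed early.
        events.append("cleanup")

c = countdown(3)
first = next(c)               # 3
c.close()                     # raises GeneratorExit at the paused yield
closed_early = list(events)   # ['cleanup']

try:
    next(c)
    exhausted = False
except StopIteration:
    exhausted = True          # a closed generator behaves as exhausted
```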
