Iterators and Generators in Python3

Recently I needed a way to infinitely loop over a list in Python. Traditionally, this is extremely easy to do with simple indexing if the size of the list is known in advance. For example, an approach could look something like this:

l = [1, 2, 3]
i = 0

while True:
    print(l[i])
    i += 1
    if i == len(l):
        i = 0

Eventually I settled on a built-in approach using the itertools module from the standard library. As a result, the code became a lot cleaner:

import itertools
l = [1, 2, 3]

for n in itertools.cycle(l):
    print(n)

But for the fun of it, I decided to re-implement itertools.cycle myself. Doing that requires understanding how iterators and generators work in Python 3, so I will explain them here first, and then demonstrate the itertools.cycle re-implementation.

Both iterators and generators have their own short sections in the official Python 3 tutorial. This post is my own explanation of them.

Iterators

Iterators are objects that define a __next__ method, which is called every time the next value is needed. Iterators also define an __iter__ method that returns the iterator itself, so they can be iterated over, yielding their members one by one.

An object is said to be iterable if it defines an __iter__ method that returns such an iterator.

It can be a little confusing. Think of it like this: strings are iterable (able to be iterated over) because their class (str) defines an __iter__ method that returns an iterator object, and that iterator object has a __next__ method.

For example, take the following code:

for n in "abc":
    print(n)

Behind the scenes, for calls iter() on "abc", which in turn calls "abc".__iter__(). Since "abc" is of type str, a class that defines the __iter__ method, that method is called and an iterator object is returned.

Then, for simply calls the __next__ method on that iterator object, binding each return value to the loop variable, until __next__ raises a StopIteration exception, at which point the loop stops.

More precisely, this is equivalent to calling next() on the iterator object, which in turn calls its __next__ method. The next() built-in is handy because it accepts a second parameter that specifies a default value to be returned if the iterator object is already exhausted.
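The loop mechanics described above can be sketched by hand. This is only a rough equivalent of what for does, not the interpreter's actual implementation, and the helper name manual_for is made up for illustration.

```python
def manual_for(iterable):
    """Collect the items a for loop would visit, driving the iterator by hand."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # exhausted: this is where a for loop ends
        collected.append(item)         # stands in for the loop body
    return collected


result = manual_for("abc")
print(result)  # ['a', 'b', 'c']
```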

Example

We can see this in action:

>>> string = "abc"
>>> iterator = string.__iter__()

>>> print(iterator.__next__())
a
>>> print(iterator.__next__())
b
>>> print(iterator.__next__())
c

>>> print(iterator.__next__())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

As we can see, when we exhaust the iterator, a StopIteration exception is raised.

Using the built-in next() function gives us a greater degree of control: we can designate a default value to be returned once the iterator is exhausted, in which case no StopIteration exception is raised:

>>> string = "abc"
>>> iterator = string.__iter__()

>>> print(next(iterator, 3))
a
>>> print(next(iterator, 3))
b
>>> print(next(iterator, 3))
c
>>> print(next(iterator, 3))
3
>>> print(next(iterator, 3))
3

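Instead of calling object.__iter__() directly, you should generally use the built-in iter() function. It ultimately does the same thing (it returns an iterator for the object), but it has a little added functionality: it also works on objects that only define __getitem__, and its two-argument form, iter(callable, sentinel), builds an iterator that calls callable until it returns sentinel.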
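Here is a small sketch of that added functionality. The Countdown class and the pending list are made up for illustration; the only real API used is the built-in iter() in its one-argument and two-argument forms.

```python
class Countdown:
    """Hypothetical class with no __iter__, only __getitem__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)    # signals the end of the sequence
        return self.start - index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
# Calling Countdown(3).__iter__() directly would fail with AttributeError.
fallback_items = list(iter(Countdown(3)))
print(fallback_items)  # [3, 2, 1]

# Two-argument form: keep calling the callable until it returns the sentinel (0).
pending = [1, 2, 0, 9]
sentinel_items = list(iter(lambda: pending.pop(0), 0))
print(sentinel_items)  # [1, 2]
```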

Re-implementing itertools.cycle using iterators

Knowing all this, we can write a custom class that can infinitely iterate over another iterable:

class InfiniteIterable:
    def __init__(self, original):
        # Relies on len() and indexing, so `original` must be a sequence.
        self.original_iterable = original
        self.len = len(original)
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Wrap around once we step past the last index.
        if self.i > (self.len - 1):
            self.i = 0

        ret = self.original_iterable[self.i]
        self.i += 1

        return ret

# Runs forever, printing a, b, c, a, b, c, ...
for n in InfiniteIterable("abc"):
    print(n)

Since we never raise a StopIteration exception from our __next__ method, the iterator never halts!

Since our custom class is an iterator in and of itself (i.e. it defines a __next__ method), __iter__ just returns self! But if we didn’t want to implement the iteration logic inside this class, we could make __iter__ return an instance of some other class.
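To illustrate that last point, here is a minimal sketch that splits the work across two classes. The names Cycle and CycleIterator are made up for this example, and like the class above it assumes a non-empty sequence that supports len() and indexing.

```python
class CycleIterator:
    """Iterator: keeps track of the position and implements __next__."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.i = 0

    def __iter__(self):
        return self                    # iterators return themselves

    def __next__(self):
        value = self.sequence[self.i]
        self.i = (self.i + 1) % len(self.sequence)   # wrap around at the end
        return value


class Cycle:
    """Iterable: holds the data and hands out a fresh iterator on each request."""

    def __init__(self, sequence):
        self.sequence = sequence

    def __iter__(self):
        return CycleIterator(self.sequence)


letters = Cycle("abc")
first = iter(letters)
second = iter(letters)                 # has its own, independent position
first_four = [next(first) for _ in range(4)]
second_two = [next(second) for _ in range(2)]
print(first_four, second_two)  # ['a', 'b', 'c', 'a'] ['a', 'b']
```

One consequence of this split is that the same Cycle object can be looped over several times at once, because every iterator carries its own index.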

Generators

Generator functions are regular Python functions that make it easier to create iterators. They are written exactly like normal functions, except that they contain at least one yield statement. Calling a generator function returns a generator, which is a kind of iterator.

>>> def g():
...     while True:
...         yield 3
...
>>> '__iter__' in dir(g()) and '__next__' in dir(g())
True
>>> g()
<generator object g at 0x7fe49a2b35a0>
>>> type(g())
<class 'generator'>

When a yield statement is encountered inside a generator function, it yields control back to the outside code, ‘returning’ the value that was the argument to the yield statement. The next time the generator is re-entered (via next(), the .send() method, or a for-loop iteration), execution resumes right after the yield statement.
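The pause-and-resume behaviour can be made visible by recording what the generator body has executed so far. This is just an illustrative sketch; the steps function and the log list are made-up names.

```python
log = []


def steps():
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("resumed after second yield")


gen = steps()
log_before = list(log)          # calling steps() has not run any of its body yet
first = next(gen)               # runs up to the first yield
second = next(gen)              # resumes right after it, runs up to the second yield
print(log_before)  # []
print(first, second)  # 1 2
print(log)  # ['started', 'resumed after first yield']
```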

The generator is exhausted when the associated generator function returns, at which point a StopIteration exception is raised, just like with any other iterator. If the generator function returned a value, that value is attached to the StopIteration instance and can be read from its value attribute. In the words of PEP 380, a return value statement inside a generator function is semantically equivalent to raise StopIteration(value), except that the exception cannot be caught from within the generator function itself. Note that literally writing raise StopIteration inside a generator is not a substitute: since PEP 479 (the default behaviour from Python 3.7), a StopIteration that escapes a generator body is converted into a RuntimeError.
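A short sketch of how the returned value can be retrieved by outside code. The countdown generator is a made-up example; the value attribute on StopIteration is the part that matters.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1
    return "liftoff"            # ends up as StopIteration.value


gen = countdown(2)
values = [next(gen), next(gen)]
try:
    next(gen)                   # the function returns here, so StopIteration is raised
except StopIteration as exc:
    final = exc.value

print(values, final)  # [2, 1] liftoff
```

A plain for loop swallows the StopIteration and discards this value, which is why it has to be caught by hand here.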

Implementing InfiniteIterable using generators

def InfiniteIterable(original_iterable):
    i = 0
    while True:
        yield original_iterable[i]
        i += 1
        if i > (len(original_iterable) - 1):
            i = 0

for n in InfiniteIterable("abc"):
    print(n)

This accomplishes the exact same task as the hand-written iterator class does, but in a more readable manner. It is essentially just syntactic sugar.
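Both versions of InfiniteIterable rely on len() and indexing, so they only work on sequences, whereas itertools.cycle accepts any iterable. The sketch below is modelled on the "roughly equivalent" recipe given in the itertools documentation rather than on the actual C implementation: it remembers every item on the first pass and replays the saved copies afterwards.

```python
import itertools


def cycle(iterable):
    saved = []
    for element in iterable:    # first pass: yield items and remember them
        yield element
        saved.append(element)
    while saved:                # later passes: replay the saved copies forever
        for element in saved:
            yield element


one_shot = iter("abc")          # a plain iterator: no len(), no indexing
first_seven = list(itertools.islice(cycle(one_shot), 7))
print(first_seven)  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']

empty = list(cycle([]))         # an empty input simply ends instead of raising
print(empty)  # []
```

The trade-off is memory: every item of the input is kept in the saved list for as long as the generator is alive.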

generator.send() - bi-directional communication

Generators also implement a .send() method, which allows for bi-directional communication between a ‘running’ generator and outside code.

Calling next(generator) as well as generator.__next__() is equivalent to calling generator.send(None). That is to say, both next and .send request the next value from the generator, while .send also sends some data in.

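Inside the generator function, the yield expression evaluates to the value that was passed as .send’s argument (or None if next() was used instead). A generator that has not started yet is not paused at a yield, so the first call must be next() or .send(None).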
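As an illustration of values flowing in both directions, here is a small sketch of a generator that keeps a running total. The running_total name is made up for this example.

```python
def running_total():
    total = 0
    while True:
        received = yield total      # the value passed to send() shows up here
        if received is not None:
            total += received


gen = running_total()
start = next(gen)                   # prime the generator: run to the first yield
after_five = gen.send(5)
after_ten = gen.send(10)
unchanged = next(gen)               # same as gen.send(None)
print(start, after_five, after_ten, unchanged)  # 0 5 15 15

# Sending a real value before the generator has reached a yield is an error.
try:
    running_total().send(1)
    needs_priming = False
except TypeError:
    needs_priming = True
print(needs_priming)  # True
```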

yield from

The yield from syntax, introduced by PEP 380 (Python 3.3), is used for delegating to a subgenerator. That is to say, yield from iterates over the given subgenerator (or any other iterable) and yields its values back out as they are produced.

The yield from expression also has a result of its own: it evaluates to the value the subgenerator returned, or None if the subgenerator returned nothing.
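A minimal sketch of capturing that result; inner and outer are made-up names.

```python
def inner():
    yield 1
    yield 2
    return "inner done"


def outer():
    # Passes 1 and 2 through to the caller, then evaluates to inner()'s return value.
    result = yield from inner()
    yield result


items = list(outer())
print(items)  # [1, 2, 'inner done']
```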

On the surface, it may seem that yield from gen() is just a shorthand for the following for loop:

for v in gen():
    yield v

While that is true for a number of simple use cases, once the semantics of the generator methods send, throw and close are taken into account, it becomes apparent that yield from does a much more complicated job under the hood. This is evident from the Formal Semantics section of PEP 380.

Most notably, if external code sends a value into the outer, delegating generator, that value is forwarded to the subgenerator in the yield from expression. Alongside that, yield from also handles the edge cases around exception handling inside generators and the .throw()/.close() methods.
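The difference in how sent values are handled can be seen by wrapping the same subgenerator in both ways. This is an illustrative sketch; echo, with_for_loop, with_yield_from and drive are all made-up names.

```python
def echo():
    """Yield back whatever was last sent in (None at the start)."""
    received = None
    while True:
        received = yield received


def with_for_loop():
    for value in echo():
        yield value                 # a sent value lands here and is thrown away


def with_yield_from():
    yield from echo()               # a sent value is forwarded to echo()


def drive(gen):
    next(gen)                       # prime the generator
    return [gen.send("a"), gen.send("b")]


for_results = drive(with_for_loop())
delegated_results = drive(with_yield_from())
print(for_results)  # [None, None]
print(delegated_results)  # ['a', 'b']
```

With the for loop, the inner generator is only ever advanced with next(), so it never sees the sent values; with yield from they reach it unchanged.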