Skip to content

Stream Iterator

Razvan Musaloiu-E edited this page Apr 7, 2017 · 2 revisions

There are many ways to do iteration in Go. Rather than leaving it to each developer to make the decision, this page describes a concrete solution for the stream iterator, the case when it is difficult or impossible to know the number of objects that the iterator will return. Suitable examples for this include the records in a file and streaming RPCs. The solution has a single blocking call and allows for cancellation. Because Go does not support generics, we only define conventions rather than Go interfaces.

Example client:

s := provider.Stream()
for s.Advance() {
  val := s.Value()
  …
  if enough {
    s.Cancel()
    // Optional, s.Advance() would subsequently return false.
    break
  }
}

if s.Err() != nil && s.Err() != verror.ErrCanceled {
  return s.Err()
}

API:

type <DataProvider> interface {
  // Stream provides a way to iterate over the values in
  // <DataProvider>.
  Stream() Stream
}

type Stream interface {
  // Advance stages an element so the client can retrieve it
  // with Value.  Advance returns true iff an element has been
  // staged.  The client must call Advance before calling Value.
  // The client must call Cancel if it does not iterate through 
  // all elements (i.e. until Advance returns false).  Advance 
  // may block if an element is not immediately available.
  Advance() bool

  // Value returns the element that was staged by Advance.
  // Value always returns this staged value until the next
  // call to Advance.  Value may panic if Advance returned
  // false or was not called at all.  Value does not block.
  //
  // In general, the behavior of Value is undefined if the 
  // underlying collection of elements changes while iteration is 
  // in progress.  If <DataProvider> supports concurrent 
  // modification, it should document its behavior.
  Value() <DataProviderType>

  // Err returns a non-nil error iff the stream encountered
  // any errors.  Err does not block.  Err should be called
  // after Advance returns false (i.e. after the for-loop
  // terminates).  Err returns verror.ErrCanceled if the iterator
  // was canceled.
  Err() error

  // Cancel notifies the stream provider that it can stop
  // producing elements.  The client should call Cancel if
  // it does not iterate through all elements (i.e. until Advance
  // returns false).  Cancel is idempotent and can be called
  // concurrently with a goroutine that is iterating via
  // Advance/Value.  Cancel causes Advance to subsequently
  // return  false, but it does not interfere with the currently
  // staged value.  Cancel causes Err to return
  // verror.ErrCanceled if there was not already an error.
  // Cancel does not block.
  Cancel()
}

Why not defining an actual Go interface?

Go does not support generics, so the only way to define an iterator interface would be to use the interface{} type. This requires typecasting, so we lose the benefit of Go's static type checking. There are two ways to use interface{}.

Method 1:

type Stream interface {
  …
  Value() interface{}
  …
}

for s.Advance() {
  val := s.Value().(<DataProviderType>)
  …
}

Method 2:

type Stream interface {
  …
  Value(v interface{})
  …
}

for s.Advance() {
  var val <DataProviderType>
  s.Value(&val)
  …
}

Both styles make the client code more verbose.

The downside of not using a Go interface is that we can't compose iterators or provide common libraries for working with iterators.

Drawbacks

The stream iterator can be used for in-memory data structures, but it is not the most convenient for the iterator implementer. Specifically, it is natural for the iterator to start at the first element instead of starting in the "needs to call Advance()" state as required by the Stream style. Additionally, error handling and cancellation are very likely unnecessary for in-memory data structures. All of this means a bunch of extra work for the implementer and a heavyweight API for the client. At the same time, we might want to support extra features for in-memory data structures such as concurrent modification or reverse iteration. The Steam style does not work well for these features.

Alternatives

Channels

Channels work with Go's range builtin, so they are convenient for the client. Unfortunately, cancellation requires a second channel and a bunch of boilerplate code that can't be shared across iterator implementations. Error handling requires either a third channel or a mutex—even more boilerplate that can't be shared. On top of being much harder for the implementer, channels are 25–50x slower than plain iterators (see the Evaluation section from A Survey of Iterator (or Generator) Patterns in golang).

Callbacks

Cancellation and error handling are tricky to get right. Early termination results in verbose code.

Closures

Cancellation is not possible and error handling makes things even uglier.

Multiple return values

it := foo.Stream()
for val, done := it.Recv(); !done; val, done = it.Recv() {
  …
}
err := it.Err()

This style was a top contender. While you can fit the iteration on fewer lines of code, we found the redundant calls to it.Recv() to be ugly.

it := foo.Stream()
for {
  val, err := it.Recv()
  if err == io.EOF() {
    break
  }
  if err != nil {
    return err
  }
  …
}

We used this style in our IPC code for a while. It was nice that there was only one method to learn. However, the 8 lines of boilerplate above hid the important parts of the for-loop.

Java-style iterators

it := foo.Stream()
for it.HasNext() {
  x := it.Next()
  …
}
err := it.Err()

On the surface, this API is very similar to the Stream API we chose. However, both HasNext() and Next() are potentially blocking. If a client cares about blocking, they have to do an asynchronous dance twice instead of just once. This API is also more difficult for the implementer to get right.

HasValue/Value

it := foo.GetStream()
for ; it.HasValue(); it.Next() {
   x := it.Value()
   …
}
err := it.Error()

Similar to the Java-style iterators, both HasValue() and Next() could block. If a client cares about blocking, they have to do an asynchronous dance twice instead of just once. Additionally, this API has an extra method that introduces more cognitive load on client developers.

References

A Survey of Iterator (or Generator) Patterns in golang

This page is based on a internal Vanadium document from 2014 authored by Ken Ashcraft with help from Mehrdad Afshari, Jason Hickey and Todd Wang.

Clone this wiki locally