Iterators: pointers vs cursors

leave this field blank to prove your humanity

Article title*

Article revisions must have a non-empty title

Article body*

You're already doing both of these by hand. This post emphaiszes the abstractions people make around them. We all know the humble for loop with an index:

int items[] = {...};
    for (size_t i = 0; i < std::size(items); ++i)
        process(items[i]);

Implemented literally, a real `i` register would be used to offset the address of items[0] in each iteration. We could instead precompute that pointer arithmetic in the loop itself:

for (int* it = &items[0]; it != &items[std::size(items)]; ++it)
        process(*it);

Both of these cases would be optimized to the same thing by the compiler. Actually, make that *are* optimized to the same thing: https://godbolt.org/z/Wex4azK6b (ty, 
Mr Godbolt, [What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid](https://www.youtube.com/watch?v=bSkpMdDe4g4)).

This second example is what the C++ Standard Library <sup>[STL differentiation](https://stackoverflow.com/questions/5205491/whats-the-difference-between-stl-and-c-standard-library)</sup> has chosen for its iterator abstraction:

for (int* it = std::begin(items); it != std::end(items); ++it)
        process(*it);

What of it? **The loop body only needs the pointer** and not the container and index. The pointer addition has been precomputed, albeit the information of the start of the container lost.

Note that although these examples use actual pointers, the C++ Standard's pointer-*style* iterators extend to more container types than contiguous arrays in memory.

# `std::ranges` Tangent

Iterators in general are powerful, particularly for algorithm abstraction and generally writing reusable and maintainable code. However, personally, I think separating begin and end pairs is ugly, overly verbose and error prone. In particular for errors, when people take shortcuts and don't pass the end around too. For example, `std::copy` only takes a single output iterator so bounds checking is impossible. Did we not learn from this with C strings?

for (char* it = std::begin(items); *it != '\0'; ++it)  // reminds me of linked list sentinel nodes
        process(*it);

`std::span`, `std::ranges` and `views` make iterators so much nicer to work with by hiding them completely:

std::ranges::for_each(items, process);

However, there's still some way to go. E.g. `std::ranges::copy` takes an input range but an output iterator and has no output range overload. See JeanHeyd Meneide's [Throwing Out the Kitchen Sink - Output Ranges](https://thephd.dev/output-ranges).

# Why does it matter?

The seemingly minor difference betwen pointer and index iteration becomes significant when writing algorithms and composing multiple together. I wonder if this choice was made in line with C++'s *uncompromised performance* superpower that Jon Kalb names in his C++Now 2024 talk, [This is C++: Uncompromised Performance, Undefined Behavior, & Move Semantics](https://www.youtube.com/watch?v=fcRHiFH04a4).

Tristan Brindle points out that keeping the index and container separate, which he generalizes to **cursor** and **sequence**, can improve safety --- specifically, avoiding *undefined behaviour* (symmetry to the above noted) --- and sometimes better performance in his CppCon 2023 talk, [Iteration Revisited: A Safer Iteration Model for Cpp](https://www.youtube.com/watch?v=nDyjCMnTu7o). The [flux](https://github.com/tcbrindle/flux) library comprehensively demonstrates this idea providing a complete implementation of iterators and algorithms that use cursors instead of pointers, including adaptors to the standard library.

Could a convention of always using ranges mitigate the iterator UB cases? E.g. we unit test and trust the iterator abstraction, then hide it completely with ranges. Maybe not due to iterator invalidation rules. Could preconditions help keep good performance here? Maybe it really doesn't matter, once we're at the abstraction level of composing algorithms. These are not rhetorical, I actually want to know.

# Algorithms

Iterators are powerful because we can separate data input/output implementation, and container/storage if there is any, from their processing. This is more than just allowing your function to take a sized raw array or std::vector although that's useful too. Iterators can be lazily evaluated and written. For example, `std::views::iota` only computes the next index when its iterator is incremented. `std::back_inserter` doesn't need pre-allocated storage. `std::ostream_iterator` could grow a file when written to. Since iterators are at the core they deserve some thought.

There's a fantastic related talk about algorithms, [Rappel: Compose Algorithms, Not Iterators - Google's Alternative to Ranges](https://www.youtube.com/watch?v=itnyR9j8y6E). Some ideas here could help with the problem of iterator invalidation, although it seems that may be at the cost of lazy evalutation.

# Exploring some edge cases

Iteration is such a basic concept in programming. Neither pointers nor cursors are necessarily better, but isn't it interesting to see how using each plays out. Lets explore some examples.

Should we use a pointer or an index to mark the end of a range? Interestingly, `std::vector` and `std::span` use sizes (effectively an index) to mark the end of the range. `std::span` in particular is contrary to the standard's convention of pointer based iterators. If you want, for example the second half of a `std::span`, you have to recompute its size rather than just reuse an end pointer:

std::span itemView(items);
    std::span drop1 = itemView.subspan(1);  // is equivalent to ...
    std::span drop1{itemView.begin() + 1, itemView.size() - 1};  // need to +1 and -1!

Pointers don't work across address spaces. Well, I guess there's relative pointers, e.g. my [`decodeless::offset_ptr`](https://github.com/decodeless/offset_ptr). Indices are robust and intuitive particularly when storing iterators in files and across memory shared between processes or hardware like GPUs.

Indices may offer memory and bandwidth benefits. Pointers are typically byte granularity. For a start, many objects are larger than a byte and being able to store inbetweens is wasteful. Moreover if the memory pointed to is known to be all in the one place, e.g. single allocation containers such as std::vector, we could be using smaller 32-bit or even 8-bit indices.

## More Complex Indirection

Consider these examples where we want an array of variable sized arrays:

```
// indices
int items[]      = {0, 1, 2, 3, 4, 5, 6};
size_t offsets[] = {0, 3, 5};
size_t counts[]  = {3, 2, 2};

// pointers
int items[]   = {0, 1, 2, 3, 4, 5, 6};
int* begins[] = {&items[0], &items[3], &items[5]};
int* ends[]   = {&items[3], &items[5], &items[7]};
```

Indices may match ABIs of existing libraries more than pointers. Arrays of indices become more obviously useful when considering saving such relational data to disk, sending across the network etc.

However, writing abstractions around indices can be harder. I once tried miserably with https://github.com/pknowles/index_ptr. The intent was to provide views of indirect data so you didn't have to touch indices directly. It sounds like over-engineering but indices can get complicated fast when they're they are layered. For example, imagine if `items` were themselves indices!

A second attempt went better with https://github.com/pknowles/indirect_view, another weekend project that better layered abstractions --- I suspect similarly to flux's adaptors. Its `cursor_iterator` simply aggregates an existing iterator with a reference to a container.

## Iterating in Parallel

Indices are fantastic for this:

int itemsA[] = {...};
    int itemsB[] = {...};
    for (size_t i = 0; i < std::size(itemsA); ++i)
        process(itemsA[i], itemsB[i]);

This is often done when implementing a structure of arrays, which is also incredibly powerful. Unfortunately, if you throw away the index by pre-computing the pointer offset, you can't do the same with pointers. Isn't it frustrating that there isn't a nicer way to do this in modern C++? Well, there sort of is, with C++23:

for(int [a, b] : std::views::zip(itemsA, itemsB))
        process(a, b);

A `std::ranges::for_each` version is possible but `process()` would need to take a tuple as I'm not aware of an std::apply adapter. I also wonder what performance implications arise when the compiler can't inline everything.

There is this, which becomes more tempting with `std::execution::par_unseq`, but don't!

for (int& a : itemsA)
        process(a, itemsB[&a - &itemsA[0]]);

# Takeaways

- There's more than one way to do iteration. There's two. (More? ¯\\_(ツ)_/¯)
- Pros and cons to both. We mix them all over the place. Is this OK?
- Maybe we can hide the implementation entirely?

Toggle Preview

Edit message*

A description of the changes made

Iterators: pointers vs cursors

Flag the thing you clicked for moderator attention.

Error