Add iterator adapters #150

mattmorgis · 2025-09-18T02:34:00Z

This adds two new utility functions to convert iterators and async iterators into file like objects that can be used with ijson:

ijson.from_iter(iterable_of_bytes)
ijson.from_aiter(async_iterable_of_bytes)

I updated the README with two working examples.

This implementation uses both aiter() and anext() which are only available in Python 3.10+. With Python 3.9 reaching EOL in October 2025, I wanted to also see if there was any appetite for dropping 3.9 support?

Related Issues:
#44
#124
#147
#58

Summary by Sourcery

Introduce iterator adapters to convert synchronous and asynchronous byte iterators into file-like objects usable by ijson, update documentation with examples, and add tests for both sync and async adapters.

New Features:

Add from_iter utility to wrap a synchronous byte iterable as a file-like object
Add from_aiter utility to wrap an asynchronous byte iterable as a file-like object

Enhancements:

Leverage Python 3.10 aiter() and anext() in async adapter implementation

Documentation:

Update README with usage examples for from_iter and from_aiter

Tests:

Add comprehensive sync adapter tests for read, parse, items, and kvitems
Add comprehensive async adapter tests for read, parse, items, and kvitems

sourcery-ai · 2025-09-18T02:34:05Z

Reviewer's Guide

This PR introduces two iterator adapters—from_iter and from_aiter—backed by IterReader and AiterReader classes that leverage Python 3.10 iterator helpers, updates the package export, enriches the README with sync and async usage examples, and adds comprehensive tests covering read behavior and parsing scenarios.

Class diagram for new iterator adapter classes

classDiagram
    class IterReader {
        - _iter: Iterator[bytes]
        + __init__(byte_iter: Iterator[bytes])
        + read(n: int) bytes
    }
    class AiterReader {
        - _aiter: AsyncIterator[bytes]
        + __init__(byte_aiter: AsyncIterator[bytes])
        + read(n: int) bytes
    }
    class from_iter {
        + from_iter(byte_iter: Iterable[bytes]) IterReader
    }
    class from_aiter {
        + from_aiter(byte_aiter: AsyncIterable[bytes]) AiterReader
    }
    from_iter --> IterReader
    from_aiter --> AiterReader

Class diagram for integration with ijson

classDiagram
    class IterReader {
        + read(n: int) bytes
    }
    class AiterReader {
        + read(n: int) bytes
    }
    class ijson {
        + from_iter(iterable_of_bytes) IterReader
        + from_aiter(async_iterable_of_bytes) AiterReader
    }
    ijson --> IterReader
    ijson --> AiterReader

File-Level Changes

Change	Details	Files
Added iterator adapters for sync and async byte iterators	Introduced IterReader with read() method Implemented AiterReader with async read() using aiter/anext Defined from_iter and from_aiter factory functions	`src/ijson/adapters.py`
Exported new adapter functions in package init	Imported from_iter and from_aiter in init.py Adjusted module exports to include the new functions	`src/ijson/__init__.py`
Augmented README with adapter usage examples	Added example usage for from_iter Added example usage for from_aiter	`README.rst`
Added tests covering adapter functionality	Created fixtures for sync and async chunk sources Wrote tests for read(0), read(n) behavior Verified integration with basic_parse, parse, items, and kvitems for both sync and async	`tests/test_adapters.py`

Possibly linked issues

Incomplete parse with trailing garbage #123: The PR directly solves the issue's problem by introducing ijson.from_iter and ijson.from_aiter to efficiently parse JSON from byte iterators and async iterators, eliminating the need for custom file-like object wrappers and their associated inefficiencies.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `src/ijson/adapters.py:10-13` </location>
<code_context>
+    def __init__(self, byte_iter: Iterator[bytes]):
+        self._iter = byte_iter
+
+    def read(self, n: int) -> bytes:
+        if n == 0:
+            return b""
+        return next(self._iter, b"")
+
+
</code_context>

<issue_to_address>
**issue (bug_risk):** The read method ignores the requested byte count 'n'.

IterReader.read should accumulate data from the iterator until 'n' bytes are returned or the iterator is exhausted, to match standard file-like behavior.
</issue_to_address>

### Comment 2
<location> `src/ijson/adapters.py:22-25` </location>
<code_context>
+    def __init__(self, byte_aiter: AsyncIterator[bytes]):
+        self._aiter = byte_aiter
+
+    async def read(self, n: int) -> bytes:
+        if n == 0:
+            return b""
+        return await anext(self._aiter, b"")
+
+
</code_context>

<issue_to_address>
**issue (bug_risk):** AiterReader.read does not respect the 'n' parameter.

The async read method should accumulate data until 'n' bytes are read or the iterator is exhausted, to match expected behavior.
</issue_to_address>

### Comment 3
<location> `tests/test_adapters.py:24-30` </location>
<code_context>
+    return chunks()
+
+
+def test_from_iter_read0_does_not_consume():
+    chunks = [b'{"key":', b'"value"}']
+    file_obj = ijson.from_iter(iter(chunks))
+    assert file_obj.read(0) == b""
+    assert file_obj.read(1) == b'{"key":'
+    assert file_obj.read(1) == b'"value"}'
+    assert file_obj.read(1) == b""
+
+
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for empty input iterators.

Please add tests for both `from_iter` and `from_aiter` using empty iterables to confirm they return b'' on read and handle empty input correctly.

```suggestion
def test_from_iter_read0_does_not_consume():
    chunks = [b'{"key":', b'"value"}']
    file_obj = ijson.from_iter(iter(chunks))
    assert file_obj.read(0) == b""
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

def test_from_iter_empty_iterator():
    file_obj = ijson.from_iter(iter([]))
    assert file_obj.read() == b""
    assert file_obj.read(1) == b""
    assert file_obj.read(0) == b""

import pytest
import asyncio

@pytest.mark.asyncio
async def test_from_aiter_empty_iterator():
    async def empty_aiter():
        if False:
            yield  # never yields
    file_obj = await ijson.from_aiter(empty_aiter())
    assert await file_obj.read() == b""
    assert await file_obj.read(1) == b""
    assert await file_obj.read(0) == b""
```
</issue_to_address>

### Comment 4
<location> `tests/test_adapters.py:33-38` </location>
<code_context>
+    assert file_obj.read(1) == b""
+
+
+def test_from_iter_accepts_iterable():
+    chunks = [b'{"key":', b'"value"}']
+    file_obj = ijson.from_iter(chunks)  # no iter(...)
+    assert file_obj.read(1) == b'{"key":'
+    assert file_obj.read(1) == b'"value"}'
+    assert file_obj.read(1) == b""
+
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for non-standard iterables and error conditions.

Please add tests for generators, tuples, and invalid inputs to verify correct exception handling and behavior.

```suggestion
def test_from_iter_accepts_iterable():
    chunks = [b'{"key":', b'"value"}']
    file_obj = ijson.from_iter(chunks)  # no iter(...)
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

def test_from_iter_accepts_generator():
    def chunk_gen():
        yield b'{"key":'
        yield b'"value"}'
    file_obj = ijson.from_iter(chunk_gen())
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

def test_from_iter_accepts_tuple():
    chunks = (b'{"key":', b'"value"}')
    file_obj = ijson.from_iter(chunks)
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

import pytest

@pytest.mark.parametrize("invalid_input", [
    "not bytes",  # string, not bytes
    123,          # integer, not iterable
    None,         # NoneType
    [123, 456],   # list of non-bytes
])
def test_from_iter_invalid_inputs_raise(invalid_input):
    with pytest.raises(Exception):
        ijson.from_iter(invalid_input)
```
</issue_to_address>

### Comment 5
<location> `README.rst:668-650` </location>
<code_context>
+               async with client.stream('GET', 'https://jsonplaceholder.typicode.com/posts') as resp:
+                   resp.raise_for_status()
+                   f = ijson.from_aiter(resp.aiter_bytes())
+                   async for item in ijson.items(f, 'item'):
+                      print(f"post id = {post['id']}, \t title: {post['title']}")

-   The first alternative is best, since ``requests`` will automatically decode
</code_context>

<issue_to_address>
**issue:** Variable name mismatch in async example print statement.

The print statement should reference 'item["id"]' and 'item["title"]' to match the loop variable.
</issue_to_address>

### Comment 6
<location> `src/ijson/adapters.py:11-13` </location>
<code_context>
    def read(self, n: int) -> bytes:
        if n == 0:
            return b""
        return next(self._iter, b"")

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return b"" if n == 0 else next(self._iter, b"")
```
</issue_to_address>

### Comment 7
<location> `src/ijson/adapters.py:23-25` </location>
<code_context>
    async def read(self, n: int) -> bytes:
        if n == 0:
            return b""
        return await anext(self._aiter, b"")

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return b"" if n == 0 else await anext(self._aiter, b"")
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

src/ijson/adapters.py

tests/test_adapters.py

README.rst

sourcery-ai · 2025-09-18T02:35:08Z

src/ijson/adapters.py

+        if n == 0:
+            return b""
+        return next(self._iter, b"")


suggestion (code-quality): We've found these issues:

Lift code into else after jump in control flow (reintroduce-else)

Replace if statement with if expression (assign-if-exp)

Suggested change

if n == 0:

return b""

return next(self._iter, b"")

return b"" if n == 0 else next(self._iter, b"")

sourcery-ai · 2025-09-18T02:35:09Z

src/ijson/adapters.py

+        if n == 0:
+            return b""
+        return await anext(self._aiter, b"")


suggestion (code-quality): We've found these issues:

Lift code into else after jump in control flow (reintroduce-else)

Replace if statement with if expression (assign-if-exp)

Suggested change

if n == 0:

return b""

return await anext(self._aiter, b"")

return b"" if n == 0 else await anext(self._aiter, b"")

rtobar

Thank you very much @mattmorgis for what looks like a nice piece of work including documentation updates and tests.

I'm currently away and won't be back until early October, so I don't expect to be able to merge this until then (but it might happen, who knows). In the meantime I still have this a look and I left a couple of initial comments. Overall the solution is fine, it's mostly some.smsll.tweaks to better align things with the rest of the codebase. Thanks again!

BTW, I didn't really look into the AI review, this is turned on by default on my organisation but I don't personally use it. This is to say that you shouldn't feel.obliged to follow it's directions.

rtobar · 2025-09-18T02:55:28Z

src/ijson/adapters.py

+    return IterReader(iter(byte_iter))
+
+
+def from_aiter(byte_aiter: AsyncIterable[bytes]) -> AiterReader:


Like with the main entry points for ijson, I'd rather we had a single from_iter function here that handled both sync and async cases, returning the correct Reader instance depending on the input iterable type (you can check for the presence of __aiter__ for instance).

rtobar · 2025-09-18T03:07:54Z

tests/test_adapters.py

+    assert file_obj.read(1) == b""
+
+
+def test_from_iter_basic_parse(backend, chunks):


I think the two first tests above that test from_iter directly are fine here. Same for the rest of the test below that do the same.

However for the rest I would have gone very differently. Please have a look at contest.py, where test cases are dynamically generated for each backend based on different "backend adaptor", each basically feeding ijson data in a different way (str, bytes, sync/async file-like objects) and for each backend. These are then used to run a number of tests that venter around the actual.functions (basic_parse, parse, etc). I think most tests that follow in this module are (most probably because I didn't actually check) a small subset of those function-oriented test cases.

So what I'd do is provide two new of these "backend adaptors" for the sync/async cases. They'd take the test input and create iterables out of them (iterating over 1 character/byte at a time, or maybe even by random amounts), then pass those through ijson.from_iter, and finally feed that to the ijson routine under test. That would make many of these tests below unnecessary, and would ensure all test cases work as expected with iterable inputs. Let me know if you need any guidance with this.

rtobar · 2025-09-18T03:09:32Z

src/ijson/adapters.py

+
+
+def from_iter(byte_iter: Iterable[bytes]) -> IterReader:
+    """Convert a synchronous byte iterable to a file-like object."""


We support both str and bytes for file -like objects, it'd be nice to support both too here. Shouldn't be too hard? Just need to encode strings with utf8, with a warning like we do in the utils.module.

rtobar · 2025-09-18T03:13:04Z

README.rst

   See the options_ section for details.

-#. How do I use ijson with the ``requests`` library
+#. **Q**: How do I use ijson with ``requests`` or ``httpx`` 


For consistency please also fix the starting line of the answer to start with **A**

rtobar · 2025-09-18T03:15:25Z

README.rst

+       import httpx, ijson, asyncio

-   * Pass the ``Response.raw`` object (the underlying ``socket.socket``) to ijson.
+       async def main():


This is a very simple code sample, I think we an remove this main (and it's invocation with asyncii.run) to remove some.indentation and keep the code a bit easier to the eye.

rtobar · 2025-09-18T03:21:53Z

README.rst

+       import requests
+       import ijson
+
+       with requests.get('https://jsonplaceholder.typicode.com/posts', stream=True) as resp:


Popular as it may be, I'd rather not reference any specific website in particular; instead I'd keep it intentionally vague like in the other examples where only http://... is shown. Same for the actual loop where each item is processed.

rtobar · 2025-09-18T03:36:45Z

@mattmorgis about aiter/anext, I actually hadn't realised they were that new.. I'd rather keep this PR self-contained, so maybe you could provide very simple aiter/anext implementations in the same adapter module for python versions < 3.10 that can be later removed once support for 3.9 is removed.

mattmorgis added 2 commits September 17, 2025 22:15

add adapters for iterators

def1b55

update readme and add examples

8e93cc7

sourcery-ai bot reviewed Sep 18, 2025

View reviewed changes

rtobar reviewed Sep 18, 2025

View reviewed changes

		return IterReader(iter(byte_iter))


		def from_aiter(byte_aiter: AsyncIterable[bytes]) -> AiterReader:

		assert file_obj.read(1) == b""


		def test_from_iter_basic_parse(backend, chunks):



		def from_iter(byte_iter: Iterable[bytes]) -> IterReader:
		"""Convert a synchronous byte iterable to a file-like object."""

Add iterator adapters #150

Are you sure you want to change the base?

Add iterator adapters #150

Uh oh!

Conversation

mattmorgis commented Sep 18, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Class diagram for new iterator adapter classes

Class diagram for integration with ijson

File-Level Changes

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sourcery-ai bot Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar left a comment

Choose a reason for hiding this comment

Uh oh!

rtobar Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rtobar commented Sep 18, 2025

Uh oh!

Uh oh!

mattmorgis commented Sep 18, 2025 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Sep 18, 2025 •

edited

Loading