Skip to content

Conversation

mattmorgis
Copy link

@mattmorgis mattmorgis commented Sep 18, 2025

This adds two new utility functions to convert iterators and async iterators into file like objects that can be used with ijson:

  • ijson.from_iter(iterable_of_bytes)
  • ijson.from_aiter(async_iterable_of_bytes)

I updated the README with two working examples.

This implementation uses both aiter() and anext() which are only available in Python 3.10+. With Python 3.9 reaching EOL in October 2025, I wanted to also see if there was any appetite for dropping 3.9 support?

Related Issues:
#44
#124
#147
#58

Summary by Sourcery

Introduce iterator adapters to convert synchronous and asynchronous byte iterators into file-like objects usable by ijson, update documentation with examples, and add tests for both sync and async adapters.

New Features:

  • Add from_iter utility to wrap a synchronous byte iterable as a file-like object
  • Add from_aiter utility to wrap an asynchronous byte iterable as a file-like object

Enhancements:

  • Leverage Python 3.10 aiter() and anext() in async adapter implementation

Documentation:

  • Update README with usage examples for from_iter and from_aiter

Tests:

  • Add comprehensive sync adapter tests for read, parse, items, and kvitems
  • Add comprehensive async adapter tests for read, parse, items, and kvitems

Copy link

sourcery-ai bot commented Sep 18, 2025

Reviewer's Guide

This PR introduces two iterator adapters—from_iter and from_aiter—backed by IterReader and AiterReader classes that leverage Python 3.10 iterator helpers, updates the package export, enriches the README with sync and async usage examples, and adds comprehensive tests covering read behavior and parsing scenarios.

Class diagram for new iterator adapter classes

classDiagram
    class IterReader {
        - _iter: Iterator[bytes]
        + __init__(byte_iter: Iterator[bytes])
        + read(n: int) bytes
    }
    class AiterReader {
        - _aiter: AsyncIterator[bytes]
        + __init__(byte_aiter: AsyncIterator[bytes])
        + read(n: int) bytes
    }
    class from_iter {
        + from_iter(byte_iter: Iterable[bytes]) IterReader
    }
    class from_aiter {
        + from_aiter(byte_aiter: AsyncIterable[bytes]) AiterReader
    }
    from_iter --> IterReader
    from_aiter --> AiterReader
Loading

Class diagram for integration with ijson

classDiagram
    class IterReader {
        + read(n: int) bytes
    }
    class AiterReader {
        + read(n: int) bytes
    }
    class ijson {
        + from_iter(iterable_of_bytes) IterReader
        + from_aiter(async_iterable_of_bytes) AiterReader
    }
    ijson --> IterReader
    ijson --> AiterReader
Loading

File-Level Changes

Change Details Files
Added iterator adapters for sync and async byte iterators
  • Introduced IterReader with read() method
  • Implemented AiterReader with async read() using aiter/anext
  • Defined from_iter and from_aiter factory functions
src/ijson/adapters.py
Exported new adapter functions in package init
  • Imported from_iter and from_aiter in init.py
  • Adjusted module exports to include the new functions
src/ijson/__init__.py
Augmented README with adapter usage examples
  • Added example usage for from_iter
  • Added example usage for from_aiter
README.rst
Added tests covering adapter functionality
  • Created fixtures for sync and async chunk sources
  • Wrote tests for read(0), read(n) behavior
  • Verified integration with basic_parse, parse, items, and kvitems for both sync and async
tests/test_adapters.py

Possibly linked issues

  • Incomplete parse with trailing garbage #123: The PR directly solves the issue's problem by introducing ijson.from_iter and ijson.from_aiter to efficiently parse JSON from byte iterators and async iterators, eliminating the need for custom file-like object wrappers and their associated inefficiencies.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `src/ijson/adapters.py:10-13` </location>
<code_context>
+    def __init__(self, byte_iter: Iterator[bytes]):
+        self._iter = byte_iter
+
+    def read(self, n: int) -> bytes:
+        if n == 0:
+            return b""
+        return next(self._iter, b"")
+
+
</code_context>

<issue_to_address>
**issue (bug_risk):** The read method ignores the requested byte count 'n'.

IterReader.read should accumulate data from the iterator until 'n' bytes are returned or the iterator is exhausted, to match standard file-like behavior.
</issue_to_address>

### Comment 2
<location> `src/ijson/adapters.py:22-25` </location>
<code_context>
+    def __init__(self, byte_aiter: AsyncIterator[bytes]):
+        self._aiter = byte_aiter
+
+    async def read(self, n: int) -> bytes:
+        if n == 0:
+            return b""
+        return await anext(self._aiter, b"")
+
+
</code_context>

<issue_to_address>
**issue (bug_risk):** AiterReader.read does not respect the 'n' parameter.

The async read method should accumulate data until 'n' bytes are read or the iterator is exhausted, to match expected behavior.
</issue_to_address>

### Comment 3
<location> `tests/test_adapters.py:24-30` </location>
<code_context>
+    return chunks()
+
+
+def test_from_iter_read0_does_not_consume():
+    chunks = [b'{"key":', b'"value"}']
+    file_obj = ijson.from_iter(iter(chunks))
+    assert file_obj.read(0) == b""
+    assert file_obj.read(1) == b'{"key":'
+    assert file_obj.read(1) == b'"value"}'
+    assert file_obj.read(1) == b""
+
+
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding tests for empty input iterators.

Please add tests for both `from_iter` and `from_aiter` using empty iterables to confirm they return b'' on read and handle empty input correctly.

```suggestion
def test_from_iter_read0_does_not_consume():
    chunks = [b'{"key":', b'"value"}']
    file_obj = ijson.from_iter(iter(chunks))
    assert file_obj.read(0) == b""
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

def test_from_iter_empty_iterator():
    file_obj = ijson.from_iter(iter([]))
    assert file_obj.read() == b""
    assert file_obj.read(1) == b""
    assert file_obj.read(0) == b""

import pytest
import asyncio

@pytest.mark.asyncio
async def test_from_aiter_empty_iterator():
    async def empty_aiter():
        if False:
            yield  # never yields
    file_obj = await ijson.from_aiter(empty_aiter())
    assert await file_obj.read() == b""
    assert await file_obj.read(1) == b""
    assert await file_obj.read(0) == b""
```
</issue_to_address>

### Comment 4
<location> `tests/test_adapters.py:33-38` </location>
<code_context>
+    assert file_obj.read(1) == b""
+
+
+def test_from_iter_accepts_iterable():
+    chunks = [b'{"key":', b'"value"}']
+    file_obj = ijson.from_iter(chunks)  # no iter(...)
+    assert file_obj.read(1) == b'{"key":'
+    assert file_obj.read(1) == b'"value"}'
+    assert file_obj.read(1) == b""
+
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests for non-standard iterables and error conditions.

Please add tests for generators, tuples, and invalid inputs to verify correct exception handling and behavior.

```suggestion
def test_from_iter_accepts_iterable():
    chunks = [b'{"key":', b'"value"}']
    file_obj = ijson.from_iter(chunks)  # no iter(...)
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

def test_from_iter_accepts_generator():
    def chunk_gen():
        yield b'{"key":'
        yield b'"value"}'
    file_obj = ijson.from_iter(chunk_gen())
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

def test_from_iter_accepts_tuple():
    chunks = (b'{"key":', b'"value"}')
    file_obj = ijson.from_iter(chunks)
    assert file_obj.read(1) == b'{"key":'
    assert file_obj.read(1) == b'"value"}'
    assert file_obj.read(1) == b""

import pytest

@pytest.mark.parametrize("invalid_input", [
    "not bytes",  # string, not bytes
    123,          # integer, not iterable
    None,         # NoneType
    [123, 456],   # list of non-bytes
])
def test_from_iter_invalid_inputs_raise(invalid_input):
    with pytest.raises(Exception):
        ijson.from_iter(invalid_input)
```
</issue_to_address>

### Comment 5
<location> `README.rst:668-650` </location>
<code_context>
+               async with client.stream('GET', 'https://jsonplaceholder.typicode.com/posts') as resp:
+                   resp.raise_for_status()
+                   f = ijson.from_aiter(resp.aiter_bytes())
+                   async for item in ijson.items(f, 'item'):
+                      print(f"post id = {post['id']}, \t title: {post['title']}")

-   The first alternative is best, since ``requests`` will automatically decode
</code_context>

<issue_to_address>
**issue:** Variable name mismatch in async example print statement.

The print statement should reference 'item["id"]' and 'item["title"]' to match the loop variable.
</issue_to_address>

### Comment 6
<location> `src/ijson/adapters.py:11-13` </location>
<code_context>
    def read(self, n: int) -> bytes:
        if n == 0:
            return b""
        return next(self._iter, b"")

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return b"" if n == 0 else next(self._iter, b"")
```
</issue_to_address>

### Comment 7
<location> `src/ijson/adapters.py:23-25` </location>
<code_context>
    async def read(self, n: int) -> bytes:
        if n == 0:
            return b""
        return await anext(self._aiter, b"")

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return b"" if n == 0 else await anext(self._aiter, b"")
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +11 to +13
if n == 0:
return b""
return next(self._iter, b"")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): We've found these issues:

Suggested change
if n == 0:
return b""
return next(self._iter, b"")
return b"" if n == 0 else next(self._iter, b"")

Comment on lines +23 to +25
if n == 0:
return b""
return await anext(self._aiter, b"")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (code-quality): We've found these issues:

Suggested change
if n == 0:
return b""
return await anext(self._aiter, b"")
return b"" if n == 0 else await anext(self._aiter, b"")

Copy link

@rtobar rtobar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @mattmorgis for what looks like a nice piece of work including documentation updates and tests.

I'm currently away and won't be back until early October, so I don't expect to be able to merge this until then (but it might happen, who knows). In the meantime I still have this a look and I left a couple of initial comments. Overall the solution is fine, it's mostly some.smsll.tweaks to better align things with the rest of the codebase. Thanks again!

BTW, I didn't really look into the AI review, this is turned on by default on my organisation but I don't personally use it. This is to say that you shouldn't feel.obliged to follow it's directions.

return IterReader(iter(byte_iter))


def from_aiter(byte_aiter: AsyncIterable[bytes]) -> AiterReader:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like with the main entry points for ijson, I'd rather we had a single from_iter function here that handled both sync and async cases, returning the correct Reader instance depending on the input iterable type (you can check for the presence of __aiter__ for instance).

assert file_obj.read(1) == b""


def test_from_iter_basic_parse(backend, chunks):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the two first tests above that test from_iter directly are fine here. Same for the rest of the test below that do the same.

However for the rest I would have gone very differently. Please have a look at contest.py, where test cases are dynamically generated for each backend based on different "backend adaptor", each basically feeding ijson data in a different way (str, bytes, sync/async file-like objects) and for each backend. These are then used to run a number of tests that venter around the actual.functions (basic_parse, parse, etc). I think most tests that follow in this module are (most probably because I didn't actually check) a small subset of those function-oriented test cases.

So what I'd do is provide two new of these "backend adaptors" for the sync/async cases. They'd take the test input and create iterables out of them (iterating over 1 character/byte at a time, or maybe even by random amounts), then pass those through ijson.from_iter, and finally feed that to the ijson routine under test. That would make many of these tests below unnecessary, and would ensure all test cases work as expected with iterable inputs. Let me know if you need any guidance with this.



def from_iter(byte_iter: Iterable[bytes]) -> IterReader:
"""Convert a synchronous byte iterable to a file-like object."""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support both str and bytes for file -like objects, it'd be nice to support both too here. Shouldn't be too hard? Just need to encode strings with utf8, with a warning like we do in the utils.module.

See the options_ section for details.

#. How do I use ijson with the ``requests`` library
#. **Q**: How do I use ijson with ``requests`` or ``httpx``
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency please also fix the starting line of the answer to start with **A**

import httpx, ijson, asyncio
* Pass the ``Response.raw`` object (the underlying ``socket.socket``) to ijson.
async def main():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very simple code sample, I think we an remove this main (and it's invocation with asyncii.run) to remove some.indentation and keep the code a bit easier to the eye.

import requests
import ijson
with requests.get('https://jsonplaceholder.typicode.com/posts', stream=True) as resp:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Popular as it may be, I'd rather not reference any specific website in particular; instead I'd keep it intentionally vague like in the other examples where only http://... is shown. Same for the actual loop where each item is processed.

@rtobar
Copy link

rtobar commented Sep 18, 2025

@mattmorgis about aiter/anext, I actually hadn't realised they were that new.. I'd rather keep this PR self-contained, so maybe you could provide very simple aiter/anext implementations in the same adapter module for python versions < 3.10 that can be later removed once support for 3.9 is removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants