-
Notifications
You must be signed in to change notification settings - Fork 56
Add iterator adapters #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR introduces two iterator adapters—from_iter and from_aiter—backed by IterReader and AiterReader classes that leverage Python 3.10 iterator helpers, updates the package export, enriches the README with sync and async usage examples, and adds comprehensive tests covering read behavior and parsing scenarios. Class diagram for new iterator adapter classesclassDiagram
class IterReader {
- _iter: Iterator[bytes]
+ __init__(byte_iter: Iterator[bytes])
+ read(n: int) bytes
}
class AiterReader {
- _aiter: AsyncIterator[bytes]
+ __init__(byte_aiter: AsyncIterator[bytes])
+ read(n: int) bytes
}
class from_iter {
+ from_iter(byte_iter: Iterable[bytes]) IterReader
}
class from_aiter {
+ from_aiter(byte_aiter: AsyncIterable[bytes]) AiterReader
}
from_iter --> IterReader
from_aiter --> AiterReader
Class diagram for integration with ijsonclassDiagram
class IterReader {
+ read(n: int) bytes
}
class AiterReader {
+ read(n: int) bytes
}
class ijson {
+ from_iter(iterable_of_bytes) IterReader
+ from_aiter(async_iterable_of_bytes) AiterReader
}
ijson --> IterReader
ijson --> AiterReader
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/ijson/adapters.py:10-13` </location>
<code_context>
+ def __init__(self, byte_iter: Iterator[bytes]):
+ self._iter = byte_iter
+
+ def read(self, n: int) -> bytes:
+ if n == 0:
+ return b""
+ return next(self._iter, b"")
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The read method ignores the requested byte count 'n'.
IterReader.read should accumulate data from the iterator until 'n' bytes are returned or the iterator is exhausted, to match standard file-like behavior.
</issue_to_address>
### Comment 2
<location> `src/ijson/adapters.py:22-25` </location>
<code_context>
+ def __init__(self, byte_aiter: AsyncIterator[bytes]):
+ self._aiter = byte_aiter
+
+ async def read(self, n: int) -> bytes:
+ if n == 0:
+ return b""
+ return await anext(self._aiter, b"")
+
+
</code_context>
<issue_to_address>
**issue (bug_risk):** AiterReader.read does not respect the 'n' parameter.
The async read method should accumulate data until 'n' bytes are read or the iterator is exhausted, to match expected behavior.
</issue_to_address>
### Comment 3
<location> `tests/test_adapters.py:24-30` </location>
<code_context>
+ return chunks()
+
+
+def test_from_iter_read0_does_not_consume():
+ chunks = [b'{"key":', b'"value"}']
+ file_obj = ijson.from_iter(iter(chunks))
+ assert file_obj.read(0) == b""
+ assert file_obj.read(1) == b'{"key":'
+ assert file_obj.read(1) == b'"value"}'
+ assert file_obj.read(1) == b""
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding tests for empty input iterators.
Please add tests for both `from_iter` and `from_aiter` using empty iterables to confirm they return b'' on read and handle empty input correctly.
```suggestion
def test_from_iter_read0_does_not_consume():
chunks = [b'{"key":', b'"value"}']
file_obj = ijson.from_iter(iter(chunks))
assert file_obj.read(0) == b""
assert file_obj.read(1) == b'{"key":'
assert file_obj.read(1) == b'"value"}'
assert file_obj.read(1) == b""
def test_from_iter_empty_iterator():
file_obj = ijson.from_iter(iter([]))
assert file_obj.read() == b""
assert file_obj.read(1) == b""
assert file_obj.read(0) == b""
import pytest
import asyncio
@pytest.mark.asyncio
async def test_from_aiter_empty_iterator():
async def empty_aiter():
if False:
yield # never yields
file_obj = await ijson.from_aiter(empty_aiter())
assert await file_obj.read() == b""
assert await file_obj.read(1) == b""
assert await file_obj.read(0) == b""
```
</issue_to_address>
### Comment 4
<location> `tests/test_adapters.py:33-38` </location>
<code_context>
+ assert file_obj.read(1) == b""
+
+
+def test_from_iter_accepts_iterable():
+ chunks = [b'{"key":', b'"value"}']
+ file_obj = ijson.from_iter(chunks) # no iter(...)
+ assert file_obj.read(1) == b'{"key":'
+ assert file_obj.read(1) == b'"value"}'
+ assert file_obj.read(1) == b""
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests for non-standard iterables and error conditions.
Please add tests for generators, tuples, and invalid inputs to verify correct exception handling and behavior.
```suggestion
def test_from_iter_accepts_iterable():
chunks = [b'{"key":', b'"value"}']
file_obj = ijson.from_iter(chunks) # no iter(...)
assert file_obj.read(1) == b'{"key":'
assert file_obj.read(1) == b'"value"}'
assert file_obj.read(1) == b""
def test_from_iter_accepts_generator():
def chunk_gen():
yield b'{"key":'
yield b'"value"}'
file_obj = ijson.from_iter(chunk_gen())
assert file_obj.read(1) == b'{"key":'
assert file_obj.read(1) == b'"value"}'
assert file_obj.read(1) == b""
def test_from_iter_accepts_tuple():
chunks = (b'{"key":', b'"value"}')
file_obj = ijson.from_iter(chunks)
assert file_obj.read(1) == b'{"key":'
assert file_obj.read(1) == b'"value"}'
assert file_obj.read(1) == b""
import pytest
@pytest.mark.parametrize("invalid_input", [
"not bytes", # string, not bytes
123, # integer, not iterable
None, # NoneType
[123, 456], # list of non-bytes
])
def test_from_iter_invalid_inputs_raise(invalid_input):
with pytest.raises(Exception):
ijson.from_iter(invalid_input)
```
</issue_to_address>
### Comment 5
<location> `README.rst:668-650` </location>
<code_context>
+ async with client.stream('GET', 'https://jsonplaceholder.typicode.com/posts') as resp:
+ resp.raise_for_status()
+ f = ijson.from_aiter(resp.aiter_bytes())
+ async for item in ijson.items(f, 'item'):
+ print(f"post id = {post['id']}, \t title: {post['title']}")
- The first alternative is best, since ``requests`` will automatically decode
</code_context>
<issue_to_address>
**issue:** Variable name mismatch in async example print statement.
The print statement should reference 'item["id"]' and 'item["title"]' to match the loop variable.
</issue_to_address>
### Comment 6
<location> `src/ijson/adapters.py:11-13` </location>
<code_context>
def read(self, n: int) -> bytes:
if n == 0:
return b""
return next(self._iter, b"")
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return b"" if n == 0 else next(self._iter, b"")
```
</issue_to_address>
### Comment 7
<location> `src/ijson/adapters.py:23-25` </location>
<code_context>
async def read(self, n: int) -> bytes:
if n == 0:
return b""
return await anext(self._aiter, b"")
</code_context>
<issue_to_address>
**suggestion (code-quality):** We've found these issues:
- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
```suggestion
return b"" if n == 0 else await anext(self._aiter, b"")
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if n == 0: | ||
return b"" | ||
return next(self._iter, b"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): We've found these issues:
- Lift code into else after jump in control flow (
reintroduce-else
) - Replace if statement with if expression (
assign-if-exp
)
if n == 0: | |
return b"" | |
return next(self._iter, b"") | |
return b"" if n == 0 else next(self._iter, b"") |
if n == 0: | ||
return b"" | ||
return await anext(self._aiter, b"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): We've found these issues:
- Lift code into else after jump in control flow (
reintroduce-else
) - Replace if statement with if expression (
assign-if-exp
)
if n == 0: | |
return b"" | |
return await anext(self._aiter, b"") | |
return b"" if n == 0 else await anext(self._aiter, b"") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @mattmorgis for what looks like a nice piece of work including documentation updates and tests.
I'm currently away and won't be back until early October, so I don't expect to be able to merge this until then (but it might happen, who knows). In the meantime I still have this a look and I left a couple of initial comments. Overall the solution is fine, it's mostly some.smsll.tweaks to better align things with the rest of the codebase. Thanks again!
BTW, I didn't really look into the AI review, this is turned on by default on my organisation but I don't personally use it. This is to say that you shouldn't feel.obliged to follow it's directions.
return IterReader(iter(byte_iter)) | ||
|
||
|
||
def from_aiter(byte_aiter: AsyncIterable[bytes]) -> AiterReader: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like with the main entry points for ijson, I'd rather we had a single from_iter
function here that handled both sync and async cases, returning the correct Reader instance depending on the input iterable type (you can check for the presence of __aiter__
for instance).
assert file_obj.read(1) == b"" | ||
|
||
|
||
def test_from_iter_basic_parse(backend, chunks): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the two first tests above that test from_iter directly are fine here. Same for the rest of the test below that do the same.
However for the rest I would have gone very differently. Please have a look at contest.py, where test cases are dynamically generated for each backend based on different "backend adaptor", each basically feeding ijson data in a different way (str, bytes, sync/async file-like objects) and for each backend. These are then used to run a number of tests that venter around the actual.functions (basic_parse, parse, etc). I think most tests that follow in this module are (most probably because I didn't actually check) a small subset of those function-oriented test cases.
So what I'd do is provide two new of these "backend adaptors" for the sync/async cases. They'd take the test input and create iterables out of them (iterating over 1 character/byte at a time, or maybe even by random amounts), then pass those through ijson.from_iter, and finally feed that to the ijson routine under test. That would make many of these tests below unnecessary, and would ensure all test cases work as expected with iterable inputs. Let me know if you need any guidance with this.
|
||
|
||
def from_iter(byte_iter: Iterable[bytes]) -> IterReader: | ||
"""Convert a synchronous byte iterable to a file-like object.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We support both str and bytes for file -like objects, it'd be nice to support both too here. Shouldn't be too hard? Just need to encode strings with utf8, with a warning like we do in the utils.module.
See the options_ section for details. | ||
|
||
#. How do I use ijson with the ``requests`` library | ||
#. **Q**: How do I use ijson with ``requests`` or ``httpx`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency please also fix the starting line of the answer to start with **A**
import httpx, ijson, asyncio | ||
* Pass the ``Response.raw`` object (the underlying ``socket.socket``) to ijson. | ||
async def main(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very simple code sample, I think we an remove this main (and it's invocation with asyncii.run) to remove some.indentation and keep the code a bit easier to the eye.
import requests | ||
import ijson | ||
with requests.get('https://jsonplaceholder.typicode.com/posts', stream=True) as resp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Popular as it may be, I'd rather not reference any specific website in particular; instead I'd keep it intentionally vague like in the other examples where only http://...
is shown. Same for the actual loop where each item is processed.
@mattmorgis about aiter/anext, I actually hadn't realised they were that new.. I'd rather keep this PR self-contained, so maybe you could provide very simple aiter/anext implementations in the same adapter module for python versions < 3.10 that can be later removed once support for 3.9 is removed. |
This adds two new utility functions to convert iterators and async iterators into file like objects that can be used with ijson:
ijson.from_iter(iterable_of_bytes)
ijson.from_aiter(async_iterable_of_bytes)
I updated the README with two working examples.
This implementation uses both
aiter()
andanext()
which are only available in Python 3.10+. With Python 3.9 reaching EOL in October 2025, I wanted to also see if there was any appetite for dropping 3.9 support?Related Issues:
#44
#124
#147
#58
Summary by Sourcery
Introduce iterator adapters to convert synchronous and asynchronous byte iterators into file-like objects usable by ijson, update documentation with examples, and add tests for both sync and async adapters.
New Features:
Enhancements:
Documentation:
Tests: