Conversation

@erlendvollset
Collaborator

  • Add support for streams
  • Add basic support for records
  • Raise FeaturePreviewWarning on all invocations of stream/record methods
  • Use new warn_on_all_method_invocations decorator across the board (see the sketch below)
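
A minimal sketch of what such a decorator could look like, purely for illustration: the SDK's real warn_on_all_method_invocations and FeaturePreviewWarning may differ, so a plain UserWarning subclass stands in here.

import functools
import warnings


class FeaturePreviewWarning(UserWarning):
    # Stand-in for the SDK's warning class, which carries maturity metadata.
    pass


def warn_on_all_method_invocations(warning):
    # Class decorator: wrap every public method so each call emits `warning`.
    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            if callable(attr) and not name.startswith("_"):
                setattr(cls, name, _warn_first(attr, warning))
        return cls
    return decorate


def _warn_first(fn, warning):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        warnings.warn(warning, stacklevel=2)
        return fn(*args, **kwargs)
    return wrapper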

Description

Please describe the change you have made.

Checklist:

  • Tests added/updated.
  • Documentation updated. Documentation is generated from docstrings - these must be updated according to your change.
    If a new method has been added it should be referenced in cognite.rst in order to generate docs based on its docstring.
  • Changelog updated in CHANGELOG.md.
  • Version bumped. If triggering a new release is desired, bump the version number in _version.py and pyproject.toml per semantic versioning.


@erlendvollset force-pushed the streams-and-records branch 2 times, most recently from 344b827 to 6cfb164 on August 12, 2025 14:58
@codecov

codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 80.93842% with 65 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.67%. Comparing base (67b9b76) to head (bcd743f).

Files with missing lines | Patch % | Lines
cognite/client/_api/data_modeling/records.py | 41.30% | 27 Missing ⚠️
...gnite/client/data_classes/data_modeling/records.py | 85.36% | 12 Missing ⚠️
...ite/client/data_classes/data_modeling/instances.py | 76.31% | 9 Missing ⚠️
cognite/client/_api/data_modeling/streams.py | 80.55% | 7 Missing ⚠️
cognite/client/data_classes/data_modeling/ids.py | 54.54% | 5 Missing ⚠️
...gnite/client/data_classes/data_modeling/streams.py | 93.15% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2246      +/-   ##
==========================================
- Coverage   90.82%   90.67%   -0.16%     
==========================================
  Files         170      174       +4     
  Lines       25666    25898     +232     
==========================================
+ Hits        23312    23483     +171     
- Misses       2354     2415      +61     
Files with missing lines | Coverage Δ
cognite/client/_api/agents/agents.py | 100.00% <100.00%> (ø)
cognite/client/_api/data_modeling/__init__.py | 100.00% <100.00%> (ø)
...nite/client/_api/hosted_extractors/destinations.py | 94.44% <100.00%> (+1.11%) ⬆️
cognite/client/_api/hosted_extractors/jobs.py | 88.57% <100.00%> (+0.10%) ⬆️
cognite/client/_api/hosted_extractors/mappings.py | 96.00% <100.00%> (+1.35%) ⬆️
cognite/client/_api/hosted_extractors/sources.py | 95.08% <100.00%> (+1.05%) ⬆️
cognite/client/_api/simulators/__init__.py | 100.00% <100.00%> (ø)
cognite/client/_api/simulators/integrations.py | 96.66% <100.00%> (-0.11%) ⬇️
cognite/client/_api/simulators/logs.py | 100.00% <100.00%> (ø)
cognite/client/_api/simulators/models.py | 98.27% <100.00%> (-0.06%) ⬇️
... and 14 more

... and 4 files with indirect coverage changes


@erlendvollset force-pushed the streams-and-records branch 2 times, most recently from de116c6 to 4eb65aa on August 14, 2025 10:32
@erlendvollset marked this pull request as ready for review August 14, 2025 10:32
@erlendvollset requested review from a team as code owners August 14, 2025 10:32
@evertoncolling changed the title from "Alpha support for streams and records" to "feat(records): alpha support for streams and records" on Oct 2, 2025
resource_cls=Stream,
method="GET",
chunk_size=chunk_size,
limit=limit,


Note: we do not have a limit parameter on listStream at the moment.

https://api-docs.cogheim.net/redoc/#tag/Streams/operation/listStream

>>> from cognite.client import CogniteClient
>>> client = CogniteClient()
>>> client.data_modeling.streams.delete(streams=["myStream", "myOtherStream"])
"""


Note: the Streams API deviates from the Cognite API standard for delete (a bug has been opened on this). The standard is a POST with the items to delete in a list; here we instead have to issue a DELETE request.
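
To make the deviation concrete, a hedged sketch of the two request shapes (the paths, payload, and auth below are assumptions based on the convention, not taken from this PR):

import requests

BASE = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials

# Cognite API convention: POST the items to delete to a .../delete route.
requests.post(
    f"{BASE}/models/spaces/delete",
    headers=HEADERS,
    json={"items": [{"space": "mySpace"}]},
)

# Streams API today, per this thread: a plain DELETE on the resource path.
requests.delete(f"{BASE}/streams/myStream", headers=HEADERS)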


Also note: we only accept one item in the list of streams to be deleted or created.

self,
stream: str,
*,
last_updated_time: LastUpdatedRange | None = None,

@asosnovski commented Oct 6, 2025


Food for thought: this parameter is required for immutable streams. Should we have a default value for it, or should we require an explicit None for mutable streams, so that SDK users have to think about which kind of stream they are querying?
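
One way to force that explicit choice, sketched with a sentinel default (names here are hypothetical, not the PR's actual API):

from typing import Any

_UNSET: Any = object()  # distinguishes "argument not passed" from "passed None"


def retrieve_records(stream: str, *, last_updated_time: Any = _UNSET):
    if last_updated_time is _UNSET:
        raise TypeError(
            "last_updated_time is required: pass a range for immutable streams, "
            "or an explicit None for mutable streams"
        )
    ...  # perform the query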

@warn_on_all_method_invocations(
    FeaturePreviewWarning(api_maturity="alpha", sdk_maturity="alpha", feature_name="Records API")
)
class RecordsAPI(APIClient):


What about the aggregate endpoint?

Fetches streams as they are iterated over, so you keep a limited number of streams in memory.

Args:
chunk_size (int | None): Number of streams to return in each chunk. Defaults to yielding one stream at a time.

@asosnovski commented Oct 6, 2025


I don't fully understand what this thing does, so maybe my worries are groundless. But the streams endpoint will have pretty strict rate/concurrency limits. Why use a chunk size of 1 when, by default, a project can have no more than 10 streams? Maybe set something like 20 to avoid unnecessary API calls?
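
For context, a usage sketch of the iterator pattern the docstring describes, assuming the SDK's usual __call__/__iter__ surface for API classes:

from cognite.client import CogniteClient

client = CogniteClient()

# Default: one Stream object per iteration (chunk_size=None).
for stream in client.data_modeling.streams:
    print(stream.external_id)

# With chunk_size set, each iteration yields a list of up to 20 streams,
# which is the reviewer's suggestion for reducing the number of API calls.
for chunk in client.data_modeling.streams(chunk_size=20):
    print(len(chunk))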


Get multiple streams by id:

>>> res = client.data_modeling.streams.retrieve(streams=["MyStream", "MyAwesomeStream", "MyOtherStream"])


This is concerning. How many IDs can be passed this way? Our endpoint allows retrieving only 1 stream by ID, so this results in multiple API invocations, right? And stream endpoints have very strict limits: https://cognitedata.atlassian.net/wiki/spaces/RPILA/pages/4797431945/Design+ILA+rate+and+concurrency+limits#Stream-endpoint-limits. Invoking this method with >5 IDs basically guarantees throttling.
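
If the multi-ID form is kept, a client-side pacing sketch illustrates the fan-out cost (max_rps is an assumed budget, and the single-ID call follows the retrieve signature quoted later in this review):

import time


def retrieve_streams_paced(client, external_ids, max_rps=1.0):
    # One underlying request per ID, spaced out to stay under the rate limit.
    results = []
    for xid in external_ids:
        results.append(client.data_modeling.streams.retrieve(external_id=xid))
        time.sleep(1.0 / max_rps)
    return results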

"""
return self()

def retrieve(self, external_id: str) -> Stream | None:

@asosnovski commented Oct 6, 2025


What about the includeStatistics parameter? NB: statistics calculation is potentially expensive, which is why we have it in the get-stream endpoint but not in list streams. Allowing retrieval of multiple streams by ID with one method invocation kind of circumvents this.
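
A hypothetical shape for that suggestion, exposing the flag only on single-stream retrieve (the include_statistics parameter name is guessed from the API's includeStatistics, not taken from this PR):

from typing import Optional


class StreamsAPI:
    def retrieve(self, external_id: str, include_statistics: bool = False) -> Optional["Stream"]:
        # Offered only here, not on list(), because computing statistics
        # is potentially expensive, as noted above.
        ...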


>>> from cognite.client import CogniteClient
>>> client = CogniteClient()
>>> client.data_modeling.streams.delete(streams=["myStream", "myOtherStream"])


Food for thought. Streams are intended to be long-lived, and customers should think twice before creating or deleting them. A deleted stream will stay for up to 6 weeks in a soft-deleted state, all this time consuming capacity, incurring costs and preventing another stream with the same name from being created. Should we really allow customers to delete multiple streams with 1 request?

"""`List streams <https://developer.cognite.com/api#tag/Streams/operation/listStreamsV3>`_

Args:
limit (int | None): Maximum number of streams to return. Defaults to 10. Set to -1, float("inf") or None to return all items.


As mentioned by Andreas, there is no limit. The endpoint returns all the streams there are, as a project isn't supposed to have many.

@overload
def create(self, streams: StreamWrite) -> Stream: ...

def create(self, streams: StreamWrite | Sequence[StreamWrite]) -> Stream | StreamList:


Food for thought. Streams are intended to be long-lived, and customers should think twice before creating or deleting them. A deleted stream will stay for up to 6 weeks in a soft-deleted state, all this time consuming capacity, incurring costs, and preventing another stream with the same name from being created. Should we really allow customers to create multiple streams with 1 request? Especially considering that the endpoint will have only a 1 rps limit, so calling this method with multiple stream names would automatically mean throttling.
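
A sketch of the single-item alternative implied above (hypothetical; the PR itself keeps the bulk overloads quoted):

class StreamsAPI:
    def create(self, stream: "StreamWrite") -> "Stream":
        # Exactly one stream per call: with a 1 rps limit on the endpoint,
        # a bulk form would only serialize into throttled requests anyway.
        ...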

