Minor docs corrections plus single-side-join docs rewrite #8229

Open: wants to merge 2 commits into base: staging
4 changes: 3 additions & 1 deletion docs/integration/DataTypingAndSchemasHandling.md
@@ -23,7 +23,9 @@ types of data like JSON, Binary, and DB data. In each case format of these data

To provide consistent and proper support for these formats Nussknacker converts meta-information about data to its
own `Typing Information`, which is used on the Designer's part to hint and validate the data. Each part of the diagram
is statically validated and typed on an ongoing basis.
is statically validated and typed on an ongoing basis.

Nussknacker's internal `Typing Information` "system" is based on Java datatypes. Consequently, the data mappings presented in the remaining part of this document map between non-Java datatypes and Java datatypes.
Contributor

Maybe just "typing" instead of the "Typing information" system?

Contributor Author

Yeah, "system" is probably too much; I will remove it. However, the document uses "Typing information" all over the place, so it is probably better to keep it.

Contributor

type system


## Avro schema

2 changes: 1 addition & 1 deletion docs/integration/OpenAPI.md
@@ -77,7 +77,7 @@ components {
| namePattern | false | .* | Regexp for filtering operations by operationId or by created service name (concatenation of HTTP method, path and parameters) |
| security | false | | Configuration for [authentication](https://swagger.io/docs/specification/authentication/) for each `securitySchemas` defined in the *OpenAPI interface definition* |
| security.*.type | false | | Type of security configuration for a given security schema. Currently only `apiKey` is supported |
| security.*.apiKeyValue | false | | API key that will be passed into the service via header, query parameter or cookie (depending on definition provided in OpenAPI) |
Contributor Author

I find it difficult to understand a statement like "depending on definition provided in OpenAPI".

"OpenAPI" is too vague for me; it can mean the standard definition, a particular service (interface) definition, or maybe something else. I prefer to be more helpful here.

Contributor

imho the previous description is simpler (it doesn't use the huge word "interface") and points out that the behaviour depends on the specific service configuration

Contributor

...as configured in the given service definition?

| security.*.apiKeyValue | false | | API key that will be passed to the service via header, query parameter or cookie, as indicated in the *OpenAPI interface definition* |
| secret | false | | Configuration for [authentication](https://swagger.io/docs/specification/authentication/) which matches any `securitySchemas` defined in the *OpenAPI interface definition*. This config entry has the same structure as values in `security` object (see above) |

## Operations
32 changes: 17 additions & 15 deletions docs/scenarios_authoring/AggregatesInTimeWindows.md
@@ -193,40 +193,42 @@ The `subscriberId` will be available in a `#totalTransfers.subscriberId` variabl

## Single-side-join

Single-side-join component is conceptually similar to components computing aggregates in time windows, so it is convenient to discuss it here. Conceptually Single-side-join is an equivalent of the [left (or right) join](https://www.w3schools.com/sql/sql_join.asp) . In SQL case, the left join returns all records from the left table, and the matched records from the right table. In Nussknacker's case the Single-side-join will join two ‘branches’ of a scenario - the Main branch and the Joined branch and will **return exactly as many events as there were in the Main branch**. Even if no events will be matched in the Joined branch, an event will be emitted, with the value corresponding to the aggregator selected - null for List and Set, 0 for Sum, null for Min and Max. **The time window boundaries will be determined by the event coming from the main branch** and will be in the range of \[main-branch-event-event-time, main-branch-event-event-time + windowLength\].

Under the hood single-side-join uses sliding-window on the JOINED branch to deliver events to the MAIN branch. You can choose which aggregate function to use on the JOINED branch; yet `Last` is probably the most natural choice if you want the most recent value seen in the JOINED branch.

Single-side-join can be an attractive and very fast alternative to database lookup's if you have enough memory to stream your whole lookup table and (if needed) you are able to stream changes of the look up table records. To use single-side-join as a very fast lookup, configure the topic containing the lookup table values as a **JOINED** branch. Make sure that you set window length to value high enough to ensure that there are always events which qualify to the window in the JOINED branch.
The Single-side-join component is conceptually similar to components computing aggregates in time windows, so it is convenient to discuss it here. Conceptually, Single-side-join is an equivalent of the [left (or right) join](https://www.w3schools.com/sql/sql_join.asp). In SQL, the left join returns all records from the left table, and the matched records from the right table.

![alt_text](img/singleSideJoinConcept.png "single-side-join")

In Nussknacker's case the Single-side-join joins two ‘branches’ of a scenario - the MAIN branch and the JOINED branch. For every event coming from the MAIN branch, the single-side-join will attempt to enrich it with the result of the `aggregator` function, which will act on events from the JOINED branch. If no events are matched with the JOINED branch, an event will be emitted, with the aggregator result corresponding to the aggregator selected - null for List and Set, 0 for Sum, null for Min and Max. **The time window boundaries will be determined by the event coming from the MAIN branch** and will be in the range of \[main-branch-event-event-time - windowLength, main-branch-event-event-time\].

Because there are no tables and table names to refer to, Nussknacker will derive names of the branches to join from the names of nodes taking part in the Single-side-join. Let’s consider an example where there is a topic containing alerts about subscribers; for every alert generated for the subscriber we want to track all events generated by this subscriber in the 24 hours **preceding** the alert event. The Nussknacker scenario would look like in the picture below.
Under the hood single-side-join uses sliding-window on the JOINED branch to deliver events to the MAIN branch. Consequently, the events coming from the MAIN branch are enriched with the aggregator result and events coming from the JOINED branch are terminated.

![alt_text](img/singleSideJoinScenario.png "single-side-join in an example scenario")
Single-side-join can be an attractive and very fast alternative to database lookups if you have enough memory to stream your whole lookup table and (if needed) you are able to stream changes to the lookup table records. To use single-side-join as a very fast lookup, configure the topic containing the lookup table values as a JOINED branch. Make sure that you set the window length to a value high enough to ensure that there are always events in the JOINED branch which fall within the window.
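
The enrichment semantics described in this section can be illustrated with a small batch simulation in Python. This is only a sketch under stated assumptions, not Nussknacker's implementation: the `Event` class, the `single_side_join` function and the `max_or_none` helper are hypothetical names, the window is treated as closed on both ends, and all events are assumed to be available up front instead of arriving as a stream.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Event:
    key: str         # result of the `key` expression for the event's own branch
    event_time: int  # event time, e.g. in milliseconds
    value: Any       # for JOINED events: result of the `aggregateBy` expression


def max_or_none(values: List[Any]) -> Optional[Any]:
    """Max aggregator: null (None) when nothing was matched."""
    return max(values) if values else None


def single_side_join(
    main_events: List[Event],
    joined_events: List[Event],
    window_length: int,
    aggregator: Callable[[List[Any]], Any],
) -> List[Tuple[Event, Any]]:
    """Emit exactly one enriched result per MAIN event (hypothetical sketch)."""
    results = []
    for main in main_events:
        # Window boundaries are determined by the MAIN event:
        # [event_time - window_length, event_time]
        window_start = main.event_time - window_length
        matched = [
            joined.value
            for joined in joined_events
            if joined.key == main.key
            and window_start <= joined.event_time <= main.event_time
        ]
        # JOINED events are never emitted; they only feed the aggregate.
        results.append((main, aggregator(matched)))
    return results
```

Passing Python's built-in `sum` as the aggregator yields 0 for a MAIN event with no matches, while `max_or_none` yields `None`, which mirrors the "0 for Sum, null for Min and Max" behaviour described above.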


The configuration of the Single-side-join would be as in the picture below; note how Nussknacker Designer helps you to decide which branch is which.
### Parameters

* **branchType** - either `MAIN` or `JOINED`
* **key** - value used to join branches; separate value for each branch. The expression in this field can refer to events and their associated variables (including `#input`) from their respective branches only.
* **aggregateBy** - the input to the `aggregator` function. Only variables associated with the event from the JOINED branch can be used in the `aggregateBy` expression.
* **Output variable name** - the name of the variable which will hold the results of the `aggregator` function
* **aggregator** - aggregator function.

![alt_text](img/singleSideJoin.png "image_tooltip")
### Example

There are couple fine points to make here:
Let’s consider an example where there is a topic containing alerts about subscribers; for every alert generated for the subscriber we want to track all events generated by this subscriber in the 24 hours **preceding** the alert event. The Nussknacker scenario would look like in the picture below.

* The time window (of 1 day in our case) will be **closed** upon arrival of the event with the given `#input.subscriber` value.
* The `#input` variable used in the aggregateBy field holds the content of the event “arriving” from the joined branch. This variable will be available downstream.
* The events which arrive from the joined branch are terminated.
* The `#outputVar` will be available downstream of the outer-join aggregate.
![alt_text](img/singleSideJoinScenario.png "single-side-join in an example scenario")

The configuration of the Single-side-join would be as in the picture below; note how Nussknacker Designer helps you to decide which branch is which.
Contributor

note how Nussknacker Designer helps you to decide which branch is which. <- imho we can skip this part.


![alt_text](img/singleSideJoin.png "image_tooltip")

## Full-outer-join

Full-outer-join is Nussknacker's version of SQL's full outer join. It works much like single-side-join,
but it has aggregates for both branches and emits a new event for every event it receives. Every time
a new event is received, it is matched with events with the same key, then the aggregate for the appropriate
branch is updated, and values of aggregates for both branches are returned. If an event cannot be matched,
then a new event is still emitted, but some aggregates have a value of zero.
then a new event is still emitted, but the aggregate for the branch with no matched events has a value of null or zero, depending on the selected aggregator function.
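
As a rough illustration of the behaviour described above, the following Python sketch emits one result for every incoming event and reports the aggregates of both branches. It is a simplification under stated assumptions: the function name `full_outer_join` and the branch labels `"left"` and `"right"` are hypothetical, and time-window expiry is left out for brevity.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def full_outer_join(
    events: Iterable[Tuple[str, str, Any]],
    aggregator: Callable[[List[Any]], Any],
) -> List[Tuple[str, Any, Any]]:
    """Each input event is (branch, key, value); branch is "left" or "right".

    Hypothetical sketch: window expiry is not modelled.
    """
    # Values seen so far, per key and per branch.
    seen: Dict[str, Dict[str, List[Any]]] = defaultdict(
        lambda: {"left": [], "right": []}
    )
    emitted = []
    for branch, key, value in events:
        # Update the aggregate input for the branch the event came from...
        seen[key][branch].append(value)
        # ...and emit the aggregates of both branches for this key.
        emitted.append(
            (key, aggregator(seen[key]["left"]), aggregator(seen[key]["right"]))
        )
    return emitted
```

With `sum` as the aggregator, a branch with no matched events contributes 0; an aggregator that returns `None` for an empty list would model the null case.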

![alt_text](img/fullOuterJoin.png "full-outer-join interface")

14 changes: 9 additions & 5 deletions docs/scenarios_authoring/DataSourcesAndSinks.md
@@ -165,12 +165,16 @@ variable name" parameter will be present instead:

![previous_value_window](img/previous_value_window.png)

`previousValue` stores arbitrary value for the given key. This element has two parameters:
- groupBy - expression defining key for which we compute aggregate, e.g. `#input.userId`
- value - stored value
The `PreviousValue` node recalls a value that was captured during a previous occurrence of an event with a specific key. It allows you to access the *last seen* value of a given expression for each group of events defined by a grouping key.

#### Parameters

- **Output variable name** - The recalled previous value will be available under this variable name.
- **key** - Defines how input events are grouped. The component keeps track of the last value separately for each unique key, based on this expression.
- **value** - The value to remember. It will be recalled when another event with the same group key arrives.

For example, given a stream of events which contain users with their current location, when we set
- groupBy is `#input.userId`
- key is `#input.userId`
- value is `#input.location`

then the value of the output variable is the previous location for the current user. If this is the first appearance
@@ -182,7 +186,7 @@ of this user, the **current** location will be returned.
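
The `PreviousValue` behaviour described above, including the first-appearance case, can be sketched in Python as a small keyed state holder. This is an illustrative model only; the class name `PreviousValue` and its `process` method are hypothetical and do not reflect how the component is implemented.

```python
from typing import Any, Dict, Hashable


class PreviousValue:
    """Remembers the last seen value separately for each key (hypothetical sketch)."""

    def __init__(self) -> None:
        self._last_seen: Dict[Hashable, Any] = {}

    def process(self, key: Hashable, value: Any) -> Any:
        # On the first appearance of a key there is nothing to recall,
        # so the current value is returned.
        previous = self._last_seen.get(key, value)
        # Remember the current value for the next event with the same key.
        self._last_seen[key] = value
        return previous
```

Here `key` corresponds to the result of an expression such as `#input.userId` and `value` to `#input.location`; the returned value is what would be stored under the output variable name.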
**(Flink engine only)**

Holds event in the node until
*event time* + `delay` >= max (*event time* ever seen by the delay node).
*event time* + `delay` >= max (*event time* ever seen by the Delay node).

The `key` parameter will be removed in a future release of Nussknacker; for the time being, configure it to `#inputMeta.key`.
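
One way to read the holding condition is sketched below in Python. It assumes an interpretation that is not spelled out in the text: an event is released once the maximum event time seen by the node reaches that event's *event time* + `delay`. The class `DelayNode` and its `on_event` method are hypothetical names, and the sketch tracks the maximum event time directly instead of relying on the engine's own time handling.

```python
import heapq
from typing import Any, List, Tuple


class DelayNode:
    """Holds events until enough event time has passed (hypothetical sketch)."""

    def __init__(self, delay: int) -> None:
        self.delay = delay
        self.max_event_time = float("-inf")
        # Min-heap of (release_time, sequence_number, event).
        self._held: List[Tuple[int, int, Any]] = []
        self._seq = 0

    def on_event(self, event_time: int, event: Any) -> List[Any]:
        """Register an event and return the events released by its arrival."""
        self.max_event_time = max(self.max_event_time, event_time)
        heapq.heappush(self._held, (event_time + self.delay, self._seq, event))
        self._seq += 1
        released = []
        # Assumed release rule: event_time + delay <= max event time seen.
        while self._held and self._held[0][0] <= self.max_event_time:
            released.append(heapq.heappop(self._held)[2])
        return released
```

The sequence number only breaks ties between events with the same release time, so that events themselves never need to be comparable.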
