[NU-2154] Standalone Flink live data #8231

Open

wants to merge 23 commits into base: staging

Conversation

mgoworko
Contributor

Describe your changes

This change introduces a basic mechanism for synchronizing live data from standalone Flink jobs using the Designer DB. This mechanism may be suitable for demonstration purposes or low- to medium-traffic environments. It may serve as a base for future implementations of better synchronization methods, for example using Redis.

Checklist before merge

  • Related issue ID is placed at the beginning of PR title in [brackets] (can be GH issue or Nu Jira issue)
  • Code is cleaned from temporary changes and commented out lines
  • Parts of the code that are not easy to understand are documented in the code
  • Changes are covered by automated tests
  • Showcase in dev-application.conf added to demonstrate the feature
  • Documentation added or updated
  • Added entry in Changelog.md describing the change from the perspective of a public distribution user
  • Added MigrationGuide.md entry in the appropriate subcategory if introducing a breaking change
  • Verify that PR will be squashed during merge

@github-actions github-actions bot added client client main fe docs ui labels Jun 12, 2025
@@ -40,7 +43,26 @@ class FlinkScenarioJob(modelData: ModelData) {
env: StreamExecutionEnvironment,
processListeners: List[ProcessListener],
): JobExecutionResult = {
val compilerFactory = new FlinkProcessCompilerDataFactory(modelData, deploymentData, processListeners)
val liveDataCollectingListener =
Contributor Author

It should be a safe change, because nothing changes in Flink jobs when db uploading of results is disabled.

@@ -25,6 +37,477 @@ class DetectLargeTransactionSpec extends AnyFreeSpecLike with BaseE2ESpec with M
eventually {
val processedTransactions = client.readAllMessagesFromKafka("ProcessedTransactions")
processedTransactions should equal(largeAmountTransactions)
given()
Contributor Author

Added an assertion on live data in the E2E test using standalone Flink.

mgoworko added 4 commits June 12, 2025 15:47
# Conflicts:
#	designer/deployment-manager-api/src/main/scala/pl/touk/nussknacker/engine/api/deployment/DeploymentManager.scala
#	designer/server/src/main/scala/pl/touk/nussknacker/ui/api/description/scenarioTesting/ResultsWithCountsDto.scala
#	designer/server/src/main/scala/pl/touk/nussknacker/ui/api/description/scenarioTesting/ResultsWithCountsDtoCodecs.scala
#	designer/server/src/test/scala/pl/touk/nussknacker/ui/api/livedata/ScenarioLiveDataApiHttpServiceSpec.scala
#	engine/flink/management/src/main/scala/pl/touk/nussknacker/engine/management/jobrunner/FlinkMiniClusterScenarioJobRunner.scala
@github-actions github-actions bot removed ui client client main fe labels Jun 12, 2025
@mgoworko mgoworko requested a review from arkadius June 13, 2025 08:03
@@ -2039,9 +2040,8 @@ lazy val liveDataCollector = (project in file("designer/live-data-collector"))
),
)
.dependsOn(
deploymentManagerApi % Provided,
Member

Let's keep this dependency explicit even if it is transitively available via liveDataCollector. It should be easy to remove this feature if we decide to replace this collector-based approach with something else. BTW, I missed one thing in the previous PR. The collector module shouldn't be nested inside designer. We have a strategy for module organization:

  • nested inside the designer module - things that are not operational, only needed during designing the scenario
  • nested inside the engine module - things that are used during runtime
  • on the root level - things that are shared between the designer and operational part (scenario runtime).

Following this approach, the collector should be at the root level, the same as scenario-compiler.
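For illustration, a minimal build.sbt sketch of the suggested layout; `commonSettings` and the exact project settings are assumptions, only the root-level path and the explicit `deploymentManagerApi` dependency come from the discussion above:

```scala
// Sketch only: the collector module defined at the root level (like scenario-compiler),
// instead of being nested under designer/.
lazy val liveDataCollector = (project in file("live-data-collector"))
  .settings(commonSettings) // assumed shared settings of this build
  .dependsOn(
    deploymentManagerApi % Provided // kept explicit, as requested above
  )
```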

) extends LiveDataPreviewMode

final case class DbUploader(
Member

upload has a very strong connotation with files. We have a very similar mechanism that has a ready set of words for us - metrics. In the metrics domain, metrics are registered in the registry and then reported to some service responsible for exposing them. Based on that, we can name things in the same manner. Also, when you pick the noun that will be exposed in the configuration, always look at it from the perspective of how you would describe it in the documentation (even if you don't produce it yet) or during a casual talk with a user. You would rather use sentences such as "To configure the reporting mechanism/storage for live data, you should add ... configuration entry". The noun that is exposed is dedicated to "normal" users (administrators), not developers. It describes the mechanism instead of some synthetic role/class in the code. LSS - let's think about the domain model that we want to expose and pick names based on that, instead of thinking of the implementation and then exposing implementation details in the configuration.

Take a look at how the best do this: https://github.com/apache/flink/blob/master/docs/content/docs/deployment/config.md You can find nouns such as "metrics", "execution", "logging", "checkpointing".
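For illustration only, a hedged sketch of how the configuration model could be named following the metrics analogy; the names LiveDataReporter, DesignerDb and Redis are hypothetical and not part of this PR:

```scala
// Hypothetical naming sketch: describe the mechanism in the domain's own words
// (live data is "reported" to some storage), instead of exposing an implementation
// detail such as "DbUploader".
sealed trait LiveDataReporter

object LiveDataReporter {
  // report live data to the Designer database (the mechanism introduced in this PR)
  final case class DesignerDb(reportIntervalInSeconds: Int) extends LiveDataReporter
  // possible future backend, mentioned in the PR description
  final case class Redis(url: String) extends LiveDataReporter
}
```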

private class OneShotSource extends SourceFunction[String] {

override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
// emit once and sleep forever
Member

  1. Why sleep forever? BTW, it won't sleep forever, only until the Java process is interrupted. Is that OK?
  2. Let's use the new Source API instead of the legacy one.
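A minimal sketch of what the suggested change could look like with the new unified Source API, using Flink's built-in NumberSequenceSource as a bounded one-element source in place of a custom legacy SourceFunction; the payload and source name are illustrative:

```scala
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.functions.MapFunction
import org.apache.flink.api.connector.source.lib.NumberSequenceSource
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

// Emit exactly one element using the new, unified Source API (FLIP-27)
// instead of the deprecated SourceFunction-based OneShotSource.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val oneShot = new NumberSequenceSource(0L, 0L) // bounded source producing a single Long

env
  .fromSource(oneShot, WatermarkStrategy.noWatermarks[java.lang.Long](), "one-shot-source")
  .map(new MapFunction[java.lang.Long, String] {
    override def map(value: java.lang.Long): String = "run-once" // the one-shot trigger payload
  })
```

Being bounded, such a source also finishes cleanly instead of sleeping until the process is interrupted.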

processIdWithName: ProcessIdWithName,
dbUploader: DbUploader,
): Unit = {
env
Member

Can you write a few words of comment on why you implemented it as a Flink data stream processing pipeline? E.g. why is it not a Thread?

@transient private var connection: Connection = _

override def open(openContext: OpenContext): Unit = {
Class.forName("org.postgresql.Driver")
Member

It looks like a mistake (dead code).

@@ -28,6 +28,22 @@ scenarioTypes {
componentId: sink-kafka
}
}
allowEndingScenarioWithoutSink: true
Member

This is our official, minimal configuration. We don't plan to enable it for everyone.

@@ -28,6 +28,22 @@ scenarioTypes {
componentId: sink-kafka
}
}
allowEndingScenarioWithoutSink: true
liveDataPreview {
Member

The same here.

allowEndingScenarioWithoutSink: true
liveDataPreview {
enabled: true
maxNumberOfSamples: 20
Member

We can hardcode these values as default values for LiveDataPreviewMode.Enabled. Thanks to that, only enabling will be necessary during configuration.
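For illustration, a minimal sketch of the suggestion, assuming LiveDataPreviewMode is a sealed trait as the diff suggests; maxNumberOfSamples and its default of 20 come from the showcased config, the uploadIntervalInSeconds default is a placeholder:

```scala
// Sketch: hardcode the showcased values as defaults, so enabling the feature
// in the configuration does not require repeating them.
sealed trait LiveDataPreviewMode

object LiveDataPreviewMode {
  case object Disabled extends LiveDataPreviewMode

  final case class Enabled(
      maxNumberOfSamples: Int = 20,     // default taken from the dev-application.conf showcase
      uploadIntervalInSeconds: Int = 10 // placeholder default; the actual value is not shown in this diff
  ) extends LiveDataPreviewMode
}
```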


class LiveDataEntity(tag: Tag) extends TableWithSchema[LiveDataEntityData](tag, "live_data") {

def scenarioId: Rep[ProcessId] = column[ProcessId]("scenario_id")
Member

For batch processing, we should have (collectorId, activityId) as a primary key. The activityId for a deployment is Flink's jobId. Let's leave a comment that currently this mechanism is limited to streaming scenarios.
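A hedged Slick sketch of what the suggested key could look like; apart from scenario_id, all column names, types and the row class are assumptions made for the example:

```scala
import slick.jdbc.PostgresProfile.api._

// Sketch only (not the PR's actual schema): a live_data table keyed by
// (collector_id, activity_id). For deployments, activityId corresponds to Flink's jobId;
// currently the mechanism is limited to streaming scenarios.
final case class LiveDataRow(collectorId: String, activityId: String, scenarioId: Long, data: String)

class LiveDataTable(tag: Tag) extends Table[LiveDataRow](tag, "live_data") {
  def collectorId: Rep[String] = column[String]("collector_id")
  def activityId: Rep[String]  = column[String]("activity_id") // Flink jobId
  def scenarioId: Rep[Long]    = column[Long]("scenario_id")
  def data: Rep[String]        = column[String]("data")

  def pk = primaryKey("pk_live_data", (collectorId, activityId))

  def * = (collectorId, activityId, scenarioId, data) <> (LiveDataRow.tupled, LiveDataRow.unapply)
}
```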

}
}

private def removeOldEntries(
Member

  1. What will happen if someone redeploys the scenario during uploadIntervalInSeconds? I guess you will aggregate results from the last job with the new one? Is that OK?
  2. For the mini cluster, we keep results after the scenario is stopped. For standalone Flink, we don't?

Member

WDYT about an approach where we collect live data for (jobId, collectorId) and, during reading, take the last deployment and return the result for jobId = deploymentId? Thanks to that we avoid weak assumptions, magic numbers, etc.
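For illustration, a hedged sketch of the read path described above, reusing the hypothetical table from the earlier sketch; lastDeploymentId is assumed to come from the deployment repository:

```scala
import slick.jdbc.PostgresProfile.api._

// Sketch of the suggested read path: resolve the scenario's last deployment id and
// return only the rows collected for that jobId (jobId == deploymentId),
// so no time-based cleanup heuristics or magic numbers are needed.
def liveDataForLastDeployment(
    lastDeploymentId: String,                // hypothetical: looked up from the deployment repository
    liveDataTable: TableQuery[LiveDataTable] // hypothetical table from the sketch above
): DBIO[Seq[LiveDataRow]] =
  liveDataTable
    .filter(_.activityId === lastDeploymentId) // jobId == deploymentId
    .result
```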

@@ -79,11 +60,8 @@ class FlinkMiniClusterScenarioJobRunner(

override def liveDataPreviewSupport: LiveDataPreviewSupport = {
Member

Why is liveDataPreviewSupport a part of ScenarioJobRunner if it is not related to it? I don't see any occurrences of this method being used from the job runner.

val processIdWithName = ProcessIdWithName(processVersion.processId, processVersion.processName)
dbUploaderOpt.foreach(PeriodicLiveDataUploader.register(env, processIdWithName, _))
Some(
LiveDataCollectingListenerHolder.createListenerFor(
Member

When do you close the allocated resources for jobs that have been run on a remote Flink cluster?

.filter(_.scenarioId === processIdWithName.id)
.filter(
_.updatedAt < Instant.now.getEpochSecond - uploadIntervalInSeconds - 5
) // Drop data older than interval + 5 seconds
Member
@arkadius arkadius Jun 17, 2025

Why drop? Why older than interval + 5 seconds? Why not 0 seconds or 1 minute? (magic number)

maxNumberOfSamples: Int,
): Map[NodeTransition, LiveDataForNodeTransition] = {
data.flatten
.groupBy(_._1)
Member

You can use toGroupedMap. Thanks to that, we avoid the meaningless _1, _2.
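A small sketch of the idea, assuming toGroupedMap is a project utility over a list of pairs; the standard-library groupMap below expresses the same grouping without positional accessors:

```scala
// Illustrative data: (nodeTransition, sample) pairs, simplified to strings here.
val samples: List[(String, String)] = List("a->b" -> "s1", "a->b" -> "s2", "b->c" -> "s3")

// Grouping with exposed tuple positions:
val positional: Map[String, List[String]] =
  samples.groupBy(_._1).view.mapValues(_.map(_._2)).toMap

// The same grouping without _1/_2 (Scala 2.13 groupMap, equivalent to what a
// toGroupedMap helper provides):
val byTransition: Map[String, List[String]] =
  samples.groupMap { case (transition, _) => transition } { case (_, sample) => sample }
```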
