Skip to content

PGSch/pgs-pulsar-flink

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SOP

  1. docker-compose up
  2. ./setup.sh
  3. LookupDataProducer
    • Stop app manually after some time. Something like this should be returned:
    16:49:53.072 INFO [Thread-0] io.ipolyzos.producers.LookupDataProducer - Sent '117178' user records and '2198' item records. 
    • Data is in Pulsar
  4. ./deploy.sh
  5. EnrichmentStream
  6. OrdersDataSource
  7. localhost:8081

WIP:

  • Optimize to backpressure, buffers, checkpoint intervals and wm intervals for larger state
  • User RocksDB API to demonstrate what gets written and how
  • Use time based joins for session windows and add time constraints

Use Case 1

Data Enrichment with Topic Lookups

Use Case 2

Data Aggregation with Time Constraints on Time Windows

Setup a Pulsar Cluster

docker run -rm -it --name pulsar \
-p 6650:6650  -p 8080:8080 \
--mount source=pulsardata,target=/pulsar/data \
--mount source=pulsarconf,target=/pulsar/conf \
apachepulsar/pulsar:2.9.1 \
bin/pulsar standalone

Setup Pulsar Logical Components

Go into your container

docker exec -it pulsar bash

and run the following commands

  1. Create topics
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/orders
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/users
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/items

bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/view_events
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/purchase_events
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/cart_events

bin/pulsar-admin topics list public/default
  1. Set infinite Retention
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/users
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/items

bin/pulsar-admin topics get-retention persistent://public/default/users
bin/pulsar-admin topics get-retention persistent://public/default/items

bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/view_events
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/purchase_events
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/cart_events

Start a Flink Cluster

start-cluster

Deploy the Flink Job

./deploy.sh

Monitor Flink logs

Tail the logs

tail -f log/flink-*-taskexecutor-*

The original Datasets can be found on the following links:

About

Pulsar with Flink SPE

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •