-
Notifications
You must be signed in to change notification settings - Fork 1
alghimo/flink-scala-cassandra-kafka-poc
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# Sample Flink Application using Scala, Kafka and Cassandra. The app performs real-time scoring of transactions based on historical data. It's made of two jobs: ## Scoring Job: The high level process is as follows: - Reads incoming transactions from Kafka. - Enriches the transactions with calculated fields (for instance, extracts the country and the bank from the account) - Enriches the transactions with historical data from Cassandra - Calculates a score which represents the risk for the transaction, 1.0 being "maximum risk" and 0.0 "no risk" - Write the scored transaction to another Kafka topic The incoming transactions in Kafka are expected to be in Json format, and contain the following fields: - id: ID of the transaction - srcAccount: Source account - dstAccount: Destination account - amount: Amoun of the transaction The generated transactions have the same schema, plus: - score: Calculated score for the transaction. It's just based on z-scores - timeToScoreMillis: Total latency (in milliseconds), since the transaction got into the system until it's ready to be written in Kafka. ## Update History Job This second job will take the scored transactions, and if they are not too risky, will update the historical data we have in Cassandra. The process is: - Read the Scored transactions topic - Filter out transactions with a score greater than 0.8 - Update the Cassandra tables. ## Cassandra Tables The three Cassandra tables are quite simple, they just hold the necessary data to have the average and standard deviation, in order to calculate the z-scores. These are the queries to create the keyspace and the tables: ``` CREATE KEYSPACE fraudpoc WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 2}; CREATE TABLE AccountStats ( account ASCII PRIMARY KEY, sumAmounts DOUBLE, sumAmountsSqr DOUBLE, numTransacs BIGINT ); CREATE TABLE AccountToCountryTransactions ( srcAccount ASCII, dstCountry ASCII, sumAmounts DOUBLE, sumAmountsSqr DOUBLE, numTransacs BIGINT, PRIMARY KEY (srcAccount, dstCountry) ); CREATE TABLE AccountToAccountTransactions ( srcAccount ASCII, dstAccount ASCII, sumAmounts DOUBLE, sumAmountsSqr DOUBLE, numTransacs BIGINT, PRIMARY KEY (srcAccount, dstAccount) ); ```
About
POC using Flink, Scala, Cassandra and Kafka to score transactions in real time
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published