Sandbox for JPA POCs
- Maven docker plugin - https://github.com/fabric8io/docker-maven-plugin
- Logback config - https://javatechonline.com/logback-xml-configuration-examples/
- Prepare tomcat for production - https://contabo.com/blog/apache-tomcat-complete-guide-to-setup-and-configuration/
- Regex Cheatsheet https://www.rexegg.com/regex-quickstart.php
- JPA Query Methods https://docs.spring.io/spring-data/jpa/reference/jpa/query-methods.html
- https://jwt.io/
- https://www.baeldung.com/kafka-consumer-offset
- Gradle spring boot multimodule https://tmsvr.com/gradle-multi-module-build/
- jdk-21, docker, docker-compose
- Run docker-compose -f ./docker/dbs-docker-compose.yaml up -d
- mvnw -f ./spring-jpa-demo-parent/pom.xml clean install -DskipTests
- To test a module like demo1-liquibase, first run src/main/scripts/mysqlusers.sql manually
- then run the tests from that module
 
- run manually 
docker exec -it postgres_demo bash
docker exec -it mysql_demo bash
mysql -h localhost -P 3306 --protocol=tcp -u root -p'qwerty' -D demo_db
psql -h localhost -p 5432 -U postgres -d demo_db
# -W - to input the password
psql -h 192.168.0.97 -p 5432 -U superadmin -d postgres -W
psql -h localhost -p 5432 -U postgres -d postgres -W
- For IntelliJ add Adapter for Eclipse Code Formatter
- Add eclipse formatter config
- Add import order file
# generate child module
# for webapp use: org.apache.maven.archetypes:maven-archetype-webapp
mvnw -f ./pom.xml archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DgroupId=com.ipostu.service.module -DartifactId=service-module
mvnw -f ./spring-jpa-demo-parent/pom.xml clean install -DskipTests
mvnw -f ./spring-jpa-demo-parent/demo16-springboot-docker-layers/pom.xml clean package -DskipTests
# dependency tree
projectName=demo-spring4-war-annotation && mvnw -f ./$projectName/pom.xml dependency:tree
# on VM
nix-shell -p jdk17
nix-shell -p postgresql_16
# host machine
projectName=demo-spring11-jar-boot \
  && mvnw -f ./$projectName/pom.xml clean package \
  && mkdir -p ~/VMSharedD12/_depl \
  && cp -r ./$projectName/target/$projectName ~/VMSharedD12/_depl/$projectName \
  && touch ~/VMSharedD12/_depl/signal1.txt
#############################################
# bash script file pool-artifact.sh
#!/bin/bash
export CATALINA_HOME=/home/test/apache-tomcat-10.1.35
SIGNAL_FILE=~/Desktop/sharedWithHost/_depl/signal1.txt
deploy_war_files() {
  echo "Running: catalina.sh stop"
  $CATALINA_HOME/bin/catalina.sh stop
  echo "Cleaning: $CATALINA_HOME/webapps/*"
  rm -rf $CATALINA_HOME/webapps/*
  echo "Copying: ~/Desktop/sharedWithHost/_depl to $CATALINA_HOME/webapps"
  cp -r ~/Desktop/sharedWithHost/_depl/* $CATALINA_HOME/webapps
  echo "Running: catalina.sh start"
  $CATALINA_HOME/bin/catalina.sh start
}
while true; do
  if [ -f $SIGNAL_FILE ]; then
    rm $SIGNAL_FILE
    deploy_war_files
    rm -rf ~/Desktop/sharedWithHost/_depl
  fi
  sleep 0.5
done
- Maven surefire plugin doesn't see non-public test classes or non-public test methods
- logback.xml - if ${catalina.home} is undefined, it will create catalina.home_IS_UNDEFINED in the path where the program is run
 
- CRUD
  - GET /posts
  - POST /posts
  - GET /posts/new
  - GET /posts/:id/edit
  - GET /posts/:id
  - PATCH /posts/:id
  - DELETE /posts/:id
 
- REST (representational state transfer)
  - GET /posts
  - GET /posts/:id
  - POST /posts
  - PATCH /posts/:id
  - DELETE /posts/:id
 
- The <form /> in HTML only supports GET and POST; to perform other request types (PATCH, DELETE, ...) Spring uses HiddenHttpMethodFilter, as sketched below
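A minimal sketch (the config class name is mine, not from the project) of exposing the filter; a form then carries a hidden input named _method with the desired HTTP verb:
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.filter.HiddenHttpMethodFilter;

// A <form method="post"> containing a hidden input "_method" with value PATCH or DELETE
// is dispatched by Spring MVC as a PATCH/DELETE request.
@Configuration
public class HiddenMethodConfig {

    @Bean
    public HiddenHttpMethodFilter hiddenHttpMethodFilter() {
        return new HiddenHttpMethodFilter();
    }
}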
- Configure postgres on a Debian 12 VM with nix-shell
  - nix-shell -p postgresql_16
  - initdb -D ./pgdata
  - sudo chown test /run/postgresql (generally: sudo chown <user> /run/postgresql)
  - pg_ctl -D ./pgdata -l logfile start
  - psql -d postgres - connect via the postgres user
  - CREATE ROLE superadmin WITH SUPERUSER CREATEDB CREATEROLE LOGIN PASSWORD 'qwerty';
  - psql -U superadmin -d postgres -W
  - pg_hba.conf:
    # IPv4 local connections
    host all all 0.0.0.0/0 scram-sha-256
    # IPv6 local connections
    host all all ::/0 scram-sha-256
  - pgdata/postgresql.conf -> password_encryption = scram-sha-256
  - Enable remote TCP connections: edit pgdata/postgresql.conf, change listen_addresses='localhost' to listen_addresses='*'
  - pg_ctl -D ./pgdata -l logfile restart
  - restart the postgres process: sudo pkill -e postgre
  - sudo chown -R test: /run/postgresql
  - pg_ctl -D ./pgdata -l logfile restart
 
 
- Virtualbox VM - Networking: NAT -> Bridged Adapter - exposes the VM to the network
- Run ip addr on the guest OS and use that IP on the host OS
 
- DriverManagerDataSource - Spring's implementation of DataSource which creates a new connection every time getConnection is called
- jdbcTemplate.batchUpdate is far more efficient than inserting/updating records in a loop (see the sketch below)
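A minimal sketch, assuming a person(name, age) table and an injected JdbcTemplate (names are illustrative):
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class PersonBatchWriter {

    private final JdbcTemplate jdbcTemplate;

    public PersonBatchWriter(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // Sends the whole batch in far fewer round trips than inserting row by row in a loop.
    public void insertAll(List<Object[]> nameAndAgeRows) {
        jdbcTemplate.batchUpdate("INSERT INTO person (name, age) VALUES (?, ?)", nameAndAgeRows);
    }
}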
- In Postgres (and any other RDBMS) SELECT * FROM ... without an explicit order usually returns records in insertion order, BUT an UPDATE query breaks this order, so the only guarantee of order is an explicit ORDER BY
- Custom jakarta.validation annotations with custom logic can be defined; the implementation is picked up automatically, client code only uses the annotation (see the sketch below)
- To use Spring beans in validation, a different approach is needed: a custom bean which implements org.springframework.validation.Validator and is invoked manually (no annotations)
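A hedged sketch of the jakarta.validation approach (annotation and validator names are made up for illustration):
import jakarta.validation.Constraint;
import jakarta.validation.ConstraintValidator;
import jakarta.validation.ConstraintValidatorContext;
import jakarta.validation.Payload;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
@Constraint(validatedBy = UpcValidator.class)
@interface ValidUpc {
    String message() default "invalid UPC";
    Class<?>[] groups() default {};
    Class<? extends Payload>[] payload() default {};
}

// The implementation is resolved automatically; callers only annotate a field with @ValidUpc.
class UpcValidator implements ConstraintValidator<ValidUpc, String> {
    @Override
    public boolean isValid(String value, ConstraintValidatorContext context) {
        return value != null && value.matches("\\d{13}");
    }
}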
- The Thymeleaf template engine doesn't use UTF-8 by default; call setCharacterEncoding("UTF-8") on the ThymeleafViewResolver
- To use Hibernate, add the JDBC driver, hibernate-core and a hibernate.properties file
- Hibernate usage: create a Configuration instance (it internally reads hibernate.properties), create a SessionFactory (expensive operation), open a Session, session.beginTransaction(); do the work; session.getTransaction().commit(); - see the sketch below
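A minimal sketch of that flow (the Person entity is an assumption):
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateBootstrap {
    public static void main(String[] args) {
        // Configuration reads hibernate.properties from the classpath
        SessionFactory sessionFactory = new Configuration()
                .addAnnotatedClass(Person.class) // assumed entity
                .buildSessionFactory();          // expensive, create once per application

        try (Session session = sessionFactory.openSession()) {
            session.beginTransaction();
            session.persist(new Person("John"));
            session.getTransaction().commit();
        }
        sessionFactory.close();
    }
}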
- Hibernate caches items inside the transaction
- For OneToMany it is required to set the relation on both sides to avoid cache inconsistencies, e.g. Person -> List<Item>: when a new item is created, person.getItems().add(item); item.setPerson(person);
- Hibernate entity life-cycle states (see the sketch below):
  - Transient - entity object created by client code, becomes Persistent after session.save(entityObject)
  - Persistent (Managed) - the result of session.get is Persistent; if client code calls setters, Hibernate notices it and executes the related SQL on session.getTransaction().commit()
  - Detached - after session.detach or when the session is closed; can be re-attached to the persistence context through session.merge(...)
  - Removed - entity object after session.remove()
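A compact sketch of those transitions, assuming Hibernate 6 / jakarta APIs and the same assumed Person entity:
import org.hibernate.Session;

public class LifecycleDemo {

    // Transient -> Persistent -> Detached -> Persistent -> Removed
    static void demo(Session session) {
        session.beginTransaction();
        Person person = new Person("John");     // Transient: unknown to Hibernate
        session.persist(person);                // Persistent: tracked by the persistence context
        session.detach(person);                 // Detached: changes are no longer tracked
        Person managed = session.merge(person); // Persistent again (merge returns the managed copy)
        session.remove(managed);                // Removed: the row is deleted on commit
        session.getTransaction().commit();
    }
}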
- The owning side is the entity which does NOT have mappedBy="..." on its @OneToOne/@OneToMany/@ManyToMany, in other words it is the opposite of the mappedBy side
- fetch = FetchType.LAZY|EAGER
  - OneToMany and ManyToMany are Lazy by default
  - OneToOne and ManyToOne are Eager by default
 
- Hibernate.initialize(person.getItems()) is used to load lazy collections; after this call person.getItems() can be used on the detached person
- Persistence context = Level 1 cache
- hibernate-core (the main Hibernate dependency) provides and implements javax.persistence.* (renamed to jakarta.persistence.*), which is the JPA specification
- N+1 problem
// 1 query
List<Person> people = session.createQuery("SELECT p FROM Person p", Person.class).getResultList();
// N queries
for (Person person : people) {
    LOG.info("Person {} has: {}", person.getName(), person.getItems());
}
// Result:
// 1 time:  SELECT * FROM person;
// N times: SELECT * FROM items WHERE person_id=?
- FetchType.EAGER does not help!
- Solution 1 - HQL with LEFT JOIN FETCH
List<Person> people = session.createQuery("SELECT p FROM Person p LEFT JOIN FETCH p.items", Person.class).getResultList();
 
- Difference between session.get and session.load
  - session.get does a SELECT ... on call
  - session.load returns a proxy object with only the id field populated; if client code calls any getter on a column field, the proxy executes the SELECT ...
  - usage of session.load:
Item item = new Item("...");
Person personProxy = session.load(Person.class, id);
item.setOwner(personProxy);
session.save(item);
- Spring Data JPA exposes an API for paging and sorting, e.g.:
peopleRepository.findAll(PageRequest.of(0, 20, Sort.by(List.of(
        new Sort.Order(Sort.Direction.DESC, "name"),
        new Sort.Order(Sort.Direction.DESC, "age")))));
- @jakarta.persistence.Transient tells Hibernate to ignore the field
- Build and run the spring-boot app:
  - projectName=demo-spring11-jar-boot && mvnw -f ./$projectName/pom.xml clean package && java -jar ./demo-spring11-jar-boot/target/demo-spring11-jar-boot-1.0-SNAPSHOT.jar
  - projectName=demo-spring11-jar-boot && mvnw -f ./$projectName/pom.xml spring-boot:run
 
- Spring Security - general
  - client code can implement AuthenticationProvider
    - it takes Authentication + Credentials as input
    - it returns Authentication + Principal as output
  - Cookie/Sessions == Map<String, Object>
  - After a successful login, the Principal is stored in the session and linked to a cookie which is returned to the user
  - Spring Security provides a filter which looks up the principal object by cookie and keeps it in a thread local for the current request
- Spring Boot Security Starter
  - By default all mappings are secured; the default username is user and the password is shown in the CLI: Using generated security password: d7d4e2e1-7caf-467c-a54b-4fe48ffc30c9
- Spring Security: @Component public class AuthProviderImpl implements AuthenticationProvider - an AuthenticationProvider isn't required, Spring provides its own; client code should only define a new one if additional logic is needed (sketch below)
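A hedged sketch of such a custom AuthenticationProvider (the wiring and password check are simplified assumptions, not the project's code):
import org.springframework.security.authentication.AuthenticationProvider;
import org.springframework.security.authentication.BadCredentialsException;
import org.springframework.security.authentication.UsernamePasswordAuthenticationToken;
import org.springframework.security.core.Authentication;
import org.springframework.security.core.AuthenticationException;
import org.springframework.security.core.userdetails.UserDetails;
import org.springframework.security.core.userdetails.UserDetailsService;
import org.springframework.stereotype.Component;

@Component
public class AuthProviderImpl implements AuthenticationProvider {

    private final UserDetailsService userDetailsService; // assumed to be defined elsewhere

    public AuthProviderImpl(UserDetailsService userDetailsService) {
        this.userDetailsService = userDetailsService;
    }

    // Input: Authentication + credentials; output: Authentication + Principal.
    @Override
    public Authentication authenticate(Authentication authentication) throws AuthenticationException {
        String username = authentication.getName();
        String rawPassword = authentication.getCredentials().toString();
        UserDetails user = userDetailsService.loadUserByUsername(username);
        if (!rawPassword.equals(user.getPassword())) { // real code would use a PasswordEncoder
            throw new BadCredentialsException("Invalid credentials");
        }
        return new UsernamePasswordAuthenticationToken(user, rawPassword, user.getAuthorities());
    }

    @Override
    public boolean supports(Class<?> authentication) {
        return UsernamePasswordAuthenticationToken.class.isAssignableFrom(authentication);
    }
}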
 
- CSRF - Cross Site Request Forgery
  - Protection for endpoints that change data, such as PUT/POST/DELETE
  - The backend generates a one-time token which is injected into the HTML and hidden from the user
  - On request this token is also added to the request body and the backend verifies it
 
- Authorization in Spring Security
  - Authority - an allowed action, e.g. BAN_USER, WITHDRAW_MONEY, SEND_MONEY
  - Role - e.g. ADMIN, USER
- .hasRole("ADMIN") is exactly the same as .hasAuthority("ROLE_ADMIN"); sketch below
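A sketch assuming the Spring Security 6 lambda DSL (paths are illustrative):
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http.authorizeHttpRequests(auth -> auth
                // equivalent to .hasAuthority("ROLE_ADMIN")
                .requestMatchers("/admin/**").hasRole("ADMIN")
                .anyRequest().authenticated());
        return http.build();
    }
}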
- @Controller can return a JSON response only if the method is annotated with @ResponseBody; by default it tries to resolve a view
- @RestController doesn't require @ResponseBody, it returns JSON by default
- JWT - JSON Web Token
  - Header, Payload, Signature
 
 
- Bean pre|post processor
- AOP xml|java
- Sql stored procedure with transaction & CallableStatement
- Spring web + JaxRS (Jersey)
- grpc
- JPA: column based on Enum ordinal|string|custom value (Converter usage)
- Embedded tomcat
- Accessible through: http://localhost:8080/springapp/app
- Context initialization via implementing WebApplicationInitializer
- Liquibase + Spring Boot
- Liquibase is triggered on spring boot startup
- Flyway (default config) in combination with spring boot
- Different kinds of keys (autoincrement, UUID string, UUID binary)
- natural vs. surrogate key
- composite keys
- JDBC usage + flyway config
- JDBC Template
- JPA EntityManagerFactory/EntityManager and TypedQuery usage
- JPA EntityManagerFactory/EntityManager and TypedQuery usage + @NamedQuery (query on entity class annotation)
- JPA
- Pagination (Imperative JPA (EntityManager & TypedQuery), JDBC Template, JpaRepository) - see the sketch below
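A minimal imperative-JPA sketch (the Person entity and its id field are assumptions); JpaRepository pagination is shown further down with PageRequest:
import jakarta.persistence.EntityManager;
import jakarta.persistence.TypedQuery;
import java.util.List;

public class PersonPage {

    // Page numbers are zero-based; page * pageSize rows are skipped.
    public static List<Person> findPage(EntityManager em, int page, int pageSize) {
        TypedQuery<Person> query = em.createQuery("SELECT p FROM Person p ORDER BY p.id", Person.class);
        query.setFirstResult(page * pageSize);
        query.setMaxResults(pageSize);
        return query.getResultList();
    }
}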
- 
Hibernate mappings OneToMany, ManyToMany, @Embedded/@Embeddable, @Version for optimistic lock 
- Before the main run, run testDataLoader
- N+1 - @Fetch(FetchMode.SUBSELECT) loads the whole collection with one additional subselect query instead of N queries
@OneToMany(mappedBy = "orderHeader", cascade = {CascadeType.PERSIST, CascadeType.REMOVE}, fetch = FetchType.EAGER)
@Fetch(FetchMode.SUBSELECT)
private Set<OrderLine> orderLines;
- Use case: on write (insert/update) data should be encrypted, on read (select) decrypted
  - Hibernate Interceptor - write works, read has a bug for EmptyInterceptor.onLoad, the state arg is always null. Details: https://discourse.hibernate.org/t/hibernate-v6-preloadevent-getstate-always-is-null-issue/7342 - one insert and one update as one transaction
  - Hibernate Listener - works, the test also works, precondition is to turn off the InterceptorRegistration config bean - one insert and one update as one transaction
  - JPA Callback - works, the test also works, precondition is to turn off the InterceptorRegistration and ListenerRegistration config beans - one insert
  - JPA Converter - works, the test also works, best option! - one insert (sketch below)
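A hedged sketch of the JPA Converter option (the Base64 codec here is only a placeholder for real encryption):
import jakarta.persistence.AttributeConverter;
import jakarta.persistence.Converter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Applied with @Convert(converter = SensitiveStringConverter.class) on the entity field.
// Encrypts on write (convertToDatabaseColumn) and decrypts on read (convertToEntityAttribute).
@Converter
public class SensitiveStringConverter implements AttributeConverter<String, String> {

    @Override
    public String convertToDatabaseColumn(String attribute) {
        if (attribute == null) {
            return null;
        }
        // placeholder "encryption"; a real implementation would use AES or similar
        return Base64.getEncoder().encodeToString(attribute.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public String convertToEntityAttribute(String dbData) {
        if (dbData == null) {
            return null;
        }
        return new String(Base64.getDecoder().decode(dbData), StandardCharsets.UTF_8);
    }
}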
GET http://localhost:8080/api/v1/beer
GET http://localhost:8080/api/v1/beer/search/findAllByBeerStyle?beerStyle=ALE
GET http://localhost:8080/api/v1/beer/a5d1917a-de91-4172-aac8-a3399883d2b2
# Create
POST http://localhost:8080/api/v1/beer
{
  "beerName": "Mango Bobs - 990",
  "beerStyle": "ALE",
  "upc": "0631234200036",
  "quantityOnHand": 4344,
  "price": 10.26
}
GET http://localhost:8080/api/v1/beer/578b5f18-0a64-4a6c-a056-8d0d22e9d8ab
# Update
PUT http://localhost:8080/api/v1/beer/578b5f18-0a64-4a6c-a056-8d0d22e9d8ab
{
  "beerName": "Mango Bobs - AAA",
  "beerStyle": "ALE",
  "upc": "0631234200036",
  "quantityOnHand": 4344,
  "price": 10.26
}
# Delete
DELETE http://localhost:8080/api/v1/beer/578b5f18-0a64-4a6c-a056-8d0d22e9d8ab
Simple image
FROM openjdk:21
ENV JAVA_OPTS " -Xms512m -Xmx512m -Djava.security.egd=file:/dev/./urandom"
WORKDIR application
COPY ./spring-jpa-demo-parent/demo16-springboot-docker-layers/target/demo16-springboot-docker-layers-1.0-SNAPSHOT.jar ./
ENTRYPOINT ["java", "-jar", "demo16-springboot-docker-layers-1.0-SNAPSHOT.jar"]
Layered spring boot image
FROM openjdk:21 AS builder
WORKDIR application
ADD ./spring-jpa-demo-parent/demo16-springboot-docker-layers/target/demo16-springboot-docker-layers-1.0-SNAPSHOT.jar ./
RUN java -Djarmode=layertools -jar demo16-springboot-docker-layers-1.0-SNAPSHOT.jar extract
FROM openjdk:21
WORKDIR application
COPY --from=builder application/dependencies/ ./
COPY --from=builder application/spring-boot-loader/ ./
COPY --from=builder application/snapshot-dependencies/ ./
COPY --from=builder application/application/ ./
ENTRYPOINT ["java", "-Djava.security.egd=file:/dev/./urandom", "org.springframework.boot.loader.JarLauncher"]
Usage
# build regular image with fat jar
docker build  -f ./spring-jpa-demo-parent/demo16-springboot-docker-layers/src/main/dockerBase/Dockerfile -t demo16 .
# run container
docker run demo16
# build layered spring boot application
docker build  -f ./spring-jpa-demo-parent/demo16-springboot-docker-layers/src/main/docker/Dockerfile -t demo16 .
# run container
docker run demo16
JDBC Template with transaction handling
public interface AccountService {
    //    @Transactional
    void updateAccountFunds();
    //    @Transactional(propagation = Propagation.NESTED)
    void addFundsToAccount(int amount);
}
@SpringBootApplication
@ImportResource("classpath:application-context.xml")
public class SpringTransactionDemo {
    public static void main(String[] args) {
        SpringApplication.run(SpringTransactionDemo.class, args);
    }
}
public interface AccountService {
    @Transactional
    void updateAccountFunds();

    @Transactional(propagation = Propagation.NESTED)
    void addFundsToAccount(int amount);
}
@SpringBootApplication
//@ImportResource("classpath:application-context.xml")
public class SpringTransactionDemo {
    public static void main(String[] args) {
        SpringApplication.run(SpringTransactionDemo.class, args);
    }
}
Event streaming platform - main capabilities:
- Publish(Produce)/Subscribe(Consume) to streams of events, including continuous import/export of data from another system
- To store streams of events
- To process streams of events as they occur or retrospectively
- Kafka Cluster - a set of brokers which consume/produce events; if one broker is down, the job is taken over by another one
- Kafka Topic - a named FIFO structure to which events are assigned
- Partition - a storage unit which is part of a topic
  - Each partition saves incoming messages in the order in which they arrive
  - Partitions are the way to achieve parallelism
  - Messages are stored in order within each partition
- Order across partitions is not guaranteed
- Each message has an offset starting from zero
 
- Offset - a per-partition number which references an event; when an event is processed the offset is incremented
- Each consumer in the group maintains a specific offset
 
 
- Retention Period - messages are stored on the hard drive for a certain amount of time
- Default is 7 days (configurable)
 
- Message is immutable
- Define partitions when creating a topic
- Can add partition later
- Can't delete partition later (delete partition = delete data)
- Producer sends messages to Kafka
  - Sends to a specific topic + the message content
  - Kafka automatically selects the partition (the partition can be selected programmatically if needed)
  - Kafka Producer Key - a value whose hash selects the partition; if a new partition is added, the same key can map to another partition
- Consumers are guaranteed to read data in order within each partition
- Order is by increasing offset (low offset to high), it cannot be reversed
- Each partition has at most one consumer per consumer group
- If there are 2 partitions and 3 consumers, 1 consumer will be idle
 
- One consumer can read from more than one partition
- A consumer group is a set of consumers sharing the consumption of a topic; different consumer groups consume the same events from the same topic/partition and can process them differently (each group maintains its own offset)
- Consumer groups (for the same topic/partition) process events in parallel
- Consumer Offset
- Kafka Saves this value
- Unique for each partition
- Checkpoint that indicates last read
 
- The consumer chooses when to save (commit) the offset
- At most once
- Zero (unprocessed) or once (processed)
- Possible message loss (on processing error)
 
- At least once
- Once or more
- If something goes wrong, the message can be re-processed
- Requires an idempotent consumer
 
- Exactly once
- Once only
- Hard to implement
- Can be achieved through:
- Idempotent producer
- Atomic transaction
- spring.kafka.producer.properties.enable.idempotence: true
- spring.kafka.streams.properties.processing.guarantee: exactly_once_v2
 
 
- Kafka designed for high availability
- Kafka cluster: group of kafka brokers (servers)
- Zookeeper manages those brokers
- Add, Remove(if broker dies), Sync data
 
- Spring for Apache Kafka (starter)
- spring.kafka.consumer.auto-offset-reset=latest (default) - when there is no committed offset, start from the end of the topic (old messages are skipped)
- spring.kafka.consumer.auto-offset-reset=earliest - when there is no committed offset, start from the beginning of the topic
- spring.kafka.consumer.auto-offset-reset=none - throws an exception if there is no committed offset or the offset is out of range (ensures the consumer does not consume any message when the offset is invalid)
- sudo nix-channel --update
- sudo nixos-rebuild switch --upgrade
- docker-compose -f ./spring-kafka-scripts/docker-compose-core.yml -p core up -d --force-recreate
- sudo chmod -R g+rwX ./spring-kafka-scripts/data/
- docker-compose -p core down
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-hello --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --list
- kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic t-hello
- kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic t-hello
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-hello --from-beginning
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-fixedrate --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-multi-partitions --partitions 3
- kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic t-multi-partitions
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-multi-partitions --offset earliest --partition 0
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-multi-partitions --offset earliest --partition 1
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-multi-partitions --offset earliest --partition 2
- kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic t-multi-partitions --partitions 4
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-employee --partitions 1
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-employee --offset earliest --partition 0
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-commodity --partitions 1
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-commodity --offset earliest --partition 0
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-counter --partitions 1
- kafka-console-consumer.sh --from-beginning --bootstrap-server localhost:9092 --property print.key=false --property print.value=false --topic t-counter --timeout-ms 5000 | tail -n 10 | grep "Processed a total of" # show total number of messages in the topic
- kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group counter-group-fast --describe
- kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group counter-group-slow --describe
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-alpha --partitions 4 --replication-factor 2
- kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic t-beta --replication-factor 3
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-alpha --partitions 3 --config min.insync.replicas=2
- kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic t-beta --add-config min.insync.replicas=3
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-rebalance-alpha --partitions 2
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-rebalance-beta --partitions 2
- kafka-topics.sh --bootstrap-server localhost:9092 --alter --topic t-rebalance-alpha --partitions 5
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-location --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-purchase-request --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-payment-request --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-payment-request2 --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-food-order --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-simple-number --partitions 1
- kafka-topics.sh --bootstrap-server localhost:9092 --list | grep -E 'food|number'
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-image --partitions 2
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-invoice --partitions 2
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-invoice-dead --partitions 2
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-invoice --offset earliest --partition 0
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-invoice --offset earliest --partition 1
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-image-2 --partitions 2
- kafka-topics.sh --bootstrap-server localhost:9092 --create --topic t-general-ledger --partitions 1
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-general-ledger --from-beginning
- kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-promotion
- kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-promotion-uppercase
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-commodity-promotion --from-beginning --timeout-ms 5000
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-commodity-promotion-uppercase --from-beginning --property print.key=true --timeout-ms 5000
- kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-pattern-six-plastic
- kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-pattern-six-notplastic
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-commodity-pattern-six-plastic --from-beginning --property print.key=true --timeout-ms 5000
- kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic t-commodity-feedback
- kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic t-commodity-feedback --from-beginning --property print.key=true --timeout-ms 5000
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-feedback-six-bad
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-feedback-six-bad-count
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-feedback-six-bad-count-word
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-feedback-six-good
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-feedback-six-good-count
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-feedback-six-good-count-word
# ---
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer --timeout-ms 5000 --topic t-commodity-feedback-six-bad-count
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer --timeout-ms 5000 --topic t-commodity-feedback-six-bad-count-word
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-customer-preference-wishlist
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-customer-preference-shopping-cart
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-customer-preference-all
#
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.StringDeserializer --timeout-ms 5000 --topic t-commodity-customer-preference-wishlist
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.StringDeserializer --timeout-ms 5000 --topic t-commodity-customer-preference-shopping-cart
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.StringDeserializer --timeout-ms 5000 --topic t-commodity-customer-preference-all
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-flashsale-vote
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.StringDeserializer --timeout-ms 5000 --topic t-commodity-flashsale-vote
// --
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-flashsale-vote-user-item
kafka-topics.sh --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic t-commodity-flashsale-vote-two-result
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.StringDeserializer --timeout-ms 5000 --topic t-commodity-flashsale-vote-user-item
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning  --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer --timeout-ms 5000 --topic t-commodity-flashsale-vote-two-result
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --property print.key=true --timeout-ms 5000 --topic t-commodity-feedback
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --property print.key=true --timeout-ms 5000 --topic t-commodity-feedback-rating-one
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --property print.key=true --timeout-ms 5000 --topic t-commodity-feedback-rating-two
kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --property print.key=true --property print.timestamp=true --timeout-ms 5000 --topic t-commodity-feedback-rating-two
Both are excellent solutions for messaging systems
- Message Retention
  - Kafka stores all messages on disk even if they were processed; default TTL is 7 days
  - RabbitMQ stores messages on disk or in memory and as soon as a message is consumed, it is gone
  - If the same message needs to be re-processed, the publisher must re-publish it
- Message Routing
  - Kafka - no routing mechanism; the publisher specifies the topic and optionally the partition
  - RabbitMQ - routing mechanism using exchanges; the publisher attaches a routing key and RabbitMQ decides into which queue the message goes
- Multiple Consumers
  - Kafka: topic - partition - one consumer per partition (per group)
  - RabbitMQ: multiple consumers per queue, order not guaranteed
- Push/Pull Model
  - Kafka: consumers pull from the topic
  - RabbitMQ: pushes messages to consumers
- Which one to choose?
  - Kafka for event sourcing, big data processing, sensitive data handling
  - RabbitMQ for event-driven architecture
 
- Commodity Trading Company with multiple branches
  - Branches submit purchase orders to the head office
  - After processing:
    - Pattern analysis
    - Reward
    - Big Data Storage
  - Speed is the key
  - Each branch office submits each order once
  - The head office processes orders using Kafka
Hexagonal Architecture, also known as Ports and Adapters, is a software design pattern that promotes separation of concerns by structuring an application around a core business logic (the hexagon) and defining ports (interfaces) that connect to the outside world via adapters.
- Inbound Adapter - REST Controllers
- Outbound Adapter - DB, External Endpoints
- Benefit: decouples business logic from data access (see the sketch below)
- Benefit: testability
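A tiny illustrative sketch (names are mine, not from the project): the core depends only on the port interface, the adapter implements it at the edge.
// Port: defined by the core/business module, no framework or JPA types here.
public interface OrderPort {
    void save(String orderId);
}

// Outbound adapter: lives at the edge, e.g. backed by a JpaRepository or JdbcTemplate.
class JpaOrderAdapter implements OrderPort {
    @Override
    public void save(String orderId) {
        // delegate to the persistence infrastructure
    }
}

// Core service: testable with a fake OrderPort, decoupled from data access.
class OrderService {
    private final OrderPort orderPort;

    OrderService(OrderPort orderPort) {
        this.orderPort = orderPort;
    }

    void placeOrder(String orderId) {
        orderPort.save(orderId);
    }
}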
Separates writes from reads
- Command - write
- Query - read
- Set enable.idempotence=true
- With this configuration:
  - the producer sends a message
  - Kafka persists and commits it
  - Kafka returns an acknowledgement
  - due to a networking issue, the acknowledgement doesn't reach the producer
  - the producer retries the transmission
  - Kafka does not persist the message a second time, but it still returns an acknowledgement
 
- Header: key value pairs
- Body/Payload: usually JSON
- A method of communication among microservices: M1 sends event E which is consumed by M2; M2 sends E(confirm), a confirmation of success/failure, which is consumed by M1
- Ideally the number of consumers equals the number of partitions; with 3 partitions and 4 consumers, one consumer is idle, but a new partition can be added and Kafka will automatically assign it to a consumer
- 2 or more @KafkaListeners on the same topic but with different groupId process the events independently of each other (see the sketch below)
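A minimal sketch (topic and group names are illustrative): both listeners receive every message of the topic because they belong to different groups.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class HelloConsumers {

    // Each group keeps its own offsets, so both methods see every record of t-hello.
    @KafkaListener(topics = "t-hello", groupId = "group-a")
    public void consumeForGroupA(String message) {
        System.out.println("group-a got: " + message);
    }

    @KafkaListener(topics = "t-hello", groupId = "group-b")
    public void consumeForGroupB(String message) {
        System.out.println("group-b got: " + message);
    }
}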
- Delete topic = delete all partitions
- Can not delete partition
- Can cause data loss
- Wrong key distribution
 
- The only way to decrease partitions is to delete & create a new topic
- Consumers are expected to read sequentially
- Kafka's default retention period is 7 days
- An offset becomes invalid once the data it points to no longer exists (after 7 days)
 
- Set the offset retention period using the broker config offsets.retention.minutes
- Consumer lag: the difference between the latest message and the last message the consumer group has processed
- Monitor it to identify potential bottlenecks; to improve the situation:
  - Increase the number of partitions and consumers
  - Improve consumer performance
 
- Lag = IndexOfTheLatestEvent - Offset
- Reprocess previously consumed messages
- Use cases:
- Correcting errors
- Data recovery
- Historical analysis
- Testing
 
spring.kafka.consumer.enable-auto-commit
- true - auto commit according to the broker config
- false - commit manually (via the listener container)
Spring Boot 3 uses false by default. It commits only when the offset reaches the last message in the topic, i.e. lag is 0
spring.kafka.listener.ack-mode
- record - commit after each record is processed
- batch - process up to N = spring.kafka.consumer.max-poll-records records, then commit
- time
- count
- count_time
- manual
- manual_immediate
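A sketch of manual committing, assuming spring.kafka.listener.ack-mode=manual and the t-hello topic from the scripts above:
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Service;

@Service
public class ManualAckConsumer {

    // With ack-mode=manual the offset is committed only when acknowledge() is called.
    @KafkaListener(topics = "t-hello", groupId = "manual-ack-group")
    public void consume(String message, Acknowledgment ack) {
        process(message);
        ack.acknowledge();
    }

    private void process(String message) {
        System.out.println("processed: " + message);
    }
}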
On startup spring boot shows kafka configuration
2025-03-08 14:48:23 [main] INFO  o.a.k.c.consumer.ConsumerConfig - ConsumerConfig values: 
 ...
 metrics.num.samples = 2
 metrics.recording.level = INFO
 metrics.sample.window.ms = 30000
To change metrics.num.samples, set spring.kafka.properties.metrics.num.samples in application.properties or configure it via the KafkaConfig configuration bean
- Kafka for data communication backbone
- Single kafka broker for dev.
- At least 3 kafka brokers for prod.
- Kafka cluster: multiple brokers working as a unit
- Replication copies data from one broker to another
- Benefit: data redundancy
- The replication factor (1, 2, ...) defines how many copies of the same data are kept
- If a broker is down, another broker still has a copy of the data
- Partition leader - the partition replica the producer writes to and consumers read from (by default)
- ISR - In-Sync Replica
Replication factor can be set globally or when the topic is created
Consumer consumes events from the nearest replica (geographically) to reduce latency and networking costs
- Kafka 2.4 or newer
- Broker settings
  - broker.rack=XYZ
  - replica.selector.class
- Consumer setting
  - client.rack=XYZ
 
- The producer can choose what acknowledgement (ack) of data writes to receive
  - acks=0 - the producer doesn't wait for acknowledgement (possible data loss); the send is automatically considered successful
  - acks=1 (default) - the producer waits for the leader's acknowledgement (limited data loss); the only risk is an error during replication
  - acks=-1 / all - the producer waits for leader + replica acknowledgement (no data loss)
 
- if min.insync.replicas=2 and replication factor = 3
  - if 2 replicas are down
  - and the producer tries to write data
  - it fails with NotEnoughReplicasException
- Rebalancing - redistributing partitions across consumers
- Triggers:
- Consumer joining or leaving particular consumer group
- Partition count on topic changes
 
- Impact
- Temporary unavailability
- Resource overhead
 
2 consumers, 3 partitions. Add a new consumer. All relationships between consumers and partitions are recreated. For a short period of time events aren't consumed at all
- group-alfa: C1, C2, C3
- t-alfa: p0, p1
- t-beta: p0, p1
- C1 - t-alfa p0, t-beta p0
- C2 - t-alfa p1, t-beta p1
- group-alfa: C1, C2, C3
- t-alfa: p0, p1
- t-beta: p0, p1
- C1 - t-alfa p0
- C2 - t-alfa p1
- C3 - t-beta p0
- C1 - t-beta p1
If a consumer is removed (all assignments are revoked, so nothing is consumed during this period), the relationships between consumers and partitions are re-created
- C1 - t-alfa p0
- C2 - t-alfa p1
- C3 - t-beta p0
- C1 - t-beta p1
C2 is removed
- C1 - t-alfa p0
- C1 - t-beta p0
- C3 - t-alfa p1
- C3 - t-beta p1
Similar to round robin; the only difference is that it minimizes partition movements when a consumer is removed
- group-alfa: C1, C2, C3
- t-alfa: p0, p1
- t-beta: p0, p1
- C1 - t-alfa p0
- C2 - t-alfa p1
- C3 - t-beta p0
- C1 - t-beta p1
C2 is removed
- C1 - t-alfa p0
- C3 - t-alfa p1
- C3 - t-beta p0
- C1 - t-beta p1
Cooperative Sticky
C1->P1, C2->P2,P3. Add C3. Rebalance. C1->P1, C2->P2, C3 -> P3
Consumers keep consuming during rebalance
Kafka clients (producers and consumers) will cache metadata about topics and brokers before they check for fresh metadata
- Messages matching the criteria: processed
- Messages not matching the criteria: ignored, but still present in the topic
- A filter can be set per listener (container factory)
- RecordFilterStrategy - if filter(...) returns false the message is consumed, otherwise it is discarded (see the sketch below)
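A sketch of a filtering container factory (bean names and the filter condition are assumptions); note the inverted semantics: filter(...) returning true discards the record.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class FilteringListenerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> filteredContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // true  -> record is discarded (never reaches the @KafkaListener)
        // false -> record is delivered to the listener
        factory.setRecordFilterStrategy(record -> !matchesCriteria(record));
        return factory;
    }

    private boolean matchesCriteria(ConsumerRecord<String, String> record) {
        return record.value() != null && record.value().contains("plastic"); // assumed criteria
    }
}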
A mechanism which guarantees that even if a message is consumed 2 or more times, it is processed only once
- Exception goes first to the explicit handler (if present)
- @KafkaListener(topics = "t-food-order", errorHandler = "myFoodOrderErrorHandler")
 
- When there is no explicit handler, or the exception is re-thrown, it is handled by the global handler
- Topic: t-image, 2 partitions
- Publish message represents image
- Consumer:
- Simulate API call to convert image
- Simulate failed API call
- Retry after 10 seconds
- Retry 3 times
 
- M1 -> M2 -> M3
If M2 fails, it will be retried up to N times. M3 will only be handled after M2 has either succeeded or reached the retry limit.
Example: image consumer/producer
- I1, I2 (problematic), I3 -> partition 0
- I4, I5, I6 -> partition 1
I2 is retried 3 times, and after that I3 is processed
- M1 -> M2 -> M3
If M2 fails, M3 will be handled immediately.
See: Invoice Demo consumer/producer + container factory for consumer
- Combination of retry mechanism + producer which moves problematic event after N retries into a specific topic
The default implementation, DeadLetterPublishingRecoverer, moves the failed message from t-original -> t-original.DLT
- Publish to t-invoice
- If amount is less than 1 throw an exception
- Retry 5 times
- After 5 failed retry attempts, publish to t-invoice-dead
- Another consumer will consume from t-invoice-dead
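A hedged sketch of that retry + dead-letter setup (spring-kafka 2.8+/3.x API; the backoff interval and bean wiring are assumptions), routing to the demo's t-invoice-dead topic instead of the default <topic>.DLT:
import org.apache.kafka.common.TopicPartition;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class InvoiceErrorHandlingConfig {

    @Bean
    public DefaultErrorHandler invoiceErrorHandler(KafkaTemplate<Object, Object> kafkaTemplate) {
        // Route failed records to t-invoice-dead, keeping the original partition.
        DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(
                kafkaTemplate,
                (record, exception) -> new TopicPartition("t-invoice-dead", record.partition()));
        // Retry with a fixed backoff (10s assumed), 5 attempts, then hand the record to the recoverer.
        // The handler is plugged into the listener container factory via setCommonErrorHandler(...).
        return new DefaultErrorHandler(recoverer, new FixedBackOff(10_000L, 5L));
    }
}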
Payment App (Android/iOS,web) -> Payment App API -> publish -> Kafka -> consume -> General Ledger App Server
- 24/7 Payment App (Android/iOS,web)
- 24/7 Payment App API
- 12/7 General Ledger App Server
Deserialization errors occur if the fully qualified class name differs, e.g. the producer uses com.abc1.OrderMessage and the consumer has a class with exactly the same signature but a different package name, com.abc2.OrderMessage
Solution: shared module with classes for messages
- Configurable in producer or broker
- Available compression types: none, gzip, lz4, snappy, zstd
- Disadvantages: CPU Cycles on consumer/producer
- spring.kafka.producer.compression-type
- linger.ms - how long to wait before sending a batch; default 0 means send immediately
- batch.size - maximum batch size (bytes) before sending starts (if linger.ms has not elapsed)
  - maximum byte size for each batch
  - default: 16384 (16KB); 64KB could be better
- Increasing the batch size improves efficiency
- A message bigger than the batch size won't be batched
- Setting the batch size to 0 turns off batching
- spring.kafka.producer.batch-size
spring:
  kafka:
    compression-type: snappy
    batch-size: 32768
    properties:
      request:
        timeout:
          ms: 10000
      delivery:
        timeout:
          ms: 30000
      enable:
        idempotence: true
      linger:
        ms: 15000
Generate 3 events with a delay of 1 second
2025-03-10 23:43:55 [kafka-producer-network-thread | producer-1] INFO  c.i.k.producer.OrderProducer - Order 2UNAG8PP, item item1 published succesfully
2025-03-10 23:43:55 [kafka-producer-network-thread | producer-1] INFO  c.i.k.producer.OrderProducer - Order HB23BI0E, item item1 published succesfully
2025-03-10 23:43:55 [kafka-producer-network-thread | producer-1] INFO  c.i.k.producer.OrderProducer - Order MAGYCCRL, item item1 published succesfully
Each partition is divided into segments, and the active segment is the last segment to which writes occur.
Partition 0: seg1(0..X events) -> seg2(X..X2) -> seg3(X2..last)
Each segment is a file on the disk
- log.segment.bytes - maximum segment size in bytes, default 1GB; the segment is closed when the size limit is reached
- log.segment.ms - how long before a segment is closed, default 1 week; the segment is closed when it exceeds one week (even if its size is below the configured segment bytes) and a new segment is created
 
The Kafka log is a series of segments
- Append-only data sequence stored as segments
- All published data stored in kafka log
- Retain messages (records) for certain period
- Stored at disk, size can grow
- Immutable
- High throughput & fault tolerance
- Managing data & retention size
- Removing outdated/unnecessary data according to predefined policies
- Policies:
  - delete (remove data)
    - default policy for user topics
    - data is deleted based on the log retention period
      - log.retention.hours / log.retention.minutes / log.retention.ms
      - default: 168 hours (1 week)
    - data can also be deleted by maximum log size
      - log.retention.bytes
      - default: -1 (infinite)
  - compact (retain only the latest value per key in a partition)
    - default policy for Kafka internal topics
    - When to use? e.g. a chat app where the N latest messages are used for user notification
    - Discards redundant records and maintains a compact data representation
    - min.compaction.lag.ms - default 0, how long to wait before a message can be compacted
    - delete.retention.ms - default 24 hours, how long to wait before compacted data is deleted
    - min.cleanable.dirty.ratio - default 0.5; higher means less frequent cleaning, lower means more frequent cleaning
    - compaction frequency is affected by traffic, segment.ms and segment.bytes
      - the log compaction process opens segment files; risk of "too many open files" if too many segments are compacted
- Kafka Streams - a client library for building real-time stream processing applications on top of Apache Kafka
- It process data in real time as it flows through kafka topics
- Stream Processing - processing of data that continuously flows in as a stream
- In contrast, batch processing handles data in batches, with latency between when data arrives and when it is processed
 
Example of usage: Steps Counter App
- The mobile device counts steps and propagates that information through a web interface to a specific topic
- Every N period the data is collected and used to calculate statistics per user (t_user_<userId>) and general statistics (t_general)
Example of usage: Marathon App + Dashboard
Example of usage: Stock Trading
Example of usage: Credit Card Fraud (if in one minute from the same card there are 2 transactions from completely different locations)
- Source Processor
  - Consumes data from one or more Kafka topics
  - Forwards data downstream
- Sink Processor
  - Does not have a downstream
  - Sends data to a specific Kafka topic
 
- Low Level API
- Kafka Stream DSL (Domain Specific Language)
- Serde means (serializer/deserializer)
- KStream
- Ordered sequence of messages
- Unbounded (does not have ends)
- Insert operation only
 
- KTable
- Unbounded
- Upsert operation (insert or update based on key)
- Delete on null value
 
- Intermediate
- KStream -> KStream
- KTable -> KTable
 
- Terminal
- KStream -> void
- KTable -> void
 
A powerful operation that lets combining records from two streams or tables based on some common key — similar to SQL joins.
State is the ability to connect previous information to the current data; this information is placed in a state store (persistent key-value storage)
- The state store lives on the same machine as the processor; it is not shared over the network
- stream.mapValues() - stateless
- (deprecated) stream.transformValues() - stateful (has access to the state store)
- (best) stream.processValues() - stateful (has access to the state store)
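A minimal stateless DSL sketch (topic names reused from the scripts above; the uppercase transformation itself is only illustrative, not necessarily how the demo module does it):
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PromotionUppercaseStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "promotion-uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Stateless transformation: read, map values, write to another topic.
        KStream<String, String> promotions = builder.stream("t-commodity-promotion");
        promotions.mapValues(value -> value.toUpperCase())
                  .to("t-commodity-promotion-uppercase");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}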
A component of kafka ecosystem which allows pulling data from external systems and pushing it into the kafka topic.
Kafka Connect is designed to simplify the process of moving large amounts of data between Kafka and external sources or sinks
- Source connector - a plugin to read (ingest) data into the kafka (producer)
- Sink connector - a plugin for writing from kafka to non-kafka
- General Ledger Accounting Entry - is a financial record that captures a business transaction within an organization's accounting system. It consists of debits and credits that affect different general ledger accounts, ensuring that the accounting equation (Assets = Liabilities + Equity) remains balanced.
Demo based on https://kafka.apache.org/quickstart
- docker build -f spring-kafka-scripts/my-kafka-image.Dockerfile -t my-kafka-image .
- docker run -it my-kafka-image
- bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092
- In the container with standalone Kafka:
echo "plugin.path=libs/connect-file-4.0.0.jar" >> config/connect-standalone.properties
echo -e "foo\nbar" > test.txt
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
more test.sink.txt
foo
bar
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic connect-test --from-beginning
{"schema":{"type":"string","optional":false},"payload":"foo"}
{"schema":{"type":"string","optional":false},"payload":"bar"}
echo "Another line" >> test.txt
more test.sink.txt
foo
bar
Another line
- docker-compose -f [script-file] -p [project] up -d
- docker build -f spring-kafka-scripts/my-kafka-image.Dockerfile -t my-kafka-image .
- docker run -it my-kafka-image
- docker run -u root -it my-kafka-image
- Replace string in a file: https://stackoverflow.com/a/27276110
- sed -i '[email protected]=/tmp/[email protected]=/home/app_user/kafka_2.13-4.0.0/data@g' ./config/server.properties
 
- docker-compose -f ./ -p [project] up -d



