Commit 1b83903

Add notes on "JVM Anatomy 101" by Nikita Lipsky
1 parent 726a2b1 commit 1b83903

1 file changed

+309
-0
lines changed

@@ -0,0 +1,309 @@
1+
---
2+
title: "My notes on 'JVM Anatomy 101' talk by Nikita Lipsky"
3+
date: 2025-09-20
4+
draft: false
5+
---
6+
7+
My notes from the insightful talk "JVM Anatomy 101" by Nikita Lipsky. The
8+
talk provides a deep dive into the internals of the Java Virtual Machine (JVM), covering everything from bytecode and class
9+
loading to memory management and garbage collection. You can watch the original
10+
talk [here](https://www.youtube.com/watch?v=BeMi8K0AFAc).
11+
12+
### Java class file and bytecode
13+
14+
#### Class file
15+
16+
1. Version
17+
2. Constant Pool
18+
3. Class name, modifiers
19+
4. Superclass, superinterfaces
20+
5. Fields
21+
6. Methods
22+
7. Attributes
23+
24+
* Fields and methods may have attributes (e.g., the value of a constant field)
25+
* The main attribute of a method is its code: Java bytecode
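
As a rough illustration of the first entries in this layout, the sketch below reads the fixed-size prefix of a class file: a 4-byte magic number (not listed above, but it precedes the version) followed by the minor and major version. The class `ClassFileHeader` and its method `readHeader` are names made up for this example; it reads `Object.class` from whatever Java runtime executes it.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not from the talk: reads the first 8 bytes of a class file.
class ClassFileHeader {

    // Returns {magic, minorVersion, majorVersion}.
    static int[] readHeader(InputStream in) {
        try (DataInputStream data = new DataInputStream(in)) {
            int magic = data.readInt();            // u4, always 0xCAFEBABE
            int minor = data.readUnsignedShort();  // u2 minor_version
            int major = data.readUnsignedShort();  // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Resolved relative to java.lang, i.e. java/lang/Object.class
        int[] header = readHeader(Object.class.getResourceAsStream("Object.class"));
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version depends on the JDK that ships the class (for example, 52 corresponds to Java 8), so the printed numbers vary between installations.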
26+
27+
#### Java bytecode
28+
29+
1. Instruction array
30+
2. Operand stack
31+
3. Local variables array (method arguments, local variables)
32+
33+
>In the JVM specification each instruction is strictly defined. Two different JVMs that obey the JVM specification have no chance to execute the same bytecode differently.
34+
35+
#### Java Runtime
36+
37+
A JVM alone is not enough to run a Java program. A Java Runtime is needed, which consists of:
38+
39+
* JVM
40+
* Platform classes
41+
    * core classes (Object, String, etc.)
42+
    * Java standard APIs (IO, NET, NIO, AWT/Swing, etc.)
43+
* Implementation of native methods of platform classes (OS-specific, distributed as native dynamic libraries — .dll, .so, .dylib)
44+
* Auxiliary files (time zones, locales descriptions, media resources, etc.)
45+
### Classloading engine
46+
47+
#### Classloading
48+
49+
The JVM executes classes from the following sources:
50+
51+
- Java Runtime (platform classes)
52+
- Application classpath
53+
- Generated on the fly (Proxy, reflection accessors, `invokedynamic` implementation)
54+
- Provided by the application itself
55+
56+
Every class is loaded by a class loader:
57+
58+
* Platform classes are loaded by the bootstrap class loader
59+
* Classes from the application classpath are loaded by the system class loader (AppClassLoader)
60+
* Application classes may create user-defined class loaders
61+
62+
> The JVM can load two different classes with the same name, provided that they are loaded by different class loaders
63+
64+
* Each class loader forms its own class namespace
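
The namespace rule can be observed with a small experiment. The sketch below builds the bytes of a minimal, method-less class by hand and defines it in two separate loaders. `LoaderNamespaces`, `ByteLoader`, and `minimalClass` are names invented for this example, and the hand-written class file (version 52, no fields or methods) is an assumption chosen to keep the demo self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo: the same class name defined by two class loaders.
class LoaderNamespaces {

    // Builds a minimal class file: "public class <name> extends Object" with no members.
    static byte[] minimalClass(String binaryName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                 // magic
            out.writeShort(0);                        // minor version
            out.writeShort(52);                       // major version (Java 8)
            out.writeShort(5);                        // constant pool count (entries 1..4)
            out.writeByte(1); out.writeUTF(binaryName.replace('.', '/')); // #1 Utf8
            out.writeByte(7); out.writeShort(1);                          // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");           // #3 Utf8
            out.writeByte(7); out.writeShort(3);                          // #4 Class -> #3
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                        // this_class
            out.writeShort(4);                        // super_class
            out.writeShort(0);                        // interfaces
            out.writeShort(0);                        // fields
            out.writeShort(0);                        // methods
            out.writeShort(0);                        // attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // User-defined loader that exposes the protected defineClass method.
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] classBytes) {
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    public static void main(String[] args) {
        byte[] classBytes = minimalClass("Hello");
        Class<?> first = new ByteLoader().define("Hello", classBytes);
        Class<?> second = new ByteLoader().define("Hello", classBytes);
        System.out.println(first.getName() + " / " + second.getName());
        System.out.println("same class object? " + (first == second));
    }
}
```

Both `Class` objects report the same name, yet they are distinct classes to the JVM because each belongs to a different loader.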
65+
66+
#### Class loading process
67+
68+
* Class file parsing: class format is checked (may throw ClassFormatError)
69+
* Creation of a runtime representation of the class in a special JVM memory area: the runtime constant pool in the Method Area (called **Metaspace** in HotSpot since Java 8, formerly the Permanent Generation)
70+
* Loading of a superclass and superinterfaces
71+
72+
#### Linking
73+
74+
* Java bytecode verification
75+
* Preparation
76+
* Resolution of symbolic references
77+
78+
#### Java bytecode verification
79+
80+
* Performed once for a class
81+
* Instructions correctness checks
82+
* Operand stack and local variables out of bounds checks
83+
* Type assignment compatibility checks
84+
85+
#### Class initialization
86+
87+
> Before any method of a class can be executed, the class must be initialized, which means running the static initializer of the class
88+
89+
```java
class MyClass {
    static int i = 10;
    static String s = "Hello";

    // Static initializer: runs once, when the class is initialized
    static {
        System.out.println(s);
    }
}
```
98+
99+
- Happens on first use
100+
    - `new`
101+
    - static field access
102+
    - static method call
103+
- Triggers initialization of the superclass and of superinterfaces that declare default methods
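
The "first use" rule and the superclass ordering can be checked with a short sketch. `InitOrderDemo`, `Parent`, `Child`, and `firstUse` are names invented for this example; the static initializers append to a shared log so the order is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of lazy class initialization and superclass-first ordering.
class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static { LOG.add("Parent init"); }
    }

    static class Child extends Parent {
        // Not final: reading it is a real static field access, not an inlined constant.
        static int counter = 10;
        static { LOG.add("Child init"); }
    }

    // The first call triggers initialization of Parent, then Child.
    static int firstUse() {
        LOG.add("before field access");
        int value = Child.counter;
        LOG.add("after field access");
        return value;
    }

    public static void main(String[] args) {
        firstUse();
        firstUse(); // initializers do not run a second time
        System.out.println(LOG);
    }
}
```

In the printed log, the parent entry appears before the child entry, and each appears only once even though `firstUse` is called twice.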
104+
105+
### Execution engine: interpreters, JIT, AOT
106+
107+
JVM may execute bytecode via
108+
109+
- Interpretation
110+
- Translation into native code that runs directly on the CPU
111+
112+
#### Interpreter (Simple)
113+
114+
```java
// Pseudocode: the classic fetch-decode-execute loop
pc = 0;
do {
    fetch opcode at pc;
    if (operands) fetch operands;
    execute the opcode;
    calculate pc;
} while (there is more);
```
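
To make the loop concrete, here is a minimal runnable sketch of a switch-based interpreter for a toy stack machine. The opcodes (`PUSH`, `ADD`, `MUL`, `HALT`) and the class `TinyInterpreter` are invented for this example and are not real JVM bytecodes; only the shape of the loop (instruction array, operand stack, program counter) mirrors the description above.

```java
// Hypothetical toy interpreter: instruction array + operand stack + pc.
class TinyInterpreter {
    static final int PUSH = 0; // followed by one operand: the value to push
    static final int ADD = 1;
    static final int MUL = 2;
    static final int HALT = 3; // returns the value on top of the stack

    static int run(int[] code) {
        int[] stack = new int[16]; // operand stack (fixed size for simplicity)
        int sp = 0;                // index of the next free stack slot
        int pc = 0;                // program counter
        while (true) {
            int opcode = code[pc++];           // fetch opcode at pc
            switch (opcode) {
                case PUSH:
                    stack[sp++] = code[pc++];  // fetch operand, advance pc
                    break;
                case ADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case MUL: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```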
123+
124+
#### Template interpreter
125+
126+
* Every bytecode instruction is implemented as a sequence of target CPU instructions (template)
127+
* Interpreting an instruction is just a jump to the corresponding template
128+
129+
#### Compilers
130+
131+
* Non-optimizing
132+
    * make it up as I go along
133+
* Simple Optimizing
134+
    * HotSpot Client (C1)
135+
* Sophisticated Optimizing
136+
    * HotSpot Server (C2)
137+
138+
- Dynamic (Just-In-Time - **JIT**)
139+
    - Translation into native code happens at application runtime
140+
- Static (Ahead-Of-Time - **AOT**)
141+
    - Translation happens before program execution
142+
143+
##### Dynamic Compilers (JIT)
144+
145+
* Work concurrently with program execution
146+
* Compile hot code only (determined by profiling)
147+
* Profiling information is used to choose the most effective optimizations
148+
##### Static Compilers (AOT)
149+
150+
* Are not limited in resources for optimizations
151+
* Compile every method of a program using the most aggressive optimizations
152+
* No compilation overhead at run time (fast startup)
153+
### Meta information access subsystem: reflection, indy, JNI
154+
155+
#### Reflection
156+
157+
* Allows access to classes, fields, and methods by name (given as a string) from a Java program
158+
* Is implemented in the JVM via access to Metaspace
159+
* Key feature of Java for many popular frameworks and for implementations of JVM-based programming languages (Groovy, Clojure, JRuby, etc.)
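
A minimal sketch of access by name, using only `java.lang.reflect`: the method to call is chosen from a string at run time. The wrapper class `ReflectionDemo` and its method `callByName` are names made up for this example.

```java
import java.lang.reflect.Method;

// Hypothetical helper: invokes a public no-argument method chosen by name.
class ReflectionDemo {

    static Object callByName(Object target, String methodName) {
        try {
            // Look the method up in the class metadata by its name
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("hello", "length"));      // prints 5
        System.out.println(callByName("hello", "toUpperCase")); // prints HELLO
    }
}
```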
160+
161+
#### Method Handles and invokedynamic (JSR-292, indy)
162+
163+
> To allow dynamic languages to be executed efficiently on the JVM, a new instruction called `invokedynamic` was added to the JVM instruction set
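
The `invokedynamic` instruction itself cannot be written directly in Java source, but the method handle API that came with JSR-292 can be used from plain Java. The sketch below looks up `String.length()` as a `MethodHandle` and calls it; `MethodHandleDemo` and `lengthViaHandle` are names invented for this example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo of a method handle: a typed, directly invocable method reference.
class MethodHandleDemo {

    static int lengthViaHandle(String text) {
        try {
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the call site types to match the handle type: (String)int
            return (int) length.invokeExact(text);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello")); // prints 5
    }
}
```

Unlike reflection, the handle carries an exact type, which gives the JIT compiler a better chance to optimize the call.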
164+
165+
#### Java Native Interface (JNI)
166+
167+
* Binds the JVM with the outside world (OS)
168+
* C interface of the JVM
169+
* Does not depend on implementation details of a JVM
170+
* Is used to implement native methods in C (or another low-level language)
171+
* JNI is used to implement platform specific parts of Java SE API: IO, NET, AWT
172+
* JNI is implemented in the JVM as access to Metaspace
173+
174+
#### Project Panama
175+
176+
* C interop without coding in C:
177+
    * Direct calls to external C functions from Java
178+
    * Access to C data structures from Java
179+
180+
### Threading, exception handling, synchronization
181+
182+
#### Threads
183+
184+
* A Java thread is mapped 1:1 to a native thread
185+
* Each thread has a reserved region of memory referred to as its stack containing local variables and operand stacks of methods (method frames) being executed within a thread
186+
* Thread stack size is a JVM argument: `-Xss`
187+
* A Java thread has detailed information about its stack (stack trace)
188+
    * You may print or examine the stack from a Java program
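
As a small illustration of examining the stack from Java code, the sketch below captures a stack trace through a `Throwable` and reads method names from its frames. `StackDemo` and its methods are names invented for this example.

```java
// Hypothetical demo: inspecting the current thread's call stack.
class StackDemo {

    // Frame 0 of a fresh Throwable is the method that created it.
    static String currentMethodName() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        return frames[0].getMethodName();
    }

    // Frame 1 is the direct caller of the method that created the Throwable.
    private static String nameOfCaller() {
        return new Throwable().getStackTrace()[1].getMethodName();
    }

    static String whoCalledHelper() {
        return nameOfCaller();
    }

    public static void main(String[] args) {
        System.out.println(currentMethodName()); // prints currentMethodName
        System.out.println(whoCalledHelper());   // prints whoCalledHelper
        Thread.dumpStack();                      // prints the whole stack to stderr
    }
}
```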
189+
190+
#### Project Loom
191+
192+
**Problem:** there are limitations on how many native threads can be created (native threads are expensive)
193+
194+
**Solution:** virtual threads (lightweight threads) managed by the JVM, abandoning the 1:1 scheme
195+
196+
#### Exception handling
197+
198+
>The JVM knows everything about the call stack, which helps it implement exception handling.
199+
200+
#### Threads and Java Memory Model
201+
202+
#### Synchronization
203+
204+
* Provides safe access to memory shared between threads
205+
* Naive implementation may use OS synchronization primitives
206+
    * Each Java object has an OS monitor as a hidden field
207+
* Highly optimized for the common case where contention for a resource is much rarer than entering a synchronized block
208+
* Today, it is recommended to use `java.util.concurrent` primitives instead of built-in synchronization
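
For comparison, the sketch below increments one counter under the built-in monitor (`synchronized`) and another through `java.util.concurrent.atomic.AtomicInteger`. `CounterDemo` and `runBoth` are names invented for this example; the thread and iteration counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: built-in synchronization vs. a java.util.concurrent primitive.
class CounterDemo {
    private int guarded = 0;                                  // protected by the object's monitor
    private final AtomicInteger atomic = new AtomicInteger(); // lock-free counter

    synchronized void incrementSynchronized() {
        guarded++;
    }

    void incrementAtomic() {
        atomic.incrementAndGet();
    }

    // Returns {synchronized counter, atomic counter} after all threads finish.
    static int[] runBoth(int threads, int incrementsPerThread) {
        CounterDemo demo = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    demo.incrementSynchronized();
                    demo.incrementAtomic();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {demo.guarded, demo.atomic.get()};
    }

    public static void main(String[] args) {
        int[] totals = runBoth(4, 10_000);
        System.out.println(totals[0] + " " + totals[1]); // prints 40000 40000
    }
}
```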
209+
### Memory management: heap, allocation, GC
210+
211+
#### Memory allocation
212+
213+
* Implementation of the **new** operator
214+
* Objects allocated with the **new** operator reside in the so-called Java heap
215+
* Java heap structure is JVM implementation specific
216+
* Java objects layout is JVM implementation specific as well
217+
218+
* Must be fast
219+
    * The JVM requests memory from the OS for many objects at once, not for one object at a time
220+
    * Allocation uses the bump-pointer technique
221+
* Must be thread-safe but parallel (non-blocking)
222+
    * Thread-local heaps: every thread allocates from its own thread-local memory region
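
The bump-pointer idea can be sketched in a few lines: allocation is just advancing an offset inside a pre-reserved region. `BumpAllocator` is a made-up class that hands out offsets rather than real memory, and the 8-byte alignment and the 1024-byte per-thread region are assumptions for illustration, not values from the talk.

```java
// Hypothetical sketch of bump-pointer allocation inside a fixed-size region.
class BumpAllocator {
    private final int capacity; // size of the region reserved up front
    private int top = 0;        // the "bump pointer": first free offset

    BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new block, or -1 if the region is exhausted.
    int allocate(int size) {
        int aligned = (size + 7) & ~7; // round up to a multiple of 8 (assumed alignment)
        if (aligned > capacity - top) {
            return -1;                 // a real JVM would request a new region or run GC
        }
        int offset = top;
        top += aligned;                // bump the pointer
        return offset;
    }

    // One allocator per thread: no locking needed on the allocation path.
    static final ThreadLocal<BumpAllocator> LOCAL =
            ThreadLocal.withInitial(() -> new BumpAllocator(1024));

    public static void main(String[] args) {
        BumpAllocator region = LOCAL.get();
        System.out.println(region.allocate(16)); // prints 0
        System.out.println(region.allocate(5));  // prints 16
        System.out.println(region.allocate(8));  // prints 24
    }
}
```

Because each thread bumps its own pointer, the fast path needs no synchronization; coordination is only required when a thread needs a fresh region.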
223+
224+
#### Java Object Layout
225+
226+
The layout is not defined by the JVM specification; however, an implementation requires:
227+
228+
- Java Object header
229+
    - Reference to a class object
230+
    - Monitor (lock word)
231+
    - Identity hash code
232+
    - GC flags
233+
- Fields
234+
    - May be reordered for the sake of size optimization, alignment, or target architecture specifics
235+
236+
#### Project Valhalla
237+
238+
- The main idea is to introduce the concept of objects in the JVM, which do not require a header at all
239+
- Erasure of an object to the primitive data of its fields
240+
- Removing unnecessary indirection in arrays
241+
242+
#### Garbage collection
243+
244+
>Garbage is any object that can no longer be used by a program.
245+
246+
**Question:** What objects can be used by a program?
247+
##### Not Garbage
248+
249+
1. Objects in static fields of classes
250+
2. Objects that are accessible from any method frame (local variables and operand stack)
251+
3. Objects referenced by "not garbage"
252+
253+
##### GC roots
254+
255+
1. Objects in static fields of classes
256+
2. Objects that are accessible from thread stacks
257+
3. Objects that are referenced by JNI references in native methods
258+
259+
Not garbage aka live objects:
260+
261+
1. Objects referenced from GC roots
262+
2. Objects that are referenced by live objects
263+
264+
Everything else is garbage.
265+
##### Tracing garbage collectors
266+
267+
* Mark-and-sweep
268+
    * Marks live objects, sweeps (frees) the garbage
269+
* Stop-and-copy
270+
    * Heap is divided into two semi-spaces
271+
    * Copies live objects to the second semi-space
272+
    * The two semi-spaces swap roles on the next collection
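
A toy version of mark-and-sweep fits in a few lines once the heap is modelled as a list of nodes with references between them. `MarkSweepDemo`, `Node`, `mark`, and `sweep` are names invented for this sketch; a real collector works on raw memory and object headers, not on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of a tracing collector.
class MarkSweepDemo {

    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        Node(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is live.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (live.add(node)) {          // first visit only, so cycles terminate
                pending.addAll(node.refs);
            }
        }
        return live;
    }

    // Sweep phase: keep marked objects, drop the rest.
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> survivors = new ArrayList<>();
        for (Node node : heap) {
            if (live.contains(node)) {
                survivors.add(node);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);             // root a -> b
        c.refs.add(d);             // c and d reference each other
        d.refs.add(c);             // but nothing reachable points to them
        List<Node> heap = Arrays.asList(a, b, c, d);
        for (Node node : sweep(heap, mark(Arrays.asList(a)))) {
            System.out.println(node.name); // prints a, then b
        }
    }
}
```

The unreachable cycle `c <-> d` is collected, which is the main advantage of tracing over plain reference counting.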
273+
274+
##### Stop the World
275+
276+
* Live objects are defined for a specific moment of program execution: the set of live objects changes during execution
277+
* To collect the garbage, all threads should be paused to determine the garbage (STW pause)
278+
279+
One of the main tasks of modern garbage collectors is to reduce the STW pause. Methods to reduce the STW pause:
280+
281+
- Incremental
282+
    - Do not collect all the garbage within a GC pause
283+
- Parallel
284+
    - Collect the garbage in parallel threads within a GC pause
285+
- Concurrent
286+
    - Collect the garbage concurrently with program execution
287+
288+
##### Generational Garbage Collection
289+
290+
**Weak Generational Hypothesis**
291+
292+
- Most objects die young
293+
- Old objects rarely reference young objects
294+
295+
**Generational GC:**
296+
297+
- Particular case of incremental GC
298+
- During minor collection cycles, only young objects are traced
299+
- Objects that survive several cycles are promoted to the old generation
300+
301+
##### Thread Local Garbage Collection
302+
303+
**Thread Local Hypothesis**
304+
305+
- Most objects die in the thread that created them
306+
307+
**Thread local GC:**
308+
309+
- Collects thread local garbage within a respective thread, not pausing other threads
