|
| 1 | +--- |
| 2 | +title: "My notes on 'JVM Anatomy 101' talk by Nikita Lipsky" |
| 3 | +date: 2025-09-20 |
| 4 | +draft: false |
| 5 | +--- |
| 6 | + |
| 7 | +My notes from the insightful talk "JVM Anatomy 101" by Nikita Lipsky. The |
| 8 | +talk provides a deep dive into the internals of the Java Virtual Machine (JVM), covering everything from bytecode and class |
| 9 | +loading to memory management and garbage collection. You can watch the original |
| 10 | +talk [here](https://www.youtube.com/watch?v=BeMi8K0AFAc). |
| 11 | + |
| 12 | +### Java class file and bytecode |
| 13 | + |
| 14 | +#### Class file |
| 15 | + |
| 16 | +1. Version |
| 17 | +2. Constant Pool |
| 18 | +3. Class name, modifiers |
| 19 | +4. Superclass, superinterfaces |
| 20 | +5. Fields |
| 21 | +6. Methods |
| 22 | +7. Attributes |
| 23 | + |
| 24 | +* Fields and methods may have attributes (e.g., the values of constant fields) |
| 25 | +* The main attribute of a method is its code: Java bytecode |
| 26 | + |
| 27 | +#### Java bytecode |
| 28 | + |
| 29 | +1. Instruction array |
| 30 | +2. Operand stack |
| 31 | +3. Local variables array (method arguments, local variables) |
| 32 | + |
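To make the operand stack and the local variables array concrete, here is a small sketch. The class name `BytecodeDemo` is made up for illustration, and the listing in the comment is what `javac` typically emits for such a method; it can be checked with `javap -c`.

```java
class BytecodeDemo {
    // For this static method, `javap -c` typically shows:
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b)
    //   iadd      // pop two ints, push their sum
    //   ireturn   // pop the result and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

The method arguments occupy the first slots of the local variables array, and every arithmetic step goes through the operand stack.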
| 33 | +>In the JVM specification each instruction is strictly defined. Two different JVMs that obey the JVM specification cannot execute the same bytecode differently. |
| 34 | +
|
| 35 | +#### Java Runtime |
| 36 | + |
| 37 | +A JVM alone is not enough to run a Java program. A Java Runtime is needed, which consists of: |
| 38 | + |
| 39 | +* JVM |
| 40 | +* Platform classes |
| 41 | + * core classes (Object, String, etc.) |
| 42 | + * Java standard APIs (IO, NET, NIO, AWT/Swing, etc.) |
| 43 | +* Implementation of native methods of platform classes (OS-specific, distributed as native dynamic libraries — .dll, .so, .dylib) |
| 44 | +* Auxiliary files (time zones, locales descriptions, media resources, etc.) |
| 45 | +### Classloading engine |
| 46 | + |
| 47 | +#### Classloading |
| 48 | + |
| 49 | +The JVM executes classes from the following sources: |
| 50 | + |
| 51 | +- Java Runtime (platform classes) |
| 52 | +- Application classpath |
| 53 | +- Autogenerated on the fly (Proxy, Reflection accessors, `invokedynamic` implementations) |
| 54 | +- Provided by the application itself |
| 55 | + |
| 56 | +Every class is loaded by a class loader: |
| 57 | + |
| 58 | +* Platform classes are loaded by the bootstrap class loader |
| 59 | +* Classes from the application classpath are loaded by the system class loader (AppClassLoader) |
| 60 | +* Application classes may create user-defined class loaders |
| 61 | + |
| 62 | +> The JVM can load two different classes with the same name, provided that they are loaded by different class loaders |
| 63 | +
|
| 64 | +* Each class loader forms its own namespace of class names |
| 65 | + |
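A minimal sketch of how a program can ask which loader defined a class. The class and method names (`ClassLoaderDemo`, `loadedByBootstrap`) are hypothetical; the only API used is `Class.getClassLoader()`, which returns `null` for classes defined by the bootstrap loader.

```java
class ClassLoaderDemo {
    // The Java API represents the bootstrap class loader as null.
    static boolean loadedByBootstrap(Class<?> c) {
        return c.getClassLoader() == null;
    }

    public static void main(String[] args) {
        // A core platform class: defined by the bootstrap loader.
        System.out.println("String: " + String.class.getClassLoader());
        // An application class: defined by an application-level loader.
        System.out.println("Demo:   " + ClassLoaderDemo.class.getClassLoader());
    }
}
```

The exact loader printed for the application class depends on how the program is launched, so the sketch only distinguishes bootstrap from non-bootstrap.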
| 66 | +#### Class loading process |
| 67 | + |
| 68 | +* Class file parsing: class format is checked (may throw ClassFormatError) |
| 69 | +* Creation of a runtime representation of the class in a special JVM memory area: the runtime constant pool in the Method Area (**Metaspace** in HotSpot since Java 8, formerly the Permanent Generation) |
| 70 | +* Loading of a superclass and superinterfaces |
| 71 | + |
| 72 | +#### Linking |
| 73 | + |
| 74 | +* Java bytecode verification |
| 75 | +* Preparation |
| 76 | +* Resolution of symbolic references |
| 77 | + |
| 78 | +#### Java bytecode verification |
| 79 | + |
| 80 | +* Performed once for a class |
| 81 | +* Instructions correctness checks |
| 82 | +* Operand stack and local variables out of bounds checks |
| 83 | +* Type assignment compatibility checks |
| 84 | + |
| 85 | +#### Class initialization |
| 86 | + |
| 87 | +> Before any method of a class can be executed, the class must be initialized, that is, its static initializer must run |
| 88 | +
|
| 89 | +```java |
| 90 | +class MyClass { |
| 91 | + static int i = 10; |
| 92 | + static String s = "Hello"; |
| 93 | + static { |
| 94 | + System.out.println(s); |
| 95 | + } |
| 96 | +} |
| 97 | +``` |
| 98 | + |
| 99 | +- Happens on first use |
| 100 | + - new |
| 101 | + - static field access |
| 102 | + - static method call |
| 103 | +- Triggers initialization of the superclass and of superinterfaces that declare default methods |
| 104 | + |
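The "first use" rule can be observed directly. In this sketch (all names are illustrative), the nested class `Lazy` is not initialized when the program starts, only when its static field is first read.

```java
import java.util.ArrayList;
import java.util.List;

class InitDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Lazy {
        static int value = 10;                        // non-constant static field
        static { LOG.add("Lazy initialized"); }       // static initializer
    }

    static List<String> run() {
        LOG.add("before first use");
        int v = Lazy.value;                           // static field access triggers initialization
        LOG.add("value = " + v);
        return LOG;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log records "before first use" ahead of "Lazy initialized", showing that merely mentioning `Lazy` in the code does not initialize it.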
| 105 | +### Execution engine: interpreters, JIT, AOT |
| 106 | + |
| 107 | +The JVM may execute bytecode via: |
| 108 | + |
| 109 | +- Interpretation |
| 110 | +- Translation into native code that runs directly on the CPU |
| 111 | + |
| 112 | +#### Interpreter (Simple) |
| 113 | + |
| 114 | +```text |
| 115 | +pc = 0; |
| 116 | + do { |
| 117 | + fetch opcode at pc; |
| 118 | + if (operands) fetch operands; |
| 119 | + execute the opcode; |
| 120 | + calculate pc; |
| 121 | + } while (there is more); |
| 122 | +``` |
| 123 | + |
| 124 | +#### Template interpreter |
| 125 | + |
| 126 | +* Every bytecode instruction is implemented as a sequence of target CPU instructions (template) |
| 127 | +* Interpreting an instruction is just a jump to the corresponding template |
| 128 | + |
| 129 | +#### Compilers |
| 130 | + |
| 131 | +* Non-optimizing |
| 132 | + * make it up as I go along |
| 133 | +* Simple Optimizing |
| 134 | + * HotSpot Client (C1) |
| 135 | +* Sophisticated Optimizing |
| 136 | + * HotSpot Server (C2) |
| 137 | + |
| 138 | +- Dynamic (Just-In-Time - **JIT**) |
| 139 | + - Translation into native code happens at application runtime |
| 140 | +- Static (Ahead-Of-Time - **AOT**) |
| 141 | + - Translation happens before program execution |
| 142 | + |
| 143 | +##### Dynamic Compilers (JIT) |
| 144 | + |
| 145 | +* Work concurrently with program execution |
| 146 | +* Compile hot code only (determined by profiling) |
| 147 | +* Profiling information is used to choose the most effective optimizations |
| 148 | +##### Static Compilers (AOT) |
| 149 | + |
| 150 | +* Are not limited in resources for optimizations |
| 151 | +* Compile every method of a program using the most aggressive optimizations |
| 152 | +* No compilation overhead at run time (fast startup) |
| 153 | +### Meta information access subsystem: reflection, indy, JNI |
| 154 | + |
| 155 | +#### Reflection |
| 156 | + |
| 157 | +* Allows access to classes, fields, and methods by name (given as a string) from a Java program |
| 158 | +* Is implemented in the JVM via access to Metaspace |
| 159 | +* A key feature for many popular frameworks and for implementations of JVM-based programming languages (Groovy, Clojure, Ruby, etc.) |
| 160 | + |
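A short sketch of access by name. The helper `callByName` is hypothetical; it uses only the standard `Class.forName`, `Class.getMethod` and `Method.invoke` calls and handles no-argument public methods.

```java
import java.lang.reflect.Method;

class ReflectionDemo {
    // Looks up a public no-argument method by name and invokes it on target.
    static Object callByName(String className, String methodName, Object target) {
        try {
            Class<?> cls = Class.forName(className);   // class found by its string name
            Method m = cls.getMethod(methodName);      // method found by its string name
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("java.lang.String", "length", "hello")); // prints 5
    }
}
```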
| 161 | +#### Method Handles and invokedynamic (JSR-292, indy) |
| 162 | + |
| 163 | +> To allow dynamic languages to be executed efficiently on the JVM, a new instruction called `invokedynamic` was added to the JVM instruction set |
| 164 | +
|
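Method handles are the Java-level API that came with JSR-292. The sketch below (class and method names are illustrative) looks up `String.length()` as a `MethodHandle` and calls it; it does not show an `invokedynamic` call site itself, which is normally generated by a compiler rather than written by hand.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleDemo {
    static int lengthViaHandle(String s) {
        try {
            // Handle type is (String)int: the receiver plus an int return value.
            MethodHandle length = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the call site types to match the handle type exactly.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello")); // prints 5
    }
}
```

Unlike reflection, the access check happens once at lookup time, and the handle can then be invoked like a typed function pointer.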
| 165 | +#### Java Native Interface (JNI) |
| 166 | + |
| 167 | +* Binds the JVM to the outside world (OS) |
| 168 | +* C interface of the JVM |
| 169 | + * Does not depend on implementation details of a particular JVM |
| 170 | + * Is used to implement native methods in C (or another low-level language) |
| 171 | + * Is used to implement platform-specific parts of the Java SE API: IO, NET, AWT |
| 172 | +* JNI is implemented in the JVM via access to Metaspace |
| 173 | + |
| 174 | +#### Project Panama |
| 175 | + |
| 176 | +* C interop without coding in C: |
| 177 | + * Direct calls to external C functions from Java |
| 178 | + * Access to C data structures from Java |
| 179 | + |
| 180 | +### Threading, exception handling, synchronization |
| 181 | + |
| 182 | +#### Threads |
| 183 | + |
| 184 | +* A Java thread is mapped to a native thread one-to-one (1:1) |
| 185 | +* Each thread has a reserved region of memory, its stack, which holds the frames of the methods being executed in that thread (local variables and operand stacks) |
| 186 | + * Thread stack size is set with a JVM argument: `-Xss` |
| 187 | +* A Java thread has detailed information about its stack (stack trace) |
| 188 | + * You can print or examine the stack from a Java program |
| 189 | + |
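As a sketch of examining the stack from a program, the following uses `Thread.getStackTrace()` to check whether a given method is among the current frames. The names `StackTraceDemo`, `outer` and `inner` are made up for the example.

```java
class StackTraceDemo {
    static boolean outer() {
        return inner();
    }

    // Returns true if a frame for outer() is found on the current thread's stack.
    static boolean inner() {
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            if (frame.getMethodName().equals("outer")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(outer());
        // Print every frame of the main thread, innermost first.
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            System.out.println("  at " + frame);
        }
    }
}
```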
| 190 | +#### Project Loom |
| 191 | + |
| 192 | +**Problem:** there are limitations on how many native threads can be created (native threads are expensive) |
| 193 | + |
| 194 | +**Solution:** virtual threads (lightweight threads) managed by the JVM (abandoning the one-to-one mapping) |
| 195 | + |
| 196 | +#### Exception handling |
| 197 | + |
| 198 | +>The JVM knows everything about the call stack, which helps it implement exception handling. |
| 199 | + |
| 200 | +#### Threads and Java Memory Model |
| 201 | +
|
| 202 | +#### Synchronization |
| 203 | + |
| 204 | +* Provides safe access to memory shared between threads |
| 205 | +* A naive implementation may use OS synchronization primitives |
| 206 | + * A Java object has an OS monitor as a hidden field |
| 207 | +* Highly optimized for the case where contention for a resource is much rarer than entering a synchronized block |
| 208 | +* Today, it is recommended to use `java.util.concurrent` primitives instead of built-in synchronization |
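A small sketch comparing the two approaches on a shared counter: one increment guarded by the object's built-in monitor, one using `AtomicInteger` from `java.util.concurrent.atomic`. The class `CounterDemo` and its methods are illustrative only.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    private int guarded;                                   // protected by this object's monitor
    private final AtomicInteger atomic = new AtomicInteger();

    synchronized void incrementGuarded() { guarded++; }    // built-in synchronization
    void incrementAtomic() { atomic.incrementAndGet(); }   // lock-free atomic update

    // Starts `threads` workers; each increments both counters `perThread` times.
    static int[] run(int threads, int perThread) {
        CounterDemo c = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    c.incrementGuarded();
                    c.incrementAtomic();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();   // join makes the workers' writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { c.guarded, c.atomic.get() };
    }

    public static void main(String[] args) {
        int[] r = run(4, 10_000);
        System.out.println(r[0] + " " + r[1]); // prints 40000 40000
    }
}
```

Both counters end at the same value; without either form of synchronization the plain `guarded++` could lose updates.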
| 209 | +### Memory management: heap, allocation, GC |
| 210 | + |
| 211 | +#### Memory allocation |
| 212 | + |
| 213 | +* Implementation of the **new** operator |
| 214 | +* Objects allocated with the **new** operator reside in the so-called Java heap |
| 215 | +* Java heap structure is JVM implementation specific |
| 216 | +* Java objects layout is JVM implementation specific as well |
| 217 | + |
| 218 | +* Allocation must be fast |
| 219 | + * The JVM requests memory from the OS for many objects at once, not for each object |
| 220 | + * Allocation uses the bump-pointer technique |
| 221 | +* Allocation must be thread-safe yet parallel (non-blocking) |
| 222 | + * Thread-local heaps: every thread allocates from its own thread-local memory region |
| 223 | + |
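Bump-pointer allocation is simple enough to sketch in a few lines. This toy `BumpAllocator` is not how any particular JVM implements it: the byte array stands in for a memory region obtained from the OS, offsets stand in for addresses, and the 8-byte alignment is an assumption.

```java
class BumpAllocator {
    private final byte[] region;   // stands in for a chunk of memory obtained from the OS
    private int top = 0;           // the "bump pointer": everything below it is in use

    BumpAllocator(int size) {
        region = new byte[size];
    }

    // Returns the offset of the new block, or -1 if the region cannot fit it.
    int allocate(int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        if (size > region.length) return -1;
        int aligned = (size + 7) & ~7;             // round up to 8 bytes (assumed alignment)
        if (aligned > region.length - top) return -1;
        int start = top;
        top += aligned;                            // allocation is just a pointer increment
        return start;
    }

    public static void main(String[] args) {
        BumpAllocator a = new BumpAllocator(32);
        System.out.println(a.allocate(5));   // prints 0
        System.out.println(a.allocate(8));   // prints 8
        System.out.println(a.allocate(16));  // prints 16
        System.out.println(a.allocate(1));   // prints -1 (region exhausted)
    }
}
```

If each thread owns its own region, `allocate` needs no locking, which is the point of thread-local heaps.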
| 224 | +#### Java Object Layout |
| 225 | + |
| 226 | +The layout is not defined by the JVM specification; however, an implementation needs: |
| 227 | + |
| 228 | +- Java Object header |
| 229 | + - Reference to a class object |
| 230 | + - Monitor (lock word) |
| 231 | + - Identity hashcode |
| 232 | + - GC flags |
| 233 | +- Fields |
| 234 | + - May be reordered for the sake of size optimization, alignment, or target architecture specifics |
| 235 | + |
| 236 | +#### Project Valhalla |
| 237 | + |
| 238 | +- The main idea is to introduce objects into the JVM that do not require a header at all |
| 239 | + - Flattening an object to the primitive data it holds |
| 240 | + - Removing unnecessary indirection in arrays |
| 241 | + |
| 242 | +#### Garbage collection |
| 243 | + |
| 244 | +>Garbage consists of objects that can no longer be used by a program. |
| 245 | +
|
| 246 | +**Question:** What objects can be used by a program? |
| 247 | +##### Not Garbage |
| 248 | + |
| 249 | +1. Objects in static fields of classes |
| 250 | +2. Objects that are accessible from all method frames (local variables and operand stack) |
| 251 | +3. Objects referenced by "not garbage" |
| 252 | + |
| 253 | +##### GC roots |
| 254 | + |
| 255 | +1. Objects in static fields of classes |
| 256 | +2. Objects that are accessible from thread stacks |
| 257 | +3. Objects that are referenced by JNI references in native methods |
| 258 | + |
| 259 | +Not garbage aka live objects: |
| 260 | + |
| 261 | +1. Objects from GC roots |
| 262 | +2. Objects that are referenced by live objects |
| 263 | + |
| 264 | +Everything else is garbage. |
| 265 | +##### Tracing garbage collectors |
| 266 | + |
| 267 | +* Mark-and-sweep |
| 268 | + * Marks live objects, sweeps (frees) the garbage |
| 269 | +* Stop-and-copy |
| 270 | + * Heap is divided into two semi-spaces |
| 271 | + * Copies live objects to the second semi-space |
| 272 | + * The two semi-spaces swap roles on the next collection |
| 273 | + |
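Mark-and-sweep can be sketched on a toy object graph. Everything here is hypothetical (the `Node` type, the explicit `heap` list, the `marked` flag); a real collector works on raw memory and object headers, but the reachability logic is the same: mark everything reachable from the roots, then free the rest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepDemo {
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();   // outgoing references
        boolean marked;                              // stands in for a GC flag in the header
        Node(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is live.
    static void mark(Collection<Node> roots) {
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n.marked) continue;
            n.marked = true;
            work.addAll(n.refs);
        }
    }

    // Sweep phase: unmarked objects are garbage; marks are reset for the next cycle.
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);          // a -> b, reachable from the root
        c.refs.add(d);          // c <-> d form a cycle that no root reaches
        d.refs.add(c);
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [c, d]
    }
}
```

Note that the unreachable cycle `c <-> d` is collected, which a tracing collector handles naturally.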
| 274 | +##### Stop the World |
| 275 | + |
| 276 | +* Liveness is defined for a specific moment of program execution: the set of live objects changes as the program runs |
| 277 | +* To determine what is garbage, all application threads must be paused (the STW pause) |
| 278 | + |
| 279 | +One of the main tasks of modern garbage collectors is to reduce the STW pause. Methods to reduce the STW pause: |
| 280 | + |
| 281 | +- Incremental |
| 282 | + - Do not collect all the garbage within GC pause |
| 283 | +- Parallel |
| 284 | + - Collect the garbage in parallel threads within GC pause |
| 285 | +- Concurrent |
| 286 | + - Collect the garbage concurrently with program execution |
| 287 | + |
| 288 | +##### Generational Garbage Collection |
| 289 | + |
| 290 | +**Weak Generational Hypothesis** |
| 291 | + |
| 292 | +- Most objects die young |
| 293 | +- Old objects rarely reference young objects |
| 294 | + |
| 295 | +**Generational GC:** |
| 296 | + |
| 297 | +- A particular case of incremental GC |
| 298 | +- During minor collection cycles, only young objects are traced |
| 299 | +- Objects that survive several cycles are promoted to the old generation |
| 300 | + |
| 301 | +##### Thread Local Garbage Collection |
| 302 | + |
| 303 | +**Thread Local Hypothesis** |
| 304 | + |
| 305 | +- Most objects die in the thread that created them |
| 306 | + |
| 307 | +**Thread local GC:** |
| 308 | + |
| 309 | +- Collects thread-local garbage within the respective thread, without pausing other threads |