
Commit 78b71d5

committed
A lot of work on a lot of post drafts. All for the C4 series
1 parent ca38a5d commit 78b71d5

9 files changed

+1364
-668
lines changed

_drafts/2025-09-07-fn-star-talkin-bout-my-generation.md

Lines changed: 0 additions & 603 deletions
This file was deleted.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
---
2+
layout: post
3+
title: C4 - A time for reflection
4+
date: 2025-09-14 00:00:00 -0500
5+
categories: general
6+
---
7+
8+
How the ClojureJVM compiler handles host interop requiring reflection, and how the ClojureCLR compiler uses the Dynamic Language Runtime for this purpose.
9+
10+
Lines changed: 387 additions & 0 deletions
@@ -0,0 +1,387 @@
1+
---
2+
layout: post
3+
title: C4 - fn* -- talkin' 'bout my generation
4+
date: 2025-10-15 00:00:00 -0500
5+
categories: general
6+
---
7+
8+
We look at code generation for functions in ClojureCLR.
9+
10+
In a previous post ([C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-09-04-functional-anatomy %})), we looked at how functions are represented in ClojureCLR. That post focused on the interfaces and classes that form the basis of the representation of functions. When we define a function in ClojureCLR, we generate a class derived from one of the base classes (typically `AFunction` or `RestFn`); there is a significant amount of support code that gets added. That support code is the topic of this post.
11+
12+
## Our playground
13+
14+
The primary classes involved in function code generation are:
15+
16+
<img src="{{site.baseurl | prepend: site.url}}/assets/images/objexpr.png" alt="Graph of all types related to ObjExpr" />
17+
18+
I have no idea why `ObjExpr` and `ObjMethod` are named what they are.
19+
`FnExpr` is the AST node type that represents an `fn*` form; `FnMethod` represents an `invoke` method of the generated class.
20+
`NewInstanceExpr` represents a `deftype` or `reify` form; `NewInstanceMethod` represents a method of the generated class.
21+
There is a significant amount of shared code in the base classes `ObjExpr` and `ObjMethod`.
22+
23+
Note: Do not confuse `NewInstanceExpr` with `NewExpr` -- the latter represents a `new` form that creates an instance of a class; that falls under host platform interop.
24+
25+
## Out of control
26+
27+
If you look at the code for these classes and try to track the flow of data, well, good luck to you. It is a pit of despair.
28+
We are going to ignore completely (for now, at least) those grungy details. You will be able to get the details in the upcoming [C4: Out of control][TBD] post.
29+
30+
Just know that as we parse the forms in the body of the function, the parser is not only creating the AST node at that point but also pushing information into the `FnMethod` and `FnExpr` instances being built.
31+
32+
## What's inside
33+
34+
I'll focus on `FnExpr` and `FnMethod` here. Most of this analysis applies to `NewInstanceExpr` and `NewInstanceMethod` as well.
35+
36+
We parse an individual `invoke` method of the function (one arity of the `fn*` definition) as follows:
37+
38+
- Determine the return type of the method. This is typically `object`, but can be a more specific type if type hints are present. The only primitive types supported are `long` and `double`.
39+
40+
- Process the parameters. This includes:
41+
- Determine the number of fixed parameters and whether there is a rest parameter (at most one is allowed).
42+
- Determine the type of each parameter. This is typically `object`, but can be a more specific type if type hints are present. The only primitive types supported are `long` and `double`.
43+
- Create a list of `LocalBinding` instances, one for each parameter. These are used to represent the parameters in the body of the method.
44+
- Process the body of the method. This involves parsing the forms in the body and generating an abstract syntax tree (AST) for the body of the method; this AST has a `BodyExpr` as the root. The context of this parsing is the `FnExpr` itself, as well as the `FnMethod` being generated.
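
As a rough illustration of the inputs to these steps, here are two definitions whose parameter vectors exercise the cases listed above: primitive hints on the parameters and the return value, and a fixed parameter followed by a rest parameter. The names `area` and `tally` are made up for this sketch and do not appear in the compiler source.

```clojure
;; Hypothetical examples; the names are not from the compiler source.

;; Two fixed parameters, both hinted ^double, plus a ^double return hint.
;; long and double are the only primitive hints accepted here.
(defn area ^double [^double w ^double h]
  (* w h))

;; One fixed parameter plus a rest parameter introduced by `&`.
;; Extra arguments arrive as a sequence bound to `nums`.
(defn tally [label & nums]
  (str label (count nums)))

(area 2.0 3.0)      ; => 6.0
(tally "n=" 1 2 3)  ; => "n=3"
```

Going by the earlier description of the base classes, the second definition would presumably be the kind that derives from `RestFn` rather than `AFunction`; I have not shown its generated code here.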
45+
46+
In addition to its `BodyExpr`, the `FnMethod` object holds a list of local bindings. This list is consulted when we are trying to resolve symbols occurring in the body code. This list will include the function parameters. However, special forms that define local bindings--`let*`, `letfn*`, and `try` (in its `catch` forms)--piggy-back on the `FnMethod` by adding their local bindings to the local bindings of the method. (In fact, if you try to evaluate a `let*` form not in the body of a function, it wraps itself in an anonymous function and parses that, so that it has an `FnMethod` to manage its local bindings.)
47+
48+
What information from the parsing of the body is collected in the `FnExpr` object we are building? The class we will ultimately define to represent the function we are defining is fairly simple: various flavors of `invoke` methods and static fields holding data needed by those methods. And a constructor. The information on the static fields is contributed by the forms parsed in the bodies of the methods.
49+
50+
The `ObjExpr` class maintains the following collections:
51+
52+
- `Constants`: a list of all the constants (literal values) used in the function. These are contributed by nodes of type `ConstantExpr`, `NumberExpr` (when not `long` or `double`), `KeywordExpr`, and any references to `Var`s (when the symbol is not resolved to a local binding). Any constant that is needed in the final code generation will be stored in static fields in the generated class.
53+
- `Closes`: a map of local bindings that are closed over by the methods. These are references to local bindings from outer scopes.
54+
- `KeywordCallsites`: places where keywords are used as functions, as in `(:key map)`.
55+
- `ProtocolCallsites`: places where protocol methods are called.
56+
57+
58+
`Constants` contributes static fields to the function class. `Closes` defines the values needed for the constructor of the function class. `KeywordCallsites` and `ProtocolCallsites` also contribute static fields. We'll discuss keyword callsites in [C4: Key in-site][TBD]. We'll discuss protocol callsites in [C4: Is there a protocol for that?][TBD].
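
To make the bookkeeping a little more concrete, here is a small sketch of a function whose inner `fn` touches several of these collections at once. The name `make-lookup` is hypothetical, and the comments only restate the rules described above; they are not taken from compiler output.

```clojure
;; Hypothetical example; `make-lookup` is not from the original post.
(defn make-lookup [default]
  (fn [m]
    ;; `str`     -> a Var reference, so a candidate for Constants
    ;; `:name`   -> a keyword constant, used in function position
    ;;              (a keyword callsite)
    ;; `default` -> a local of the enclosing function, so a candidate for Closes
    ;; `m`       -> a parameter, resolved through the method's local bindings
    (str (or (:name m) default))))

((make-lookup "nobody") {:name "Ann"})  ; => "Ann"
((make-lookup "nobody") {})             ; => "nobody"
```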
59+
60+
Let's look at some examples.
61+
62+
## Basic examples
63+
64+
For the examples, we will show the Clojure code, then the generated code. The generated code is the decompilation into C# of the IL generated by ClojureCLR. Thanks go to ILSpy.
65+
66+
Let's start with a simple function:
67+
68+
```clojure
69+
(defn f1 [x] (str x))
70+
```
71+
72+
We will give our examples as `defn` forms for convenience. However, this obscures a few important details. The `defn` form expands to a `def` of a `fn` form.
73+
74+
```clojure
75+
(def f1 (clojure.core/fn ([x] (str x))))
76+
```
77+
78+
The `clojure.core/fn` itself is a macro that expands to an `fn*` form; in this case, it has the same body as shown here. The parser for `def` creates a context for parsing the `fn*` form that provides the name `f1` for the function. Parsing the `fn*` generates a class that holds the definition of the function. (That is actually a side-effect of parsing; even if the parse fails, you have a class floating around. What fun.)
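
One way to check these expansions for yourself is to expand one macro level at a time at the REPL. This is only a sketch: the var names `defn-expansion` and `fn-expansion` are made up here, and the exact shape of the expanded forms may differ slightly between versions, so the comments look only at the leading symbols.

```clojure
;; Expand a single macro level; nothing is defined or compiled by this.
(def defn-expansion (macroexpand-1 '(defn f1 [x] (str x))))
(def fn-expansion   (macroexpand-1 '(fn [x] (str x))))

(first defn-expansion)   ; the symbol def
(second defn-expansion)  ; the symbol f1
(first fn-expansion)     ; the symbol fn*
```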
79+
80+
The code generated for the `def` itself is roughly this:
81+
82+
```C#
83+
RT.var("test.compiler", "f1").bindRoot(new compiler$f1());
84+
```
85+
86+
In other words, find the `Var` for `test.compiler/f1` and bind it to a new instance of the class `compiler$f1`, which is the class generated for the `fn*` form. Our interest here is the generated class. We'll talk more about how the code for the `def` initialization is generated and used during loading in [C4: Some assembly required][TBD].
87+
88+
Here is the generated code for the class `compiler$f1`. My comments are interposed.
89+
90+
```C#
91+
// AFunction is one of the standard base classes for function implementations.
92+
public class compiler$f1 : AFunction
93+
{
94+
// There is only one constant reference in the body: The Var for `clojure.core/str`.
95+
// We create a static field to hold that and initialize it in the static constructor.
96+
protected internal static Var const__0;
97+
98+
static compiler$f1()
99+
{
100+
const__0 = RT.var("clojure.core", "str");
101+
}
102+
103+
// There is only one arity defined: 1 parameter, no '& arg'.
104+
// Thus we need only a single `invoke` method.
105+
// This function allows direct linking.
106+
// For such a function, any invoke method will delegate to a static method of the same signature.
107+
public override object invoke(object P_0)
108+
{
109+
return invokeStatic(P_0);
110+
}
111+
112+
// This essentially is: (str x).
113+
public static object invokeStatic(object P_0)
114+
{
115+
return ((IFn)const__0.getRawRoot()).invoke(P_0);
116+
}
117+
118+
// Not in the JVM version, but CLR needs this for certain operations.
119+
// We support only arity 1.
120+
public override bool HasArity(int P_0)
121+
{
122+
if (P_0 != 1)
123+
{
124+
return false;
125+
}
126+
return true;
127+
}
128+
}
129+
```
130+
131+
Now would be a good time to review [C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-09-04-functional-anatomy %}) to understand the base class `AFunction`, details on direct linking, and other background information.
132+
133+
If we have several arities, we get multiple `invoke` and `invokeStatic` methods. For example:
134+
135+
```clojure
136+
(defn f3
137+
([] (f3 1))
138+
([x] (f3 x 2))
139+
([x y] (str x y)))
140+
```
141+
142+
We also have some self-reference here. The generated code is:
143+
144+
```C#
145+
public class compiler$f3 : AFunction
146+
{
147+
// We have some additional constants.
148+
protected internal static Var const__0;
149+
protected internal static object const__1;
150+
protected internal static object const__2;
151+
protected internal static Var const__3;
152+
153+
static compiler$f3()
154+
{
155+
const__0 = RT.var("test.compiler", "f3");
156+
const__1 = 1L; // Note: implicit boxing
157+
const__2 = 2L; // Note: implicit boxing
158+
const__3 = RT.var("clojure.core", "str");
159+
}
160+
161+
// The invoke methods each delegate to the corresponding invokeStatic method.
162+
public override object invoke() => invokeStatic();
163+
public override object invoke(object P_0) => invokeStatic(P_0);
164+
public override object invoke(object P_0, object P_1) => invokeStatic(P_0, P_1);
165+
166+
// This is essentially: (f3 1)
167+
// Note that we need to box the long value 1.
168+
// So that we only box once, we have a static field holding the boxed value.
169+
public static object invokeStatic()
170+
{
171+
return ((IFn)const__0.getRawRoot()).invoke(const__1);
172+
}
173+
174+
// Similarly: (f3 x 2)
175+
public static object invokeStatic(object P_0)
176+
{
177+
return ((IFn)const__0.getRawRoot()).invoke(P_0, const__2);
178+
}
179+
180+
// This is essentially: (str x y)
181+
public static object invokeStatic(object P_0, object P_1)
182+
{
183+
return ((IFn)const__3.getRawRoot()).invoke(P_0, P_1);
184+
}
185+
186+
// We support arities 0, 1, and 2.
187+
public override bool HasArity(int P_0)
188+
{
189+
if (P_0 != 2 && P_0 != 1 && P_0 != 0)
190+
{
191+
return false;
192+
}
193+
return true;
194+
}
195+
}
196+
```
197+
198+
## Primitive typing
199+
200+
If we have type hints for our arguments or return type and either a `long` or `double` type hint is involved, there are additional interfaces implemented and additional methods generated. For example:
201+
202+
```clojure
203+
(defn t
204+
(^long [^String x ^double y] (long (+ (double (count x)) y))))
205+
```
206+
207+
We generate:
208+
209+
```C#
210+
// Note the additional interface ODL.
211+
// This is for the signature (object, double) -> long.
212+
// Note that Object is specified instead of String.
213+
// Reference types are always reduced to Object in the signatures.
214+
// It actually is valid Clojure to pass any reference type instance as the first argument.
215+
// The main purpose here is to avoid boxing of the double argument and the long return if possible.
216+
217+
public class compiler$t : AFunction, ODL
218+
{
219+
// No static fields needed.
220+
// calls to long, double and count are inlined by the compiler.
221+
static compiler$t()
222+
{
223+
}
224+
225+
// The invoke method still defers to the invokeStatic method.
226+
// However, we cast the second parameter to double because that is the required type.
227+
public override object invoke(object P_0, object P_1)
228+
{
229+
return invokeStatic(P_0, RT.doubleCast(P_1));
230+
}
231+
232+
// Our invokeStatic has the designated signature.
233+
// Direct linking to this method can avoid boxing the double argument and the long return value.
234+
public static long invokeStatic(object P_0, double P_1)
235+
{
236+
return RT.longCast((double)RT.count(P_0) + P_1);
237+
}
238+
239+
// This is the additional method needed for the ODL interface.
240+
public override long invokePrim(object P_0, double P_1)
241+
{
242+
return invokeStatic(P_0, P_1);
243+
}
244+
245+
public override bool HasArity(int P_0)
246+
{
247+
if (P_0 != 2)
248+
{
249+
return false;
250+
}
251+
return true;
252+
}
253+
}
254+
```
255+
256+
Code calling this function can use the `ODL` interface to avoid boxing the `double` argument and the `long` return value. This can lead to improved performance in scenarios where these functions are called frequently or in tight loops.
257+
258+
For example, the code:
259+
260+
```clojure
261+
(defn ut [x y] (t (str x) (double y)))
262+
```
263+
compiles to a class with this `invokeStatic` method:
264+
265+
```C#
266+
public static object invokeStatic(object P_0, object P_1)
267+
{
268+
return ((ODL)const__0.getRawRoot())
269+
.invokePrim(((IFn)const__1.getRawRoot()).invoke(P_0),
270+
RT.doubleCast(P_1));
271+
}
272+
```
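
Seen from the Clojure side, none of this machinery is visible; the function just behaves as written. The sketch below repeats the definition of `t` and calls it twice. The second call passes a vector where the hint says `String`, which relies on the point made in the comments above: reference-type hints are reduced to `Object` in the signature, and `count` accepts any countable collection.

```clojure
(defn t
  (^long [^String x ^double y] (long (+ (double (count x)) y))))

;; count is 3, 3.0 + 1.5 = 4.5, and long truncates to 4.
(t "abc" 1.5)    ; => 4

;; The ^String hint does not restrict the argument type.
(t [1 2 3] 1.5)  ; => 4
```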
273+
274+
## Closures
275+
276+
If a function closes over local bindings from an outer scope, those local bindings are passed to the constructor of the function class and stored in instance fields. For example, consider a very silly function that returns a function that concatenates its argument to a designated string:
277+
278+
```clojure
279+
(defn h [x] (fn [y] (str x y)))
280+
```
281+
282+
Looking first at the generated code for the inner function:
283+
284+
```C#
285+
public class compiler$hfn__4264__4268 : AFunction
286+
{
287+
// An instance field to hold the closed-over binding value.
288+
public object x;
289+
290+
// The constant for the Var `clojure.core/str`.
291+
protected internal static Var const__0;
292+
static compiler$hfn__4264__4268()
293+
{
294+
const__0 = RT.var("clojure.core", "str");
295+
}
296+
297+
// There is no invokeStatic method.
298+
// A function with closed-over bindings cannot be directly linked,
299+
// because a static method cannot see the instance field holding `x`.
300+
// The invoke method must do the work.
301+
public override object invoke(object P_0)
302+
{
303+
return ((IFn)const__0.getRawRoot()).invoke(x, P_0);
304+
}
305+
306+
public override bool HasArity(int P_0)
307+
{
308+
if (P_0 != 1)
309+
{
310+
return false;
311+
}
312+
return true;
313+
}
314+
315+
// HERE is the secret sauce.
316+
// The constructor takes the closed-over binding as a parameter.
317+
public compiler$hfn__4264__4268(object P_0)
318+
{
319+
x = P_0;
320+
}
321+
}
322+
```
323+
324+
Classes for functions that do not close over outer-scope bindings need only a no-argument constructor. Here, we require a constructor that takes the closed-over binding `x` as a parameter and stores it in an instance field.
325+
326+
The outer function `h` generates the following code:
327+
328+
```C#
329+
public static object invokeStatic(object P_0)
330+
{
331+
return new compiler$hfn__4264__4268(P_0);
332+
}
333+
```
334+
335+
When we call `(h 12)`, say, we get back a new instance of the inner function class, with its `x` field set to `12`.
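
A short sketch of what that looks like in use (the names `h12` and `hab` are made up for this example): each call to `h` produces a distinct instance carrying its own `x`, while both instances share the single class generated for the inner `fn`.

```clojure
(defn h [x] (fn [y] (str x y)))

;; Two closures from the same inner fn, each with its own captured x.
(def h12 (h 12))
(def hab (h "ab"))

(h12 34)                     ; => "1234"
(hab "c")                    ; => "abc"

;; Distinct instances of one generated class.
(identical? h12 hab)         ; => false
(= (class h12) (class hab))  ; => true
```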
336+
337+
338+
## Direct linking
339+
340+
Direct linking is a performance optimization that allows calls to functions to bypass some of the overhead of the general function call mechanism. It is described in [C4: Functional anatomy]({{site.baseurl}}{% post_url 2025-09-04-functional-anatomy %}).
341+
342+
The examples above were compiled with direct linking turned off; I wanted to show the generation of `Var` static fields in the function classes. When compiled with direct linking turned on, the class for `f1`
343+
344+
```C#
345+
public class compiler$f1 : AFunction
346+
{
347+
// a static field for #'clojure.core/str
348+
protected internal static Var const__0;
349+
350+
static compiler$f1()
351+
{
352+
const__0 = RT.var("clojure.core", "str");
353+
}
354+
355+
// Lookup of Var value required
356+
public static object invokeStatic(object P_0)
357+
{
358+
return ((IFn)const__0.getRawRoot()).invoke(P_0);
359+
}
360+
361+
// ...
362+
}
363+
364+
```
365+
366+
becomes
367+
368+
```C#
369+
public class compiler$f1 : AFunction
370+
{
371+
// no static fields needed
372+
static compiler$f1()
373+
{
374+
}
375+
376+
public static object invokeStatic(object P_0)
377+
{
378+
return core$str.invokeStatic(P_0);
379+
}
380+
381+
// ...
382+
}
383+
```
384+
385+
Removing the `Var` value lookup measurably improves performance.
386+
387+
That's enough.
