Skip to content

Add call frame information (CFI) annotations #214

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from
Draft

Conversation

jargh
Copy link
Contributor

@jargh jargh commented Mar 26, 2025

Issue #, if available: #197

Description of changes:

This includes call frame information annotations in all the code, roughly following the explanation here: https://www.imperialviolet.org/2017/01/18/cfi.html

The change is made for both ARM and x86, largely by wrapping certain relevant instructions in macros that augment the core instruction with appropriate CFI directives, e.g. for ARM

#define CFI_PUSH2(lo,hi) stp lo, hi, [sp, #-16]! ; .cfi_adjust_cfa_offset 16 ; .cfi_rel_offset lo, 0 ; .cfi_rel_offset hi, 8

and for x86

#define CFI_PUSH(reg) push reg ; .cfi_adjust_cfa_offset 8 ; .cfi_rel_offset reg, 0

As part of this, two other systematic changes are made:

  • All local labels are prefixed with L to make their local-ness explicit
  • The internal header is split into separate ones for arm, x86 and x86_att

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jargh and others added 12 commits March 21, 2025 22:41
This is the start of a transformation that will annotate the code
properly with call frame information (CFI) directives, a lack in
the current s2n-bignum codebase noted in

  awslabs#197

So far, this picks the principal CFI-relevant instructions such
as calls and returns and those that modify the stack pointer,
and simply wraps the existing instructions inside macros that
can then later be augmented with CFI directives. The definitions
of the macros in "include/_internal_s2n_bignum.h" show those
instructions that are wrapped, for ARM and/or x86:

    both    CFI_START            <marker for entry point>
    both    CFI_RET              ret

    ARM     CFI_BL(target)       bl  target
    ARM     CFI_PUSH2(lo,hi)     stp  lo, hi, [sp, #-16]!
    ARM     CFI_POP2(lo,hi)      ldp  lo, hi, [sp], awslabs#16
    ARM     CFI_INC_SP(offset)   add  sp, sp, #(offset)
    ARM     CFI_DEC_SP(offset)   sub  sp, sp, #(offset)

    x86     CFI_CALL(target)     call target
    x86     CFI_PUSH(reg)        push reg
    x86     CFI_POP(reg)         pop  reg
    x86     CFI_INC_RSP(offset)  add  rsp, offset
    x86     CFI_DEC_RSP(offset)  sub  rsp, offset
The original script was a bit too stupid and changed these too.
For a relatively small number of functions, the Windows ABI form
literally calls the standard ABI form as a subroutine. For these
we add CFI_START to the start of the standard subroutine as well,
within the scope of the Windows ABI "#if ... #endif" block.
This adds directives for where the callee-saved registers are
directly stored onto the stack at an offset without modifying
the stack pointer. Not very many wrappers are written in this
style, but a few are.
Also fix up functions with multiple exits so we only wrap the
final ret as a CFI_RET.
This mostly seems to work as expected, but the combination of saving
and restoring registers in the Windows ABI form and having multiple
exit points seemed to make x86/generic/bignum_add.S need a manual
reset of the CFA offset after the first return.
This fixes an issue assembling the CFI-ful files with recent Mac OS
assemblers, which objected to labels inside the CFI blocks unless
explicitly stated to be local. The error messages are not very clear,
but this seems to be the issue that is here attributed to an LLVM17
change:

  https://gitlab.winehq.org/wine/wine/-/merge_requests/4547
This involves trifurcating the "_internal_s2n_bignum.h" header
into three different versions with _arm, _x86 and _x86_att suffixes.
This is useful in the _x86 and _x86_att cases to distinguish the
definitions of the CFI adjustment macros, and for _arm to eliminate
irrelevant CET tests.
This got missed earlier because of the space before it and then
not caught because it was only relevant for the Windows ABI.
jargh added 3 commits March 26, 2025 15:39
These files were just forgotten earlier. Although this code is
not part of the final build, it is used as a stepping-stone in
the proofs.
This is done via a macro S2N_BN_FUNCTION_TYPE_DIRECTIVE so that
it can be disabled on Apple, since the native Mach-O format does
not seem to use any analogous directive. Otherwise this unpacks
to the usual directive of the form (example for "bignum_mul")

  .type bignum_mul, %function
This seems to matter for some versions of objdump, which can affect
the ctCheck test.
@briansmith
Copy link

Have you tried using objtool to verify the annotations? See https://github.com/torvalds/linux/blob/3a90a72aca0a98125f0c7350ffb7cc63665f8047/tools/objtool/Documentation/objtool.txt for documentation on it. My understanding from reading the LKML is that it was expressly designed to support non-Kernel use cases too.

@jargh
Copy link
Contributor Author

jargh commented Mar 29, 2025

Thanks, objtool is very interesting, and I didn't know about it at all. I've built it and started to play around with it on this CFI-enabled s2n-bignum code. If I try running objtool -s x86/generic/bignum_mul.o I get:

x86/generic/bignum_mul.o: warning: objtool: bignum_mul() is missing an ELF size annotation

That happens on many (all?) functions. A further 28 have errors like this (presumably they are those with embedded function calls).

x86/curve25519/bignum_sqrt_p25519.o: warning: objtool: .text+0x89: unannotated intra-function call

I'm still not quite sure about whether I'm using the appropriate options nor how to interpret the output, but one encouraging difference is that the s2n-bignum mainline functions (not this CFI-enabled branch) generate zillions of errors with pretty much all instructions considered unreachable. So the CFI annotations have made a difference of some sort :-)

@torben-hansen
Copy link
Contributor

Btw, this change is currently not workable in AWS-LC's FIPS build mode. The delocator is choking.

As a quick sniff test. I took the following files, copied them into AWS-LC:

  • x86_att/curve25519/edwards25519_scalarmulbase.S, and
  • include/_internal_s2n_bignum_x86_att.h.

I then ran a typical FIPS build:

$ cmake3 -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$(pwd)/../awslc-install -DBUILD_SHARED_LIBS=OFF -DFIPS=1 ..
$ ninja-build
[...]
[2/242] Generating bcm-delocated.S                                                                                           
FAILED: crypto/fipsmodule/bcm-delocated.S /home/ec2-user/humble/aws-lc/awslc-build/crypto/fipsmodule/bcm-delocated.S         
[...]                                                              
error while parsing "curve25519/edwards25519_scalarmulbase.S.S": 
parse error near WS (line 47 symbol 24 - line 47 symbol 25):
"  .cfi_adjust_cfa_offset ((15*32 +8))\n\n\n\n    "

I don't think it will be terribly hard to support, but it needs to be added. The PEG parser already contains some directive argument support:

Directive <- '.' DirectiveName (WS Args)?
DirectiveName <- [[A-Z0-9_]]+
[...[
Args <- Arg ((WS? ',' WS?) Arg)*
Arg <- QuotedArg / [[0-9a-z%+\-*_@.]]*

@aqjune-aws
Copy link
Collaborator

I wonder whether 'canonicalizing' an .S file by (1) compiling it into .o using an assembler and (2) disassembling it through objdump and (3) using the disassembled file is supposed to pass the delocator.
I suppose delocator is failing when one is using a kind of exotic syntax; if delocator is supposed to accept syntax printed from objdump, we can reduce the complexity of this problem.

@briansmith
Copy link

Btw, this change is currently not workable in AWS-LC's FIPS build mode.

See also https://issues.chromium.org/issues/42290158.

@briansmith
Copy link

x86/generic/bignum_mul.o: warning: objtool: bignum_mul() is missing an ELF size annotation

Look at this fragment of BoringSSL assembly code:

.globl	Whatever
.type	Whatever,\@abi-omnipotent
.align	16
Whatever:
.cfi_startproc
	_CET_ENDBR
	ret
.cfi_endproc
.size	Whatever, .-Whatever

It looks like your assembly is missing the .size line. . means "the current address" so ".-Whatever" means "the difference between the current address and the address of the Whatever label," i.e. the size of the function. OpenSSL had these .size directives even before it added CFI.

Probably you want to create a macro similar to the Linux kernel's SYM_FUNC_END(name) macro that is used at the end of every function, which would do this .size directive and other boilerplate.

@briansmith
Copy link

briansmith commented Mar 31, 2025

x86/curve25519/bignum_sqrt_p25519.o: warning: objtool: .text+0x89: unannotated intra-function call

You can see how somebody fixed it here, in the kernel: https://lore.kernel.org/linuxppc-dev/[email protected]/. Basically they used the SYM_FUNC_START_LOCAL(name) macro at the start of the function and SYM_FUNC_END(name) at the end. Those two macros expand to some CFI-related metadata.

@torben-hansen
Copy link
Contributor

I wonder whether 'canonicalizing' an .S file by (1) compiling it into .o using an assembler and (2) disassembling it through objdump and (3) using the disassembled file is supposed to pass the delocator. I suppose delocator is failing when one is using a kind of exotic syntax; if delocator is supposed to accept syntax printed from objdump, we can reduce the complexity of this problem.

Maybe(?) But then you assume objdump is available. I don't think this is too hard to fix. Probably just have to extend PEG to understand the argument.

@jargh
Copy link
Contributor Author

jargh commented Apr 1, 2025

Thanks for the interpretation of the objtool errors and the advice on fixing them; I'll give that a try.

I've not been successful in finding an objtool that works for aarch64 - any ideas there? I did find this but it seems to be work in progress:https://lwn.net/Articles/893115/. if nothing else I can use the errors that pop up on x86 as guidance for analogous things on ARM.

@@ -44,6 +47,7 @@

S2N_BN_SYMBOL(bignum_cmul_p25519):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here. Is the idea that bignum_cmul_p25519 and bignum_cmul_p25519_alt are the same function, i.e. one is an alias for the other? I wonder if this pattern is allowed by all ABIs and/or if it might be the source of some of the issues here. In OpenSSL when they have such aliases, the alias is implemented as a whole separate function that jmps to the function it is aliasing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were initially trying to support a uniform naming scheme for both ARM and x86. The bifurcation into "alt" and "non-alt" arises when for at least one architecture there is value in different variants with different microarchitectural targets. Cases like this where the same function has both names reflect those where on the other architecture (only) there is value in two versions, but on this architecture it didn't seem worth it. Currently, consumers like AWS-LC handle the dispatch according to uarch. Now, we no longer quite maintain the uniformity of this naming scheme since there are a few extra functions for ARM, so perhaps we should revisit this.

// Variants of instructions including CFI (call frame information) annotations

#define CFI_START .cfi_startproc
#define CFI_RET retq ; .cfi_endproc

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why CFI_RET is used instead of just keeping normal ret and then defining a CFI_END or similar?

Why combine ret and .cfi_endproc with a macro but not .cfi_startproc and _CET_ENDBR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was attracted to wrapping things up into a unified single pseudo-instruction on the grounds that it perhaps makes it less likely that code and annotations will get out of step. I agree the current approach is not quite consistent in how it does it, though; as for _CET_ENDBR, that arose in a different PR and only applies to x86, so I'd not been thinking of it in combination, but indeed it could make sense to unify it as an "all the initial boilerplate for a function" pseudo-instruction. One slight difference is that I'm also using .cfi_startproc at internal subroutines, but those don't have the ENDBR decorations since they are only the target of internal direct calls. But I may have to think again about what CFI annotations are appropriate for internal subroutines anyway since that was one of the things that objtool was complaining about.

push r12
push r13
push r14
push r15
Copy link

@briansmith briansmith Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed much of the SM2 code uses rbp as a GP register whereas almost none of the other code seems to? I didn't look at the SM2 code but I suspect that wherever RBP is changed, more work needs to be done. [Edit: nevermind.]


subq $NSPACE, %rsp
CFI_DEC_RSP(NSPACE)

movq %rdx, %rbp

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one of the few places outside of SM2-specific code where RBP is changed, AFAICT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think there are many more RBP changes, since very often it's hidden under macros. If you look at the objdump output and MAYCHANGE fields in the proof, you can see them all. Try for example:
grep define x86/*/*.S | grep rbp$
In particular, it's often used as a zero for ADCX/ADOX which do not allow immediate arguments.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I think I overlooked the other uses because it is referenced as ebp elsewhere, presumably as an optimization (e.g. xor ebp, ebp).

In any case, I seems like I was misremembering, since CFI will use rsp as the default base instead of rbp like I thought, so AFAICT we don't have to pay any special attention to rbp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes indeed, there is also the ebp optimization. This codebase started with NASM, which makes a few optimizations such as xor ebp, ebp for you automatically if you write rbp everywhere. But we made it explicit when switching to the GNU assembler (or its clang counterpart).

#define CFI_START .cfi_startproc
#define CFI_RET ret ; .cfi_endproc

#define CFI_BL(target) bl target

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these function call wrapper macros? it seems like they don't do anything interesting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's true - I must have thought there would be some annotation to add later. I am still wondering if I might need something to pacify objtool, as suggested in its documentation, but perhaps the real issue is that the subroutine itself isn't properly annotated or is not considered as a distinct function from a CFI point of view (as was, provisionally at least, intended). I'll have to look more at the pointers you sent above.

  1. file.o: warning: unannotated intra-function call

    This warning means that a direct call is done to a destination which
    is not at the beginning of a function. If this is a legit call, you
    can remove this warning by putting the ANNOTATE_INTRA_FUNCTION_CALL
    directive right before the call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants