-
Notifications
You must be signed in to change notification settings - Fork 30
Add call frame information (CFI) annotations #214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is the start of a transformation that will annotate the code properly with call frame information (CFI) directives, a lack in the current s2n-bignum codebase noted in awslabs#197 So far, this picks the principal CFI-relevant instructions such as calls and returns and those that modify the stack pointer, and simply wraps the existing instructions inside macros that can then later be augmented with CFI directives. The definitions of the macros in "include/_internal_s2n_bignum.h" show those instructions that are wrapped, for ARM and/or x86: both CFI_START <marker for entry point> both CFI_RET ret ARM CFI_BL(target) bl target ARM CFI_PUSH2(lo,hi) stp lo, hi, [sp, #-16]! ARM CFI_POP2(lo,hi) ldp lo, hi, [sp], awslabs#16 ARM CFI_INC_SP(offset) add sp, sp, #(offset) ARM CFI_DEC_SP(offset) sub sp, sp, #(offset) x86 CFI_CALL(target) call target x86 CFI_PUSH(reg) push reg x86 CFI_POP(reg) pop reg x86 CFI_INC_RSP(offset) add rsp, offset x86 CFI_DEC_RSP(offset) sub rsp, offset
The original script was a bit too stupid and changed these too.
For a relatively small number of functions, the Windows ABI form literally calls the standard ABI form as a subroutine. For these we add CFI_START to the start of the standard subroutine as well, within the scope of the Windows ABI "#if ... #endif" block.
This adds directives for where the callee-saved registers are directly stored onto the stack at an offset without modifying the stack pointer. Not very many wrappers are written in this style, but a few are.
Also fix up functions with multiple exits so we only wrap the final ret as a CFI_RET.
This mostly seems to work as expected, but the combination of saving and restoring registers in the Windows ABI form and having multiple exit points seemed to make x86/generic/bignum_add.S need a manual reset of the CFA offset after the first return.
This fixes an issue assembling the CFI-ful files with recent Mac OS assemblers, which objected to labels inside the CFI blocks unless explicitly stated to be local. The error messages are not very clear, but this seems to be the issue that is here attributed to an LLVM17 change: https://gitlab.winehq.org/wine/wine/-/merge_requests/4547
This involves trifurcating the "_internal_s2n_bignum.h" header into three different versions with _arm, _x86 and _x86_att suffixes. This is useful in the _x86 and _x86_att cases to distinguish the definitions of the CFI adjustment macros, and for _arm to eliminate irrelevant CET tests.
This got missed earlier because of the space before it and then not caught because it was only relevant for the Windows ABI.
These files were just forgotten earlier. Although this code is not part of the final build, it is used as a stepping-stone in the proofs.
This is done via a macro S2N_BN_FUNCTION_TYPE_DIRECTIVE so that it can be disabled on Apple, since the native Mach-O format does not seem to use any analogous directive. Otherwise this unpacks to the usual directive of the form (example for "bignum_mul") .type bignum_mul, %function
This seems to matter for some versions of objdump, which can affect the ctCheck test.
Have you tried using |
Thanks,
That happens on many (all?) functions. A further 28 have errors like this (presumably they are those with embedded function calls).
I'm still not quite sure about whether I'm using the appropriate options nor how to interpret the output, but one encouraging difference is that the s2n-bignum mainline functions (not this CFI-enabled branch) generate zillions of errors with pretty much all instructions considered unreachable. So the CFI annotations have made a difference of some sort :-) |
Btw, this change is currently not workable in AWS-LC's FIPS build mode. The delocator is choking. As a quick sniff test. I took the following files, copied them into AWS-LC:
I then ran a typical FIPS build:
I don't think it will be terribly hard to support, but it needs to be added. The PEG parser already contains some directive argument support:
|
I wonder whether 'canonicalizing' an .S file by (1) compiling it into .o using an assembler and (2) disassembling it through |
|
Look at this fragment of BoringSSL assembly code:
It looks like your assembly is missing the Probably you want to create a macro similar to the Linux kernel's |
You can see how somebody fixed it here, in the kernel: https://lore.kernel.org/linuxppc-dev/[email protected]/. Basically they used the |
Maybe(?) But then you assume |
Thanks for the interpretation of the I've not been successful in finding an |
@@ -44,6 +47,7 @@ | |||
|
|||
S2N_BN_SYMBOL(bignum_cmul_p25519): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused here. Is the idea that bignum_cmul_p25519
and bignum_cmul_p25519_alt
are the same function, i.e. one is an alias for the other? I wonder if this pattern is allowed by all ABIs and/or if it might be the source of some of the issues here. In OpenSSL when they have such aliases, the alias is implemented as a whole separate function that jmp
s to the function it is aliasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were initially trying to support a uniform naming scheme for both ARM and x86. The bifurcation into "alt" and "non-alt" arises when for at least one architecture there is value in different variants with different microarchitectural targets. Cases like this where the same function has both names reflect those where on the other architecture (only) there is value in two versions, but on this architecture it didn't seem worth it. Currently, consumers like AWS-LC handle the dispatch according to uarch. Now, we no longer quite maintain the uniformity of this naming scheme since there are a few extra functions for ARM, so perhaps we should revisit this.
// Variants of instructions including CFI (call frame information) annotations | ||
|
||
#define CFI_START .cfi_startproc | ||
#define CFI_RET retq ; .cfi_endproc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why CFI_RET is used instead of just keeping normal ret
and then defining a CFI_END
or similar?
Why combine ret
and .cfi_endproc
with a macro but not .cfi_startproc
and _CET_ENDBR
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was attracted to wrapping things up into a unified single pseudo-instruction on the grounds that it perhaps makes it less likely that code and annotations will get out of step. I agree the current approach is not quite consistent in how it does it, though; as for _CET_ENDBR
, that arose in a different PR and only applies to x86, so I'd not been thinking of it in combination, but indeed it could make sense to unify it as an "all the initial boilerplate for a function" pseudo-instruction. One slight difference is that I'm also using .cfi_startproc
at internal subroutines, but those don't have the ENDBR
decorations since they are only the target of internal direct calls. But I may have to think again about what CFI annotations are appropriate for internal subroutines anyway since that was one of the things that objtool
was complaining about.
push r12 | ||
push r13 | ||
push r14 | ||
push r15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed much of the SM2 code uses [Edit: nevermind.]rbp
as a GP register whereas almost none of the other code seems to? I didn't look at the SM2 code but I suspect that wherever RBP is changed, more work needs to be done.
|
||
subq $NSPACE, %rsp | ||
CFI_DEC_RSP(NSPACE) | ||
|
||
movq %rdx, %rbp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is one of the few places outside of SM2-specific code where RBP is changed, AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think there are many more RBP
changes, since very often it's hidden under macros. If you look at the objdump
output and MAYCHANGE
fields in the proof, you can see them all. Try for example:
grep define x86/*/*.S | grep rbp$
In particular, it's often used as a zero for ADCX
/ADOX
which do not allow immediate arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I think I overlooked the other uses because it is referenced as ebp
elsewhere, presumably as an optimization (e.g. xor ebp, ebp
).
In any case, I seems like I was misremembering, since CFI will use rsp
as the default base instead of rbp
like I thought, so AFAICT we don't have to pay any special attention to rbp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes indeed, there is also the ebp
optimization. This codebase started with NASM, which makes a few optimizations such as xor ebp, ebp
for you automatically if you write rbp
everywhere. But we made it explicit when switching to the GNU assembler (or its clang counterpart).
#define CFI_START .cfi_startproc | ||
#define CFI_RET ret ; .cfi_endproc | ||
|
||
#define CFI_BL(target) bl target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these function call wrapper macros? it seems like they don't do anything interesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's true - I must have thought there would be some annotation to add later. I am still wondering if I might need something to pacify objtool
, as suggested in its documentation, but perhaps the real issue is that the subroutine itself isn't properly annotated or is not considered as a distinct function from a CFI point of view (as was, provisionally at least, intended). I'll have to look more at the pointers you sent above.
file.o: warning: unannotated intra-function call
This warning means that a direct call is done to a destination which
is not at the beginning of a function. If this is a legit call, you
can remove this warning by putting the ANNOTATE_INTRA_FUNCTION_CALL
directive right before the call.
fixing a few automerge failures for the combination of CFI updates and the __LF change to instruction separators.
This adopts the change awslabs#230 but for the newly bifurcated structure of the include file(s).
Issue #, if available: #197
Description of changes:
This includes call frame information annotations in all the code, roughly following the explanation here: https://www.imperialviolet.org/2017/01/18/cfi.html
The change is made for both ARM and x86, largely by wrapping certain relevant instructions in macros that augment the core instruction with appropriate CFI directives, e.g. for ARM
#define CFI_PUSH2(lo,hi) stp lo, hi, [sp, #-16]! ; .cfi_adjust_cfa_offset 16 ; .cfi_rel_offset lo, 0 ; .cfi_rel_offset hi, 8
and for x86
#define CFI_PUSH(reg) push reg ; .cfi_adjust_cfa_offset 8 ; .cfi_rel_offset reg, 0
As part of this, two other systematic changes are made:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.