The primary goal of this document is to specify the Itanium C++ ABI for the contract entrypoint function. By fully specifying the ABI, we ensure interoperability between different compilers (GCC and Clang) and standard libraries (libc++ and libstdc++).
The contract entrypoint function has the following responsibilities:
- Unpack the compiler-generated contract violation data and use it to construct the `std::contract_violation` object.
- Select and call the user-provided contract violation handler, if one is provided, or the default handler otherwise, passing the `std::contract_violation` object to the handler.
- If the contract violation has an enforced semantic, the entrypoint function must terminate the program.
The ABI proposed in this document is designed to satisfy the following:
- Future changes cannot break existing code.
- The ABI cannot preclude future extensions.
- Users have control over object size increases generated by contracts.
This section briefly describes the motivations and concerns that this design addresses. It assumes most readers of this document have heard Eric and Josh drone on about these ad nauseam.
The presence of contracts should:
1. Have minimal effect on code generation for the surrounding code.
2. Produce minimal code size overhead to represent the contract violation data.
When (2) (and to a lesser extent (1)) cannot be achieved to a user's satisfaction, we must provide a means of recourse that doesn't require the user to disable contracts entirely.
libc++ and libstdc++ must support contracts generated by both GCC and Clang. The compiler must generate the same calls to the runtime entrypoint function, regardless of which runtime it's targeting (and often it doesn't know, MSVC notwithstanding).
Therefore, this specification aims to define a portable ABI for the entrypoint function, which will be used by both compilers.
TODO Maybe?
- Currently needed data
  - `std::source_location`
  - Source text
  - Assertion kind (pre/post/contract_assert)
  - Evaluation semantic (enforced/observed)
  - Failure kind (exception thrown / assertion failed)
Data | Data Type | Static/Dynamic | Description |
---|---|---|---|
Source location | `std::source_location` | Static | Location of the contract in the source code. |
Source text | `const char*` | Static | The source text of the contract assertion. |
Assertion kind | `std::assertion_kind` | Static | pre/post/contract_assert (may be parsed from source string?) |
Evaluation semantic | `std::evaluation_semantic` | Static or Dynamic | In future, may be a runtime property, must support both modes |
Detection mode | `std::detection_mode` | Static or Dynamic | Known at code generation time, but storing in static storage requires duplicating the data. |
Note: the above table describes the data which a `std::contract_violation` object must provide to the user. The datatypes used in the table are the standard library types, not those used in the Itanium C++ ABI, which are specified in the "Descriptor Table" section below.
- Future needed data (likely)
  - Custom labels to identify or group the contract
  - Custom violation handler for the contract
  - Custom source text (in addition to the source text, or as a replacement for it)
Example:

    #define CONTRACT(assertion, message) \
      contract_assert [[clang::assertion_message(message)]] (assertion)

    void f(int x) {
      CONTRACT(x > 0, "x must be positive");
    }

The compiler must generate a call to the entrypoint function without seeing it, or even knowing which runtime it will eventually be linked against.
Because we will be unable to add and deploy additional entrypoints quickly in the future, we must ensure the entrypoint function(s) are generic enough to be "future-proof" and flexible enough to accommodate various representations of the contract violation data.
- The more arguments the compiler is required to pass to the entrypoint, the worse the code generation becomes at the contract violation site.
- The fewer arguments the entrypoint function takes, the more data must either be (1) stored in read-only storage, or (2) materialized at runtime on the stack.

Neither of these options is ideal for our goals.
Instead, we propose a compromise:
The runtime shall provide:
- A single generic entrypoint function, sufficient on its own to provide an extensible, conforming implementation, at the cost of efficiency.
- A set of "wrapper overloads" that encode parts of the contract violation data in the function name, allowing the compiler to generate more efficient code for the most common cases.
A compiler may choose to call the generic entrypoint directly, use a provided wrapper overload, or emit its own wrapper overloads as appropriate. In the future, as new overloads become needed, Clang will likely need to emit its own definitions until it is certain that all supported runtime libraries provide them.
This document proposes a generic signature for the entrypoint function.

    extern "C"
    void __handle_contract_violation(
        // A descriptor and its matching data,
        // for data which can be stored in the data segment of the binary.
        descriptor_t *static_descriptor,
        void *static_data,

        // predicate_false/evaluation_exception
        // Special case because it's always needed.
        detection_mode_t mode,

        // evaluation_semantic. Currently, implementations only support compile-time
        // evaluation semantics, but this may change in the future.
        evaluation_semantic_t semantic,

        // Dynamic data, which is only known at runtime. Because the data is dynamic,
        // it doesn't make sense to emit the descriptor statically, so instead
        // a descriptor is attached to each piece of dynamic data inline.
        runtime_data_t *dynamic_data,

        // Open question: do we also provide this additional escape hatch?
        void *oh_fudge_the_future_is_weird_and_we_didnt_see_it_coming_so_we_provide_this_additional_escape_hatch
    );

Implementations should provide "wrapper functions" for the generic handler. The wrapper functions have the signatures described below and call the generic entrypoint function with the appropriate arguments.
The table below describes the manual mangling of the entrypoint names. The data types and values are mangled into the function names using the following abbreviation scheme. All function signatures accept a static descriptor and static data, which are not encoded in the name.
Data Type | Value | Mangling | Order | Optional |
---|---|---|---|---|
`std::detection_mode` | | `m` | 0 | N |
`std::detection_mode` | `predicate_false` | `pf` | 0 | . |
`std::detection_mode` | `evaluation_exception` | `pe` | 0 | . |
`std::evaluation_semantic` | | `s` | 1 | Y (may also be passed as static data) |
`std::evaluation_semantic` | `observed` | `so` | 1 | . |
`std::evaluation_semantic` | `enforced` | `se` | 1 | . |
`runtime_data_t` | | `r` | 2 | N |
1. If the signature contains an argument of a particular type, the single-letter encoding is appended to the function name, in the order specified in the table above.
2. If the signature encodes the value of a particular type in the name, the multi-letter encoding is appended to the function name, in the order specified in the table above. No argument of that type is passed to the function.
3. If the signature does not encode a particular type, no encoding is appended to the function name. Instead, the function will use the default value for that type when invoking the generic entrypoint function.
The value of `std::detection_mode` and `std::evaluation_semantic` must be passed or encoded in all signatures.
The `runtime_data_t` parameter may be omitted, and the function will forward it as `nullptr` to the generic entrypoint function.
If the signature specifies a fixed value of `enforced` for `std::evaluation_semantic`, that function shall be marked as `[[noreturn]]`.
It is the responsibility of the generic entrypoint function to ensure the control flow does not return.
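The naming rules above can be illustrated with a small helper that assembles a wrapper name from the table's encodings. This is a sketch, not part of the proposal: `sketch::wrapper_name` and `sketch::is_noreturn` are hypothetical helpers, and the underscore separators are inferred from the wrapper names listed in this document.

```cpp
#include <string>

namespace sketch {

// Builds a wrapper name from the encodings in the table above.
// `mode` is "m", "pf", or "pe"; `semantic` is "s", "so", or "se".
inline std::string wrapper_name(const std::string& mode,
                                const std::string& semantic,
                                bool takes_runtime_data) {
    std::string name = "__handle_contract_violation";
    name += "_" + mode;      // order 0: detection mode
    name += "_" + semantic;  // order 1: evaluation semantic
    if (takes_runtime_data)
        name += "_r";        // order 2: runtime data argument
    return name;
}

// A wrapper is [[noreturn]] when its name fixes the semantic to enforced.
inline bool is_noreturn(const std::string& semantic) {
    return semantic == "se";
}

} // namespace sketch
```

For example, fixing `predicate_false` and `enforced` with no runtime data yields `__handle_contract_violation_pf_se`, which would be declared `[[noreturn]]`.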
The initial runtime implementation must provide the following overloads of the entrypoint function, in addition to the generic entrypoint function.

    extern "C" {

    // predicate_false, evaluation_semantic::enforced
    [[noreturn]]
    void __handle_contract_violation_pf_se(
        descriptor_t *static_descriptor,
        void *static_data
    );

    // predicate_false, evaluation_semantic::observed
    void __handle_contract_violation_pf_so(
        descriptor_t *static_descriptor,
        void *static_data
    );

    // evaluation_exception, evaluation_semantic::enforced
    [[noreturn]]
    void __handle_contract_violation_pe_se(
        descriptor_t *static_descriptor,
        void *static_data
    );

    // evaluation_exception, evaluation_semantic::observed
    void __handle_contract_violation_pe_so(
        descriptor_t *static_descriptor,
        void *static_data
    );

    // predicate_false, evaluation_semantic::enforced, runtime data
    [[noreturn]]
    void __handle_contract_violation_pf_se_r(
        descriptor_t *static_descriptor,
        void *static_data,
        runtime_data_t *runtime_data
    );

    // predicate_false, evaluation_semantic::observed, runtime data
    void __handle_contract_violation_pf_so_r(
        descriptor_t *static_descriptor,
        void *static_data,
        runtime_data_t *runtime_data
    );

    // evaluation_exception, evaluation_semantic::enforced, runtime data
    [[noreturn]]
    void __handle_contract_violation_pe_se_r(
        descriptor_t *static_descriptor,
        void *static_data,
        runtime_data_t *runtime_data
    );

    // evaluation_exception, evaluation_semantic::observed, runtime data
    void __handle_contract_violation_pe_so_r(
        descriptor_t *static_descriptor,
        void *static_data,
        runtime_data_t *runtime_data
    );

    } // extern "C"

The current proposal only mandates additional signatures which encode the `std::evaluation_semantic` or `std::detection_mode` in the name, but proposes a mangling to allow the passing of these values as arguments.
The intention is to accommodate future extensions of the C++ standard, which may add new values to these enumerations.
The quality of the generated code depends on the presence of the wrapper overloads to generate efficient code for each contract. If new features are added to the C++ standard which require additional function signatures, the compiler may not know if the runtime supports the new overloads.
In this case, the compiler should either:
- Use the generic entrypoint function, or
- Emit a weak or internal definition for the new overloads itself (these should be easy to emit, since they just rearrange the arguments).
We believe the ability for the compiler to emit its own definitions is critical for the success of this design, as it allows both efficient code generation and future extensibility.
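To show why such definitions are easy to emit, here is a sketch of what a wrapper body amounts to: it fixes some values and forwards everything to the generic entrypoint. All names live in a hypothetical `sketch` namespace and drop the reserved `__` prefix; `generic_entrypoint` is a recording stub rather than a real runtime, and the numeric values follow the enumerator tables in this document. The sketch assumes C++17 for the inline variable.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical stand-ins for the ABI types named in this document.
struct descriptor_t;
struct runtime_data_t;
using detection_mode_t = std::uint8_t;       // 0x01 = predicate_false
using evaluation_semantic_t = std::uint8_t;  // 0x02 = observed

// Records the last call so the forwarding can be inspected. A real generic
// entrypoint would build the violation object and invoke the handler.
struct call_record {
    descriptor_t* static_descriptor = nullptr;
    void* static_data = nullptr;
    detection_mode_t mode = 0;
    evaluation_semantic_t semantic = 0;
    runtime_data_t* dynamic_data = nullptr;
};
inline call_record last_call;

inline void generic_entrypoint(descriptor_t* descriptor, void* data,
                               detection_mode_t mode,
                               evaluation_semantic_t semantic,
                               runtime_data_t* dynamic_data,
                               void* /*escape_hatch*/) {
    last_call = {descriptor, data, mode, semantic, dynamic_data};
}

// Shape of a "pf_so" wrapper: it fixes the detection mode and the semantic,
// and forwards a null runtime-data pointer.
inline void handle_contract_violation_pf_so(descriptor_t* descriptor, void* data) {
    generic_entrypoint(descriptor, data, /*predicate_false=*/0x01,
                       /*observed=*/0x02, nullptr, nullptr);
}

} // namespace sketch
```

A compiler-emitted version would carry weak or internal linkage, as described above, so that a definition supplied by the runtime can coexist with it.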
When a program mixes code compiled with and without exceptions, bad things can happen. Yet, we should consider supporting this use case. As such, we may need to additionally encode whether the contract violation occurred in a context where exceptions are not enabled.
This would allow the runtime to prevent exceptions thrown by the user's violation handler from propagating to the caller, which isn't compiled to handle exceptions.
If we decide to support this, the following additional mangling would identify code compiled with `-fno-exceptions`, which will not safely tolerate an exception being thrown from the user-provided violation handler:
Case | Value | Mangling | Order | Optional |
---|---|---|---|---|
Exceptions Enabled | False | `n` | 3 | Y |
The encoding of the function signature in the name is done as described above.
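One way a runtime could act on the `n` encoding is to contain any exception at the entrypoint boundary. The following is a minimal sketch of that idea only; `sketch::call_handler_contained` and its `on_escape` callback are hypothetical, and a real runtime would more likely terminate than invoke a callback.

```cpp
#include <functional>

namespace sketch {

// Runs `handler`. If it throws, the exception is stopped here and
// `on_escape` runs instead, so nothing propagates into a caller that was
// compiled without exception support.
// Returns true when the handler returned normally.
inline bool call_handler_contained(const std::function<void()>& handler,
                                   const std::function<void()>& on_escape) {
    try {
        handler();
        return true;
    } catch (...) {
        on_escape();
        return false;
    }
}

} // namespace sketch
```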
The C++ standard doesn't specify the exact size or layout of the data types used in the contract violation object. However, the Itanium C++ ABI must specify the exact size and layout for these types, and for enumerators, the exact values as well.
This document specifies the "itanium representation" of the standard library types used in the contract violation object, which are used when passing these types to the entrypoint function.
The types in question, and their corresponding "itanium representation" are:
Standard Type | Itanium Representation | Underlying Type in Itanium |
---|---|---|
`std::source_location` | `source_location_ptr_t` | See Below |
`std::assertion_kind` | `assertion_kind_t` | `uint8_t` |
`std::evaluation_semantic` | `evaluation_semantic_t` | `uint8_t` |
`std::detection_mode` | `detection_mode_t` | `uint8_t` |
With the exception of `std::source_location`, this document places no requirements on the types or values of the standard library types. It instead specifies a corresponding "itanium representation" which should be used when passing these types to the entrypoint function.
`std::source_location` contains a single pointer to its data, which itself has the following layout for both libc++ and libstdc++.

    struct _SourceLoc {
      const char* file_name;
      const char* function_name;
      unsigned line;
      unsigned column;
    };

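The layout above can be pinned down with compile-time checks. This sketch mirrors the structure under a non-reserved, hypothetical name (`sketch::source_loc`) and assumes that `source_location_ptr_t` is simply a pointer to it. The offsets asserted here hold on common ILP32 and LP64 targets and are not guaranteed by the C++ standard.

```cpp
#include <cstddef>

namespace sketch {

// Same members, in the same order, as the _SourceLoc structure above.
struct source_loc {
    const char* file_name;
    const char* function_name;
    unsigned line;
    unsigned column;
};

// Assumption: the "itanium representation" is a pointer to that structure.
using source_location_ptr_t = const source_loc*;

// Two pointers followed by two unsigned values, with no padding in between.
static_assert(offsetof(source_loc, file_name) == 0, "");
static_assert(offsetof(source_loc, function_name) == sizeof(const char*), "");
static_assert(offsetof(source_loc, line) == 2 * sizeof(const char*), "");
static_assert(offsetof(source_loc, column) ==
                  2 * sizeof(const char*) + sizeof(unsigned), "");

} // namespace sketch
```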
This section specifies the values to use when passing an enumerator to the entrypoint function. In addition to the standard library enumerators, this section also specifies the values for unspecified/uninitialized enumerators (which may or may not be useful in practice).
Enumerator Value | Itanium Representation |
---|---|
Not specified | 0x00 |
`std::assertion_kind::pre` | 0x01 |
`std::assertion_kind::post` | 0x02 |
`std::assertion_kind::contract_assert` | 0x03 |

Enumerator Value | Itanium Representation |
---|---|
Not specified | 0x00 |
`std::evaluation_semantic::enforced` | 0x01 |
`std::evaluation_semantic::observed` | 0x02 |

Enumerator Value | Itanium Representation |
---|---|
Not specified | 0x00 |
`std::detection_mode::predicate_false` | 0x01 |
`std::detection_mode::evaluation_exception` | 0x02 |
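The three tables above translate directly into fixed-width enumerations. The sketch below is one possible spelling: the type names follow the "itanium representation" names used in this document, while the `unspecified` enumerator name and the `to_abi` helper are hypothetical.

```cpp
#include <cstdint>

namespace sketch {

enum class assertion_kind_t : std::uint8_t {
    unspecified = 0x00,
    pre = 0x01,
    post = 0x02,
    contract_assert = 0x03,
};

enum class evaluation_semantic_t : std::uint8_t {
    unspecified = 0x00,
    enforced = 0x01,
    observed = 0x02,
};

enum class detection_mode_t : std::uint8_t {
    unspecified = 0x00,
    predicate_false = 0x01,
    evaluation_exception = 0x02,
};

// Value passed across the ABI boundary for a given enumerator.
template <class Enum>
constexpr std::uint8_t to_abi(Enum value) {
    return static_cast<std::uint8_t>(value);
}

// Each representation occupies exactly one byte.
static_assert(sizeof(assertion_kind_t) == 1, "");
static_assert(sizeof(evaluation_semantic_t) == 1, "");
static_assert(sizeof(detection_mode_t) == 1, "");

} // namespace sketch
```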
The static data descriptor is a fully specified structure that identifies and describes the data pointed to by the static data argument.
The goals for the static data descriptor are:
- Easily identify the layout of the "standard" or "required data" in an efficient manner.
- Allow for future extensions to the static data descriptor, without breaking existing code.
- Allow size/security-concerned users to strip bits of the static data descriptor they don't need.
The static data descriptor is specified in two parts:
- The descriptor table

    // We may need this, we may not. It aims to support vendor-specific extensions in a
    // way that doesn't interfere with other vendors and their extensions.
    //
    // Suggestion: New vendors should hash the name of their runtime dylib and use the hash (truncated to 4 bits) as the vendor ID.
    enum vendor_it_t : uint8_t {
      VENDOR_GENERIC = 0x00, // Generic, no vendor-specific data.
      VENDOR_CLANG = 0x01,   // Clang
      VENDOR_GCC = 0x02,     // GCC
      VENDOR_MSVC = 0x03,    // MSVC
      // Future vendors can be added here.
    };

    struct descriptor_table_t {
      // The version info in the first 4 bits, and the vendor ID in the last 4 bits.
      uint8_t version : 4;   // in case we need it.
      uint8_t vendor_id : 4; // The vendor ID

      // The number of entries in the descriptor.
      uint8_t num_entries;

      // The entries in the descriptor.
      // Each entry describes a single piece of data in the static data.
      base_descriptor_entry_t *entries[];
    };

    enum descriptor_entry_kind_t : uint8_t {
      // unknown/reserved = 0x00, // Unknown or reserved type.

      // Default summary representation, containing the source location,
      // source text, and assertion kind (see below).
      summary = 0x01,

      // builtins
      // A pointer to a _SourceLoc.
      source_location_ptr = 0x11,
      // A _SourceLoc inline structure.
      source_location_inline = 0x12,
      // A pointer to a null-terminated string.
      source_text = 0x13,
      // The kind of assertion, such as pre/post/contract_assert.
      assertion_kind = 0x14,

      // reserved = 0x21, // Reserved for future use.
      // reserved = 0x22, // Reserved for future use.
      // reserved = 0x2F, // Reserved for future use.

      extended = 0x30, // Extended descriptor entry, of type `extended_descriptor_entry_t`
      vendor = 0x40,   // Vendor-specific descriptor, of type `vendor_extended_descriptor_entry_t`
    };

    struct base_descriptor_entry_t {
      // The type of the data.
      // This is a vendor-specific type, which can be used to identify the data.
      descriptor_entry_kind_t description_type;

      // The offset of the data in the static data from the start of the static data.
      uint16_t offset;
    };

    struct extended_descriptor_entry_t : base_descriptor_entry_t {
      // The type of the data.
      // This is a standard type, such as `std::source_location`, `std::assertion_kind`, etc.
      // The type is specified by the `descriptor_entry_kind_t` enumeration.

      // The size of the data, in bytes.
      uint16_t size;

      const char* data_type; // Or some other representation of the type

      // The name of the data, which can be used to identify the data.
      const char *name;
    };

    struct vendor_extended_descriptor_entry_t : base_descriptor_entry_t {
      // The vendor ID information is present in the `vendor_id` bits of `descriptor_table_t`.
      // Whatever the vendor wants to put here.
    };

The extended and vendor specific descriptor tables are not required for the initial implementation, but they are provided to allow for future extensibility and vendor-specific extensions (or at least to provide an idea of how to do it).
Further, this document proposes a default layout for the needed static data, which can be used to identify the entirety of the data in a single descriptor entry.
One possible layout is as follows:
Type | Offset in Static Data | Size in Bytes |
---|---|---|
`_SourceLoc` pointer | 0 | `sizeof(void*)` |
`const char*` (source text) | `sizeof(void*)` | `sizeof(void*)` |
`std::assertion_kind` | `sizeof(void*) * 2` | `sizeof(uint8_t)` |
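Expressed as a C++ aggregate, the layout in the table could look like the sketch below. `sketch::default_static_data` and `sketch::source_loc` are hypothetical names; the `static_assert`s check that this spelling reproduces the offsets from the table on the target being compiled.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

struct source_loc;  // stands in for _SourceLoc; only a pointer is stored

// One possible spelling of the default static data layout.
struct default_static_data {
    const source_loc* location;   // offset 0
    const char* source_text;      // offset sizeof(void*)
    std::uint8_t assertion_kind;  // offset sizeof(void*) * 2
};

static_assert(offsetof(default_static_data, location) == 0, "");
static_assert(offsetof(default_static_data, source_text) == sizeof(void*), "");
static_assert(offsetof(default_static_data, assertion_kind) == sizeof(void*) * 2, "");

} // namespace sketch
```

Leaving `location` or `source_text` null corresponds to omitting that piece of data.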
Implementations could omit the source location or source text by providing a null pointer, or by using a more complex descriptor table representation.
This would describe the same data layout as the more-detailed descriptor table below, but in a more compact form.

    base_descriptor_entry_t source_location_ptr_entry = {
      .description_type = descriptor_entry_kind_t::source_location_ptr,
      .offset = 0,
    };
    base_descriptor_entry_t source_text_entry = {
      .description_type = descriptor_entry_kind_t::source_text,
      .offset = sizeof(void*),
    };
    base_descriptor_entry_t assertion_kind_entry = {
      .description_type = descriptor_entry_kind_t::assertion_kind,
      .offset = sizeof(void*) * 2,
    };
    descriptor_table_t default_descriptor = {
      .version = 1,
      .vendor_id = vendor_it_t::VENDOR_FOO, // placeholder for the emitting vendor's ID
      .num_entries = 3,
      .entries = {
        &source_location_ptr_entry,
        &source_text_entry,
        &assertion_kind_entry,
      }
    };

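On the runtime side, a descriptor table like the one above would be walked to locate each field inside the static data. The sketch below shows that lookup under simplifying assumptions: `sketch::entry` and `sketch::table` are hypothetical, reduced versions of the structures in this document, a pointer and count replace the flexible array member, and the entries are stored by value rather than through pointers.

```cpp
#include <cstdint>
#include <cstring>

namespace sketch {

enum class entry_kind : std::uint8_t {
    source_location_ptr = 0x11,
    source_text = 0x13,
    assertion_kind = 0x14,
};

struct entry {
    entry_kind kind;
    std::uint16_t offset;  // measured from the start of the static data
};

// Simplified table: a pointer and a count instead of a flexible array member.
struct table {
    const entry* entries;
    std::uint8_t num_entries;
};

// Returns a pointer to the described field inside `static_data`,
// or nullptr when the table has no entry of that kind.
inline const void* find_field(const table& t, const void* static_data,
                              entry_kind kind) {
    for (std::uint8_t i = 0; i < t.num_entries; ++i) {
        if (t.entries[i].kind == kind)
            return static_cast<const unsigned char*>(static_data) +
                   t.entries[i].offset;
    }
    return nullptr;
}

// Reads the source text pointer through the table, or nullptr if absent.
inline const char* read_source_text(const table& t, const void* static_data) {
    const void* field = find_field(t, static_data, entry_kind::source_text);
    if (!field) return nullptr;
    const char* text;
    std::memcpy(&text, field, sizeof text);  // copy the stored pointer out
    return text;
}

} // namespace sketch
```

A table with no `source_text` entry makes `read_source_text` return `nullptr`, which matches the idea that size-conscious users can strip data they do not need.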
The runtime data and descriptor can be specified at a later date, as long as the generic entrypoint function is defined to accept them.