
TravisCardwell
Collaborator

A new data constructor QualNameAnon is added to the QualName type to distinguish anonymous declarations in binding specifications and select predicates. The string representation uses an @ prefix.

Example: @SC_c

This character is not a regular-expression metacharacter, so it should not cause issues with regular expressions.

YAML reserves this character for future use, so a plain (unquoted) value cannot start with it. Such values must therefore be quoted.

Example:

cname: '@S1_c'
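To make the proposed representation concrete, here is a minimal Haskell sketch. Only the names QualName and QualNameAnon come from this PR; the QualNameNamed constructor, the use of plain String for names, and the functions renderQualName, parseQualName, and yamlCName are hypothetical and are not the actual hs-bindgen definitions.

```haskell
-- Hypothetical, simplified model; the real hs-bindgen QualName differs.
data QualName
  = QualNameNamed String -- named declaration, e.g. vector or foo_t
  | QualNameAnon String  -- anonymous declaration, e.g. SC_c
  deriving (Eq, Show)

-- Anonymous declarations are rendered with an '@' prefix.
renderQualName :: QualName -> String
renderQualName (QualNameNamed n) = n
renderQualName (QualNameAnon n)  = '@' : n

-- Inverse of renderQualName; rejects empty names.
parseQualName :: String -> Maybe QualName
parseQualName ""        = Nothing
parseQualName "@"       = Nothing
parseQualName ('@' : n) = Just (QualNameAnon n)
parseQualName n         = Just (QualNameNamed n)

-- YAML reserves '@', so a plain scalar cannot start with it. This sketch
-- only checks for '@'; other YAML indicator characters are not handled.
yamlCName :: QualName -> String
yamlCName q = case renderQualName q of
  s@('@' : _) -> "cname: '" ++ s ++ "'"
  s           -> "cname: " ++ s
```

For example, yamlCName (QualNameAnon "S1_c") produces cname: '@S1_c', matching the example above, while an ordinary name such as foo_t is written without quotes.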

@TravisCardwell
Collaborator Author

In vector.h, the vector type is defined as follows:

typedef struct { ... } vector;

Currently, the following binding specification is generated:

types:
- headers: vector.h
  cname: '@vector'
  module: Vector
  identifier: Vector
  instances:
  - Eq
  - Show
  - Storable

When using this binding specification as an external binding specification to create bindings for vector_length.h, we run into an issue.

  • After the NameAnon pass, there is a typedef named vector for an anonymous struct named @vector.
  • In the ResolveBindingSpec pass, references to @vector are correctly replaced with external bindings.
  • In the Select pass, the function declaration len vector_length(vector* p); references vector, not @vector, since typedefs are not handled (collapsed) until the subsequent HandleTypedefs pass. Type vector is (correctly) not selected because it is not in the main file, which results in a missing declaration error.

A pending task is to add alias information to binding specifications.

We already have a declAliases :: [C.Name] field in the AST DeclInfo. It is initialized to the empty list, and we currently set it in the NameAnon pass. Perhaps that is too early, since prescriptive binding specifications should be able to configure this (#308). Perhaps the aliases should be set in the HandleTypedefs pass?

When generating a binding specification, we need to output any aliases for each type. The above would become the following.

types:
- headers: vector.h
  cname: '@vector'
  aliases:
  - vector
  module: Vector
  identifier: Vector
  instances:
  - Eq
  - Show
  - Storable

When loading an external binding specification, we must make sure that there is no duplicate configuration for the same type: configuration for ({vector.h}, @vector) and ({vector.h}, vector) must not both exist.
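A possible shape for this check, as a sketch under simplifying assumptions: the TypeEntry record, its fields, and duplicateKeys are hypothetical, C names are plain strings (anonymous names carry the @ prefix), and a header set is a list of strings. Every entry claims its own (headers, C name) key plus one key per alias, and any key claimed more than once is reported.

```haskell
import Data.List (group, sort)

-- Hypothetical, simplified binding specification entry for a type.
data TypeEntry = TypeEntry
  { entryHeaders :: [String] -- header set, e.g. ["vector.h"]
  , entryCName   :: String   -- C name, e.g. "@vector"
  , entryAliases :: [String] -- aliases, e.g. ["vector"]
  }

-- Every (headers, C name) key configured by an entry, including one key per
-- alias. Headers are sorted so that their order does not matter.
claimedKeys :: TypeEntry -> [([String], String)]
claimedKeys e = [ (hs, n) | n <- entryCName e : entryAliases e ]
  where
    hs = sort (entryHeaders e)

-- Keys that are configured more than once across all entries.
duplicateKeys :: [TypeEntry] -> [([String], String)]
duplicateKeys es =
  [ k | (k : _ : _) <- group (sort (concatMap claimedKeys es)) ]
```

With an entry for @vector that lists vector as an alias, adding a second entry for vector in vector.h makes duplicateKeys return [(["vector.h"], "vector")].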

When resolving external bindings, any use of an alias is also replaced with an external binding. In this case, vector would have an external binding, and there would no longer be a missing declaration error.
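The resolution rule could look roughly like the following sketch. ExtBinding, resolutionTable, and resolveRef are hypothetical names, and header sets are ignored for brevity; the point is only that each alias maps to the same Haskell type as the C name it aliases.

```haskell
-- Hypothetical, simplified external binding: the C name, its aliases, and
-- the Haskell type it maps to, as (module, identifier).
data ExtBinding = ExtBinding
  { extCName   :: String
  , extAliases :: [String]
  , extHsType  :: (String, String)
  }

-- Lookup table in which the C name and every alias point to the same
-- Haskell type.
resolutionTable :: [ExtBinding] -> [(String, (String, String))]
resolutionTable bs =
  [ (n, extHsType b) | b <- bs, n <- extCName b : extAliases b ]

-- Resolve a reference found in the header being processed; Nothing means
-- that no external binding covers it.
resolveRef :: [ExtBinding] -> String -> Maybe (String, String)
resolveRef bs ref = lookup ref (resolutionTable bs)
```

Without an aliases entry, resolveRef returns Nothing for vector, which corresponds to the missing declaration error; once vector is listed as an alias of @vector, it resolves to ("Vector", "Vector").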

Thoughts?

TravisCardwell marked this pull request as draft on July 31, 2025 05:11
@TravisCardwell
Collaborator Author

Perhaps the aliases should be set in the HandleTypedefs pass?

This does not work, because aliases must be known during the ResolveBindingSpec pass. I think it is indeed appropriate to initialize aliases in NameAnon, update them in ResolveBindingSpec, and apply them (remove alias declarations) in HandleTypedefs. Perhaps we need to move some of the logic earlier, though, because we must know which types are aliases in ResolveBindingSpec.

We need to decide about the binding specification syntax. How should we specify if a typedef should be a newtype or type declaration in Haskell?

I am thinking through one option below, but I am of course open to alternate ideas!


One option is to just use aliases, a list of CName. (Since all aliases are typedefs, they are always in the ordinary namespace and are never anonymous.) If aliases is specified, it must list every alias. We can create a type declaration for each alias that does not have the same Haskell name as the type being aliased. Perhaps we need to provide a way to specify a custom Haskell name for an alias. When aliases is specified, any typedef alias that is not in the list should result in a newtype declaration to create a distinct Haskell type. (HandleTypedefs must not remove that declaration, even when there is only one use.)
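The rules in this option can be condensed into a small decision function. This is a hypothetical sketch (AliasDecl and aliasDecl are invented names, and declarations are rendered as strings): a typedef that is not listed under aliases becomes a newtype, a listed alias whose Haskell name differs from that of the aliased type becomes a type synonym, and a listed alias with the same Haskell name generates no declaration.

```haskell
-- Hypothetical outcome for a typedef of a type whose binding specification
-- entry has an aliases list.
data AliasDecl
  = NoDecl             -- alias shares the Haskell name of the aliased type
  | TypeSynonym String -- rendered 'type' declaration
  | Newtype String     -- rendered 'newtype' declaration
  deriving (Eq, Show)

-- aliasDecl listed typeHs typedefC typedefHs
--   listed:    C names listed under aliases for the aliased type
--   typeHs:    Haskell name of the aliased type, e.g. "Foo"
--   typedefC:  C name of the typedef, e.g. "foo_t"
--   typedefHs: Haskell name chosen for the typedef, e.g. "Foo_t"
aliasDecl :: [String] -> String -> String -> String -> AliasDecl
aliasDecl listed typeHs typedefC typedefHs
  | typedefC `notElem` listed =
      Newtype ("newtype " ++ typedefHs ++ " = " ++ typedefHs ++ " " ++ typeHs)
  | typedefHs == typeHs = NoDecl
  | otherwise = TypeSynonym ("type " ++ typedefHs ++ " = " ++ typeHs)
```

Under these assumptions, the numbered examples correspond to calls such as aliasDecl [] "Foo" "foo_t" "Foo_t" for (1), aliasDecl ["foo_t"] "Foo" "foo_t" "FooT" for (3), and aliasDecl ["foo_t"] "Foo" "foo_t" "Foo" for (4).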


typedef struct foo { ... } foo_t;

(1) User wants distinct types

types:
  - headers: acme.h
    cname: struct foo
    aliases: []
    module: Acme
    identifier: Foo
  - headers: acme.h
    cname: foo_t
    aliases: []
    module: Acme
    identifier: Foo_t

This is the default if foo_t is not the only use of struct foo. In cases where there is only one use, users can use this syntax in a prescriptive binding specification to force this behavior.

(2) User wants type Foo_t

types:
  - headers: acme.h
    cname: struct foo
    aliases: [foo_t]
    module: Acme
    identifier: Foo

(3) User wants type FooT (custom name)

types:
  - headers: acme.h
    cname: struct foo
    aliases:
      - cname: foo_t
        identifier: FooT
    module: Acme
    identifier: Foo

(4) User wants no declaration for the alias

types:
  - headers: acme.h
    cname: struct foo
    aliases:
      - cname: foo_t
        identifier: Foo
    module: Acme
    identifier: Foo

(5) User wants a single declaration using the alias name

types:
  - headers: acme.h
    cname: struct foo
    aliases: [foo_t]
    module: Acme
    identifier: Foo_t

This is the default if foo_t is the only use of struct foo.
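The two defaults for typedef struct foo { ... } foo_t; can be summarized as follows. This is a hypothetical sketch: defaultDecls is an invented name, only declaration heads are rendered (no fields or instances), and the Haskell names are passed in rather than derived.

```haskell
-- Hypothetical summary of the proposed defaults for
--   typedef struct foo { ... } foo_t;
-- Only declaration heads are rendered; fields and instances are omitted.
--   uses:      number of uses of struct foo, counting the typedef itself
--   hsStruct:  default Haskell name of struct foo, e.g. "Foo"
--   hsTypedef: default Haskell name of foo_t, e.g. "Foo_t"
defaultDecls :: Int -> String -> String -> [String]
defaultDecls uses hsStruct hsTypedef
  | uses == 1 = ["data " ++ hsTypedef] -- (5): a single declaration
  | otherwise =                        -- (1): distinct types
      [ "data " ++ hsStruct
      , "newtype " ++ hsTypedef ++ " = " ++ hsTypedef ++ " " ++ hsStruct
      ]
```

A prescriptive binding specification would override whichever of these the use count selects.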


typedef struct foo { ... } foo;
typedef struct boo { ... } Boo;

In cases where the default Haskell names conflict, and the typedef alias is not the only use, a prescriptive binding specification must resolve the conflict. The same basic patterns as above may be used.
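Detecting such conflicts amounts to grouping declarations by their default Haskell name. In this sketch, defaultHsName is a hypothetical stand-in for the real naming rules (it only strips a struct prefix and upper-cases the first letter), and nameConflicts only reports the clash; whether a prescriptive binding specification is required also depends on whether the typedef is the only use.

```haskell
import Data.Char (toUpper)
import Data.List (group, sort, stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical default naming: drop a "struct " prefix and upper-case the
-- first letter. The real hs-bindgen naming rules are more involved.
defaultHsName :: String -> String
defaultHsName cname = case fromMaybe cname (stripPrefix "struct " cname) of
  c : cs -> toUpper c : cs
  []     -> []

-- Haskell names that more than one of the given C declarations would get.
nameConflicts :: [String] -> [String]
nameConflicts cnames =
  [ hs | (hs : _ : _) <- group (sort (map defaultHsName cnames)) ]
```

Both pairs above clash under this naming: nameConflicts ["struct foo", "foo"] gives ["Foo"], and nameConflicts ["struct boo", "Boo"] gives ["Boo"].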

(6) User wants distinct types

types:
  - headers: acme.h
    cname: struct foo
    aliases: []
    module: Acme
    identifier: Foo
  - headers: acme.h
    cname: foo
    aliases: []
    module: Acme
    identifier: FooT

typedef struct { ... } foo;

Anonymous types always have a single use.

(7) User wants just data Foo

types:
  - headers: acme.h
    cname: '@foo'
    aliases: [foo]
    module: Acme
    identifier: Foo

This is the default.

(8) User wants distinct types (for some reason)

types:
  - headers: acme.h
    cname: '@foo'
    aliases: []
    module: Acme
    identifier: FooStruct
  - headers: acme.h
    cname: foo
    aliases: []
    module: Acme
    identifier: Foo

(9) User wants data Foo to be opaque

types:
  - headers: acme.h
    cname: '@foo'
    aliases: [foo]
    module: Acme
    identifier: Foo
    opaque: True

Configuration of opaque types is a separate issue (#809), but we should consider the relationship between various options. In this example, the syntax configures a single Foo type that is opaque.

@edsko
Collaborator

edsko commented Aug 8, 2025

For

typedef struct foo { ... } foo_t;

types:
  - headers: acme.h
    cname: foo_t
    module: Acme
    identifier: Foo_t
    representation: {newtype/synonym/transparent}

where representation is one of:

  • newtype: newtype Foo_t = Foo_t Foo
  • synonym: type Foo_t = Foo
  • transparent: any reference to Foo_t will become a reference to Foo ("squashed"). In this case, some fields, such as identifier and instances, must be omitted.
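The three proposed representations can be sketched as follows. Representation, typedefDecl, and squash are hypothetical names; typedef targets are given by name only, and the squashing table maps the C names of transparent typedefs to the C names they refer to.

```haskell
-- Hypothetical encoding of the proposed representation field.
data Representation = Newtype | Synonym | Transparent
  deriving (Eq, Show)

-- Declaration generated for a typedef, given the Haskell names of the
-- typedef and of the type it refers to. A transparent typedef generates
-- nothing; its uses are squashed instead.
typedefDecl :: Representation -> String -> String -> Maybe String
typedefDecl Newtype alias target =
  Just ("newtype " ++ alias ++ " = " ++ alias ++ " " ++ target)
typedefDecl Synonym alias target = Just ("type " ++ alias ++ " = " ++ target)
typedefDecl Transparent _ _ = Nothing

-- Replace a reference to a transparent typedef by the type it refers to,
-- following chains of transparent typedefs (C typedefs cannot be cyclic).
squash :: [(String, String)] -> String -> String
squash transparent ref = maybe ref (squash transparent) (lookup ref transparent)
```

For example, typedefDecl Synonym "Foo_t" "Foo" yields Just "type Foo_t = Foo", and squash [("foo_t", "struct foo")] "foo_t" yields "struct foo".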

We'd also have an entry for foo (anonymous or not); in the anonymous case, it would look something like this:

types:
  - headers: acme.h
    cname: '@foo'
    module: Acme
    identifier: Foo # can be omitted (then we use the default naming)

The user can now pick which identifier is used: without a prescriptive binding specification, we would use Foo_t when squashing the typedef, but with a prescriptive binding specification, the user would choose.

(Perhaps representation applies in other situations too; for example, an enum could be transparent or synonym.)

@edsko
Collaborator

edsko commented Aug 8, 2025

If the user wants a type to be opaque:

types:
  - headers: acme.h
    cname: '@foo'
    module: Acme
    identifier: Foo
    representation: opaque

or

types:
  - headers: acme.h
    cname: foo_t
    module: Acme
    identifier: Foo
    representation: opaque-and-omit-anything-this-depends-on-if-not-used-anywhere-else

opaque-and-omit-anything-this-depends-on-if-not-used-anywhere-else will require some further thought. As is, this is really hard to implement, and it is unclear what the right solution is. Explicitly listing everything that needs to be omitted is possible but awkward: users might end up having to list a ton of internal types when all they want to do is make the public wrapper type opaque. Conversely, making "everything this depends on" opaque (irrespective of whether it's used anywhere else) is easier to implement, but much less useful. More thought is needed. This is not specific to binding specifications; we already have this problem when generating bindings for something like:

struct FILE__ { ... };

typedef struct FILE__ FILE;

If we can implement this (perhaps it's not that difficult?), then opaque should mean opaque-and-omit-anything-this-depends-on-if-not-used-anywhere-else. It is not clear whether that default would ever need overriding, but it seems unlikely: the point of making a type opaque is that its implementation is internal, and if those types are not used anywhere else, there is no point in generating bindings for them.
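One way to compute what opaque-and-omit-anything-this-depends-on-if-not-used-anywhere-else would omit is a reachability check over the dependency graph: ignore the edges out of the opaque declaration, compute what is still reachable from the selected roots, and omit every transitive dependency of the opaque declaration that is no longer reachable. This is only a sketch under assumptions: the Graph type and the functions below are hypothetical, and how the roots are obtained (selection and program slicing) is left out.

```haskell
-- Hypothetical dependency graph: each declaration paired with the
-- declarations it refers to directly.
type Graph = [(String, [String])]

depsOf :: Graph -> String -> [String]
depsOf g d = maybe [] id (lookup d g)

-- Declarations reachable from the start list, without following the
-- outgoing edges of any declaration in 'cut'.
reachable :: Graph -> [String] -> [String] -> [String]
reachable g cut = go []
  where
    go seen [] = seen
    go seen (d : ds)
      | d `elem` seen = go seen ds
      | d `elem` cut  = go (d : seen) ds
      | otherwise     = go (d : seen) (depsOf g d ++ ds)

-- Declarations to omit when 'opaqueDecl' is made opaque: everything it
-- depends on (transitively) that is no longer reachable from the roots
-- once the edges out of 'opaqueDecl' are ignored.
omittedWhenOpaque :: Graph -> [String] -> String -> [String]
omittedWhenOpaque g roots opaqueDecl =
  [ d | d <- reachable g [] (depsOf g opaqueDecl), d `notElem` stillNeeded ]
  where
    stillNeeded = reachable g [opaqueDecl] roots
```

With a made-up graph in which FILE depends on struct FILE__, which depends on a hypothetical struct buf_info, and a hypothetical function open_file uses FILE, making FILE opaque omits both structs; if another root also uses struct buf_info, only struct FILE__ is omitted.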

Side note: generated binding specifications do not in general need to record omitted types: this is the purpose of selection and program slicing. Recording everything we don't generate bindings for (say, everything in the sys headers) could be an enormous list. However, explicit omits in prescriptive binding specifications should be included in the generated binding specifications, so that the output binding spec is a valid extension of the input binding spec.

@TravisCardwell
Collaborator Author

(Rebased)

TravisCardwell force-pushed the tcard/anon branch 2 times, most recently from 0a45d7d to bff654b, on August 19, 2025 22:35
TravisCardwell changed the title from "Distinguish anonymous declarations" to "Binding specifications: separate C and Haskell specs" on Sep 26, 2025
@TravisCardwell
Collaborator Author

This PR is blocked because "aliases" are not represented in binding specifications. We have decided to refactor binding specifications to instead specify C and Haskell types separately (#799). I need the work already done in this PR, so I am hijacking it and have renamed it accordingly.

I rebased the branch and changed QualName to distinguish the tag kind for anonymous types. We now write struct @foo instead of just @foo.
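A hypothetical sketch of what recording the tag kind could look like; TagKind, the QualNameOrdinary and QualNameTagged constructors, and the functions below are invented for illustration and need not match the refactored QualName.

```haskell
-- Hypothetical tag kinds of C tagged declarations.
data TagKind = TagStruct | TagUnion | TagEnum
  deriving (Eq, Show, Enum, Bounded)

-- Hypothetical QualName that records the tag kind of anonymous declarations.
data QualName
  = QualNameOrdinary String       -- ordinary namespace, e.g. foo_t
  | QualNameTagged TagKind String -- named tagged type, e.g. struct foo
  | QualNameAnon TagKind String   -- anonymous tagged type, e.g. struct @foo
  deriving (Eq, Show)

tagKeyword :: TagKind -> String
tagKeyword TagStruct = "struct"
tagKeyword TagUnion  = "union"
tagKeyword TagEnum   = "enum"

renderQualName :: QualName -> String
renderQualName (QualNameOrdinary n) = n
renderQualName (QualNameTagged k n) = tagKeyword k ++ " " ++ n
renderQualName (QualNameAnon k n)   = tagKeyword k ++ " @" ++ n

-- Parse "foo_t", "struct foo", or "struct @foo"; anything else is rejected.
parseQualName :: String -> Maybe QualName
parseQualName s = case words s of
    [kw, n] | Just k <- lookupKind kw -> tagged k n
    [n]     | isName n                -> Just (QualNameOrdinary n)
    _                                 -> Nothing
  where
    isName n = not (null n) && take 1 n /= "@"
    lookupKind kw = lookup kw [ (tagKeyword k, k) | k <- [minBound .. maxBound] ]
    tagged k ('@' : n) | isName n = Just (QualNameAnon k n)
    tagged k n         | isName n = Just (QualNameTagged k n)
    tagged _ _                    = Nothing
```

A side effect worth noting: a value such as struct @foo does not start with @, so it should no longer need quoting in YAML, where the restriction applies only to the first character of a plain scalar.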

Successfully merging this pull request may close this issue: Be more explicit about specifications for anonymous declarations