Mark up docs schema, and add language specific filter #3188

guineveresaenger · 2025-09-13T00:25:10Z

This pull request does several things:

Changes the fixUpPropertyReference function to wrap every property reference it rewrites in a tag whose attributes hold the name for each language, so that a language-specific filter function can pick the right value for the selected language.
The span looks like this (shown decoded here; in the schema JSON the angle brackets and quotes are escaped as \u003c, \u003e and \"):
<span pulumi-lang-nodejs="minUpper" pulumi-lang-dotnet="MinUpper" pulumi-lang-go="minUpper" pulumi-lang-python="min_upper" pulumi-lang-yaml="minUpper" pulumi-lang-java="minUpper">min_upper</span>
Starts to fix a bug in fixUpPropertyReference where Dotnet was generating some property names in snake_case; see example. (TODO: proper capitalization for these property names is still missing.)
Adds a filter function that reads the provider's main schema from file and, for the target language, resolves these spans and the Pulumi Code Chooser blocks to that language's content, instead of regenerating the schema from scratch. Filtering the Code Chooser this way also avoids repeating example translation.
Adds new tests for fixUpPropertyReference and for the filter function.
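A minimal Go sketch of the round trip the list above describes: building a span like the min_upper example, then filtering it back down to one language. The function names (buildSpan, filterSpans, inflect, camelCase), the simple snake_case to camelCase rule, and the regex approach are assumptions for illustration, not the bridge's actual fixUpPropertyReference or filter code. It works on decoded description text rather than the JSON-escaped schema bytes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// langOrder mirrors the attribute order of the min_upper example span.
var langOrder = []string{"nodejs", "dotnet", "go", "python", "yaml", "java"}

// camelCase turns a Terraform snake_case name like "min_upper" into "minUpper".
// Hypothetical helper: real inflection rules handle many more edge cases.
func camelCase(name string) string {
	parts := strings.Split(name, "_")
	for i := 1; i < len(parts); i++ {
		if parts[i] != "" {
			parts[i] = strings.ToUpper(parts[i][:1]) + parts[i][1:]
		}
	}
	return strings.Join(parts, "")
}

// inflect returns the assumed per-language spelling of a property name.
func inflect(tfName, lang string) string {
	switch lang {
	case "python":
		return tfName // Python keeps snake_case
	case "dotnet":
		c := camelCase(tfName)
		if c == "" {
			return c
		}
		return strings.ToUpper(c[:1]) + c[1:] // .NET uses PascalCase
	default:
		return camelCase(tfName)
	}
}

// buildSpan wraps one property reference in the language-tagged span.
func buildSpan(tfName string) string {
	var b strings.Builder
	b.WriteString("<span")
	for _, lang := range langOrder {
		fmt.Fprintf(&b, ` pulumi-lang-%s="%s"`, lang, inflect(tfName, lang))
	}
	fmt.Fprintf(&b, ">%s</span>", tfName)
	return b.String()
}

// spanRe matches a whole span whose attributes are all pulumi-lang-* keys.
var spanRe = regexp.MustCompile(`<span((?: pulumi-lang-[a-z]+="[^"]*")+)>[^<]*</span>`)

// filterSpans replaces each span with the value recorded for lang.
func filterSpans(text, lang string) string {
	attrRe := regexp.MustCompile(`pulumi-lang-` + regexp.QuoteMeta(lang) + `="([^"]*)"`)
	return spanRe.ReplaceAllStringFunc(text, func(span string) string {
		if m := attrRe.FindStringSubmatch(span); m != nil {
			return m[1]
		}
		return span // no entry for this language: leave the span as-is
	})
}

func main() {
	desc := "The minimum is set by " + buildSpan("min_upper") + "."
	fmt.Println(buildSpan("min_upper"))
	fmt.Println(filterSpans(desc, "dotnet")) // The minimum is set by MinUpper.
	fmt.Println(filterSpans(desc, "python")) // The minimum is set by min_upper.
}
```

Keeping every language's spelling in the schema is what lets the filter be a pure text substitution, with no need to re-run schemagen or docsgen per language.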

In this pulumi-random PR, you can see that the changes to the SDK are minimal (the azure -> azurerm change is unrelated, and the Java changes are due to #3184, which hasn't been rolled out to providers yet).

Additional benefits:

We could use these spans in registry docsgen to fix the broken default variable inflections there.
Build time for the AWS Node SDK went from 130s to 26s; other SDKs are similar; smaller providers build the SDK almost instantaneously
Prompted us to build the Java SDK via pulumi package gen-sdk, which removes a lot of future maintainer toil: it just comes for free with the Pulumi binary!
Paves the way for decoupling SDKgen from the terraform bridge in the future.

This pull request does not address the following:

Some conversion stats cleanup should happen for the SDK gen pass: we're still seeing conversion stat output there, which is meaningless in that context. This can be a separate pull request IMO.

I ran this against the AWS provider as well. With this change, the SDK inline documentation has a few extra empty section headers, because we no longer run the description field through the docs generator twice. I think this is the right trade-off: after this change, we can fix the docs rendering in one place and have the schema be the source of truth for everything.

Fixes #1918.

codecov · 2025-09-13T00:32:40Z

Codecov Report

❌ Patch coverage is 91.85185% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.78%. Comparing base (998de46) to head (7e78689).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/tfgen/generate.go	86.95%	6 Missing and 3 partials ⚠️
pkg/tfgen/docs.go	96.92%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##             main    #3188    +/-   ##
========================================
  Coverage   68.78%   68.78%            
========================================
  Files         336      336            
  Lines       43868    43995   +127     
========================================
+ Hits        30174    30264    +90     
- Misses      11996    12025    +29     
- Partials     1698     1706     +8


corymhall

I haven't reviewed much of the code yet, but I'm curious what you think about the prior discussions on this issue? Specifically related to pulumi/pulumi#16099 and the work started on pulumi/pulumi#18949.

From what I can follow, it seems like the plan was to not require tfgen at all. It seems like it wouldn't be too hard to change the span generation to the tag/ref generation at a later point, but I'm also not sure how close we are to having 16099 finished. It seems like we could either:

Finish 16099 and then implement the tag/ref generation here
Merge this PR, then finish 16099, then come back and modify this generation.

I think it all depends on how close we are to merging 18949 / fixing 16099. If we are pretty close then I think we should try to push that through.

Thoughts?

corymhall · 2025-09-15T12:06:34Z

pkg/tfgen/docs.go

-			}
+
+			// Build our span
+			// <span pulumi-lang-nodejs="firstProperty" pulumi-lang-go="FirstProperty" ...>firstProperty</span>


Looking at the linked issues, it looks like there was some agreement on a different format:

pulumi/pulumi#16099 (comment)

<pulumi ref="#/resources/aws:s3:Bucket/properties/bucketName" />
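For comparison, a minimal Go sketch of how a consumer might decode that alternative ref form. It assumes refs always have the shape #/resources/<token>/properties/<name>; parseRef and propertyRef are hypothetical names, not an existing Pulumi API. Cutting at the first "/properties/" keeps tokens that themselves contain a slash (for example aws:s3/bucket:Bucket) intact.

```go
package main

import (
	"fmt"
	"strings"
)

// propertyRef is a hypothetical decoded form of a <pulumi ref="..." /> target.
type propertyRef struct {
	Token    string // resource token, e.g. "aws:s3:Bucket"
	Property string // property name, e.g. "bucketName"
}

// parseRef splits "#/resources/<token>/properties/<name>" into its parts.
func parseRef(ref string) (propertyRef, error) {
	rest, ok := strings.CutPrefix(ref, "#/resources/")
	if !ok {
		return propertyRef{}, fmt.Errorf("unsupported ref %q", ref)
	}
	// Cut at the first "/properties/" so tokens containing "/" survive.
	token, prop, ok := strings.Cut(rest, "/properties/")
	if !ok || token == "" || prop == "" {
		return propertyRef{}, fmt.Errorf("unsupported ref %q", ref)
	}
	return propertyRef{Token: token, Property: prop}, nil
}

func main() {
	r, err := parseRef("#/resources/aws:s3:Bucket/properties/bucketName")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s\n", r.Token, r.Property) // aws:s3:Bucket bucketName
}
```

Unlike the span, a ref carries no per-language spelling, so whatever renders it (SDK gen or registry docsgen) would still need to look up the property and inflect it for the target language.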

guineveresaenger · 2025-09-18T19:17:06Z

Thank you for the review! There are a couple of things here.

It looks like Schema references in descriptions (pulumi#18949) is a good start towards the end goal; however, it only implements this for the Python SDK so far, so I don't think it's actually that close to done.
I'm hoping to keep this code contained in the bridge so we can iterate a bit faster here. This does mean generating a span that carries a translation for each language, with fixupPropertyReference as the single point where that happens. Once the sdkgen functionality exists, we can refactor to remove fixupPropertyReference entirely, at which point we could swap the span out for the ref tag (assuming we can teach docsgen to translate that ref tag too, since we want the correct inflection to show on the registry as well).

From what I can follow it seems like the plan was to not require tfgen at all.

Right! There are two separate tfgen paths here:

1) Use tfgen to output the docs schema
2) Use tfgen to generate the SDKs

This PR modifies 2) to get us closer to the goal of not using tfgen at all.
Currently, 2) creates a "language-specific schema" by running all of schemagen and docsgen (including example translation or retrieval from cache) with a language parameter, and the bridge then shells out to pulumi package gen-sdk with that schema.
This change removes the need to re-run schemagen (by reading from the schema file instead) and docsgen (by looking for code-specific markup in the schema and filtering by language). So all that's left of "tfgen" in this path is the filtering functionality, which can easily be taken out of the bridge and hooked up to an SDK gen pipeline (or added to SDKgen in Core). I'm leaving it in the bridge for now, though, because that lets us reduce build times today, and the SDKgen changes are still some way off.

TL;DR: I think this PR is a stepping stone to the goals; moreover I don't think it makes any irreversible decisions that would stand in the way of implementing pulumi/pulumi#16099.

pkg/tfgen/docs.go

pkg/tfgen/generate.go

Graham-Pedersen · 2025-09-18T22:36:03Z

...ByLanguage/Handles_property_names_that_are_surrounded_by_back_ticks_AND_double_quotes.golden

+{
+  "name": "random",
+  "description": "A Pulumi package to safely use randomness in Pulumi programs.",
+  "keywords": [
+    "pulumi",
+    "random"
+  ],
+  "homepage": "https://pulumi.io",
+  "license": "Apache-2.0",


I love all the testing you added! Just wanted to call out how great it is to have more unit test coverage!

go.mod

blampe · 2025-09-25T19:09:48Z

pkg/tests/schema_generation_test.go


+	// First, generate the schema file that the NodeJS generator expects
+	// Use a separate in-memory filesystem for schema generation to avoid conflicts
+	schemaRoot := afero.NewMemMapFs()


pkg/tests/schema_generation_test.go

blampe · 2025-09-25T19:14:19Z

pkg/tfgen/docs.go

+			// Use `ec2.Instance` format for Go and Python
+			goAndPyFormat := open + modname + resname.String() + close
+			// Use `aws.ec2.Instance` format for all other languages
+			allOtherLangs := open + c.pkg.String() + "." + modname + resname.String() + close


So the text we get is already appropriately snake/pascal-cased, we don't need to handle that?

Yeah! The resource name is already Pulumified.
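Because the resource name is already Pulumified, the two formats in that snippet are plain concatenation with no case conversion. A minimal Go sketch of folding them into the same kind of language-tagged span; resourceRefSpan and its parameters are hypothetical, the open/close wrappers from the snippet are left out, and using the fully qualified form as the span's fallback text is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceRefSpan builds a language-tagged span for a resource reference.
// Go and Python get "ec2.Instance"; all other languages get "aws.ec2.Instance".
func resourceRefSpan(pkg, mod, res string) string {
	short := mod + "." + res  // module-qualified form for Go and Python
	long := pkg + "." + short // package-qualified form for everything else
	var b strings.Builder
	b.WriteString("<span")
	for _, lang := range []string{"nodejs", "dotnet", "go", "python", "yaml", "java"} {
		name := long
		if lang == "go" || lang == "python" {
			name = short
		}
		fmt.Fprintf(&b, ` pulumi-lang-%s="%s"`, lang, name)
	}
	fmt.Fprintf(&b, ">%s</span>", long) // fallback text: an assumption
	return b.String()
}

func main() {
	fmt.Println(resourceRefSpan("aws", "ec2", "Instance"))
}
```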

pkg/tfgen/docs_test.go

blampe · 2025-09-25T19:18:05Z

pkg/tfgen/docs_test.go

-		assert.Equalf(t, text == "", elided,
-			"We should only see an empty result for non-empty inputs if we have elided text")
+	for _, tc := range testCases {
+		tc := tc


Not needed since Go 1.22.

pkg/tfgen/docs_test.go

pkg/tfgen/generate_test.go


guineveresaenger mentioned this pull request Sep 13, 2025

[DO NOT MERGE] Sample of SDK gen using schema filter functionality pulumi/pulumi-aws#5825

Draft

corymhall reviewed Sep 15, 2025


guineveresaenger force-pushed the guin/markup-and-filter-schema branch from 017c089 to a59efa4 on September 18, 2025 22:04

Graham-Pedersen reviewed Sep 18, 2025


guineveresaenger force-pushed the guin/markup-and-filter-schema branch 4 times, most recently from db0b271 to dc40b9c on September 24, 2025 19:09

guineveresaenger marked this pull request as ready for review September 24, 2025 19:38

guineveresaenger requested a review from a team September 24, 2025 19:40

blampe approved these changes Sep 25, 2025


guineveresaenger added 17 commits September 25, 2025 16:10

We mark up Description fields that have code names in them with a span with language tags.

f5ee509

Scaffold filter function

d5cc8a7

Add filter function and use it. Writing to test-python-span-with-filter now. add regex filters. Thought - could this be done better? UNCLEAR

658afa3

First pass using regex

7c5ff35

refactor and test docs

ef2deb8

Filter out code choosers and re-add version

2430bb0

Refactor filter to return a byte array; write unit tests

404ac4c

Fix a bug where only node and go would get camelCased property names

f9787cb

Use bytes not strings for find/replace. Simplify regex and fix tests

63ce182

skip examples processing

567c07d

Pare down the test schema.

4034d87

Capitalize dotnet properties

5333ae5

Ensure RegistryDocs 'language' is handled correctly

01d855e

lint_fix

3e4df30

Update pf test schemas with new markup feature

00546b0

Mark up test schema data for tfgen tests

d690686

Regenerate golden files for dynamic package

078956f

guineveresaenger added 10 commits September 25, 2025 16:10

Adjust schema generation test to first generate the docs schema

f36e859

Move TestPropertyDocumentationEdits to use autogold and clarify new assertions

9f0dfc4

Move TestReformatText to golden files and TestPropertyDocumentationEdits to inline autogold strings

c6ab4c9

address code review: structure and clean up fixupPropertyReference function

a9338f4

tests should respect installed versions

9bb3305

more lint fixes

e1fd65d

don't update the aws version yet

e61191a

update full docs test gen

d540e5a

use inline assertions for TestReformatText

7dd432d

update random test file

c15b2d4

guineveresaenger force-pushed the guin/markup-and-filter-schema branch from dc40b9c to ab52e74 on September 25, 2025 23:36

cleanup tests

a833c6b

guineveresaenger force-pushed the guin/markup-and-filter-schema branch from ab52e74 to a833c6b on September 25, 2025 23:43

use t.TempDir

df8cfcf

guineveresaenger force-pushed the guin/markup-and-filter-schema branch from f82e085 to df8cfcf on September 26, 2025 00:53

Regenerate ReformatText one last time

94157ab

guineveresaenger force-pushed the guin/markup-and-filter-schema branch from 7d9a18c to e10f8b9 on September 26, 2025 21:00

Tests should not clobber schema files of other tests

9fc1766

guineveresaenger force-pushed the guin/markup-and-filter-schema branch from e10f8b9 to 9fc1766 on September 27, 2025 00:02

Allow for non-parallel tests

7e78689

guineveresaenger merged commit 766d3c8 into main Sep 29, 2025
138 checks passed

guineveresaenger deleted the guin/markup-and-filter-schema branch September 29, 2025 19:10

guineveresaenger mentioned this pull request Oct 3, 2025

SDK gen breaks for GCP provider name #3198

Closed

5 participants

guineveresaenger commented Sep 13, 2025 •

edited

Loading

codecov bot commented Sep 13, 2025 •

edited

Loading