test(W-19105940): confidence tests #60

Open
wants to merge 62 commits into
base: main

mdonnalley (Contributor) commented Jun 16, 2025

What does this PR do?

Adds confidence tests

Running the Tests

Developers can run them locally with yarn test:confidence --file test/confidence/sf-deploy-metadata.yml

See DEVELOPING.md for instructions on how to set up the tests.

In CI, these tests run only if at least one of the following paths has changed:

  • confidence/**
  • test/confidence/**
  • src/tools/**
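
As a rough illustration of the path filter above, the check can be sketched in TypeScript. This is not the CI configuration from the PR: the function name and the prefix-matching approach are assumptions, and the sketch only approximates the `dir/**` globs by treating each one as a directory prefix.

```typescript
// Hypothetical helper: mirrors the three path filters listed above.
// Each `dir/**` glob is approximated by a plain directory prefix.
const CONFIDENCE_TRIGGER_DIRS: readonly string[] = [
  "confidence/",
  "test/confidence/",
  "src/tools/",
];

/** Returns true when at least one changed file lives under a trigger directory. */
function shouldRunConfidenceTests(changedFiles: string[]): boolean {
  return changedFiles.some((file) =>
    CONFIDENCE_TRIGGER_DIRS.some((dir) => file.startsWith(dir))
  );
}
```

Under this sketch, a change to a file under src/tools/ would trigger the tests, while a change that only touches README.md would not.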

The YAML File

Example YAML structure:

models:
  - sfdc_ai__DefaultBedrockAnthropicClaude4Sonnet

initial-context:
  - 'My current OS is macos. I am working in a workspace with the following folders: /Users/sf-dev/dreamhouse-lwc
    My org alias is dreamhouse.
    This is the structure of /Users/sf-dev/dreamhouse-lwc:
    package.xml
    force-app/main/default/applications
    force-app/main/default/aura
    force-app/main/default/aura/pageTemplate_2_7_3
    force-app/main/default/classes
    force-app/main/default/contentassets
    force-app/main/default/cspTrustedSites
    force-app/main/default/flexipages
    force-app/main/default/flows
    force-app/main/default/layouts
    force-app/main/default/lwc
    force-app/main/default/messageChannels
    force-app/main/default/objects
    force-app/main/default/permissionsets
    force-app/main/default/prompts
    force-app/main/default/remoteSiteSettings
    force-app/main/default/staticresources
    force-app/main/default/tabs'

tests:
  # Deploy specific source directory (Lightning Web Components)
  - utterances:
      - Deploy the Lightning Web Components in force-app/main/default/lwc to the dreamhouse org.
    expected-tool: sf-deploy-metadata
    expected-parameters:
      sourceDir: force-app/main/default/lwc
      directory: /Users/sf-dev/dreamhouse-lwc
      usernameOrAlias: dreamhouse
    expected-tool-confidence: 50
    expected-parameter-confidence: 50
    allowed-tools:
      - sf-list-all-orgs

  # Deploy multiple source directories
  - utterances:
      - Deploy the classes and lwc folders to my dreamhouse org.
    expected-tool: sf-deploy-metadata
    expected-parameters:
      sourceDir: force-app/main/default/classes,force-app/main/default/lwc
      directory: /Users/sf-dev/dreamhouse-lwc
      usernameOrAlias: dreamhouse
    expected-tool-confidence: 50
    expected-parameter-confidence: 50
    allowed-tools:
      - sf-list-all-orgs
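
To make the structure above easier to reason about, here is a minimal TypeScript sketch of one test entry and of how a tool-confidence score could be compared against its threshold. The field names come from the YAML; everything else is an assumption for illustration: the RunResult type, the idea that each utterance is run several times, and the reading of expected-tool-confidence as a minimum percentage of runs that call the expected tool. The PR does not spell out the actual scoring.

```typescript
// Shape of one entry under `tests:`; field names are taken from the YAML above.
interface ConfidenceTest {
  utterances: string[];
  "expected-tool": string;
  "expected-parameters"?: Record<string, string>;
  "expected-tool-confidence": number; // assumed: minimum percentage (0-100)
  "expected-parameter-confidence": number; // assumed: minimum percentage (0-100)
  "allowed-tools"?: string[];
}

// Hypothetical record of a single model run: the tools the model chose to call.
interface RunResult {
  toolsCalled: string[];
}

/** Percentage of runs in which the expected tool was called (assumed scoring). */
function toolConfidence(test: ConfidenceTest, runs: RunResult[]): number {
  if (runs.length === 0) return 0;
  const hits = runs.filter((run) =>
    run.toolsCalled.includes(test["expected-tool"])
  ).length;
  return (hits / runs.length) * 100;
}

/** True when the measured confidence reaches the threshold from the YAML. */
function meetsToolThreshold(test: ConfidenceTest, runs: RunResult[]): boolean {
  return toolConfidence(test, runs) >= test["expected-tool-confidence"];
}

// The first test from the example YAML, expressed as an object.
const deployLwc: ConfidenceTest = {
  utterances: [
    "Deploy the Lightning Web Components in force-app/main/default/lwc to the dreamhouse org.",
  ],
  "expected-tool": "sf-deploy-metadata",
  "expected-parameters": {
    sourceDir: "force-app/main/default/lwc",
    directory: "/Users/sf-dev/dreamhouse-lwc",
    usernameOrAlias: "dreamhouse",
  },
  "expected-tool-confidence": 50,
  "expected-parameter-confidence": 50,
  "allowed-tools": ["sf-list-all-orgs"],
};
```

Under these assumptions, if three out of four runs call sf-deploy-metadata, the score is 75 and the test clears its threshold of 50.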

Implementation notes

  • The confidence-test command is implemented in an oclif CLI that lives under the confidence/ directory. I did this because we'll likely add other commands as we build out the complete testing framework. It will also be easy to move into a separate repo if we ever want to.
  • I added a RateLimiter class because I was originally working with a limit of 40 requests per minute when testing against a pre-prod API URL. Now that we can hit production, we have 500 requests per minute, so it's probably not an issue, but it's nice to have as the number of tests grows.
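
For readers unfamiliar with the idea, a rate limiter of this kind can be sketched as a sliding window over request timestamps. This is not the RateLimiter class from the PR: the class name, method names, and the sliding-window strategy are all assumptions chosen for illustration.

```typescript
// Hypothetical sliding-window limiter; not the RateLimiter class from this PR.
class SlidingWindowRateLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number = 60000) {
    if (maxRequests < 1) throw new Error("maxRequests must be at least 1");
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  /**
   * If a slot is free at `now`, records the request and returns 0.
   * Otherwise returns the milliseconds until the oldest request leaves the window.
   */
  tryAcquire(now: number = Date.now()): number {
    // Keep only the requests that are still inside the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(now);
      return 0;
    }
    const oldest = Math.min(...this.timestamps);
    return this.windowMs - (now - oldest);
  }

  /** Waits until a slot is free, then records the request. */
  async acquire(): Promise<void> {
    let waitMs = this.tryAcquire();
    while (waitMs > 0) {
      const delay = waitMs;
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
      waitMs = this.tryAcquire();
    }
  }
}

// Usage sketch (inside an async function), with the 40 requests/minute limit
// mentioned above:
//   const limiter = new SlidingWindowRateLimiter(40);
//   await limiter.acquire(); // resolves once a request slot is available
```

Keeping the slot calculation in a synchronous tryAcquire that accepts the current time makes the logic easy to test without real delays; acquire only adds the waiting.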

What issues does this PR fix or reference?

@W-19105940@

test/llmg.ts Outdated
prompts: string[],
model: string,
tools: InvocableTool[]
): Promise<{ model: string; messages: Array<{ role: string; content: string }>; responses: GatewayResponse[] }> => {
Contributor

Should model values be verified against an allow list first, to avoid making calls with bad (likely accidental) input?
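
One way such a check could look, as a minimal sketch: the helper name and the contents of the list are assumptions, and the only model name taken from this PR is the one in the example YAML.

```typescript
// Assumed allow list; the single entry is the model named in the example YAML.
const ALLOWED_MODELS: readonly string[] = [
  "sfdc_ai__DefaultBedrockAnthropicClaude4Sonnet",
];

/** Throws before any gateway call is made if the model name is not recognized. */
function assertAllowedModel(model: string): void {
  if (!ALLOWED_MODELS.includes(model)) {
    throw new Error(
      `Unknown model "${model}". Allowed models: ${ALLOWED_MODELS.join(", ")}`
    );
  }
}
```

Calling a helper like this at the top of the function under review would turn a typo in the YAML into an immediate, readable error instead of a failed API request.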

mdonnalley changed the title from "test(W-18772941): test against LLM Gateway" to "test(W-19105940): confidence tests" Jul 23, 2025

3 participants