feat: implement batched testcase generation with progress tracking #3926
Conversation
- Add new SSE event types (PROGRESS, BATCH_START, BATCH_COMPLETE) for progress tracking
- Create progressTracking utility for dispatching progress events
- Implement batchTestcaseGenerationNode with concurrent limit of 3 testcases
- Replace parallel Send-based execution with controlled batching
- Update QA agent graph to use batched execution flow
- Update tests and README to reflect new architecture
- Remove unused continueToRequirements function

This change addresses issue #5645 by preventing message flooding during parallel testcase generation and providing clear progress feedback to users.

Co-Authored-By: [email protected] <[email protected]>
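The batched flow this commit describes can be sketched roughly as below. This is an illustrative sketch only: the function name `runBatched`, the `Testcase`/`BatchEvent` shapes, and the emit callback are assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch of the batching approach: testcases are processed in
// fixed-size batches (limit 3) instead of all at once, and a progress event
// is emitted as each batch completes. All names are illustrative.
type Testcase = { id: string; title: string }

type BatchEvent =
  | { type: 'BATCH_START'; total: number }
  | { type: 'PROGRESS'; completed: number; total: number }
  | { type: 'BATCH_COMPLETE'; total: number }

const CONCURRENT_TESTCASE_LIMIT = 3

async function runBatched(
  testcases: Testcase[],
  generate: (tc: Testcase) => Promise<string>,
  emit: (e: BatchEvent) => void,
): Promise<string[]> {
  emit({ type: 'BATCH_START', total: testcases.length })
  const results: string[] = []
  for (let i = 0; i < testcases.length; i += CONCURRENT_TESTCASE_LIMIT) {
    const batch = testcases.slice(i, i + CONCURRENT_TESTCASE_LIMIT)
    // At most CONCURRENT_TESTCASE_LIMIT generations run concurrently.
    results.push(...(await Promise.all(batch.map(generate))))
    emit({ type: 'PROGRESS', completed: results.length, total: testcases.length })
  }
  emit({ type: 'BATCH_COMPLETE', total: testcases.length })
  return results
}
```

With this shape, a run of seven testcases produces one BATCH_START, three PROGRESS events (one per batch of up to three), and one BATCH_COMPLETE, instead of seven interleaved per-testcase messages.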
🤖 Devin AI Engineer: I'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
The latest updates on your projects. Learn more about Vercel for GitHub.
Finished running flow.
Updates to Preview Branch (devin/1761719224-testcase-generation-progress-control)
Tasks are run on every commit but only new migration files are pushed.
View logs for this Workflow Run.
Important: Review skipped (bot user detected).
Check changeset necessity
Status: NOT REQUIRED
Changeset (copy & paste): # N/A (no changeset required because the changes affect only "@liam-hq/agent", which is in the ignore list)
- Switch from Promise.all to Promise.allSettled to prevent single testcase failures from blocking entire batch
- Extract helper functions to reduce cognitive complexity
- Aggregate failed testcases as schema issues with error messages
- Fix progress counting to use actual batch length instead of limit

This improves robustness by allowing partial batch success and prevents timeouts when individual testcases fail.

Co-Authored-By: [email protected] <[email protected]>
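The Promise.allSettled change described above might look roughly like the following. The function name, the `BatchResult` shape, and the issue-message format are assumptions for illustration; only the allSettled-based partial-success behavior comes from the commit message.

```typescript
// Illustrative sketch: with Promise.allSettled, a failed testcase no longer
// aborts the whole batch; failures are collected as issue strings instead.
type BatchResult = { sqls: string[]; issues: string[] }

async function runBatchAllSettled(
  titles: string[],
  generate: (title: string) => Promise<string>,
): Promise<BatchResult> {
  const settled = await Promise.allSettled(titles.map(generate))
  const sqls: string[] = []
  const issues: string[] = []
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      sqls.push(result.value)
    } else {
      // Aggregate the failure as an issue carrying the error message.
      issues.push(`Testcase "${titles[i]}" failed: ${String(result.reason)}`)
    }
  })
  return { sqls, issues }
}
```

Unlike `Promise.all`, which rejects on the first failure, `Promise.allSettled` always resolves with one entry per input, so successful generations in the same batch are kept.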
🤖 Agent Deep Modeling Execution
Started at: 2025-10-29 08:10:40 UTC

RUN v3.2.4 /home/runner/work/liam/liam/frontend/internal-packages/agent
(node:7737) ExperimentalWarning: WASI is an experimental feature and might change at any time
✅ [INFO] 2025-10-29T08:10:43.155Z

🧠 Thinking (pm)...

Structuring the BRD and Test Cases: I need to create a Business Requirements Document (BRD) with sections like Scope, Actors, Functional and Non-Functional Requirements, Data Model, Permissions, Validation Rules, and Audit Logging. For test cases, I'll organize them by categories like User Registration and Authentication. It's crucial to include both positive and negative cases, but since this is a simple user management system, I can skip complex login flows. I must still ensure test cases align with SQL operations for QA verification.

Developing Test Cases and Data Model: I need to include SQL types like INSERT, UPDATE, DELETE, and SELECT and provide around 15-25 test cases, both positive and negative, which seems manageable. I'll also suggest a data model with tables such as users, roles, user_roles, and possibly permissions. For simplicity, I can keep permissions optional. I'll make sure to have unique constraints on email and maybe username, and implement a soft delete or active status. Additionally, there should be an audit logs table to track changes effectively.

Deciding on Audit and Deletion Approach: I think we can keep audit requirements minimal for this simple setup, even though including them is typical. The current schema structure is empty, so I need to provide a BRD that the DB Agent can work with. I'll also add non-functional requirements like PII encryption and password hashing, but I might skip 2FA for the simplicity of the system. For deletions, I'll implement soft deletes with a "deleted" status to maintain referential integrity while including "DELETE" tests to block hard deletions. The test cases must align with SQL operations for coherence.

Defining Deletion Rules and Test Cases: I'm thinking we can allow deletion of roles if they aren't assigned, but for users, hard deletion isn't permitted; they need to be soft deleted instead. I'll include a negative test case for when a delete fails due to foreign key constraints. I want to establish that users can't be hard-deleted if they're referenced by audit logs, while only super admins can do hard deletes without dependent records. For test cases, I'll have scenarios including attempts to delete users with existing logs and successfully deleting roles that aren't in use.

Organizing Requirements and Test Cases: I need to establish categories like User Lifecycle Management, Authentication & Security, and others. There's a need for minimal disclaimers and to ensure the tool call object is correctly formatted with no extra fields. I'll produce the BRD content after a checklist, including a business goal summary, scope, actors, user stories, and a data model. I plan to define entities for users, roles, and audit logs, incorporate constraints, and detail test cases for various operations, ensuring each test case aligns with the correct SQL operation type.

Finalizing Test Cases and BRD Content: I need to ensure the test case types correspond to SQL operations. For example, updating the user email should log changes and be classified as an INSERT action. I'll create clear categories for the test cases, ensuring each category is represented correctly in the tool. The goal will succinctly state our objective for a user management system. After crafting a structured BRD with an overview, scope, actors, requirements, and a concise test case listing, I'll call the processAnalyzedRequirementsTool and confirm successful completion.

⏺ (pm): Planned steps:
Business Requirements Document (BRD): Simple User Management System
4.2 Email Verification and Activation
4.3 Authentication-Related State
4.4 Role Management
4.5 Account Status Management
4.6 Audit Logging
4.7 Search and Listing
5.2 roles
5.3 user_roles
5.4 audit_logs
Proceeding to save the analyzed goal and categorized testcases for downstream use.

🔧 processAnalyzedRequirementsTool()
⎿ processAnalyzedRequirementsTool: Requirements processed and streamed successfully

🧠 Thinking (db)...

Considering database constraints: I'm thinking about how to prevent users from deleting roles assigned to them, especially if

⏺ (db): Purpose: Create citext extension, enums, and core tables (users, roles, user_roles, audit_logs) with necessary columns, constraints, and indexes to satisfy user-management requirements. Minimal inputs: extension name, enum names/values, table definitions with columns, constraints, and indexes. I'll run the schema changes now.

🔧 schemaDesignTool()
⎿ schemaDesignTool: Schema successfully updated. The operations have been applied to the database schema, DDL validation successful (60/60 statements executed successfully), and new version created.

⏺ (db): Validation: Added citext extension, enums, and core tables with constraints and indexes. All operations succeeded. Summary of changes applied:

Next steps I performed implicitly to satisfy requirements (assumptions and supporting items added outside explicit DDL):

If you want, I can now:
Tell me which of the above you'd like next, and I'll implement it.

🔧 saveTestcase() (repeated for each generated test case; all saves succeeded). Saved test cases by category:

User Registration & Creation:
- Reject creating a user with a duplicate email (case-insensitive)
- Create a user without a username (optional field) succeeds
- Create a user with valid data auto-assigns default 'user' role and sets status to pending_verification
- Case-insensitive email lookup returns the user by any email casing

Authentication & Account Status:
- Failed login increments failed_login_attempts
- Successful login resets failed attempts and updates last_login_at
- Account locks after 5 consecutive failed attempts by setting locked_until in the future
- Login is blocked when status is suspended or deleted or when locked_until is in the future

Email Verification:
- Mark email verified transitions user status from pending_verification to active
- Prevent setting status to active when is_email_verified is false

Role Management:
- Create a new role with unique code
- Prevent duplicate role assignment for the same user and role
- Delete a role that is not assigned to any user
- Remove a role assignment from a user
- Assign an additional role to an existing user

User Lifecycle:
- Soft delete a user sets status to deleted and stamps deleted_at
- Deactivate a user account by setting status to suspended
- Prevent hard delete when dependent audit logs or role assignments exist
- Hard delete a user without dependencies by an Admin succeeds

Data Validation & Constraints:
- Reject setting user status to a value outside the allowed enum
- Reject updating email to one already in use by another user (case-insensitive)
- Enforce unique username (case-insensitive) when provided

Audit & Logging:
- Creating a user writes a CREATE audit log entry with actor and details
- Updating a user email writes an UPDATE audit log entry with changed_fields
- Successful login writes a LOGIN audit log entry

Search & Reporting:
- Keyword search (partial, case-insensitive) across email, username, first_name, last_name returns matches
- List users filtered by role and status

⎿ unknown: 16/29 test cases passed, 13 failed

❌ Test Case: Reject creating a user with invalid email format
Error:
…ed concurrency
- Restore Send API parallel execution pattern as requested
- Add semaphore-based concurrency limiting (max 3 concurrent testcases)
- Implement progress tracking with BATCH_START, PROGRESS, BATCH_COMPLETE events
- Create prepareTestcases node to initialize batch tracking
- Create testcaseGenerationWithSemaphore wrapper node for concurrency control
- Create reportProgress node to emit progress after each completion
- Update applyGeneratedSqls to dispatch BATCH_COMPLETE and cleanup runtime state
- Add batchId and totalTestcases fields to qaAgentAnnotation
- Create progressRuntime utility for semaphore and progress counter management
- Update tests and README to reflect Send-based architecture
- Remove manual batching approach (batchTestcaseGenerationNode)

This implementation maintains Send API parallel execution while enforcing concurrent limit via semaphore and providing aggregated progress updates to prevent timeline message flooding.

Co-Authored-By: [email protected] <[email protected]>
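A counting semaphore of the kind this commit describes can be sketched as follows: all Sends still fire in parallel, but each generation must acquire one of three permits before running. The `Semaphore` class and `withSemaphore` wrapper are illustrative stand-ins for the PR's `progressRuntime` utility, not its actual code.

```typescript
// Minimal counting semaphore: at most `permits` tasks run concurrently;
// further tasks queue until a permit is released.
class Semaphore {
  private waiters: Array<() => void> = []
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits -= 1
      return
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve))
  }

  release(): void {
    const next = this.waiters.shift()
    // Hand the permit directly to the next waiter, if any.
    if (next) next()
    else this.permits += 1
  }
}

const semaphore = new Semaphore(3) // mirrors the max-3 concurrent testcases

async function withSemaphore<T>(task: () => Promise<T>): Promise<T> {
  await semaphore.acquire()
  try {
    return await task()
  } finally {
    semaphore.release()
  }
}
```

The design difference from the earlier batching approach: batching waits for the slowest testcase in each group of three before starting the next group, while a semaphore starts a new testcase as soon as any running one finishes, keeping three in flight continuously.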
Issue
Why is this change needed?
The QA agent's testcase generation was causing UX issues due to message flooding. When processing multiple testcases, LangGraph's Send API would execute all testcases in parallel, causing each to output individual messages. This created a poor user experience where the timeline was overwhelmed with processing messages, making it difficult to track overall progress.
What changed?
Architecture Changes
- Added `batchTestcaseGenerationNode` for controlled batch execution
- Removed the unused `continueToRequirements` function, which was replaced by the batching approach

New Features
- `BATCH_START`: Signals beginning of batch processing with total count
- `PROGRESS`: Reports incremental progress (completed/total)
- `BATCH_COMPLETE`: Signals completion with final count
- Added `progressTracking.ts` with helper functions to dispatch progress events

Key Implementation Details
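The helpers in `progressTracking.ts` are not shown in this description, but the three event types suggest a shape like the following. The helper names, the `batchId` field, and the payload layout are assumptions for illustration.

```typescript
// Hypothetical progress-tracking helpers: typed event builders handed to a
// dispatch callback (e.g. an SSE writer). Names and shapes are assumed.
type ProgressEventType = 'BATCH_START' | 'PROGRESS' | 'BATCH_COMPLETE'

type ProgressEvent = {
  type: ProgressEventType
  batchId: string
  completed: number
  total: number
}

type Dispatch = (event: ProgressEvent) => void

const dispatchBatchStart = (dispatch: Dispatch, batchId: string, total: number) =>
  dispatch({ type: 'BATCH_START', batchId, completed: 0, total })

const dispatchProgress = (dispatch: Dispatch, batchId: string, completed: number, total: number) =>
  dispatch({ type: 'PROGRESS', batchId, completed, total })

const dispatchBatchComplete = (dispatch: Dispatch, batchId: string, total: number) =>
  dispatch({ type: 'BATCH_COMPLETE', batchId, completed: total, total })
```

Keeping `completed`/`total` on every event lets a frontend render a progress bar from any single event without reconstructing state from the whole stream.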
Important Review Points
1. Behavior Changes
2. Error Handling
- Uses `Promise.all` for batch execution, meaning if one testcase fails in a batch, the entire batch fails
- Should we use `Promise.allSettled` instead to allow partial batch failures?

3. Frontend Integration
4. Configuration
- `CONCURRENT_TESTCASE_LIMIT` is hardcoded to 3

5. Testing
Testing Performed
Session: https://app.devin.ai/sessions/c4d73b87adb7497ab79adc36edd10289
Requested by: [email protected] (@MH4GF)