Skip to content

test: reproduce commonly cited timeout error #1390

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 47 commits into
base: main
Choose a base branch
from

Conversation

danieljbruce
Copy link
Contributor

@danieljbruce danieljbruce commented May 6, 2025

Description

The main intent of this PR is to supply a test confirming that users can see the changes introduced by this PR. Primarily, the runQuery test provides an error stack that users can see so that they can identify the root cause of the timeout errors they are seeing. Confirming this behaviour is in place with a test is crucial for solving issues like #1176 going forward.

Impact

This is a crucial step towards solving #1176 giving us the tools we need to explore client library behaviour when the error from the issue occurs. Having this test in place ensures the impact from the fix in Gax is locked in which will assist users with determining the root cause of errors they are seeing in the future.

Testing

This PR only adds tests and no source code changes.

mock-server/datastore-server.ts: Changes are made to the mock server so that test code can provide its own service configuration providing custom responses for different server grpc endpoints.

test/mock-server-tester.ts: This file is meant to be a grpc endpoint agnostic utility file for providing standard grpc services to the mock server. For example, the sendErrorSeries method can be used to make the mock server send a series of errors with the same code and the shutdownServer method shuts down the mock server.

test/with-runquery-mockserver.ts: This file runs a test where runQuery calls return a deterministic set of UNAVAILABLE errors and then it checks the error message to see that it reports the timeout error along with the list of UNAVAILABLE errors in order.

Additional Information

Some related PRs:
googleapis/gax-nodejs#1740
googleapis/gax-nodejs#1650

Next Steps

  • We have a test for UNAVAILABLE on runQuery now, but we should apply the right fixes for DEADLINE_EXCEEDED errors
  • Let's add a test involving transactions

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: datastore Issues related to the googleapis/nodejs-datastore API. labels May 6, 2025
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels May 6, 2025
done();
} catch (e) {
done(e);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only real change is to make the test runner throw an error if the script throws an error instead of failing silently

// The error message is based on client library behavior.
assert.strictEqual(
(e as Error).message,
'4 DEADLINE_EXCEEDED: error details: error count: 1',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should be something like this. We need to do a bit more investigation into what the client should do in this case to determine the exact error message should be, but for now it's fine to backlog this as a TODO.

@@ -0,0 +1,126 @@
// Copyright 2025 Google LLC
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contains utilities for working with the mock server that include a function for producing a deterministic set of errors and a function for shutting down the mock server.

callback: (arg1: string | null, arg2: {}) => {},
) {
// SET A BREAKPOINT HERE AND EXPLORE `call` TO SEE THE REQUEST.
callback(null, {message: 'Hello'});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This grpcEndpoint function has just been moved inside the only place it is used.

@danieljbruce danieljbruce changed the title 414574369 test error stack end to end test: reproduce commonly cited timeout error May 6, 2025
Comment on lines 62 to 68
* `DEADLINE_EXCEEDED` code (4), a details message indicating the number of
* errors generated so far by this instance, and some metadata.
*
* @returns {ServiceError} A `ServiceError` object representing a simulated
* gRPC error.
*/
generateError() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - without reading the comment, it's actually hard interpret generateError to generate a DEADLINE_EXCEEDED error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restructured this so that we pass the error code into generateError. This might make the intent of this method more clear.

@@ -0,0 +1,51 @@
// Copyright 2025 Google LLC

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - the test file is also a bit confusing. I think your intention is to differentiate the test between non-retryable error like NOT_FOUND v.s. a DEADLINE_EXCEEDED error.

But the createreadstream v.s. runquery in the name made it confusing as it seems to do with the functionalities.

Copy link
Contributor Author

@danieljbruce danieljbruce May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is to differentiate between the calls to the lookup grpc endpoint and the runQuery grpc endpoint.

Note that this PR is in a Draft stage. Although the early feedback is appreciated, I think I'm going to make some larger adjustments like removing this file so that we can lock in the progress we have with the UNAVAILABLE test on runQuery. This gives us the confidence that we can see the error stack for these types of calls.

@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels May 22, 2025
export function startServer(cb: () => void) {
export function startServer<ResponseType>(
cb: () => void,
serviceConfigurationOverride?: MockServiceConfiguration<ResponseType>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where a new service can be provided to the mock server.

@danieljbruce danieljbruce marked this pull request as ready for review May 22, 2025 20:58
@danieljbruce danieljbruce requested review from a team as code owners May 22, 2025 20:58
@danieljbruce danieljbruce requested a review from leahecole May 22, 2025 20:59
const metadata = new grpc.Metadata();
metadata.set(
'grpc-server-stats-bin',
Buffer.from([0, 0, 116, 73, 159, 3, 0, 0, 0, 0]),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this buffer?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

* This method is designed to be used in mock server setups for testing purposes.
* It returns a function that, when called, will invoke a callback with a
* pre-configured error object, simulating a gRPC service responding with an error.
* The error includes a DEADLINE_EXCEEDED code (4), a details message indicating the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why DEADLINE_EXCEEDED? Doesn't it take an arbitrary error code?

Copy link

@leahecole leahecole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a doc about the mock server tester or the mock server itself? Not sure if this is something your team typically does for your repos, but it seems like a lot of maintenance overhead to introduce. I am seeing yellow flags at reintroducing something somewhat generic that does a series of errors - isn't that what we should be testing in gax? (Which I do know you're working on, I just wonder if this is redundant?)

Comment on lines +15 to +17
// eslint-disable-next-line n/no-extraneous-import
import {ServiceError} from 'google-gax';
import {Server} from '@grpc/grpc-js';

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably be added to a package.json instead of disabling extraneous imports

const metadata = new grpc.Metadata();
metadata.set(
'grpc-server-stats-bin',
Buffer.from([0, 0, 116, 73, 159, 3, 0, 0, 0, 0]),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/nodejs-datastore API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants