Skip to content

Conversation

dalaro
Copy link

@dalaro dalaro commented Nov 6, 2019

GraphRequestAction implements ChainableAction, but there's a specific case where it won't pass the Gatling session forward to the next action in the chain, hanging Gatling. The specific case is when the user supplies a lambda to delay generation of the GraphStatement until GRA is already inside sendQuery on its own fixed-sized threadpool, and then that lambda throws an exception.

Please feel free to be vicious with the PR review, as I'm well outside my starting point on this issue's debugging trajectory, and not intimately familiar with the components I'm touching here.

The DSP issue for which this PR's branch is named is sort of tangential to the PR, since the issue involves multiple potential problems in different components.

GraphRequestAction calls `buildFromSession`.  There's at least one
case where this can end up calling a user-supplied lambda --
`executeGraphFluent` / `GraphFluentStatementFromScalaLambda`.

If this user-supplied lambda throws an exception, then the exception
propagates out of the sendQuery method uncaught, without calling `next
! <session>` to keep the Gatling chain running.

From a user POV, this manifested as all tasks quickly becoming active
(because `sendQuery` reaches the exception-throwing line almost
immediately for each request), but no tasks becoming done (because the
session wasn't passed to the `next` reference defined in Gatling's
`ChainableAction`).
This test simulates the case where a user provides a lambda that
generates a GraphStatement (as with one of the `executeGraphFluent`
overloads), but that lambda throws an exception instead of returning
normally.

The session-chain-checking portion of the test is original, but the
trailing assertions related to logging are adapted from a very similar
block of existing code in CqlRequestActionSpec.
Copy link
Contributor

@pingtimeout pingtimeout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this!

next ! session.markAsFailed
throw e
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrapped the call dseAttributes.statement.buildFromSession(session) with a try catch. But according to the type system, this call should return a Validation[GraphStatement]. A Validation is a Gatling structure that represents either a Success or Failure (http://gatling.io/docs/current/session/validation/). So really, this case of an uncaught exception should have been handled before we return to GraphRequestAction.

I think that the fix should be located in DseGraphStatements.scala:54 instead of here. Proposition: replace L54 in DseGraphStatements by safely()(lambda(gatlingSession).success). This will use a built-in Gatling function that does exactly what you want, while also satisfying the type system.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the insight.

Gatling's safely is a lot more concise and idiomatic than try-catch. There's one thing that makes me uneasy about it though -- I don't see a way to log the stacktrace of the exception caught by safely before it goes out of scope. For a user-provided lambda, there could be cases where the exception stacktrace is useful, potentially significantly more useful than the exception type/message (e.g. NPE).

I just pushed a new commit that moves exception handling down to DseGraphStatements.scala:54 as directed, but uses Try { ... } match { ... } instead of safely. It's now similar to some of its peer implementations inside DseGraphStatements.scala that also already used similar Try { ... } match { ... } constructs, and I can still see lambda traces in the logs.

What do you think?

Theoretically, pushing down exception handling like this could allow some other implementation of buildFromSession to induce a hang, if that implementation allows uncaught exceptions to propagate out to the caller; but it looks like some of the other implementations in DseGraphStatements already go out of their way to prevent that, and I'm assuming your comment means that's the general rule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a way to log the stacktrace of the exception caught by safely before it goes out of scope.

That's a good reason to use a try/catch block, indeed. +1 for that approach

Theoretically, pushing down exception handling like this could allow some other implementation of buildFromSession to induce a hang

I agree with you. I also wondered whether to keep the extra try/catch in GraphRequestAction, just to be 100% future proof. But I think the way to go is to honor the type system and make it clear that errors are prevented.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I also wondered whether to keep the extra try/catch in GraphRequestAction, just to be 100% future proof. But I think the way to go is to honor the type system and make it clear that errors are prevented.

Sounds good, I'll stick to the pushdown approach and update the tests to match. Thank you for addressing this.

This commit responds to review feedback from Pierre, moving exception
handling from GraphRequestAction.sendQuery's buildFromSession call to
one specific implementation of buildFromSession.  It eschews
safely(...) so that we can log the exception stacktrace.
Copy link
Contributor

@pingtimeout pingtimeout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from one nit, I think this PR is ready to be merged. Did you manage to validate the fix more globally on the ecdc project?

// Attempt to generate a graph statement from our parameters, propagating any uncaught exceptions after
// continuing the chain with a failed session
val stmt: Validation[GraphStatement] =
dseAttributes.statement.buildFromSession(session)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment lines are not accurate anymore here, could you restore the original line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I'll delete that comment

@dalaro
Copy link
Author

dalaro commented Nov 7, 2019

Did you manage to validate the fix more globally on the ecdc project?

I think so; this is the gist of what I executed:

.exec(graph("...")
    .executeGraphFluent(gsession => {
    [ ... throw uncaught NoSuchElementException ... ]
    }).[ ... check(...) result existence ... ])

Before both the initial PR branch push and before my latest push, I checked that the PR's underlying gatling-dse-plugin change both (a) caused the chain to complete, with all tasks promptly transitioning from the active state to the done state and (b) logged each exception with its name, message, and stacktrace.

However, my last commit was just to check direction. If the current direction is OK with you, then it's still not quite ready.

My first push made GraphRequestAction keep the chain alive if buildFromSession threw an exception, and it also added a test spec to check that behavior. Acting on your review feedback, that's being lost, so if buildFromSession throws something, then GRA is supposed to hang the chain again. The new test is failing in response. From my limited perspective, this makes me uncomfortable, but you are the expert here, and I certainly don't know the overall design or the details of every other buildFromSession implementation well enough to disagree.

Anyway, now that this overall approach has approval, I will delete the spec that I added initially, and maybe I can add a spec elsewhere for GraphFluentStatementFromScalaLambda's exception handling in exchange.

- Delete a vestigial comment from `GraphRequestAction` (thanks for
  finding this Pierre)

- Delete an obsolete case from `GraphRequestActionSpec`; it was
  checking that GRA kept the chain alive despite `buildFromSession`
  exceptions, but since that behavior has been reverted, this spec is
  now unnecessary (and failing)

- Add a new case to `DseGraphStatementSpec`, exercising a sort of
  equivalent scenario to the one deleted in the last bullet; it checks
  that `GraphFluentStatementFromScalaLambda.buildFromSession` catches
  and logs a runtime exception thrown by its inner lambda.  This is
  similar to the case that got deleted, except that some of the
  assertions were rewritten as matchers in an attempt to be more
  consistent with the surrounding tests in the same file
@dalaro
Copy link
Author

dalaro commented Nov 7, 2019

I just pushed a comment and test fixup commit. I deleted the outdated GraphRequestActionSpec case, and adapted it into a new case in DseGraphStatementSpec, exercising the failure case on the the new Try {...} match {...} block in buildFromSession. I retested against my original Gatling simulation; all tasks move promptly from active to done, and the exceptions with traces are visible in the log. sbt test is also clean on my machine.

I can't think of any other remaining work items that would block a hypothetical merge at this point. Please do let me know if there's anything else I can do to improve the PR -- code changes, squashing the branch, etc.

@dalaro dalaro requested a review from pingtimeout November 12, 2019 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants