Skip to content

Make the TrioInternalError experience less confusing #1056

Open
@njsmith

Description

@njsmith

We've had several bug reports recently where the filers basically said "I got this TrioInternalError, I know it's not a bug in Trio but rather 100% my fault for breaking things, but it said to file a bug so I'm doing that":

At one level this is fine – a very common mistake that programmers make is to assume that when something goes wrong, it's their fault, when it's actually a fixable bug in the software they're using. I'd rather get some ambiguous cases filed in our tracker than miss out on real bugs. #882 is a good example. In that case it turned out that the reporter was implementing the context manager protocol by hand and their implementation had a bug in it, so the immediate problem was a bug in their code, not in Trio – but this wasn't obvious at first.

Still, it seems like the error message could be clearer and more accurate.

Cases where you get TrioInternalError:

  • When there's a bug in Trio
  • When you raise an exception from a place where the docs explicitly say you aren't allowed to raise an exception (especially spawn_system_task, TrioToken.run_sync_soon, but also potentially from weird places like the abort callback in wait_task_rescheduled)
  • When you do something that violates Trio's assumptions, like calling __enter__/__exit__ methods in the wrong order.

Ideally, when people misuse the API, they should get an error telling them that, like a RuntimeError or TypeError or something, not a TrioInternalError. (For example, I guess #882 is now basically "can we give a more specific error message when people do weird things with cancel scope/nursery entry/exit".) But we can't do that 100% of the time. Sometimes we can detect that something weird happened that we didn't anticipate, but by definition this means we don't know why it happened – could be a bug, could be someone messing with us, there's no general way to tell. #1055 is an example – @oremanj diagnosed that it has mis-nested cancel scopes, but in general async_fn().send(None) is going to create very weird internal corruption (consider trio.hazmat.wait_readable(fd).send(None) – Trio will think that the task has suspended itself and should be resumed when the fd becomes readable, but in fact the task isn't suspended at all and is going to sleep somewhere else instead...).

The spawn_system_task and run_sync_soon and abort callbacks cases are slightly different: in all of these cases, an exception means that there's a bug in the code that used those APIs. And since these are basically internal APIs that are also public, then sometimes that user is Trio itself, and sometimes it's a user. Of course when they were created it was 100% internal users, because Trio had no users, so that was the case I optimized for :-).

So here's some possible action items:

  • Make the generic TrioInternalError message a little more accurate/informative. Maybe: "Trio encountered an unexpected error. This could be because someone did something unsupported, or it could be because of a bug in Trio. If you're not sure, please file a bug!"

  • Try to be more systematic about distinguishing between internal bugs in Trio versus API misuse. For example:

    • If we could catch bad nesting of cancel scopes/nurseries, that would be great, since it seems to come up a lot. (And tell people to use @asynccontextmanager!)
    • We could tweak the user-visible versions of spawn_system_task and run_sync_soon so that if user code crashes when it's not supposed to, you get a different error (not TrioInternalError). I'm not sure if this comes up often enough in practice to be worth worrying about though.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions