
[camera_android] prevent startImageStream OOM error when main thread hangs (flutter#166533) #8998


Open
wants to merge 9 commits into
base: main
from


@kwikwag commented Apr 4, 2025

When streaming images using CameraController.startImageStream(), the Android implementation post()s each image to the main looper, even when the main thread is hanging or paused. This happens, for instance, when the Flutter debugger is paused or when the main thread is very busy. Images then pile up in the queue, which can quickly result in an out-of-memory (OOM) error, and the Android OS kills the app abruptly.

The fix counts the number of images that have been posted to the main thread but have not yet arrived. If too many images are in transit (I arbitrarily chose 3 as the maximum allowable amount), subsequent frames are dropped until the main thread receives the pending images. This is a form of back-pressure on the main looper handler.

A log message is emitted for each dropped frame.
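A minimal sketch of the back-pressure counter described above, runnable off-device. It assumes a plain java.util.concurrent.Executor as a stand-in for android.os.Handler.post() and a Consumer as a stand-in for the EventSink; the class names are hypothetical, and the AtomicInteger is an assumption about thread safety (the count is incremented on the image reader's thread and decremented on the main thread), not necessarily what the PR uses.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: forwards frames to a "main thread" executor but drops
 * new frames while MAX_IN_TRANSIT posted frames are still undelivered.
 * Assumes a single producer thread (like ImageReader's onImageAvailable).
 */
class BackPressuredPoster<T> {
  static final int MAX_IN_TRANSIT = 3;

  private final Executor mainThread; // stand-in for Handler.post()
  private final Consumer<T> sink;    // stand-in for the EventSink
  private final AtomicInteger inTransit = new AtomicInteger();
  private int dropped = 0;           // only touched on the producer thread

  BackPressuredPoster(Executor mainThread, Consumer<T> sink) {
    this.mainThread = mainThread;
    this.sink = sink;
  }

  /** Returns true if the frame was posted, false if it was dropped. */
  boolean offer(T frame) {
    int pending = inTransit.get();
    if (pending >= MAX_IN_TRANSIT) {
      dropped++;
      System.out.println("Dropping frame: " + pending + " frames still in transit");
      return false;
    }
    inTransit.incrementAndGet();
    mainThread.execute(() -> {
      try {
        sink.accept(frame);
      } finally {
        // Decrement only after delivery, when the frame's memory can be released.
        inTransit.decrementAndGet();
      }
    });
    return true;
  }

  int inTransit() { return inTransit.get(); }
  int dropped() { return dropped; }
}

/** Test double for a paused main looper: queues tasks until drain() is called. */
class ManualLooper implements Executor {
  private final Queue<Runnable> queue = new ArrayDeque<>();

  @Override
  public void execute(Runnable task) { queue.add(task); }

  void drain() {
    Runnable task;
    while ((task = queue.poll()) != null) task.run();
  }

  int pending() { return queue.size(); }
}
```

With the looper paused, offering five frames posts three and drops two; once drain() runs, the counter returns to zero and the next frame is accepted again.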

Fixes flutter/flutter#166533

That issue also provides a demo program.

A few extra considerations for the reviewer(s):

  • This is similar to an iOS issue that was fixed in plugins#4520 ("[camera] Fixed a crash when streaming on iOS"). There, the native code waits until the Dart side sends an acknowledgement back. In my testing, this is not required to achieve constant memory use once the debugger is paused; receiving a response from the main looper is enough. Since this OOM bug does not occur in the camera_android_camerax plugin, I suspect that other potential buffering problems are handled by some other streaming mechanism somewhere between Android and Flutter.
  • There was duplicated code to create mock Image objects: it was inlined (twice) in ImageStreamReaderTest but had its own getImage() method in ImageStreamReaderUtilsTest. However, there was one very small difference between the implementations: the uSize (and incidentally vSize) hard-coded in ImageStreamReaderTest was one byte less than the value calculated by ImageStreamReaderUtilsTest.getImage(). I assumed this was insignificant, but perhaps the original author can comment? I made the method static and placed it in ImageStreamReaderTest. Is this the best place?
  • I arbitrarily chose 3 as the maximum number of pending frames before ImageStreamReader stops post()ing to the main looper. Just 1 seemed too strict: I assume the main thread sometimes catches up, and we don't want to drop frames so often that it becomes noticeable. On the other hand, if the main thread is too busy to process a post() within a certain interval, the app won't suffer from dropped frames either. Still, it's a behavior change; the main benefit is avoiding a possible OOM crash, which is much worse. A number that is too high would take up extra memory, or could even trigger an OOM error itself, with no real benefit.
  • Following advice from the #hackers-tests Discord channel, the test provides a mock Handler to the instance. I wasn't sure exactly how to implement this, so I chose to add a handler field annotated with @VisibleForTesting(otherwise = VisibleForTesting.NONE) to indicate that it should not be used by non-test code. This modifier is not used anywhere else in the codebase, but it made complete sense to me; of course, I can remove it if need be.
  • Initially I used a static field for counting pending frames. This caused problems in the tests: the handler runnables are never invoked there, so with a static field the count piles up as tests run, and ImageStreamReader starts skipping frames where it shouldn't. This could be solved by forcing other tests to provide a Handler that invokes the runnables, which might be a good idea anyway. However, since for all intents and purposes there will be only one ImageStreamReader, and even with more than one the number of pending frames only scales linearly with the number of instances, it seems better to keep this an instance field rather than a static one.
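The static-versus-instance problem in the last point can be reproduced off-device. In this hypothetical sketch, the "mock handler" is an Executor that never runs its tasks (like test handlers that do not invoke the runnables), and the constructor parameter plays the role of the injected handler field. Thread safety is deliberately ignored for brevity, and the class names are illustrative only.

```java
import java.util.concurrent.Executor;

/** Initial approach (hypothetical): the pending-frame count is shared by all instances. */
class StaticCountingReader {
  static final int MAX_IN_TRANSIT = 3;
  static int inTransit = 0;

  private final Executor handler; // injected, like the test-only handler field

  StaticCountingReader(Executor handler) { this.handler = handler; }

  boolean offer() {
    if (inTransit >= MAX_IN_TRANSIT) return false; // frame dropped
    inTransit++;
    handler.execute(() -> inTransit--); // never runs with a no-op mock handler
    return true;
  }
}

/** Chosen approach (hypothetical): each instance tracks only its own frames. */
class InstanceCountingReader {
  static final int MAX_IN_TRANSIT = 3;

  private final Executor handler;
  private int inTransit = 0;

  InstanceCountingReader(Executor handler) { this.handler = handler; }

  boolean offer() {
    if (inTransit >= MAX_IN_TRANSIT) return false; // frame dropped
    inTransit++;
    handler.execute(() -> inTransit--);
    return true;
  }
}
```

If two "tests" each create a reader and stream two frames through a handler that never runs its tasks, the second StaticCountingReader drops its second frame because the first reader's count never drained, while the second InstanceCountingReader accepts both.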


kwikwag added 2 commits April 4, 2025 02:10
…read paused (flutter#166533)

When streaming images using CameraController.startImageStream(), images are post()ed to the main looper even when the main thread is halted. This is common when debugging, and it quickly results in an OOM error that crashes the app abruptly.

The fix counts the number of images that have been posted to the main thread but have not yet arrived. If too many images are in transit (I arbitrarily chose 3 as the maximum allowable amount), subsequent frames are dropped until the main thread receives the pending images.

A log message is emitted for each dropped frame.

Fixes flutter/flutter#166533

That issue also provides a demo program.
@reidbaker (Contributor)

Sorry, @camsim99 is out of town. Adding team android to give a first-pass review.

@reidbaker reidbaker requested a review from a team April 8, 2025 18:42
@reidbaker (Contributor) left a comment

Thank you; this is a tricky problem and clearly impacts developers.

I am a bit worried that we are dropping frames instead of creating a frame queue, but I understand that is probably a more difficult PR to author.

My largest worry is that this will cause an issue in non-debug code. What do you think about renaming numImagesInTransit to consecutiveImagesInTransit and setting the value to zero whenever any frame is delivered? That would still solve the paused-debugger issue by dropping frames, but would prevent an off-by-one set of conditions from causing the camera frames to stop.

This will still need @camsim99 to approve before we merge it.

// Handle "buffer is inaccessible" errors that can happen on some devices from
// ImageStreamReaderUtils.yuv420ThreePlanesToNV21()
final Handler handler =
this.handler != null ? this.handler : new Handler(Looper.getMainLooper());
handler.post(
() ->
imageStreamSink.error(
Contributor

Shouldn't this also have --numImagesInTransit? Otherwise, if more than 3 images errored, we would no longer update any frames.

@kwikwag (Author) Apr 14, 2025

@reidbaker Thanks for these questions. I'll address the issues one by one:

  1. Contrary to how I initially phrased the issue, this happens not only when the debugger is paused, but any time the main loop is lagging. I see no benefit to the 'consecutive' method:

    • I'm not sure what you mean by an "off-by-one set of conditions." If the main thread is lagging by one frame, it will lag regardless of this fix. Frames are dropped only when at least two frames have already been sent to the handler but not yet handled. Being less familiar with Flutter/Android internals, I'm not sure whether the EventSink.success() call can throw an exception. If it can, it would perhaps be wise to wrap the call in a try/finally. However, I would not move the decrement before the success() call: the decrement should happen only once the image memory can be released, otherwise a queue build-up might still occur.
    • Resetting the count whenever any frame reaches the main thread can have an adverse effect, too. Imagine a scenario where a frame is generated every 10 ms but handled every 20 ms on the main thread. Frames will accumulate without bound on post() and eventually crash the app with an out-of-memory error. Ensuring that each post() arrives and completes is, as far as I understand, exactly the queue capability you are talking about.
  2. Regarding frames no longer updating: the only scenario I see in which frames stop updating is when the handler stops running the Runnables passed to post(). If that happens, the main thread is paused or hanging, so it should be OK not to update frames until it resumes and the Runnable gets invoked. I see no risk of deadlock, since AFAIK there is no way for the main thread to wait on whatever calls onImageAvailable().
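The 10 ms / 20 ms scenario from point 1 can be checked with a small discrete-time simulation. This is a hypothetical model, not plugin code: one camera frame arrives every 10 ms, the main thread finishes one delivery every 20 ms, and frames are dropped while the counter is at 3. With resetOnDelivery set to true, the counter is zeroed on every delivery (the consecutiveImagesInTransit suggestion); otherwise it is decremented once per delivered frame.

```java
/** Hypothetical model comparing two ways of maintaining the in-transit counter. */
class InTransitPolicySimulation {
  static final int MAX_IN_TRANSIT = 3;

  /**
   * Returns how many frames are still queued on the main thread after
   * durationMs milliseconds, simulated one millisecond at a time.
   */
  static int queuedAfter(int durationMs, boolean resetOnDelivery) {
    int queued = 0;  // Runnables actually waiting in the looper's queue
    int counter = 0; // value compared against MAX_IN_TRANSIT
    for (int t = 0; t < durationMs; t++) {
      // The main thread completes one delivery every 20 ms.
      if (t % 20 == 19 && queued > 0) {
        queued--;
        counter = resetOnDelivery ? 0 : counter - 1;
      }
      // The camera produces a frame every 10 ms; post it unless the limit is hit.
      if (t % 10 == 0 && counter < MAX_IN_TRANSIT) {
        queued++;
        counter++;
      }
    }
    return queued;
  }
}
```

In this model, decrementing per delivery keeps the queue at 2 to 3 frames indefinitely (queuedAfter(1000, false) is 2). The reset variant never reaches the limit and gains one frame every 20 ms (queuedAfter(1000, true) is 50, queuedAfter(2000, true) is 100), which is the unbounded growth described above.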

How long will it be before @camsim99 comes back?

Contributor

My two cents:

  • @reidbaker can you clarify the off-by-one issue? I'm also a bit confused there.
  • Based on my current understanding (off-by-one issue aside), I agree with @kwikwag that we should not reset numImagesInTransit to 0 when a frame is delivered. My understanding is that if we did, we would no longer wait for just 3 images to accumulate before starting to close images; instead, we would wait for 3, then 6, then potentially 9, and so on.
  • I agree with @reidbaker that in the case this comment is linked to (where an IllegalStateException is thrown while streaming an image), we should also decrement numImagesInTransit, because we have essentially "handled" that image.

@kwikwag kwikwag changed the title [camera_android] prevent startImageStream OOM error when main thread paused (flutter#166533) [camera_android] prevent startImageStream OOM error when main thread hangs (flutter#166533) Apr 14, 2025
@kwikwag (Author) commented Apr 14, 2025

I can also suggest the following alternative implementations:

  1. Make imageBuffer an instance variable and keep only one post() invocation alive (e.g. by first checking Handler.hasCallbacks()). This eliminates the queue count and only provides the latest frame to the handler. However, the Map behind imageBuffer had better be correctly initialized whenever the Runnable sent to post() actually completes, which can be more confusing to implement.
  2. Use Handler.sendMessage() and Handler.removeMessages() to prevent the queue from building up. This allows keeping newer frames rather than older ones, though making sure that only the oldest frames are dropped might be a bit tricky.

I don't see a true benefit to option 1. Option 2 can drop older frames instead of newer ones, but at the cost of added complexity, and the benefit isn't large: once the main thread resumes, it will handle new frames anyway.
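Option 1 amounts to a "latest value" slot. The hypothetical sketch below uses an AtomicReference instead of Handler.hasCallbacks() (a swapped-in technique, chosen so it runs off-device): the producer overwrites the slot, and a delivery is scheduled only when the slot goes from empty to full, so at most one Runnable is pending at a time. Plain values stand in for frames; in real code, any resource held by a replaced frame would still need to be released.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sketch of option 1: only the newest undelivered frame is kept. */
class LatestFrameSlot<T> {
  private final AtomicReference<T> latest = new AtomicReference<>();
  private final Executor mainThread; // stand-in for Handler.post()
  private final Consumer<T> sink;
  private int replaced = 0; // frames overwritten before delivery (producer thread only)

  LatestFrameSlot(Executor mainThread, Consumer<T> sink) {
    this.mainThread = mainThread;
    this.sink = sink;
  }

  void offer(T frame) {
    T previous = latest.getAndSet(frame);
    if (previous != null) {
      // A delivery is already scheduled and will pick up this newer frame.
      replaced++;
      return;
    }
    // Slot was empty: schedule one delivery of whatever is newest when it runs.
    mainThread.execute(() -> {
      T newest = latest.getAndSet(null);
      if (newest != null) sink.accept(newest);
    });
  }

  int replaced() { return replaced; }
}

/** Test double for a paused main looper: queues tasks until drain() is called. */
class PausedLooper implements Executor {
  private final Queue<Runnable> queue = new ArrayDeque<>();

  @Override
  public void execute(Runnable task) { queue.add(task); }

  void drain() {
    Runnable task;
    while ((task = queue.poll()) != null) task.run();
  }

  int pending() { return queue.size(); }
}
```

Offering frames 1 to 5 while the looper is paused leaves a single pending Runnable; draining it delivers only frame 5, and the four older frames are counted as replaced.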

@kwikwag (Author) commented Apr 14, 2025

Yet another suggestion: keep the image data as a soft/weak reference and skip frames whose memory has been evicted by the runtime, keeping a hard reference only to the most recent frame. Then we don't need the special case that counts the messages pending on the looper. I am not sure whether this will in fact eliminate the out-of-memory error, but I can check if this is a requirement.
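A hypothetical sketch of the soft/weak reference idea: each posted Runnable holds only a WeakReference to its frame, the forwarder keeps a hard reference to the most recent frame, and the main thread skips frames that were already collected. Note that in this sketch the posted Runnables themselves still queue up while the main thread hangs; only the frame payloads become collectible. The lastPostedRef() accessor is an assumption added only so a test can call Reference.clear() to simulate eviction deterministically.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical sketch: frames in the looper queue are only weakly reachable. */
class WeakFrameForwarder<T> {
  private final Executor mainThread; // stand-in for Handler.post()
  private final Consumer<T> sink;
  private T mostRecent;                // the only hard reference the forwarder keeps
  private WeakReference<T> lastPosted; // exposed for tests only
  private int skipped = 0;             // updated on the main thread only

  WeakFrameForwarder(Executor mainThread, Consumer<T> sink) {
    this.mainThread = mainThread;
    this.sink = sink;
  }

  void offer(T frame) {
    mostRecent = frame; // older frames become collectible once replaced here
    WeakReference<T> ref = new WeakReference<>(frame);
    lastPosted = ref;
    mainThread.execute(() -> {
      T f = ref.get();
      if (f == null) {
        skipped++; // the runtime evicted this frame before it was delivered
        return;
      }
      sink.accept(f);
    });
  }

  WeakReference<T> lastPostedRef() { return lastPosted; }
  int skipped() { return skipped; }
}

/** Test double for a paused main looper: queues tasks until drain() is called. */
class HeldLooper implements Executor {
  private final Queue<Runnable> queue = new ArrayDeque<>();

  @Override
  public void execute(Runnable task) { queue.add(task); }

  void drain() {
    Runnable task;
    while ((task = queue.poll()) != null) task.run();
  }
}
```

Clearing the first frame's reference before draining makes the first Runnable skip, so only the most recent frame reaches the sink.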

/** Ensure that passing in an image with padding returns one without padding */
@Test
public void yuv420ThreePlanesToNV21_trimsPaddingWhenPresent() {
- Image mockImage = getImage(160, 120, 16);
+ Image mockImage = ImageStreamReaderTest.getImage(160, 120, 16, ImageFormat.YUV_420_888);
Contributor

Nit: it seems unintuitive for test files to call into each other. Can you create an ImageStreamReaderTestUtils.java or something like that and put it there?


@kwikwag (Author) commented Apr 14, 2025

@camsim99 Thank you for the review and the comments.

I added fixes for the two issues mentioned. Note that the name of the added file, ImageStreamReaderTestUtils, might be a bit confusing, as the same folder holds ImageStreamReaderUtilsTest.

Rather than decrementing numImagesInTransit in two places, I moved the exception handling to surround only the relevant code. I hope that's OK.
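One way to read "moving the exception handling to surround only the relevant code" (an assumption; the PR's actual diff is not shown here): only the conversion sits inside the try block, and both the success and the error delivery go through a single counted post, so the counter is decremented in exactly one place. FrameSink and convert() are hypothetical stand-ins for the EventSink and ImageStreamReaderUtils.yuv420ThreePlanesToNV21(), and Executor stands in for Handler.post().

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the Flutter EventSink. */
interface FrameSink {
  void success(String frame);
  void error(String code, String message);
}

class CountedForwarder {
  static final int MAX_IN_TRANSIT = 3;

  private final Executor mainThread;
  private final FrameSink sink;
  private final AtomicInteger inTransit = new AtomicInteger();

  CountedForwarder(Executor mainThread, FrameSink sink) {
    this.mainThread = mainThread;
    this.sink = sink;
  }

  void onImageAvailable(byte[] raw) {
    if (inTransit.get() >= MAX_IN_TRANSIT) return; // drop the frame
    Runnable delivery;
    try {
      // Only the conversion can throw, so only it is inside the try block.
      String frame = convert(raw);
      delivery = () -> sink.success(frame);
    } catch (IllegalStateException e) {
      delivery = () -> sink.error("IllegalStateException", e.getMessage());
    }
    postCounted(delivery);
  }

  /** The single place where the counter is incremented and decremented. */
  private void postCounted(Runnable delivery) {
    inTransit.incrementAndGet();
    mainThread.execute(() -> {
      try {
        delivery.run();
      } finally {
        inTransit.decrementAndGet();
      }
    });
  }

  int inTransit() { return inTransit.get(); }

  /** Hypothetical conversion that fails the way an inaccessible buffer might. */
  static String convert(byte[] raw) {
    if (raw.length == 0) throw new IllegalStateException("buffer is inaccessible");
    return "frame of " + raw.length + " bytes";
  }
}

/** Test double for a paused main looper: queues tasks until drain() is called. */
class QueueLooper implements Executor {
  private final Queue<Runnable> queue = new ArrayDeque<>();

  @Override
  public void execute(Runnable task) { queue.add(task); }

  void drain() {
    Runnable task;
    while ((task = queue.poll()) != null) task.run();
  }
}
```

Because errors also go through postCounted(), an errored frame releases its slot once delivered, and a burst of errors on a hung main thread is bounded by the same limit as regular frames.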

Finally, I also added an extra branch with the WeakReference solution. It seems to work too, and doesn't require a counter. However, since the unit test needs to check what happens when a WeakReference is released, the Runnable code there is a bit more complex. If you think that's better, I can create a pull request for it instead.

@camsim99 (Contributor)

@kwikwag Thanks for the updates! The changes look reasonable to me 👍

Concerning the WeakReference solution, what is the main difference there? I know it doesn't use the counter, but the counter is still present in that code, so I'm wondering how images would be closed differently.

3 participants