Body tracking is too slow and inaccurate #514
Comments
I understand that the body tracking is only a preview release and will likely be improved over time. As a stopgap, if possible, perhaps body tracking could expose a parameter that can be tuned for higher accuracy, for offline or remote processing on powerful systems in non-real-time applications. There are many use cases that might only need a 3-4 second clip and could afford to spend minutes or hours processing it, but can't afford meter-plus gaps between actual and reported joint positions.
Fully agree! I have a feeling the output data is also overly filtered/smoothed somewhere in the pipeline, possibly at the model-fitting stage after the DNN. Another addition could be to output the 2D joint positions straight from the DNN, for when we need accurate points on the IR frame prior to any model-fitting and post-filtering stages. I've had these 2 suggestions on the feedback page already, in case you want to upvote:
Thank you for the feedback. Please note that the Body Tracking SDK is currently a preview release and we are actively working on many aspects of quality. The installation process and dependencies are a pain point that we are aware of and are working to fix. As of the preview release the minimum recommended card is a GTX 1070. Please keep sending us your feedback as new releases come out.
@Brekel I upvoted your first suggestion some days ago, and now the 2nd one as well. Anyway, I don't think tuning a single hyperparameter would solve the lag issue. DNNs are notoriously slow, even on high-end graphics processors.
Thanks for the upvote. In any case the DNN is still amazingly robust, so this isn't a critique; I'm just trying to help the team make it more awesome and useful for more scenarios. The more controls we have, the easier that will be.
@cdedmonds Thank you for the swift response! I know the system requirements, and that the BT SDK is still in preview. But I think it is better (for both you and us) to give such feedback now, while in preview, than after the official release is out. I suppose the Azure Kinect team is quite different from the team that developed the Kinect-v2 sensor and SDK, but you should still have access to all the documentation and internal info from back then. It may be worth comparing K4A body tracking to the body tracking of Kinect-v2 in terms of performance, accuracy, features, extras, etc. It would be good to end up with a better performing, more accurate and more versatile body tracking than the previous one. Please keep this issue open, so we can comment as new BT SDK releases come out. When do you plan to release the next preview version, by the way?
We appreciate the feedback. Keep it coming.
@cdedmonds I would say the key part is exposing as much control as possible to make things adaptable to a wide range of scenarios. For example:
Again, I don't know what is possible with the underlying algorithms and am just guessing, but the more that can be exposed, the wider the use cases. We're clever programmers using this, you know :)
I definitely agree that if you can expose more control parameters, more possible use cases open up in a non-linear fashion, because developers can then work around the limitations by adapting their workflow, or by combining outputs from processing a stream more than once. I think this is particularly true for a device like the Kinect, which feels like a revolutionary "general purpose" technology with a lot of applications that haven't really been developed before. For instance, we would love to have low-latency, low-computation positional accuracy. However, if that's not possible, we can always send the mkv files for remote processing on cloud GPUs and then send the data back to the end user.
Absolutely!
@cdedmonds Please also look at what the competition is doing: https://developer.apple.com/videos/play/wwdc2019/607
For my use cases, the tracking latency and the ability to recognize a person are the key concerns; positional accuracy within 6 inches (roughly 15 cm) is sufficient. Multiple sensors can be used to increase accuracy, but there is no way to get around the latency and tracking delay. 60 fps depth tracking would probably help alleviate the problem, but if that's not possible with the hardware then reduced latency is a priority. We can always add more cameras, but we can't subtract time. Though I note it would be easier to add more cameras if they didn't each require a PC with an Nvidia card; Intel NUCs would take far less space, and in fact the Azure Kinect fits atop them perfectly.
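The multi-sensor idea above can be sketched as a simple fusion step: once each camera's joints have been transformed into a shared world frame (extrinsic calibration is assumed done elsewhere), a confidence-weighted average combines them. This is an illustrative sketch, not the k4a/k4abt API; `fuse_joints` and its inputs are assumptions.

```python
import numpy as np

def fuse_joints(per_camera_joints, weights=None):
    """Fuse one joint's 3D position observed by several cameras.

    per_camera_joints: list of (x, y, z) positions, each already
    transformed into a common world frame.
    weights: optional per-camera confidences; defaults to uniform.
    """
    pts = np.asarray(per_camera_joints, dtype=float)
    if weights is None:
        weights = np.ones(len(pts))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the result is a true weighted mean
    # Cameras with higher confidence dominate the fused estimate.
    return (w[:, None] * pts).sum(axis=0)

# Example: two cameras disagree by 10 cm on the x axis; trust the
# first one three times as much as the second.
fused = fuse_joints([[0.00, 1.0, 2.0], [0.10, 1.0, 2.0]], weights=[3, 1])
```

This does nothing about latency, of course; it only reduces positional noise, which matches the comment's point that cameras can be added but time cannot be subtracted.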
I must admit, v0.9.2 brings significant improvements in terms of performance and accuracy. In this regard, I have a question: did you change the model as well, or only the SDK internals?
@rfilkov Is there an MSI for 0.9.2? I can't seem to get my hands on it.
@rfilkov v0.9.2 includes an updated model.
The new model does seem slightly better. The main issue for us is that the model does not output confidence scores for the joints. Most 2D pose estimators have this feature, which makes it easy to deal with situations where the model is bound to fail (e.g. an occluded joint). Is this something that can be added? I'll start a new issue with a feature request.
I agree on having a confidence score for each joint.
@hmexx Here is a feature request in this regard: https://feedback.azure.com/forums/920053-azure-kinect-dk/suggestions/38166871-joint-tracking-state Please upvote or comment there, too.
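To illustrate why per-joint confidence is so useful: a consumer can gate on it and fall back to interpolation or prediction instead of trusting a bad position. The sketch below assumes a hypothetical continuous `confidence` value per joint, as requested above; the field name and threshold are made up for illustration.

```python
def usable_joints(joints, threshold=0.5):
    """Keep only joints whose (hypothetical) confidence clears a threshold.

    joints: dict mapping joint name -> (position, confidence).
    Low-confidence joints (e.g. occluded ones) are dropped so downstream
    code can interpolate or predict instead of using bad positions.
    """
    return {name: pos
            for name, (pos, conf) in joints.items()
            if conf >= threshold}

# Example frame: the right wrist is occluded, so its confidence is low.
frame = {
    "left_wrist": ((0.1, 0.9, 1.8), 0.92),
    "right_wrist": ((0.4, 0.9, 1.8), 0.12),
}
kept = usable_joints(frame)
```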
@qm13, I may be the only person here satisfied with that answer. Has anyone at Microsoft tried to reproduce the test I posted at https://youtu.be/7Jc7KhoPWdc yet? Considering that Microsoft's example code for real-time, lowest-latency-possible output has an error that causes it to buffer an extra frame (and thus always be 1 frame further behind than it could be), and that the example code has yet to be updated with a correction, I question the testing (if any) of body-tracking latency. I am still of the opinion that either the DNN is designed to withhold the result until it gets later frames (thus returning a 'stale' result), or some part of the firmware or SDK is delaying frames when it shouldn't be. I could partially test this by creating 'artificial' frames (a saved video+depth feed that is edited and re-arranged by hand) to feed into the DNN, which would let me more easily catch it delaying frames if the delay is on the DNN side. This would take far more than a reasonable amount of my time, and I don't think the editing toolset for that even exists, so I would need to create it. Beyond that, I think we all know what the real, long-term solution must be: integrate the CUDA cores and about 1 GB of RAM into the sensor.
I created issue #1253 to suggest integrating the body tracker into a V2 of the sensor. That would solve all of these problems, albeit by increasing the price of the sensor, but I'd pay $4000 per sensor if it had integrated body tracking and 50 ms body-tracking latency. Beyond that, I stand by my statement that the current Nvidia-GPU-dependent approach is fine for me IF the latency issue can be fixed. My GPU processing time is 22 ms per frame, but that does not square with the real-world measured (and demonstrated) body-tracking latency.
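The hidden-buffering hypothesis debated above can be checked without editing any video: tag each capture when it is enqueued and subtract when its result pops out. The tracker below is a stub standing in for the real body tracker (which is an assumption, not the k4abt API); only the measurement pattern is the point.

```python
from collections import deque

class StubTracker:
    """Stand-in for a tracker that secretly buffers N frames internally."""
    def __init__(self, internal_buffer=2):
        self._queue = deque()
        self._buffer = internal_buffer

    def enqueue(self, capture):
        self._queue.append(capture)

    def pop_result(self):
        # A result only comes out once the internal buffer is full:
        # exactly the kind of hidden delay discussed in the thread.
        if len(self._queue) > self._buffer:
            return self._queue.popleft()
        return None

def measure_latency_frames(tracker, n_frames=10):
    """Return per-result latency in frames (pop index - enqueue index)."""
    latencies = []
    for i in range(n_frames):
        tracker.enqueue({"frame_index": i})
        result = tracker.pop_result()
        if result is not None:
            latencies.append(i - result["frame_index"])
    return latencies

lat = measure_latency_frames(StubTracker(internal_buffer=2))
```

Run against the real tracker, a constant latency above one frame in this measurement would point at internal buffering rather than GPU processing time.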
@qm13, thank you for the update. For us, IF the new DNN version allows us to run on a consumer-grade laptop (without an Nvidia GTX GPU) and maintains the accuracy level of the current DNN, it's fine!
I'm using an RTX 2060 Super with a single Kinect; the frame rate is almost 30 FPS and the latency is acceptable.
@qm13 Thank you for the update and the transparency. Nevertheless, I must say that I am deeply disappointed by this decision. You have a dedicated community that is begging on all possible channels for functionality that was already implemented years ago, because otherwise a lot of interesting and serious applications, especially in the health sector, are not feasible. And your (the development team's) answer is: "Sorry, we want to play with the current hot stuff", namely DNNs. I get it, it is a fascinating technology, but you are ignoring all the people who actually want to use your sensor for something useful that can have a positive impact on many people's lives.
@qm13 @wes-b Please mark the feature request as 'unplanned', to show the dev community how much you value their input. This is the most discussed issue and second most requested feature; porting the option over would be a breeze for any decent developer.
If there is nothing else to add here, I'd like to close this very long and fruitless thread. A year later, it's also a bit too late to start implementing the most wanted (or any) feature requests.
Added prediction methods based on velocity calculations to update feet and hip positions when the Kinect prediction has not been processed yet. This decreased the latency of updating the trackers in space (made the trackers follow quick movements better). Right now only positions are updated; velocity stays constant, for performance, until the impact of adding this can be benchmarked.
- Reduced the estimated time differences in the velocity calculation to accommodate the reduced update latency.
- Added velocity calculation for the hip tracker.
- Changed the way the vector position is updated. Instead of using just the raw data, the raw data is merged with the predicted position (same as the prediction method above) at a ratio of 6 to 4. This decreased the variability of the data (made the trackers less jittery) for the feet to a level I found usable.
- Removed temporal smoothing tweaks in the calibrator, as any level of smoothing set above 0.05 resulted in trackers that were too slow to use.
In the future (maybe if more people want this functionality), the timing data in the function needs to be computed dynamically instead of using hardcoded estimations (estimations based on benchmarks found here: microsoft/Azure-Kinect-Sensor-SDK#514). This is needed to accommodate different machine setups and also to support newer software/hardware that can process frames at a higher fps.
TODOs:
1. Figure out a way to preserve hip positional data when rotating the body (when the body is rotated, the hip's predicted position becomes whatever side is facing the camera; this breaks some of the rotational tracking capabilities of the device). Potentially look into setting the hip tracker on one or some combination of the hip bones instead of the pelvis bone?
2. Measure the performance impact of predicting data vs. using only raw data.
3. Implement the method for more trackers.
4. Add accurate timing data into the prediction method so the predicted values can be more accurate for every machine (not just my testing rig).
Testing Rig Specs: i7-8700k, MSI 370 A-Pro motherboard, 32 GB DDR4 RAM, RX 580 8 GB, GTX 960 2 GB, Rift S, Azure Kinect, Windows 10. All drivers and software up to date as of 7/26/2020.
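The velocity-prediction-plus-blend change described above can be sketched in a few lines. The 6:4 raw-to-predicted ratio matches the description; the function names, the dt handling, and the numbers in the example are assumptions for illustration.

```python
def predict(prev_pos, velocity, dt):
    """Linear extrapolation: where the joint should be after dt seconds."""
    return tuple(p + v * dt for p, v in zip(prev_pos, velocity))

def blend(raw_pos, predicted_pos, raw_weight=0.6):
    """Merge raw and predicted positions at a 6:4 ratio, per the change above.

    Biasing toward the raw measurement preserves accuracy; the predicted
    share damps jitter and bridges frames where the tracker result
    arrives late.
    """
    return tuple(raw_weight * r + (1.0 - raw_weight) * p
                 for r, p in zip(raw_pos, predicted_pos))

# One update step for a foot tracker at ~30 fps (dt = 1/30 s).
prev, vel = (0.0, 0.0, 2.0), (0.3, 0.0, 0.0)   # moving 0.3 m/s along x
predicted = predict(prev, vel, dt=1.0 / 30.0)  # -> x = 0.01
raw = (0.012, 0.0, 2.0)                        # latest (noisy) measurement
smoothed = blend(raw, predicted)               # x = 0.6*0.012 + 0.4*0.01
```

The follow-up TODO about dynamic timing corresponds to replacing the hardcoded `dt` with the measured interval between tracker results.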
My GPU is an Nvidia RTX 2060. It runs at about 30 FPS.
@Chang-Che-Kuei Developers switching from Kinect2 to Kinect4A expect similar performance on similar hardware, which is not the case with Kinect4A.
They promised a lighter, more performant DNN model that works across different GPU manufacturers, and then completely abandoned development.
It is also a bit worrying that there is still no release of the body tracking SDK compatible with the new Nvidia 3000 series, and it has been a known issue since October. Nor have they ever released Ubuntu binaries for a body tracking SDK that is compatible with sensor SDK 1.4.
@fractalfantasy @RoseFlunder As far as I know, they were still on it, with hopes of a late-2020 release, but it seems it's been delayed. Given Microsoft's track record of announcing project cancellations, it may very well be that it's been silently canceled. Anyway, on our side we've moved on to look for alternate solutions that require neither this body tracking nor the K4A camera. We simply could not afford to wait a whole year for a solution.
@vpenades May I ask if you found any alternate solutions? I was hoping that maybe Nuitrack would support the Azure Kinect, but that doesn't seem to be the case. I looked into other sensors but couldn't find any competitor with good resolution... Intel's sensors aren't even comparable to Kinect v2.
@fractalfantasy I can't comment, sorry. But our solution is tailored to our use case. It's still not as good as what the Kinect2 delivered, though, so if the Kinect team finally delivers something that really improves (a lot) on what's currently available, then we might switch back to it.
@fractalfantasy There are not many options: Intel RealSense, Orbbec (their new sensor looks promising), Apple's iPad Pro/iPhone Pro (with some limitations). Kinect-v2 is still the best one out there. Unfortunately, MS are famous for ruining or canceling their best products as a result of not listening. Déjà vu.
I would be surprised if the Azure Kinect project was cancelled. MS has been actively responding to many issues as recently as a day or two ago. Though the silence on issue #1125 is frustrating...
@L4Z3RC47 Yes, they provide some minimum of customer support. But look at the progress of development here: https://feedback.azure.com/forums/920053-azure-kinect-dk It hasn't moved an inch in a year.
@rfilkov It's not only the feedback page that hasn't moved. There hasn't been any commit to the main repository since 1 July 2020. And I'd be surprised if they consider the driver side "completed".
@L4Z3RC47 They answer these forum posts so that senior management thinks they are being active, and then go off and play video games.
I don't think that's the case... Nevertheless, if there have been delays in delivering, they could at least be mitigated with better communication.
@rfilkov Wow, this new Orbbec sensor looks like an Azure Kinect; I wonder how it will compare resolution-wise.
If you are interested, Nuitrack currently supports the Azure Kinect and provides fast CPU-only skeletal tracking with it.
Thanks for the tip. I'm aware of Nuitrack and I've been following their progress for a while now... Unfortunately, our use case prevents us from using Nuitrack for a number of reasons. Long story short: our customers demand solutions that are low-end, easy to use, easy to install, no headaches, not power hungry, no internet required, and easy to purchase. The Kinect2 met almost all of these requirements. Everything that came afterwards, for one reason or another, falls short of them. Fortunately for us, we're progressing on solutions that only require a webcam; it's not as good as the Kinect2, but we've managed to fit our requirements with it.
Describe the bug
By all means the Azure Kinect is the best Kinect so far, and will probably be the best depth sensor on the market. The sensor SDK is pretty stable and good, providing almost everything an average user would want. But the body tracking subsystem is ruining this positive user experience. In terms of API this SDK is great too, but the DNN model's performance is much worse than the body tracking of Kinect-v2. The joint positions are inaccurate during fast movements. The body index map is not very accurate either; it does not fully match the user's silhouette on the depth frame. On my GTX 1060 it takes 2-3 depth frame cycles to process a body frame; hence it works at about 10 fps.
Expected behavior
Please consider at least providing some option for users who don't have high-end graphics cards and would like to get Kinect body tracking out of the box, without (or with minimal) extra installations. As far as I remember, Kinect-v2 used random forest model(s) for its body tracking. The performance was great and no extra installations were needed, back then in 2013/14.
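For context on why the random-forest approach ran comfortably on CPU: it classified each depth pixel into a body part with an ensemble of shallow decision trees over simple depth-comparison features. A toy scikit-learn version of that idea, with made-up features and labels purely for illustration (this is not the Kinect-v2 model):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in for per-pixel depth-difference features: each "pixel"
# gets 8 synthetic feature values; labels are body-part ids (0/1).
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Shallow trees keep per-pixel classification cheap: evaluating a
# forest like this is a handful of comparisons per tree, no GPU needed.
clf = RandomForestClassifier(n_estimators=20, max_depth=6, random_state=0)
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy on the toy data
```

The trade-off the thread keeps circling is exactly this: forests of shallow trees are fast and CPU-friendly but less robust, while the DNN is more robust but GPU-bound.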
Additional context
I believe most Kinect users would expect a better, more accurate and more performant body tracking experience, not a worse one. And now, with Apple adding people segmentation and body tracking to ARKit 3.0, I would expect Kinect (with all these years of experience) to provide a better user experience in all aspects than anybody else.