Video/object track aggregation event #4
We think it would be beneficial to add more context describing this feature. Things to consider are:
Description of the general feature and how it relates to the scene description data described in 5 Scene Description.
Description of intended use.
Description of limitations.
We are not sure if this is the correct place to add it given how other rules are described in Appendix A, but we suggest making room for it in the specification. As an example, we could minimally add something like the following:
If you want to push this further, could you please prepare the minimal example by resolving the <Insert description..> part?
As for the other parts: since you are not sure what the correct place is, I suggest not doing that now. If there are important details in the other parts, such as a limitation, they can be added to the minimal suggestion.
In general I think this feature is unclear in its scope and intention. I was not involved in formulating this feature and I cannot guess the thought process or intentions of those who did. Therefore I think it is better if someone involved in designing this feature as it currently stands tries to add more context with the above considerations. Don't you agree?
I will propose adding the text below to the spec, right at the start of the section.
Promised @dfahlen to comment generally on these (the comments shall not be added to the standard):

Description of the general feature and how it relates to the scene description data described in 5 Scene Description:
Object track aggregation can be used as an alternative to the scene description. The object track contains an object appearance aggregated and consolidated over all frames, which can be used in forensic search. An image of the object can be provided for subsequent automated analysis or manual operator verification on the client. Benefits can also include a reduction in transferred data and spending fewer resources aggregating data in the client. Scene description datatypes are reused but describe general appearance features rather than per-frame appearance.

Description of intended use:
The main use case is forensic search; see above.

Description of limitations:
When consolidating a track, details are lost. Selected details can be offered by vendors in the object track attribute where needed.
Could you describe in more detail what you mean by "object appearance aggregated and consolidated over all frames"? How does the data structure included in the ObjectTrack/Aggregation message support this representation?
Additionally, tt:Object includes fields for more information than just the visual features of an object, such as bounding box, geo position, speed, etc. Do those fields lack meaning in this context? If they have no clear interpretation, this should be mentioned in the standard to avoid confusion.
Clarification of "A.1.2.1 Generic parameters" in https://www.onvif.org/specs/srv/analytics/ONVIF-Analytics-Service-Spec.pdf:
it mentions some parameters for events defined in Appendix A. How should these be handled in the context of the Object Track Aggregation rule?
Ideally yes, region should be supported (not a MUST); otherwise we are opening up this event to include a lot more objects if they are visible in the scene. It is an additional filter, like class, to minimize what we send to the client.
From the PoC we can say we did not have time to implement it; let's see if the WG objects to that.
The same goes for Armed: some rules, even after creation, need an explicit 'enable/disable' to actually trigger events, and if our rule falls under such an implementation pattern we may need to signal that support in the options. For now we don't need to.
If this is a region-based rule, this should be clarified, as it is not obvious. In other "region based rules", e.g. "A.4 Loitering Detector", the Field parameter has been explicitly added to the parameter list.
If this is intended to support the Armed parameter, this needs to be clarified as well. Also, a general description of the workflow regarding armed/disarmed would be beneficial, as it is not obvious what a client is expected to do with regard to the Armed parameter.
Personally, I do not see a reason why the Armed parameter is needed, and I am not sure the Field parameter is needed, but that is beside the point. The point is that regardless of whether the Field and Armed parameters should be included in the rule or not, it should at least be clear from the spec whether they are.
Do you agree?
As a note: neither "Field" nor "Armed" is currently supported in the PoC.
This can be implementation dependent.
If a vendor wants to provide the client with the additional option of configuring the Field, they would report it in GetRuleOptions so the client can leverage that option in CreateRule.
If a vendor implements the rule without any Field support (the Field is missing from GetRuleOptions), the device may trigger more events, as it is observing the full field of view.
As for some rules listing the Field explicitly: many of those sections were written years ago, while the "Generic parameters" section with Field/Armed is a newer addition, made to avoid adding Field/Armed to every new rule included in the spec in the future. That is also why it says "rule should contain a parameter 'Field' ElementItem" (note the should, not SHALL, to keep it open for vendor implementations/flavors).
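As a rough sketch of that client-side flow (a minimal sketch only; the dict shapes for the GetRuleOptions response and the rule parameters are assumptions, not from the spec):

```python
# Illustrative only: a client includes the optional Field parameter in
# CreateRules only when the device advertises it via GetRuleOptions.

def build_rule_parameters(rule_options: dict, region_polygon) -> dict:
    """rule_options: parsed GetRuleOptions response (assumed dict shape)."""
    params = {}
    if "Field" in rule_options:
        # Device supports region filtering; restrict the rule to the region.
        params["Field"] = region_polygon
    # If "Field" is absent, the rule observes the full field of view and
    # may therefore emit events for more objects.
    return params
```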
General comment:
It is not clear from reading the document what the meaning of mandatory/optional parameters in this context should be.
What is the interpretation of mandatory parameters?
Mandatory for a device to support but optional for a user to set, or
mandatory for a device to support and mandatory for a user to set?
Similarly, what is the interpretation of optional parameters?
Mandatory for a device to support and optional for a user to set, or
optional for a device to implement and optional for a user to set?
Given the text in 6.2.3.3 CreateRules,
"The device shall accept adding of analytics rules with an empty Parameter definition",
the interpretation should probably be:
Mandatory - mandatory for a device to support and optional for a user to set.
Optional - optional for a device to implement and optional for a user to set.
It may still be confusing when parameters that are optionally supported by a device specify default behavior. Does a device not supporting a specific parameter need to honor the default behavior or not? If not, it may be confusing that a device may function differently depending on whether an optional parameter is supported by the device or not.
This is exemplified by the similar comment regarding the parameter ReportTimeInterval.
It would be great to clarify this!
Yes, the interpretation should be as stated above.
This observation is true for the rule parameters plus the payload description in the spec.
Do you agree this may be confusing? Do you think this can be clarified in the spec?
As a general guideline, the ONVIF spec is always written from the device perspective, unless explicitly stated otherwise.
So if a parameter (in a rule configuration) is mentioned as optional, it is up to the device to support that option, and hence it is optional for the client to configure (subject to availability from the device).
And if a field in an event payload is mentioned as optional, it is up to the device to include or exclude it from the event payload.
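A minimal sketch of that reading, assuming invented parameter names and defaults (nothing below is from the spec):

```python
# Device-side view: mandatory parameters must be supported but may be left
# unset by the client (6.2.3.3 requires accepting an empty Parameter
# definition), in which case the device's own defaults apply.

DEVICE_DEFAULTS = {          # mandatory parameters with device defaults
    "ClassFilter": None,     # None = no class filtering
    "ConfidenceLevel": 0.0,  # 0.0 = accept any confidence
}

def resolve_parameters(client_params: dict) -> dict:
    resolved = dict(DEVICE_DEFAULTS)
    resolved.update({k: v for k, v in client_params.items()
                     if k in DEVICE_DEFAULTS})  # unsupported options ignored
    return resolved

# An empty parameter definition is valid and yields pure device defaults:
assert resolve_parameters({}) == DEVICE_DEFAULTS
```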
General comment on how to handle multiple parameters specifying a filter over a set of classification candidates.
How do the class filter and the confidence level work in combination with multiple classification candidates? Are they independent, or do they work in conjunction with each other?
Independent interpretation (feels wrong): is any of the candidates of the filtered class, and, separately, is any candidate above the confidence level?
Combined interpretation (feels more correct): is any single candidate both of the filtered class and above the confidence level?
Both readings are illustrated by the examples and the code sketch below.
Example, "Independent interpretation":
Data:
{Human: 0.9, Animal: 0.7}
Filter:
class filter = [Animal]
confidence = 0.8
Output:
The message passes the filter,
{Human: 0.9, Animal: 0.7}
Example, "Combined interpretation":
Data:
{Human: 0.9, Animal: 0.7}
Filter:
class filter = [Animal]
confidence = 0.8
Output:
The message does not pass the filter
No candidate is both Animal and above 0.8 in confidence.
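Expressed as code (values taken from the examples above; a sketch only, the names are not from the spec):

```python
candidates = {"Human": 0.9, "Animal": 0.7}  # classification candidates
class_filter = {"Animal"}
confidence = 0.8

# Independent interpretation: each condition may be satisfied by a
# different candidate.
independent = (any(cls in class_filter for cls in candidates)
               and any(lvl >= confidence for lvl in candidates.values()))
print(independent)  # True: "Animal" matches the class, "Human" the level

# Combined interpretation: one and the same candidate must satisfy both.
combined = any(cls in class_filter and lvl >= confidence
               for cls, lvl in candidates.items())
print(combined)     # False: no candidate is both Animal and >= 0.8
```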
General comment:
Is it possible to specify unclassified objects in a class filter?
Example: a consumer is interested in all ObjectTrackAggregation events with unclassified objects and humans, but not vehicles. How does one specify this class filter?
If a class filter or confidence filter is used, it is possible to get an Update event but no Final event. This makes it impossible for the user to distinguish between never receiving a final event and the object still being in the scene.
Example:
Confidence level = 0.5
The Update event has {Human: 0.7} and is sent to the user.
The Final event has {Human: 0.4} and is not sent to the user.
When does the user know that the track is over?
We need to add a clarification that says: irrespective of the filter, the Final aggregation has to be sent when the track ends. We can bring this up before 25.06.
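A minimal sketch of that clarified behavior (helper names are illustrative, not from the spec):

```python
def on_track_update(track, passes_filter, send_event):
    # Update/intermediate aggregations respect the configured filters.
    if passes_filter(track):
        send_event("Update", track)

def on_track_end(track, send_event):
    # The Final aggregation is sent unconditionally when the track ends,
    # irrespective of class/confidence filters, so the client can always
    # tell that the track is over.
    send_event("Final", track)
```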
Ok, great, let's add this comment to the public PR.
Update pushed to the PR.
We need to update the PoC to support this in order to follow the updated spec.
How is this element intended to be interpreted and used?
tt:Object, as described in "5.3.1 Objects", is intended to describe the state of an object at a single point in time. In this context, the element description suggests that tt:Object should be used to describe an "aggregation" of data.
It is not clear from the description of the element, nor from the surrounding context, what type of "aggregation" is to be expected. Furthermore, since tt:Object was originally designed to describe the state of an object at a single point in time, most of the elements in the data structure have no natural interpretation when representing some sort of "aggregate".
From this, I am not convinced tt:Object is the correct data structure to use for the purpose of "aggregation".
Regardless of what data structure is ultimately used, I think the type of "aggregation" intended here needs to be described further, as the word "aggregation" has a very broad interpretation and in isolation does not describe the meaning or interpretation of a piece of data.
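To illustrate the ambiguity (field names loosely follow tt:Object; all values are invented):

```python
# Per frame, every field has a clear point-in-time meaning:
per_frame_object = {
    "ObjectId": 12,
    "BoundingBox": (0.1, 0.2, 0.3, 0.4),  # location in THIS frame
    "Speed": 1.4,                          # speed at THIS instant
    "ClassCandidates": {"Human": 0.9},     # classification for THIS frame
}

# As an "aggregate" over a whole track, the same fields are undefined:
aggregated_object = {
    "ObjectId": 12,
    "BoundingBox": "?",      # union of boxes? last box? unspecified
    "Speed": "?",            # average? maximum? unspecified
    "ClassCandidates": "?",  # consolidated how? unspecified
}
```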
I am open to a recommendation/proposal text from you/your team along the lines of "It is recommended to enable ObjectTrack data in sparsely crowded scenes to optimize data produced from the device." to explain what could be part of the aggregation, and we don't want to go back to the WG for details that they care little about.
Here is my proposal
I also see it as a benefit not to lock down the implementation details, in any large system, not just ONVIF, since it allows us to change the implementation using more advanced technology in the future. As long as the scope of the consolidation is known (time interval + object id + source/producer), it is sufficiently defined.
I will collect all clarifications into a separate PR.
I am not talking about implementation details. I am talking about the interface of this feature; that is not a detail, nor is it technology dependent.
The tt:Object data structure is part of the interface. As that is the case, it needs a clear interpretation and meaning. As I pointed out above, I do not think it has one, as the original intention of the data structure is to represent the state of an object at a single point in time.
If tt:Object has no clear meaning that can be described in the standard, I think it is better to leave it out and instead make clear that this kind of data needs to be delivered as vendor extensions.
If we really want to standardize a data structure, it needs a standard interpretation; otherwise there is no point in standardizing it.
I am asking for one of the following:
Numerous discussions in the WG did not yield much progress, and tt:Object is the only option available for now. Though we all know it is not perfect, without it the whole event/feature has no base at all.
I don't see ONVIF inventing anything complementing tt:Object in the near future (the next 3-4 years).
In that context, I would say let's add more text about what the interpretation of tt:Object should be in this context, in order to make what we did in the last year a little more meaningful.
Proposed text:
This event may be used to describe an object in the scene, alternative or complementary to the scene description generated per frame, by aggregating data collected per object over an object track. The process by which devices aggregate object track data for a given object is out of scope of this specification.
The Object Track Aggregation rule generates an object track aggregation event for the scenarios listed below:
Optionally, an initial aggregation after an object appears, or an intermediate aggregation while the object is present in the field of view.
A final aggregation when the object is no longer visible in the field of view.
Optionally, a device may include additional object track data, e.g. a snapshot image, as configured in the ObjectTrackDataFilter parameter.
It is recommended to enable ObjectTrack data in sparsely crowded scenes to optimize the data produced from the device.
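As a rough, non-normative illustration of the event sequence this proposal implies for a single track (names and payloads invented):

```python
track_events = [
    ("Initial",      {"ObjectId": 7}),  # optional, shortly after appearance
    ("Intermediate", {"ObjectId": 7}),  # optional, while still in the FOV
    # Final is sent when the object leaves the field of view; a snapshot is
    # included only if enabled via the ObjectTrackDataFilter parameter.
    ("Final",        {"ObjectId": 7, "Image": b"<jpeg bytes>"}),
]
```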
Ok, this is a start, but I still think we need additional clarification.
The main issue is not that it is up to the vendor how the "aggregation" is done; the specification does not need to describe that. This is the same for other features. E.g. the standard describes how to represent the state of a tracked object in a video frame with tt:Frame and tt:Object and how that data should be interpreted, but it says nothing about how the tracker should be implemented.
So, what is needed here is a description of how tt:Object should be interpreted in the context of "aggregation", not the process by which the aggregation is done by a vendor.
Additionally, the suggested text above mentions "object track". This is the first time that term is used in the standard. I think what "object track" means in the context of the standard needs to be defined.
A benchmark for a "good" description is that a human should be able to read the description of how the data should be represented and interpreted and then be able to annotate a video using that representation.
This way we can ensure the representation is meaningful and possible to explain. It also makes it possible to develop benchmarks that measure how well a machine can perform the task, by comparing the machine output to a human annotation.