-
-
Notifications
You must be signed in to change notification settings - Fork 106
Opt-in weekly sending of statistics #6851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Counterpart of chatmail/core#6851
a4d64cb
to
5619601
Compare
Counterpart of chatmail/core#6851
285d80a
to
416131b
Compare
932b191
to
8895fd8
Compare
511aaa9
to
966124a
Compare
966124a
to
b1c57ee
Compare
Counterpart of chatmail/core#6851
6b9f05f
to
4314864
Compare
|| is_mdn | ||
|| chat_id_blocked == Blocked::Yes | ||
|| group_changes.silent | ||
|| mime_parser.from.addr == STATISTICS_BOT_EMAIL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the chat is archived and muted by default, maybe there's no need in this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid this, assigning InNoticed
is enough. For blocked chat messages and silent group changes, the same btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why make the logic more complicated when it doesn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact this will make the logic less complicated, otherwise the fix in #6415 is harder to implement. TL;DR: InSeen
messages should cause MsgsNoticed
events, but InNoticed
shouldn't. Probably we have some naming issue here (MsgsSeen
would be better), but it's how it (almost) works currently. "Almost", because only imap::Session::sync_seen_flags()
emits this event, but not receive_imf
.
src/statistics.rs
Outdated
.log_err(context) | ||
.ok(); | ||
|
||
set_last_excluded_msg_id(context).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add last_msg_id
to struct Statistics
and exclude it from serialization. This way we won't miss messages stored into the db concurrently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we would need to:
- make the two invocations of
get_message_stats()
use a single transaction, which is non-trivial - introduce
get_statistics_inner()
or similar, which returns the struct rather than the serialized string (the tests would then still useget_statistics()
) - make
set_last_counted_msg_id()
take an optional message id
Since it's both super unlikely that a message will be missed, and completely random which message will be missed (-> it won't skew our statistics), I don't think it's worth the effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need in calling get_message_stats()
in the same transaction, probably just passing some last_counted_msg_new
to it works. The get_config_u32(Config::StatsLastCountedMsgId)
call can be moved into get_message_stats()
OTOH.
But this is not critical indeed, it's unlikely that this may cause a notable inconsistency in statistics.
This way, the statistics / self-reporting bot will be made into an opt-in regular sending of statistics, where you enable the setting once and then they will be sent automatically. The statistics will be sent to a bot, so that the user can see exactly which data is being sent, and how often. The chat will be archived and muted by default, so that it doesn't disturb the user.
Co-authored-by: iequidoo <[email protected]>
cc70f41
to
7ef56d6
Compare
Counterpart of chatmail/core#6851
&self, | ||
account_id: u32, | ||
qr: String, | ||
source: Option<u32>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is None
different from Some(Unknown)
for both parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all, I first thought that making them an Option will make them optional parameters in the JsonRPC, but this didn't actually work. I'll just make them u32
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, the Option
does do a good job at communicating that it's ok to pass nothing here, while this is less clear when it's just a u32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this Unknown
is defined in the code somewhere, it will also do this good job. If Option
isn't needed technically, i'm for removing it.
|| is_mdn | ||
|| chat_id_blocked == Blocked::Yes | ||
|| group_changes.silent | ||
|| mime_parser.from.addr == STATISTICS_BOT_EMAIL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid this, assigning InNoticed
is enough. For blocked chat messages and silent group changes, the same btw
.await | ||
.context("Failed to send statistics message") | ||
.log_err(context) | ||
.ok(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can lose some message statistics (because it's a diff), but not SecureJoin statistics (because it's integral). This may create a disproportion in reported statistics. A possible solution is to make SecureJoin statistics differential as well, i.e. reset it together with setting StatsLastCountedMsgId
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Securejoin statistics and message statistics are independent from each other, it's not necessary for them to be proportional.
But I'll instead make sure that we try again if Delta Chat is killed by the system while sending statistics, or there is some other spurious error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Securejoin statistics and message statistics are independent from each other, it's not necessary for them to be proportional.
While this may look fine for now, we don't know how we'll use reported statistics in the future, maybe some calculations will require both message and SecureJoin statistics. Maybe we can make message statistics integral as well, e.g. by remembering the whole serialized statistics struct in the db? Then we can calculate statistics for new messages (like you already do), add it to the stored statistics and send the resulting integral statistics to the bot. Then also another problem mentioned in #6851 (comment) is solved. The bot can't integrate the statistics correctly because it doesn't know which state is referenced by the received diff (maybe that state was never received at all). If we have any problems in the client-bot interaction, errors will accumulate.
/// If false, only messages in other chats (groups and broadcast channels) are counted. | ||
async fn get_message_stats( | ||
context: &Context, | ||
last_counted_msg: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we should include reported messages based on timestamp_sent
instead, i.e. those which fall into the current week period. Otherwise if an old backup is restored, the same messages will be reported again. Also we should include this time period into reported stats so that we can understand if the client clock is correct.
...
Apparently it's better not to report differential statistics at all, see #6851 (comment) for reasoning and possible solution.
This way, the statistics / self-reporting bot will be made into an opt-in regular sending of statistics, where you enable the setting once and then they will be sent automatically. The statistics will be sent to a bot, so that the user can see exactly which data is being sent, and how often. The chat will be archived and muted by default, so that it doesn't disturb the user.
The collected statistics will focus on the public-key-verification that is performed while scanning a QR code. Later on, we can add more statistics to collect.
Context:
This is just to give a rough idea; I realize that I would need to write a lot more than a few paragraphs in order to fully explain all the context here.
End-to-end encrypted messengers are generally susceptible to MitM attacks. In order to mitigate against this, messengers offer some way of verifying the chat partner's public key. However, numerous studies found that most popular messengers implement this public-key-verification in a way that is not understood by users, and therefore ineffective - a 2021 "State of Knowledge" paper concludes:
This is why Delta Chat tries to go a different route: When the user scans a QR code (regardless of whether the QR code creates a 1:1 chat, invites to a group, or subscribes to a broadcast channel), a public-key-verification is performed in the background, without the user even having to know about this.
The statistics collected here are supposed to tell us whether Delta Chat succeeds to nudge the users into using QR codes in a way that is secure against MitM attacks.
Plan for statistics-sending:
TODO[blog post]
in the code)