@Hocuri Hocuri commented May 12, 2025

This way, the statistics / self-reporting bot will be made into an opt-in regular sending of statistics, where you enable the setting once and then they will be sent automatically. The statistics will be sent to a bot, so that the user can see exactly which data is being sent, and how often. The chat will be archived and muted by default, so that it doesn't disturb the user.

The collected statistics will focus on the public-key-verification that is performed while scanning a QR code. Later on, we can add more statistics to collect.

Context:

This is just to give a rough idea; I realize that I would need to write a lot more than a few paragraphs in order to fully explain all the context here.

End-to-end encrypted messengers are generally susceptible to MitM attacks. To mitigate this, messengers offer some way of verifying the chat partner's public key. However, numerous studies found that most popular messengers implement this public-key-verification in a way that users do not understand, which makes it ineffective. A 2021 "State of Knowledge" paper concludes:

> Based on our evaluation, we have determined that all current E2EE apps, particularly when operating in opportunistic E2EE mode, are incapable of repelling active man-in-the-middle (MitM) attacks. In addition, we find that none of the current E2EE apps provide better and more usable [public key verification] ceremonies, resulting in insecure E2EE communications against active MitM attacks.

This is why Delta Chat tries to go a different route: When the user scans a QR code (regardless of whether the QR code creates a 1:1 chat, invites to a group, or subscribes to a broadcast channel), a public-key-verification is performed in the background, without the user even having to know about this.

The statistics collected here are supposed to tell us whether Delta Chat succeeds in nudging users into using QR codes in a way that is secure against MitM attacks.

Plan for statistics-sending:

  • Get this PR reviewed and merged (but don't make it available in the UI yet; if Android wants to make a release in the meantime, I will create a PR that removes the option there)
  • Write something for people who are interested in what exactly we count, and link to it (see TODO[blog post] in the code)
  • Prepare a short survey for participants
  • Fine-tune the texts at deltachat/deltachat-android#3794 ("[WIP] Make sending of statistics into a setting"), and get it reviewed and merged
  • After the next release, ask people to enable the statistics-sending

Hocuri added a commit to deltachat/deltachat-android that referenced this pull request May 12, 2025
@Hocuri Hocuri force-pushed the hoc/send-statistics-setting branch from a4d64cb to 5619601 Compare June 20, 2025 09:17
@Hocuri Hocuri changed the base branch from main to link2xt/pgp-contacts June 20, 2025 15:55
Hocuri added a commit to deltachat/deltachat-android that referenced this pull request Jun 25, 2025
Base automatically changed from link2xt/pgp-contacts to main June 26, 2025 14:06
@link2xt link2xt force-pushed the main branch 2 times, most recently from 285d80a to 416131b Compare June 26, 2025 14:07
@Hocuri Hocuri force-pushed the hoc/send-statistics-setting branch 2 times, most recently from 932b191 to 8895fd8 Compare July 8, 2025 15:06
@Hocuri Hocuri force-pushed the hoc/send-statistics-setting branch from 511aaa9 to 966124a Compare July 15, 2025 15:24
@Hocuri Hocuri changed the title [WIP] Make sending of statistics into a setting [WIP] Opt-in weekly sending of statistics Jul 31, 2025
@Hocuri Hocuri force-pushed the hoc/send-statistics-setting branch from 966124a to b1c57ee Compare August 14, 2025 16:15
Hocuri added a commit to deltachat/deltachat-android that referenced this pull request Aug 15, 2025
@Hocuri Hocuri force-pushed the hoc/send-statistics-setting branch from 6b9f05f to 4314864 Compare August 15, 2025 11:56
@Hocuri Hocuri changed the title [WIP] Opt-in weekly sending of statistics Opt-in weekly sending of statistics Aug 15, 2025
@Hocuri Hocuri requested review from link2xt and iequidoo August 15, 2025 12:46
|| is_mdn
|| chat_id_blocked == Blocked::Yes
|| group_changes.silent
|| mime_parser.from.addr == STATISTICS_BOT_EMAIL
Collaborator

If the chat is archived and muted by default, maybe there's no need for this check?

Collaborator Author

Without this check, you get this annoying number-in-a-circle for the archived chats:

image

Collaborator

To avoid this, assigning InNoticed is enough. The same goes for blocked chat messages and silent group changes, btw.

Collaborator Author

But why make the logic more complicated when it doesn't matter?

Collaborator

In fact this will make the logic less complicated, otherwise the fix in #6415 is harder to implement. TL;DR: InSeen messages should cause MsgsNoticed events, but InNoticed shouldn't. Probably we have some naming issue here (MsgsSeen would be better), but it's how it (almost) works currently. "Almost", because only imap::Session::sync_seen_flags() emits this event, but not receive_imf.
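The distinction drawn in this thread can be sketched roughly as follows. This is only an illustration, not the code from the PR: the `Incoming` struct, its field names, and the helper functions are hypothetical, and the handling of read receipts as well as the assumption that only fresh messages feed the unread badge are guesses made for the example.

```rust
/// Hypothetical mirror of the incoming message states named in the thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageState {
    InFresh,
    InNoticed,
    InSeen,
}

/// Facts the receive path might know about an incoming message.
/// All field names are illustrative only.
#[derive(Debug, Clone, Copy, Default)]
struct Incoming {
    is_mdn: bool,
    chat_blocked: bool,
    silent_group_change: bool,
    from_statistics_bot: bool,
}

/// Picks the initial state along the lines of the reviewer's suggestion.
fn initial_state(msg: Incoming) -> MessageState {
    if msg.is_mdn {
        // Assumption: read receipts keep being stored as already seen.
        MessageState::InSeen
    } else if msg.chat_blocked || msg.silent_group_change || msg.from_statistics_bot {
        // Noticed, but not seen: no badge and no "seen" side effects.
        MessageState::InNoticed
    } else {
        MessageState::InFresh
    }
}

/// Assumption: only fresh messages contribute to the number-in-a-circle.
fn counts_as_unread(state: MessageState) -> bool {
    state == MessageState::InFresh
}

/// Per the comment above: InSeen should cause a MsgsNoticed event, InNoticed should not.
fn emits_msgs_noticed(state: MessageState) -> bool {
    state == MessageState::InSeen
}
```

Under these assumptions, a message from the statistics bot neither shows up in the badge nor triggers a MsgsNoticed event, which is the behavior the reviewer argues for.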

.log_err(context)
.ok();

set_last_excluded_msg_id(context).await?;
Collaborator

Maybe add last_msg_id to struct Statistics and exclude it from serialization. This way we won't miss messages stored into the db concurrently.

Collaborator Author

Then we would need to:

  • make the two invocations of get_message_stats() use a single transaction, which is non-trivial
  • introduce get_statistics_inner() or similar, which returns the struct rather than the serialized string (the tests would then still use get_statistics())
  • make set_last_counted_msg_id() take an optional message id

Since it's both super unlikely that a message will be missed, and completely random which message will be missed (-> it won't skew our statistics), I don't think it's worth the effort.

Collaborator

There's no need to call get_message_stats() in the same transaction; probably just passing some last_counted_msg_new to it works. The get_config_u32(Config::StatsLastCountedMsgId) call can be moved into get_message_stats() OTOH.

But this is not critical indeed, it's unlikely that this may cause a notable inconsistency in statistics.
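The idea of fixing an upper bound before counting can be illustrated with a small in-memory sketch. It is not the PR's implementation: a plain slice of message ids stands in for the database, and `count_new_messages`, `build_report`, and `upper_bound` are hypothetical names chosen for this example.

```rust
/// Counts message ids in the range (last_counted, upper_bound].
fn count_new_messages(msg_ids: &[u32], last_counted: u32, upper_bound: u32) -> usize {
    msg_ids
        .iter()
        .filter(|&&id| id > last_counted && id <= upper_bound)
        .count()
}

/// Builds one report and returns (message count, new "last counted" id).
fn build_report(msg_ids: &[u32], last_counted: u32) -> (usize, u32) {
    // Fix the upper bound once, before counting. Anything stored after this
    // point has a larger id and is picked up by the next report instead of
    // being skipped.
    let upper_bound = msg_ids
        .iter()
        .copied()
        .max()
        .unwrap_or(last_counted)
        .max(last_counted);
    let count = count_new_messages(msg_ids, last_counted, upper_bound);
    (count, upper_bound)
}
```

Because the same `upper_bound` is used both for counting and as the stored "last counted" id, a message that arrives between the two steps is neither counted twice nor lost.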

Hocuri added 8 commits August 25, 2025 17:26
This way, the statistics / self-reporting bot will be made into an opt-in regular sending of statistics, where you enable the setting once and then they will be sent automatically. The statistics will be sent to a bot, so that the user can see exactly which data is being sent, and how often. The chat will be archived and muted by default, so that it doesn't disturb the user.
@Hocuri Hocuri force-pushed the hoc/send-statistics-setting branch from cc70f41 to 7ef56d6 Compare August 25, 2025 15:26
Hocuri added a commit to deltachat/deltachat-android that referenced this pull request Aug 25, 2025
@Hocuri Hocuri requested a review from iequidoo August 25, 2025 16:02
&self,
account_id: u32,
qr: String,
source: Option<u32>,
Collaborator

How is None different from Some(Unknown) for both parameters?

Collaborator Author

Not at all. I first thought that making them an Option would make them optional parameters in the JsonRPC, but this didn't actually work. I'll just make them u32.

Collaborator Author

OTOH, the Option does do a good job at communicating that it's ok to pass nothing here, while this is less clear when it's just a u32

Collaborator

If this Unknown is defined in the code somewhere, it will also do this good job. If Option isn't needed technically, I'm for removing it.
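One way an explicitly defined Unknown could carry the "it's ok to pass nothing" meaning without an Option is sketched below. The enum name `JoinSource` and the variants other than Unknown are made up for illustration; the actual values used by the PR are not shown in this thread.

```rust
/// Hypothetical source of a scanned or opened QR code, passed over the API as a plain u32.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum JoinSource {
    /// The caller does not know or does not want to say where the QR code came from.
    Unknown = 0,
    /// Illustrative variant, not taken from the PR.
    Scan = 1,
    /// Illustrative variant, not taken from the PR.
    Clipboard = 2,
}

impl From<u32> for JoinSource {
    /// Maps unrecognized numbers to Unknown, so that older or newer UIs
    /// cannot make the call fail by sending an unexpected value.
    fn from(raw: u32) -> Self {
        match raw {
            1 => JoinSource::Scan,
            2 => JoinSource::Clipboard,
            _ => JoinSource::Unknown,
        }
    }
}
```

With a documented `Unknown = 0`, a UI that has nothing to report passes 0, and `None` versus `Some(Unknown)` no longer needs to be distinguished.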

.await
.context("Failed to send statistics message")
.log_err(context)
.ok();
Collaborator

So we can lose some message statistics (because it's a diff), but not SecureJoin statistics (because it's integral). This may create a disproportion in reported statistics. A possible solution is to make SecureJoin statistics differential as well, i.e. reset it together with setting StatsLastCountedMsgId.

Collaborator Author

Securejoin statistics and message statistics are independent from each other, it's not necessary for them to be proportional.

But I'll instead make sure that we try again if Delta Chat is killed by the system while sending statistics, or there is some other spurious error

Collaborator

> Securejoin statistics and message statistics are independent from each other, it's not necessary for them to be proportional.

While this may look fine for now, we don't know how we'll use reported statistics in the future, maybe some calculations will require both message and SecureJoin statistics. Maybe we can make message statistics integral as well, e.g. by remembering the whole serialized statistics struct in the db? Then we can calculate statistics for new messages (like you already do), add it to the stored statistics and send the resulting integral statistics to the bot. Then also another problem mentioned in #6851 (comment) is solved. The bot can't integrate the statistics correctly because it doesn't know which state is referenced by the received diff (maybe that state was never received at all). If we have any problems in the client-bot interaction, errors will accumulate.
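The difference between integral and differential reporting can be made concrete with a small sketch. It is only a model of the argument, not code from the PR: `MessageTotals`, its fields, and the two "bot view" functions are hypothetical.

```rust
/// Hypothetical message counters; the field names are illustrative only.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct MessageTotals {
    verified: u64,
    unverified: u64,
}

impl MessageTotals {
    /// Client side: add the newly counted messages to the stored totals.
    fn add(self, diff: MessageTotals) -> MessageTotals {
        MessageTotals {
            verified: self.verified + diff.verified,
            unverified: self.unverified + diff.unverified,
        }
    }
}

/// Bot side with integral reports: the latest report is the full picture.
fn bot_view_integral(received: &[MessageTotals]) -> MessageTotals {
    received.last().copied().unwrap_or_default()
}

/// Bot side with differential reports: everything received must be summed,
/// so a report that never arrives is missing from the result for good.
fn bot_view_differential(received: &[MessageTotals]) -> MessageTotals {
    received
        .iter()
        .fold(MessageTotals::default(), |acc, diff| acc.add(*diff))
}
```

If the first weekly report is lost, the integral view still recovers the full totals from the second report, while the differential view only ever sees the second week's counts.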

/// If false, only messages in other chats (groups and broadcast channels) are counted.
async fn get_message_stats(
context: &Context,
last_counted_msg: u32,
Collaborator

@iequidoo iequidoo Aug 26, 2025

Probably we should include reported messages based on timestamp_sent instead, i.e. those which fall into the current week period. Otherwise if an old backup is restored, the same messages will be reported again. Also we should include this time period into reported stats so that we can understand if the client clock is correct.
...
Apparently it's better not to report differential statistics at all, see #6851 (comment) for reasoning and possible solution.
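For reference, the timestamp-based selection mentioned at the start of this comment could look roughly like the sketch below. It assumes Unix timestamps in seconds and a fixed seven-day period; `WeeklyReport`, `weekly_report`, and the field names are hypothetical and not taken from the PR.

```rust
/// Length of one reporting period in seconds (assumed to be exactly seven days).
const WEEK_SECS: i64 = 7 * 24 * 60 * 60;

/// Hypothetical report that also carries the period it covers, so that the
/// receiver can check whether the client clock looks plausible.
#[derive(Debug, PartialEq, Eq)]
struct WeeklyReport {
    period_start: i64,
    period_end: i64,
    message_count: usize,
}

/// Counts messages whose sent timestamp falls into [period_end - 1 week, period_end).
fn weekly_report(timestamps_sent: &[i64], period_end: i64) -> WeeklyReport {
    let period_start = period_end - WEEK_SECS;
    let message_count = timestamps_sent
        .iter()
        .filter(|&&t| t >= period_start && t < period_end)
        .count();
    WeeklyReport {
        period_start,
        period_end,
        message_count,
    }
}
```

Because the selection depends only on the timestamps and the period, restoring an old backup does not cause messages from earlier weeks to be counted in the current period.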

@iequidoo iequidoo self-requested a review August 28, 2025 09:21