@riccardobl
Contributor

@riccardobl riccardobl commented Sep 29, 2025

Based on #363 - mostly a refactoring with some generalization

This NIP describes how Nostr relays can be used for the signaling required to establish direct peer-to-peer connections between two or more participants.

The first part defines a generic abstraction of the signaling events, and the second part specifies an implementation for WebRTC data channels (i.e. generic binary packets).

@riccardobl
Contributor Author

riccardobl commented Sep 29, 2025

@chakany pinging you to ask what you think about this refactoring of your original nip, and if you are ok with being credited as an author

@vitorpamplona
Collaborator

NIPs tend to do better when they are specific, not generalistic. Where are you going to apply this protocol?

For instance, we don't want the "presence" to be used by both a text-only chat client that doesn't support video and a video call client that doesn't support chats. Additionally, a voice-only client should not use the same protocol as those other two. Clients that are only sharing files need other clients that can interpret file-sharing, etc.

Otherwise, people will know they are in the same place, but their clients cannot see or talk to each other, even though they implement this NIP.

@riccardobl
Contributor Author

riccardobl commented Sep 29, 2025

Thanks for your feedback, I’m using an earlier iteration of this for the netcode in ngengine.
This nip only covers the "signaling" layer of the p2p connection.

The idea is that every P2P app needs a way for peers to discover each other and coordinate the initial connection. Typically, this is handled by centralized servers or by a DHT, as in the case of Holepunch’s Hyperswarm and other decentralized protocols.

This NIP standardizes a way for apps to do this on Nostr. However, it doesn’t make apps inherently interoperable, but it can be a base layer for other nips. I think it is similar to nip-90 in this regard.

I can think of two ways to use this nip:

  • The y and i tags can be used to filter presence events for apps known to the user. Once filtered, the connection can then be passed to the app after being established through this NIP.
  • Apps can connect directly to compatible rooms through some external discovery, eg. by asking the user to input the room privkey

Maybe the "Standard protocols" section can be rewritten as an "example", so as not to tie this NIP specifically to some undefined use of WebRTC.

@vitorpamplona
Collaborator

vitorpamplona commented Sep 29, 2025

However, it doesn’t make apps inherently interoperable, but it can be a base layer for other nips. I think it is similar to nip-90 in this regard.

Sure, but the point of this repository is to make things interoperable. The way NIP-90 solves this is by having a separate repo with a separate event kind list and documentation procedure for each type of DVM, fully documented. For this PR, I guess it would mean that each game would have to document their game protocol to allow other clients to code and run the same game from start to finish.

Also, keep in mind that NIP-90 is a terrible NIP. There have been multiple efforts to completely rewrite it because right now it is so general that it doesn't really help anyone. None of those efforts are moving forward because nobody agrees on anything. Which is the worst thing that can happen to a repo whose sole purpose is to get implementers together, not further apart.

@riccardobl
Contributor Author

riccardobl commented Sep 29, 2025

I’ve been experimenting with NIP-90 and I get where you’re coming from, but this NIP works at a much lower level.

The signaling layer is an independent issue from the app protocol itself, which means it can be implemented in standard Nostr libraries and be interoperable. The actual data exchange layer is a separate concern, that’s beyond the scope of this PR, except for a few general notes on how it might be facilitated.

If you pair this signaling NIP with any UDP-based transport, NAT traversal, and a STUN server, you already have everything you need for a P2P connection.

WebRTC conveniently handles transport and NAT traversal and is widely available. What’s left is the signaling, which this NIP provides, and STUN, which could also be delivered via relays.

This leads to a standard way for developers to build P2P apps on Nostr without reinventing the wheel for every app. I hope this clarifies my point of view, which is about making the signaling part of P2P protocols interoperable, not the apps built on top of them.

If you still disagree, that’s fine, I think this is a complex issue and there are many valid perspectives.

Anecdotally, I was initially more drawn to Holepunch because their modules make building P2P apps easy. That's one of the things I am working to improve, because I think we can achieve the same on Nostr, with the added benefit of full browser support, using NIPs like this and their corresponding implementations in standard libraries. That is why I’m bundling WebRTC support in nostr4j.
Having this boilerplate stuff standardized would make a lot of code reusable.

An added bonus is that relays and clients will recognize (or ignore) these events as likely P2P signaling, rather than having to handle apps embedding signaling data in DMs or other unrelated or conflicting event kinds.

@vitorpamplona
Collaborator

vitorpamplona commented Sep 29, 2025

The signaling layer is an independent issue from the app protocol itself, which means it can be implemented in standard Nostr libraries and be interoperable.

I think it is a big assumption that every P2P app needs the same signaling layer. At least, from some early tests I did 2 years ago, only the "presence" message was needed, and it didn't need a public key or expiration.

When you say things like: "The offer format is protocol specific". This means that if I am coding this, I will get offers all the time in formats I don't know and thus cannot support. That seems like wasted resources. My client should only be downloading the events it can parse in the application layer. So, to me, all of these other message types are all application dependent and should probably be using event kinds that only their app downloads.

Otherwise, if this is successful and everybody uses it for their game, clients are going to connect to a relay and download thousands of events they cannot parse before the first one they can. For instance, if a client uses this but not for webrtc then it needs a way to download everything BUT the i=webrtc tag, which is not a possible filter on Nostr. The client will have to download everything and throw away everything that it does not support.
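To make the filtering point concrete, here is a minimal Python sketch of how NIP-01 style filters match events. It is a simplification that only handles `kinds` and `#<letter>` conditions. The kind number `29999`, the protocol names other than `webrtc`, and the helper names are hypothetical placeholders, not values from the spec under discussion. The sketch shows that a filter can only list tag values to include, so "everything except `i=webrtc`" has to be done by fetching by kind and discarding locally.

```python
from typing import Any

Event = dict[str, Any]
Filter = dict[str, Any]

# Hypothetical kind number, used only for this illustration.
SIGNALING_KIND = 29999


def tag_values(event: Event, name: str) -> set[str]:
    """Collect the first value of every tag called `name`."""
    return {t[1] for t in event.get("tags", []) if len(t) >= 2 and t[0] == name}


def matches(event: Event, flt: Filter) -> bool:
    """Simplified NIP-01 matching: only `kinds` and `#<letter>` conditions."""
    if "kinds" in flt and event["kind"] not in flt["kinds"]:
        return False
    for key, wanted in flt.items():
        # A tag condition passes when the event has at least one listed value.
        if key.startswith("#") and not tag_values(event, key[1:]) & set(wanted):
            return False
    return True


events = [
    {"kind": SIGNALING_KIND, "tags": [["i", "webrtc"]]},
    {"kind": SIGNALING_KIND, "tags": [["i", "custom-udp"]]},
    {"kind": SIGNALING_KIND, "tags": [["i", "another-proto"]]},
]

# Inclusion works: the filter names the values it wants.
include_webrtc = {"kinds": [SIGNALING_KIND], "#i": ["webrtc"]}
from_relay = [e for e in events if matches(e, include_webrtc)]

# Exclusion has no filter form: fetch by kind, then drop events locally.
by_kind = [e for e in events if matches(e, {"kinds": [SIGNALING_KIND]})]
not_webrtc = [e for e in by_kind if "webrtc" not in tag_values(e, "i")]
```

In this sketch the client has to receive all three events in order to keep the two it can use, which is the wasted download described above.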

Shouldn't your "disconnect" message link to the respective "connect" one? Otherwise, if a client (or multiple clients) is connected to 3 protocols/games and disconnects from 2, how would the others know which one is being disconnected? Or can I not connect with the same room in 2 clients at the same time?

@riccardobl
Contributor Author

riccardobl commented Sep 30, 2025

I think it is a big assumption that every P2P app needs the same signaling layer. At least, from some early tests I did 2 years ago, only the "presence" message was needed, and it didn't need a public key or expiration.

Presence is not enough. You need to negotiate a connection with the other peers, as they will have different network conditions:

  • you might be able to connect to some, but they might not be able to connect to you
  • they might be on your lan
  • they might be unreachable and you might want to negotiate the use of a turn server
  • their network might be changing (eg. from wifi to data)

etc.

This nip implements the webrtc handshake that can also update the ice candidates as they come (using the routes event). Why do you think you need more or less than this?

The expiration is used to know when to consider a peer "lost" when the connection drops without a disconnection packet.
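As a rough illustration of the expiration idea, the Python sketch below tracks peers by the timestamp after which they should be considered lost. It assumes the presence event carries an `expiration` tag holding a unix timestamp (in the style of NIP-40); the class and method names are hypothetical and not taken from the spec.

```python
import time
from typing import Optional


class PresenceTable:
    """Tracks peers and forgets those whose presence has expired."""

    def __init__(self) -> None:
        self._expires_at: dict[str, int] = {}

    def on_presence(self, event: dict) -> None:
        # Assumption: presence events carry ["expiration", "<unix seconds>"].
        for tag in event.get("tags", []):
            if len(tag) >= 2 and tag[0] == "expiration":
                self._expires_at[event["pubkey"]] = int(tag[1])
                return

    def on_disconnect(self, pubkey: str) -> None:
        # Explicit disconnect: drop the peer without waiting for expiry.
        self._expires_at.pop(pubkey, None)

    def alive(self, now: Optional[int] = None) -> set[str]:
        # Peers that never sent a disconnect are dropped once expired.
        now = int(time.time()) if now is None else now
        self._expires_at = {p: t for p, t in self._expires_at.items() if t > now}
        return set(self._expires_at)
```

A peer that keeps republishing presence with a later expiration stays in the table; one that silently drops off the network disappears once its last expiration passes.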

The public key (I suppose the room public key?) is a convenient id to group peers that want to connect to each other (see response below) while also having a way to prove they are authorized to do so (they must have the room private key).

When you say things like: "The offer format is protocol specific". This means that if I am coding this, I will get offers all the time in formats I don't know and thus cannot support. That seems like wasted resources. My client should only be downloading the events it can parse in the application layer. So, to me, all of these other message types are all application dependent and should probably be using event kinds that only their app downloads.

Otherwise, if this is successful and everybody uses it for their game, clients are going to connect to a relay and download thousands of events they cannot parse before the first one they can. For instance, if a client uses this but not for webrtc then it needs a way to download everything BUT the i=webrtc tag, which is not a possible filter on Nostr. The client will have to download everything and throw away everything that it does not support.

You won’t be flooded with irrelevant signaling events, since this is scoped to the specific room you connect to.
Apps that are unrelated simply won’t connect to the same room.

Also, you shouldn’t need to subscribe to the kind, because starting the signaling phase already requires the room private key, that is obtained through discovery or sharing (this is outside the scope of this NIP). You only subscribe to the P tags matching the room public key.
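A subscription scoped to one room could look roughly like the Python sketch below, which builds a NIP-01 `REQ` message filtering on the `P` tag. The kind number is passed in as a parameter because it is not stated in this thread, and the function name, subscription id, and placeholder key are hypothetical.

```python
import json


def room_subscription(sub_id: str, room_pubkey: str, kind: int) -> str:
    """Build a NIP-01 REQ message for signaling events of a single room."""
    flt = {
        "kinds": [kind],          # signaling kind; placeholder value at call site
        "#P": [room_pubkey],      # only events tagged with this room's public key
    }
    return json.dumps(["REQ", sub_id, flt])


# Placeholder 32-byte hex key and a hypothetical kind number.
example_room = "ab" * 32
message = room_subscription("room-sub", example_room, 29999)
```

Because the filter names a single room key, events tagged for other rooms are never sent to the client, whatever application published them.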

In the case of a game, the room would typically be shared via some form of matchmaking. An app could provide a clickable link. I suppose, NIP-53 could be used for this too.

Shouldn't your "disconnect" message link to the respective "connect" one? Otherwise, if a client (or multiple clients) is connected to 3 protocols/games and disconnects from 2, how would the others know which one is being disconnected? Or can I not connect with the same room in 2 clients at the same time?

Good point. Disconnecting from 3 games simultaneously isn’t possible for the reasons mentioned above, but the current spec would allow two instances of the same app, using the same keypair and connected to the same room, to be disconnected at the same time by a single disconnect packet.

To prevent this, I think it would be better to enforce the use of throwaway keypairs for signaling. That way, the two apps cannot eavesdrop on each other’s traffic.


EDIT: Thanks for the feedback. I made some changes and added a d tag for "session id" that can be used as an alternative to throwaway keypairs.
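One way the session id could disambiguate disconnects is sketched below in Python: connections are keyed by the pair (pubkey, session id), so a disconnect from one instance leaves another instance using the same keypair untouched. The event shapes, the empty-string fallback, and all names here are assumptions for illustration only.

```python
def tag_value(event: dict, name: str, default: str = "") -> str:
    """Return the first value of the tag called `name`, or `default`."""
    for tag in event.get("tags", []):
        if len(tag) >= 2 and tag[0] == name:
            return tag[1]
    return default


class SessionRegistry:
    """Tracks live sessions keyed by (pubkey, session id from the d tag)."""

    def __init__(self) -> None:
        self._sessions: set[tuple[str, str]] = set()

    def on_connect(self, event: dict) -> None:
        self._sessions.add((event["pubkey"], tag_value(event, "d")))

    def on_disconnect(self, event: dict) -> None:
        # Only the session named in the d tag is removed.
        self._sessions.discard((event["pubkey"], tag_value(event, "d")))

    def sessions_for(self, pubkey: str) -> set[str]:
        return {sid for pk, sid in self._sessions if pk == pubkey}
```

Events without a d tag all fall back to the same empty session id in this sketch, which reproduces the original behaviour where a single disconnect affects every instance sharing the keypair.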

@vitorpamplona
Collaborator

vitorpamplona commented Sep 30, 2025

The public key (I suppose the room public key?) is a convenient id to group peers that want to connect to each other (see response below) while also having a way to prove they are authorized to do so (they must have the room private key).

How does this work? Nothing in the spec requires the private key for the room. Anyone can just spam any P they want.

Why do you think you need more or less than this?

Because I coded a thin webrtc some years ago and didn't need a room definition. Users would just post a "presence"-like event attesting that they are online, which already includes the coordinates to connect to them directly. The concept of a "room" is quite weird. Maybe it's necessary for games, but regular P2P nostr or voice calls, for instance, don't actually require any room.

I assume you want to use TURN servers for the non-nostr-event payloads, which I wasn't using at the time, and it is not P2P anymore, which means that any of the server-in-the-middle libraries out there can be used. IMO, TURN servers defeat the purpose of any P2P stack. But with them, you will need some parts of these other message types.

Still, the Offer/Answer protocol is just a way to describe an infinite number of features to be used by each app. On Nostr, each NIP generally just picks a preferred configuration to avoid forcing clients to support multiple options in the flow. For instance, we only use secp256k1 for signing messages, not any cryptographic curve or algo. We only use AES-GCM for encryptions in NIP-17. We only use ChaCha for private chats. There are no options. Which means there is no need to make a protocol to understand and choose options. It would be great if we could pre-choose options for this NIP, too.

Options work great when the same company is designing both sides of the call (WhatsApp only talks to WhatsApp). But it is terrible when you have 100 clients, all coded from scratch, trying to support multiple paths in the exact same way without any shared codebase.

At the time, my biggest hope was to make an IPv6-only P2P protocol so that none of these NAT/Firewall issues can get in the way to complicate things. I am not sure if we are there now, but at the time, most Amethyst instances already had access to IPv6.

@riccardobl
Contributor Author

riccardobl commented Sep 30, 2025

How does this work? Nothing in the spec requires the private key for the room. Anyone can just spam any P they want.

The private key requirement is in the encryption section. Of course, anyone can spam a public relay, but without the room key their messages are meaningless to peers in the room.

Because I coded a thin webrtc some years ago and didn't need a room definition.
Users would just post a "presence"-like event attesting that they are online, which already includes the coordinates to connect to them directly. The concept of a "room" is quite weird.

Publishing everyone's IPs on Nostr and connecting blindly isn’t ideal. Without a room, every peer can try to connect to everyone. A room provides a clean way to scope discovery and ensures that only selected peers can see each other.

With the offer-answer handshake you know who you’re talking to BEFORE attempting a direct connection. You also know who is trying to connect to you, which can be used to attempt NAT traversal from the other side.

Without the offer-answer handshake you will also be forced to do some more complex handshake after the connection, because you are still going to need to know which packet is from whom.

Maybe it's necessary for games, but regular P2P nostr or voice calls, for instance, don't actually require any room.

I’d argue there’s always some notion of a room/topic, whether for a group chat, a call, or a game. It’s just the mechanism that ensures only the intended peers are discovered.

I assume you want to use TURN servers for the non-nostr-event payloads, which I wasn't using at the time, and it is not P2P anymore, which means that any of the server-in-the-middle libraries out there can be used. IMO, TURN servers defeat the purpose of any P2P stack. But with them, you will need some parts of these other message types.

TURN is only used as an optional fallback when direct P2P isn’t possible. It’s not a replacement for P2P. The handshake step can also carry other useful info: protocol version, session metadata, acceptance/rejection, etc.

At the time, my biggest hope was to make an IPv6-only P2P protocol so that none of these NAT/Firewall issues can get in the way to complicate things. I am not sure if we are there now, but at the time, most Amethyst instances already had access to IPv6.

That would be ideal, but IMO not realistic today. Many consumer networks are still IPv4, and even IPv6 devices might sit behind firewalls and NATs. Even in a pure IPv6 world, you’d still need signaling to know who to connect to and how; you’d just skip NAT traversal.

@vitorpamplona
Collaborator

Publishing everyone's IPs on Nostr and connecting blindly isn’t ideal.

It doesn't need to be public. There are many crypto schemes that given apps need to use. The event is just a marker that a client is online. In my case, the presence kind was per NIP. So a Voice Call NIP would use a presence event kind just for itself and would specify that the follow-up would transfer IP via direct NIP-44 encryption between the two p tags involved. I would send a request to connect via some kind (just for voice calls again), which would be received, displayed to the user, approved, and replied to only by clients that support voice calls by filtering by kind, NIP-44 encrypted to the final p tag (no need for shared secret rooms or other shared conventions with other types of applications).

Forcing every type of application to use a fixed shared secret scheme via rooms is not ideal, IMO. What if users don't trust each other to share that secret? What if the application is doing a better MLS-based design where the secrets are based on the hierarchical trees such that each subgroup is a separate secret? What if applications need a public IP instead?

With the offer-answer handshake you know who you’re talking to BEFORE attempting a direct connection.

You will always know before attempting a direct connection, regardless of which protocol you use.

Without the offer-answer handshake you will also be forced to do some more complex handshake after the connection, because you are still going to need to know which packet is from whom.

I don't think that is true. Yes, the application will have to do some handshake. But that will not be "more complex" in any way because each NIP should define a handshake that removes everything that is not needed for that particular application and adds more information that is exclusive to this application flow, without having to bother the other types of apps out there.

Some apps will have a very simple handshake, others will have a multi-round one. It's unfair to ask apps that just need a simple handshake to defensively implement all possible handshakes just to "comply" with this NIP.

@vitorpamplona
Collaborator

vitorpamplona commented Sep 30, 2025

All I am saying is that if you focus this NIP on the needs of the gaming engine handshake, you can reduce the complexity this abstraction is creating, specify more parts of the flow to streamline implementations, and get to a better level of interoperability between the apps you care about. Then the other WebRTC (or even broader P2P) handshakes can all define their own NIPs with their own handshakes as well. It's a win-win.

@riccardobl
Contributor Author

riccardobl commented Sep 30, 2025

If I’m understanding correctly, what you’ve described here is an offer-answer flow, similar to the one described in this nip, except that you filter presence by kind instead of y. Wouldn’t this be a problem if p2p usage were to grow on nostr and every app used its own event kind for signaling?

I still believe standardizing signaling is possible and beneficial with some iteration of this event set, but if there’s no consensus on that, I think it would at least be worth reserving a kind for app signaling, similar to what NIP-78 does.

Anyhow, thank you again for taking the time to share your feedback.

@vitorpamplona
Collaborator

vitorpamplona commented Sep 30, 2025

wouldn’t this be a problem if p2p usage were to grow on nostr and every app used its own event kind for signaling?

I didn't say everybody should use that protocol. So much so that I didn't even create a NIP PR for it. What I am saying is that each app type should probably use their own event kinds with its own structures and flows that maximize for their unique needs. Making all apps use just one fixed abstraction layer blocks them from exploring more efficient protocols... that might only work for them, but that's ok. Sometimes these generalistic protocols are necessary. But here it does feel like every app could benefit from the added freedom. Either way, I look forward to seeing this working with the complete flow for the engine.
