Skip to content

Conversation

3v1n0
Copy link
Collaborator

@3v1n0 3v1n0 commented Jul 7, 2025

When authd does any NSS request it ends up calling itself again, and
while this is not a problem per se it is just not required because we
always assume in authd code that our users are not part of the local
entries, when we ask for, given that we are in control of the database.

It's also potentially causing dead-locks, since we may end up calling
NSS functions when we have a lock, and these may lead to calling authd
again.

As per this, avoid performing this when the NSS module is loaded by
authd process itself.

To avoid this we expose the authd PID as an environment variable that
also the NSS module can read, even before connecting, and we check that
the current module PID matches with the variable.

Now, this is still not enough to be sure though, since any other process
may just replicate the same by defining the env variable and thus
bypassing the check.

So, to prevent this to happen, in case all the previous checks have
passed, get the peer credentials from the unix stream, and verify that
the socket is actually owned by the PID advertised by the env variable.

And only in this case, ignore invoking authd for all the NSS requests.

UDENG-7313

We were running the NSS integration test without authd actually using
the library, do it now to simulate a more real scenario

Even though it is gonna be a no-op soon
@3v1n0 3v1n0 requested a review from a team as a code owner July 7, 2025 12:21
@3v1n0 3v1n0 changed the title Nss avoid self connections NSS: avoid self connections Jul 7, 2025
@3v1n0 3v1n0 force-pushed the nss-avoid-self-connections branch 5 times, most recently from ab1e3c2 to f49a9f6 Compare July 7, 2025 16:13
3v1n0 added 3 commits July 7, 2025 18:15
…itself

When authd does any NSS request it ends up calling itself again, and
while this is not a problem per se it is just not required because we
always assume in authd code that our users are not part of the local
entries, when we ask for, given that we are in control of the database.

It's also potentially causing dead-locks, since we may end up calling
NSS functions when we have a lock, and these may lead to calling authd
again.

As per this, avoid performing this when the NSS module is loaded by
authd process itself.

To avoid this we expose the authd PID as an environment variable that
also the NSS module can read, even before connecting, and we check that
the current module PID matches with the variable.

Now, this is still not enough to be sure though, since any other process
may just replicate the same by defining the env variable and thus
bypassing the check.

So, to prevent this to happen, in case all the previous checks have
passed, get the peer credentials from the unix stream, and verify that
the socket is actually owned by the PID advertised by the env variable.

And only in this case, ignore invoking authd for all the NSS requests.
Since for each process NSS will be loaded just once there's no need to
perform the authd-self-check on all the requests, but we can just do it
the first time and then rely on such result.

So use an OnceLock to do it
The PID of the library won't change during the execution, so perform the
test and save the information for later use, instead of repeating the
same check for each request
@3v1n0 3v1n0 force-pushed the nss-avoid-self-connections branch from f49a9f6 to 21a1ce8 Compare July 7, 2025 16:15
@adombeck
Copy link
Contributor

adombeck commented Jul 9, 2025

Now, this is still not enough to be sure though, since any other process
may just replicate the same by defining the env variable and thus
bypassing the check.

So, to prevent this to happen, in case all the previous checks have
passed, get the peer credentials from the unix stream, and verify that
the socket is actually owned by the PID advertised by the env variable.

Do I understand correctly that the potential threat we want to protect against is that an attacker who can control the environment of a program is able to make the program skip NSS requests to authd users/groups? If so, I don't think checking the peer creds and comparing the PID from that with the PID from the env var is enough. An attacker could just guess the PID of the process and set the env var accordingly. If they repeat it often enough, they will guess correctly eventually (the PID space is too small to avoid that). Checking the name of the currently running executable would be more secure.

@adombeck
Copy link
Contributor

adombeck commented Jul 9, 2025

Considering that an attacker who can control the environment can also set LD_PRELOAD to arbitrarily manipulate the process, it's enough to consider the cases in which LD_PRELOAD cannot be used. That is the case when ld considers that the binary should be run in secure-execution mode: https://manpages.ubuntu.com/manpages/noble/man8/ld.so.8.html#environment

A binary is executed in secure-execution mode if the AT_SECURE entry in the auxiliary vector (see getauxval(3)) has a nonzero value.

So I think the correct check we should do before skipping self lookups is getauxval(AT_SECURE) != 0.

@adombeck
Copy link
Contributor

adombeck commented Jul 9, 2025

So I think the correct check we should do before skipping self lookups is getauxval(AT_SECURE) != 0.

I'll prepare a commit for that.

@adombeck
Copy link
Contributor

adombeck commented Jul 9, 2025

Another thing to consider: Do we really want to avoid self-lookups in all the NSS requests done by authd? There might be cases where we are not checking our own database first, and instead rely on NSS to do that for us. For example when checking if a generated UID is unique:

unique, err := r.temporaryUserRecords.uniqueNameAndUID(name, uid, tmpID)

AFAICT we don't check our own database first in that case, but rely on NSS to do that. That case might be moot once #993 is merged, but there could be others.

That makes me a bit reluctant of the approach to always skip self-lookups in our NSS module. I would prefer it if we could define functions in Go which we can call to do NSS lookups without self-lookups, but can also still do NSS lookups with self-lookups.

@adombeck
Copy link
Contributor

adombeck commented Jul 9, 2025

I would prefer it if we could define functions in Go which we can call to do NSS lookups without self-lookups, but can also still do NSS lookups with self-lookups.

We might be able to achieve that by using thread-local variables instead of environment variables. Implementing that seems like too much effort for something that we didn't plan, so I'll not do that for now. Avoiding self-requests is a nice performance optimization, but it's not high priority IMO, so I think we should revisit this only once we don't have any higher priority issues to work on.

@3v1n0
Copy link
Collaborator Author

3v1n0 commented Jul 9, 2025

Another thing to consider: Do we really want to avoid self-lookups in all the NSS requests done by authd? There might be cases where we are not checking our own database first, and instead rely on NSS to do that for us. For example when checking if a generated UID is unique:

This PR is meant to be based on what we are doing on #993 and so there we don't rely on it, reason why I think we should just never check for it and always assume it.

Use per-request variables IMHO it can just lead to racy situations, and I think it's better (since it's also easier to test for us) if we assume that a NSS request done by the daemon won't ever return a result, or we end up in inconsistent situations that are quite hard to test too (they can, but only in integration tests). While we have already lots of tests in users that assume that there is no NSS in place

@adombeck
Copy link
Contributor

adombeck commented Jul 9, 2025

Use per-request variables IMHO it can just lead to racy situations

I don't see that, how can it be racy if the variables are bound to a single thread and a single request?

Use per-request variables IMHO it can just lead to racy situations, and I think it's better (since it's also easier to test for us) if we assume that a NSS request done by the daemon won't ever return a result, or we end up in inconsistent situations that are quite hard to test too (they can, but only in integration tests).

why are they hard to test? we could for example implement my proposal by changing the signature of GetPasswdEntries() to GetPasswdEntries(selfLookup bool), and make it perform a self lookup based on that argument. I don't see why this would be harder to test than the change you're proposing here. both are not easy to test in unit tests, because the functionality depends on the NSS module.

Even if we're not doing any lookups which rely on NSS to check our own database with #993 (I did not check if that's the case or not), I think it's better to make it explicit in the function call whether internal entries are considered or not. We (or someone else working on the code base) might later find it unexpected when e.g. user.Lookup does not return users from the authd database. Making it explicit in the function call avoids any potential confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants