@tighwm tighwm commented Sep 3, 2025

Added recursive filtering of the request body for this issue, plus a unit test covering this case.

@JargeZ

JargeZ commented Sep 3, 2025

Thank you so much @tighwm for taking this on. You figured it out very quickly and did a great job!
@mahenzon thanks for building such a good community - I’d appreciate it if you could also leave comments on this PR with your thoughts and code review!

I’d like this extension to be included in the library, so I want to call out the maintainers to get your attention for review and advice.

This functionality is meant to help in the following cases:

As a library user, I expect that request body parameter filtering acts as a kind of sanitizing guard. That’s why my test configs usually look like this:

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "decode_compressed_response": True,
        "filter_headers": [
            "authorization",
        ],
        "filter_query_parameters": [
            "access_token",
            "token",
        ],
        "filter_post_data_parameters": [
            "api_key",
            "password",
        ],
    }

However, there are cases where requests (e.g. JsonRPC) have a request body wrapped in some envelope:

{
  "jsonrpc": "2.0",
  "method": "remoteProcedure",
  "params": {
    "credentials": {
      "password": "secret",
      "login": "name"
    },
    "payload": 42
  },
  "id": 3
}

Or sometimes even worse, when communicating with something custom:

{
  "remoteProcedure": {
    "credentials": {
      "id": "NPC",
      "password": "secret"
    },
    "remoteProcedurePayload": {
      "newValue": "901",
      "anotherKey": "Possible Secret"
    }
  }
}

In such cases, sanitizing request parameters for recording requires writing a custom recursive replacer callback. Recursion is the simplest way to remove a value when its exact path is unknown or may vary.
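For illustration, such a recursive replacer might look roughly like the sketch below. The names `scrub_keys` and `scrub_json_body` are hypothetical and not part of vcrpy; the sketch assumes the body is JSON-encoded bytes and leaves anything else untouched.

```python
import json
from typing import Any, Iterable


def scrub_keys(obj: Any, keys: Iterable[str]) -> Any:
    """Return a copy of obj with every dict entry whose key is in keys removed."""
    banned = set(keys)

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            # Drop banned keys at this level, then descend into the remaining values.
            return {k: walk(v) for k, v in node.items() if k not in banned}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node  # scalars are returned unchanged

    return walk(obj)


def scrub_json_body(body: Any, keys: Iterable[str]) -> Any:
    """Scrub a JSON-encoded body; bodies that are not valid JSON are returned as-is."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        return body
    return json.dumps(scrub_keys(data, keys)).encode("utf-8")
```

In a `before_record_request` hook, one could presumably assign `request.body = scrub_json_body(request.body, ["password"])` and return the request; that wiring is an assumption here and is not exercised by the sketch.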

When I used the config, I expected this field to always be filtered out of the request, regardless of its nesting level. In other words, I expected a recursive lookup of this field:

"filter_post_data_parameters": [
    "password",
],

My considerations and concerns I’d like to check with you:

  1. In the current implementation, I suggested not distinguishing whether the replace value is Callable or not.
    This makes all filter behavior recursive in terms of key lookup.

  2. Recursive traversal obviously consumes more CPU resources. I’m curious whether this could be critical.
    My conclusion was that since filtering (presumably) happens once per cassette before saving, recursive traversal should not have a significant effect on overall test performance.

  3. In cases where the Request body is a dict, technically it could be self-referential (contain a cycle), which would make traversal impossible. However, as far as I recall, such usage is invalid and would already throw an error at the requests level.
    Maybe in the vcrpy traversal implementation, recursion depth could be limited by some value.
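As a rough sketch of the third point, a traversal can guard against both excessive depth and reference cycles. The function name `scrub_bounded` and the limit of 32 are illustrative assumptions, not vcrpy settings.

```python
from typing import Any, Iterable

MAX_DEPTH = 32  # arbitrary illustrative limit, not a vcrpy option


def scrub_bounded(obj: Any, keys: Iterable[str], max_depth: int = MAX_DEPTH) -> Any:
    """Recursively drop keys, failing loudly on cycles or overly deep nesting."""
    banned = frozenset(keys)
    active = set()  # ids of the containers on the current path from the root

    def walk(node: Any, depth: int) -> Any:
        if not isinstance(node, (dict, list)):
            return node
        if depth > max_depth:
            raise ValueError(f"body nested deeper than {max_depth} levels")
        if id(node) in active:
            raise ValueError("body contains a reference cycle")
        active.add(id(node))
        try:
            if isinstance(node, dict):
                return {k: walk(v, depth + 1) for k, v in node.items() if k not in banned}
            return [walk(item, depth + 1) for item in node]
        finally:
            # Leaving this container: the same object may legitimately appear again elsewhere.
            active.discard(id(node))

    return walk(obj, 0)
```

Tracking only the containers on the current path means an object shared by two siblings is still accepted, while a genuine cycle is rejected. For comparison, the standard library's `json.dumps` also rejects circular references by default.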

I also think this behavior could be applied only to simple field specification (like in the config example above). And we could keep the current behavior for Tuple and Callable cases. However, that would make the configuration more fragmented.

@kevin1024 @neozenith @jairhenrique @hartwork I’d like to hear your advice and possible caveats.
If there are concerns, maybe it makes sense to add a separate new config parameter filter_all, which would be a list of string field keys that can be filtered out across all possible request parameters (body, query, headers).
This would preserve full backward-compatible semantics.

@vEpiphyte

(heavy VCR user here) My understanding is that this change would mainly affect newly recorded cassettes. If someone had existing code in place, updated vcrpy, and re-recorded a cassette, this could remove more fields than intended. That would lead to playback issues. I'm imagining a payload like this:

{
   "secretkey": "foo",
   "callname" : "someRpcThing",
   "params": {
     "query": "select * from table where valu=(secretkey)",
     "kwargs": {
       "secretkey": "12345"
     }
   }
}

Scrubbing out the inner secretkey reference could cause issues.

Maybe providing the recursive cleanup function as a utility function in the library, along with a recipe (in the filtering documentation) showing how someone could opt into the recursive functionality, would be a good middle road?
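To make the concern concrete, the sketch below runs two stand-in filters over that payload. `drop_top_level` approximates the top-level-only behavior described in this thread and `drop_recursive` is a hypothetical opt-in helper; neither is vcrpy's actual implementation.

```python
from typing import Any

payload = {
    "secretkey": "foo",
    "callname": "someRpcThing",
    "params": {
        "query": "select * from table where valu=(secretkey)",
        "kwargs": {"secretkey": "12345"},
    },
}


def drop_top_level(data: dict, key: str) -> dict:
    """Stand-in for top-level-only filtering: remove key at the root only."""
    return {k: v for k, v in data.items() if k != key}


def drop_recursive(data: Any, key: str) -> Any:
    """Hypothetical opt-in helper: remove key at every nesting level."""
    if isinstance(data, dict):
        return {k: drop_recursive(v, key) for k, v in data.items() if k != key}
    if isinstance(data, list):
        return [drop_recursive(item, key) for item in data]
    return data


top = drop_top_level(payload, "secretkey")
deep = drop_recursive(payload, "secretkey")

# Top-level filtering keeps the inner kwargs entry; recursive filtering removes it too.
print(top["params"]["kwargs"])
print(deep["params"]["kwargs"])
```

Only dictionary keys are affected in both cases, so the `(secretkey)` placeholder inside the query string is left as-is.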

@tighwm
Author

tighwm commented Sep 3, 2025

@vEpiphyte Hi! I understand your concerns.
Turning this function into a utility seems like a more flexible solution.
On the other hand, I’d suggest a simpler alternative: adding a recursion_filter flag that defaults to False.

@JargeZ

JargeZ commented Sep 6, 2025

@vEpiphyte thanks for the quick comment and opinion.
I see that @tighwm has already added this as a separate filter.

Finally, I’d like to propose thinking through the desired usage API, since I’d prefer not to overcomplicate the tool’s configuration with lots of different settings.
Adding an entirely new parameter just for the recursive variation seems redundant to me.
We could either extend the mechanism for advanced usage of the current replace configuration, or introduce a completely new, more independent configuration parameter.

Suggested solutions

Modify existing semantics

Facts for risk assessment:

  • does not affect existing cassettes
  • affects recording of new cassettes
  • potentially unexpected behavior if a user previously had filter_post_data_parameters with a key that appears not only at the top level of a request
  • AND at the same time the user really expects the nested field to be recorded, i.e. relies on it being taken into account when matching requests
  • by default, requests are matched on the method and URI, not on the body

So the question is: what percentage of users have both a configured filter_post_data_parameters and body matching, and/or expect requests to be recorded with a nested key that is also filtered out at the root level?

In other words, I want to assess how real this risk is - whether we must preserve full backward compatibility, or if this is an unnecessary safeguard that will just complicate the library’s API.

Modify current functionality with a backward-compatibility helper

Assuming that recursive search and sanitization of this key is the more intuitive and convenient behavior (based only on my personal experience and assumption), it would be possible to change the default behavior to recursive and update the documentation, noting that if you want only top-level keys, you should now wrap the key in a helper, for example:

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "decode_compressed_response": True,
        ...
        "filter_post_data_parameters": [
            "api_key",
            vcr.RootKey("password"),
        ],
    }

Variation of existing functionality with backward compatibility

We could do something similar by leaving full backward compatibility by default.
For example, introducing a helper for the opposite situation:
vcr.RecursiveKey("password")

This option is safe in terms of compatibility, but it makes recursive filtering opt-in, even though, in my assumption, recursive filtering is what users actually expect by default.
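A minimal sketch of how the two helper variants could be modelled, assuming marker classes named after the ones proposed above. `RootKey`, `RecursiveKey` and `apply_filters` are hypothetical and do not exist in vcrpy today; the `default_recursive` argument stands in for whichever default the library would choose.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Union


@dataclass(frozen=True)
class RootKey:
    """Hypothetical marker: match this key at the top level only."""
    name: str


@dataclass(frozen=True)
class RecursiveKey:
    """Hypothetical marker: match this key at any nesting level."""
    name: str


KeySpec = Union[str, RootKey, RecursiveKey]


def apply_filters(data: Any, specs: Iterable[KeySpec], default_recursive: bool = False) -> Any:
    """Remove keys from a parsed body; plain strings follow default_recursive."""
    root_only, anywhere = set(), set()
    for spec in specs:
        if isinstance(spec, RootKey):
            root_only.add(spec.name)
        elif isinstance(spec, RecursiveKey):
            anywhere.add(spec.name)
        elif default_recursive:
            anywhere.add(spec)
        else:
            root_only.add(spec)

    def walk(node: Any, is_root: bool) -> Any:
        if isinstance(node, dict):
            return {
                k: walk(v, False)
                for k, v in node.items()
                if k not in anywhere and not (is_root and k in root_only)
            }
        if isinstance(node, list):
            return [walk(item, False) for item in node]
        return node

    return walk(data, True)
```

With `default_recursive=True` this behaves like the `RootKey` proposal, and with `default_recursive=False` like the `RecursiveKey` one, so the same configuration list can be compared under both defaults.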

New configuration parameter

Based on all this, we could also introduce new sanitization functionality altogether.
The goal here is to have the library proactively handle a problem that users are very likely to encounter.

For example, we could add a new API usage pattern like:

with my_vcr.use_cassette('test.yml', sensitive=True):
    # sensitive HTTP request goes here

In this simple usage, it would automatically search for and remove potentially sensitive keys (headers, query params, body).
The list of built-in parameters could be discussed, but covering most default use cases would only require popular ones.

And for advanced usage, the list of sensitive fields could be user-defined:

with my_vcr.use_cassette('test.yml', sensitive=True, sensitive_fields=['password']):
    # sensitive HTTP request goes here

(Specifying sensitive_fields explicitly could make it unnecessary to also set sensitive=True).

That way, my original use case, where you want to apply guards to all cassettes, would look like this:

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "decode_compressed_response": True,
        "sensitive": True,
        "sensitive_fields": [
            "authorization",
            "access_token",
            "token",
            "api_key",
            "password",
        ],
    }
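As a rough illustration of what a single field list applied to headers, query parameters and body could mean in practice, here is a standalone sketch. `scrub_request` is a hypothetical helper working on plain values, not on vcrpy's request object, and it only handles JSON bodies.

```python
import json
from typing import Any, Dict, Iterable, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _scrub(node: Any, banned: frozenset) -> Any:
    """Recursively drop banned keys (compared case-insensitively) from parsed JSON."""
    if isinstance(node, dict):
        return {k: _scrub(v, banned) for k, v in node.items() if k.lower() not in banned}
    if isinstance(node, list):
        return [_scrub(item, banned) for item in node]
    return node


def scrub_request(
    url: str, headers: Dict[str, str], body: Any, fields: Iterable[str]
) -> Tuple[str, Dict[str, str], Any]:
    """Apply one list of field names to the query string, headers and JSON body."""
    banned = frozenset(f.lower() for f in fields)

    # Query string: drop banned parameters, keep the order of the rest.
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in banned
    ]
    clean_url = urlunsplit(parts._replace(query=urlencode(query)))

    # Headers: header names are case-insensitive.
    clean_headers = {k: v for k, v in headers.items() if k.lower() not in banned}

    # Body: bodies that are not valid JSON are passed through unchanged.
    try:
        clean_body = json.dumps(_scrub(json.loads(body), banned)).encode("utf-8")
    except (TypeError, ValueError):
        clean_body = body
    return clean_url, clean_headers, clean_body
```

Matching names case-insensitively is a design choice made for this sketch, since header names are case-insensitive; whether body keys should be treated the same way would need to be decided separately.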
