-
Notifications
You must be signed in to change notification settings - Fork 414
Feature recusive filter request body #937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Feature recusive filter request body #937
Conversation
Thank you so much @tighwm for taking this on. You figured it out very quickly and did a great job! I’d like this extension to be included in the library, so I want to call out the maintainers to get your attention for review and advice. This functionality is meant to help in the following cases: As a library user, I expect that request body parameter filtering acts as a kind of sanitizing guard. That’s why my test configs usually look like this: @pytest.fixture(scope="module")
def vcr_config():
return {
"decode_compressed_response": True,
"filter_headers": [
"authorization",
],
"filter_query_parameters": [
"access_token",
"token",
],
"filter_post_data_parameters": [
"api_key",
"password",
],
} However, there are cases where requests (e.g. JsonRPC) have a request body wrapped in some envelope: {
"jsonrpc": "2.0",
"method": "remoteProcedure",
"params": {
"credentials": {
"password": "secret",
"login": "name"
},
"payload": 42
},
"id": 3
} Or sometimes even worse, when communicating with something custom: {
"remoteProcedure": {
"credentials": {
"id": "NPC",
"password": "secret"
},
"remoteProcedurePayload": {
"newValue": "901",
"anotherKey": "Possible Secret"
}
}
} In such cases, sanitizing request parameters for recording requires writing a custom replacer callback with recursion - since that’s the simplest option without knowing the exact path to the value, which may vary, in order to remove it. When I used the config, I was expected that this field would always be filtered out from the request - without care that these fields must exist only at the first nesting level. In other words, I expected recursive lookup of this field: "filter_post_data_parameters": [
"password",
], My considerations and concerns I’d like to check with you:
I also think this behavior could be applied only to simple field specification (like in the config example above). And we could keep the current behavior for Tuple and Callable cases. However, that would make the configuration more fragmented. @kevin1024 @neozenith @jairhenrique @hartwork I’d like to hear your advice and possible caveats. |
(heavy VCR user here) My understanding of this change would affect newly recorded cassettes mainly. If someone had existing code in place, updated vcrpy and re-recorded as cassette, this could lead to removing more fields than intended. That would lead to payback issues. I'm imagining a payload like this:
Scrubbing out the inner secretkey reference could cause issues. Maybe providing the recursive cleanup function as utility function in the library and providing a recipie ( via documentation on filtering ) showing how someone could opt into using the recursive functionality would be a good middle road? |
@vEpiphyte Hi! I understand your concerns. |
Co-authored-by: vEpiphyte <[email protected]>
@vEpiphyte thanks for the quick comment and opinion. I’d like to finally propose thinking through the desired usage API. Since I’d prefer not to make the configuration of tools too complicated with lots of different settings. Suggested solutionsModify existing semanticFacts for risk assessment:
So the question is: what percentage of users simultaneously have both a configured In other words, I want to assess how real this risk is - whether we must preserve full backward compatibility, or if this is an unnecessary safeguard that will just complicate the library’s API. Modify current functionality with a backward-compatibility helperAssuming that recursive search and sanitization of this key is more intuitive and convenient behavior (based only on my personal experience and assumption) It would be possible to change the default behavior to recursive and update the documentation, noting that if you want only top-level keys, you should now wrap the key in a helper, for example: @pytest.fixture(scope="module")
def vcr_config():
return {
"decode_compressed_response": True,
...
"filter_post_data_parameters": [
"api_key",
vcr.RootKey("password"),
],
} Variation of existing functionality with backward compatibilityWe could do something similar by leaving full backward compatibility by default. This option is safe in terms of compatibility, but makes the behavior less explicit, which, in my assumption, is actually what users expect by default. New configuration parameterBased on all this, we could also introduce new sanitization functionality altogether. For example, we could add a new API usage pattern like: with my_vcr.use_cassette('test.yml', sensetive=True):
# sensitive HTTP request goes here In this simple usage, it would automatically search for and remove potentially sensitive keys (headers, query params, body). And for advanced usage, the list of sensitive fields could be user-defined: with my_vcr.use_cassette('test.yml', sensetive=True, sensetive_fields=['password']):
# sensitive HTTP request goes here (Specifying That way, the my original use case, where you want to apply guards to all cassettes, would look like this: @pytest.fixture(scope="module")
def vcr_config():
return {
"decode_compressed_response": True,
"sensetive": True,
"sensetive_fields": [
"authorisation",
"access_token",
"token",
"api_key",
"password",
],
} |
Added recursive filtering of request body on this issue and unit test on this case