BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside a target. #61598

Closed
3 tasks done
monagai opened this issue Jun 7, 2025 · 4 comments

monagai commented Jun 7, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({
    'A': [1, 9, 6, 2, 7],
    'B': [6, 1, 3, 6, 3],
    'C': [2, 8, 4, 4, 4]
}, index=list('abcde'))

# First call: every column is int64
df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
# Store the result as a new bool column
df['vals'] = df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
# Same expression again, now on a frame that also holds the bool column
df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)

Issue Description

This is a report about the ~ operator on a pandas DataFrame.

Here is the example on python=3.10.12, pandas=2.2.3.

python 3.10.12 (main, Feb  4 2025, 14:57:36) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.34.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({
   ...:     'A': [1, 9, 6, 2, 7],
   ...:     'B': [6, 1, 3, 6, 3],
   ...:     'C': [2, 8, 4, 4, 4]
   ...: }, index=list('abcde'))

In [3]: df
Out[3]:
   A  B  C
a  1  6  2
b  9  1  8
c  6  3  4
d  2  6  4
e  7  3  4

In [4]: df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[4]:
a    False
b     True
c     True
d    False
e     True
dtype: bool

In [5]: df['vals'] = df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)

In [6]: df
Out[6]:
   A  B  C   vals
a  1  6  2  False
b  9  1  8   True
c  6  3  4   True
d  2  6  4  False
e  7  3  4   True

In [7]: df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[7]:
a   -2
b   -1
c   -1
d   -2
e   -1
dtype: int64

In the above example, the same expression, df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1), is executed in steps 4, 5, and 7.
However, the result of step 7 is ridiculous: an int64 Series of -2 and -1 instead of a bool Series.
Unlike ~, the not operator returns the correct answer.
It seems that the ~ operator on a pandas DataFrame is quite dangerous and unreliable.

With python 3.13.3 and pandas=2.2.3, step 7 (and only step 7) also makes Python emit this warning: <ipython-input-7-7d5677ff0f59>:1: DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
However, I think this warning comes from Python (not from pandas) and addresses a different point.

Expected Behavior

The result of step 7 should be the same as in steps 4 and 5.

Installed Versions

python = 3.10.12
pandas = 2.2.3

monagai added the Bug and Needs Triage labels Jun 7, 2025
monagai changed the title from "BUG: Dangerous inconsistency: ~ operator changes behavior based on context." to "BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside the target." Jun 7, 2025
monagai changed the title from "BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside the target." to "BUG: Dangerous inconsistency: ~ operator changes behavior based on context outside a target." Jun 7, 2025

@Liam3851
Contributor

Liam3851 commented Jun 7, 2025

I believe the way to do what you want is simply

~((df['B'] > 3) & (df['C'] < 8))

This keeps all the math within pandas and returns a boolean Series.
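
As a minimal sketch of that vectorized form (reusing the DataFrame from the report; the names before and after are illustrative, not from the original), the result should not depend on which other columns the frame holds, because only columns B and C are ever read:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 9, 6, 2, 7], "B": [6, 1, 3, 6, 3], "C": [2, 8, 4, 4, 4]},
    index=list("abcde"),
)

# Column-wise comparison: every step yields a bool Series, so ~ acts as an
# element-wise logical NOT.
before = ~((df["B"] > 3) & (df["C"] < 8))

# Adding a bool column makes the frame mixed-dtype...
df["vals"] = before

# ...but the same expression still only touches the int64 columns B and C.
after = ~((df["B"] > 3) & (df["C"] < 8))

print(before.tolist())       # [False, True, True, False, True]
print(before.equals(after))  # True
```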

When you call

df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)

you are expressly telling the system that you want to take each row of df, cast it to a Series, look up individual elements of the Series, and then apply ~ to the elements. In this case you are taking all the math away from pandas and sending it to Python, which is doing something you don't want, which is why you get the Python warning.

In step 4 I think you happen to get away with it because the whole DataFrame is dtyped as np.int64, so each row stays in numpy scalars (np.int64 rather than Python int), your comparison operators return numpy.bool_ values, and numpy handles ~ the way you want. Once you add the additional column, the columns have different dtypes, so each row cross-section becomes an object-dtyped Series. The values then go all the way to Python: you get Python ints, and thus Python bools, which give you a different answer, i.e.

In [1]: ~np.bool_(False)
Out[1]: True

In [2]: ~False
Out[2]: -1
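
One way to check this explanation is to look at what the lambda actually receives. The sketch below is an illustration, not part of the original report: mixed, int_only_types and mixed_types are made-up names, and the behaviour shown is what the outputs in this thread (pandas 2.2.3) imply rather than a documented guarantee.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 9, 6, 2, 7], "B": [6, 1, 3, 6, 3], "C": [2, 8, 4, 4, 4]},
    index=list("abcde"),
)
# Same frame plus a bool column, mirroring the state after step 5.
mixed = df.assign(vals=[False, True, True, False, True])

# dtype of one row taken as a Series
print(df.iloc[0].dtype)     # int64
print(mixed.iloc[0].dtype)  # object

# Type name of the scalar the lambda reads from each row
int_only_types = set(df.apply(lambda row: type(row["B"]).__name__, axis=1))
mixed_types = set(mixed.apply(lambda row: type(row["B"]).__name__, axis=1))
print(int_only_types, mixed_types)
```

If the row scalars are NumPy integers in the first case and plain Python ints in the second, the comparisons produce numpy.bool_ and bool respectively, which is where ~ starts to behave differently.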

@monagai
Author

monagai commented Jun 12, 2025

Sorry, I cannot accept your comment.
I am talking about the inconsistent behavior of the ~ operator depending on the DataFrame's columns.
It seems that you are steering the discussion to a different point.

df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1) targets only int columns.
Why is it correct that this command depends on off-target columns?

Python 3.13.3 (main, Apr  8 2025, 13:54:08) [Clang 16.0.0 (clang-1600.0.26.6)]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.2.0 -- An enhanced Interactive Python. Type '?' for help.
Tip: `?` alone on a line will brings up IPython's help
…

In [4]: df
Out[4]: 
   A  B  C   vals
a  1  6  2  False
b  9  1  8   True
c  6  3  4   True
d  2  6  4  False
e  7  3  4   True

In [5]: df.dtypes
Out[5]: 
A       int64
B       int64
C       int64
vals     bool
dtype: object

In [6]: tmp_df = df[['B','C']]

In [7]: tmp_df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[7]: 
a    False
b     True
c     True
d    False
e     True
dtype: bool

In [8]: df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
<ipython-input-8-f0949cd95cf2>:1: DeprecationWarning: Bitwise inversion '~' on bool is deprecated and will be removed in Python 3.16. This returns the bitwise inversion of the underlying int object and is usually not what you expect from negating a bool. Use the 'not' operator for boolean negation or ~int(x) if you really want the bitwise inversion of the underlying int.
  df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[8]: 
a   -2
b   -1
c   -1
d   -2
e   -1
dtype: int64

If such curious behaviour is the 'correct specification' in pandas, pandas should never recommend the ~ operator as a way to select part of a Series or DataFrame.

MarcoGorelli added the Usage Question label and removed the Bug and Needs Triage labels Jun 13, 2025
@MarcoGorelli
Member

I think @Liam3851 is correct, and this part of his answer

When you add the additional column you're getting an object dtyped Series on the cross-section (because you now have columns of different dtypes), and so it's going all the way to python, so you're getting python ints, and thus python bools, which give you a different answer

answers your question about why you get different answers depending on which dataframe you apply the operation to:

  • in the first case, all your columns have the same type
  • in the second case, you have mixed type columns

I don't think there's anything dangerous about ~ - the dangerous part is using apply(..., axis=1) on a dataframe with mixed datatypes, which is what you have in the second case
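
A rough sketch of how that case could be detected and sidestepped, using the frame from this thread. The helper rows_are_homogeneous is hypothetical (it is not a pandas function), and the two options are just the workarounds already shown in the comments here: restrict apply to the columns the function reads, or use not.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 9, 6, 2, 7],
        "B": [6, 1, 3, 6, 3],
        "C": [2, 8, 4, 4, 4],
        "vals": [False, True, True, False, True],
    },
    index=list("abcde"),
)


def rows_are_homogeneous(frame):
    """Hypothetical helper: True when every column shares one dtype."""
    return frame.dtypes.nunique() == 1


print(rows_are_homogeneous(df))              # False
print(rows_are_homogeneous(df[["B", "C"]]))  # True

# Option 1: hand apply only the columns the function reads, so each row
# stays an int64 Series and the comparisons return NumPy bools.
subset_result = df[["B", "C"]].apply(
    lambda row: ~((row["B"] > 3) & (row["C"] < 8)), axis=1
)

# Option 2: use `not`, which negates Python bools and NumPy bools alike.
not_result = df.apply(
    lambda row: not ((row["B"] > 3) & (row["C"] < 8)), axis=1
)

print(subset_result.tolist())  # [False, True, True, False, True]
print(not_result.tolist())     # [False, True, True, False, True]
```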


I'd suggest taking a kinder approach here. @Liam3851 took the time to write a nice answer to you, and you downvote the reply and comment "Sorry, your comment is unacceptable for me". Please ask yourself whether this constitutes friendly behaviour 🙏

@monagai
Author

monagai commented Jun 16, 2025

Thank you for your comment.

However,

took the time to write a nice answer to you,

Sorry, but this is meaningless.
I submitted this issue not for my own benefit, but for all pandas users, because I already know this problem and how to avoid it.

In the pandas documentation here:
https://github.com/pandas-dev/pandas/blob/main/doc/source/user_guide/indexing.rst#L1315-L1347
it states: You can negate boolean expressions with the word not or the ~ operator.

However, this is not true, is it?
Here is the evidence:

In [5]: df
Out[5]:
   A  B  C   vals
a  1  6  2  False
b  9  1  8   True
c  6  3  4   True
d  2  6  4  False
e  7  3  4   True

In [6]: df.apply(lambda x: ~((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[6]:
a   -2
b   -1
c   -1
d   -2
e   -1
dtype: int64

In [7]: df.apply(lambda x: not ((x['B'] > 3) & (x['C'] < 8)), axis=1)
Out[7]:
a    False
b     True
c     True
d    False
e     True
dtype: bool
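
For context, the quoted sentence appears to come from the part of the user guide that covers DataFrame.query(), where not and ~ are parsed by pandas and applied to whole columns rather than run as plain Python on row scalars. A minimal sketch under that assumption, with engine="python" chosen only so the example does not depend on numexpr being installed:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 9, 6, 2, 7],
        "B": [6, 1, 3, 6, 3],
        "C": [2, 8, 4, 4, 4],
        "vals": [False, True, True, False, True],
    },
    index=list("abcde"),
)

# Inside a query string both spellings are handled by pandas' own parser.
with_tilde = df.query("~((B > 3) & (C < 8))", engine="python")
with_not = df.query("not ((B > 3) & (C < 8))", engine="python")

print(with_tilde.index.tolist())
print(with_tilde.equals(with_not))
```

In that setting the two spellings would be expected to select the same rows even on this mixed-dtype frame, which is a different situation from a lambda passed to apply(..., axis=1).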

Please answer yes or no to this question.

Furthermore, your comment suggests that:

  • All pandas users must understand the deep layers of pandas implementation to use it effectively.
  • All pandas users must have complete knowledge of the data types in their DataFrames, regardless of size.

Is this correct? Please answer yes or no to this question.

In reality, this appears to be a fundamental specification defect, not merely a bug.
The behavior is as specified, but the specification itself is problematic and leads to unpredictable results.
As bitwise operations and logical operations represent fundamentally distinct computational concepts, this conflation of semantics inevitably results in confusion and unpredictable behavior.
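
To make the distinction concrete, here is a small sketch of how the two operators behave at each level. It relies only on standard Python, NumPy and pandas behaviour; the variable names are illustrative.

```python
import warnings

import numpy as np
import pandas as pd

flags = pd.Series([True, False, True])

# ~ on a bool Series (or a NumPy bool) is an element-wise logical NOT.
inverted = ~flags
numpy_inverted = bool(~np.bool_(True))
print(inverted.tolist(), numpy_inverted)  # [False, True, False] False

# ~ on a Python bool is integer bitwise inversion, because bool subclasses int.
py_true, py_false = True, False
with warnings.catch_warnings():
    # Python 3.12+ emits a DeprecationWarning for this; silenced for the demo.
    warnings.simplefilter("ignore", DeprecationWarning)
    py_results = (~py_true, ~py_false)
print(py_results)  # (-2, -1)

# `not` works on any single value, but cannot be applied to a whole Series.
scalar_results = (not py_true, not np.bool_(True))
print(scalar_results)  # (False, False)
try:
    not flags
    not_on_series_failed = False
except ValueError:
    not_on_series_failed = True
print(not_on_series_failed)  # True
```

So ~ is the only one of the two that can negate a whole Series, while not is the only one that gives a logical answer for a plain Python bool.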

I understand that perhaps pandas cannot revert to a world without the ~ operator now.
In that case, shouldn't there at least be a prominent warning regarding this issue?

Closing this issue serves only to suppress the facts.
I believe this violates the spirit of open source, even though I may be relatively new to Python.

Please reopen this issue if you are fair developers.
