Description
Related to #63099 and the change of returning a read-only view in Series.array.
One reason the read-only view is a problem (for the GeoPandas use case) is that we want to update the underlying array of the Series in place, e.g. to update an attribute.
But if we kept doing that through a ser.array that directly gives the underlying array:
ser.array.attr = ...
then that would actually still violate the idea of CoW, because this attribute change would propagate to any shallow copy of the Series (although we never explicitly discussed whether attributes fall under CoW or not).
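To make the shallow-copy concern concrete, here is a minimal sketch using only public APIs. It assumes a NumPy-backed int64 Series and does not touch attributes at all; it only checks that a shallow copy is backed by the same memory as the original, which is the sharing that an in-place change on the underlying array would leak through. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

# A shallow copy is a new Series object on top of the same underlying data.
shallow = ser.copy(deep=False)
# A deep copy gets its own, independent data.
deep = ser.copy(deep=True)

# Compare the backing memory of each pair (to_numpy() does not copy here
# because the Series is already a plain int64 NumPy-backed Series).
shares_with_shallow = np.shares_memory(ser.to_numpy(), shallow.to_numpy())
shares_with_deep = np.shares_memory(ser.to_numpy(), deep.to_numpy())

print(shares_with_shallow, shares_with_deep)
```

For a NumPy-backed Series the first check should come out true and the second false. CoW protects value writes made through the pandas API, but anything changed directly on the shared array object bypasses that protection.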
Right now, if you want to actually update (swap) the underlying array of a Series without going through ser.array/ser.values/ser._values, you can essentially swap the underlying manager (this is of course using private APIs, but to illustrate):
>>> ser = pd.Series([1, 2, 3])
>>> mgr = ser._mgr
>>> arr = mgr.array
>>> arr = arr.view()
>>> arr[0] = 100 # or change an attribute on an EA, but mutate value to illustrate here
>>> mgr.set_values(arr)
>>> ser._mgr = mgr
>>> ser
0    100
1      2
2      3
dtype: int64
To achieve something like this, we could also provide a method, something like ser.update_array(arr) (although given how advanced a use case this is, it is also not great for this to be an actual public method, so we could also make it a semi-private (hidden but documented for use by developers) method like _update_array()).
I would have to think about how it should handle the CoW situation (i.e. should it keep the block refs, or discard them? That might be an option for the developer to specify in the method, if they know whether the physical array stayed the same or not).
Right now, as a developer, the way you can update the array inplace,