qfcy/pyobject


License: MIT

pyobject - A multifunctional all-in-one utility tool for managing internal Python objects, compatible with nearly all Python 3 versions.


Submodules:

pyobject.__init__ - Displays and outputs attribute values of Python objects.

pyobject.browser - Provides a visual interface to browse Python objects using tkinter.

pyobject.code - Provides tools for manipulating Python native bytecode.

pyobject.search - Implements the utility for locating the path to a specific object.

pyobject.objproxy - Implements a generic object proxy that can replace any Python object, including modules, functions, and classes.

pyobject.pyobj_extension - A C extension module offering functions to manipulate low-level Python objects.

Functions:

describe(obj, level=0, maxlevel=1, tab=4, verbose=False, file=sys.stdout):

Prints all attributes of an object in "attribute: value" format for debugging purposes. The alias is desc().

  • maxlevel: The depth of attribute levels to print.
  • tab: Number of spaces for indentation, default is 4.
  • verbose: Boolean indicating whether to print special methods (e.g., __init__).
  • file: A file-like object for output.
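The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not pyobject's actual implementation: describe_sketch and the Point class are hypothetical names introduced here, and the real describe() may format or filter attributes differently.

```python
import io
import sys


def describe_sketch(obj, level=0, maxlevel=1, tab=4, verbose=False, file=sys.stdout):
    """Print `name: value` for each attribute, recursing up to maxlevel levels."""
    indent = " " * (tab * level)
    for name in dir(obj):
        # Skip special (dunder) names unless verbose output is requested.
        if not verbose and name.startswith("__") and name.endswith("__"):
            continue
        try:
            value = getattr(obj, name)
        except Exception:
            continue  # some attributes raise when accessed
        print(f"{indent}{name}: {value!r}", file=file)
        if level + 1 < maxlevel:
            describe_sketch(value, level + 1, maxlevel, tab, verbose, file)


class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y


buf = io.StringIO()
describe_sketch(Point(1, 2), file=buf)  # write into a file-like object
print(buf.getvalue(), end="")
```

Passing a file-like object, as in the last lines, is how the file parameter would typically be used to capture the output instead of printing it.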

browse(object, verbose=False, name='obj'):

Browses any Python object in a graphical interface using tkinter.

  • verbose: Same as in describe, whether to print special methods.

The graphical interface of the browse() function is shown below:

[Image: browse function GUI]

bases(obj, level=0, tab=4):

Prints base classes and the inheritance order of an object.

  • tab: Number of spaces for indentation, default is 4.
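As a rough illustration of what such an inheritance listing involves, the sketch below walks __bases__ recursively. It is an assumption-based approximation: bases_sketch is a hypothetical name, it returns lines instead of printing them, and the real bases() may lay out its output differently.

```python
def bases_sketch(obj, level=0, tab=4):
    """Return indented lines naming the base classes of obj, depth-first."""
    cls = obj if isinstance(obj, type) else type(obj)
    lines = []
    for base in cls.__bases__:
        lines.append(" " * (tab * level) + base.__name__)
        lines.extend(bases_sketch(base, level + 1, tab))
    return lines


class A:  # hypothetical example classes
    pass


class B(A):
    pass


print("\n".join(bases_sketch(B())))
```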

Functions for searching objects:

make_list(start_obj, recursions=2, all=False):

Creates a list of objects without duplicates.

  • start_obj: The object to start searching from.
  • recursions: Number of recursions.
  • all: Whether to include special attributes (e.g., __init__) in the list.

make_iter(start_obj, recursions=2, all=False):

Similar to make_list, but creates an iterator, which may contain duplicates.

search(obj, start, recursions=3, search_str=False):

Searches for objects starting from a specified starting point. For example, search(os, sys, 3) returns results like ["sys.modules['site'].os", "sys.modules['os']", ...].

  • obj: The object to search for.
  • start: The starting object.
  • recursions: Number of recursions.
  • search_str: Whether to search substrings within strings.
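The idea behind such a search can be sketched as a depth-limited walk over attributes that compares objects by identity. This is only an illustration under simplifying assumptions: search_sketch is a hypothetical name, it follows attributes only (not dictionary or list items such as sys.modules['os']), and it skips special attributes.

```python
import types


def search_sketch(obj, start, recursions=2, name="start"):
    """Return attribute paths from `start` that reach `obj` (identity match)."""
    results = []

    def walk(current, path, depth):
        if current is obj:
            results.append(path)
            return
        if depth == 0:
            return
        for attr in dir(current):
            if attr.startswith("__"):
                continue  # ignore special attributes in this sketch
            try:
                child = getattr(current, attr)
            except Exception:
                continue  # some attributes raise when accessed
            walk(child, f"{path}.{attr}", depth - 1)

    walk(start, name, recursions)
    return results


# Hypothetical object graph: outer.inner.value is the target.
target = object()
inner = types.SimpleNamespace(value=target)
outer = types.SimpleNamespace(inner=inner, other=42)
print(search_sketch(target, outer, recursions=2, name="outer"))
```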

Class: pyobject.Code

The Code class provides a wrapper for Python bytecode objects, making it easier to manipulate Python bytecode.

Python's internal bytecode object, CodeType (e.g., func.__code__), is immutable. The Code class offers a mutable bytecode object and a set of methods to simplify operations on the underlying bytecode.

Unlike Java bytecode, Python bytecode is not compatible across versions: bytecode generated by one version of the Python interpreter generally cannot be executed by another.

The Code class provides a universal interface for bytecode, supporting all Python versions from 3.4 to 3.14 (including PyPy's .pyc format), simplifying complex version compatibility issues.
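To make the immutability point concrete, the sketch below uses only the standard library (types.CodeType.replace(), available from Python 3.8) rather than the Code class itself. The greet function is a hypothetical example, and this shows the manual route that a mutable wrapper would spare you, not how Code is implemented.

```python
import types


def greet():  # hypothetical example function
    return "Hello"


code = greet.__code__

# Attributes of a built-in code object are read-only.
try:
    code.co_consts = ("Hi",)
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# The standard workaround: build a modified copy, then wrap it in a function.
new_consts = tuple("Hello World!" if c == "Hello" else c for c in code.co_consts)
patched = types.FunctionType(code.replace(co_consts=new_consts), globals())
print(assignment_failed, patched())
```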

Constructor (def __init__(self, code=None))

The Code class can be initialized with an existing CodeType object or another Code instance. If no argument is provided, a default CodeType object is created.

Attributes

  • _code: The internal bytecode of the current Code object. Use exec(c._code) or exec(c.to_code()) instead of directly using exec(c).

The following are attributes of Python's built-in bytecode (also attributes of the Code object). While Python's internal CodeType bytecode is immutable and these attributes are read-only, the Code object is mutable, meaning these attributes can be modified:

  • co_argcount: The number of positional arguments (including those with default values).
  • co_cellvars: A tuple containing the names of local variables referenced by nested functions.
  • co_code: A bytes object representing the sequence of bytecode instructions, storing the actual binary bytecode.
  • co_consts: A tuple containing the literals used by the bytecode.
  • co_filename: The filename of the source code being compiled.
  • co_firstlineno: The first line number of the source code corresponding to the bytecode. Used internally by the interpreter in combination with co_lnotab to output precise line numbers in tracebacks.
  • co_flags: An integer encoding multiple flags used by the interpreter.
  • co_freevars: A tuple containing the names of free variables.
  • co_kwonlyargcount: The number of keyword-only arguments.
  • co_lnotab: A bytes object encoding the mapping of bytecode offsets to line numbers (replaced by co_linetable in Python 3.10).
  • co_name: The name of the function/class corresponding to the bytecode.
  • co_names: A tuple containing the names used by the bytecode.
  • co_nlocals: The number of local variables used by the function (including arguments).
  • co_stacksize: The stack size required to execute the bytecode.
  • co_varnames: A tuple containing the names of local variables (starting with argument names).

Attributes introduced in Python 3.8 and later:

  • co_posonlyargcount: The number of positional-only arguments, introduced in Python 3.8.
  • co_linetable: Line number mapping data, introduced in Python 3.10 as a replacement for co_lnotab.
  • co_exceptiontable: Exception table data, introduced in Python 3.11.
  • co_qualname: The qualified name of the bytecode, introduced in Python 3.11.

Methods

Core Methods

  • exec(globals_=None, locals_=None): Executes the code object within the provided global and local scope dictionaries.
  • eval(globals_=None, locals_=None): Executes the code object within the provided global and local scope dictionaries and returns the result.
  • copy(): Creates a copy of the Code object and returns the duplicate.
  • to_code(): Converts the Code instance back to a built-in CodeType object, equivalent to c._code.
  • to_func(globals_=None, name=None, argdefs=None, closure=None, kwdefaults=None): Converts the code object into a Python function. The parameters are the same as those used when instantiating Python's built-in FunctionType.
  • get_flags(): Returns a list of flag names for the co_flags attribute, e.g., ["NOFREE"].
  • get_sub_code(name): Searches for sub-code objects (e.g., functions or class definitions) in the co_consts attribute. This method does not perform recursive searches. Returns the found Code object or raises a ValueError if not found.
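For readers curious about what get_flags() and get_sub_code() correspond to at the standard-library level, here is a hedged sketch. flag_names and find_sub_code are hypothetical helpers, not part of pyobject, and the exact flag names returned depend on the Python version.

```python
import dis
import types


def flag_names(code):
    """Decode co_flags into names using the table from the dis module."""
    return [name for bit, name in sorted(dis.COMPILER_FLAG_NAMES.items())
            if code.co_flags & bit]


def find_sub_code(code, name):
    """Non-recursive lookup of a nested code object in co_consts."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType) and const.co_name == name:
            return const
    raise ValueError(f"no code object named {name!r}")


module_code = compile("def gen():\n    yield 1\n", "<sketch>", "exec")
gen_code = find_sub_code(module_code, "gen")
print(flag_names(gen_code))
```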

Serialization

  • to_pycfile(filename): Dumps the code object into a .pyc file using the marshal module.
  • from_pycfile(filename): Creates a Code instance from a .pyc file.
  • from_file(filename): Creates a Code instance from a .py or .pyc file.
  • pickle(filename): Serializes the Code object into a pickle file.
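The .pyc round trip can be sketched with marshal directly. This is an illustration under assumptions, not the code behind to_pycfile() or from_pycfile(): write_pyc and read_pyc are hypothetical helpers, and the 16-byte header layout (magic number, flags, modification time, source size) is the one used by CPython 3.7 and later.

```python
import importlib.util
import marshal
import os
import struct
import tempfile
import time


def write_pyc(code, path):
    """Write a code object as a timestamp-based .pyc (CPython 3.7+ layout)."""
    # Header: 4-byte magic, 4-byte flags (0), 4-byte mtime, 4-byte source size.
    header = importlib.util.MAGIC_NUMBER + struct.pack(
        "<III", 0, int(time.time()) & 0xFFFFFFFF, 0)
    with open(path, "wb") as fh:
        fh.write(header)
        marshal.dump(code, fh)


def read_pyc(path):
    """Read the code object back, skipping the 16-byte header."""
    with open(path, "rb") as fh:
        fh.read(16)
        return marshal.load(fh)


source_code = compile("result = 6 * 7", "<sketch>", "exec")
path = os.path.join(tempfile.mkdtemp(), "sketch.pyc")
write_pyc(source_code, path)

namespace = {}
exec(read_pyc(path), namespace)
print(namespace["result"])
```

A file written this way is only readable by the interpreter version that produced it, which is the compatibility problem the Code class is meant to smooth over.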

Debugging and Inspection

  • show(*args, **kw): Internally calls pyobject.desc to display the attributes of the code object. The parameters are the same as those used in desc().
  • info(): Internally calls dis.show_code to display basic information about the bytecode.
  • dis(*args, **kw): Calls the dis module to output the disassembly of the bytecode, equivalent to dis.dis(c.to_code()).
  • decompile(version=None, *args, **kw): Calls the uncompyle6 library to decompile the code object into source code. (The uncompyle6 library is optional when installing the pyobject package.)

Factory Functions

  • fromfunc(function): Creates a Code instance from a Python function object, equivalent to Code(func.__code__).
  • fromstring(string, mode='exec', filename=''): Creates a Code instance from a source code string. The parameters are the same as those used in the built-in compile function, which is called internally.

Compatibility Details

  • Attribute co_lnotab: In Python 3.10 and later, attempts to set the co_lnotab attribute will automatically be converted into setting the co_linetable attribute.

Example usage (excerpted from the doctest):

>>> from pyobject import Code
>>> def f():print("Hello")
>>> c=Code.fromfunc(f) # or c=Code(f.__code__)
>>> c.co_consts
(None, 'Hello')
>>> c.co_consts=(None, 'Hello World!')
>>> c.exec()
Hello World!
>>>
>>> # Save to pickle files
>>> import os,pickle,sys,tempfile
>>> temp=tempfile.gettempdir() # portable replacement for os.getenv('temp')
>>> with open(os.path.join(temp,"temp.pkl"),'wb') as f:
...     pickle.dump(c,f)
...
>>> # Execute bytecodes from pickle files
>>> with open(os.path.join(temp,"temp.pkl"),'rb') as f:
...     pickle.load(f).to_func()()
...
Hello World!
>>> # Convert to pyc files and import them
>>> c.to_pycfile(os.path.join(temp,"temppyc.pyc"))
>>> sys.path.append(temp)
>>> import temppyc
Hello World!
>>> Code.from_pycfile(os.path.join(temp,"temppyc.pyc")).exec()
Hello World!

Object Proxy Classes ObjChain and ProxiedObj

pyobject.objproxy is a powerful tool for proxying any other object and generating the code that calls the object. It is capable of recording detailed access and call history of the object.
ObjChain is a class encapsulation used to manage multiple ProxiedObj objects, where ProxiedObj is a class that acts as a proxy to other objects.

Example usage:

from pyobject import ObjChain

chain = ObjChain(export_attrs=["__array_struct__"])
np = chain.new_object("import numpy as np", "np")
plt = chain.new_object("import matplotlib.pyplot as plt", "plt",
                        export_funcs=["show"])

# Testing the pseudo numpy and matplotlib modules
arr = np.array(range(1, 11))
arr_squared = arr ** 2
print(np.mean(arr)) # Output the average value

plt.plot(arr, arr_squared) # Plot the graph of y=x**2
plt.show()

# Display the auto-generated code calling numpy and matplotlib libraries
print(f"Code:\n{chain.get_code()}\n")
print(f"Optimized:\n{chain.get_optimized_code()}")

Output:

Code: # Unoptimized code that contains all detailed access records for objects
import numpy as np
import matplotlib.pyplot as plt
var0 = np.array
var1 = var0(range(1, 11))
var2 = var1 ** 2
var3 = np.mean
var4 = var3(var1)
var5 = var1.mean
var6 = var5(axis=None, dtype=None, out=None)
ex_var7 = str(var4)
var8 = plt.plot
var9 = var8(var1, var2)
var10 = var1.to_numpy
var11 = var1.values
var12 = var1.shape
var13 = var1.ndim
...
var81 = var67.__array_struct__
ex_var82 = iter(var70)
ex_var83 = iter(var70)
var84 = var70.mask
var85 = var70.__array_struct__
var86 = plt.show
var87 = var86()

Optimized: # Optimized code
import numpy as np
import matplotlib.pyplot as plt
var1 = np.array(range(1, 11))
plt.plot(var1, var1 ** 2)
plt.show()

Detailed Usage

ObjChain

  • ObjChain(export_funcs=None, export_attrs=None): Creates an ObjChain object, where export_funcs is a list of functions to be exported at the global level, and export_attrs is a list of attributes to be exported at the global level. Since these are at global scope, they are effective for all variables.
  • new_object(code_line, name, export_funcs=None, export_attrs=None, use_target_obj=True): Adds a new object and returns a proxy object of type ProxiedObj that can be directly used as a normal object.
    code_line is the code that needs to be executed to obtain the object (e.g., "import numpy as np"), and name is the variable name in which the object is stored after execution (e.g., "np").
    export_funcs and export_attrs are the lists of methods and attributes for this object that need to be exported.
    use_target_obj indicates whether to create a proxy template object in real-time and operate on it (see the "Implementation" section for details).
  • add_existing_obj(obj, name): Adds an existing object and returns a proxy object of type ProxiedObj.
    obj is the object to be added, and name is an arbitrary variable name that will be used to refer to this object in the code generated by ObjChain. use_exported_obj determines whether to avoid passing the ProxiedObj object itself as a call argument to __target_obj.
  • get_code(start_lineno=None, end_lineno=None): Retrieves the original code generated by ObjChain. start_lineno and end_lineno are line numbers starting from 0, and if not specified, they default to the beginning and end.
  • get_optimized_code(no_optimize_vars=None, remove_internal=True, remove_export_type=True): Retrieves the optimized code. Internally, a directed acyclic graph (DAG) is used for optimization (see the "Implementation" section).
    no_optimize_vars: A list of variable names that should not be removed, such as ["temp_var"].
    remove_internal: Whether to remove internal code generated during the execution of the code. For example, with plt.plot and arr, arr2 being ProxiedObj objects, if remove_internal is False, the internal code generated by accessing arr and arr2 during the call plt.plot(arr, arr2) (such as var13 = arr.ndim) will not be removed.
    remove_export_type: Whether to remove unnecessary type exports, such as str(var).

ProxiedObj

ProxiedObj is the type of object returned by ObjChain's new_object() and add_existing_obj() methods. It can be used as a substitute for any regular object, though it is generally not recommended to directly use the methods and properties of the ProxiedObj class itself.

Implementation Details

The ObjChain class tracks all objects added to an ObjChain as well as the objects derived from them, and it maintains a namespace dictionary containing the tracked objects to be used when calling exec to execute its own generated code.
Each ProxiedObj object belongs to an ObjChain. All special magic methods (such as __call__, __getattr__) of the ProxiedObj class are overridden. The overridden methods both record the call history into the associated ObjChain and call the same magic method on the object's proxy target (__target_obj, if available).
When operations on a ProxiedObj return a new object (such as when obj.attr returns a new attribute), the new object will also be tracked by the ObjChain, forming a long chain of all derived objects starting from the first object within the ObjChain.
If the ProxiedObj has a __target_obj attribute, magic method calls on the ProxiedObj will synchronously call the corresponding magic method on the __target_obj and pass the result to the next ProxiedObj as its __target_obj property.
If the __target_obj attribute does not exist, the ProxiedObj will not synchronously call the magic method. Instead, it will generate a record of the call code, temporarily storing it in the ProxiedObj until an exported method or attribute is needed, at which point all accumulated code is executed at once and the result is returned.
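The recording mechanism described above can be illustrated with a much smaller proxy. The sketch below is a simplified model, not the objproxy implementation: Recorder and RecordingProxy are hypothetical classes, only __getattr__ and __call__ are overridden, and every operation is forwarded synchronously to the real target.

```python
import math


class Recorder:
    """Collects generated code lines and hands out fresh variable names."""

    def __init__(self):
        self.lines = []
        self._count = 0

    def new_name(self):
        name = f"var{self._count}"
        self._count += 1
        return name


class RecordingProxy:
    """Forwards operations to a target object and records them as code."""

    def __init__(self, recorder, name, target):
        self._recorder = recorder
        self._name = name
        self._target = target

    def _derive(self, expr, target):
        # Every derived object gets its own variable and its own proxy.
        name = self._recorder.new_name()
        self._recorder.lines.append(f"{name} = {expr}")
        return RecordingProxy(self._recorder, name, target)

    def __getattr__(self, attr):
        if attr.startswith("_"):
            raise AttributeError(attr)  # keep the proxy's own state private
        return self._derive(f"{self._name}.{attr}", getattr(self._target, attr))

    def __call__(self, *args):
        real = [a._target if isinstance(a, RecordingProxy) else a for a in args]
        shown = ", ".join(a._name if isinstance(a, RecordingProxy) else repr(a)
                          for a in args)
        return self._derive(f"{self._name}({shown})", self._target(*real))


rec = Recorder()
rec.lines.append("import math")
m = RecordingProxy(rec, "math", math)
result = m.sqrt(16)
print("\n".join(rec.lines))
```

The recorded lines have the same one-variable-per-object shape as the unoptimized output of get_code() shown earlier, which is what makes the later optimization pass possible.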

Principle of Code Optimization

In the code, the dependency relationship between variables can be represented as a graph. For instance, the statement y = func(x) can be represented as an edge from the node x to y.
However, since in the code generated by ProxiedObj each object corresponds to a unique variable and the variables cannot be reassigned (similar to JavaScript's const), the result is a directed acyclic graph (DAG).
During optimization, variables that affect 0 or 1 other variables (i.e., that point to 0-1 other nodes) are first identified. If a variable affects only one other variable, its value is inlined into the dependent statement; otherwise, the variable is simply removed.
For example:

temp_var = [1, 2, 3]
unused_var = func(temp_var)

Here, temp_var only has one edge pointing to unused_var, while unused_var does not point to any other node.
By inlining the value of temp_var into func(temp_var), the code becomes unused_var = func([1,2,3]). After removing unused_var, the optimized code is func([1, 2, 3]).
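The inlining rule can be sketched on a list of (variable, expression) pairs. This is a toy model under stated assumptions, not the optimizer in pyobject: optimize_sketch and count_uses are hypothetical names, dependencies are found by whole-word text matching instead of a real graph, and no parentheses are added when an expression is inlined.

```python
import re


def count_uses(var, expr):
    """Count whole-word occurrences of var inside expr."""
    return len(re.findall(rf"\b{re.escape(var)}\b", expr))


def optimize_sketch(statements):
    """Inline variables used once; turn unused ones into bare expressions."""
    stmts = list(statements)
    changed = True
    while changed:
        changed = False
        for i, (var, expr) in enumerate(stmts):
            if var is None:
                continue  # already reduced to a bare expression
            users = [j for j, (_, other) in enumerate(stmts)
                     if j != i and count_uses(var, other)]
            total = sum(count_uses(var, stmts[j][1]) for j in users)
            if total == 1:
                # Exactly one use: substitute the expression into its user.
                j = users[0]
                user_var, user_expr = stmts[j]
                inlined = re.sub(rf"\b{re.escape(var)}\b",
                                 lambda _m: expr, user_expr)
                stmts[j] = (user_var, inlined)
                del stmts[i]
            elif total == 0:
                # No uses: keep the expression, drop the assignment.
                stmts[i] = (None, expr)
            else:
                continue
            changed = True
            break
    return [expr if var is None else f"{var} = {expr}" for var, expr in stmts]


example = [("temp_var", "[1, 2, 3]"), ("unused_var", "func(temp_var)")]
print(optimize_sketch(example))
```

Variables used more than once are left in place, which is why var1 survives in the optimized numpy example while var2 is folded into the plt.plot call.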

Module: pyobj_extension

This module is written in C and can be imported directly using import pyobject.pyobj_extension as pyobj_extension. It includes the following functions:

convptr(pointer):

Converts an integer pointer to a Python object, as a reverse of id().
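A comparable conversion is possible in pure Python through ctypes, which may help clarify what convptr() does. The sketch assumes CPython, where id() returns the object's memory address; convptr_sketch is a hypothetical name, and passing an address that does not belong to a live object can crash the interpreter.

```python
import ctypes


def convptr_sketch(pointer):
    """Turn an address obtained from id() back into the object (CPython only)."""
    return ctypes.cast(pointer, ctypes.py_object).value


data = ["a", "b"]
same = convptr_sketch(id(data)) is data
print(same)
```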

py_decref(obj):

Decreases the reference count of an object.

py_incref(obj):

Increases the reference count of an object.

getrealrefcount(obj):

Gets the actual reference count that the object had before this function was called.
Unlike sys.getrefcount(), this function does not count the additional reference that is created when the function is called. (The difference is the constant _REFCNT_DELTA.)
For example, getrealrefcount([]) will return 0, because after exiting getrealrefcount, the list [] is no longer referenced by any object, whereas sys.getrefcount([]) will return 1.
Additionally, a=[]; getrealrefcount(a) will return 1 instead of 2.
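Because the absolute numbers reported by sys.getrefcount() include references held by the call itself and can differ between interpreter versions, the sketch below compares two readings instead of relying on a fixed value. It uses only the standard library; refs_added_by_storing is a hypothetical helper, not part of pyobj_extension.

```python
import sys


def refs_added_by_storing(obj):
    """Measure how many references appear when obj is placed in a list."""
    before = sys.getrefcount(obj)
    holder = [obj]          # one additional strong reference
    after = sys.getrefcount(obj)
    del holder
    return after - before


delta = refs_added_by_storing(object())
print(delta)
```

Both readings carry the same call overhead, so it cancels out in the difference; getrealrefcount() instead removes that overhead from a single reading.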

setrefcount(obj, n):

Sets the actual reference count of the object (before calling the function) to n.
This is the opposite of getrealrefcount() and also does not consider the additional reference count created when the function is called.

getrefcount_nogil(obj) and setrefcount_nogil(obj, ref_data):

On free-threaded (GIL-free) builds of Python 3.14+, these functions get and set the reference counts, where ref_data is the tuple (ob_ref_local, ob_ref_shared). Neither function counts the references added by the call itself. (Experimental)

Warning: Improper use of the functions above may lead to crashes.

list_in(obj, lst):

Determines whether obj is in the sequence lst. Compared to the built-in Python expression "obj in lst", which may invoke the "==" operator (__eq__) multiple times, this function directly compares the pointers to improve efficiency.
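The difference between identity-based and equality-based membership can be shown in pure Python. This sketch only models the semantics, not the speed of the C implementation; list_in_sketch and AlwaysEqual are hypothetical names.

```python
def list_in_sketch(obj, lst):
    """Identity-based membership test: compares with `is`, never calls __eq__."""
    return any(item is obj for item in lst)


class AlwaysEqual:
    """Hypothetical class whose instances compare equal to everything."""

    def __eq__(self, other):
        return True


a, b = AlwaysEqual(), AlwaysEqual()
by_equality = a in [b]                 # falls back to __eq__
by_identity = list_in_sketch(a, [b])   # a and b are distinct objects
print(by_equality, by_identity)
```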

Current Version of pyobject: 1.3.2

Change Log

2025-6-23 (v1.3.2): Added the use_exported_obj parameter to the pyobject.objproxy module and further optimized the performance.
2025-6-6 (v1.3.0): Optimized the performance of the pyobject.objproxy module.
2025-4-30 (v1.2.9): Improved and enhanced the sub-module pyobject.objproxy, and renamed the sub-module pyobject.code_ to pyobject.code.
2025-3-31 (v1.2.8): Renamed pyobject.super_proxy to pyobject.objproxy and officially released it; modified the pyobject.pyobj_extension module.
2025-3-6 (v1.2.7): Added support for special class attributes excluded from dir() (such as __flags__, __mro__) in pyobject.browser and modified the pyobj_extension module.
2025-2-15 (v1.2.6): Fixed the lag issue when browsing large objects in pyobject.browser, improved the pyobject.code_ module, introduced a new reflection library pyobject.super_proxy currently in development, and added getrefcount_nogil and setrefcount_nogil to the pyobj_extension module.
2024-10-24 (v1.2.5): Fixed high DPI support for pyobject.browser on Windows, modified the pyobj_extension module, along with other improvements.
2024-08-12 (v1.2.4): Added support for Python versions 3.10 and above in pyobject.code_; further optimized search performance in the search module, along with various other fixes and improvements.
2024-06-20 (v1.2.3): Updated the .pyc file packing tool in the test directory of the package, and enhanced the object browser in pyobject.browser with new features such as displaying lists and dictionary items, back, forward, refresh page options, as well as adding, editing, and deleting items.
2022-07-25 (v1.2.2): Added a C language module pyobj_extension for manipulating Python's underlying object references and object pointers.
2022-02-02 (v1.2.0): Fixed several bugs and optimized the performance of the search module; added the Code class in code_, introduced editing properties functionality in browser, and added doctests for the Code class.
