Implement the model as memory-resident, rather than re-running model initialization on every system command, which is inefficient
#1
Your core inference logic function:
    import subprocess
    from typing import List

    from fastapi import HTTPException

    def run_command(command: List[str]) -> str:
        """Run a system command and capture its output."""
        try:
            result = subprocess.run(command, check=True, capture_output=True, text=True)
            return result.stdout
        except subprocess.CalledProcessError as e:
            raise HTTPException(status_code=500, detail=f"Error occurred while running command: {e}")
        except Exception as e:
            raise HTTPException(status_code=500, detail=f"Unexpected error: {e}")
This is the same way run_inference.py executes system commands; your project simply wraps that call in run_command. If you are providing API interfaces as a server, consider a resident-memory mode instead of loading the model into memory from scratch on every request (which is time-consuming and inefficient). For details, you can refer to the llama.cpp approach and use llama_cpp.server to provide the service.
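For illustration (not from the original report): llama-cpp-python ships an OpenAI-compatible server that loads a GGUF model once and answers all subsequent requests from memory; it can be started with `python -m llama_cpp.server --model ./models/model.gguf` (the model path here is a placeholder). The same idea also fits the existing FastAPI app directly: construct the model object once at startup and reuse it on every request, instead of shelling out to a subprocess each time. Below is a minimal sketch, assuming a GGUF model loadable by llama-cpp-python; the model path, context size, and /generate endpoint are illustrative assumptions, not this project's actual API.

    from fastapi import FastAPI
    from llama_cpp import Llama
    from pydantic import BaseModel

    app = FastAPI()

    # Hypothetical model path; loaded once at process start so the weights
    # stay resident in memory for the life of the server.
    llm = Llama(model_path="./models/model.gguf", n_ctx=2048)

    class GenerateRequest(BaseModel):
        prompt: str
        max_tokens: int = 128

    @app.post("/generate")  # illustrative endpoint, not this project's API
    def generate(req: GenerateRequest):
        # Each call reuses the already-loaded model: no subprocess, no reload.
        result = llm(req.prompt, max_tokens=req.max_tokens)
        return {"text": result["choices"][0]["text"]}

With this layout the expensive Llama(...) initialization happens once per server process, and each request pays only the cost of decoding.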