
Keep the model resident in memory rather than re-initializing it on every system command invocation, which is inefficient #1


Open
CementZhang opened this issue Apr 18, 2025 · 3 comments

Comments

@CementZhang

Your core inference function:

import subprocess
from typing import List

from fastapi import HTTPException


def run_command(command: List[str]) -> str:
    """Run a system command and capture its output."""
    try:
        # check=True raises CalledProcessError on a non-zero exit code;
        # capture_output=True and text=True collect stdout/stderr as strings.
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        return result.stdout
    except subprocess.CalledProcessError as e:
        raise HTTPException(status_code=500, detail=f"Error occurred while running command: {e}")
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Unexpected error: {e}")

This is the same way run_inference.py executes system commands; your project simply wraps run_command. If you provide API interfaces as a server, consider keeping the model resident in memory instead of loading it from scratch on every request, which is time-consuming and inefficient. For details, you can refer to the llama.cpp approach and use llama_cpp.server to provide the service. A sketch of that pattern follows below.
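For illustration, here is a minimal sketch of that resident-memory pattern, assuming the llama-cpp-python bindings plus FastAPI; the model path, endpoint name, and request fields are placeholders, not part of this project, and a BitNet GGUF model may require the BitNet fork of llama.cpp rather than the mainline bindings:

# Resident-memory sketch (assumes: pip install llama-cpp-python fastapi uvicorn).
# The model is loaded once at startup and reused across requests, instead of
# spawning a subprocess (and re-initializing the model) on every call.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()

# Hypothetical path to a GGUF model file; adjust for your setup.
llm = Llama(model_path="models/bitnet.gguf", n_ctx=2048)


class Prompt(BaseModel):
    text: str
    max_tokens: int = 128


@app.post("/generate")
def generate(prompt: Prompt) -> dict:
    try:
        # The loaded model object is called directly; no process startup cost.
        result = llm(prompt.text, max_tokens=prompt.max_tokens)
        return {"output": result["choices"][0]["text"]}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Inference error: {e}")

Alternatively, python -m llama_cpp.server --model models/bitnet.gguf launches a ready-made OpenAI-compatible server without writing any FastAPI code, which is the llama_cpp.server approach mentioned above.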

@grctest
Owner

grctest commented Apr 22, 2025

Thanks, I'll look to include this fix soon, as well as support for the conversational mode required to use Microsoft's latest official BitNet model.

@abstratium-dev

Isn't it just easier to use llama_server, as described here: https://github.com/abstratium-informatique-sarl/bitnet-llm?tab=readme-ov-file
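For reference, a minimal sketch of that approach, assuming a stock llama.cpp llama-server binary and its OpenAI-compatible HTTP endpoint; the model path and port are placeholders, and a BitNet model may need the bitnet.cpp fork's server instead:

# Query llama.cpp's built-in server. Start it separately, e.g.:
#   llama-server -m models/bitnet.gguf --port 8080
# The server keeps the model resident in memory and exposes an
# OpenAI-compatible API, so no per-request model loading occurs.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])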
