-
Is it possible to use llama-cli as an embedded subprocess? Basically a completion server, but where the completion prompts are read from stdin in a loop. I know about llama-server, but what I'm looking for is a lower-level way to embed llama in an application without using it as a library. I saw some options such as
If such a mode doesn't exist yet, is it something that could be accepted as a PR?
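To make concrete what I mean, here is a rough Python sketch of driving llama-cli over pipes in interactive mode. The -i and --simple-io flags do exist, but the model path below is a placeholder, and the output framing is meant for a human at a terminal rather than for a machine, which is why I'm asking about a dedicated mode:

# Rough sketch only, not an existing llama.cpp feature: embed llama-cli as a
# subprocess and feed it prompts from our own stdin. The missing piece is a
# machine-readable marker for when each completion ends.
import subprocess
import sys
import threading

proc = subprocess.Popen(
    [
        "./llama-cli",
        "-m", "./models/model.gguf",  # placeholder model path
        "-i",                         # interactive mode: wait for the next prompt after each reply
        "--simple-io",                # basic stdin/stdout, friendlier to subprocess use
    ],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def pump_output():
    # Relay generated text to our stdout as it arrives.
    for chunk in iter(lambda: proc.stdout.read(1), ""):
        sys.stdout.write(chunk)
        sys.stdout.flush()

threading.Thread(target=pump_output, daemon=True).start()

# Feed prompts from our own stdin in a loop.
for line in sys.stdin:
    proc.stdin.write(line)
    proc.stdin.flush()

proc.stdin.close()
proc.wait()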
-
I am not sure if this is precisely what you're after, but I am going to be releasing a tool in the coming days that provides a powerful scripting environment with llama-cli compatibility. This allows for an easy way to write scripts or run quick llama-cli one-liners. I will have to experiment with the "raw data" case, however. Do you mean you want no templates applied, just speaking to the AI in conversation mode but without template application and such? For what it's worth, reading from stdin and sending to the AI using my tool will be a really simple endeavour:

set agent [ai new -vulkan]
$agent run -m ./models/gemma-3-4b-it-qat-Q4_K_M.gguf ...
while {[gets stdin line] >= 0 && [$agent running?]} {
    $agent send $line
}
-
Yes, I'd like just a simple completion. However, I just found out that llama-server also has a non-chat completions endpoint, which can probably be used for what I wanted. Will play with it later.
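For reference, a minimal sketch of that approach: read prompts from stdin and post each one to llama-server's /completion endpoint, which does a raw completion without applying a chat template. The host, port, and n_predict value below are just example values:

# Minimal sketch: non-chat completions against a locally running llama-server,
# one prompt per line of stdin. Adjust the URL and parameters to your setup.
import json
import sys
import urllib.request

URL = "http://127.0.0.1:8080/completion"  # assumes the server's default local address

for line in sys.stdin:
    prompt = line.rstrip("\n")
    if not prompt:
        continue
    body = json.dumps({"prompt": prompt, "n_predict": 128}).encode("utf-8")
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # The generated text comes back in the "content" field.
    print(result["content"])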
-
Yeah, that is probably the way to go instead of trying to hack llama-cli. Generally speaking, talking to the server APIs is a great way to integrate an app with various programming languages without having to deal with the lower-level details. :)
I will post an update here once I release my app, in case you're curious. It has many benefits as well, which I will expand on in the release notes.