-
Is it possible to use llama-cli as an embedded subprocess? Basically a completion server, but where the completion prompts are read from stdin in a loop. I know about llama-server, but what I'm looking for is a lower-level way to embed llama in an application without using it as a library. I saw some options such as
If such a mode doesn't exist yet, is it something that could be accepted as a PR?
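To make concrete what I mean, here is a rough Python sketch of driving llama-cli over pipes in interactive mode. The -i and --simple-io flags do exist, but the model path below is a placeholder, and the output framing is meant for a human at a terminal rather than for a machine, which is why I'm asking about a dedicated mode:

# Rough sketch only, not an existing llama.cpp feature: embed llama-cli as a
# subprocess and feed it prompts from our own stdin. The missing piece is a
# machine-readable marker for when each completion ends.
import subprocess
import sys
import threading

proc = subprocess.Popen(
    [
        "./llama-cli",
        "-m", "./models/model.gguf",  # placeholder model path
        "-i",                         # interactive mode: wait for the next prompt after each reply
        "--simple-io",                # basic stdin/stdout, friendlier to subprocess use
    ],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def pump_output():
    # Relay generated text to our stdout as it arrives.
    for chunk in iter(lambda: proc.stdout.read(1), ""):
        sys.stdout.write(chunk)
        sys.stdout.flush()

threading.Thread(target=pump_output, daemon=True).start()

# Feed prompts from our own stdin in a loop.
for line in sys.stdin:
    proc.stdin.write(line)
    proc.stdin.flush()

proc.stdin.close()
proc.wait()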
-
I am not sure if this is precisely what you're after, but I am going to be releasing a tool in the coming days that provides a powerful scripting environment with llama-cli compatibility. This allows for an easy way to write scripts or run quick llama-cli one-liners. I will have to experiment with the "raw data" case, however. Do you mean you want no templates applied, just speaking to the AI in conversation mode but without template application and such? For what it's worth, reading from stdin and sending to the AI using my tool will be a really simple endeavour:

set agent [ai new -vulkan]
$agent run -m ./models/gemma-3-4b-it-qat-Q4_K_M.gguf ...
while {[gets stdin line] >= 0 && [$agent running?]} {
    $agent send $line
}
-
Yes, I'd like just a simple completion. However, I just found out that llama-server also has a non-chat completions endpoint, which can probably be used for what I wanted. Will play with it later.
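For reference, a minimal sketch of that approach: read prompts from stdin and post each one to llama-server's /completion endpoint, which does a raw completion without applying a chat template. The host, port, and n_predict value below are just example values:

# Minimal sketch: non-chat completions against a locally running llama-server,
# one prompt per line of stdin. Adjust the URL and parameters to your setup.
import json
import sys
import urllib.request

URL = "http://127.0.0.1:8080/completion"  # assumes the server's default local address

for line in sys.stdin:
    prompt = line.rstrip("\n")
    if not prompt:
        continue
    body = json.dumps({"prompt": prompt, "n_predict": 128}).encode("utf-8")
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # The generated text comes back in the "content" field.
    print(result["content"])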
-
Yeah, that is probably the way to go instead of trying to hack llama-cli. Generally speaking, talking to the server APIs is a great way to integrate an app with various programming languages without having to deal with the lower-level details. :)
I will post an update here once I release my app, in case you're curious. It has many benefits as well, which I will expand on in the release notes.