Releases: LLukas22/llm-rs-python
Custom RoPE support & small LangChain bugfixes
Better Hugging Face Hub Integration
Simplified the interaction with other GGML-based repos on the Hub, such as TheBloke/Llama-2-7B-GGML.
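As a minimal sketch of the Hub workflow, assuming `AutoModel.from_pretrained` accepts a Hub repo id (the `model_file` keyword, the file name, and the `.text` attribute on the result are illustrative, not confirmed API):

```python
from llm_rs import AutoModel

# Download the GGML weights from the Hugging Face Hub and load them locally.
model = AutoModel.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",  # pick one quantized file from the repo
)

result = model.generate("The Rust programming language is")
print(result.text)  # assumes the result exposes the generated text
```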
Stable GPU Support
Fixed many GPU acceleration bugs in rustformers/llm and improved performance to match native GGML.
Experimental GPU support
Adds support for Metal, CUDA and OpenCL acceleration for LLaMA-based models.
Adds CI for the different acceleration backends to create prebuilt binaries.
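A sketch of enabling acceleration, assuming a `SessionConfig` with a `use_gpu` flag as in the rustformers/llm bindings (the exact field and constructor names are assumptions):

```python
from llm_rs import Llama, SessionConfig

# Route supported layers to the available backend (Metal, CUDA or OpenCL).
config = SessionConfig(use_gpu=True)

model = Llama(
    "path/to/llama-2-7b.ggmlv3.q4_0.bin",
    session_config=config,
)
```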
Added 🌾🔱 Haystack Support + BigCode-Models
- Added support for the Haystack library
- Support "BigCode"-like models (e.g. WizardCoder) via the gpt2 architecture (see the sketch after this list)
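A sketch of loading a BigCode-style model through the gpt2 architecture, assuming a `Gpt2` model class is exported; the class name and file name are assumptions:

```python
from llm_rs import Gpt2

# WizardCoder and similar BigCode models reuse the gpt2 architecture.
model = Gpt2("path/to/wizardcoder-15b.ggmlv3.q4_0.bin")

result = model.generate("def fibonacci(n):")
```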
Added 🦜️🔗 LangChain support
Merged #21 from LLukas22/feat/langchain: adds LangChain support.
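A sketch of the integration, assuming the bindings expose a `RustformersLLM` wrapper under `llm_rs.langchain` (the wrapper and `model_path_or_repo_id` parameter names are assumptions):

```python
from llm_rs.langchain import RustformersLLM
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Wrap a local GGML model as a LangChain-compatible LLM.
llm = RustformersLLM(model_path_or_repo_id="path/to/model.bin")

prompt = PromptTemplate(
    template="Q: {question}\nA:",
    input_variables=["question"],
)
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("What is GGML?"))
```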
Added Hugging Face Tokenizer Support
AutoModel-compatible models now use the official tokenizers library, which improves decoding accuracy, especially for non-LLaMA-based models.
To specify a tokenizer manually, set it via the tokenizer_path_or_repo_id parameter. To use the default GGML tokenizer instead, Hugging Face tokenizer support can be disabled via the use_hf_tokenizer parameter.
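A sketch of both options; the two parameter names come from the release notes, while their placement on `from_pretrained` and the repo/file names are assumptions:

```python
from llm_rs import AutoModel

# Option 1: point at a specific Hugging Face tokenizer.
model = AutoModel.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",
    tokenizer_path_or_repo_id="meta-llama/Llama-2-7b-hf",
)

# Option 2: fall back to the built-in GGML tokenizer.
model = AutoModel.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",
    use_hf_tokenizer=False,
)
```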
Fixed GPT-J quantization
0.2.8: GPT-J quantization bugfix
Added other quantization formats
Added support for the q5_0, q5_1 and q8_0 formats.
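A sketch only, illustrating the newly supported formats; the `quantize` helper and `QuantizationType` enum are hypothetical names, not confirmed API:

```python
from llm_rs import quantize, QuantizationType  # hypothetical import

# Re-quantize an f16 GGML file into one of the new formats.
quantize(
    source="models/llama-7b-f16.bin",
    destination="models/llama-7b-q5_1.bin",
    quantization=QuantizationType.Q5_1,  # newly added: Q5_0, Q5_1, Q8_0
)
```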
Streaming support
Added the stream method to each model, which returns a generator that can be consumed to stream a response token by token.
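The stream method itself is named in the release; the loading call and prompt below are illustrative:

```python
from llm_rs import AutoModel

model = AutoModel.from_pretrained(
    "TheBloke/Llama-2-7B-GGML",
    model_file="llama-2-7b.ggmlv3.q4_0.bin",
)

# Consume the generator to print tokens as they are produced.
for token in model.stream("Explain GGML in one sentence:"):
    print(token, end="", flush=True)
```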