Releases: OpenCSGs/llm-inference

v0.1.0

10 Apr 06:17
f8be0c5

What's Changed

  • Format Python code using autopep8 and add pylint by @jasonhe258 in #12
  • Fix default pipeline output problem by @jasonhe258 in #17
  • Enable multiple workers to cooperate on batch prompts by @depenglee1707 in #19
  • Push image to opencsg registry by @SeanHH86 in #25
  • Refine some parameters for initialization (will keep refining...) and fix Qwen issues by @depenglee1707 in #24
  • Remove initializer: transformerpipeline by @depenglee1707 in #26
  • Update warmup mechanism by @depenglee1707 in #27
  • Fix output pipeline format not working by @depenglee1707 in #29
  • Fix issue where from_pretrain has no device parameter by @depenglee1707 in #28
  • Enable warmup for defaulttransformers by @depenglee1707 in #30
  • Fix non-text-generation models broken by warmup by @depenglee1707 in #31
  • Fix bug for set pad_token by @SeanHH86 in #32
  • Fix device map not working on MPS: data was placed on the wrong device by @depenglee1707 in #34
  • Refine config for model: deepseek-coder-1.3b by @depenglee1707 in #35
  • Refactor "defaulttransformers" to match the common design of the "pipeline" class by @depenglee1707 in #36
  • Fix broken YAML files by @depenglee1707 in #39
  • Add opencsg-deepseek-coder-1.3b by @SeanHH86 in #38
  • Remove some abandoned implementations by @depenglee1707 in #40
  • Fix issue caused by huggingface pipeline with text-generation by @depenglee1707 in #41
  • Update config.py for new model by @SeanHH86 in #42
  • Update deepseek yaml file by @SeanHH86 in #43
  • Add Qwen1.5-72B-chat by @SeanHH86 in #44
  • Add parameter for timeout by @SeanHH86 in #46
  • Fix max-token conflict with DS by @depenglee1707 in #49
  • Fix output issue for the UI by @depenglee1707 in #50
  • Enable "use_bettertransformer" and "torch_compile" by @depenglee1707 in #51
  • Enable chat template for Hugging Face transformers by @depenglee1707 in #54 (see the first sketch after this list)
  • Update ray to 2.9.3 by @SeanHH86 in #56
  • Enable prompt template for GGUF format inference by @depenglee1707 in #57
  • Refactor the solution of vllm integration by @depenglee1707 in #60
  • Fix failure when loading JSON data containing '\n' by @SeanHH86 in #62
  • Fix JSON format issue for "transformerpipeline" by @depenglee1707 in #63
  • Enable chat template applied for vllm integration by @depenglee1707 in #65
  • Add streaming API support by @SeanHH86 in #66
  • Make scale out policy consistent between deployments by @depenglee1707 in #70
  • Add Qwen1.5-72B-GGUF YAML and fix JSON input loading error by @SeanHH86 in #71
  • Correct vllm version by @depenglee1707 in #73
  • Fix generation bug in the llamacpp streaming API by @SeanHH86 in #74
  • Fix streaming without prompt format by @SeanHH86 in #75
  • Fix path params issue to make the interface consistent by @depenglee1707 in #78
  • Enhance router naming for the comparison scenario by @depenglee1707 in #79
  • Fix issue: stream generation is slow by @depenglee1707 in #80
  • Fix bug where prompt is not a string by @SeanHH86 in #81
  • Refactor streaming by @depenglee1707 in #82
  • Enhance llamacpp integration to share some logic between streaming and predict by @depenglee1707 in #83
  • Fix issue: pipelines without streaming support fail when called as a stream by @depenglee1707 in #84 (see the second sketch after this list)
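
Two of the larger changes above merit a closer look. #54 enables chat templates for the Hugging Face transformers backend. Below is a minimal sketch of the underlying mechanism, using the standard transformers tokenizer API rather than this project's own interfaces; the model ID is only an illustrative example from this release's configs.

    # Sketch only: standard Hugging Face transformers API, not this
    # project's internals. Renders a chat history into the prompt format
    # the model was trained on before generation.
    from transformers import AutoTokenizer

    # Illustrative model from this release's configs; any chat model whose
    # tokenizer ships a chat template works the same way.
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-72B-Chat")

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this release in one line."},
    ]

    # apply_chat_template renders the message list (ChatML for Qwen) and
    # appends the generation prompt, ready to pass to the model.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)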
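Second, #66, #74, and #82-#84 rework streaming, including for the llama.cpp backend. A rough sketch of the streaming pattern, using the llama-cpp-python library directly (the GGUF path is a placeholder, not this project's API):

    # Sketch only: llama-cpp-python's completion API with stream=True,
    # which yields partial completions instead of one final response.
    from llama_cpp import Llama

    # Placeholder path; point it at any local GGUF model file.
    llm = Llama(model_path="./models/qwen1_5-72b-chat-q4_k_m.gguf")

    for chunk in llm("Explain GGUF in one sentence.", max_tokens=64, stream=True):
        # Each chunk carries the next piece of generated text; forward it
        # to the client as it arrives.
        print(chunk["choices"][0]["text"], end="", flush=True)
    print()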

Full Changelog: v0.0.1...v0.1.0

v0.0.1 tag release

05 Mar 11:54
c725258

First tag release for the project.

What's Changed

  • Set min replica to 1 for opt-125m by @SeanHH86 in #1
  • Replace '/' with '--' in model ID by @SeanHH86 in #2
  • Enhance model ID for CLI by @jasonhe258 in #4
  • Fix loading issue for non text-generation models by @depenglee1707 in #5
  • Fix output issue for default transformers pipeline by @jasonhe258 in #6
  • Add Chinese README and license by @jasonhe258 in #7

New Contributors

  • @SeanHH86 made their first contribution in #1
  • @jasonhe258 made their first contribution in #4
  • @depenglee1707 made their first contribution in #5

Full Changelog: https://github.com/OpenCSGs/llm-inference/commits/v0.0.1