Working prototype for a Small Concept Model (SCM) based on Meta's Large Concept Model (LCM), with a custom embedding decoder for vec-to-text conversion.
At the root of this project, you can find:
train_inversion.ipynb, a notebook that trains an embedding inversion model based on prefix tuning (Figure 1). By default, it trains a PreNet to invert paraphrase-multilingual-MiniLM-L12-v2 sentence-level embeddings.
Figure 1. Scheme of the architecture of the embedding inversion model.
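To make the prefix-tuning idea concrete, here is a minimal sketch: a trainable PreNet maps a frozen sentence embedding to a short sequence of prefix vectors, which are prepended to the input embeddings of a frozen decoder LM that is then asked to reconstruct the original sentence. The decoder choice (GPT-2), dimensions, prefix length, and names below are illustrative assumptions, not the repo's actual configuration.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class PreNet(nn.Module):
    """Maps one sentence embedding to a sequence of prefix embeddings."""
    def __init__(self, emb_dim=384, lm_dim=768, prefix_len=8):
        super().__init__()
        self.prefix_len, self.lm_dim = prefix_len, lm_dim
        self.proj = nn.Sequential(
            nn.Linear(emb_dim, lm_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(lm_dim * prefix_len, lm_dim * prefix_len),
        )

    def forward(self, sent_emb):                           # (batch, emb_dim)
        prefix = self.proj(sent_emb)                       # (batch, prefix_len * lm_dim)
        return prefix.view(-1, self.prefix_len, self.lm_dim)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
lm = GPT2LMHeadModel.from_pretrained("gpt2")
for p in lm.parameters():                                  # the decoder stays frozen
    p.requires_grad = False

prenet = PreNet()
sent_emb = torch.randn(2, 384)                             # stand-in for MiniLM sentence embeddings
batch = tokenizer(["a sentence.", "another one."], return_tensors="pt", padding=True)

tok_emb = lm.transformer.wte(batch["input_ids"])           # (batch, seq, lm_dim)
inputs = torch.cat([prenet(sent_emb), tok_emb], dim=1)     # prefix + token embeddings

# Labels: ignore the prefix (and padding) positions, reconstruct the sentence tokens.
labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
labels = torch.cat([torch.full((2, prenet.prefix_len), -100), labels], dim=1)
attn = torch.cat([torch.ones(2, prenet.prefix_len, dtype=torch.long),
                  batch["attention_mask"]], dim=1)

loss = lm(inputs_embeds=inputs, attention_mask=attn, labels=labels).loss
loss.backward()                                            # gradients only reach the PreNet
```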
train_scm.ipynb, a notebook that trains the actual autoregressive Small Concept Model (SCM), a decoder-only transformer inspired by Meta's BaseLCM and designed for next-embedding prediction (Figure 2).
For a more faithful, straightforward reproduction of BaseLCM, take a look at this implementation.
Figure 2. High-level scheme of the Small Concept Model (SCM).
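As a rough sketch of what next-embedding prediction looks like in code, the model below runs a causally masked transformer over a sequence of sentence embeddings and regresses the embedding at position t+1 from positions up to t with an MSE loss. Dimensions, layer counts, and names are illustrative assumptions rather than the repo's hyperparameters.

```python
import torch
import torch.nn as nn

class SmallConceptModel(nn.Module):
    def __init__(self, emb_dim=384, d_model=512, n_heads=8, n_layers=6, max_len=16):
        super().__init__()
        self.in_proj = nn.Linear(emb_dim, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, emb_dim)

    def forward(self, embs):                                   # (batch, seq, emb_dim)
        seq = embs.size(1)
        pos = torch.arange(seq, device=embs.device)
        h = self.in_proj(embs) + self.pos_emb(pos)
        # Causal (additive) mask so position t only attends to positions <= t.
        causal = torch.triu(torch.full((seq, seq), float("-inf"), device=embs.device), diagonal=1)
        h = self.blocks(h, mask=causal)
        return self.out_proj(h)                                # predicted next embeddings

model = SmallConceptModel()
seqs = torch.randn(4, 16, 384)         # 4 training sequences of 16 sentence embeddings each
pred = model(seqs[:, :-1])             # predict embeddings 2..16 from embeddings 1..15
loss = nn.functional.mse_loss(pred, seqs[:, 1:])
loss.backward()
```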
inference_test.ipynb, a notebook where you can run inference using pretrained weights. You can test both the embedding inversion model (trained on 1 million sentences from BookCorpus) and the SCM (trained on 100k sequences of 16 sentences each from BookCorpus).
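A minimal sketch of the inference flow, assuming the SmallConceptModel sketched above: the prompt sentences are encoded with the same SentenceTransformer, the SCM autoregressively appends predicted embeddings, and each generated embedding is finally passed through the inversion model to recover text. The decode_embedding helper mentioned in the final comment is a hypothetical stand-in for the repo's actual decoder.

```python
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
scm = SmallConceptModel()              # e.g. the sketch above, with pretrained weights loaded
scm.eval()

prompt = ["He opened the door.", "The room was dark."]

with torch.no_grad():
    embs = torch.tensor(encoder.encode(prompt)).unsqueeze(0)   # (1, n_sentences, 384)
    for _ in range(3):                                          # generate 3 more "concepts"
        next_emb = scm(embs)[:, -1:]                            # last position = predicted next embedding
        embs = torch.cat([embs, next_emb], dim=1)

# Each generated embedding would then go through the inversion model to recover text,
# e.g. sentences = [decode_embedding(e) for e in embs[0, len(prompt):]]
```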
Inside the ./small_concept_model folder, you can find all the modules used to build the models, load and pre-process datasets, and create the full pipeline.
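For illustration, the data preparation these modules perform amounts to something like the sketch below: split raw text into sentences and encode each one into a fixed-size embedding, yielding (16, 384) sequences for SCM training. The function name and chunking choices are illustrative assumptions, not the repo's actual API.

```python
import torch
from nltk.tokenize import sent_tokenize          # requires nltk.download("punkt")
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def text_to_embedding_sequences(text, seq_len=16):
    """Turn a document into non-overlapping chunks of `seq_len` sentence embeddings."""
    sentences = sent_tokenize(text)
    embs = torch.tensor(encoder.encode(sentences))   # (n_sentences, 384)
    n_chunks = embs.size(0) // seq_len
    return embs[: n_chunks * seq_len].view(n_chunks, seq_len, -1)
```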
This repo also includes a Streamlit app for interacting with the pretrained models and generating text. To use it, run the following from the root of the project:
streamlit run app.py


