Project Adaptation in Code Completion
via In-Context Learning

The source codes for bachelor's thesis

Abstract • Contributions • Structure • Installation • Reproduction • License • Acknowledgments • Citation

Abstract

This thesis addresses the challenge of enhancing code completion models with repository-level context awareness. Modern completion systems struggle with information dispersed across large codebases, limiting their performance. The work presents a context composition framework that extracts relevant repository information and a fine-tuning pipeline for model adaptation, evaluated through systematic experimentation. The research demonstrates that context selection strategy significantly impacts completion quality during inference, while repository-level pre-training preserves in-context learning capabilities. Notably, the study demonstrates that computational requirements for context window extension can be substantially reduced while maintaining competitive performance, advancing code completion by enabling better integration of project-wide information.

Keywords: repository-level code completion, project adaptation, in-context learning, long context, context extension, resource efficiency, Transformer, Code LLM

Contributions

Implemented

Context Composition Framework (incontext): Modular and flexible package to extract and compose relevant information from software repositories.
Fine-Tuning Pipeline (pipeline): End-to-end pipeline for project adaptation of code completion LLMs via fine-tuning.

Demonstrated

Composition Impact on Inference: Repository context significantly impacts completion quality during inference.
Fine-Tuning on Compositions: DeepSeek-Coder-Base 1.3B shows minimal effect from context-specific fine-tuning, suggesting rooted initial training of this model.
Effect of Context Extension: Context extension preserves the in-context learning capabilities developed earlier.
Influence of Composition on Context Extension: Repository context plays a minimal role in the outcome of context extension.
Resource Efficiency: ✨ Repository-level pre-training can achieve competitive results with significantly fewer resources (73M tokens vs billions). This contribution is additionally issued in the form of tiny paper. ✨
Other, more detailed insights on the subject of the thesis.

Released

The core checkpoints are available here.

Structure

.
├── configs        # configs for pipeline and evaluation
├── datasets       # dataset head demos
├── demo           # demo for incontext package
├── evaluation     # evaluation script and outputs
├── incontext      # context composition framework
├── paper          # LaTeX source files
├── pipeline       # fine-tuning pipeline
├── requirements   # dependency files
├── runs           # experiment instances
└── thesis.pdf     # compiled thesis document

Installation

Virtual Environment

Python Version: 3.11+

python3 -m venv .venv
source .venv/bin/activate

Dependencies

pip install -r requirements/demo.txt
pip install -r requirements/evaluation.txt
pip install -r requirements/incontext.txt
pip install -r requirements/pipeline.txt

Flash Attention (optional):

pip install flash-attn==2.7.4.post1 --no-build-isolation

Reproduction

Running the Pipeline:

python3 -m pipeline \
    run_name=test \
    dataset=debugging \
    logger=local_logger/local \
    model=dseek1p3 \
    preprocessor=completion_loss_preprocessor/debugging \
    +additional_preprocessor=completion_loss_preprocessor/debugging \
    split=debugging \
    trainer=universal_trainer/debugging

The raw training dataset can be recreated using the following file:

import pandas as pd

df = pd.read_parquet('datasets/raw/datapoints.parquet')
print(df.shape)    # (361052, 3)
print(df.columns)  # ['repo', 'commit_hash', 'completion_file']

The used benchmark is available here.

License

MIT & LPPL (paper subdirectory)

Acknowledgments

This thesis is based on the author's internships at JetBrains Research and was supervised by Evgenii Glukhov, M.Sc.

Citation

@mastersthesis{sapronov2025projectadaptation,
  author       = {Maksim Sapronov},
  title        = {Project Adaptation in Code Completion via In-Context Learning},
  school       = {Czech Technical University in Prague},
  year         = {2025},
  type         = {Bachelor's thesis},
  address      = {Prague, Czech Republic},
  url          = {https://github.com/sapromak/adaptive-code-completion}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Project Adaptation in Code Completion
via In-Context Learning

The source codes for bachelor's thesis

Abstract

Contributions

Implemented

Demonstrated

Released

Structure

Installation

Virtual Environment

Dependencies

Reproduction

License

Acknowledgments

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 142 Commits
configs		configs
datasets		datasets
demo		demo
evaluation		evaluation
incontext		incontext
paper		paper
pipeline		pipeline
requirements		requirements
runs		runs
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
thesis.pdf		thesis.pdf

License

sapromak/adaptive-code-completion

Folders and files

Latest commit

History

Repository files navigation

Project Adaptation in Code Completionvia In-Context Learning

The source codes for bachelor's thesis

Abstract

Contributions

Implemented

Demonstrated

Released

Structure

Installation

Virtual Environment

Dependencies

Reproduction

License

Acknowledgments

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Project Adaptation in Code Completion
via In-Context Learning

Packages