
Dmitrii Kuzmin — NLP Engineer & Researcher

Helping large language models understand the world a bit better. I build and adapt tokenizers, fine-tune multimodal models, and streamline LLM pipelines for production-grade use.

Portfolio · Email · Download CV · Profile Views

Jump to what interests you

Quick highlights

  • Research Intern at Mohamed bin Zayed University of Artificial Intelligence (Jun 2025 – present) exploring alternative tokenization methods and preparing publication-ready research.
  • Middle NLP Engineer at DeepPavlov (May 2025 – present) driving R&D, benchmarking LLMs, and working with GPU stacks from diverse vendors.
  • Previously at Center for Applied AI (Skolkovo), Higher School of Economics, Moscow Aviation Institute, and Innopolis University, shipping tokenizer tooling, fine-tuning Qwen and Llama models, and deploying NLP services.
  • Active open-source maintainer of tokenizer tooling.

What I work with

NLP & Deep Learning Stack

PyTorch · Transformers · Tokenizers · LangChain · NumPy · pandas

DevOps & Tooling

Git · Docker · Linux · Bash

Backend & Communication

MongoDB · Telegram Bot API

Languages & soft skills
  • English (proficient)
  • Russian (native)
  • Flexibility · Responsibility · Curiosity

Recent experience

Research Intern · MBZUAI — Abu Dhabi, UAE (Jun 2025 – present)
  • Design and evaluate alternative tokenization strategies for LLM inference.
  • Author an academic paper on tokenizer-driven performance gains.
Middle NLP Engineer · DeepPavlov — Moscow, Russia (May 2025 – present)
  • Drive R&D initiatives and LLM evaluation workflows.
  • Benchmark LLMs comparatively on GPU infrastructure from Chinese manufacturers.
Middle NLP Engineer · Center for Applied AI, Skolkovo — Moscow, Russia (Feb 2025 – May 2025)
  • Tuned the Qwen2.5-VL model and built supporting pipelines.
  • Designed prompting strategies to generate actionable feedback on heterogeneous specifications.
NLP Researcher · Higher School of Economics — Moscow, Russia (Jun 2024 – May 2025)
  • Fine-tuned Llama3-8B-Instruct for Russian-language tasks.
  • Developed a Russian BPE tokenizer and tooling to safely manipulate existing vocabularies (a minimal training sketch follows this experience section).
  • Built a grammar benchmark suite to quantify improvements across downstream tasks.
ML / Backend Engineer · Moscow Aviation Institute — Moscow, Russia (Jul 2023 – Oct 2023)
  • Delivered a sentence theme classifier and optimized database queries.
  • Integrated Telegram-based interfaces for model delivery.
NLP Engineer · Innopolis University — Innopolis, Russia (Jun 2023 – Jul 2023)
  • Developed a deep-learning sentiment model for YouTube comments.
  • Fine-tuned BERT for domain-specific tone classification.
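
Much of the tokenizer work above relies on the Hugging Face `tokenizers` library. The snippet below is a minimal sketch of training a Russian BPE tokenizer from a plain-text corpus; the corpus path, vocabulary size, and special tokens are illustrative placeholders, not the exact configuration used in the HSE project.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Start from an empty BPE model with an explicit unknown token.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Hypothetical settings: vocabulary size and special tokens are placeholders.
trainer = trainers.BpeTrainer(
    vocab_size=32_000,
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
)

# "russian_corpus.txt" is a placeholder path to a raw Russian text corpus.
tokenizer.train(files=["russian_corpus.txt"], trainer=trainer)
tokenizer.save("ru_bpe_tokenizer.json")
```

Whitespace pre-tokenization keeps the example short; a byte-level pre-tokenizer is a common alternative for Cyrillic text.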

Publications & research

Rethinking Tokenization — EACL 2026 (under review)

Researcher & writer, 2025. Investigates how alternative tokenizations of the same text impact LLM inference quality.

TokenSubstitution — ACL 2026 (in progress)

Proposes a cost-effective adaptation approach for improving LLM generation quality in a target language.

Multi-Aspect Tokenizer Evaluation — Russian AI Journey 2025 (accepted)

Demonstrates tokenizer adaptation as a cost-effective technique by analyzing text quality and token efficiency across diverse benchmarks.

Open-source projects

TokenizerChanger — Python tooling for manipulating existing tokenizers
EmbeddingsDivision — Python tooling for dividing and adapting LLM embedding layers (see the sketch after this list)
CRUD Calendar LLM Chatbot — Telegram assistant
  • Features: calendar CRUD, latest-news summarisation, voice reminders.
  • Stack: Telegram Bot API, FastAPI, RAG pipeline with Qwen2.5-VL.
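
To illustrate the kind of workflow these tools support, the sketch below uses only the standard `transformers` API (not the projects' own interfaces) to extend a tokenizer's vocabulary and resize a model's embedding matrix to match. The model name and added tokens are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any causal LM with its paired tokenizer works the same way.
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical new tokens, e.g. frequent Russian words missing from the vocabulary.
new_tokens = ["токенизация", "нейросеть"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the input/output embedding matrices so the new token ids have rows to map to.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```

Newly added embedding rows start from a generic initialisation, so they are typically warmed up with continued pre-training or fine-tuning before use.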

Education

Innopolis University — B.S. in Data Analysis & Artificial Intelligence (2022 – 2026)
Key coursework: Software Systems Analysis and Design, Human-AI Interaction, Mathematical Analysis.

Beyond work

  • Tutor for first-year students at Innopolis University (Sep 2023 – Jan 2024), helping newcomers acclimate and organizing community events.
  • Always exploring ways to make LLM tooling more accessible and efficient.

Let’s connect

