Autoscaling

Vertical autoscaling for a fleet of postgres instances running in a Kubernetes cluster.

For more on how Neon's Autoscaling works, check out https://neon.tech/docs/introduction/autoscaling.

Development status

Autoscaling is used internally within Neon and makes some minor assumptions specific to Neon's environment.

We do not officially support use of autoscaling externally. In other words, you're welcome to try it out yourself, submit bugs, fork the code, etc., but we make no guarantees about timely responses to issues that come from running it yourself.

For help from the community, check out our Discord: https://neon.tech/discord.

Quick access

The deployment files and a vm-builder binary are attached to each release.

Check out Building and running below for local development.

How it works

We want to dynamically change the amount of CPUs and memory of running postgres instances, without breaking TCP connections to postgres.

This is relatively easy when there are already spare resources on the physical (Kubernetes) node, but it takes careful coordination to move postgres instances from one node to another when the original node doesn't have room.

We've tried a bunch of existing tools and settled on the following:

  • Use VM live migration to move running postgres instances between physical nodes
  • QEMU is used as our hypervisor
  • NeonVM orchestrates NeonVM VMs as custom resources in K8s, and is responsible for scaling allocated resources (CPU and memory)
  • A modified K8s scheduler ensures that we don't overcommit resources and triggers migrations when demand is above a pre-configured threshold
  • Each K8s node has an autoscaler-agent pod that triggers scaling decisions and makes resource requests to the K8s scheduler on the VMs' behalf to reserve additional resources for them
  • Each compute node runs the vm-monitor binary, which communicates with the autoscaler-agent so that it can immediately respond to memory pressure by scaling up (among other things).
  • For Neon's postgres instances, we also track cache usage and potentially scale based on the heuristically determined working set size, which dramatically speeds up OLTP workloads.

Networking is preserved across migrations by giving each VM an additional IP address on a bridge network spanning the cluster with a flat topology; the L2 network figures out "by itself" where to send the packets after migration.
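
If you want to poke at these components in a running cluster, plain kubectl is enough. A minimal sketch (the grep patterns are assumptions about the installed names, not something this README guarantees):

# List the custom resource types that NeonVM registers:
kubectl api-resources | grep -i neon

# Find the per-node autoscaler-agent pods alongside the other deployed components:
kubectl get pods -A | grep -i -e autoscaler -e neonvm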

For more information, refer to ARCHITECTURE.md.

Building and running

Note

NeonVM and Autoscaling are not expected to work outside Linux x86.

Install dependencies

To run autoscaling locally, you first need the development dependencies installed. The commands below assume at least docker, kubectl, and kind or k3d; the E2E tests additionally need kuttl.
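
As a rough sanity check, something like the following can be used; the tool list is inferred from the commands in this README and may be incomplete:

# Report any of the basic tools that are missing (k3d can stand in for kind):
for tool in docker kubectl kind; do
    command -v "$tool" >/dev/null || echo "missing: $tool"
done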

Running locally

Build the NeonVM Linux kernel (this takes a while, but only needs to be run once):

make kernel

Build docker images:

make docker-build

Start local cluster with kind or k3d:

make kind-setup # or make k3d-setup

Deploy NeonVM and Autoscaling components:

make deploy

Build and load the test VM:

make vm-examples

Start the test VM:

kubectl apply -f vm-deploy.yaml
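
To check that the VM actually came up, inspect the objects with kubectl. A small sketch; the neonvm resource name is an assumption, and kubectl api-resources | grep -i neon shows what is really registered:

kubectl get neonvm   # the VirtualMachine custom resource created above
kubectl get pods     # the runner pod backing the VM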

Running pgbench

Broadly, the run-bench.sh script exists just to be expensive on CPU, so that more vCPUs get allocated to the VM. You can run it with:

scripts/run-bench.sh
# or:
VM_NAME=postgres16-disk-test scripts/run-bench.sh
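
While the benchmark runs, you can watch the allocated resources change from another terminal. A minimal sketch, reusing the VM name from the example above and the same assumed neonvm resource name:

kubectl get neonvm postgres16-disk-test -w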

Running allocate-loop

To test on-demand memory reservation, the allocate-loop binary is built into the test VM and can be used to slowly increase memory allocations of arbitrary size. For example:

# After ssh-ing into the VM:
cgexec -g memory:neon-test allocate-loop 256 2280
#^^^^^^^^^^^^^^^^^^^^^^^^^               ^^^ ^^^^
# run it in the neon-test cgroup  ;  use 256 <-> 2280 MiB

E2E tests

To run the end-to-end tests, you need to have kuttl installed. You can run the tests with:

make e2e

During active development, when the kernel is already built and the cluster is already set up, you can run make run-e2e to test the current code, which resolves to make deploy vm-examples e2e:

make run-e2e

Contributing

Splitting PRs into commits is preferred, as it allows for a cleaner git history and makes review easier.

For all commits, we require the PR number to be present in the commit subject:

  neonvm: Remove neonvm-runner version [2/2] (#1381)
                                             ^ like that

This happens automatically when PRs are merged with squash. For PRs merged with rebase, we have a helper script: scripts/git-pr-number.sh.

The workflow therefore becomes (a combined example is sketched after the list):

  1. Create your commits locally.
  2. Create a PR.
  3. Run scripts/git-pr-number.sh; it will detect that there is an open PR and adjust the commit subjects.
  4. git push --force-with-lease origin <YOUR BRANCH>. Warning: a plain git push -f (force push) can overwrite remote branch history and delete commits made by others; --force-with-lease only proceeds if the remote branch has not been updated since your last fetch.
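
Put together, a typical session might look like the following sketch (the branch name and commit subject are placeholders):

git commit -m "neonvm: Describe your change"
git push -u origin your-branch
# ...open a PR for your-branch, then:
scripts/git-pr-number.sh                       # adjusts the commit subjects to include the PR number
git push --force-with-lease origin your-branch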