
Using and Playing Nice On the Compute Server

Claire Le Goues edited this page Aug 17, 2023 · 1 revision

squaresLab has two shared servers: one relatively beefy compute server that you can use to run experiments of all stripes, and a GPU server that was thrown together on the cheap using GPUs donated to our group by LTI. We do not have explicit job or process management; instead, we trust users to run things appropriately to avoid hogging all the resources.

This wiki page provides basic guidelines on sharing these resources; it is not a comprehensive guide to doing systems-y things at the Linux command line. For that, we encourage you to educate yourself using the many sources available on the internet, and to ask for help with anything you find confusing.

Basic Server Info

hostname: rhombus@ (CPU), quaternion@ (GPU)

Contact me (Claire) for usernames (specify which machine). Change your password on first login. When you sell or replace a computer, please clear out your credentials first. There is a true story here about cryptomining that I am eliding for brevity.

Most of our interaction is conducted via ssh; different users have different preferences for managing sessions (tee, tmux, etc). Ask around for suggestions if you need help.
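As one hedged example of session management, here is a typical tmux workflow for keeping a job alive across ssh disconnects (the session name "exp1" is made up):

```shell
# Typical tmux workflow for long-running jobs over ssh; the session name
# "exp1" is just an example.
#
#   tmux new -s exp1          # start a named session; launch your job inside
#   (press Ctrl-b, then d)    # detach; the job keeps running after logout
#   tmux ls                   # on your next login: list sessions
#   tmux attach -t exp1       # reattach and check on the job
#
# Safe to run as-is: just report whether tmux is installed here.
command -v tmux >/dev/null && tmux -V || echo "tmux not installed"
```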

A random collection of server specs: FIXME

Join the #serverandtech channel on Slack for discussion and updates; it's also a good place to coordinate who is running what.

Backups and failure recovery: The disk is RAID-5, which protects against random single-disk failure. Backups are done regularly (weekly, in principle), but note that restoring from backup is kind of a PITA; it's not as easy as, say, restoring from source control. Treat it as a last resort, not something to rely on.

If you notice a job running away with resources, and it's not yours, give a shout on #serverandtech (or #general if it's desperate) to get someone's attention, if you can. Life is better if we catch things like that before the machine becomes totally unusable.
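If you want to check for runaway jobs yourself before shouting, here is a quick sketch using standard procps tools (flags assume GNU/Linux):

```shell
# Top five memory hogs, then top five CPU hogs:
ps aux --sort=-%mem | head -n 6    # header line plus five processes
ps aux --sort=-%cpu | head -n 6
# htop gives the same picture interactively, with per-core load bars.
```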

Actually hard-rebooting the machine involves me (Claire) walking to a room several buildings away and entering a PIN and swiping a card to access a box with ANOTHER key that opens a lock containing a swipe card that...needless to say, it's annoying. Please try to help avoid this situation; it takes a village.

Installing things

You do not have root access. You don't want it. Trust me.

"But Claire, I need X, Y, and Z packages installed!"

You have a few options:

  1. run everything in a container (Docker, Vagrant, or VirtualBox). This should be your default behavior anyway; see below.
  2. install locally. There are many different ways to do this depending on what you're trying to do (pip installs user-level, for example).
  3. Hop on Slack, #serverandtech, and send messages of the form apt-get install PACKAGENAMES; someone with root (me or Chris) will install it as soon as we see it (usually pretty quickly). Please provide the specific package names you need; we're bad at guessing.
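As a sketch of option 2, here is what a user-level Python install looks like; the package name is just an example, and nothing here is specific to our servers:

```shell
# User-level install pattern (no root needed); "numpy" is just an example:
#   python3 -m pip install --user numpy
# User-installed packages and scripts land under your "user base" directory:
python3 -m site --user-base
# Console scripts go in its bin/ subdirectory (usually ~/.local/bin);
# make sure that is on your PATH, e.g. in ~/.bashrc:
#   export PATH="$HOME/.local/bin:$PATH"
```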

Playing nice

Here are some questions you should ask about your code before you run it:

"What is the most memory this can consume?"

"Can this eat all CPUs?"

"Is the thing I'm running secretly a cryptominer?"

"Will this chew up the entire HD?"

...It's not actually that simple, unfortunately, because no one runs a process expecting it to take over a full server. So instead, here are some principles and practices:

(0) Be mindful of deadlines.

(1) Containers. Containerize your work, and limit your containers' resource usage. Containers in general, and perhaps even Docker in particular, are good for many reasons. They simplify reproducibility, both for you and for other consumers of our research. You can install what you need at what looks like user level and set up any kind of weird library you want without having to convince someone with root to install things for you.

Docker is a good choice by default; Chris tells me Vagrant may be better for RAM-intensive jobs, and some students have had success running VirtualBox in headless mode if you need something a bit more heavyweight. Please use them, with two caveats. (A) They don't necessarily limit resources by default, so you need to restrict them yourself; e.g., passing -m 16g to docker run limits the container to no more than 16 GB of RAM, which will at least keep your process from using literally all of it. (B) Clean up/prune old images periodically, or Docker will silently eat all the disk space over time. Docker has good documentation on this: https://docs.docker.com/config/pruning/; seek out similar resources for whatever containerization you choose.
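To make those caveats concrete, here is a hedged sketch of resource-limited Docker usage; the image name and the memory/CPU limits are examples, not recommendations:

```shell
# Cap a container's resources at launch. Image name and limits are examples;
# size them to your job, not to the whole machine.
#
#   docker run --rm -m 16g --cpus 8 my-experiment-image ./run.sh
#
# Periodically reclaim disk from stopped containers and unused images
# (see https://docs.docker.com/config/pruning/):
#
#   docker system prune           # asks for confirmation first
#   docker image prune -a         # also removes images no container uses
#
# Safe to run as-is: just report whether docker is available here.
command -v docker >/dev/null && docker --version || echo "docker not on PATH"
```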

(2) Docker, part 2: we run Docker in rootless mode because of a security problem I can never remember. Before you can use it, you need to follow the rootless setup instructions: https://docs.docker.com/engine/security/rootless/#install

(3) Speaking of disk space, here is a parable: When I renovated my house, I wanted to increase the size of the deck, because it was always too full during parties. My spouse said "but Claire, if we get a bigger deck, you'll just invite more people over." And he was right, and we made it bigger, and now I invite more people over.

Point is: There is never enough room on the server HD. I can make it bigger over time, but if I do that, it will still fill up, so I resist as much as I can.

Please balance (A) having adequate logs to be able to analyze/replicate experiments with (B) not taking up multiple TB of disk space that you don't need. When you are done running experiments, please clean up by deleting extraneous files that are not critical to either replication or further analysis. Prune your containers. Etc.
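A small sketch for finding cleanup candidates; the path and the 90-day cutoff are assumptions, so adapt them to your own experiment layout:

```shell
# What is eating space under your home directory, largest first:
du -h --max-depth=1 ~ 2>/dev/null | sort -rh | head -n 10
# Candidate stale logs: *.log files untouched for 90+ days (the path and
# cutoff are assumptions, not a convention). Review before deleting!
find ~/experiments -name '*.log' -mtime +90 2>/dev/null | head
```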
