A ready-to-use VS Code Dev Container for doing data science in Haskell with Jupyter notebooks.
This repo is a batteries-included environment for the DataHaskell ecosystem:
- VS Code + Dev Containers
- IHaskell Jupyter kernel
- DataHaskell libraries (e.g. dataframe, hasktorch, etc.)
- Example notebooks to get you started
This setup is a good fit if:
- You prefer an easy-to-use, pinned environment over installing everything locally.
- You like notebooks (Jupyter / VS Code) for exploration.
If you’re new to DataHaskell, this is the recommended way to get a working setup.
You can set up a devcontainer in your current folder by running the command below:
- Linux/macOS:
curl -sSL https://raw.githubusercontent.com/DataHaskell/datahaskell-starter/refs/heads/main/setup.sh | sh
When you next open VS Code, you should see a prompt asking whether you want to reopen the project in a Dev Container.
hscript lets you run GHCi scripts that have no main function and that use inline imports. It lets you write code like:
:set -package text
:set -XOverloadedStrings
:set -XTemplateHaskell
import qualified DataFrame as D
import qualified DataFrame.Functions as F
import Data.Text (Text)
import DataFrame ((|>))
import DataFrame.Functions ((.==), (.>))
-- Read Iris dataset
iris <- D.readParquet "../dataframe/data/iris.parquet"
-- Filter large setosas
iris |>
  D.filterWhere (F.col @Text "variety" .== "Setosa") |>
  D.filterWhere (F.col @Double "sepal.length" .> 5.4)
-- Declare column variables
_ = (); F.declareColumns iris
-- Create a new feature
D.derive "ratio" (sepal_width / sepal_length) iris
- Linux/macOS: Download https://raw.githubusercontent.com/DataHaskell/datahaskell-starter/refs/heads/main/hscript.sh and add it to your PATH.
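As a rough sketch of the download step, the helpers below fetch the script and check whether a directory is on your PATH. The install directory (`$HOME/.local/bin`), the function names `install_hscript` and `on_path`, and the added `-f` flag (which makes curl fail on HTTP errors) are assumptions for illustration, not part of the project; any directory on your PATH should work.

```shell
# Sketch only: install location and helper names are assumptions, not project conventions.
HSCRIPT_URL="https://raw.githubusercontent.com/DataHaskell/datahaskell-starter/refs/heads/main/hscript.sh"

# install_hscript [DIR]: download hscript.sh into DIR as an executable named "hscript".
# DIR defaults to $HOME/.local/bin (an assumed location).
install_hscript() {
  dir="${1:-$HOME/.local/bin}"
  mkdir -p "$dir" || return 1
  curl -fsSL "$HSCRIPT_URL" -o "$dir/hscript" || return 1
  chmod +x "$dir/hscript"
}

# on_path DIR: succeed if DIR is already listed in PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

After sourcing these definitions, running `install_hscript` performs the download. If `on_path "$HOME/.local/bin"` fails, adding `export PATH="$HOME/.local/bin:$PATH"` to your shell profile is one way to make the `hscript` command available.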
You can then run files like the one above by typing:
hscript runme.hs
You’ll need:
- VS Code
- Docker (Docker Desktop on macOS/Windows, or Docker Engine on Linux)
- VS Code extensions:
  - Jupyter
  - Dev Containers
  - Haskell (recommended)
You do not need to install GHC, Cabal, or IHaskell on your host machine. Everything lives inside the container.
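The prerequisites listed above can be checked from a terminal with a small script like the following sketch. It only reports what it finds and installs nothing. It assumes the VS Code command-line launcher `code` is on your PATH, and the extension IDs are assumptions based on the Marketplace listings; the helper names `have_cmd` and `report` are hypothetical.

```shell
# Sketch: report which host prerequisites are present. Nothing is installed or changed.

# have_cmd NAME: succeed if NAME is an executable on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# report NAME: print one status line for a tool.
report() {
  if have_cmd "$1"; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

report docker
report code   # VS Code command-line launcher; assumed to be on PATH

# Extension IDs below are assumptions based on the Marketplace listings.
if have_cmd code; then
  installed="$(code --list-extensions 2>/dev/null)"
  for ext in ms-toolsai.jupyter ms-vscode-remote.remote-containers haskell.haskell; do
    if printf '%s\n' "$installed" | grep -qixF "$ext"; then
      echo "ok: $ext"
    else
      echo "missing: $ext"
    fi
  done
fi
```

A "missing" line for `code` does not necessarily mean VS Code is absent; the launcher may simply not be on your PATH.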