sandboxnu/major-scraper

GraduateNU Major Scraper

This repo houses GraduateNU's major requirements scraper. It scrapes the Northeastern Academic Catalog.

Setup

Clone the repo and run:
pnpm install

Running

After installing dependencies, you can run the scraper with:
pnpm scrape

By default, the scraper scrapes the current catalog, but you can pass one or more years as command-line arguments. For example, to scrape the catalogs for 2021, 2022, and the current year, run:
pnpm scrape 2021 2022 current
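
As a rough illustration of how arguments like these could be interpreted, here is a minimal TypeScript sketch. It is not taken from the scraper's source: the `CatalogYear` type and the `parseYearArgs` function are hypothetical names, and the four-digit validation rule is an assumption. Only the "current catalog by default" behavior comes from the description above.

```typescript
// Hypothetical helper, NOT part of major-scraper: shows one way to turn
// command-line arguments such as ["2021", "2022", "current"] into a list
// of catalog years.
type CatalogYear = number | "current";

function parseYearArgs(args: string[]): CatalogYear[] {
  // No arguments: fall back to the current catalog (the documented default).
  if (args.length === 0) {
    return ["current"];
  }
  return args.map((arg): CatalogYear => {
    if (arg === "current") {
      return "current";
    }
    // Assumption: any other argument must be a four-digit year like "2021".
    if (!/^\d{4}$/.test(arg)) {
      throw new Error(`Unrecognized catalog year: ${arg}`);
    }
    return Number(arg);
  });
}

// The arguments from `pnpm scrape 2021 2022 current`.
console.log(parseYearArgs(["2021", "2022", "current"]));
```

Rejecting unrecognized arguments early, rather than silently skipping them, keeps a mistyped year from producing a partial scrape.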

This will populate the results folder with parsed JSON files and the catalogCache folder with cached HTML.

A separate command scrapes a single academic catalog page from a link you provide. To use it, run:
pnpm scrape-link <link>

About

Scraping Northeastern's Academic Catalog for use in GraduateNU.
