Overview

spatula is a modern Python library for writing maintainable web scrapers.

Please note: the official repository has moved to Codeberg; GitHub is now used only as a mirror.

Source: https://codeberg.org/jpt/spatula/

Documentation: https://jamesturk.github.io/spatula/

Issues: https://codeberg.org/jpt/spatula/issues

Features

Page-oriented design: Encourages writing understandable & maintainable scrapers.
Not Just HTML: Provides built-in handlers for common data formats, including CSV, JSON, XML, PDF, and Excel. Or write your own.
Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
Fully Typed: Makes full use of Python 3 type annotations.
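
To illustrate what "page-oriented design" and "bring your own data model" mean in practice, here is a minimal stdlib-only sketch. It assumes JSON input passed in directly (no network). The class names (`Page`, `JsonListPage`, `EmployeeList`, `Employee`) and the `parse`/`process_page`/`process_item` methods are hypothetical stand-ins chosen for illustration; they are not spatula's actual API, which is described in the documentation linked above.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterator


# Hypothetical data model: a plain dataclass holding one scraped record.
@dataclass
class Employee:
    name: str
    position: str


class Page:
    """Hypothetical base class: one subclass per kind of page being scraped."""

    def __init__(self, content: str) -> None:
        self.content = content            # raw response body (supplied directly here)
        self.data = self.parse(content)   # format-specific parsing step

    def parse(self, content: str) -> Any:
        raise NotImplementedError

    def process_page(self) -> Any:
        raise NotImplementedError


class JsonListPage(Page):
    """Hypothetical: parses a JSON array and yields one result per element."""

    def parse(self, content: str) -> Any:
        return json.loads(content)

    def process_page(self) -> Iterator[Any]:
        for item in self.data:
            yield self.process_item(item)

    def process_item(self, item: Any) -> Any:
        raise NotImplementedError


class EmployeeList(JsonListPage):
    """Scraper logic for one specific page: map each JSON object to an Employee."""

    def process_item(self, item: dict) -> Employee:
        return Employee(name=item["name"], position=item["position"])


body = '[{"name": "Ada", "position": "Engineer"}, {"name": "Grace", "position": "Admiral"}]'
employees = list(EmployeeList(body).process_page())
print(employees)
```

Keeping one class per page type means each parsing rule lives in one place, so a layout change on one page should only require touching the class that handles it.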
