A modern Python library for writing maintainable web scrapers.

jamesturk/spatula

Overview

spatula is a modern Python library for writing maintainable web scrapers.

Please note: the official repository has moved to Codeberg; GitHub is now used only as a mirror.

Source: https://codeberg.org/jpt/spatula/

Documentation: https://jamesturk.github.io/spatula/

Issues: https://codeberg.org/jpt/spatula/issues

Features

  • Page-oriented design: Encourages writing understandable & maintainable scrapers.
  • Not Just HTML: Provides built-in handlers for common data formats including CSV, JSON, XML, PDF, and Excel. Or write your own.
  • Fast HTML parsing: Uses lxml.html for fast, consistent, and reliable parsing of HTML.
  • Flexible Data Model Support: Compatible with dataclasses, attrs, pydantic, or bring your own data model classes for storing & validating your scraped data.
  • CLI Tools: Offers several CLI utilities that help streamline the development & testing cycle.
  • Fully Typed: Makes full use of Python 3 type annotations.
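The page-oriented idea behind these features can be illustrated without spatula itself. The following is a minimal, standard-library-only sketch. It assumes a hypothetical `CsvListPage` base class, a hypothetical `scrape()` method and a hypothetical `Employee` record, none of which are spatula's actual API (see the documentation for the real classes). Each page class owns one source and a small `process_item` method that turns one raw CSV row into a dataclass instance.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Employee:
    """Hypothetical data model for one scraped record."""
    name: str
    title: str


class CsvListPage:
    """Hypothetical base class (NOT spatula's API): one page yields many items."""

    def __init__(self, content: str) -> None:
        # A real scraper would fetch this content from a URL; here it is
        # passed in directly so the sketch runs offline.
        self.content = content

    def process_item(self, row: dict) -> object:
        # Subclasses implement the page-specific extraction logic.
        raise NotImplementedError

    def scrape(self) -> Iterator[object]:
        # Parse the whole page once, then hand each row to process_item.
        for row in csv.DictReader(io.StringIO(self.content)):
            yield self.process_item(row)


class EmployeeList(CsvListPage):
    def process_item(self, row: dict) -> Employee:
        # Strip stray whitespace and map CSV columns onto the data model.
        return Employee(name=row["Name"].strip(), title=row["Title"].strip())


SAMPLE = "Name,Title\nAda Lovelace,Analyst\nGrace Hopper,Admiral\n"

if __name__ == "__main__":
    for employee in EmployeeList(SAMPLE).scrape():
        print(employee)
```

Because the extraction logic is isolated in `process_item`, it can be tested against a fixed sample string such as `SAMPLE` without any network access.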
