# Anyparser LangChain: Seamless Integration of Anyparser with LangChain

https://anyparser.com

**Integrate Anyparser's content extraction capabilities with LangChain for AI workflows.** This integration package lets you use Anyparser's document processing and data extraction features directly within your LangChain applications, making it simpler to build AI pipelines on top of parsed documents.

## Installation

```bash
pip install anyparser-langchain
```

## Anyparser LangChain Examples

The `examples` directory contains scripts demonstrating different ways to use the Anyparser LangChain integration. Run them from the repository root:

```bash
python examples/01_single_file_json.py
python examples/02_single_file_markdown.py
python examples/03_multiple_files_json.py
python examples/04_multiple_files_markdown.py
python examples/05_load_folder.py
python examples/06_ocr_markdown.py
python examples/07_ocr_json.py
python examples/08_crawler.py
```
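
To run all of the examples in sequence, a small driver script can discover the numbered scripts and execute them one by one. This is a hypothetical sketch: `find_example_scripts` and `run_all` are not part of the package, and it assumes the scripts sit in `examples/` and follow the two-digit `NN_name.py` naming shown above.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List

# Matches numbered example scripts such as "01_single_file_json.py".
EXAMPLE_PATTERN = re.compile(r"^\d{2}_.+\.py$")


def find_example_scripts(directory: str = "examples") -> List[Path]:
    """Return the numbered example scripts in `directory`, sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and EXAMPLE_PATTERN.match(path.name)
    )


def run_all(directory: str = "examples") -> int:
    """Run each example with the current interpreter; return the number that failed."""
    failures = 0
    for script in find_example_scripts(directory):
        print(f"--- {script.name} ---")
        result = subprocess.run([sys.executable, str(script)])
        if result.returncode != 0:
            failures += 1
    return failures
```

Placed in a file at the repository root, `sys.exit(1 if run_all() else 0)` runs every example and exits non-zero if any of them failed. The zero-padded prefixes keep lexicographic order identical to numeric order, which is why a plain sort is enough.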

## Setup

Before running the examples, make sure to set your Anyparser API credentials as environment variables:

```bash
export ANYPARSER_API_KEY="your-api-key"
export ANYPARSER_API_URL="https://anyparserapi.com"
```
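
In Python, a script can read these variables up front and fail early with a clear message when the key is missing. This is a minimal sketch: `AnyparserConfig` and `load_anyparser_config` are hypothetical helpers, not part of `anyparser-langchain`, and falling back to `https://anyparserapi.com` when `ANYPARSER_API_URL` is unset is an assumption based on the value shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Taken from the export example above; using it as a fallback is an assumption.
DEFAULT_API_URL = "https://anyparserapi.com"


@dataclass(frozen=True)
class AnyparserConfig:
    """Hypothetical container for the two settings the examples need."""

    api_key: str
    api_url: str


def load_anyparser_config(env: Optional[Mapping[str, str]] = None) -> AnyparserConfig:
    """Read Anyparser settings from os.environ, or from `env` when given."""
    source = os.environ if env is None else env
    api_key = source.get("ANYPARSER_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "ANYPARSER_API_KEY is not set; export it before running the examples."
        )
    # Fall back when the URL is unset or empty; drop a trailing slash for consistency.
    api_url = (source.get("ANYPARSER_API_URL") or DEFAULT_API_URL).strip().rstrip("/")
    return AnyparserConfig(api_key=api_key, api_url=api_url)
```

Calling `load_anyparser_config()` at the top of a script surfaces a missing key immediately rather than partway through a run.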

## Examples

### 1. Single File Processing
- `01_single_file_json.py`: Process a single file with JSON output
- `02_single_file_markdown.py`: Process a single file with Markdown output

### 2. Multiple File Processing
- `03_multiple_files_json.py`: Process multiple files with JSON output
- `04_multiple_files_markdown.py`: Process multiple files with Markdown output
- `05_load_folder.py`: Load and process files from a folder (up to 5 files)
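
The folder example's five-file cap can be pictured as a deterministic selection step that runs before anything is sent for parsing. This is only a sketch: `collect_folder_files` is a hypothetical helper, and whether the limit is enforced by the example script, the loader, or the API is not stated here.

```python
from pathlib import Path
from typing import List

MAX_FILES = 5  # The folder example processes at most five files.


def collect_folder_files(folder: str, limit: int = MAX_FILES) -> List[Path]:
    """Return up to `limit` regular files from `folder`, sorted by name.

    Sorting makes the selection reproducible; hidden files and
    subdirectories are skipped.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    files = sorted(
        p for p in root.iterdir() if p.is_file() and not p.name.startswith(".")
    )
    if len(files) > limit:
        print(f"Found {len(files)} files; only the first {limit} will be processed.")
    return files[:limit]
```

Sorting before truncating means repeated runs over the same folder always pick the same five files.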

### 3. OCR Processing
- `06_ocr_markdown.py`: Process images/scans with OCR (Markdown output)
- `07_ocr_json.py`: Process images/scans with OCR (JSON output)

### 4. Web Crawling
- `08_crawler_basic.py`: Basic web crawling with essential settings

## Features Demonstrated

### Document Processing
- Different output formats (Markdown, JSON)
- Multiple file handling
- Folder processing
- Metadata handling
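
LangChain document loaders return `Document` objects that carry extracted text in `page_content` and per-file details in `metadata`. The sketch below shows one way parsed output in either format could be wrapped that way. It rests on assumptions: the input shape `{"content": ..., "metadata": {...}}` and the `to_documents` helper are hypothetical and do not describe how `anyparser-langchain` actually builds its documents. It uses `langchain_core.documents.Document` when that package is installed and a minimal stand-in otherwise.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List

try:
    # Use LangChain's Document class when it is installed.
    from langchain_core.documents import Document
except ImportError:
    @dataclass
    class Document:  # Minimal stand-in with the same two fields.
        page_content: str
        metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(results: Iterable[Dict[str, Any]], output_format: str) -> List[Document]:
    """Wrap parsed results in Documents, keeping per-file metadata.

    Each result is assumed to look like {"content": ..., "metadata": {...}};
    this shape is hypothetical, not Anyparser's documented response.
    """
    if output_format not in ("markdown", "json"):
        raise ValueError(f"unsupported output format: {output_format!r}")
    docs = []
    for result in results:
        content = result.get("content", "")
        if not isinstance(content, str):
            # page_content must be a string, so structured output is serialized.
            if output_format == "json":
                content = json.dumps(content, ensure_ascii=False)
            else:
                content = str(content)
        metadata = dict(result.get("metadata", {}))
        metadata["output_format"] = output_format  # Record which format produced it.
        docs.append(Document(page_content=content, metadata=metadata))
    return docs
```

Serializing JSON output to a string keeps `page_content` usable by text splitters and embedding models; the structured original can be recovered with `json.loads`.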

### OCR Capabilities
- Language support (ISO 639-2 codes)
- OCR presets (fast, balanced, scan)
- Image and table extraction
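
Because OCR languages are given as ISO 639-2 codes (three letters, for example `eng` for English and `deu` for German) and the preset must be one of the three listed, a script can check both before sending a request. This is a hypothetical pre-flight check: the keys `ocr_language` and `ocr_preset` are assumptions, so match them to the options actually used in `06_ocr_markdown.py` and `07_ocr_json.py`.

```python
import re
from typing import Dict, Iterable, List

OCR_PRESETS = ("fast", "balanced", "scan")
# Shape check only: three lowercase ASCII letters. This does not confirm
# that a code is actually registered in ISO 639-2.
ISO_639_2_PATTERN = re.compile(r"[a-z]{3}")


def build_ocr_options(languages: Iterable[str], preset: str) -> Dict[str, object]:
    """Validate OCR settings and return them as a dict (hypothetical keys)."""
    codes: List[str] = [code.strip().lower() for code in languages]
    if not codes:
        raise ValueError("at least one OCR language is required")
    bad = [code for code in codes if not ISO_639_2_PATTERN.fullmatch(code)]
    if bad:
        raise ValueError(f"not ISO 639-2 style codes: {bad}")
    if preset not in OCR_PRESETS:
        raise ValueError(f"preset must be one of {OCR_PRESETS}, got {preset!r}")
    return {"ocr_language": codes, "ocr_preset": preset}
```

For example, `build_ocr_options(["eng", "fra"], "scan")` returns `{"ocr_language": ["eng", "fra"], "ocr_preset": "scan"}`, while a two-letter ISO 639-1 code such as `"en"` is rejected.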

### Web Crawling
- Basic crawling with depth and scope control
- Advanced URL and content filtering
- Crawling strategies (BFS, LIFO)
- Rate limiting and robots.txt compliance
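
The two strategies differ in which discovered URL is visited next: BFS takes the oldest one (a FIFO queue) and so visits pages level by level, while LIFO takes the newest one (a stack) and follows one branch deep before backtracking. The offline sketch below illustrates those semantics over an in-memory link graph, together with a depth limit, a same-host scope check, and a robots.txt check using Python's standard `urllib.robotparser`. It is not Anyparser's crawler; `crawl_order` and its parameters are hypothetical, and rate limiting is omitted because nothing is fetched.

```python
from collections import deque
from typing import Dict, List, Optional
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def crawl_order(
    links: Dict[str, List[str]],
    start: str,
    strategy: str = "bfs",
    max_depth: int = 2,
    robots: Optional[RobotFileParser] = None,
    user_agent: str = "*",
) -> List[str]:
    """Return the order in which pages would be visited.

    `links` maps each URL to the URLs it links to (an offline stand-in for fetching).
    """
    if strategy not in ("bfs", "lifo"):
        raise ValueError("strategy must be 'bfs' or 'lifo'")
    host = urlparse(start).netloc
    frontier = deque([(start, 0)])
    seen = {start}
    visited: List[str] = []
    while frontier:
        # BFS pops the oldest entry (FIFO); LIFO pops the newest (stack).
        url, depth = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if robots is not None and not robots.can_fetch(user_agent, url):
            continue  # Disallowed by robots.txt: skip without visiting.
        visited.append(url)
        if depth >= max_depth:
            continue  # Depth limit reached: do not expand this page's links.
        for link in links.get(url, []):
            # Scope control: stay on the start URL's host and avoid revisits.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return visited
```

With a root page that links to `/a` and `/b`, each of which links one level deeper, `bfs` visits `/`, `/a`, `/b`, `/a/1`, `/b/1`, while `lifo` visits `/`, `/b`, `/b/1`, `/a`, `/a/1`.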

## Notes

- All examples use async/await, so I/O-bound API calls do not block one another
- All examples include error handling
- Each example includes detailed comments explaining the options used
- OCR examples support multiple languages
- Crawler examples demonstrate various filtering and control options
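
The async/await and error-handling notes combine naturally when several files are processed at once: requests can be in flight together, and one failure should not hide the results of the others. The sketch below shows that pattern with `asyncio.gather(..., return_exceptions=True)` and a semaphore to cap concurrency. `parse_file` is a hypothetical stand-in coroutine that simulates a request with `asyncio.sleep`, not an Anyparser call, and the default cap of 3 is arbitrary.

```python
import asyncio
from typing import Dict, List, Union


async def parse_file(path: str) -> str:
    """Hypothetical stand-in for an async Anyparser request."""
    await asyncio.sleep(0.01)  # Simulate network latency.
    if path.endswith(".bad"):
        raise ValueError(f"could not parse {path}")
    return f"parsed:{path}"


async def parse_many(
    paths: List[str], max_concurrency: int = 3
) -> Dict[str, Union[str, BaseException]]:
    """Parse files concurrently, capping in-flight requests; collect errors per file."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> str:
        async with semaphore:  # At most max_concurrency requests at once.
            return await parse_file(path)

    # return_exceptions=True returns failures as results instead of raising the first one.
    results = await asyncio.gather(*(guarded(p) for p in paths), return_exceptions=True)
    return dict(zip(paths, results))
```

`asyncio.run(parse_many(["a.pdf", "b.bad"]))` returns a dict mapping `"a.pdf"` to its parsed text and `"b.bad"` to the `ValueError` it raised, so the caller can report failures file by file.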

## License

Apache-2.0