This project is an efficient web scraper built with Python in a Jupyter Notebook. It extracts product data from the Sigma website with a clean code structure and minimal resource usage, making it a solid choice for customizable scraping tasks.
Key features:
- Fast Page Parsing: uses `requests` for HTTP calls and `BeautifulSoup` for lightweight HTML parsing.
- Optimized Looping: loops through pages and products without redundant work.
- Clean Data Handling: stores scraped data in a structured pandas `DataFrame`.
- Easy to Modify: well-commented, modular code that is easy to customize (e.g., for different product categories).
- Export Ready: writes the data directly to a CSV file for further analysis or integration.
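The "Optimized Looping" idea can be sketched in plain Python as a generator that walks the result pages in order, stops at the first empty page, and skips any product whose link it has already collected. This is a minimal sketch, not the notebook's actual code: `iter_unique_products` and `fetch_page` (page number to list of product dicts) are hypothetical names, and the `max_pages` cap is an assumed safety limit.

```python
from typing import Callable, Dict, Iterator, List

# One product record, keyed by the output column names.
Product = Dict[str, str]


def iter_unique_products(
    fetch_page: Callable[[int], List[Product]],
    max_pages: int = 50,
) -> Iterator[Product]:
    """Yield products page by page, skipping links that were already seen.

    `fetch_page` (hypothetical) returns the products on page `n`,
    or an empty list once there are no more pages.
    """
    seen_links = set()
    for page in range(1, max_pages + 1):
        products = fetch_page(page)
        if not products:
            break  # first empty page: no more results
        for product in products:
            link = product["Product Link"]
            if link in seen_links:
                continue  # already collected from an earlier page
            seen_links.add(link)
            yield product


# Toy data standing in for two fetched pages; page 2 repeats product "A".
pages = {
    1: [{"Product Name": "A", "Product Link": "/a", "Product Description": ""}],
    2: [
        {"Product Name": "A", "Product Link": "/a", "Product Description": ""},
        {"Product Name": "B", "Product Link": "/b", "Product Description": ""},
    ],
}
products = list(iter_unique_products(lambda n: pages.get(n, [])))
print([p["Product Name"] for p in products])  # ['A', 'B']
```

Because it is a generator, products are processed as they arrive, and the `seen_links` set keeps each duplicate check O(1).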
The notebook extracts the following for each product:
- Product Name
- Product Link
- Product Description
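A sketch of how these three fields might be pulled out of a listing page with BeautifulSoup, assuming each product sits in its own card element. The CSS selectors (`div.product-card`, `a.product-name`, `p.product-description`) and the `parse_products` name are hypothetical placeholders, not the real Sigma markup or the notebook's code; adjust them after inspecting the live page. Relative links are resolved against the site root with `urljoin`, and the sample HTML reuses the Aldrich® Chemistry row from the sample output.

```python
from typing import Dict, List
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.sigmaaldrich.com"

# Hypothetical selectors: replace them after inspecting the real page markup.
CARD_SELECTOR = "div.product-card"
NAME_SELECTOR = "a.product-name"
DESCRIPTION_SELECTOR = "p.product-description"


def parse_products(html: str, base_url: str = BASE_URL) -> List[Dict[str, str]]:
    """Return one dict with name, link and description per product card."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select(CARD_SELECTOR):
        name_tag = card.select_one(NAME_SELECTOR)
        if name_tag is None:
            continue  # not a product card we understand; skip it
        desc_tag = card.select_one(DESCRIPTION_SELECTOR)
        products.append({
            "Product Name": name_tag.get_text(strip=True),
            # Turn relative hrefs like "/US/en/products/aldrich" into absolute URLs.
            "Product Link": urljoin(base_url, name_tag.get("href", "")),
            "Product Description": desc_tag.get_text(strip=True) if desc_tag else "",
        })
    return products


sample = """
<div class="product-card">
  <a class="product-name" href="/US/en/products/aldrich">Aldrich® Chemistry</a>
  <p class="product-description">Chemistry Products</p>
</div>
"""
print(parse_products(sample))
```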
The notebook uses the following libraries:
- `requests`: for sending HTTP requests
- `BeautifulSoup`: for HTML parsing
- `pandas`: for tabular data processing
- `csv`: for exporting the results
The scraper works in five steps:
- Send HTTP Request to the target Sigma product category page.
- Parse HTML Content using BeautifulSoup.
- Loop Through Product Listings to extract name, link, and description.
- Store Data in a pandas DataFrame.
- Export Results to a CSV file.
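The five steps map naturally onto a small driver function. This is a sketch under stated assumptions, not the notebook's code: the parsing logic is supplied as a `parse` callable (HTML string to a list of dicts keyed by the output column names), and `fetch_html`, `scrape_to_csv`, the placeholder User-Agent string and the 30-second timeout are illustrative choices.

```python
from typing import Callable, Dict, List

import pandas as pd
import requests

Record = Dict[str, str]
COLUMNS = ["Product Name", "Product Link", "Product Description"]


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Step 1: download the category page, failing loudly on HTTP errors."""
    # Placeholder User-Agent (an assumption); identify your scraper honestly.
    headers = {"User-Agent": "sigma-scraper-example/0.1"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def scrape_to_csv(
    url: str,
    parse: Callable[[str], List[Record]],
    out_path: str = "sigma_products.csv",
    fetch: Callable[[str], str] = fetch_html,
) -> pd.DataFrame:
    """Run the whole pipeline and return the DataFrame that was exported."""
    html = fetch(url)                            # 1. send the HTTP request
    records = parse(html)                        # 2-3. parse HTML, loop over listings
    df = pd.DataFrame(records, columns=COLUMNS)  # 4. store in a DataFrame
    df.to_csv(out_path, index=False)             # 5. export to CSV
    return df
```

Passing `fetch` as a parameter keeps the network call swappable, so the pipeline can be exercised offline with a stub, e.g. `scrape_to_csv(url, parse=my_parser, fetch=lambda u: saved_html)`, where `my_parser` and `saved_html` are hypothetical.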
To run the scraper:
- Open the notebook `scraping_sigma_website.ipynb`.
- Run all cells in order.
- A CSV file named `sigma_products.csv` will be generated with the results.
Here's a sample of the actual output:
Product Name | Product Link | Product Description |
---|---|---|
Aldrich® Chemistry | https://www.sigmaaldrich.com/US/en/products/aldrich | Chemistry Products |
Supelco® Analytical | https://www.sigmaaldrich.com/US/en/products/supelco | Analytical Products |
The output is clean, structured, and ready for use in analysis or applications.
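Before analysis, the exported file can be loaded back with pandas and checked against the column layout of the sample above. A minimal sketch: `load_products` is a hypothetical helper, and treating a missing product name or a repeated link as an error is an assumed policy, not something the notebook is known to enforce.

```python
import pandas as pd

EXPECTED_COLUMNS = ["Product Name", "Product Link", "Product Description"]


def load_products(path: str = "sigma_products.csv") -> pd.DataFrame:
    """Load the exported CSV and run a few basic structure checks."""
    df = pd.read_csv(path)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    # Every row should have a product name...
    if df["Product Name"].isna().any():
        raise ValueError("Some rows have no product name")
    # ...and each product link should appear only once.
    duplicates = int(df["Product Link"].duplicated().sum())
    if duplicates:
        raise ValueError(f"{duplicates} duplicate product links found")
    return df
```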
This scraper is for educational purposes only. Please make sure you are authorized to scrape content from the Sigma website, and always respect its robots.txt and terms of use.
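Python's standard library can check robots.txt rules before any page is requested, using `urllib.robotparser.RobotFileParser`. In this sketch the helper names are hypothetical, the live check assumes robots.txt sits at the conventional site-root location, and the rules in the usage example are made up for illustration. They are not Sigma's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check `url` against robots.txt rules given as text (works offline)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_allowed_live(
    url: str,
    robots_url: str = "https://www.sigmaaldrich.com/robots.txt",
    user_agent: str = "*",
) -> bool:
    """Download the site's robots.txt, then check `url` against it."""
    parser = RobotFileParser(robots_url)
    parser.read()  # network call
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(rules, "https://www.sigmaaldrich.com/US/en/products/aldrich"))  # True
print(is_allowed(rules, "https://www.sigmaaldrich.com/private/page"))  # False
```

Calling `is_allowed_live` once per run, before the scraping loop starts, avoids hitting pages the site asks crawlers to skip.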
Open for academic and non-commercial use under the MIT License.