IceCat

The IceCat module pulls down a local copy of data from the http://icecat.biz/ open catalog. The module requires login credentials for the IceCat website. The basic catalog version is free and contains 500k products. The full catalog contains about 3 million products and is distributed with a paid subscription.

Requirements

Python 3.3 or above (64-bit for the full catalog import)
The requests, urllib3, xml2dict, and progressbar2 libraries
See requirements.txt in the source distribution for details
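
The interpreter requirement above can be checked before starting a long import. The following is a minimal sketch using only the standard library; the function names `interpreter_ok` and `is_64bit` are hypothetical helpers for illustration and are not part of the IceCat module.

```python
import sys


def interpreter_ok(min_version=(3, 3)):
    """Return True if the running Python is at least min_version."""
    return sys.version_info >= min_version


def is_64bit():
    """Return True on a 64-bit interpreter (sys.maxsize exceeds 2**32)."""
    return sys.maxsize > 2 ** 32


if __name__ == "__main__":
    if not interpreter_ok():
        print("Python 3.3 or above is required")
    elif not is_64bit():
        print("32-bit Python detected: the full catalog import needs 64-bit")
    else:
        print("Interpreter looks suitable for the full catalog import")
```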

Features

For each product, the category ID and manufacturer ID are resolved to their actual names
Product detail data can be added to the daily and full indexes, with flexible data fields
English-language data import
The output is a flat JSON file (nested lists are flattened)
Fast parallel download of the product XML files using threads
Source data files are preserved in the filesystem for reference
Flexible XML field mapping
Tested against the live IceCat web API
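
To illustrate what flattening nested data into a flat JSON record can look like, here is a minimal sketch. It is not the module's own code: the `flatten` function, the dot separator, and the index-based keys for list items are all assumptions made for this example, and the module's actual output keys may differ.

```python
import json


def flatten(value, prefix="", sep="."):
    """Flatten nested dicts and lists into a single-level dict.

    Keys of nested dicts and indexes of list items are joined with `sep`
    (a naming convention assumed for this sketch).
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = prefix + sep + str(key) if prefix else str(key)
            flat.update(flatten(child, name, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            name = prefix + sep + str(index) if prefix else str(index)
            flat.update(flatten(child, name, sep))
    else:
        # Leaf value: store it under the accumulated key
        flat[prefix] = value
    return flat


if __name__ == "__main__":
    # Hypothetical product record, for illustration only
    record = {"id": 1, "supplier": {"name": "Acme"}, "tags": ["a", "b"]}
    print(json.dumps(flatten(record), indent=2))
```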

Basic usage:

from IceCat import IceCat

# set up the temp data directory, output file name, and auth info
data_dir = '_daily_test_data/'
auth = ('icat_user', 'icat_passwd')
output_file = 'daily.json'

# specify additional product detail keys
detail_keys = ['ProductDescription[@LongDesc]',
               'ShortSummaryDescription',
               'LongSummaryDescription',
               'ProductDescription[@ShortDesc]']

# create the catalog instance.
# this will pull the reference files and the daily product index file
catalog = IceCat.IceCatCatalog(data_dir=data_dir, auth=auth)

# add product details
# this will download and parse the individual product XML for
# every item listed in the daily file
catalog.add_product_details_parallel(keys=detail_keys, connections=100)

# save the results to a JSON file
catalog.dump_to_file(output_file)
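
The `connections` argument in the usage above suggests a pool of concurrent downloads. As a rough illustration of the general technique (not the module's actual implementation), the sketch below runs a fetch function over many URLs with a thread pool from the standard library. The names `download_all` and `fetch` are hypothetical, and the fetch function is passed in so the sketch does not assume how IceCat requests are authenticated.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, connections=100):
    """Call fetch(url) for every URL using up to `connections` threads.

    Results are returned in the same order as `urls`. An exception raised
    by fetch is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=connections) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Stand-in fetch function for illustration; a real one would perform
    # an HTTP GET and return the response body.
    def fake_fetch(url):
        return "<xml source='{}'/>".format(url)

    pages = download_all(["product/1", "product/2"], fake_fetch, connections=2)
    print(pages)
```

With the requests library, the fetch function could be a small wrapper around `requests.get(url, auth=auth)` that returns the response content. Threads suit this workload because the time is spent waiting on the network rather than on computation.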

moonlitesolutions/pyIceCat

Folders and files

Latest commit

History

Repository files navigation

IceCat

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages