Open
Description
I have problems running the following commands in Python:
import ember
ember.create_vectorized_features("/data/ember2018/")
I have installed the dependencies and tried in Docker with lief versions 0.9.0 and 0.10.1, and I still get the same failure:
ember.create_vectorized_features("./ember/")
Vectorizing training set
  0%|                                                                                    | 0/900000 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 44, in vectorize_unpack
    return vectorize(*args)
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 31, in vectorize
    feature_vector = extractor.process_raw_features(raw_features)
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in process_raw_features
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 552, in <listcomp>
    feature_vectors = [fe.process_raw_features(raw_obj[fe.name]) for fe in self.features]
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/features.py", line 192, in process_raw_features
    entry_name_hashed = FeatureHasher(50, input_type="string").transform([raw_obj['entry']]).toarray()[0]
  File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/sklearn/feature_extraction/_hash.py", line 170, in transform
    raise ValueError(
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 75, in create_vectorized_features
  File "/opt/conda/lib/python3.8/site-packages/ember-0.1.0-py3.8.egg/ember/__init__.py", line 60, in vectorize_subset
  File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
ValueError: Samples can not be a single string. The input must be an iterable over iterables of strings.
>>>
It seems from the error message that the input is not in the format the vectorizer expects?
Is there a fix for this?
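For reference, here is a minimal sketch of what the error message seems to describe, reproduced outside of ember. It assumes the failing call at features.py line 192 passes a plain string as `raw_obj['entry']`; the value `"entry_point_name"` below is a made-up placeholder, not taken from the dataset. Newer scikit-learn releases appear to reject a sample that is a bare string, so wrapping the string in its own list is one possible workaround. I have not verified this as a patch against ember itself.

```python
from sklearn.feature_extraction import FeatureHasher

# Hypothetical stand-in for raw_obj['entry'] in ember/features.py
entry = "entry_point_name"

hasher = FeatureHasher(n_features=50, input_type="string")

# Same shape of call as in the traceback: one sample that is a bare string.
# Recent scikit-learn versions raise ValueError here; older ones accept it.
try:
    hasher.transform([entry])
    single_string_rejected = False
except ValueError:
    single_string_rejected = True

# Possible workaround: each sample is itself an iterable of strings,
# so the single entry name becomes a one-token sample.
vec = hasher.transform([[entry]]).toarray()[0]

# Alternative (assumption): older versions iterated over the characters of a
# bare string, so list(entry) may be closer to the previous behaviour.
vec_chars = hasher.transform([list(entry)]).toarray()[0]

print(single_string_rejected, vec.shape, vec_chars.shape)
```

Note that the two variants produce different feature vectors, so features built with one may not match a model trained on features built with the other. Installing an older scikit-learn release that still accepts the original call could be another way around the error.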