All data published to RabbitMQ is persistent, and the queue must be durable.
- MIN_VALUE - Define the minimum value generated by float or integer.
- MAX_VALUE - Define the maximum value generated by float or integer.
- SENZING_DATA_TEMPLATE - (Already possible on original repository, for documentation) Define a new template for the random JSON generator. For example: '{"SENSOR":"Temp1","DATE":"date_now", "VALUE":"float"}'.
- SENZING_RABBITMQ_EXCHANGE - Define the RabbitMQ Exchange to which you want to send messages.
- SENZING_RABBITMQ_ROUTINGKEY - Define the RoutingKey used to send messages.
- If you defined a SENZING_RABBITMQ_EXCHANGE:
  - If you do not define a SENZING_RABBITMQ_ROUTINGKEY, the messages will be sent to your Exchange with an empty RoutingKey ('').
  - If you define a SENZING_RABBITMQ_ROUTINGKEY, the messages will be sent to your Exchange with your RoutingKey.
- If you did not define a SENZING_RABBITMQ_EXCHANGE but you defined a SENZING_RABBITMQ_QUEUE:
  - A durable Queue named SENZING_RABBITMQ_QUEUE will be created if it does not exist.
  - Messages will be sent to the empty Exchange ('') with your Queue name as the RoutingKey, so messages will arrive in your (newly) defined Queue.
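The Exchange, RoutingKey, and Queue rules above can be sketched as a small decision function. This is a minimal illustration, not code from mock-data-generator.py: the function name `resolve_publish_target` and the error raised when neither value is set are assumptions made for the example.

```python
from typing import Optional, Tuple


def resolve_publish_target(
    exchange: Optional[str],
    routing_key: Optional[str],
    queue: Optional[str],
) -> Tuple[str, str, Optional[str]]:
    """Return (exchange, routing_key, queue_to_declare) for a publish call.

    Hypothetical helper that mirrors the documented rules; the third element
    is the name of a durable Queue to create, or None when no Queue is needed.
    """
    if exchange:
        # Exchange defined: use the given RoutingKey, or '' when none is set.
        return exchange, routing_key or "", None
    if queue:
        # No Exchange: publish to the empty ('') Exchange and use the Queue
        # name as the RoutingKey. The caller declares the Queue as durable.
        return "", queue, queue
    # Assumption: treat "nothing configured" as an error in this sketch.
    raise ValueError("Define SENZING_RABBITMQ_EXCHANGE or SENZING_RABBITMQ_QUEUE")


if __name__ == "__main__":
    print(resolve_publish_target("sensors", None, None))
    print(resolve_publish_target(None, None, "senzing-rabbitmq-queue"))
```

Returning the Queue name separately keeps the "create a durable Queue if it does not exist" step explicit, since it only applies when no Exchange is defined.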
- address_city
- address_state
- address_street
- address_zipcode
- date_of_birth - Between 1950 and 2018
- first_name
- last_name
- gender
- date_now -- Current timestamp
- integer -- Integer value between MIN_VALUE and MAX_VALUE (see above)
- float -- Float value between MIN_VALUE and MAX_VALUE (see above)
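As a rough illustration of how a template such as the SENZING_DATA_TEMPLATE example could be filled in, the sketch below handles only the date_now, integer, and float keywords. The function name `render_template`, the ISO 8601 timestamp format, and the default bounds are assumptions; the actual script may differ.

```python
import json
import random
from datetime import datetime, timezone


def render_template(template: str, rng: random.Random,
                    min_value: float = 0, max_value: float = 100) -> str:
    """Replace keyword values in a JSON template with generated values.

    Hypothetical sketch: only three of the documented keywords are handled,
    and any other value (for example "Temp1") is passed through unchanged.
    """
    generators = {
        # Assumption: the timestamp is rendered as ISO 8601 in UTC.
        "date_now": lambda: datetime.now(timezone.utc).isoformat(),
        "integer": lambda: rng.randint(int(min_value), int(max_value)),
        "float": lambda: rng.uniform(min_value, max_value),
    }
    record = json.loads(template)
    for key, value in record.items():
        if isinstance(value, str) and value in generators:
            record[key] = generators[value]()
    return json.dumps(record)


if __name__ == "__main__":
    template = '{"SENSOR":"Temp1","DATE":"date_now", "VALUE":"float"}'
    # A fixed seed makes the generated VALUE repeatable between runs.
    print(render_template(template, random.Random(1), min_value=10, max_value=20))
```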
The mock-data-generator.py python script produces mock data for Senzing.
The senzing/mock-data-generator docker image produces mock data for Senzing for use in
docker formations (e.g. docker-compose, kubernetes).
mock-data-generator.py has a number of subcommands for performing different types of Senzing mock data creation.
To see all of the subcommands, run:
$ ./mock-data-generator.py --help
usage: mock-data-generator.py [-h]
{version,random-to-stdout,random-to-kafka,url-to-stdout,url-to-kafka}
...
Generate mock data from a URL-addressable file or templated random data. For
more information, see https://github.com/Senzing/mock-data-generator
positional arguments:
{version,random-to-stdout,random-to-kafka,url-to-stdout,url-to-kafka}
Subcommands (SENZING_SUBCOMMAND):
version Print version of mock-data-generator.py.
random-to-stdout Send random data to STDOUT
random-to-kafka Send random data to Kafka
random-to-rabbitmq Send random data to RabbitMQ
url-to-stdout Send HTTP or file data to STDOUT
url-to-kafka Send HTTP or file data to Kafka
url-to-rabbitmq Send HTTP or file data to RabbitMQ
optional arguments:
-h, --help show this help message and exit
To see the options for a subcommand, run commands like:
./mock-data-generator.py random-to-stdout --help
See Clone repository.
- YUM installs - For Red Hat, CentOS, openSUSE, and others.
    sudo xargs yum -y install < ${GIT_REPOSITORY_DIR}/src/yum-packages.txt
- APT installs - For Debian, Ubuntu, and others.
    sudo xargs apt -y install < ${GIT_REPOSITORY_DIR}/src/apt-packages.txt
- PIP installs
    sudo pip install -r ${GIT_REPOSITORY_DIR}/requirements.txt
- Show help. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py --help
    ./mock-data-generator.py random-to-stdout --help
- Show random file output. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout
- Show random file output with 1 record per second. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --records-per-second 1
- Show repeatable "random" output using random seed. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --random-seed 1
- Show generating 10 (repeatable) random records at the rate of 2 per second. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --random-seed 22 \
      --record-min 1 \
      --record-max 10 \
      --records-per-second 2
- Show sending output to a file of JSON-lines. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --random-seed 22 \
      --record-min 1 \
      --record-max 10 \
      --records-per-second 2 \
      > output-file.jsonlines
- Show reading 5 records from URL-based file at the rate of 3 per second. Example:
    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py url-to-stdout \
      --input-url https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json \
      --record-min 1 \
      --record-max 5 \
      --records-per-second 3
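The --record-min, --record-max, and --records-per-second options used in the examples above can be pictured as a window over a stream of lines followed by a pause between records. The sketch below is an assumption about the behaviour, not the script's implementation: the helper names `select_records` and `throttle` are hypothetical, record numbers are treated as 1-based and inclusive, and pacing uses a plain sleep.

```python
import itertools
import time
from typing import Iterable, Iterator


def select_records(lines: Iterable[str], record_min: int = 1,
                   record_max: int = 0) -> Iterator[str]:
    """Yield lines numbered record_min..record_max (1-based, inclusive).

    A record_max of 0 means no upper limit.
    """
    stop = record_max if record_max > 0 else None
    return itertools.islice(lines, max(record_min, 1) - 1, stop)


def throttle(records: Iterable[str],
             records_per_second: float = 0) -> Iterator[str]:
    """Yield records, sleeping between them; 0 disables throttling."""
    interval = 1.0 / records_per_second if records_per_second > 0 else 0.0
    for record in records:
        yield record
        if interval:
            time.sleep(interval)


if __name__ == "__main__":
    source = (f'{{"RECORD_ID": {n}}}' for n in itertools.count(1))
    # Records 1 through 5 at roughly 3 per second.
    for line in throttle(select_records(source, 1, 5), records_per_second=3):
        print(line)
```

Because both helpers are lazy iterators, the same pipeline works for an unbounded random generator and for a file read line by line.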
This repository and demonstration require 6 GB free disk space.
Budget 40 minutes to get the demonstration up-and-running, depending on CPU and network speeds.
This repository assumes a working knowledge of:
- SENZING_DATA_SOURCE - If a JSON line does not have the DATA_SOURCE key/value, this value is inserted. No default.
- SENZING_DEBUG - Enable debug information. Values: 0=no debug; 1=debug. Default: 0.
- SENZING_ENTITY_TYPE - If a JSON line does not have the ENTITY_TYPE key/value, this value is inserted. No default.
- SENZING_INPUT_URL - URL of source file. Default: https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
- SENZING_KAFKA_BOOTSTRAP_SERVER - Hostname and port of Kafka server. Default: "localhost"
- SENZING_KAFKA_TOPIC - Kafka topic. Default: "senzing-kafka-topic"
- SENZING_RABBITMQ_HOST - Host name of the RabbitMQ exchange. Default: "localhost:5672"
- SENZING_RABBITMQ_PASSWORD - The password for the RabbitMQ queue. Default: "bitnami"
- SENZING_RABBITMQ_QUEUE - Name of the RabbitMQ queue to create/connect with. Default: "senzing-rabbitmq-queue"
- SENZING_RABBITMQ_USERNAME - The username for the RabbitMQ queue. Default: "user"
- SENZING_RANDOM_SEED - Identify seed for random number generator. Value of 0 uses system clock. Values greater than 0 give repeatable results. Default: "0"
- SENZING_RECORD_MAX - Identify highest record number to generate. Value of 0 means no maximum. Default: "0"
- SENZING_RECORD_MIN - Identify lowest record number to generate. Default: "1"
- SENZING_RECORD_MONITOR - Write a log record every N mock records. Default: "10000"
- SENZING_RECORDS_PER_SECOND - Throttle output to a specified records per second. Value of 0 means no throttling. Default: "0"
- SENZING_SUBCOMMAND - Identify the subcommand to be run. See mock-data-generator.py --help for complete list. No default.
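To make the SENZING_DATA_SOURCE and SENZING_ENTITY_TYPE descriptions in the list above concrete, here is a minimal sketch of inserting a configured value only when the JSON line lacks the key. The function name `add_missing_defaults` is hypothetical, and passing the environment as a mapping is a choice made for the example so that it can be exercised without touching the real environment.

```python
import json
import os
from typing import Mapping


def add_missing_defaults(json_line: str, env: Mapping[str, str]) -> str:
    """Insert DATA_SOURCE / ENTITY_TYPE when absent and a value is configured.

    Hypothetical sketch: existing keys are never overwritten, and nothing is
    inserted when the variable is unset (both variables have no default).
    """
    record = json.loads(json_line)
    for key, variable in (("DATA_SOURCE", "SENZING_DATA_SOURCE"),
                          ("ENTITY_TYPE", "SENZING_ENTITY_TYPE")):
        value = env.get(variable)
        if value and key not in record:
            record[key] = value
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    line = '{"NAME_FIRST": "Pat", "DATA_SOURCE": "CUSTOMERS"}'
    # os.environ behaves like a mapping, so it can be passed directly.
    print(add_missing_defaults(line, os.environ))
```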
- To determine which configuration parameters are used for each <subcommand>, run:
    ./mock-data-generator.py <subcommand> --help
- ✏️ Set environment variables. Example:
    export SENZING_SUBCOMMAND=random-to-stdout
    export SENZING_RANDOM_SEED=0
    export SENZING_RECORD_MAX=10
    export SENZING_RECORD_MIN=1
    export SENZING_RECORDS_PER_SECOND=0
- Run the docker container. Example:
    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_RANDOM_SEED="${SENZING_RANDOM_SEED}" \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --rm \
      --tty \
      senzing/mock-data-generator
- ✏️ Determine docker network. Example:
    sudo docker network ls
    # Choose value from NAME column of docker network ls
    export SENZING_NETWORK=nameofthe_network
- ✏️ Set environment variables. Example:
    export SENZING_SUBCOMMAND=random-to-kafka
    export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
    export SENZING_KAFKA_TOPIC="senzing-kafka-topic"
    export SENZING_NETWORK=senzingdockercomposestreamloaderdemo_backend
    export SENZING_RANDOM_SEED=1
    export SENZING_RECORD_MAX=220
    export SENZING_RECORD_MIN=210
    export SENZING_RECORDS_PER_SECOND=1
- Run the docker container. Example:
    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_KAFKA_BOOTSTRAP_SERVER=${SENZING_KAFKA_BOOTSTRAP_SERVER} \
      --env SENZING_KAFKA_TOPIC=${SENZING_KAFKA_TOPIC} \
      --env SENZING_RANDOM_SEED="${SENZING_RANDOM_SEED}" \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --net ${SENZING_NETWORK} \
      --rm \
      --tty \
      senzing/mock-data-generator
- ✏️ Set environment variables. Example:
    export SENZING_SUBCOMMAND=url-to-stdout
    export SENZING_INPUT_URL=https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
    export SENZING_RECORD_MAX=250
    export SENZING_RECORD_MIN=240
    export SENZING_RECORDS_PER_SECOND=0
- Run the docker container. Example:
    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_INPUT_URL=${SENZING_INPUT_URL} \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --rm \
      --tty \
      senzing/mock-data-generator
- ✏️ Determine docker network. Example:
    sudo docker network ls
    # Choose value from NAME column of docker network ls
    export SENZING_NETWORK=nameofthe_network
- ✏️ Set environment variables. Example:
    export SENZING_SUBCOMMAND=url-to-kafka
    export SENZING_INPUT_URL=https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
    export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
    export SENZING_KAFKA_TOPIC="senzing-kafka-topic"
    export SENZING_NETWORK=senzingdockercomposestreamloaderdemo_backend
    export SENZING_RECORD_MAX=300
    export SENZING_RECORD_MIN=260
    export SENZING_RECORD_MONITOR=10
    export SENZING_RECORDS_PER_SECOND=10
- Run the docker container. Example:
    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_INPUT_URL=${SENZING_INPUT_URL} \
      --env SENZING_KAFKA_BOOTSTRAP_SERVER=${SENZING_KAFKA_BOOTSTRAP_SERVER} \
      --env SENZING_KAFKA_TOPIC=${SENZING_KAFKA_TOPIC} \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORD_MONITOR="${SENZING_RECORD_MONITOR}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --net ${SENZING_NETWORK} \
      --rm \
      --tty \
      senzing/mock-data-generator
The following software programs need to be installed:
- Set these environment variable values:
    export GIT_ACCOUNT=senzing
    export GIT_REPOSITORY=mock-data-generator
- Follow steps in clone-repository to install the Git repository.
- After the repository has been cloned, be sure the following are set:
    export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
    export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
- Option #1 - Using docker command and GitHub.
    sudo docker build --tag senzing/mock-data-generator https://github.com/senzing/mock-data-generator.git
- Option #2 - Using docker command and local repository.
    cd ${GIT_REPOSITORY_DIR}
    sudo docker build --tag senzing/mock-data-generator .
- Option #3 - Using make command.
    cd ${GIT_REPOSITORY_DIR}
    sudo make docker-build
- Examples of use:
- See doc/errors.md.