
pcourbin-teaching/mock-data-generator

 
 

Differences from the original repository

All messages published to RabbitMQ are persistent, and the queue must be durable.

Configuration (Docker)

  • MIN_VALUE - Minimum value generated for the float and integer types.
  • MAX_VALUE - Maximum value generated for the float and integer types.
  • SENZING_DATA_TEMPLATE - (Already supported by the original repository; documented here for reference.) Template for the random JSON generator. For example: '{"SENSOR":"Temp1","DATE":"date_now", "VALUE":"float"}'.

For RabbitMQ:

  • SENZING_RABBITMQ_EXCHANGE - Define the RabbitMQ Exchange to which you want to send messages.
  • SENZING_RABBITMQ_ROUTINGKEY - Define the RoutingKey used to send messages.
Cases (see details in logs):
  • If you define SENZING_RABBITMQ_EXCHANGE:
    • If you do not define SENZING_RABBITMQ_ROUTINGKEY, messages are sent to your Exchange with an empty RoutingKey ('').
    • If you define SENZING_RABBITMQ_ROUTINGKEY, messages are sent to your Exchange with your RoutingKey.
  • If you do not define SENZING_RABBITMQ_EXCHANGE but define SENZING_RABBITMQ_QUEUE:
    • A durable Queue named by SENZING_RABBITMQ_QUEUE is created if it does not exist.
    • Messages are sent to the default Exchange ('') with the Queue name as the RoutingKey, so they arrive in your (new) Queue.
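
The cases above can be sketched as a small decision function. This is only an illustration, not the script's actual code: the names PublishTarget and resolve_publish_target are hypothetical, and raising an error when neither variable is set is an assumption (the source does not describe that case).

```python
from typing import NamedTuple, Optional


class PublishTarget(NamedTuple):
    exchange: str        # exchange to publish to ('' is the default exchange)
    routing_key: str     # routing key sent with each message
    declare_queue: bool  # True if a durable queue should be declared first


def resolve_publish_target(
    exchange: Optional[str],
    routing_key: Optional[str],
    queue: Optional[str],
) -> PublishTarget:
    """Hypothetical helper mirroring the cases listed above."""
    if exchange:
        # Exchange defined: publish to it, with the routing key or '' if unset.
        return PublishTarget(exchange, routing_key or "", False)
    if queue:
        # No exchange: use the default exchange and route by queue name.
        return PublishTarget("", queue, True)
    # Assumption: the source does not say what happens when neither is set.
    raise ValueError("set SENZING_RABBITMQ_EXCHANGE or SENZING_RABBITMQ_QUEUE")


print(resolve_publish_target("sensors", None, None))
print(resolve_publish_target(None, None, "senzing-rabbitmq-queue"))
```

With a client library such as pika (an assumption; this section does not name the client), a durable queue and persistent messages would typically correspond to queue_declare(durable=True) and a delivery_mode of 2 on each published message.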

Type of variables

From the original repository

  • address_city
  • address_state
  • address_street
  • address_zipcode
  • date_of_birth - Between 1950 and 2018
  • first_name
  • last_name
  • gender

New in this repository

  • date_now -- Current timestamp
  • integer -- Integer value between MIN_VALUE and MAX_VALUE (see above)
  • float -- Float value between MIN_VALUE and MAX_VALUE (see above)
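
To illustrate how a template such as the SENZING_DATA_TEMPLATE example might be expanded with the new types, here is a minimal sketch. It is not the repository's implementation: fill_template is a hypothetical name, the ISO 8601 timestamp format for date_now, the default bounds, and the inclusive treatment of MIN_VALUE and MAX_VALUE are all assumptions.

```python
import json
import random
from datetime import datetime, timezone


def fill_template(template: str, rng: random.Random,
                  min_value: float = 0, max_value: float = 100) -> dict:
    """Replace placeholder type names in a JSON template with generated values."""
    generators = {
        # Assumption: date_now is rendered as an ISO 8601 UTC timestamp.
        "date_now": lambda: datetime.now(timezone.utc).isoformat(),
        "integer": lambda: rng.randint(int(min_value), int(max_value)),
        "float": lambda: rng.uniform(min_value, max_value),
    }
    record = {}
    for key, value in json.loads(template).items():
        # Values that are not known type names (e.g. "Temp1") are kept as-is.
        if isinstance(value, str) and value in generators:
            record[key] = generators[value]()
        else:
            record[key] = value
    return record


template = '{"SENSOR":"Temp1","DATE":"date_now", "VALUE":"float"}'
print(json.dumps(fill_template(template, random.Random(1), 10, 20)))
```

Only the three new types are handled here; the types inherited from the original repository (first_name, address_city, and so on) are left out of the sketch.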

mock-data-generator

Overview

The mock-data-generator.py Python script produces mock data for Senzing. The senzing/mock-data-generator Docker image produces mock data for Senzing for use in Docker formations (e.g. docker-compose, Kubernetes).

mock-data-generator.py has a number of subcommands for performing different types of Senzing mock data creation.

To see all of the subcommands, run:

$ ./mock-data-generator.py --help
usage: mock-data-generator.py [-h]
                              {version,random-to-stdout,random-to-kafka,url-to-stdout,url-to-kafka}
                              ...

Generate mock data from a URL-addressable file or templated random data. For
more information, see https://github.com/Senzing/mock-data-generator

positional arguments:
  {version,random-to-stdout,random-to-kafka,url-to-stdout,url-to-kafka}
                        Subcommands (SENZING_SUBCOMMAND):
    version             Print version of mock-data-generator.py.
    random-to-stdout    Send random data to STDOUT
    random-to-kafka     Send random data to Kafka
    random-to-rabbitmq  Send random data to RabbitMQ
    url-to-stdout       Send HTTP or file data to STDOUT
    url-to-kafka        Send HTTP or file data to Kafka
    url-to-rabbitmq     Send HTTP or file data to RabbitMQ

optional arguments:
  -h, --help            show this help message and exit

To see the options for a subcommand, run commands like:

./mock-data-generator.py random-to-stdout --help


Using Command Line

Install

See Clone repository.

Install dependencies

  1. YUM installs - For Red Hat, CentOS, openSUSE, and others.

    sudo xargs yum -y install < ${GIT_REPOSITORY_DIR}/src/yum-packages.txt
  2. APT installs - For Debian, Ubuntu, and others

    sudo xargs apt -y install < ${GIT_REPOSITORY_DIR}/src/apt-packages.txt
  3. PIP installs

    sudo pip install -r ${GIT_REPOSITORY_DIR}/requirements.txt

Demonstrate

  1. Show help. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py --help
    ./mock-data-generator.py random-to-stdout --help
  2. Show random file output. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout
  3. Show random file output with 1 record per second. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --records-per-second 1
  4. Show repeatable "random" output using random seed. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --random-seed 1
  5. Show generating 10 (repeatable) random records at the rate of 2 per second. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --random-seed 22 \
      --record-min 1 \
      --record-max 10 \
      --records-per-second 2
  6. Show sending output to a file of JSON-lines. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py random-to-stdout \
      --random-seed 22 \
      --record-min 1 \
      --record-max 10 \
      --records-per-second 2 \
      > output-file.jsonlines
  7. Show reading 5 records from URL-based file at the rate of 3 per second. Example:

    cd ${GIT_REPOSITORY_DIR}
    ./mock-data-generator.py url-to-stdout \
      --input-url https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json \
      --record-min 1 \
      --record-max 5 \
      --records-per-second 3
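
The interaction of --random-seed, --record-min, --record-max, and --records-per-second in the examples above can be sketched as follows. This is an illustration under assumptions, not the script's code: generate_records and the RECORD_ID and VALUE fields are hypothetical, and whether the real tool produces the same record for a given record number regardless of --record-min is not stated in this section.

```python
import itertools
import random
import time
from typing import Iterator


def generate_records(seed: int, record_min: int, record_max: int,
                     records_per_second: float = 0) -> Iterator[dict]:
    # Seed 0 means "not repeatable": Python then seeds from a system source.
    rng = random.Random(seed if seed > 0 else None)
    # A rate of 0 means no throttling.
    delay = 1.0 / records_per_second if records_per_second > 0 else 0.0
    # A maximum of 0 means no upper bound on the record number.
    if record_max == 0:
        record_ids = itertools.count(record_min)
    else:
        record_ids = range(record_min, record_max + 1)
    for record_id in record_ids:
        yield {"RECORD_ID": record_id, "VALUE": rng.randint(0, 999)}
        if delay:
            time.sleep(delay)


first = list(generate_records(seed=22, record_min=1, record_max=10))
second = list(generate_records(seed=22, record_min=1, record_max=10))
print(first == second)  # same seed, same records
```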

Using Docker

Expectations

Space

This repository and demonstration require 6 GB free disk space.

Time

Budget 40 minutes to get the demonstration up-and-running, depending on CPU and network speeds.

Background knowledge

This repository assumes a working knowledge of:

  1. Docker

Configuration

  • SENZING_DATA_SOURCE - If a JSON line does not have the DATA_SOURCE key/value, this value is inserted. No default.
  • SENZING_DEBUG - Enable debug information. Values: 0=no debug; 1=debug. Default: 0.
  • SENZING_ENTITY_TYPE - If a JSON line does not have the ENTITY_TYPE key/value, this value is inserted. No default.
  • SENZING_INPUT_URL - URL of source file. Default: https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
  • SENZING_KAFKA_BOOTSTRAP_SERVER - Hostname and port of Kafka server. Default: "localhost"
  • SENZING_KAFKA_TOPIC - Kafka topic. Default: "senzing-kafka-topic"
  • SENZING_RABBITMQ_HOST - Host name of the RabbitMQ exchange. Default: "localhost:5672"
  • SENZING_RABBITMQ_PASSWORD - The password for the RabbitMQ queue. Default: "bitnami"
  • SENZING_RABBITMQ_QUEUE - Name of the RabbitMQ queue to create/connect with. Default: "senzing-rabbitmq-queue"
  • SENZING_RABBITMQ_USERNAME - The username for the RabbitMQ queue. Default: "user"
  • SENZING_RANDOM_SEED - Identify seed for random number generator. Value of 0 uses system clock. Values greater than 0 give repeatable results. Default: "0"
  • SENZING_RECORD_MAX - Identify highest record number to generate. Value of 0 means no maximum. Default: "0"
  • SENZING_RECORD_MIN - Identify lowest record number to generate. Default: "1"
  • SENZING_RECORD_MONITOR - Write a log record every N mock records. Default: "10000"
  • SENZING_RECORDS_PER_SECOND - Throttle output to a specified records per second. Value of 0 means no throttling. Default: "0"
  • SENZING_SUBCOMMAND - Identify the subcommand to be run. See mock-data-generator.py --help for complete list. No default.
  1. To determine which configuration parameters are used for each <subcommand>, run:

    ./mock-data-generator.py <subcommand> --help
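
As a rough picture of how the environment variables and their documented defaults combine, consider the sketch below. The load_config helper is hypothetical; it only covers variables that have a default in the list above, and it does not model how command-line options and environment variables are prioritized, since this section does not say.

```python
import os
from typing import Mapping

# Defaults copied from the configuration list above.
DEFAULTS = {
    "SENZING_DEBUG": "0",
    "SENZING_KAFKA_BOOTSTRAP_SERVER": "localhost",
    "SENZING_KAFKA_TOPIC": "senzing-kafka-topic",
    "SENZING_RABBITMQ_HOST": "localhost:5672",
    "SENZING_RABBITMQ_QUEUE": "senzing-rabbitmq-queue",
    "SENZING_RANDOM_SEED": "0",
    "SENZING_RECORD_MAX": "0",
    "SENZING_RECORD_MIN": "1",
    "SENZING_RECORD_MONITOR": "10000",
    "SENZING_RECORDS_PER_SECOND": "0",
}


def load_config(environ: Mapping[str, str] = os.environ) -> dict:
    """Overlay SENZING_* environment variables on the documented defaults."""
    config = dict(DEFAULTS)
    config.update(
        {key: value for key, value in environ.items() if key.startswith("SENZING_")}
    )
    return config


config = load_config({"SENZING_SUBCOMMAND": "random-to-stdout", "SENZING_RECORD_MAX": "10"})
print(config["SENZING_SUBCOMMAND"], config["SENZING_RECORD_MIN"], config["SENZING_RECORD_MAX"])
```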

Run docker container

Demonstrate random to STDOUT

  1. ✏️ Set environment variables. Example:

    export SENZING_SUBCOMMAND=random-to-stdout
    export SENZING_RANDOM_SEED=0
    export SENZING_RECORD_MAX=10
    export SENZING_RECORD_MIN=1
    export SENZING_RECORDS_PER_SECOND=0
  2. Run the docker container. Example:

    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_RANDOM_SEED="${SENZING_RANDOM_SEED}" \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --rm \
      --tty \
      senzing/mock-data-generator

Demonstrate random to Kafka

  1. ✏️ Determine docker network. Example:

    sudo docker network ls
    
    # Choose value from NAME column of docker network ls
    export SENZING_NETWORK=nameofthe_network
  2. ✏️ Set environment variables. Example:

    export SENZING_SUBCOMMAND=random-to-kafka
    
    export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
    export SENZING_KAFKA_TOPIC="senzing-kafka-topic"
    export SENZING_NETWORK=senzingdockercomposestreamloaderdemo_backend
    export SENZING_RANDOM_SEED=1
    export SENZING_RECORD_MAX=220
    export SENZING_RECORD_MIN=210
    export SENZING_RECORDS_PER_SECOND=1
  3. Run the docker container. Example:

    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_KAFKA_BOOTSTRAP_SERVER="${SENZING_KAFKA_BOOTSTRAP_SERVER}" \
      --env SENZING_KAFKA_TOPIC="${SENZING_KAFKA_TOPIC}" \
      --env SENZING_RANDOM_SEED="${SENZING_RANDOM_SEED}" \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --net "${SENZING_NETWORK}" \
      --rm \
      --tty \
      senzing/mock-data-generator

Demonstrate URL to STDOUT

  1. ✏️ Set environment variables. Example:

    export SENZING_SUBCOMMAND=url-to-stdout
    
    export SENZING_INPUT_URL=https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
    export SENZING_RECORD_MAX=250
    export SENZING_RECORD_MIN=240
    export SENZING_RECORDS_PER_SECOND=0
  2. Run the docker container. Example:

    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_INPUT_URL="${SENZING_INPUT_URL}" \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --rm \
      --tty \
      senzing/mock-data-generator
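
The record window and the SENZING_DATA_SOURCE / SENZING_ENTITY_TYPE behaviour used in this demonstration can be sketched as below. The select_records function is a hypothetical illustration; treating record numbers as 1-based and inclusive is an assumption, and the sketch works on any iterable of JSON lines rather than fetching the URL.

```python
import itertools
import json
from typing import Iterable, Iterator, Optional


def select_records(lines: Iterable[str], record_min: int, record_max: int,
                   data_source: Optional[str] = None,
                   entity_type: Optional[str] = None) -> Iterator[dict]:
    """Yield records record_min..record_max, filling in missing keys."""
    # Assumption: record numbers are 1-based; a maximum of 0 means no limit.
    stop = record_max if record_max > 0 else None
    for line in itertools.islice(lines, record_min - 1, stop):
        record = json.loads(line)
        # Insert the configured values only when the key is absent.
        if data_source is not None:
            record.setdefault("DATA_SOURCE", data_source)
        if entity_type is not None:
            record.setdefault("ENTITY_TYPE", entity_type)
        yield record


sample_lines = ['{"NAME": "record-%d"}' % n for n in range(1, 11)]
for record in select_records(sample_lines, 3, 5, data_source="TEST"):
    print(json.dumps(record))
```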

Demonstrate URL to Kafka

  1. ✏️ Determine docker network. Example:

    sudo docker network ls
    
    # Choose value from NAME column of docker network ls
    export SENZING_NETWORK=nameofthe_network
  2. ✏️ Set environment variables. Example:

    export SENZING_SUBCOMMAND=url-to-kafka
    
    export SENZING_INPUT_URL=https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
    export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
    export SENZING_KAFKA_TOPIC="senzing-kafka-topic"
    export SENZING_NETWORK=senzingdockercomposestreamloaderdemo_backend
    export SENZING_RECORD_MAX=300
    export SENZING_RECORD_MIN=260
    export SENZING_RECORD_MONITOR=10
    export SENZING_RECORDS_PER_SECOND=10
  3. Run the docker container. Example:

    sudo docker run \
      --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
      --env SENZING_INPUT_URL="${SENZING_INPUT_URL}" \
      --env SENZING_KAFKA_BOOTSTRAP_SERVER="${SENZING_KAFKA_BOOTSTRAP_SERVER}" \
      --env SENZING_KAFKA_TOPIC="${SENZING_KAFKA_TOPIC}" \
      --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
      --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
      --env SENZING_RECORD_MONITOR="${SENZING_RECORD_MONITOR}" \
      --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
      --interactive \
      --net "${SENZING_NETWORK}" \
      --rm \
      --tty \
      senzing/mock-data-generator
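
The example above sets SENZING_RECORD_MONITOR=10, which writes a log record every 10 mock records. A minimal sketch of that idea follows; log_progress is a hypothetical name, and the message text is an assumption rather than the tool's actual log format.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def log_progress(records: Iterable[T], every: int,
                 report: Callable[[str], object] = print) -> Iterator[T]:
    """Pass records through unchanged, reporting after every N of them."""
    for count, record in enumerate(records, start=1):
        yield record
        # Assumption: a value of 0 or less disables progress reporting.
        if every > 0 and count % every == 0:
            report(f"Records sent: {count}")


for _ in log_progress(range(25), every=10):
    pass
```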

Develop

Prerequisite software

The following software programs need to be installed:

  1. git
  2. make
  3. docker

Clone repository

  1. Set these environment variable values:

    export GIT_ACCOUNT=senzing
    export GIT_REPOSITORY=mock-data-generator
  2. Follow steps in clone-repository to install the Git repository.

  3. After the repository has been cloned, be sure the following are set:

    export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
    export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"

Build docker image for development

  1. Option #1 - Using docker command and GitHub.

    sudo docker build --tag senzing/mock-data-generator https://github.com/senzing/mock-data-generator.git
  2. Option #2 - Using docker command and local repository.

    cd ${GIT_REPOSITORY_DIR}
    sudo docker build --tag senzing/mock-data-generator .
  3. Option #3 - Using make command.

    cd ${GIT_REPOSITORY_DIR}
    sudo make docker-build

Examples

  1. Examples of use:
    1. docker-compose-stream-loader-kafka-demo
    2. kubernetes-demo
    3. rancher-demo

Errors

  1. See doc/errors.md.

About

Python tool for generating mock Senzing data and sending it to Kafka, RabbitMQ, or STDOUT.
