This model is one of the first, if not the first, to generate synthetic data for associative data models.
I provide an example transaction table in subject-verb-object format. For example, the subject could be a customer or a company, the verb could be an action such as “purchase”, “payment”, or “return”, and the object could be a product or service. The subject, verb, and object columns are mandatory; the date and amount columns are optional, so you can supply either the three mandatory columns alone or the mandatory columns together with the optional ones.
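As a rough illustration of the table layout described above, here is a minimal pandas sketch. The column names (`subject`, `verb`, `object`, `date`, `amount`), the helper `make_transactions`, and the sample rows are assumptions made up for this example; the project's actual input names may differ.

```python
import pandas as pd

# Hypothetical column names: the text only names the roles, not the headers.
MANDATORY = ["subject", "verb", "object"]
OPTIONAL = ["date", "amount"]


def make_transactions(rows):
    """Build a subject-verb-object table, keeping optional columns only if present."""
    df = pd.DataFrame(rows)
    missing = [col for col in MANDATORY if col not in df.columns]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    # Keep a fixed column order: mandatory first, then any optional columns supplied.
    keep = MANDATORY + [col for col in OPTIONAL if col in df.columns]
    df = df[keep].copy()
    if "date" in df.columns:
        df["date"] = pd.to_datetime(df["date"])
    return df


# Made-up sample rows, for illustration only.
rows = [
    {"subject": "customer_1", "verb": "purchase", "object": "laptop",
     "date": "2024-01-05", "amount": 1200.0},
    {"subject": "customer_1", "verb": "purchase", "object": "mouse",
     "date": "2024-01-05", "amount": 25.0},
    {"subject": "customer_2", "verb": "return", "object": "laptop",
     "date": "2024-01-09", "amount": 1200.0},
]
transactions = make_transactions(rows)
print(transactions)
```

Passing rows that contain only the three mandatory keys should produce a three-column table, while a row set that lacks one of them raises a `ValueError`.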
You can learn more about associative data models here.
A key advantage of the model is that it preserves associative relationships (e.g., “users who buy X often also buy Y”), which makes the synthetic data not just random, but practically applicable to tasks where hidden patterns are important. This distinguishes it from simply generating random numbers or context-free data.
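One way to check whether such a relationship survives in generated data is to compare a simple co-occurrence rate between the real and the synthetic table. The sketch below is not the model's own method; the function `confidence`, the column names, and both tiny tables are hypothetical and only illustrate the idea. For simplicity it ignores the verb column.

```python
import pandas as pd


def confidence(df, x, y):
    """Share of subjects linked to object x that are also linked to object y."""
    # One set of objects per subject.
    baskets = df.groupby("subject")["object"].apply(set)
    with_x = [objs for objs in baskets if x in objs]
    if not with_x:
        return None  # x never occurs, so the rate is undefined
    return sum(y in objs for objs in with_x) / len(with_x)


# Made-up "real" and "synthetic" tables, for illustration only.
real = pd.DataFrame({
    "subject": ["a", "a", "b", "b", "c", "d"],
    "verb": ["purchase"] * 6,
    "object": ["laptop", "mouse", "laptop", "mouse", "laptop", "mouse"],
})
synthetic = pd.DataFrame({
    "subject": ["u1", "u1", "u2", "u3", "u3"],
    "verb": ["purchase"] * 5,
    "object": ["laptop", "mouse", "laptop", "laptop", "mouse"],
})

real_rate = confidence(real, "laptop", "mouse")
synthetic_rate = confidence(synthetic, "laptop", "mouse")
print(f"real: {real_rate:.2f}, synthetic: {synthetic_rate:.2f}")
```

In this toy data, two of the three laptop buyers also have a mouse in both tables, so the two rates match. On real output, a large gap between the rates would suggest that the association was not preserved.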
Examples of use:
- Creating realistic but secure data for testing transactional applications.
- Generating additional data to train models when real data is insufficient or unbalanced.
- Replacing real data with synthetic data to help comply with GDPR, HIPAA, and other regulations.
- Simulating rare or extreme scenarios for risk prediction.
- Creating visual examples to train employees or customers.
- Generating realistic game scenarios related to economics.
- Creating training data for models dealing with textual transactions.
Dependencies:
pip install sdv pandas numpy
Future plans: improve the model, write a GUI, etc.