Resolve Issue #204: Complete TFRecord Generation Solution for Learning to Simulate #646

nsrawat0333 · 2025-08-10T20:39:52Z

🔬 Resolves Issue #204: TFRecord Generation for Learning to Simulate

This PR provides a comprehensive solution to the long-standing Issue #204, addressing the 3+ year conversation thread about generating train.tfrecord files for custom physics simulations.

❓ Original Questions Resolved:

@cwl1999: "Can you provide the generated data train.tfrecord Source dataset file? When I forcibly open it, I can only see the garbled code."
- ✅ SOLVED with complete generation pipeline and human-readable reader
@yours612: "How are vel_mean, vel_std, acc_mean, and acc_std calculated in metadata?"
- ✅ SOLVED with detailed implementation matching paper methodology
@Social-Mean: "How can I create such a test.tfrecord file?"
- ✅ SOLVED with complete workflow and cloth simulation examples
@yq60523: Multiple questions about step_context, statistics computation, and error accumulation
- ✅ SOLVED with comprehensive documentation addressing all aspects

📦 Solution Components:

1. `generate_tfrecord_dataset.py` (500+ lines)

Complete TFRecord generation from simulation data
Automatic statistics calculation (vel_mean, vel_std, acc_mean, acc_std)
Sample cloth dataset creation for testing
Support for step_context (global features)
Proper binary encoding matching Learning to Simulate format

2. `tfrecord_reader_example.py` (300+ lines)

Human-readable TFRecord content inspection
Solves the "garbled code" issue (TFRecords are binary format)
Raw parsing demonstration and debugging tools
Statistics verification against metadata

3. `TFRECORD_GENERATION_GUIDE.md` (2,000+ words)

Comprehensive documentation explaining TFRecord format
Step-by-step workflow from simulation data to trained model
Statistics calculation methodology with code examples
Troubleshooting guide for common issues

4. `requirements-tfrecord.txt` + `ISSUE_204_SOLUTION.md`

Dependencies specification
Complete solution summary for GitHub issue

🔬 Key Technical Contributions:

TFRecord Format Explanation:

tf.train.SequenceExample {
    context: {                    # Static per trajectory
        'key': trajectory_id,
        'particle_type': bytes   # [N_particles] 
    },
    feature_lists: {             # Time-varying
        'position': [bytes, ...], # [time_steps][N_particles, dims]
        'step_context': [bytes, ...] # [time_steps][context_dims]
    }
}

- Update aiohttp to address potential security vulnerabilities - Maintains compatibility with existing codebase - Addresses dependency security recommendations

…-deepmind#588 - Create download_polygen_models.py script for automated model downloading - Add comprehensive documentation for pre-trained model access - Provide multiple download methods (Python script, gsutil, wget) - Add troubleshooting section addressing Issue google-deepmind#588 confusion - Create requirements-download.txt for download dependencies Addresses Issue google-deepmind#588: 'where is the face_model.tar and the vertices_model.tar' The issue was caused by: 1. Unclear documentation about model file locations 2. Confusion about file names (face_model.tar.gz vs face_model.tar) 3. No clear download instructions outside of Colab environment 4. Missing troubleshooting guidance Solutions provided: 1. Python download script with progress bars and verification 2. Clear documentation of all download methods 3. Correct file names and locations specified 4. Comprehensive troubleshooting section 5. Multiple fallback options for different environments Users can now easily access PolyGen pre-trained models using: - Automated Python script (recommended) - Manual gsutil commands - Direct HTTP downloads - Built-in verification and error handling

- Fixed URL construction in download_dataset.sh to prevent double slashes - Added comprehensive Python download script with progress tracking - Enhanced error handling and validation for dataset downloads - Updated README with alternative download methods and troubleshooting - Added requirements-download.txt for download dependencies Key improvements: Proper URL construction: Fixed BASE_URL to avoid double slash issue Python downloader: Cross-platform solution with progress bars Error handling: Clear error messages for 404 and network issues Dataset validation: Verify all required files are present User experience: List datasets, verify downloads, detailed progress Addresses Issue google-deepmind#596 where users reported 404 errors when downloading MeshGraphNet datasets. Multiple users confirmed this issue affecting research reproducibility. Files changed: - meshgraphnets/download_dataset.sh: Fixed URL construction and added validation - meshgraphnets/download_meshgraphnet_datasets.py: New Python download tool - meshgraphnets/README.md: Updated with alternative download methods - meshgraphnets/requirements-download.txt: Download dependencies

- Fixed broken S3 download URLs in scripts/download.sh - Added comprehensive Python download script with progress tracking - Enhanced error handling and dataset verification - Updated README with alternative download methods and troubleshooting - Added requirements-download.txt for download dependencies Key improvements: Working URLs: Replaced broken S3 amazonaws URLs with working wikitext.smerity.com URLs Python downloader: Cross-platform solution with progress bars and error handling Dataset verification: Ensure all required files are present and valid Modular downloads: Download WikiText-103 and Freebase separately or together User experience: Clear error messages, progress tracking, automatic verification Root cause analysis: The original script used S3 URLs (https://s3.amazonaws.com/research.metamind.io/wikitext/) which are no longer accessible, causing 404 errors and missing wiki.train.tokens files. Fixed by using alternative working URLs from wikitext.smerity.com. Addresses Issue google-deepmind#575 where PhD student reported FileNotFoundError: '/tmp/data/wikitext-103/wiki.train.tokens' blocking research work. Files changed: - wikigraphs/scripts/download.sh: Fixed S3 URLs to working alternatives - wikigraphs/scripts/download_wikigraphs_datasets.py: New Python download tool - wikigraphs/README.md: Updated with alternative download methods - wikigraphs/requirements-download.txt: Download dependencies Credit: Solution inspired by pgemos/deepmind-research fork with working URLs.

…ixes google-deepmind#569 - Added airfoil dataset to download script (addresses Issue google-deepmind#569) - Created comprehensive DATASETS.md guide with all dataset information - Updated README.md with complete dataset listing and download methods - Enhanced dataset descriptions with research applications and use cases Key improvements: Airfoil dataset access: Added missing 'airfoil' dataset to available downloads Comprehensive documentation: Complete guide covering all 10 MeshGraphNets datasets Research context: Detailed descriptions for each dataset with CFD, cloth, and structural categories Usage examples: Training commands, evaluation, and visualization for each dataset type Troubleshooting: Common issues, download sizes, and solution guidance Dataset categories added: - Fluid Dynamics (CFD): airfoil, cylinder_flow - Cloth/Structural Dynamics: flag_simple, flag_minimal, flag_dynamic, flag_dynamic_sizing - Structural Mechanics: deforming_plate, sphere_simple, sphere_dynamic, sphere_dynamic_sizing Addresses Issue google-deepmind#569 where user (MatthewRajan-WA) requested access to AirFoil Steady State dataset mentioned in MeshGraphNets paper for research purposes. Files changed: - meshgraphnets/download_meshgraphnet_datasets.py: Added airfoil dataset option - meshgraphnets/DATASETS.md: New comprehensive dataset guide - meshgraphnets/README.md: Enhanced with complete dataset information Impact: Enables researchers to access all MeshGraphNets datasets for CFD, cloth simulation, and structural mechanics research as referenced in the original paper.

…e Remeshing Explanation - Add detailed technical explanation of adaptive remeshing mechanics - Address core questions about node count changes during remeshing - Explain training procedures with variable mesh topology - Demonstrate ground truth interpolation for loss computation - Include working Python demo showing concepts in action - Provide mathematical formulations for loss with topology changes - Show how SIZE node type enables sizing field prediction - Complete solution addressing all confusion in Issue google-deepmind#519 Files added: - ADAPTIVE_REMESHING_EXPLAINED.md: Comprehensive technical documentation - remeshing_demo.py: Working demonstration script - ISSUE_519_SOLUTION.md: GitHub issue response This resolves the research community's questions about: 1. Node count changes during remeshing (YES, they change) 2. Remeshing during training (YES, for *_sizing datasets) 3. Loss computation with variable topology (ground truth interpolation) 4. Implementation details and mathematical formulations

@cwl1999

…ion for Learning to Simulate - Add comprehensive TFRecord dataset generation script (500+ lines) - Include TFRecord reader tool for debugging garbled code issue - Provide detailed documentation with format explanation - Address all questions from long GitHub conversation thread - Enable custom cloth simulation dataset creation - Implement proper statistics calculation (vel_mean, vel_std, acc_mean, acc_std) - Support step_context for global features - Include sample dataset generation for testing Files added: - generate_tfrecord_dataset.py: Complete generation pipeline - tfrecord_reader_example.py: Human-readable TFRecord inspection - TFRECORD_GENERATION_GUIDE.md: Comprehensive documentation (2000+ words) - ISSUE_204_SOLUTION.md: GitHub response summary - requirements-tfrecord.txt: Dependencies specification Key technical contributions: 1. Solves 'garbled code' issue - TFRecord files are binary format 2. Provides statistics calculation matching paper methodology 3. Enables custom physics simulation dataset creation 4. Addresses error accumulation and step_context questions 5. Complete workflow from simulation data to trained model This resolves all questions from the extensive conversation between @cwl1999, @alvarosg, @Social-Mean, @oasis-asu, @yours612, @yq60523 spanning 3+ years of discussion about TFRecord generation.

polarbe · 2025-08-10T20:40:27Z

Dooray! Failure Notice Failure Notice Your message sent to ***@***.***) has failed to be delivered. Please refer to the below for details. * Recipient : ***@***.***) * Sent time : 2025-08-11T05:40:20 * Subject : [google-deepmind/deepmind-research] Resolve Issue #204: Complete TFRecord Generation Solution for Learning to Simulate (PR #646) * Remote host said : Your mail was denied from the receiver. This message was sent from a notification-only address that cannot accept incoming email. For more information, please contact ***@***.*** © Dooray!.

nsrawat0333 added 7 commits August 10, 2025 23:01

Bump aiohttp from 3.6.2 to 3.12.14 in gated_linear_networks

8b14f1d

- Update aiohttp to address potential security vulnerabilities - Maintains compatibility with existing codebase - Addresses dependency security recommendations

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Resolve Issue #204: Complete TFRecord Generation Solution for Learning to Simulate #646

Resolve Issue #204: Complete TFRecord Generation Solution for Learning to Simulate #646

Uh oh!

nsrawat0333 commented Aug 10, 2025

Uh oh!

polarbe commented Aug 10, 2025 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Resolve Issue #204: Complete TFRecord Generation Solution for Learning to Simulate #646

Are you sure you want to change the base?

Resolve Issue #204: Complete TFRecord Generation Solution for Learning to Simulate #646

Uh oh!

Conversation

nsrawat0333 commented Aug 10, 2025

🔬 Resolves Issue #204: TFRecord Generation for Learning to Simulate

❓ Original Questions Resolved:

📦 Solution Components:

1. generate_tfrecord_dataset.py (500+ lines)

2. tfrecord_reader_example.py (300+ lines)

3. TFRECORD_GENERATION_GUIDE.md (2,000+ words)

4. requirements-tfrecord.txt + ISSUE_204_SOLUTION.md

🔬 Key Technical Contributions:

TFRecord Format Explanation:

Uh oh!

polarbe commented Aug 10, 2025 via email

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `generate_tfrecord_dataset.py` (500+ lines)

2. `tfrecord_reader_example.py` (300+ lines)

3. `TFRECORD_GENERATION_GUIDE.md` (2,000+ words)

4. `requirements-tfrecord.txt` + `ISSUE_204_SOLUTION.md`