[In Submission] Calibration of Tactile Sensors for High Resolution Stress Tensor and Deformation for Dexterous Manipulation

armlabstanford/tensor-touch

TensorTouch

Calibration of Tactile Sensors for High Resolution Stress Tensor and Deformation for Dexterous Manipulation

Stanford University | University of Pennsylvania

Project | ArXiv

TO-DO (Release dates are at the latest -- we aim to release as soon as possible!)

  • Release model on PyTorch Hub (July 17th, 2025)
  • Release model inference (September 16th, 2025)
  • Release model training code (September 30th, 2025)
  • Release the datasets (September 30th, 2025)
  • Release the data collection code (October 15th, 2025)
  • Release FEM simulation pipeline code (October 15th, 2025)

Model Inference

Use Torch Hub to load our models! It's easy!

The training code has been released. The datasets have not been released yet; we plan to release them next week.

pip install torch torchvision yacs timm matplotlib
>>> import torch
>>> model = torch.hub.load('peasant98/DenseTact-Model', 'hiera', pretrained=True, map_location='cpu', trust_repo=True)
>>> model = model.cuda()
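
The `model.cuda()` call above fails on a machine without a GPU. Below is a minimal sketch of a device-agnostic variant. It reuses the exact `torch.hub.load` arguments from the snippet above; the helper names `pick_device` and `load_hiera` are hypothetical and not part of this repository, and the sketch assumes the hub entry point returns a standard `torch.nn.Module`.

```python
def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    return "cuda" if cuda_available else "cpu"


def load_hiera():
    """Load the pretrained model via Torch Hub and move it to the chosen device.

    Hypothetical helper: the hub arguments are copied from the README snippet;
    everything else is an assumption for illustration.
    """
    import torch  # imported lazily so pick_device works without torch installed

    device = pick_device(torch.cuda.is_available())
    model = torch.hub.load(
        'peasant98/DenseTact-Model', 'hiera',
        pretrained=True, map_location='cpu', trust_repo=True,
    )
    # Assumes the returned object is an nn.Module, as the .cuda() call above suggests.
    return model.to(device).eval(), device
```

Calling `model, device = load_hiera()` should then work on both CPU-only and GPU machines; input tensors need to be moved to the same `device` before inference.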

We have a demo to run on sample images:

python3 model/test_hub.py

The same file also contains the steps for running the encoder!

Model Lessons

Here are some lessons we learned from training these kinds of models.

  • Even with an optimized training paradigm for each architecture, sometimes certain architectures are just straight up better. We found that a compact hierarchical ViT, "Hiera", was far better than the other models. Thanks SAM2 for the inspiration!

  • Pretraining gets you very far for ViTs. Without pretraining, ViTs become slightly overrated compared to the tried and true ResNets of the world. The attention operation is quite expensive (O(n^2) in the number of tokens), which is why ViTs operate on patches (16 by 16) rather than individual pixels. There are optimizations that can be made with attention (flash or deformable attention), but we didn't get to them.

  • Don't get fancy with optimizers and learning rates; if your dense prediction model only works after tiny adjustments to the learning rate, look at the dataset, the architecture, and similar factors instead.

  • Sparse prediction in vision is pretty vicious compared to dense prediction. No one seems to have "won" sparse prediction, but it looks like dense prediction scales nicely with (1) a simple architecture with a simple loss function (L1), (2) good and curated data, and (3) a massive amount of that data.

  • More high-quality, diverse data has a bigger effect than you would think for this kind of training. Models like Pi3 easily clear VGGT probably because they absolutely sent it with dynamic data.
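
To make the attention-cost point from the lessons above concrete, here is a small worked calculation of how patching shrinks the quadratic term. The 224 by 224 input size is an assumption chosen for illustration, not a value taken from this repository, and the helper names are hypothetical.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of tokens when a square image is split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def attention_pairs(tokens: int) -> int:
    """Entries in one full self-attention matrix: every token attends to every token."""
    return tokens * tokens


IMAGE_SIZE = 224  # assumed input resolution, for illustration only

per_pixel = num_tokens(IMAGE_SIZE, 1)   # one token per pixel
patched = num_tokens(IMAGE_SIZE, 16)    # one token per 16x16 patch

print(per_pixel, attention_pairs(per_pixel))  # 50176 2517630976
print(patched, attention_pairs(patched))      # 196 38416
print(attention_pairs(per_pixel) // attention_pairs(patched))  # 65536
```

Under this assumption, 16 by 16 patches cut the token count by 256x and the attention matrix by 256^2 = 65536x per head and layer.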
