This is a Python implementation of the paper "GPT-Fabric-plus-plus: Improved Bimanual Fabric Manipulation via Pre-trained foundation models". This repository contains the code for fabric folding; all experiments are simulated in SoftGym. The code for performing fabric smoothing can be found here.
Private repo for GPT-Fabric-plus-plus Folding
I strongly recommend going through this wonderful blog written by Daniel Seita on setting up SoftGym.
- Clone the repository. Run the command conda env create -f environment.yml to create the gptfab-folding environment.
- If you are using Ubuntu 22.04, run the following commands to recompile SoftGym:
docker run \
-v PATH_TO_GPT_FABRIC:/workspace/GPT-Fabric-Smoothing \
-v PATH_TO_CONDA:PATH_TO_CONDA \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-e DISPLAY=$DISPLAY \
-e QT_X11_NO_MITSHM=1 \
-it xingyu/softgym:latest bash
If you are using another version of Ubuntu, use the following commands:
nvidia-docker run \
-v PATH_TO_GPT_FABRIC:/workspace/GPT-Fabric-Smoothing \
-v PATH_TO_CONDA:PATH_TO_CONDA \
-v /tmp/.X11-unix:/tmp/.X11-unix \
--gpus all \
-e DISPLAY=$DISPLAY \
-e QT_X11_NO_MITSHM=1 \
-it xingyu/softgym:latest bash
- After this, you should be inside the Docker container. Run the following commands inside the container:
root@9ac1efa91ca9:/workspace# cd GPT-Fabric-Smoothing
root@9ac1efa91ca9:/workspace/GPT-Fabric-Smoothing# export PATH="PATH_TO_CONDA/bin:$PATH"
root@9ac1efa91ca9:/workspace/GPT-Fabric-Smoothing# . ./prepare_1.0.sh
(gptfab-smoothing) root@9ac1efa91ca9:/workspace/GPT-Fabric-Smoothing# . ./compile_1.0.sh
If the compilation is successful, you should see the following message at the end:
[100%] Linking CXX shared module pyflex.cpython-38-x86_64-linux-gnu.so
[100%] Built target pyflex
You can quit the Docker container by typing exit.
- Back in the regular command line (outside Docker), use the following commands:
conda activate gptfab-folding
export PYFLEXROOT=${PWD}/PyFlex
export PYTHONPATH=${PYFLEXROOT}/bindings/build:$PYTHONPATH
export LD_LIBRARY_PATH=${PYFLEXROOT}/external/SDL2-2.0.4/lib/x64:$LD_LIBRARY_PATH
A better way to do this is to add all these lines to a .sh file and then source that file. In this repository, this file is called prepare_gpt.sh.
(base) rajeshgayathri2003@cappuccino:~/GPT-Fabric-plus-plus$ . ./prepare_gpt.sh
(gptfab-folding) rajeshgayathri2003@cappuccino:~/GPT-Fabric-plus-plus$
Before we evaluate GPT-Fabric++ folding, we need to have the necessary sub-goal sequences and starting configurations.
- To be consistent with prior work, we use the initial evaluation configurations for square and rectangular fabric used by Foldsformer. You can find these in cached configs/.
- The initial configurations can also be generated by running the following script:
python generate_configs.py --num_cached 100 --cloth_type square
Here num_cached denotes the number of configurations and cloth_type denotes whether the given cloth is square, rectangular, or another shape.
- In addition to the sub-goal sequences used by GPT-Fabric, we also introduce bimanual manipulation. The sub-goal sequences used can be downloaded from here.
To evaluate the folds produced by GPT-Fabric++, we need a system that can produce expert folds. We compare the two results to compute the mean particle distance error of the achieved folds. This expert system can be found in the Demonstrator directory and can be run using:
python generate_demonstrations.py --gui --task DoubleTriangle --img_size 128 --cached square
python generate_demonstrations.py --gui --task DoubleStraightBimanual --img_size 128 --cached rectangle
python generate_demonstrations.py --gui --task AllCornersInward --img_size 128 --cached square
python generate_demonstrations.py --gui --task CornersEdgesInwardBimanual --img_size 128 --cached square
where --task specifies the task name, --img_size specifies the image size captured by the camera in the simulator, and --cached specifies the filename of the cached configurations. You can remove --gui to run headless. These generated demonstrations will be saved in data/demonstrations.
Note that the same folding task can be achieved in several ways for the same cloth configuration, so we consider all possible final cloth configurations that count as a successful fold according to the expert-driven heuristic (i.e., the Demonstrator).
For each cloth configuration, 0.png is the top-down image of the initial state. {step#}-{fold#}.png is the top-down image at step {step#} for the specific way of achieving the successful fold indexed by {fold#}. The final cloth configuration is saved as a pickle file named info-{fold#}.pkl. To compute the mean particle position error (in mm) for evaluation, we measure the distance between the final configuration achieved by GPT-Fabric and each possible expert final configuration, and take the minimum of those distances.
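For reference, the metric described above can be sketched as follows; the helper name, the "pos" key, and the metres-to-millimetres conversion are assumptions about the pickle layout rather than the repository's actual evaluation code.

```python
import pickle
import numpy as np

def min_mean_particle_distance(achieved_positions, demo_info_paths):
    """Mean particle position error (in mm) against the closest expert final
    configuration. Hypothetical helper for illustration, not the repo's code."""
    errors = []
    for path in demo_info_paths:
        with open(path, "rb") as f:
            info = pickle.load(f)
        # Assumption: the info-{fold#}.pkl file stores the expert's final
        # particle positions under a "pos" key; adjust to the actual layout.
        expert_positions = np.asarray(info["pos"])
        per_particle = np.linalg.norm(achieved_positions - expert_positions, axis=1)
        # Assumption: positions are in metres, so convert the mean to millimetres.
        errors.append(per_particle.mean() * 1000.0)
    # The reported error is the minimum over all valid ways of achieving the fold.
    return min(errors)
```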
We fine-tune GPT-4o to help the system generate better folding instructions. Here are some useful links to guide you through fine-tuning GPT-4o:
The .jsonl files that we used for fine-tuning in this paper are given in training_set.jsonl and validation_set.jsonl.
In case you wish to create your own jsonl files, download the dataset from here. This dataset is based on the evaluation images used by FabricFlowNet.
Convert the images into base64 encoding using the convert_single_arm.js and convert_bimanual.js files. The output will be saved as unimanual.json and bimanual.json. In this case, the language instruction for each fold needs to be edited manually. Split the dataset in an 80:20 ratio between training and validation.
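If you would rather do the conversion in Python than with the Node scripts, here is a rough sketch of encoding one image into a base64 data URL and placing it in a chat-format example. The message layout and the instruction text are assumptions for illustration, not the exact output of convert_single_arm.js or convert_bimanual.js.

```python
import base64
import json

def image_to_data_url(image_path):
    """Encode a top-down cloth image as a base64 PNG data URL."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{encoded}"

# Hypothetical training example; the exact message layout produced by
# convert_single_arm.js / convert_bimanual.js may differ.
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Fold the cloth along the diagonal."},
            {"type": "image_url", "image_url": {"url": image_to_data_url("0.png")}},
        ]},
        # The expected folding instruction for this image goes here.
        {"role": "assistant", "content": "..."},
    ]
}
print(json.dumps(example)[:120])
```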
You can validate these two files using the script in validate_data.py. Note that the .jsonl files must be properly formatted.
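If you want a quick sanity check of the files yourself, a minimal sketch along the following lines will catch malformed lines. It assumes the standard OpenAI chat fine-tuning format, where every line is a JSON object with a top-level "messages" list.

```python
import json

def check_jsonl(path):
    # One JSON object per line, each with a top-level "messages" list.
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            if not line.strip():
                raise ValueError(f"{path}: line {i} is empty")
            record = json.loads(line)  # raises ValueError if not valid JSON
            if not isinstance(record.get("messages"), list):
                raise ValueError(f"{path}: line {i} has no 'messages' list")
    print(f"{path}: OK")

check_jsonl("training_set.jsonl")
check_jsonl("validation_set.jsonl")
```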
Once the dataset is ready, we can proceed to fine-tune GPT-4o (refer to fine_tuning.py).
Use the script below to upload your training and validation datasets.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Upload the training set
response_train = client.files.create(
    file=open("training_set.jsonl", "rb"),
    purpose="fine-tune"
)
print(response_train)

# Upload the validation set
response_validate = client.files.create(
    file=open("validation_set.jsonl", "rb"),
    purpose="fine-tune"
)
print(response_validate)
Once the training and validation files have been successfully uploaded, use this script to fine-tune GPT-4o.
# Create the fine-tuning job, pointing at the uploaded files
ft_job = client.fine_tuning.jobs.create(
    training_file="file-id obtained from response_train",
    model="gpt-4o-2024-08-06",
    hyperparameters={
        "n_epochs": 2
    },
    validation_file="file-id obtained from response_validate"
)
print(ft_job)
The fine-tuning process takes a while (typically under 30 minutes). OpenAI will send an email to your registered address once the process is complete and your model is ready to use.
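If you prefer not to wait for the email, you can also poll the job status from Python. This short sketch continues the snippets above and assumes the client and ft_job objects defined there.

```python
import time

# Poll the fine-tuning job created above until it reaches a terminal state.
while True:
    job = client.fine_tuning.jobs.retrieve(ft_job.id)
    print(job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

# On success, this is the model name to use when querying the fine-tuned model.
print(job.fine_tuned_model)
```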
To reproduce the results of GPT-Fabric++, run the following commands.
python eval_finetuned.py --task DoubleStraightBimanual --cached rectangle --save_vid True --total_runs 5 --eval_type zero-shot
python eval_finetuned.py --task DoubleTriangle --cached square --save_vid True --total_runs 5 --eval_type zero-shot
python eval_finetuned.py --task AllCornersInward --cached square --save_vid True --total_runs 5 --eval_type zero-shot
python eval_finetuned.py --task CornersEdgesInwardBimanual --cached square --save_vid True --total_runs 5 --eval_type zero-shot
A significant part of this work is based on GPT-Fabric by Raval et al. Check out their work here!
For any further queries, feel free to write to Gayathri at [email protected]
Older commands:
python eval_vanilla.py --task DoubleTriangle --eval_type zero-shot --cached square
python eval_vanilla.py --task DoubleStraight --eval_type zero-shot --cached rectangle
python eval_vanilla.py --task DoubleStraightBimanual --eval_type zero-shot --cached rectangle
python eval_vanilla.py --task AllCornersInward --eval_type zero-shot --cached square
python code_generation_bimanualv2.py --task DoubleTriangle --eval_type zero-shot --cached square --num_exexute 5
python code_generation_bimanualv2.py --task DoubleStraight --eval_type zero-shot --cached rectangle --num_exexute 5
python code_generation_bimanualv2.py --task DoubleStraightBimanual --eval_type zero-shot --cached rectangle --num_exexute 5
python code_generation_bimanualv2.py --task AllCornersInward --eval_type zero-shot --cached square --num_exexute 5
[There are 40 configurations in total. We can run a small-scale experiment with just 5.]
List of deprecated files:
- gpt_eval.py (based on FabricFlowNet, has the segmentation fault)
- gpt_eval_2.py (based on FabricFlowNet, has the segmentation fault)
- fold_eval.py (from earlier experiments, set-of-mark prompting)
- fold_eval_2.py (using the FabricFlowNet method)
- code_generation.py, code_generation_2.py, code_generation_bimanual.py (older code versions)