Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba
SketchAgent leverages an off-the-shelf multimodal LLM to facilitate language-driven, sequential sketch generation through an intuitive sketching language. It can sketch diverse concepts, engage in interactive sketching with humans, and edit content via chat.
Clone the repository and navigate to the project folder:
git clone https://github.com/yael-vinker/SketchAgent.git
cd SketchAgent
Set up the environment:
conda env create -f environment.yml
conda activate sketch_agent
For Mac users, use the following environment file instead:
conda env create -f mac_environment.yml
conda activate sketch_agent
If Python shows a warning, try reinstalling cairosvg:
conda uninstall cairosvg && conda install cairosvg
This repository requires an Anthropic API key. If you don't have one, create an Anthropic account and follow the instructions to obtain a key.
Once you have the key, save it in the .env file:
ANTHROPIC_API_KEY=<your_key>
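If you write your own scripts around the repository and want to confirm the key is visible, a minimal standard-library sketch might look like the following. The helper name read_env_key is hypothetical, and the repository's own scripts may load the key differently (for example through a dotenv package), so treat this as an illustration rather than the project's loading logic.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_key(name: str = "ANTHROPIC_API_KEY", env_path: str = ".env") -> Optional[str]:
    """Return `name` from the process environment, falling back to a .env file.

    Hypothetical helper for illustration; not part of the SketchAgent code.
    """
    # A variable already exported in the shell takes precedence.
    if os.environ.get(name):
        return os.environ[name]
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; an empty value counts as missing.
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    key = read_env_key()
    print("key found" if key else "ANTHROPIC_API_KEY is missing; add it to .env")
```

Running this from the project folder before generating sketches gives a quick check that the .env file is in place without printing the key itself.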
Generate a single sketch by running:
python gen_sketch.py --concept_to_draw "<your_concept_here>"
For example:
python gen_sketch.py --concept_to_draw "sailboat"
Optional arguments:
- --seed_mode: Default is "deterministic" for reproducible results. Set to "stochastic" for increased variability.
- --path2save: By default, results are saved to results/test/.
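To sketch several concepts in one go, the command can be wrapped in a short script. This is a sketch only: it assumes it is run from the repository root with the sketch_agent environment active. The names build_command and main, the example concept list, and the per-concept output folders under results/batch/ are illustrative choices, not part of the repository. Only the flags documented above are used, and how gen_sketch.py lays out files under --path2save is not verified here.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(concept: str, seed_mode: str = "deterministic",
                  path2save: str = "results/test/") -> List[str]:
    """Assemble the gen_sketch.py command line for one concept (hypothetical helper)."""
    # The README documents only these two modes; rejecting others is our assumption.
    if seed_mode not in ("deterministic", "stochastic"):
        raise ValueError(f"unexpected seed_mode: {seed_mode!r}")
    return [
        sys.executable, "gen_sketch.py",
        "--concept_to_draw", concept,
        "--seed_mode", seed_mode,
        "--path2save", path2save,
    ]


def main(concepts: List[str]) -> None:
    if not Path("gen_sketch.py").is_file():
        print("gen_sketch.py not found; run this from the SketchAgent repository root.")
        return
    for concept in concepts:
        # One output folder per concept so results do not overwrite each other.
        out_dir = f"results/batch/{concept}/"
        subprocess.run(build_command(concept, "stochastic", out_dir), check=True)


if __name__ == "__main__":
    main(["sailboat", "cat", "bicycle"])
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems for concepts that contain spaces, such as "hot air balloon".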
Collaborate with SketchAgent by alternating strokes! To use the interactive interface:
python collab_sketch.py
This will launch a Flask-based web application. Once running, look for the following output in the terminal:
Server running at: http://<your-ip-address>:5000
Open the provided URL in your web browser to interact with the application. Results are saved to results/collab_sketching/.
Use the text box to change the concept to be drawn.
- The gen_sketch.py script produces sketches with variability. Try running it multiple times to explore different outcomes.
- Prompts are available in the prompts.py file. For unique concepts, ensure that your input prompt is clear and meaningful.
- Add support for chat-based editing.
- Add SVG drawing process animations in HTML.
- Add support for other backbone models (GPT-4o, Llama 3).
If you find this useful for your research, please cite the following:
@misc{vinker2024sketchagent,
title={SketchAgent: Language-Driven Sequential Sketch Generation},
author={Yael Vinker and Tamar Rott Shaham and Kristine Zheng and Alex Zhao and Judith E Fan and Antonio Torralba},
year={2024},
eprint={2411.17673},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2411.17673},
}