This web sample demonstrates how to use the LLM Inference API to run common text-to-text generation tasks, such as information retrieval, email drafting, and document summarization, in the browser.
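For orientation, here is a minimal sketch of how a page like this one might create the task and generate text with the LLM Inference API from `@mediapipe/tasks-genai`. The CDN path, model file name, prompt, and option values are illustrative assumptions, not the sample's exact code:

```js
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

// Load the WASM assets for GenAI tasks (CDN URL is illustrative).
const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');

// Create the LLM Inference task from a locally hosted model file.
// 'gemma-2b-it-gpu-int4.bin' is an assumed name; see the steps below.
const llmInference = await LlmInference.createFromOptions(genai, {
  baseOptions: {modelAssetPath: '/gemma-2b-it-gpu-int4.bin'},
  maxTokens: 512,
  topK: 40,
  temperature: 0.8,
});

// Run a text-to-text generation request.
const response = await llmInference.generateResponse(
    'Draft a short email asking for a project status update.');
console.log(response);
```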
- A browser with WebGPU support (e.g., Chrome on macOS or Windows) is required.
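If you want to verify WebGPU support before loading the model, you can feature-detect it with the standard `navigator.gpu` API. This check is not part of the sample, just a sketch:

```js
// navigator.gpu is only defined in WebGPU-capable browsers
// such as recent Chrome on macOS or Windows.
if (!navigator.gpu) {
  throw new Error('WebGPU is not supported in this browser.');
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error('No suitable GPU adapter found.');
}
```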
Follow these instructions to run the sample on your device:
- Make a folder for the task named `llm_task`, and copy the `index.html` and `index.js` files into your `llm_task` folder.
- Download Gemma 2B (TensorFlow Lite `2b-it-gpu-int4` or `2b-it-gpu-int8`) or convert an external LLM (Phi-2, Falcon, or StableLM) following the guide (only the GPU backend is currently supported), and place the model file in the `llm_task` folder.
- In your `index.js` file, update `modelFileName` with your model file's name (see the sketch after this list).
- Run `python3 -m http.server 8000` in the `llm_task` folder to host the three files (or `python -m SimpleHTTPServer 8000` for older Python versions).
- Open `localhost:8000` in Chrome. The button on the webpage will be enabled once the task is ready (~10 seconds).
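As a hedged example of the `index.js` change, assuming you downloaded the int4 Gemma 2B variant (the exact file name depends on the model you chose):

```js
// index.js: point the demo at the model file served from the llm_task folder.
// 'gemma-2b-it-gpu-int4.bin' is an assumed file name; replace it with the
// name of the model you actually downloaded or converted.
const modelFileName = 'gemma-2b-it-gpu-int4.bin';
```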