Pipeline to create .task files for MediaPipe LLM Inference API

Written by Georgios Soloupis, AI and Android GDE.

MediaPipe Solutions offers a powerful suite of libraries and tools designed to help you quickly integrate artificial intelligence (AI) and machine learning (ML) into your applications. These solutions are ready to use out of the box, fully customizable, and compatible across multiple development platforms.

The LLM Inference API enables on-device execution of large language models (LLMs), allowing your applications to perform a wide range of tasks, including text generation, natural language information retrieval, and document summarization, without relying on cloud services.

In this showcase, we’ll demonstrate how to create a .task bundle, the format that packages and configures your model for use with the LLM Inference API. This format ensures efficient on-device deployment and seamless integration with your applications.

AI Edge Torch

AI Edge Torch is a Python library that converts PyTorch models into the .tflite format, which can then be run with TensorFlow Lite and MediaPipe. This enables Android, iOS, and IoT applications that run models completely on-device. AI Edge Torch offers broad CPU coverage, with initial GPU and NPU support, and integrates closely with PyTorch by building on top of torch.export().
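
To illustrate the general conversion flow (a minimal sketch based on the library’s generic convert API, separate from the Gemma-specific script used later in this post), converting an ordinary PyTorch module looks roughly like this:

import torch
import torchvision
import ai_edge_torch

# Any eval-mode PyTorch model plus sample inputs used for tracing/export.
model = torchvision.models.resnet18(weights=None).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert (torch.export() under the hood) and write a .tflite flatbuffer.
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("resnet18.tflite")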

All steps have been verified in a Google Colab environment.

1. Clone the repository and move into it.

!git clone https://github.com/google-ai-edge/ai-edge-torch.git
%cd ai-edge-torch

2. Install the version of TensorFlow to use.

!pip install tensorflow==2.19.0

3. Install additional libraries.

!pip install torch-xla2
!pip install ai_edge_quantizer
!pip install ai_edge_litert

4. Install the huggingface_hub library.

!pip install -U "huggingface_hub[cli]"

5. Log in to Hugging Face in case you want to download models that need authentication.

from huggingface_hub import login
from google.colab import userdata

# Read the token stored in Colab's "Secrets" panel under the name HF_READ.
HF_TOKEN = userdata.get('HF_READ')

if HF_TOKEN:
    login(HF_TOKEN)
    print("Successfully logged in to Hugging Face!")
else:
    print("Token is not set. Please save the token first.")

6. Download the Gemma 3 1B model.

!huggingface-cli download google/gemma-3-1b-pt --repo-type model --local-dir ./models/google/gemma-3-1b-pt

7. Use the script to convert the model and create the .tflite file.

!python -m ai_edge_torch.generative.examples.gemma3.convert_gemma3_to_tflite \
  --quantize="dynamic_int8" \
  --checkpoint_path=/content/ai-edge-torch/models/google/gemma-3-1b-pt \
  --output_path=/content/ai-edge-torch \
  --prefill_seq_lens=2048 --kv_cache_max_len=4096
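
Here, --prefill_seq_lens and --kv_cache_max_len control the prefill signature length and the maximum KV-cache (context) size baked into the exported graph. A quick way to confirm the quantized model was written (assuming the output directory used above) is to list it:

!ls -lh /content/ai-edge-torch/*.tflite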

8. Create the tokenizer.model that is going to be used by the LLM Inference API.

%cd ai_edge_torch/generative/tools

!python tokenizer_to_sentencepiece.py \
  --checkpoint=google/gemma-3-1b-pt \
  --output_path=/content/tokenizer.model
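
As an optional sanity check (a small sketch assuming the sentencepiece package is available in the Colab runtime), the exported tokenizer can be loaded and round-tripped on a sample string:

import sentencepiece as spm

# Load the exported SentencePiece model and encode/decode a sample string.
sp = spm.SentencePieceProcessor(model_file="/content/tokenizer.model")
ids = sp.encode("Hello from MediaPipe!")
print(ids)
print(sp.decode(ids))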

9. Install the MediaPipe library.

!pip install mediapipe

10. Bundle everything together into a .task file.

TFLITE_MODEL = "/content/ai-edge-torch/gemma3-1b_q8_ekv4096.tflite"
TOKENIZER_MODEL = "/content/tokenizer.model"

# Gemma control tokens and chat-template markers expected by the LLM Inference API.
START_TOKEN = "<bos>"
STOP_TOKENS = ["<eos>", "<end_of_turn>"]

from mediapipe.tasks.python.genai import bundler

config = bundler.BundleConfig(
    tflite_model=TFLITE_MODEL,
    tokenizer_model=TOKENIZER_MODEL,
    start_token=START_TOKEN,
    stop_tokens=STOP_TOKENS,
    output_filename="/content/gemma3.task",
    prompt_prefix="<start_of_turn>user\n",
    prompt_suffix="<end_of_turn>\n<start_of_turn>model\n",
)
bundler.create_bundle(config)
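
Once the bundler finishes, the .task file can be checked and pulled out of the Colab runtime (a small convenience snippet, assuming the /content paths used above):

import os
from google.colab import files

# Confirm the bundle exists, then download it to the local machine.
task_path = "/content/gemma3.task"
print(f"{os.path.getsize(task_path) / (1024 * 1024):.1f} MB")
files.download(task_path)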

The full notebook with all the steps can be found here.

An Android project that can immediately use the generated .task file is this one.

Conclusion

MediaPipe Solutions offers a comprehensive suite of tools that make it easy to integrate AI and ML into applications across multiple platforms. Among these, the LLM Inference API enables large language models (LLMs) to run fully on-device, unlocking real-time capabilities without relying on the cloud.

To support this, we showcased the creation of the .task bundle format, MediaPipe’s deployable package for on-device inference. Using steps that run seamlessly in Colab notebooks, we outlined the full pipeline for converting a pre-trained language model from Hugging Face into a .task file. The workflow involves four stages: setting up the environment with dependencies such as tensorflow, ai-edge-torch, and mediapipe; downloading the model using a Hugging Face token; converting it to an optimized TensorFlow Lite (.tflite) format with quantization for mobile efficiency; and finally, bundling the model with its tokenizer into a .task file using MediaPipe’s bundler. This last step also defines critical metadata, such as the prompt format and start/stop tokens, to ensure correct inference behavior. The resulting .task file enables seamless integration into GenAI applications on mobile and edge devices.

