Introduction
Before the rise of LLMs, AI on the edge was one of the hottest topics, thanks to its remarkable capability to run ML models directly on devices. It’s not that the topic has lost its relevance; in fact, many tech giants are now shifting their attention to deploying LLMs on mobile platforms.
While we won’t be discussing Generative AI today, we’ll be revisiting the classic computer vision task of object detection. This blog offers a comprehensive tutorial on converting and quantizing the latest YOLOv10 object detection model from Ultralytics into LiteRT (formerly TensorFlow Lite) format, running inference on the resulting LiteRT model, and deploying it on Android for real-time detection.
If you have experience with object detection and on-device deployment, you may be wondering why MobileNet SSD or EfficientDet Lite isn’t the better choice here. Here’s why:
Why YOLOv10 over the others?
While MobileNet SSD and EfficientDet Lite perform well, they struggle with detecting smaller objects. YOLOv10, however, can detect smaller objects quickly and effectively.
Before we get started, let’s take a brief look at the YOLOv10 model and what LiteRT is.
YOLOv10
An advanced version of the YOLO model family, YOLOv10 is the latest go-to choice for real-time object detection tasks. Its enhanced architecture and training techniques make it particularly efficient for edge deployment.
Among all the variants, the nano version (YOLOv10-N) is the most suitable for mobile deployment due to its ability to operate in resource-constrained environments. Learn more about YOLOv10 here.
Note: We’ll be using the pre-trained YOLOv10-N model, which has been trained on the COCO dataset.
LiteRT
LiteRT, formerly known as TensorFlow Lite, is Google’s high-performance runtime for on-device AI. It allows you to effortlessly convert and run TensorFlow, PyTorch, and JAX models in the TFLite format.
Now that you have the overview, let’s dive into the coding part. Here’s the pipeline for our project:
Step 1: Model Conversion
A few years ago, converting YOLO models to TF Lite was quite challenging due to the complex steps and significant architectural differences in the models. However, that’s no longer the case, as Ultralytics now handles all the heavy lifting for you.
Get started with the Colab notebook by cloning this repository.
# Install Ultralytics.
!pip install ultralytics

from ultralytics import YOLO

# Load the YOLOv10n model.
model = YOLO("yolov10n.pt")

# Export the model to LiteRT (TF Lite) format.
model.export(format="tflite")
The export() function accepts several parameters, including the following:
- format: Output format of the model such as tflite, onnx, tfjs, openvino, torchscript, etc.
- imgsz: Desired image size of the model input (height, width). The default is 640 x 640.
- int8: Enables INT8 quantization of the model for faster inference. This is set to False by default.
There are many other parameters you can adjust based on your use-case, but the ones mentioned above should work well for now.
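To see how these fit together, here is a quick sketch combining them. Note that enabling int8 generally requires a representative dataset for calibration (Ultralytics exposes this through the data argument), so treat the dataset used below as an assumption and swap in whatever fits your use case.

from ultralytics import YOLO

model = YOLO("yolov10n.pt")

# Default (float) LiteRT export at 640 x 640.
model.export(format="tflite", imgsz=640)

# Optional INT8 export. The calibration dataset below is an assumption;
# replace it with your own data config if you enable full INT8 quantization.
model.export(format="tflite", imgsz=640, int8=True, data="coco8.yaml")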
That’s it. The core conversion takes just two lines of code to turn the YOLO PyTorch model into LiteRT format. Here’s how the conversion works under the hood: PyTorch → ONNX graph → TensorFlow SavedModel → LiteRT.
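If you want to confirm where the exported file ended up, listing the output directory is enough. The path below assumes the default export location used throughout this tutorial (yolov10n_saved_model/); adjust it if your export landed elsewhere.

import os

# List the artifacts produced by the export step.
export_dir = "yolov10n_saved_model"
for name in sorted(os.listdir(export_dir)):
    print(name)  # Expect yolov10n_float16.tflite alongside the SavedModel files.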
Step 2: Interpret the LiteRT model
Google AI Edge provides Model Explorer, a model visualization tool similar to Netron, which offers detailed insights into the model’s graph and architecture.
# Install Model Explorer.
!pip install ai-edge-model-explorer

import model_explorer

LITE_RT_EXPORT_PATH = "yolov10n_saved_model/"  # @param {type : 'string'}
LITE_RT_MODEL = "yolov10n_float16.tflite"  # @param {type : 'string'}

LITE_RT_MODEL_PATH = LITE_RT_EXPORT_PATH + LITE_RT_MODEL

# Load the LiteRT model in Model Explorer.
model_explorer.visualize(LITE_RT_MODEL_PATH)
If you look at the output tensor, you’ll see there is only one node (Identity) with the shape [1, 300, 6], unlike the MobileNet SSD model, which typically has four output tensors.
You can also interpret the model using the AI Edge LiteRT library.
# Install Google AI Edge LiteRT.
!pip install ai-edge-litert

from ai_edge_litert.interpreter import Interpreter

# Load the TF Lite model.
interpreter = Interpreter(model_path=LITE_RT_MODEL_PATH)
interpreter.allocate_tensors()

# Get input and output details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_size = input_details[0]['shape'][1]  # Height of the square model input.
print(f"Model input size: {input_size}")
print(f"Output tensor shape: {output_details[0]['shape']}")
The model input size is 640, and the output tensor shape [1, 300, 6] indicates the batch size (1), the maximum number of detections per image (300), and the values [xmin, ymin, xmax, ymax, score, class], respectively.
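Before writing any post-processing, you can sanity-check that layout by pushing a dummy tensor through the interpreter and printing one row of the output. This is only an illustrative sketch built on the interpreter, input_details, and output_details objects created above; a real preprocessed image replaces the random input.

import numpy as np

# Build a dummy input that matches the model's expected shape, e.g. [1, 640, 640, 3].
dummy_input = np.random.rand(*input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()

preds = interpreter.get_tensor(output_details[0]['index'])  # Shape: [1, 300, 6].
xmin, ymin, xmax, ymax, score, class_id = preds[0, 0]
print(f"First row -> box: ({xmin:.3f}, {ymin:.3f}, {xmax:.3f}, {ymax:.3f}), "
      f"score: {score:.3f}, class: {int(class_id)}")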
Step 3: Run inference with the converted LiteRT model
It’s inference time. Now that we’ve examined the model’s architecture, we can move on to running inference in Python with OpenCV.
Note: The results of the exported LiteRT model require post-processing, which involves normalizing the bounding box coordinates and mapping the class IDs to their corresponding labels.
In the Colab notebook, I’ve included some utility functions that handle all the required post-processing steps.
import cv2
import numpy as np

# Preprocess the input and run the interpreter.
def detect(input_data, is_video_frame=False):
    input_size = input_details[0]['shape'][1]
    if is_video_frame:
        original_height, original_width = input_data.shape[:2]
        image = cv2.cvtColor(input_data, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (input_size, input_size))
        image = image / 255.0
    else:
        image, (original_height, original_width) = load_image(input_data, input_size)
    interpreter.set_tensor(input_details[0]['index'], np.expand_dims(image, axis=0).astype(np.float32))
    interpreter.invoke()
    output_data = [interpreter.get_tensor(detail['index']) for detail in output_details]
    return output_data, (original_height, original_width)
# Postprocess the output.
def postprocess_output(output_data, original_dims, labels, confidence_threshold):
    output_tensor = output_data[0]
    detections = []
    original_height, original_width = original_dims
    for i in range(output_tensor.shape[1]):
        box = output_tensor[0, i, :4]
        confidence = output_tensor[0, i, 4]
        class_id = int(output_tensor[0, i, 5])
        if confidence > confidence_threshold:
            x_min = int(box[0] * original_width)
            y_min = int(box[1] * original_height)
            x_max = int(box[2] * original_width)
            y_max = int(box[3] * original_height)
            label_name = labels.get(str(class_id), "Unknown")
            detections.append({
                "box": [y_min, x_min, y_max, x_max],
                "score": confidence,
                "class": class_id,
                "label": label_name
            })
    return detections
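For context, here is roughly how the two helpers fit together for a single image, followed by a minimal OpenCV drawing step. The file name sample.jpg is a placeholder, and labels is assumed to be the notebook’s dictionary mapping COCO class-ID strings to names, matching how postprocess_output looks them up.

import cv2

# Run the model and post-process the raw output.
output_data, (orig_h, orig_w) = detect("sample.jpg")
detections = postprocess_output(output_data, (orig_h, orig_w), labels, confidence_threshold=0.4)

# Draw the detections on the original image.
image = cv2.imread("sample.jpg")
for det in detections:
    y_min, x_min, y_max, x_max = det["box"]
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
    cv2.putText(image, f'{det["label"]} {det["score"]:.2f}', (x_min, max(y_min - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("sample_out.jpg", image)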
The Colab notebook supports inference for both images and videos. Here are some of the results I obtained.
It’s impressive that the converted LiteRT model continues to perform exceptionally well after quantization, effectively detecting even tiny objects.
Feel free to try it out with any images or videos you have and see the results for yourself. 📷
Now, we’re all set to deploy the model on Android for on-device inference.
Step 4: Deploy the model on Android
In Step 1, we cloned the repository to run the Colab notebook, which also includes a sample Android app.
The final step in the notebook lets you download the LiteRT model. Once downloaded, copy it into the assets folder of the Android app. The default file name is yolov10n_float16.tflite. If you use a different file name, make sure to update Line 4 in the Constants.kt file accordingly.
// Change this with your TF Lite model name.
const val MODEL_PATH = "yolov10n_float16.tflite"
Note: The Android app code is adapted from [here]. Credits to the original author.
The Detector.kt file contains the logic for performing inference, as well as extracting bounding boxes, confidence scores, and labels for the detected objects.
// Detects the objects.
class Detector(
    private val context: Context,
    private val modelPath: String,
    private val labelPath: String?,
    private val detectorListener: DetectorListener,
    private val message: (String) -> Unit
) {

    private var interpreter: Interpreter
    private var labels = mutableListOf<String>()

    private var tensorWidth = 0
    private var tensorHeight = 0
    private var numChannel = 0
    private var numElements = 0

    private val imageProcessor = ImageProcessor.Builder()
        .add(NormalizeOp(INPUT_MEAN, INPUT_STANDARD_DEVIATION))
        .add(CastOp(INPUT_IMAGE_TYPE))
        .build()

    init {
        val options = Interpreter.Options().apply {
            this.setNumThreads(4)
        }

        val model = FileUtil.loadMappedFile(context, modelPath)
        interpreter = Interpreter(model, options)

        labels.addAll(extractNamesFromMetadata(model))
        if (labels.isEmpty()) {
            if (labelPath == null) {
                message("Model not contains metadata, provide LABELS_PATH in Constants.kt")
                labels.addAll(MetaData.TEMP_CLASSES)
            } else {
                labels.addAll(extractNamesFromLabelFile(context, labelPath))
            }
        }

        labels.forEach(::println)

        val inputShape = interpreter.getInputTensor(0)?.shape()
        val outputShape = interpreter.getOutputTensor(0)?.shape()

        if (inputShape != null) {
            tensorWidth = inputShape[1]
            tensorHeight = inputShape[2]

            // If in case input shape is in format of [1, 3, ..., ...]
            if (inputShape[1] == 3) {
                tensorWidth = inputShape[2]
                tensorHeight = inputShape[3]
            }
        }

        if (outputShape != null) {
            numElements = outputShape[1]
            numChannel = outputShape[2]
        }
    }

    // Extracts bounding box, label, confidence.
    private fun bestBox(array: FloatArray): List<BoundingBox> {
        val boundingBoxes = mutableListOf<BoundingBox>()

        for (r in 0 until numElements) {
            val cnf = array[r * numChannel + 4]
            if (cnf > CONFIDENCE_THRESHOLD) {
                val x1 = array[r * numChannel]
                val y1 = array[r * numChannel + 1]
                val x2 = array[r * numChannel + 2]
                val y2 = array[r * numChannel + 3]
                val cls = array[r * numChannel + 5].toInt()
                val clsName = labels[cls]
                boundingBoxes.add(
                    BoundingBox(
                        x1 = x1, y1 = y1, x2 = x2, y2 = y2,
                        cnf = cnf, cls = cls, clsName = clsName
                    )
                )
            }
        }

        return boundingBoxes
    }
After that, OverlayView.kt scales the normalized bounding box coordinates to the view size and draws them over the camera stream to visualize the results.
class OverlayView(context: Context?, attrs: AttributeSet?) : View(context, attrs) {

    private var results = listOf<BoundingBox>()
    private val boxPaint = Paint()
    private val textBackgroundPaint = Paint()
    private val textPaint = Paint()

    private var bounds = Rect()
    private val colorMap = mutableMapOf<String, Int>()

    init {
        initPaints()
    }

    fun clear() {
        results = listOf()
        textPaint.reset()
        textBackgroundPaint.reset()
        boxPaint.reset()
        invalidate()
        initPaints()
    }

    private fun initPaints() {
        textBackgroundPaint.color = Color.WHITE
        textBackgroundPaint.style = Paint.Style.FILL
        textBackgroundPaint.textSize = 42f

        textPaint.color = Color.WHITE
        textPaint.style = Paint.Style.FILL
        textPaint.textSize = 42f
    }

    override fun draw(canvas: Canvas) {
        super.draw(canvas)

        results.forEach { boundingBox ->
            // Get or create a color for this label.
            val color = getColorForLabel(boundingBox.clsName)

            boxPaint.color = color
            boxPaint.strokeWidth = 8F
            boxPaint.style = Paint.Style.STROKE

            val left = boundingBox.x1 * width
            val top = boundingBox.y1 * height
            val right = boundingBox.x2 * width
            val bottom = boundingBox.y2 * height

            canvas.drawRoundRect(left, top, right, bottom, 16f, 16f, boxPaint)

            val drawableText = "${boundingBox.clsName} ${Math.round(boundingBox.cnf * 100.0) / 100.0}"

            textBackgroundPaint.getTextBounds(drawableText, 0, drawableText.length, bounds)
            val textWidth = bounds.width()
            val textHeight = bounds.height()

            val textBackgroundRect = RectF(
                left,
                top,
                left + textWidth + BOUNDING_RECT_TEXT_PADDING,
                top + textHeight + BOUNDING_RECT_TEXT_PADDING
            )
            textBackgroundPaint.color = color // Set background color same as bounding box.
            canvas.drawRoundRect(textBackgroundRect, 8f, 8f, textBackgroundPaint)

            canvas.drawText(drawableText, left, top + textHeight, textPaint)
        }
    }

    private fun getColorForLabel(label: String): Int {
        return colorMap.getOrPut(label) {
            // Generate a random color, or use a predefined set of colors.
            Color.rgb((0..255).random(), (0..255).random(), (0..255).random())
        }
    }

    fun setResults(boundingBoxes: List<BoundingBox>) {
        results = boundingBoxes
        invalidate()
    }

    companion object {
        private const val BOUNDING_RECT_TEXT_PADDING = 8
    }
}
Finally, open the project in Android Studio, build it, then plug in your phone to install the app.
Here’s the final output on Android. The inference time was nearly 300ms. 🤩
That concludes this blog. I hope you found it enjoyable and gained valuable insights on converting a YOLOv10 model to LiteRT and deploying it on the edge.
If you have any questions, feel free to reach out to me on LinkedIn. Until then, keep learning, and stay tuned for more engaging content.
References & Resources
- YOLOv10-LiteRT-Android GitHub repository
- Colab notebook to convert YOLOv10-N to LiteRT
- YOLOv10 official documentation by Ultralytics
- Google AI Edge LiteRT
Acknowledgment
This project was developed during Google’s ML Developer Programs AI Sprint. Thanks to the MLDP team for providing Google Cloud credits to support this project.