Introduction
Before the rise of LLMs, AI on the edge was one of the hottest topics, thanks to its remarkable capability to run ML models directly on devices. It’s not that the topic has lost its relevance; in fact, many tech giants are now shifting their attention to deploying LLMs on mobile platforms.
While we won’t be discussing Generative AI today, we’ll be revisiting the classic computer vision task of object detection. This blog offers a comprehensive tutorial on converting and quantizing the latest YOLOv10 object detection model from Ultralytics into LiteRT (formerly TensorFlow Lite) format, running inference on the resulting LiteRT model, and deploying it on Android for real-time detection.
If you have experience with object detection and on-device deployment, you may be wondering why MobileNet SSD or EfficientDet Lite isn’t the better choice here. Here’s why:
Why YOLOv10 over the others?
While MobileNet SSD and EfficientDet Lite perform well, they struggle with detecting smaller objects. YOLOv10, however, can detect smaller objects quickly and effectively.
Before we get started, let’s take a brief look at the YOLOv10 model and what LiteRT is.
YOLOv10
An advanced version of the YOLO model family, YOLOv10 is the latest go-to choice for real-time object detection tasks. Its enhanced architecture and training techniques make it particularly efficient for edge deployment.
Among all the variants, the nano version (YOLOv10-N) is the most suitable for mobile deployment due to its ability to operate in resource-constrained environments. Learn more about YOLOv10 here.
Note: We’ll be using the pre-trained YOLOv10-N model, which has been trained on the COCO dataset.
LiteRT
LiteRT, formerly known as TensorFlow Lite, is Google’s high-performance runtime for on-device AI. It allows you to effortlessly convert and run TensorFlow, PyTorch, and JAX models in the TFLite format.
Now that you have the overview, let’s dive into the coding part. Here’s the pipeline for our project:
Step 1: Model Conversion
A few years ago, converting YOLO models to TF Lite was quite challenging due to the complex steps and significant architectural differences in the models. However, that’s no longer the case, as Ultralytics now handles all the heavy lifting for you.
Get started with the Colab notebook by cloning this repository.
# Install Ultralytics.
!pip install ultralytics

from ultralytics import YOLO

# Load the YOLOv10n model.
model = YOLO("yolov10n.pt")

# Export the model to LiteRT (TF Lite) format.
model.export(format="tflite")
The export() function accepts several parameters, including the following:
- format: Output format of the model such as tflite, onnx, tfjs, openvino, torchscript, etc.
- imgsz: Desired image size of the model input (height, width). The default is 640 x 640.
- int8: Enables INT8 quantization of the model for faster inference. This is set to False by default.
There are many other parameters you can adjust based on your use-case, but the ones mentioned above should work well for now.
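To see how these fit together, here is a quick sketch combining them. Note that enabling int8 generally requires a representative dataset for calibration (Ultralytics exposes this through the data argument), so treat the dataset used below as an assumption and swap in whatever fits your use case.

from ultralytics import YOLO

model = YOLO("yolov10n.pt")

# Default (float) LiteRT export at 640 x 640.
model.export(format="tflite", imgsz=640)

# Optional INT8 export. The calibration dataset below is an assumption;
# replace it with your own data config if you enable full INT8 quantization.
model.export(format="tflite", imgsz=640, int8=True, data="coco8.yaml")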
That’s it. The core conversion takes just two lines of code to turn the YOLO PyTorch model into LiteRT format. Here’s how the conversion works under the hood: PyTorch → ONNX graph → TensorFlow SavedModel → LiteRT.
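If you want to confirm where the exported file ended up, listing the output directory is enough. The path below assumes the default export location used throughout this tutorial (yolov10n_saved_model/); adjust it if your export landed elsewhere.

import os

# List the artifacts produced by the export step.
export_dir = "yolov10n_saved_model"
for name in sorted(os.listdir(export_dir)):
    print(name)  # Expect yolov10n_float16.tflite alongside the SavedModel files.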
Step 2: Interpret the LiteRT model
Google AI Edge provides Model Explorer, a model visualization tool similar to Netron, which offers detailed insights into the model’s graph and architecture.
# Install Model Explorer.
!pip install ai-edge-model-explorer

import model_explorer

LITE_RT_EXPORT_PATH = "yolov10n_saved_model/"  # @param {type : 'string'}
LITE_RT_MODEL = "yolov10n_float16.tflite"  # @param {type : 'string'}

LITE_RT_MODEL_PATH = LITE_RT_EXPORT_PATH + LITE_RT_MODEL

# Load the LiteRT model in Model Explorer.
model_explorer.visualize(LITE_RT_MODEL_PATH)
If you look at the output tensor, you’ll see there is only one node (Identity) with the shape [1, 300, 6], unlike the MobileNet SSD model, which typically has four output tensors.
You can also interpret the model using the AI Edge LiteRT library.
# Install Google AI Edge LiteRT.
!pip install ai-edge-litert

from ai_edge_litert.interpreter import Interpreter

# Load the TF Lite model.
interpreter = Interpreter(model_path=LITE_RT_MODEL_PATH)
interpreter.allocate_tensors()

# Get input and output details.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

input_size = input_details[0]['shape'][1]  # Height of the square model input.
print(f"Model input size: {input_size}")
print(f"Output tensor shape: {output_details[0]['shape']}")
The model input size is 640, and the output tensor shape [1, 300, 6] indicates the batch size (1), the maximum number of detections per image (300), and the values [xmin, ymin, xmax, ymax, score, class], respectively.
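Before writing any post-processing, you can sanity-check that layout by pushing a dummy tensor through the interpreter and printing one row of the output. This is only an illustrative sketch built on the interpreter, input_details, and output_details objects created above; a real preprocessed image replaces the random input.

import numpy as np

# Build a dummy input that matches the model's expected shape, e.g. [1, 640, 640, 3].
dummy_input = np.random.rand(*input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy_input)
interpreter.invoke()

preds = interpreter.get_tensor(output_details[0]['index'])  # Shape: [1, 300, 6].
xmin, ymin, xmax, ymax, score, class_id = preds[0, 0]
print(f"First row -> box: ({xmin:.3f}, {ymin:.3f}, {xmax:.3f}, {ymax:.3f}), "
      f"score: {score:.3f}, class: {int(class_id)}")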
Step 3: Run inference with the converted LiteRT model
It’s inference time. Now that we’ve examined the model’s architecture, we can move on to running inference in Python with OpenCV.
Note: The results of the exported LiteRT model require post-processing, which involves normalizing the bounding box coordinates and mapping the class IDs to their corresponding labels.
In the Colab notebook, I’ve included some utility functions that handle all the required post-processing steps.
import cv2
import numpy as np

# Preprocess the input and run the interpreter.
def detect(input_data, is_video_frame=False):
    input_size = input_details[0]['shape'][1]
    if is_video_frame:
        original_height, original_width = input_data.shape[:2]
        image = cv2.cvtColor(input_data, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (input_size, input_size))
        image = image / 255.0
    else:
        image, (original_height, original_width) = load_image(input_data, input_size)
    interpreter.set_tensor(input_details[0]['index'], np.expand_dims(image, axis=0).astype(np.float32))
    interpreter.invoke()
    output_data = [interpreter.get_tensor(detail['index']) for detail in output_details]
    return output_data, (original_height, original_width)
# Postprocess the output.
def postprocess_output(output_data, original_dims, labels, confidence_threshold):
    output_tensor = output_data[0]
    detections = []
    original_height, original_width = original_dims
    for i in range(output_tensor.shape[1]):
        box = output_tensor[0, i, :4]
        confidence = output_tensor[0, i, 4]
        class_id = int(output_tensor[0, i, 5])
        if confidence > confidence_threshold:
            x_min = int(box[0] * original_width)
            y_min = int(box[1] * original_height)
            x_max = int(box[2] * original_width)
            y_max = int(box[3] * original_height)
            label_name = labels.get(str(class_id), "Unknown")
            detections.append({
                "box": [y_min, x_min, y_max, x_max],
                "score": confidence,
                "class": class_id,
                "label": label_name
            })
    return detections
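For context, here is roughly how the two helpers fit together for a single image, followed by a minimal OpenCV drawing step. The file name sample.jpg is a placeholder, and labels is assumed to be the notebook’s dictionary mapping COCO class-ID strings to names, matching how postprocess_output looks them up.

import cv2

# Run the model and post-process the raw output.
output_data, (orig_h, orig_w) = detect("sample.jpg")
detections = postprocess_output(output_data, (orig_h, orig_w), labels, confidence_threshold=0.4)

# Draw the detections on the original image.
image = cv2.imread("sample.jpg")
for det in detections:
    y_min, x_min, y_max, x_max = det["box"]
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
    cv2.putText(image, f'{det["label"]} {det["score"]:.2f}', (x_min, max(y_min - 10, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("sample_out.jpg", image)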
The Colab notebook supports inference for both images and videos. Here are some of the results I obtained.
It’s impressive that the converted LiteRT model continues to perform exceptionally well after quantization, effectively detecting even tiny objects.
Feel free to try it out with any images or videos you have and see the results for yourself. 📷
Now, we’re all set to deploy the model on Android for on-device inference.
Step 4: Deploy the model on Android
In Step 1, we cloned the repository to run the Colab notebook, which also includes a sample Android app.
The final step in the notebook lets you download the LiteRT model. Once downloaded, copy it into the assets folder of the Android app. The default file name is yolov10n_float16.tflite. If you use a different file name, make sure to update Line 4 in the Constants.kt file accordingly.
// Change this with your TF Lite model name.
const val MODEL_PATH = "yolov10n_float16.tflite"
Note: The Android app code is adapted from [here]. Credits to the original author.
The Detector.kt file contains the logic for performing inference, as well as extracting bounding boxes, confidence scores, and labels for the detected objects.
// Detects the objects.
class Detector(
    private val context: Context,
    private val modelPath: String,
    private val labelPath: String?,
    private val detectorListener: DetectorListener,
    private val message: (String) -> Unit
) {

    private var interpreter: Interpreter
    private var labels = mutableListOf<String>()

    private var tensorWidth = 0
    private var tensorHeight = 0
    private var numChannel = 0
    private var numElements = 0

    private val imageProcessor = ImageProcessor.Builder()
        .add(NormalizeOp(INPUT_MEAN, INPUT_STANDARD_DEVIATION))
        .add(CastOp(INPUT_IMAGE_TYPE))
        .build()

    init {
        val options = Interpreter.Options().apply {
            this.setNumThreads(4)
        }

        val model = FileUtil.loadMappedFile(context, modelPath)
        interpreter = Interpreter(model, options)

        labels.addAll(extractNamesFromMetadata(model))
        if (labels.isEmpty()) {
            if (labelPath == null) {
                message("Model not contains metadata, provide LABELS_PATH in Constants.kt")
                labels.addAll(MetaData.TEMP_CLASSES)
            } else {
                labels.addAll(extractNamesFromLabelFile(context, labelPath))
            }
        }

        labels.forEach(::println)

        val inputShape = interpreter.getInputTensor(0)?.shape()
        val outputShape = interpreter.getOutputTensor(0)?.shape()

        if (inputShape != null) {
            tensorWidth = inputShape[1]
            tensorHeight = inputShape[2]

            // If in case input shape is in format of [1, 3, ..., ...]
            if (inputShape[1] == 3) {
                tensorWidth = inputShape[2]
                tensorHeight = inputShape[3]
            }
        }

        if (outputShape != null) {
            numElements = outputShape[1]
            numChannel = outputShape[2]
        }
    }

    // Extracts bounding box, label, confidence.
    private fun bestBox(array: FloatArray): List<BoundingBox> {
        val boundingBoxes = mutableListOf<BoundingBox>()

        for (r in 0 until numElements) {
            val cnf = array[r * numChannel + 4]
            if (cnf > CONFIDENCE_THRESHOLD) {
                val x1 = array[r * numChannel]
                val y1 = array[r * numChannel + 1]
                val x2 = array[r * numChannel + 2]
                val y2 = array[r * numChannel + 3]
                val cls = array[r * numChannel + 5].toInt()
                val clsName = labels[cls]
                boundingBoxes.add(
                    BoundingBox(
                        x1 = x1, y1 = y1, x2 = x2, y2 = y2,
                        cnf = cnf, cls = cls, clsName = clsName
                    )
                )
            }
        }

        return boundingBoxes
    }
After that, OverlayView.kt scales the normalized bounding box coordinates to the view size and draws them over the camera stream to visualize the results.
class OverlayView(context: Context?, attrs: AttributeSet?) : View(context, attrs) {

    private var results = listOf<BoundingBox>()
    private val boxPaint = Paint()
    private val textBackgroundPaint = Paint()
    private val textPaint = Paint()

    private var bounds = Rect()
    private val colorMap = mutableMapOf<String, Int>()

    init {
        initPaints()
    }

    fun clear() {
        results = listOf()
        textPaint.reset()
        textBackgroundPaint.reset()
        boxPaint.reset()
        invalidate()
        initPaints()
    }

    private fun initPaints() {
        textBackgroundPaint.color = Color.WHITE
        textBackgroundPaint.style = Paint.Style.FILL
        textBackgroundPaint.textSize = 42f

        textPaint.color = Color.WHITE
        textPaint.style = Paint.Style.FILL
        textPaint.textSize = 42f
    }

    override fun draw(canvas: Canvas) {
        super.draw(canvas)

        results.forEach { boundingBox ->
            // Get or create a color for this label.
            val color = getColorForLabel(boundingBox.clsName)

            boxPaint.color = color
            boxPaint.strokeWidth = 8F
            boxPaint.style = Paint.Style.STROKE

            val left = boundingBox.x1 * width
            val top = boundingBox.y1 * height
            val right = boundingBox.x2 * width
            val bottom = boundingBox.y2 * height

            canvas.drawRoundRect(left, top, right, bottom, 16f, 16f, boxPaint)

            val drawableText = "${boundingBox.clsName} ${Math.round(boundingBox.cnf * 100.0) / 100.0}"

            textBackgroundPaint.getTextBounds(drawableText, 0, drawableText.length, bounds)
            val textWidth = bounds.width()
            val textHeight = bounds.height()

            val textBackgroundRect = RectF(
                left,
                top,
                left + textWidth + BOUNDING_RECT_TEXT_PADDING,
                top + textHeight + BOUNDING_RECT_TEXT_PADDING
            )
            textBackgroundPaint.color = color // Set background color same as bounding box.
            canvas.drawRoundRect(textBackgroundRect, 8f, 8f, textBackgroundPaint)

            canvas.drawText(drawableText, left, top + textHeight, textPaint)
        }
    }

    private fun getColorForLabel(label: String): Int {
        return colorMap.getOrPut(label) {
            // Generate a random color, or use a predefined set of colors.
            Color.rgb((0..255).random(), (0..255).random(), (0..255).random())
        }
    }

    fun setResults(boundingBoxes: List<BoundingBox>) {
        results = boundingBoxes
        invalidate()
    }

    companion object {
        private const val BOUNDING_RECT_TEXT_PADDING = 8
    }
}
Finally, open the project in Android Studio, build it, then plug in your phone to install the app.
Here’s the final output on Android. The inference time was nearly 300ms. 🤩
That concludes this blog. I hope you found it enjoyable and gained valuable insights on converting a YOLOv10 model to LiteRT and deploying it on the edge.
If you have any questions, feel free to reach out to me on LinkedIn. Until then, keep learning, and stay tuned for more engaging content.
References & Resources
- YOLOv10-LiteRT-Android GitHub repository
- Colab notebook to convert YOLOv10-N to LiteRT
- YOLOv10 official documentation by Ultralytics
- Google AI Edge LiteRT
Acknowledgment
This project was developed during Google’s ML Developer Programs AI Sprint. Thanks to the MLDP team for providing Google Cloud credits to support this project.