Posted by Paul Ruiz, Developer Relations Engineer
Back in May we released MediaPipe Solutions, a set of tools for no-code and low-code solutions to common on-device machine learning tasks, for Android, web, and Python. Today we're happy to announce that the initial version of the iOS SDK, plus an update for the Python SDK to support the Raspberry Pi, are available. These include support for audio classification, face landmark detection, and various natural language processing tasks. Let's take a look at how you can use these tools for the new platforms.
Object Detection for Raspberry Pi
Aside from setting up your Raspberry Pi hardware with a camera, you can start by installing the MediaPipe dependency, along with OpenCV and NumPy if you don't have them already.
python -m pip install mediapipe
From there you can create a new Python file and add your imports to the top.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
You will also want to make sure you have an object detection model stored locally on your Raspberry Pi. For your convenience, we've provided a default model, EfficientDet-Lite0, that you can retrieve with the following command.
wget -q -O efficientdet.tflite https://storage.googleapis.com/mediapipe-models/object_detector/efficientdet_lite0/int8/1/efficientdet_lite0.tflite
Once you have your model downloaded, you can start creating your new ObjectDetector, including some customizations, like the max results that you want to receive, or the confidence threshold that must be exceeded before a result can be returned.
# Initialize the object detection model
base_options = python.BaseOptions(model_asset_path='efficientdet.tflite')
options = vision.ObjectDetectorOptions(base_options=base_options,
                                       running_mode=vision.RunningMode.LIVE_STREAM,
                                       max_results=max_results,
                                       score_threshold=score_threshold,
                                       result_callback=save_result)
detector = vision.ObjectDetector.create_from_options(options)
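Because the detector runs in LIVE_STREAM mode, results are delivered asynchronously through the result_callback rather than returned from the detection call. A minimal sketch of that listener, assuming a shared list (detection_result_list is our name, not part of the API) that the drawing code reads on the next frame:

```python
# Detections arrive on a background thread via the result_callback, so store
# them somewhere the main loop can read them when rendering the next frame.
detection_result_list = []

def save_result(result, unused_output_image, timestamp_ms):
    """Receives a detection result each time an async detection finishes."""
    detection_result_list.append(result)
```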
After creating the ObjectDetector, you will need to open the Raspberry Pi camera to read the continuous frames. There are a few preprocessing steps that will be omitted here, but are available in our sample on GitHub.
Within that loop you can convert the processed camera image into a new mp.Image, then run detection on it asynchronously before displaying the results received by the associated listener.
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_image)
detector.detect_async(mp_image, time.time_ns() // 1_000_000)
Once you draw out those results and detected bounding boxes, you should be able to see something like this:
You can find the complete Raspberry Pi example shown above on GitHub, or see the official documentation here.
Text Classification on iOS
While text classification is one of the more direct examples, the core ideas will still apply to the rest of the available iOS Tasks. Similar to the Raspberry Pi, you'll start by creating a new MediaPipe Tasks object, which in this case is a TextClassifier.
var textClassifier: TextClassifier?
Now that you have your TextClassifier, you just need to pass a String to it to get a TextClassifierResult.
func classify(text: String) -> TextClassifierResult? {
    guard let textClassifier = textClassifier else {
        return nil
    }
    return try? textClassifier.classify(text: text)
}
You can call this from elsewhere in your app, such as from a background DispatchQueue in a ViewController, before displaying the results.
let result = self?.classify(text: inputText)
You can find the rest of the code for this project on GitHub, as well as see the full documentation on developers.google.com/mediapipe.
Getting started
To learn more, watch our I/O 2023 sessions: Easy on-device ML with MediaPipe, Supercharge your web app with machine learning and MediaPipe, and What’s new in machine learning, and check out the official documentation over on developers.google.com/mediapipe.
We look forward to all the exciting things you make, so be sure to share them with @googledevs and your developer communities!