LiteRT’s new Qualcomm AI Engine Direct (QNN) accelerator unlocks dedicated NPU power for on-device GenAI on Android. It offers a unified mobile deployment workflow, state-of-the-art performance (up to a 100x speedup over CPU), and full model delegation to the NPU. The result is smooth, real-time AI experiences: FastVLM-0.5B achieves over 11,000 tokens/sec of prefill on the Snapdragon 8 Elite Gen 5 NPU.
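
To give a sense of what "full model delegation" looks like in practice, here is a minimal Kotlin sketch of compiling a model for the NPU with LiteRT's CompiledModel API. The model filename is a placeholder, and the exact class, option, and buffer method names are assumptions that may differ from the released API.

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Hypothetical sketch: load a .tflite model from app assets and request
// NPU delegation. LiteRT handles the QNN accelerator under the hood;
// names like Accelerator.NPU are assumptions, not confirmed API.
fun runOnNpu(context: android.content.Context) {
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",                        // placeholder model file
        CompiledModel.Options(Accelerator.NPU),
    )

    // Allocate input/output buffers sized to the model's signatures.
    val inputs = model.createInputBuffers()
    val outputs = model.createOutputBuffers()

    // Fill the first input with example data, run inference, read results.
    inputs[0].writeFloat(FloatArray(1 * 224 * 224 * 3))
    model.run(inputs, outputs)
    val logits = outputs[0].readFloat()

    model.close()
}
```

Because the whole graph is delegated, there is no per-op fallback to CPU; the compiled model executes end to end on the NPU, which is where the prefill speedups above come from.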