How to add text similarity to your Android applications easily using MediaPipe and Kotlin


Machine learning is everywhere in mobile apps: recommending people or products, detecting people or tags in an image, and more.

Recommending means finding similar items so that the user can discover and consume something new.

Embeddings are an essential component whenever you need to measure similarity. They are used for:

  • Semantic search
  • Clustering
  • Recommendations
  • Anomaly detection
  • Classification

An embedding is a numerical representation of data as a vector of floating-point numbers, like this:

"embedding": [
-0.006929283495992422,
-0.005336422007530928,
...
-4.547132266452536e-05,
-0.024047505110502243
]

Once you have embeddings, you can perform operations between two of them, for example, measuring the distance between them to quantify how related they are. A small distance suggests a close relationship, and a large distance suggests a weak one. The three most commonly used measures are:

  • Euclidean Distance — Distance between ends of vectors
  • Cosine — Cosine of the angle between vectors
  • Dot product — Cosine multiplied by lengths of both vectors
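Written as formulas, for two embeddings a and b with n components each (notation introduced here for illustration), the three measures are:

```latex
\begin{aligned}
d(\mathbf{a}, \mathbf{b}) &= \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \\
\mathbf{a} \cdot \mathbf{b} &= \sum_{i=1}^{n} a_i b_i = \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert \cos\theta \\
\cos\theta &= \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
\end{aligned}
```

Note that a smaller Euclidean distance means the embeddings are closer, whereas for the dot product and cosine similarity a larger value indicates greater similarity.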

How do you get embeddings? That is our next question. Several libraries and APIs can generate them, including:

  • MediaPipe
  • Vertex AI client libraries
  • PaLM API by Google
  • OpenAI API
  • SentenceTransformers
  • FastText by Facebook

Next, let’s talk about MediaPipe.

MediaPipe Solutions

MediaPipe Solutions provides a set of libraries and tools for quickly applying machine learning (ML) in your applications. Its SDKs for multiple platforms, including Android, Python, and Web, get machine learning up and running in minutes. You can also customize models using transfer learning, and MediaPipe provides tools to customize and evaluate new solutions.

MediaPipe offers libraries for generating both image and text embeddings; in our case, we use it for text. The MediaPipe documentation recommends the Universal Sentence Encoder model.

How to get embeddings with MediaPipe

We are using the Android SDK, with Kotlin, to get embeddings.

Here are the steps:

  • Add this dependency to your app module's build.gradle file:
dependencies {
implementation 'com.google.mediapipe:tasks-text:latest.release'
}
  • Download the model and store it in your project. This model uses a dual encoder architecture and was trained on various question-answer datasets. Its average latency on a Pixel 6 using CPU/GPU is 18.21 ms.
  • Store the downloaded model in the assets directory of your Android app module:
src/main/assets
  • First, set the options like this:

https://medium.com/media/33d51243b80b33ee5d487bc7560b981e/href

  • Then you can get text embeddings like this:

https://medium.com/media/91e0ac91fae87b7abb7e6fb92fa23c0b/href
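A minimal sketch of these two steps, based on the MediaPipe Tasks text embedder API for Android. It assumes the model was saved as `universal_sentence_encoder.tflite` in `src/main/assets` (a hypothetical file name; adjust it to match your file), and `createTextEmbedder` and `embedText` are hypothetical helper names, not taken from the original project.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.components.containers.Embedding
import com.google.mediapipe.tasks.core.BaseOptions
import com.google.mediapipe.tasks.text.textembedder.TextEmbedder
import com.google.mediapipe.tasks.text.textembedder.TextEmbedder.TextEmbedderOptions

// Hypothetical file name: use the name of the model you copied into src/main/assets.
private const val MODEL_ASSET = "universal_sentence_encoder.tflite"

// Hypothetical helper: builds a TextEmbedder that loads the model from the app's assets.
fun createTextEmbedder(context: Context): TextEmbedder {
    val baseOptions = BaseOptions.builder()
        .setModelAssetPath(MODEL_ASSET)
        .build()
    val options = TextEmbedderOptions.builder()
        .setBaseOptions(baseOptions)
        .build()
    return TextEmbedder.createFromOptions(context, options)
}

// Hypothetical helper: returns the first embedding the model produces for the text.
fun embedText(textEmbedder: TextEmbedder, text: String): Embedding =
    textEmbedder.embed(text).embeddingResult().embeddings().first()
```

Creating the embedder loads the model, so it is usually worth creating it once and reusing the instance; call `close()` on it when you no longer need it.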

The next step is to determine how similar the sentences are.

Let’s look at an example with two sentences and their embeddings:

Sentences:

{
"text1":"How's it going?",
"text2":"I am fine"
}

Embeddings:

{
"text1embedding": "[127 16 185 127 82 127 128 50 127 127 172 10 127 128 127 127 7 160 128 128 128 90 127 238 70 127 246 128 127 127 170 128 182 185 9 76 154 196 4 42 136 127 127 127 128 28 151 127 127 4 135 127 80 157 77 90 113 41 15 127 128 167 127 83 1 127 217 60 128 90 255 2 161 232 24 171 127 9 55 12 127 210 127 87 181 79 127 88 128 124 128 7 128 128 128 19 127 127 250 145]",
"text2embedding": "[127 44 209 127 35 127 128 128 127 81 176 26 127 128 127 127 242 180 139 128 128 127 127 147 126 127 230 128 127 127 200 137 128 9 65 70 217 128 22 124 142 127 118 127 194 131 128 127 110 245 142 127 127 151 127 50 67 61 248 127 128 128 127 36 216 127 218 106 151 78 20 223 182 189 222 233 127 1 76 11 127 253 127 33 186 127 127 235 128 121 128 4 128 128 175 187 127 87 228 141]"
}

How to get the distance between two embeddings

There are three well-known methods for measuring similarity between embeddings:

  • Euclidean Distance — Distance between ends of vectors

Calculating the Euclidean distance in Kotlin looks something like this:

https://medium.com/media/1c255b9ef7a7eb33a9e8e0b6133b84c3/href
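As a self-contained illustration, here is a minimal plain-Kotlin sketch of the Euclidean distance; `euclideanDistance` is a hypothetical helper name. It takes `FloatArray` inputs; with MediaPipe, you could pass the result of `Embedding.floatEmbedding()` from a non-quantized embedder.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: square root of the sum of squared element-wise differences.
fun euclideanDistance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var sum = 0.0
    for (i in a.indices) {
        val diff = a[i].toDouble() - b[i].toDouble()
        sum += diff * diff
    }
    return sqrt(sum)
}
```

For example, `euclideanDistance(floatArrayOf(0f, 0f), floatArrayOf(3f, 4f))` returns `5.0`.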

  • Dot product — Cosine multiplied by lengths of both vectors
Figure source: https://datahacker.rs/dot-product-inner-product/

The dot product is a simple vector operation: multiply the two vectors element by element and then add up the products. For vector and matrix operations in Kotlin, there is a great library with functions similar to NumPy's, called Multik, and computing the dot product with it looks something like this:

https://medium.com/media/f0bb4f329b59b19c4a0f96027a4baba5/href
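If you prefer not to add a dependency, the same operation can be written in plain Kotlin. This is a minimal sketch that does not use Multik; `dotProduct` is a hypothetical helper name.

```kotlin
// Hypothetical helper: multiply element by element, then sum the products.
fun dotProduct(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var sum = 0.0
    for (i in a.indices) {
        sum += a[i].toDouble() * b[i].toDouble()
    }
    return sum
}
```

For example, `dotProduct(floatArrayOf(1f, 2f, 3f), floatArrayOf(4f, 5f, 6f))` returns `32.0` (4 + 10 + 18).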

  • Cosine — Cosine of the angle between vectors
Figure source: https://medium.com/nerd-for-tech/a-comparison-of-cosine-similarity-vs-euclidean-distance-in-als-recommendation-engine-51898f9025e7

Computing the cosine similarity in Kotlin looks something like this:

https://medium.com/media/cabeb0706080d0f1d67f1c74afb91d8f/href
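As a dependency-free alternative, cosine similarity can be computed by dividing the dot product by the product of the two vector lengths. This is a minimal plain-Kotlin sketch, not the Multik-based code from the original post; `cosineSimilarity` is a hypothetical helper name.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: dot product divided by the product of the L2 norms.
// The result lies in [-1, 1]; 1 means the vectors point in the same direction.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        val x = a[i].toDouble()
        val y = b[i].toDouble()
        dot += x * y
        normA += x * x
        normB += y * y
    }
    require(normA > 0.0 && normB > 0.0) { "Cosine similarity is undefined for zero vectors" }
    return dot / (sqrt(normA) * sqrt(normB))
}
```

Unlike the Euclidean distance, a higher value here means the texts are more similar. For example, `cosineSimilarity(floatArrayOf(1f, 0f), floatArrayOf(0f, 1f))` returns `0.0` because the vectors are perpendicular.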

Although calculating the similarity with Kotlin and Multik is easy, MediaPipe makes it even easier, because you only need to call a single method like this:

https://medium.com/media/53126b3ec60e939040c220006ef6b24e/href
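A minimal sketch of that call, based on the static `TextEmbedder.cosineSimilarity` method in the MediaPipe Tasks text library. The wrapper `textSimilarity` is a hypothetical helper name, and it assumes a `TextEmbedder` that has already been created from its options.

```kotlin
import com.google.mediapipe.tasks.text.textembedder.TextEmbedder

// Hypothetical helper: embeds both texts and compares them with MediaPipe's
// built-in cosine similarity. Higher values mean the texts are more similar.
fun textSimilarity(textEmbedder: TextEmbedder, first: String, second: String): Double {
    val firstEmbedding = textEmbedder.embed(first).embeddingResult().embeddings().first()
    val secondEmbedding = textEmbedder.embed(second).embeddingResult().embeddings().first()
    return TextEmbedder.cosineSimilarity(firstEmbedding, secondEmbedding)
}
```

For example, `textSimilarity(embedder, "How's it going?", "I am fine")` compares the two sentences from the earlier example.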

Now let’s look at a complete example.

https://medium.com/media/7d933d1071d71eec04534811918b8b16/href

This is an application that uses text embeddings with Kotlin, Compose, and MediaPipe.

The code is available at the following link:

https://github.com/jggomez/AndroidMediaPipe


I hope this information is useful to you. Remember to share this blog post; your comments are always welcome.



How to add text similarity to your Android applications easily using MediaPipe and Kotlin was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.
