AI Chef: Turning Food Photos into Recipes with Gemini Vision Pro in Colab


Have you ever stared at a photo of a delicious dish and wondered what it was or how to make it? With the power of AI and image recognition, that question can now be a thing of the past. In this article, we’ll explore how to leverage Gemini Vision Pro, a large language model from Google AI, alongside Colaboratory (Colab), a free Jupyter Notebook environment in the cloud, to generate recipes based on an image.

What is Gemini Vision Pro?

Gemini Vision Pro is a multimodal model from Google AI: it accepts both text and images as input and answers in text. It can analyze an image and extract meaningful information from it, which makes it useful for tasks such as image classification and object detection. In our case, we’ll use its ability to recognize food items within an image.

Colab: Your Free Cloud Workspace

Colab is a fantastic platform for experimenting with code without worrying about setting up a local machine. It provides a web-based Jupyter notebook environment that allows you to run Python code and visualize results directly in your browser. This makes Colab a good fit for our recipe generation project, as we can use Python libraries and Gemini Vision Pro’s capabilities entirely in the cloud.

Steps to Generate Recipes from an Image

How do we access the API for Gemini Vision Pro? We can do that easily through Google AI Studio.

Google AI Studio offers a streamlined approach to building generative AI applications. It provides a web-based interface where you can interact with Gemini models through prompts and fine-tune parameters to achieve the desired outcome. This eliminates the need for extensive Python coding, making AI development more accessible for users of all experience levels.

Here’s a breakdown of how to use Gemini Pro Vision to generate recipes with Google AI Studio:

  • Access Google AI Studio: Sign up for (or sign in to) Google AI Studio.
  • Select Create New Freeform Prompt: The page opens with a chat prompt by default, but a freeform prompt is all we need here.
  • Choose the Language Model: Select “Gemini Vision Pro” for your project’s language model.
  • Prepare your Image: Upload an image containing the food dish you want a recipe for.
  • Craft your Prompt: In the prompt section, describe your desired outcome. For example, “Identify the dish in this image, describe it, and suggest a recipe for this dish by specifying the ingredients and the instructions on how to make it. explain step by step with detail.”
  • Get the Code: After we’ve created our prompt, we can get the code and copy it to Colab.
  • Create a new API Key: Next, we need an API key to call the API. We can create one from the Get API Key menu on the top left of the navbar.
  • Configure the API Key: Now that we have the newly created API key, copy it and replace the YOUR_API_KEY placeholder in the code.
  • Use our Image: To keep the code clean, wrap it in a function that accepts an image file path (replacing the “image0.jpeg” placeholder) and returns the model’s response.
from pathlib import Path

def getRecipe(image_path: str) -> str:
    # Validate that the image exists before reading it
    if not (img := Path(image_path)).exists():
        raise FileNotFoundError(f"Could not find image: {img}")
    # Package the raw image bytes together with their MIME type
    image_part = {
        "mime_type": "image/jpeg",
        "data": img.read_bytes(),
    }
    # The prompt is a list of parts: the text instructions, then the image
    prompt_parts = [
        "You are a chef.\n"
        "Identify the dish in this image, describe it, and suggest a recipe for this dish by specifying the ingredients and the instructions on how to make it.\n"
        "explain step by step with detail.\n",
        image_part,
    ]
    # `model` is the model object created by the code copied from AI Studio
    response = model.generate_content(prompt_parts)
    return response.text.strip()
  • Try it Out: Now that the code is done, try it out to your heart’s content!
image_file = "sate.jpg"
print(getRecipe(image_file))

The model responds with something like this:

This dish is called "Sate Padang" which is a popular Indonesian dish. It consists of skewered grilled beef, served with a peanut sauce.
To make this dish, you will need the following ingredients:
* 1 pound of beef sirloin, cut into thin strips
* 1/2 cup of soy sauce
* 1/4 cup of vegetable oil
* 1 teaspoon of garlic powder
* 1 teaspoon of ground coriander
* 1 teaspoon of ground cumin
* 1/2 teaspoon of turmeric powder
* 1/2 teaspoon of salt
* 1/4 teaspoon of black pepper
* 1/2 cup of peanut butter
* 1/2 cup of water
* 1 tablespoon of lime juice
* 1/4 teaspoon of red pepper flakes
* 1 cucumber, thinly sliced
* 1 tomato, thinly sliced
* 1 onion, thinly sliced

1. In a large bowl, combine the beef, soy sauce, vegetable oil, garlic powder, coriander, cumin, turmeric, salt, and pepper. Stir to coat the beef evenly.
2. Cover the bowl and refrigerate for at least 30 minutes or overnight.
3. Preheat the grill to medium-high heat.
4. Thread the beef strips onto skewers.
5. Grill the skewers for 5–7 minutes per side or until cooked through.
6. In a blender, combine the peanut butter, water, lime juice, and red pepper flakes. Blend until smooth.
7. Serve the skewers with the peanut sauce, cucumber, tomato, and onion.
8. Enjoy!
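
One possible refinement to the function in the "Use our Image" step: it hardcodes "image/jpeg", so a PNG or WebP photo would be sent with the wrong MIME type. The sketch below guesses the type from the file extension using Python's standard mimetypes module. The name build_image_part is hypothetical and not part of the code that AI Studio generates; it only builds the same {"mime_type", "data"} dictionary that the function places in prompt_parts.

```python
import mimetypes
from pathlib import Path


# Hypothetical helper (an assumption, not part of the AI Studio output).
def build_image_part(image_path: str) -> dict:
    """Return the {"mime_type", "data"} dict that goes into prompt_parts."""
    img = Path(image_path)
    if not img.exists():
        raise FileNotFoundError(f"Could not find image: {img}")

    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(img.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {img}")

    return {"mime_type": mime_type, "data": img.read_bytes()}
```

Inside getRecipe, the hand-written image_part dictionary could then be replaced with image_part = build_image_part(image_path). Whether the model accepts a given image format is a separate question that this helper does not check.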

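The "Configure the API Key" step pastes the key straight into the notebook, which makes it easy to leak when the notebook is shared. As a minimal sketch, assuming the key has been stored in an environment variable beforehand, the helper below reads it at runtime instead. Both the function name load_api_key and the variable name GOOGLE_API_KEY are assumptions chosen for illustration.

```python
import os


# Hypothetical helper; GOOGLE_API_KEY is an assumed variable name.
def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from the environment instead of hardcoding it."""
    key = os.environ.get(env_var, "").strip()
    # Treat a missing value or the untouched placeholder as "not configured".
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage with the code copied from AI Studio, shown as a comment because it
# needs the google-generativeai package and a real key:
#   genai.configure(api_key=load_api_key())
```

Failing early with a clear message is a design choice here: a missing key otherwise only surfaces later, as an authentication error from the API call.
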
Start your AI culinary adventure today!

By following the steps outlined above and leveraging the power of Gemini Vision Pro and Colab, you can unlock a new way to explore food and discover delicious recipes based on any image that sparks your culinary curiosity.

AI Chef: Turning Food Photos into Recipes with Gemini Vision Pro in Colab was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.
