February 14, 2024

Photo StoryTelling – Leveraging Generative AI and Google APIs to compose posts from your photo albums

Nowadays we take tons of photos with our smartphones, and many of them end up on social networks or messaging apps. Sometimes images alone are not enough to express a moment you have captured in your daily routine, with your family, or on an unforgettable trip.

What if we could use Generative AI to put into words what our photos represent, and have it compose a story about those moments?
You could publish it online, share it with your dear ones, or keep that diary to yourself.

As this is a tool I would like to use myself, I decided to implement it not as the researcher / ML engineer / data scientist that I am, but rather as a developer with a cool idea, interested in connecting a bunch of powerful Google APIs to accomplish the task.

This article is accompanied by a Jupyter/Colab notebook that contains the whole solution: EXIF photo metadata extraction, the Google Maps APIs for looking up the places where the photos were taken, and Generative AI APIs, namely Vertex Imagen for image captioning and the Vertex PaLM API for blog post generation.

The output of that pipeline is a generated blog post describing an album of photos. You can upload your own photo albums to the Colab notebook and easily see how Generative AI describes the moments registered with your camera.

Setup

This project uses Google Cloud Platform (GCP) to get access to the APIs. If you want to run the Colab, you can use an existing GCP account or create a new one, which comes with USD 300 of free credits.

If you want to run the notebook on Colab, with the provided photos or your own, the Setup section of the notebook will walk you through installing the necessary libraries, authenticating to GCP using Google Auth, getting a Google Maps Platform API key, and enabling the necessary APIs.

The last part of the setup downloads some example photos I have provided from my trips to Los Angeles and San Francisco.

Processing photos

In this notebook section, you configure the path to the folder containing the photos of your album. The notebook then processes the photos with the Pillow imaging library to perform the following tasks:

Extract EXIF metadata – digital photos typically carry associated metadata, which you can inspect by checking the file properties. In this project, we are interested in the date and time and the geographic location (latitude/longitude) at which the photo was captured
Resize to a smaller size (maximum dimension, width or height, of 800 pixels) to minimize network traffic in API requests
Convert to JPG, as some formats (HEIF, for example) are not supported by the APIs
Save the converted photos to a photos_converted subfolder
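The EXIF and resizing steps above involve two small calculations that are easy to get wrong. The sketch below is not the notebook's code: it assumes the GPS values have already been read from the file (for example with Pillow) as degrees, minutes and seconds plus a hemisphere letter, and the helper names dms_to_decimal and fit_within are made up for illustration.

```python
def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative.
    return -value if ref in ("S", "W") else value


def fit_within(size, max_dim=800):
    """Return (width, height) scaled so the longer side is at most max_dim."""
    width, height = size
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough, do not upscale
    return round(width * max_dim / longest), round(height * max_dim / longest)


# Illustrative values only (hypothetical coordinates and photo size).
lat = dms_to_decimal((33, 56, 33.0), "N")
lng = dms_to_decimal((118, 24, 29.0), "W")
print(round(lat, 4), round(lng, 4))
print(fit_within((4032, 3024)))
```

The resulting pair of decimal coordinates is what the Maps calls in the next section expect, and the computed size can be handed to whichever resize routine you use.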

Extracting location and nearby places using Google Maps APIs

Google Maps offers a number of specialized APIs for different tasks. Here we use the following APIs:

Reverse Geocoding — Returns the potential locations from a geographic coordinate.
Nearby Places — Returns places (e.g. landmarks, buildings, restaurants) near the coordinate.

After you have set up the Maps Platform API key, calling the Geocoding API and the Places API is very simple.

import googlemaps
gmaps = googlemaps.Client(key=MAPS_API_KEY)
locations = gmaps.reverse_geocode(latlng=(lat,lng))
nearby_places = gmaps.places_nearby(location=(lat,lng), radius=radius)
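Both calls return plain Python structures parsed from the JSON responses. As a rough sketch, assuming the documented response shapes (a list of results carrying formatted_address for reverse geocoding, and a dict with a results list of places carrying name for the nearby search), the readable names can be pulled out as follows. The helper names and the sample data are invented for illustration and are not the notebook's code.

```python
def location_names(geocode_results, limit=3):
    """Collect formatted addresses from a reverse-geocoding result list."""
    names = [r["formatted_address"] for r in geocode_results
             if "formatted_address" in r]
    return names[:limit]


def nearby_names(places_response, limit=5):
    """Collect unique place names from a nearby-search response, keeping order."""
    seen = []
    for place in places_response.get("results", []):
        name = place.get("name")
        if name and name not in seen:
            seen.append(name)
    return seen[:limit]


# Hand-written sample data shaped like the API responses (not real output).
sample_locations = [
    {"formatted_address": "1 World Way, Los Angeles, CA 90045, USA"},
    {"formatted_address": "Los Angeles, CA, USA"},
]
sample_nearby = {
    "results": [
        {"name": "Santa Monica Pier"},
        {"name": "Pier Burger"},
        {"name": "Santa Monica Pier"},  # duplicates do occur in practice
    ]
}

print(location_names(sample_locations))
print(nearby_names(sample_nearby))
```

Limiting the number of names keeps each photo's line in the prompt short, which matters once an album has dozens of photos.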

Photo captioning using Generative AI with Vertex Imagen

In this section of the notebook we start using Generative AI. Vertex Imagen provides an API for image captioning, i.e., it can describe in text what is in the picture.

For that we first need to initialize the Vertex AI SDK with your GCP project.

import vertexai
from vertexai.vision_models import ImageTextModel, Image

vertexai.init(project=PROJECT_ID)
model = ImageTextModel.from_pretrained("imagetext")

Then getting a caption from an image is straightforward.

source_image = Image.load_from_file(location=path)
captions = model.get_captions(
    image=source_image,
    number_of_results=1,
    language="en",
)

Designing the prompt

In the context of a Large Language Model (LLM), a prompt is the input or query provided to the model to generate a response. The quality and specificity of the prompt play a crucial role in shaping the output of the model.

LLMs are typically fine-tuned to follow instructions in the prompt, which makes them capable of performing tasks they were not explicitly trained on. Designing a good prompt typically requires a trial-and-error process of interacting with the LLM and checking whether the output is close to (or better than) what you expected.

This project required designing a prompt that instructs the LLM to generate a post describing the moments captured in a set of photos. That includes specifying a personality for the LLM's writing, the format of the inputs (a list with the photos' metadata), the constraints to follow (e.g. referencing the photo placeholder when describing a photo), and the expected output format, with interleaved text and photo placeholders.

You can check the prompt template I came up with, encapsulated in the function below. Notice that it includes a placeholder for the photo descriptions and another for a context paragraph, which you can use to give the LLM more information about the context in which the photos were taken.

Prompt Engineering

Prompt engineering techniques are design patterns for prompting that have been discovered and proposed by researchers and the community to help LLMs generate better output. One of those techniques is few-shot prompting, in which we provide some examples of inputs and expected outputs, as in our prompt template below. In my tests with the Vertex PaLM API, this technique helps to get the desired output most of the time.

def generate_prompt(context, pictures_infos):
    prompt = f"""
      You are a copywriter and journalist.
      Can you help me to write a photo tour that describes the moments
      registered in a photo album from a context and some information 
      I provide about the photos?

      The items were already sorted by the date and time the photos were taken.
      Pay attention to the dates and time to infer how many days were
      covered by these photos and at which time of the day they were taken.

      Please include descriptions of all the photos taken.
      Only report places or experiences that are described by the
       photo informations.


      The photos information has the following structure:

      -  | Date the photo was taken | Time the photo was taken |
        Photo Description generated by an LLM  |
        Approximate Locations where the photo was taken |
        Approximate Nearby locations where photo was taken

      Here is an example of photo information and how it should be generated 
      in plain text, interleaving photo descriptions and the .


      Example photo information:
      -  | Date: 08/04/2023 | Time: 07:53:13 |
      Photo Description: a man stands in front of a sign that says 
      welcome to the united states |
      Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles 
      International Airport, Terminal B, Los Angeles, Los Angeles County |
      Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge, 
      ICE International Currency Exchange, Relay, Bank of America.

      Expected output:
      I was happy finally arriving to my destination, Los Angeles.
      While I went into US Customs my heart was filled of anxiety 
      to leave the airport and get to visit the city.
      

      ```
      Photos album context: {context}

      Photos description:
      {pictures_infos}
      ```
      """
    return prompt

Generating the prompt

In this example, we provide the photos metadata and a short context paragraph to generate the prompt according to our template above.

album_context = """I flew to Los Angeles for a short trip, 
                and the album contains the photos 
                from the day I arrived there. 
                The man in those photos is myself.
                """

blog_prompt = generate_prompt(album_context, photos_info_concat)
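The code that builds photos_info_concat is not shown in this article. The sketch below is one way such a string could be assembled, assuming each photo is a dict with hypothetical keys (taken_at, caption, locations, nearby) and an EXIF-style timestamp. The [photo_N] placeholder syntax is also an assumption chosen for illustration, not necessarily the format used in the notebook.

```python
from datetime import datetime


def format_photo_info(index, photo):
    """Render one photo as a pipe-separated line for the prompt."""
    taken = datetime.strptime(photo["taken_at"], "%Y:%m:%d %H:%M:%S")
    return (
        f"- [photo_{index}] | "  # hypothetical placeholder syntax
        f"Date and time: {taken.strftime('%m/%d/%Y (%A) %I:%M %p')} | "
        f"Photo Description: {photo['caption']} | "
        f"Locations: {', '.join(photo['locations'])} | "
        f"Possible Nearby locations: {', '.join(photo['nearby'])}"
    )


def build_photos_info(photos):
    """Sort photos chronologically and join their lines into one string."""
    # EXIF timestamps are zero-padded, so sorting the raw strings is chronological.
    ordered = sorted(photos, key=lambda p: p["taken_at"])
    return "\n".join(
        format_photo_info(i, p) for i, p in enumerate(ordered, start=1)
    )


# Hand-written sample metadata, deliberately out of order.
sample_photos = [
    {
        "taken_at": "2023:08:04 09:32:10",
        "caption": "a man in a nasa shirt is sitting in a white car",
        "locations": ["Los Angeles International Airport", "Los Angeles"],
        "nearby": ["Los Angeles"],
    },
    {
        "taken_at": "2023:08:04 07:53:13",
        "caption": "a man stands in front of a sign",
        "locations": ["Los Angeles International Airport"],
        "nearby": ["Star Alliance Lounge", "Relay"],
    },
]

print(build_photos_info(sample_photos))
```

Sorting before numbering matters because the prompt tells the LLM that the items are already in chronological order.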

Now give it a try. Just copy the prompt below, generated by this pipeline, and paste it into a user-facing LLM chat system like Bard.
You might be as impressed with the results as I was 😀!

You are a copywriter and journalist.
Can you help me write a photo tour that describes the moments registered in a
photo album from a context and some information I provide about the photos?

The items are already sorted by the time the photos was taken.
Pay attention to the dates and time to infer how many days were
covered by these photos and in which time of the day they were taken.
Please include descriptions of all the photos taken.
Do not report any place or experience that is not described by the
photo informations.
The photos information has following structure:
-  | Date the photo was taken | Time the photo was taken |
  Photo Description generated by an LLM  |
  Approximate Locations where the photo was taken |
  Approximate Nearby locations where photo was taken
Here is an example of photo information and how it should be generated in plain
text,
interleaving photo descriptions and the .
Example photo information:
-  | Date: 08/04/2023 | Time: 07:53:13 |
Photo Description: a man stands in front of a sign that says welcome to the
united states |
Possible Photo Locations: BURBERRY LAX TERMINAL B, Los Angeles International
Airport, Terminal B, Los Angeles, Los Angeles County |
Possible Photo Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
Expected output:
I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety to leave the airport
and get to visit the city.


```
Photos album context: I flew to Los Angeles for a short trip, and the album
contains the photos from the day I arrived there. The man in those photos is myself.

Photos description:
-  | Date and time: 08/04/2023 (Friday) 07:53 AM | Photo Description: a
man stands in front of a sign that says welcome to the united states | Locations: BURBERRY
LAX TERMINAL B, Los Angeles International Airport, Los Angeles, Los Angeles County,
California | Possible Nearby locations: Los Angeles, Star Alliance Lounge, ICE
International Currency Exchange, Relay, Bank of America
-  | Date and time: 08/04/2023 (Friday) 09:32 AM | Photo Description: a man in a
nasa shirt is sitting in a white car | Locations: Los Angeles International Airport, Los
Angeles, Los Angeles County, California, United States | Possible Nearby locations: Los
Angeles
-  | Date and time: 08/04/2023 (Friday) 09:59 AM | Photo Description: a man in a
white shirt is driving a mustang | Locations: Westchester, Los Angeles, Los Angeles
County, California, United States | Possible Nearby locations: Plaza Towers OBGYN:
Lawrence Bruksch, MD, LA Fitness, Dr. Jitsen Chang, Obstetrician-gynecologist, Kinecta
Federal Credit Union - Westchester, Clarity Retirement
-  | Date and time: 08/04/2023 (Friday) 10:29 AM | Photo Description: a man
wearing a nasa shirt stands on a beach | Locations: Los Angeles, Los Angeles County,
California, United States | Possible Nearby locations: Los Angeles, Venice
-  | Date and time: 08/04/2023 (Friday) 11:29 AM | Photo Description: a man sits
on a bench in front of a subba gump shrimp restaurant | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Bubba Gump Shrimp
Co., Santa Monica Pier Rock Shop, Pier Burger, Santa Monica Police Pier Substation, 66-To-
Cali
-  | Date and time: 08/04/2023 (Friday) 11:43 AM | Photo Description: a man
stands on a pier with a ferris wheel in the background | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica Pier,
The eCenter, Character Drawings, Santa Monica Pier, ビーチ・サインズ&モア
-  | Date and time: 08/04/2023 (Friday) 11:46 AM | Photo Description: a man
stands on a pier with a seagull sitting on the railing | Locations: Santa Monica, Los
Angeles County, California, United States | Possible Nearby locations: Santa Monica,
Pacific Plunge, Inkie’s Scrambler, Fun 'N' Games, Pacific Wheel
-  | Date and time: 08/04/2023 (Friday) 11:52 AM | Photo Description: a man with
a backpack that says o'neill on it | Locations: Santa Monica, Los Angeles County,
California, United States | Possible Nearby locations: Coffee Bean & Tea Leaf, Japadog (at
Santa Monica Pier), Santa Monica Trapeze School, Pacific Park on the Santa Monica Pier,
Funnel Cakes
-  | Date and time: 08/04/2023 (Friday) 12:10 PM | Photo Description: a man poses
in front of the cheesecake factory | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
-  | Date and time: 08/04/2023 (Friday) 12:32 PM | Photo Description: a plate of
food with a napkin that says the cheesecake factory | Locations: Downtown, Santa Monica,
Los Angeles County, California, United States | Possible Nearby locations: Forever 21,
Tesla, Nike Santa Monica, Louis Vuitton Santa Monica Place, Pandora Jewelry
-  | Date and time: 08/04/2023 (Friday) 01:15 PM | Photo Description: a man
stands in front of a blue tesla model x | Locations: Downtown, Santa Monica, Los Angeles
County, California, United States | Possible Nearby locations: Forever 21, Tiffany & Co.,
Louis Vuitton Santa Monica Place, Pandora Jewelry, Johnny Was
-  | Date and time: 08/04/2023 (Friday) 05:03 PM | Photo Description: a green
trolley is parked in front of a gap store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Haagen-Dazs Ice Cream Shops,
Wetzel's Pretzels, Nike The Grove, Gap, Bar Verde
-  | Date and time: 08/04/2023 (Friday) 05:44 PM | Photo Description: a variety
of caramel apples are displayed in a store | Locations: La Brea, Central LA, Los Angeles,
Los Angeles County, California | Possible Nearby locations: Los Angeles, The Original
Farmers Market, The Dog Bakery - Fresh Baked Treats & Dog Birthday Cakes, Marconda's,
Littlejohn's English Toffee House & Fine Candies
-  | Date and time: 08/04/2023 (Friday) 06:01 PM | Photo Description: a man is
holding a scoop of ice cream in front of a sign that says " drinks " | Locations: Farmers
Market, La Brea, Central LA, Los Angeles, Los Angeles County | Possible Nearby locations:
Los Angeles, The Original Farmers Market, Littlejohn's English Toffee House & Fine
Candies, Hutchco Technologies, Marconda's
-  | Date and time: 08/04/2023 (Friday) 06:06 PM | Photo Description: cars are
parked in front of a ross store | Locations: 3rd / Ogden, La Brea, Central LA, Los
Angeles, Los Angeles County | Possible Nearby locations: A1 Locksmith & Keys, GapBody, 3rd
/ Ogden, 3rd & Ogden (Eastbound), Karsaz & Associates
-  | Date and time: 08/04/2023 (Friday) 09:33 PM | Photo Description: a hotel
room with a blue blanket on the bed | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, Kandoo Kitchen, Inland Faculty Medical Group
Inc, Pathway Healthcare
-  | Date and time: 08/04/2023 (Friday) 09:58 PM | Photo Description: two boxes
of food on a table with a fork | Locations: Eagle Rock, Northeast Los Angeles, Los
Angeles, Los Angeles County, California | Possible Nearby locations: Welcome Inn, North
East Los Angeles Hotel Owners Association, MV, Inland Faculty Medical Group Inc, Pathway
Healthcare
```

Generating post

Now we will use the TextGenerationModel from the Vertex PaLM API to submit the prompt above and get the generated post. You can configure the level of randomness/creativity of the generated text by setting the temperature, top_k and top_p, as explained in the comments and in the API docs.

from vertexai.language_models import TextGenerationModel
generation_model = TextGenerationModel.from_pretrained("text-bison")

def generate_text(prompt, temperature=1.0,
                  top_p=0.4, top_k=40, max_output_tokens=1024):
    parameters = {
        # Temperature controls the degree of randomness in token selection.
        "temperature": temperature,  
        # Tokens are selected from most probable to least until the sum 
        # of their probabilities equals the top_p value.
        "top_p": top_p,
        # A top_k of 1 means the selected token is the most probable 
        # among all tokens.
        "top_k": top_k,
        # Token limit determines the maximum amount of text output.
        "max_output_tokens": max_output_tokens,
    }

    generated_text = generation_model.predict(prompt=prompt, **parameters).text
    return generated_text
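To make the top_k and top_p comments concrete, here is a toy illustration of how the two filters narrow down the candidate tokens before one of them is sampled. It follows the behaviour described in the comments above on a made-up probability table; it says nothing about how the API is implemented internally, and candidate_tokens is a hypothetical helper.

```python
def candidate_tokens(probs, top_k, top_p):
    """Return the tokens that survive top-k filtering, then top-p filtering."""
    # Rank tokens from most to least probable and keep the best top_k.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        # Stop once the kept tokens cover at least top_p of the probability.
        if cumulative >= top_p:
            break
    return kept


# Made-up next-token probabilities for illustration.
toy = {"beach": 0.5, "pier": 0.3, "hotel": 0.15, "car": 0.05}

print(candidate_tokens(toy, top_k=40, top_p=0.4))
print(candidate_tokens(toy, top_k=40, top_p=0.7))
print(candidate_tokens(toy, top_k=1, top_p=0.95))
```

Lower values of top_k or top_p leave fewer candidates and therefore more predictable text; temperature then controls how randomly the model picks among the tokens that remain.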

The output of the PaLM API is a generated post in which descriptions are interleaved with photo placeholders; the LLM decides where to include each photo in the text. You can see an example below. I then use a regex to find the photo placeholders and replace them with the actual photos.

I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled of anxiety 
to leave the airport and get to visit the city.


I rented a car and drove to my hotel in Eagle Rock.
The hotel was nice and comfortable.


The next morning I went to Santa Monica Pier.
I had lunch at Bubba Gump Shrimp Co. and then walked around the pier.
, , , 

In the afternoon I went to the Cheesecake Factory.
I had a delicious meal and then went shopping at the mall.
, , 

In the evening I went to Farmers Market.
I bought some caramel apples and ice cream.
, , 

It was a long day but I had a lot of fun.
I can't wait to explore more of Los Angeles tomorrow.
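The placeholder replacement step can be sketched with re.sub. The [photo_N] syntax and the img markup below are assumptions for illustration, since the exact placeholder format used in the notebook is not visible here; photo_paths is a hypothetical mapping from photo number to file path.

```python
import re

# Hypothetical placeholder syntax: [photo_1], [photo_2], ...
PLACEHOLDER = re.compile(r"\[photo_(\d+)\]")


def insert_photos(post, photo_paths):
    """Replace each known placeholder with an HTML image tag."""
    def replace(match):
        number = int(match.group(1))
        path = photo_paths.get(number)
        # Leave unknown placeholders untouched so LLM mistakes stay visible.
        return f'<img src="{path}">' if path else match.group(0)

    return PLACEHOLDER.sub(replace, post)


sample_post = (
    "I arrived in Los Angeles.\n[photo_1]\n"
    "Then I went to the pier.\n[photo_2], [photo_9]"
)
sample_paths = {
    1: "photos_converted/IMG_0001.jpg",
    2: "photos_converted/IMG_0002.jpg",
}
print(insert_photos(sample_post, sample_paths))
```

Keeping unmatched placeholders in the text makes it easy to spot cases where the LLM referenced a photo that does not exist.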

Below are examples of generated posts for my two trips, produced by the Photo StoryTelling pipeline. The outputs of LLMs are non-deterministic and vary in quality and in fidelity to the facts reported in the prompt. To generate different responses, you might want to try different configurations for temperature, top_p and top_k, or simply send a new request to TextGenerationModel.
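Since each request can return a different post, one simple approach is to generate several candidates and keep the one you like best. Here is a minimal sketch, assuming a generate callable that accepts the same keyword arguments as generate_text above; the grid values are arbitrary examples, and fake_generate is a stand-in so the sketch runs without calling the API.

```python
from itertools import product


def generate_candidates(generate, prompt,
                        temperatures=(0.2, 0.7, 1.0), top_ps=(0.4, 0.8)):
    """Call generate once per (temperature, top_p) pair and collect the results."""
    candidates = []
    for temperature, top_p in product(temperatures, top_ps):
        text = generate(prompt, temperature=temperature, top_p=top_p)
        candidates.append(
            {"temperature": temperature, "top_p": top_p, "text": text}
        )
    return candidates


def fake_generate(prompt, temperature, top_p):
    # Stand-in for generate_text; returns a label instead of a real post.
    return f"post(t={temperature}, p={top_p})"


for candidate in generate_candidates(fake_generate, "my prompt"):
    print(candidate["temperature"], candidate["top_p"], candidate["text"])
```

Each real call is billed and takes a few seconds, so a small grid is usually enough to see how the settings affect the writing.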

AI Generated post about my trip to Los Angeles

album_context = """I flew to Los Angeles for a short trip, 
                and the album contains the photos 
                from the day I arrived there. 
                The man in those photos is myself.
                """
blog_prompt = generate_prompt(album_context, photos_info_concat)
generated_post = generate_text(prompt=blog_prompt)

I was happy finally arriving to my destination, Los Angeles.
While I went into US Customs my heart was filled with anxiety to leave the airport and visit the city.

I took a taxi to my hotel in Eagle Rock. The room was small but cozy.

After dropping off my bags, I went out to explore the city. I first drove to Santa Monica Pier.
I walked along the pier and took in the sights and sounds of the ocean. I even rode the Ferris wheel!

After spending some time at the pier, I went to the Cheesecake Factory for lunch. The food was delicious and the service was excellent.

In the afternoon, I went to the Farmers Market in La Brea. I bought some fresh produce and flowers. I also had some ice cream.

I ended my day by going back to my hotel room and relaxing.
I was tired from all the exploring I had done, but I was also excited to see what the next day would bring.

Generated post about my trip to San Francisco

album_context_sf = """Me and my wife travelled to San Francisco. 
          We spent a single day there. We rented a car in SF and visited
          many places during that day. 
          The man in the pictures is myself and the woman is my wife.
          """
blog_prompt_sf = generate_prompt(album_context_sf, photos_info_concat_sf)
generated_post_sf = generate_text(prompt=blog_prompt_sf)

Me and my wife travelled to San Francisco. We spent a single day there. We rented a car in SF and visited many places during that day.

We started our day at the San Francisco International Airport. We were excited to finally be in San Francisco and ready to explore the city.

We drove to Russian Hill and found a parking spot. We walked around the neighborhood and took in the sights and sounds.

We walked over to the Golden Gate Bridge and took some pictures. It was a beautiful day and the bridge was stunning.

We sat on a bench and watched the boats go by. It was so peaceful and relaxing.

We walked back to our car and drove to the Marina District. We walked around the lake and enjoyed the scenery.

We stopped for dinner at a restaurant in the Marina District. The food was delicious and the atmosphere was lively.

We walked around the Marina District after dinner and took some more pictures.

We drove to Fisherman’s Wharf and walked around the shops and restaurants. We had some delicious seafood for dinner.

We walked around Fisherman’s Wharf after dinner and took some more pictures. We really enjoyed our time in this neighborhood.

We drove to Fort Mason and walked around the Ghirardelli Square. We had some delicious chocolate and ice cream.

We walked around Fort Mason after dinner and took some more pictures.

We had a wonderful time in San Francisco and we can’t wait to come back again soon.

Conclusion

If you have read this far, you can see how powerful it can be to combine data extraction (e.g. EXIF metadata from images), data augmentation (e.g. locations from geographic coordinates using the Google Maps API), prompt engineering (few-shot prompting) and generative AI (Vertex Imagen and the PaLM API) to produce interesting content: in this case, a blog post describing an album of photos.

I hope you enjoyed this project and want to play with it, maybe using your own photos to see what blog post you get about your moments!

Photo StoryTelling – Leveraging Generative AI and Google APIs to compose posts from your photo… was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.
