Using Gemini 1.5 Pro to create video trailers

Taking advantage of Gemini’s multimodal input to create trailers for any video.

On February 15 this year, Google announced Gemini 1.5. The new version brought many improvements: on top of impressive gains in the language domain, the model can process a huge input context of up to 1 million tokens. To make it even better, it was trained as a multimodal model, which means it can natively process text, images, audio, and video. This combination of different input types and a huge context got me excited about the opportunity to process long videos, so I revisited a previous project of mine to test Gemini’s new capabilities.

From the official Gemini 1.5 announcement:

An AI model’s “context window” is made up of tokens, which are the building blocks used for processing information. Tokens can be entire parts or subsections of words, images, videos, audio or code. The bigger a model’s context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful.

Through a series of machine learning innovations, we’ve increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens for Gemini 1.0. We can now run up to 1 million tokens in production.

This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens.

Gemini 1.5 Pro context length comparison
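To get a feel for what “1 hour of video” means in tokens, here is a rough back-of-the-envelope sketch. The sampling rate and the per-frame token cost are assumptions that do not come from this article (they should be checked against the current Gemini documentation), and the audio track and prompt text are ignored, so treat the numbers as an estimate only.

```python
# Figure from the announcement quoted above.
CONTEXT_WINDOW_TOKENS = 1_000_000

# Assumptions, not from the article: verify against the current Gemini documentation.
ASSUMED_FRAMES_PER_SECOND = 1    # video sampled at one frame per second
ASSUMED_TOKENS_PER_FRAME = 258   # each sampled frame costs a fixed number of tokens


def estimate_video_tokens(duration_minutes: float) -> int:
    """Rough token cost of a video's frames (audio track and prompt text ignored)."""
    frames = int(duration_minutes * 60 * ASSUMED_FRAMES_PER_SECOND)
    return frames * ASSUMED_TOKENS_PER_FRAME


def fits_in_context(duration_minutes: float, reserved_tokens: int = 0) -> bool:
    """True if the estimated video tokens plus a reserve for the text prompt fit."""
    total = estimate_video_tokens(duration_minutes) + reserved_tokens
    return total <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    for minutes in (20, 44, 60, 90):
        tokens = estimate_video_tokens(minutes)
        verdict = "fits" if fits_in_context(minutes) else "does not fit"
        print(f"{minutes:>3} min video: ~{tokens:,} tokens ({verdict})")
```

Under these assumed rates, a one-hour video lands just below the 1 million token limit, which is in line with the figure quoted in the announcement.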

In a previous project, “AI trailer”, which you can learn more about in the blog post “Creating Movie Trailers With AI”, I built a system that uses AI to create trailers from movies. With the new Gemini capabilities, I took the opportunity to improve it. Previously, the user would retrieve the movie’s plot from IMDB; now we can simply ask Gemini to create a plot for a movie or any other video, so the user can create trailers for any kind of video, not only movies. Gemini’s video abilities in this context are the focus of this article.

The high-level trailer generation workflow is shown below, but this article focuses on how Gemini 1.5 deals with videos and improves the plot generation step.

Trailer generation workflow, Gemini replaces the 2nd step
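As an illustration of the plot generation step, here is a minimal sketch of sending a video to Gemini with the `google-generativeai` Python SDK. It is not necessarily how the AI trailer project does it: the function name `generate_plot`, the `genai_module` parameter (there only so the SDK can be swapped for a stub), the model name, and the `GOOGLE_API_KEY` environment variable are all assumptions made for this example.

```python
import os
import time


def generate_plot(video_path, prompt, genai_module=None,
                  model_name="gemini-1.5-pro", poll_seconds=5.0):
    """Upload a video, wait until it is processed, and ask the model for a plot.

    `model_name` is an assumption; use whichever Gemini 1.5 model you have access to.
    """
    if genai_module is None:
        # Imported lazily so the function can be exercised with a stub module.
        import google.generativeai as genai_module  # pip install google-generativeai
        genai_module.configure(api_key=os.environ["GOOGLE_API_KEY"])

    # Upload the video through the File API.
    video_file = genai_module.upload_file(path=video_path)

    # Uploaded videos are processed asynchronously, so poll until that finishes.
    while video_file.state.name == "PROCESSING":
        time.sleep(poll_seconds)
        video_file = genai_module.get_file(video_file.name)
    if video_file.state.name != "ACTIVE":
        raise RuntimeError(f"Video processing ended in state {video_file.state.name}")

    # Send the video and the text prompt together in a single request.
    model = genai_module.GenerativeModel(model_name)
    response = model.generate_content([video_file, prompt])
    return response.text


# Example call (needs a valid API key and a local video file):
# plot = generate_plot("museum_tour.mp4", "Write a short plot for a trailer of this video.")
```

The returned text can then be passed to the rest of the workflow in place of a plot retrieved from IMDB.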

Before jumping into the experiments and Gemini’s results, let’s look at a couple of examples that I created with the latest code.

https://medium.com/media/f43699a92d9091efaea3a31166e4f9c7/href

https://medium.com/media/2626c439375617c1ca1d6080d7cc8e64/href

As I mentioned before, Gemini fits into the project at the plot creation step. While taking the plot from IMDB works well for many movies, we might want to create plots for other kinds of videos that are not in the IMDB database, like the ones in the examples above. Aside from that, using an LLM has additional advantages: we can tune the prompt to ask the model for shorter or longer plots, or to give the plot an exciting or mysterious touch. The options are nearly limitless.
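The kind of prompt tuning described above can be sketched as a small helper that assembles the prompt from a few parameters. The function `build_plot_prompt`, its parameters, and the prompt wording are all hypothetical; this is not the prompt used in the project, only one way such knobs could be exposed.

```python
from typing import Optional


def build_plot_prompt(max_words: int = 150,
                      tone: str = "exciting",
                      audience: Optional[str] = None,
                      avoid_spoilers: bool = True) -> str:
    """Assemble a plot-generation prompt (illustrative wording only)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")

    lines = [
        "Watch the attached video and write a plot that could narrate its trailer.",
        f"Keep it under {max_words} words and give it a {tone} tone.",
        "Open with a hook and close with an invitation to watch the full video.",
    ]
    if audience:
        lines.append(f"Write for this audience: {audience}.")
    if avoid_spoilers:
        lines.append("Do not reveal the ending or any major twist.")
    return "\n".join(lines)


if __name__ == "__main__":
    # A shorter, mysterious variant aimed at a specific audience.
    print(build_plot_prompt(max_words=60, tone="mysterious",
                            audience="viewers on short-form video platforms"))
```

Changing `max_words`, `tone`, or `audience` gives different plots for the same video without touching the rest of the workflow.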

I only uploaded two examples to YouTube, but during my experiments I tested many different types of videos. I will go through a few that I think paint an interesting picture of Gemini’s flexibility and overall good performance.

Short animation

Original video: The Fermi Paradox — Where Are All The Aliens? (1/2)

Generated trailer: The Fermi Paradox — AI generated trailer

Prompting Gemini to create a plot for a short animation

Here I was mainly interested in how Gemini would deal with animated videos. Each animation has a different style, so it can quickly fall outside the distribution the model has seen. Another challenge was producing a meaningful plot for a short video without giving away too much information or too little.

The results were quite good. The plot starts with a hook and finishes with an invitation to watch the full video. The paragraphs in between are also good: they contain a brief description of the main topic (the Fermi Paradox) and touch on the video’s other main topics, like the Great Filter and the Kardashev Scale.

20-minute museum tour

Original video: Natural History Museum (New Dinosaur Exhibit) Walking Tour in 4K — Washington, D.C.

Generated trailer: Natural History Museum — AI generated trailer

Prompting Gemini to create a plot for a 20-minute museum tour

The first thing that caught my attention was that Gemini was able to work out the name of the museum (Smithsonian National Museum of Natural History), even though this information does not appear anywhere in the video. It seems that the model had enough knowledge to look at the images and recognize where they were filmed. The model was also able to correctly identify the main exhibitions toured in the video. Once again, it generated very good introduction and ending paragraphs.

40-minute silent black and white movie

Original video: Sherlock Jr. (1924) — Full Movie

Prompting Gemini to create a plot for a 40-minute silent black and white movie

The last experiment that I will mention here was probably the most challenging one: the 44-minute silent movie “Sherlock Jr. (1924)”. The model starts with a very good summary of the movie and then proceeds to create the trailer. Each paragraph of the trailer is consistent with the initial summary, which is already a good sign. One could argue that this plot spoils the movie, but in general it has a very good format.

Project ideas

Combining Gemini’s powerful video capabilities with the trailer creation setup can be quite useful for content creators who want to promote their content efficiently to many different target audiences. If you want to promote on short-form platforms like Instagram or TikTok, you could ask Gemini to make the trailer shorter. You could also ask it to address younger or more technical audiences differently. Since the process is mostly automated, this would be quite fast.

Keep learning

If you want to learn more about the trailer creation project that I used here, check out my blog post, which explains it in more detail.

Creating Movie Trailers With AI

If you want to learn how to use generative AI in other contexts, like video clip creation, check out this blog post, where I used generative AI to create lo-fi music tracks with animated visuals.

How to generate music clips with AI


Using Gemini 1.5 Pro to create video trailers was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.
