Fashion Sketch2Runway with Gemini 2.0 Flash and Veo 3

Fashion Design with Google Cloud GenAI

By Margaret Maynard-Reid & Nitin Tiwari, AI GDEs

In this blog post, we share the Fashion Sketch2Runway project, which showcases Google Cloud Generative AI for fashion design using Gemini 2.0 Flash and Veo 3. Sketch2Runway lets users draw a fashion sketch, turn it into a photo, and edit that photo before turning it into a stunning video.

Image by the author (Margaret M.)

You will need a Gemini 2.0 API key from Google AI Studio and, for the video generation model Veo 3, a project created in Google Cloud Vertex AI. Follow along with this tutorial by cloning the code from the project’s GitHub repo.

This tutorial is for developers interested in making creative apps with Google Cloud Generative AI models. The repo includes a Flask web app and a Colab notebook. This post contains a demo of the Flask web app and a step-by-step tutorial of the Colab.

The project is useful whether you are a professional fashion designer, an artist with a creative flair, or someone learning to draw. Try it out to see your sketches transformed into photorealistic images and then brought to life as stunning videos.

https://medium.com/media/056adeb9bacde1dbd7cb6331c82525f6/href

The Flask web app

Follow the instructions on GitHub to clone the code, install the required packages, and configure the .env file. Start the web server with `flask run`, then open http://127.0.0.1:5000 to bring up the web app UI.

At a high level, there are 4 main screens in the web app UI:

  1. Sketch
  2. Generate Image
  3. Edit Image
  4. Generate Video

Image generation and editing are done with Gemini 2.0 Flash, while video generation is done with Veo 3. Images and videos are saved to Google Cloud Storage buckets.

Sketch

You can draw with your mouse, with your finger on the trackpad, or with a professional artist’s drawing tablet. You can choose a color, change the stroke width, clear everything (the canvas and the prompt), or click on generate to convert the sketch to a photorealistic image.

Here I drew a dress with yellow polka dots and entered the prompt: “Convert the sketch to a photorealistic full body portrait of a fashion model wearing a chic dress”.

Sketch, image by the author (Margaret M.)

Generate Image

Gemini 2.0 Flash generates an image from two inputs: the sketch image and the prompt.

Gemini generated image, by the author (Margaret M.)

Note: for this example we convert the fashion sketch to a photorealistic image because we will create a video from it afterwards. You could also convert the sketch to other styles, for example, a watercolor painting.

Edit Image

Enter a prompt to change the look: for example, add an accessory or change the garment color.

In this example, I enter the prompt: “Add a beautiful matching bracelet that goes well with her dress”.

Gemini edited image, by the author (Margaret M.)

Note: I can go back to edit the image multiple times before deciding to generate a video.

Generate Video with Veo 3

Then with a simple prompt: “She walks forward gracefully with a smile. Add runway music”, I create a stunning runway video with music with Veo 3.

https://medium.com/media/93098e97496aacff6b28b65bd009560e/href

The Colab notebook tutorial

If you’re a developer who wants to understand the tech behind this project with minimal set up, we’ve got a Colab notebook for you to quickly try it out:

Open the Sketch2Runway.ipynb Google Colab notebook, and follow along. To keep the blog concise, we are only including important steps, so please refer to the notebook for the complete code.

Pre-requisites: Before we begin, make sure you have a Gemini API key and an active GCP account with a billing project set up and the Vertex AI API enabled.

Step 1: Install Google Gen AI SDK

%pip install -U -q google-genai

Step 2: Configure Gemini API key and GCP Project ID

Here, we configure the Gemini API key along with the GCP Project ID and a GCS Bucket where our images and videos will be stored.

from google import genai
from google.colab import userdata

# Configure Gemini API key.
GEMINI_API_KEY = userdata.get('GOOGLE_API_KEY')
# Configure GCS Bucket.
GCS_BUCKET = userdata.get('GCS_BUCKET')
# Configure GCP Project ID.
GCP_PROJECT_ID = userdata.get('GCP_PROJECT_ID')
# Create the client used for image generation and editing.
image_client = genai.Client(api_key=GEMINI_API_KEY)
# Configure model.
MODEL_ID = "gemini-2.0-flash-exp-image-generation"
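
If you run the same code outside Colab, `userdata` is not available. Here is a minimal sketch of one possible fallback, assuming you export the same three names as environment variables. The `load_settings` helper is hypothetical and not part of the notebook.

```python
import os


def load_settings(env=None):
    """Read the three settings from environment variables.

    Hypothetical helper: the notebook itself reads them from Colab secrets.
    """
    env = os.environ if env is None else env
    names = ("GOOGLE_API_KEY", "GCS_BUCKET", "GCP_PROJECT_ID")
    # Fail early with a clear message if any value is absent or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

You could then write `settings = load_settings()` and pass `settings["GOOGLE_API_KEY"]` to `genai.Client`.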

Step 3: Draw a sketch on canvas pad

In this step, we integrate HTML and JavaScript code to build an interactive canvas sketch pad for drawing outfit designs.

The sketch pad offers options to choose different pen colors and thicknesses for a smoother drawing experience. Once the sketch is complete, the image can be saved directly to the Colab environment. As an example, we’ve drawn a simple pink top paired with blue jeans.
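
A browser canvas is commonly exported as a base64 `data:` URL (for example via `canvas.toDataURL()`). How the notebook moves the drawing into Python is not shown in this post, so the following is only a sketch of one way to decode such a string and save it as a file. `save_data_url` is a hypothetical helper.

```python
import base64


def save_data_url(data_url, path="my_sketch.png"):
    """Decode a 'data:image/png;base64,...' string and write the bytes to path."""
    header, _, payload = data_url.partition(",")
    # Accept only base64-encoded image data URLs.
    if not header.startswith("data:image/") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded image data URL")
    with open(path, "wb") as f:
        f.write(base64.b64decode(payload))
    return path
```

The default file name matches the `my_sketch.png` input used in the next step.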

Image by the author (Nitin Tiwari)

Step 4: Generate image from sketch

Now, we are ready to transform our sketch into a photorealistic image of a fashion model wearing that outfit. We use the Gemini 2.0 Flash Image Generation model, which takes an image and a text prompt as inputs.

Prompt: “Convert this input sketch of an outfit into a catalogue photograph that shows a beautiful woman model wearing it.”

Note that the response modalities are `['TEXT', 'IMAGE']`.

from io import BytesIO

from PIL import Image
from google.genai import types

input_image = 'my_sketch.png' # @param {type : 'string'}
prompt = "Convert this input sketch of an outfit into a catalogue photograph that shows a beautiful woman model wearing it." # @param {type : 'string'}

img = Image.open(input_image)

# Run model. safety_settings is defined earlier in the notebook.
response = image_client.models.generate_content(
    model=MODEL_ID,
    contents=[prompt, img],
    config=types.GenerateContentConfig(
        temperature=0.5,
        safety_settings=safety_settings,
        response_modalities=['TEXT', 'IMAGE']
    )
)

# Print any text parts and save the returned image.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('generated_image.png')

Here’s the output of the generated image, which looks quite realistic:

Image by the author (Nitin Tiwari)
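
The loop in Step 4 saves whatever image part it finds. A slightly more defensive sketch collects text and image bytes separately, so you can check whether an image was returned before saving. `collect_parts` is a hypothetical helper, and it relies only on the `text` and `inline_data.data` attributes already used in Step 4.

```python
def collect_parts(parts):
    """Split response parts into text strings and raw image bytes."""
    texts, images = [], []
    for part in parts or []:
        if getattr(part, "text", None) is not None:
            texts.append(part.text)
        elif getattr(part, "inline_data", None) is not None:
            images.append(part.inline_data.data)
    return texts, images
```

A call such as `texts, images = collect_parts(response.candidates[0].content.parts)` leaves `images` empty when the model answered with text only, which is a reasonable signal to retry or adjust the prompt.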

Step 5: Edit the generated image

The Gemini 2.0 Flash Image Generation model also offers powerful image editing capabilities that preserve the original subject. For instance, if you want to add jewelry or accessories, such as a necklace or a watch, you can simply provide the previously generated image along with a new text prompt.

Prompt: Add a white bead necklace.

Image by the author (Nitin Tiwari)

The edited image looks impressive, with the necklace naturally worn around the model’s neck.
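
The editing code is in the notebook rather than in this post. As a rough sketch, assuming the edit uses the same `generate_content` call shape as Step 4, it could look like the following. `edit_image` is a hypothetical helper; it takes the client and the already loaded image as arguments, passes the config as a plain dict, and assumes the returned bytes are PNG data.

```python
def edit_image(client, model_id, image, prompt, out_path="edited_image.png"):
    """Send an existing image plus an edit prompt and save the first image returned.

    Returns out_path on success, or None if the response contains no image.
    """
    response = client.models.generate_content(
        model=model_id,
        contents=[prompt, image],
        # Assumption: same response modalities as the generation step.
        config={"response_modalities": ["TEXT", "IMAGE"]},
    )
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open(out_path, "wb") as f:
                f.write(part.inline_data.data)
            return out_path
    return None
```

With the names from the earlier steps, a call might read `edit_image(image_client, MODEL_ID, Image.open('generated_image.png'), "Add a white bead necklace.")`. Because the function can be called again on its own output, it also covers repeated edits.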

Step 6: Upload the image to a GCS Bucket

Once you’re satisfied with the final image, the next step is to upload it to your GCS bucket. Since Veo 3 currently only accepts image input from Google Cloud Storage for image-to-video generation, this upload is a necessary part of the workflow.

import datetime

from google.cloud import storage

now = datetime.datetime.now()
datetime_str = now.strftime("%Y-%m-%d_%H-%M-%S")

# Upload the image to the GCS bucket.
GCS_BUCKET = userdata.get('GCS_BUCKET')
storage_client = storage.Client()

bucket = storage_client.bucket(GCS_BUCKET)
local_file_name = 'edited_image.png'
# Destination path: an 'images/' folder plus a timestamped file name.
destination_blob_name_with_datetime = f'images/{datetime_str}_edited_image.png'

blob = bucket.blob(destination_blob_name_with_datetime)
blob.upload_from_filename(local_file_name)
print("Image uploaded to GCS bucket.")
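
The timestamped object name and the `gs://` URI that Veo 3 needs in the next step follow a simple pattern. The two small helpers below are hypothetical and only illustrate that naming; they are not part of the notebook.

```python
import datetime


def timestamped_blob_name(filename, now=None, folder="images"):
    """Return an object name such as 'images/<timestamp>_<filename>'."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{folder}/{stamp}_{filename}"


def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI for an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"
```

Putting the timestamp in the name keeps earlier uploads from being overwritten when you regenerate or re-edit an image.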

Step 7: Generate video with Veo 3

This is the final step, and we’re all set to generate a video with Veo 3. Interestingly, Veo 3 also supports native audio generation, such as adding background music or natural sounds that suit the video’s subject.

import time

prompt = "The fashion model walks toward the camera with a smile."  # @param {type: 'string'}

image_gcs = f"gs://{GCS_BUCKET}/{destination_blob_name_with_datetime}"
output_gcs = f"gs://{GCS_BUCKET}/videos"
enhance_prompt = True # @param {type: 'boolean'}
generate_audio = True # @param {type: 'boolean'}

# video_client and video_model are set up earlier in the notebook.
operation = video_client.models.generate_videos(
    model=video_model,
    prompt=prompt,
    image=types.Image(
        gcs_uri=image_gcs,
        mime_type="image/png",
    ),
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",
        output_gcs_uri=output_gcs,
        number_of_videos=1,
        duration_seconds=8,
        person_generation="allow_adult",
        enhance_prompt=enhance_prompt,
        generate_audio=generate_audio,
    ),
)

# Poll until the long-running operation finishes.
while not operation.done:
    operation = video_client.operations.get(operation)
    time.sleep(6)
    print(operation)
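
The polling loop above runs until the operation reports it is done. If you prefer an upper bound on the wait, one option is a small wrapper with a timeout. This is a sketch only: `wait_for_operation` is a hypothetical helper, and the 600-second default is an arbitrary assumption rather than a documented limit.

```python
import time


def wait_for_operation(operation, refresh, interval=6.0, timeout=600.0, sleep=time.sleep):
    """Poll until operation.done is true or the timeout elapses.

    refresh is any callable that takes the current operation and returns an
    updated one, for example video_client.operations.get.
    """
    deadline = time.monotonic() + timeout
    while not operation.done:
        if time.monotonic() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(interval)
        operation = refresh(operation)
    return operation
```

With the names from this step, the call would be `operation = wait_for_operation(operation, video_client.operations.get)`.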

Here’s the final output video generated:

Image by the author (Nitin Tiwari)

That’s all it takes. In just seven easy steps, you can bring your fashion sketches to life by creating vibrant models wearing your original designs.

We hope you enjoyed this tutorial. Check out the GitHub repo, try out the code yourself, and watch our video tutorial. We are excited to see what you will build with Gemini 2.0 Flash and Veo 3. Stay tuned for more tutorials like these.

Project Inspiration

Nitin: I’ve worked on several AI projects that showcase Google’s powerful suite of models for image and video generation.

An example was Sketch2Paint, a fun weekend project where I built an app that transforms sketches into vibrant paintings using the Gemini 2.0 Flash Image Generation model. I later evolved it into Sketch2Video by integrating Veo 3 for video generation.

I teamed up with Margaret, who brings deep expertise in the fashion space, and we collaborated on this project to add real-world value.

Margaret: I’m very much into fashion design. I have worked on quite a few Google Cloud GenAI projects recently, for example, Fashion with Imagen & Veo: Part 1 (Blog | Video tutorial & YouTube shorts). It’s been a great collaboration with Nitin: we each bring our own expertise and experience to this project as part of the AI Sprint hosted by the Google AI Developer Programs Team.

Acknowledgment

We thank the Google AI Developers Program Team for providing us with Google Cloud credits that helped facilitate this project.


Fashion Sketch2Runway with Gemini 2.0 Flash and Veo 3 was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.
