learn Archives - ProdSens.live

Tune Gemini Pro in Google AI Studio or with the Gemini API

Posted by Cher Hu, Product Manager and Saravanan Ganesh, Software Engineer for Gemini API

The following post was originally published in October 2023. Today, we’ve updated the post to share how you can easily tune Gemini models in Google AI Studio or with the Gemini API.

Last year, we launched Gemini 1.0 Pro, our mid-sized multimodal model optimized for scaling across a wide range of tasks. And with 1.5 Pro this year, we demonstrated the possibilities of what large language models can do with an experimental 1M context window. Now, to quickly and easily customize the generally available Gemini 1.0 Pro model (text) for your specific needs, we’ve added Gemini Tuning to Google AI Studio and the Gemini API.

What is tuning?

Developers often require higher quality output for custom use cases than what can be achieved through few-shot prompting. Tuning improves on this technique by further training the base model on many more task-specific examples—so many that they can’t all fit in the prompt.

Fine-tuning vs. Parameter Efficient Tuning

You may have heard about classic “fine-tuning” of models. This is where a pre-trained model is adapted to a particular task by training it on a smaller set of task-specific labeled data. But with today’s LLMs and their huge number of parameters, fine-tuning is complex: it requires machine learning expertise, lots of data, and lots of compute.

Tuning in Google AI Studio uses a technique called Parameter Efficient Tuning (PET) to produce higher-quality customized models with lower latency compared to few-shot prompting and without the additional costs and complexity of traditional fine-tuning. In addition, PET produces high quality models with as little as a few hundred data points, reducing the burden of data collection for the developer.

Why tuning?

Tuning enables you to customize Gemini models with your own data to perform better for niche tasks while also reducing the context size of prompts and latency of the response. Developers can use tuning for a variety of use cases including but not limited to:

  • Classification: Run natural language tasks like classifying your data into predefined categories, without needing tons of manual work or tools.
  • Information extraction: Extract structured information from unstructured data sources to support downstream tasks within your product.
  • Structured output generation: Generate structured data, such as tables, quickly and easily.
  • Critique Models: Use tuning to create critique models to evaluate output from other models.

Get started quickly with Google AI Studio

1. Create a tuned model

It’s easy to tune models in Google AI Studio. This removes any need for engineering expertise to build custom models. Start by selecting “New tuned model” in the menu bar on the left.

moving image showing how to create a tuned model in Google AI Studio by opening 'New Tuned Model' from the menu

2. Select data for tuning

You can tune your model from an existing structured prompt or import data from Google Sheets or a CSV file. You can get started with as few as 20 examples; for the best performance, we recommend providing a dataset of at least 100 examples.

moving image showing how to select data for tuning in Google AI Studio by importing data

3. View your tuned model

View your tuning progress in your library. Once the model has finished tuning, you can view the details by clicking on your model. Start running your tuned model through a structured or freeform prompt.

moving image showing how to view your tuned model in Google AI Studio

4. Run your tuned model anytime

You can also access your newly tuned model by creating a new structured or freeform prompt and selecting your tuned model from the list of available models.

moving image demonstrating what it looks like to run your tuned model in Google AI Studio after importing data

Tuning with the Gemini API

Google AI Studio is the fastest and easiest way to start tuning Gemini models. You can also access the feature via the Gemini API by passing the training data in the API request when creating a tuned model. Learn more about how to get started here.
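As an illustration, here is a minimal sketch of creating a tuned model through the Gemini API using the google-generativeai Python SDK; the training examples, model ID, and hyperparameter values below are placeholders, not recommendations.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder API key

# Start a tuning job on the text Gemini 1.0 Pro model.
# Each training example pairs an input with the desired output.
operation = genai.create_tuned_model(
    source_model="models/gemini-1.0-pro-001",
    training_data=[
        {"text_input": "1", "output": "2"},
        {"text_input": "3", "output": "4"},
        {"text_input": "-3", "output": "-2"},
        # ...at least ~20 examples; 100+ is recommended above
    ],
    id="my-tuned-model",   # illustrative tuned-model ID
    epoch_count=5,
    batch_size=4,
    learning_rate=0.001,
)

# Tuning runs asynchronously; wait for it to finish, then query the tuned model.
tuned_model = operation.result()
model = genai.GenerativeModel(model_name=tuned_model.name)
print(model.generate_content("5").text)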

We’re excited about the possibilities that tuning opens up for developers and can’t wait to see what you build with the feature. If you’ve got some ideas or use cases brewing, share them with us on X (formerly known as Twitter) or LinkedIn.

ML Olympiad 2024: Globally Distributed ML Competitions by Google ML Community

Posted by Bitnoori Keum – DevRel Community Manager

The ML Olympiad consists of Kaggle Community Competitions organized by ML GDE, TFUG, and other ML communities, aiming to provide developers with opportunities to learn and practice machine learning. Following successful rounds in 2022 and 2023, the third round has now launched with support from Google for Developers for each competition host. Over the last two rounds, 605 teams participated in 32 competitions, generating 105 discussions and 170 notebooks. We encourage you to join this round to gain hands-on experience with machine learning and tackle real-world challenges.

ML Olympiad Community Competitions

Over 20 ML Olympiad community competitions are currently open. Visit the ML Olympiad page to participate.

Smoking Detection in Patients

Predict smoking status with bio-signal ML models
Host: Rishiraj Acharya (AI/ML GDE) / TFUG Kolkata

TurtleVision Challenge

Develop a classification model to distinguish between jellyfish and plastic pollution in ocean imagery
Host: Anas Lahdhiri / MLAct

Detect hallucinations in LLMs

Detect which answers provided by a Mistral 7B instruct model are most likely hallucinations
Host: Luca Massaron (AI/ML GDE)

ZeroWasteEats

Find ML solutions to reduce food wastage
Host: Anushka Raj / TFUG Hajipur

Predicting Wellness

Predict the percentage of body fat in men using multiple regression methods
Host: Ankit Kumar Verma / TFUG Prayagraj

Offbeats Edition

Build a regression model to predict the age of the crab
Host: Ayush Morbar / Offbeats Byte Labs

Nashik Weather

Predict the condition of weather in Nashik, India
Host: TFUG Nashik

Predicting Earthquake Damage

Predict the level of damage to buildings caused by earthquakes, based on aspects of building location and construction
Host: Usha Rengaraju

Forecasting Bangladesh’s Weather

Predict whether a given day will be rainy, the amount of rainfall, and the average temperature for that day.
Host: TFUG Bangladesh (Dhaka)

CO2 Emissions Prediction Challenge

Predict CO2 emissions per capita for 2030 using global development indicators
Host: Md Shahriar Azad Evan, Shuvro Pal / TFUG North Bengal

AI & ML Malaysia

Predict loan approval status
Host: Kuan Hoong (AI/ML GDE) / Artificial Intelligence & Machine Learning Malaysia User Group

Sustainable Urban Living

Predict the habitability score of properties
Host: Ashwin Raj / BeyondML

Toxic Language (PTBR) Detection

(in local language)
Classify Brazilian Portuguese tweets into one of two classes: toxic or non-toxic.
Host: Mikaeri Ohana, Pedro Gengo, Vinicius F. Caridá (AI/ML GDE)

Improving disaster response

Predict humanitarian aid contributions made in response to disasters occurring around the world
Host: Yara Armel Desire / TFUG Abidjan

Urban Traffic Density

Develop predictive models to estimate the traffic density in urban areas
Host: Kartikey Rawat / TFUG Durg

Know Your Customer Opinion

Classify each customer opinion into Likert-scale categories
Host: TFUG Surabaya

Forecasting India’s Weather

Predict the temperature for a particular month
Host: Mohammed Moinuddin / TFUG Hyderabad

Classification Champ

Develop classification models to predict tumor malignancy
Host: TFUG Bhopal

AI-Powered Job Description Generator

Build a system that employs Generative AI and a chatbot interface to automatically generate job descriptions
Host: Akaash Tripathi / TFUG Ghaziabad

Machine Translation French-Wolof

Develop robust algorithms or models capable of accurately translating French sentences into Wolof.
Host: GalsenAI

Water Mapping using Satellite Imagery

Water mapping using satellite imagery and deep learning for dam drought detection
Host: Taha Bouhsine / ML Nomads

To see all the community competitions around the ML Olympiad, search “ML Olympiad” on Kaggle and look for further related posts on social media using #MLOlympiad. Browse through the available competitions and participate in those that interest you!

Build with Google AI video series, Season 2: more AI patterns

Posted by Joe Fernandez – Google AI Developer Relations

We are off to another exciting year in Artificial Intelligence (AI) and it’s time to build more applications with Google AI technology! The Build with Google AI video series is for developers looking to build helpful and practical applications with AI. We focus on useful code projects you can implement and extend in an afternoon to bring the power of artificial intelligence into your workflow or organization. Our first season received over 100,000 views in six weeks! We are glad to see that so many of you liked the series, and we are excited to bring you even more Google AI application projects.

Today, we are launching Season 2 of the Build with Google AI series, featuring projects built with Google’s Gemini API technology. The launch of Gemini and the Gemini API has brought developers even more advanced AI capabilities, including advanced reasoning, content generation, information synthesis, and image interpretation. Our goal with this season is to help you put those capabilities to work for you and your organizations.

AI app patterns

The Build with Google AI series features practical application code projects created for you to use and customize. However, we know that you are the best judge of what you or your organization needs to solve day-to-day problems and get work done. That’s why each application we feature in this series is also meant to be used as an AI pattern. You can extend the applications immediately to solve problems and provide value for your business, and these applications show you a general coding pattern for getting value out of AI technology.

For this second season of this series, we show how you can leverage Google’s Gemini AI model capabilities for applications. Here’s what’s coming up:

  • AI Slides Reviewer with Google Workspace (3/20) – Image interpretation is one of the Gemini model’s biggest new features. We show you how to make practical use of it with a presentation review app for Google Slides that you can customize with your organization’s guidelines and recommendations. 
  • AI Flutter Code Agent with Gemini API (3/27) – Code generation was the most popular episode from last season, so we are digging deeper into this topic. Build a code generation extension to write Flutter code and explore user interface designs and looks with just a few words of description.
  • AI Data Agent with Google Cloud (4/3) – Why write code to extract data when you can just ask for it? Build a web application that uses Gemini API’s Function Calling feature to translate questions into code calls and data into plain language answers.

Season 1 upgraded to Gemini API: We’ve upgraded Season 1 tutorials and code projects to use the Gemini API so you can take advantage of the latest in generative AI technology from Google. Check them out!

Learn from the developers

Just like last season, we’ll go back to the studio to talk with coders who built these projects so they can share what they learned along the way. How do you make the Gemini model review an entire presentation? What’s the most effective way to generate code with AI? How do you get a database to answer questions with the Gemini API? Get insights into coding with AI to jump start your own development project.

New home for AI developer content

Developers interested in Google’s AI offerings now have a new home at ai.google.dev. There you’ll find a wealth of resources for building with AI from Google, including the Build with Google AI tutorials. Stay tuned for much more content through the rest of the year.

We are excited to bring you the second season of Build with Google AI. Check out Season 2 right now! Use those video comments to let us know what you think and tell us what you’d like to see in future episodes.

Keep learning! Keep building!

Blockchain explained easy

You’ve probably heard about blockchain, right? It can seem pretty confusing, though. Don’t worry! In this article, we’ll break down the important stuff about blockchain for beginners.

Key concepts

You have to remember that a blockchain is distributed, immutable, transparent, and reliable. Let’s explain each of these concepts a bit more:

Distributed Nature

This means the blockchain is duplicated thousands of times across different computers worldwide. Each computer holding a copy is called a node, and the nodes work together to keep the information consistent. Because the data is spread out, it is harder to manipulate or lose.

Immutability

Once something is written into the blockchain, it can’t be changed or erased. This is immutability. Each piece of data is linked to the previous one in a chain, and any attempt to alter one block would break the chain, alerting everyone in the network!

Transparency

Every transaction that occurs on the blockchain is visible to anyone. This builds trust among users, as they can verify the accuracy of transactions without relying on a central authority.

Reliability: Consensus Mechanism

Decisions need to be made collectively by all the nodes. This is where the consensus mechanism comes into play. Different blockchain networks use various methods, such as Proof of Work (PoW) or Proof of Stake (PoS), to ensure everyone agrees on the state of the blockchain.
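To give a rough feel for the Proof of Work idea, here is a tiny, simplified sketch; real networks use far more elaborate rules, and the difficulty value here is arbitrary.

import hashlib

def proof_of_work(block_data: str, difficulty: int = 4) -> int:
    """Find a nonce so that SHA-256(data + nonce) starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce  # this nonce is the "proof" that work was done
        nonce += 1

print("Found nonce:", proof_of_work("block #1: Alice pays Bob 5 coins"))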

The Block

This is the fundamental unit of a blockchain. Essentially, a blockchain is a sequence of linked blocks, each holding a piece of information.

The key components of a block in a blockchain are the block header and the block body.
The block header is the portion of a block that contains information about the block itself (block metadata), typically including a timestamp, a hash representation of the block data, the hash of the previous block’s header, and a cryptographic nonce.
The body of a block contains transaction records, including a transaction counter and the block size.

How blocks are chained together

Blocks are records linked together by cryptography in a blockchain. This connection is achieved with hash functions: the same input always produces the same output, but even a minor change in the input completely changes the output. This is called the avalanche effect.

Each block has a block hash. The hash of the previous block is used to produce the hash value of the next block. The next block is “chained” with its prior block, reinforcing the integrity of all the previous blocks that came before.
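To make the chaining concrete, here is a minimal toy sketch of how each block can store the hash of the previous block (an illustration only, not a real blockchain implementation):

import hashlib
import json
import time

def hash_block(block: dict) -> str:
    """Hash the block's contents with SHA-256 (deterministic thanks to sorted keys)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

# Genesis block: the first block has no real previous hash.
chain = [{"index": 0, "timestamp": time.time(), "data": "genesis", "prev_hash": "0" * 64}]

# Each new block stores the hash of the previous block, so altering any
# earlier block would change its hash and break every block after it.
for i, data in enumerate(["tx: A -> B, 5 coins", "tx: B -> C, 2 coins"], start=1):
    chain.append({
        "index": i,
        "timestamp": time.time(),
        "data": data,
        "prev_hash": hash_block(chain[-1]),
    })

print([hash_block(block)[:12] for block in chain])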

Why block size is important

The block size determines the amount of data that can be included in a single block which ultimately affects the speed and efficiency of the blockchain network.
While some argue that larger block sizes are necessary to accommodate a growing number of transactions, others believe that smaller blocks are better suited for maintaining the decentralization and security of the network.
Let’s see some of the main differences in blocks:

Smaller block size:
  • Each block contains fewer transactions.
  • More secure and decentralized, because blocks require less storage space and computational power, making it easier for smaller miners to participate in the network.
  • Longer transaction times and higher fees.

Larger block size:
  • More transactions can be processed in a single block.
  • Requires more storage space and computational power, which can make it difficult for smaller miners to participate in the network.
  • Reduced transaction fees and faster transaction times.

To sum up: a blockchain is distributed, immutable, transparent, and reliable. Its fundamental unit is the block, and blocks are linked using cryptographic hash functions, ensuring data integrity through the chain of blocks.

Sources:
https://academy.binance.com/en/articles/what-is-blockchain-and-how-does-it-work
https://www.theblock.co/learn/245697/what-are-blocks-in-a-blockchain
https://bitcoin.org/bitcoin.pdf
https://info.etherscan.com/exploring-block-details-page/
https://csrc.nist.gov/glossary/term/block_header
https://www.geeksforgeeks.org/blockchain-chaining-blocks/
https://www.researchgate.net/publication/309983377_Electronic_Voting_Service_Using_Block-Chain#pf3
https://fastercapital.com/content/Block-size–The-Impact-of-Block-Size-on-Cryptocurrency-Block-Headers.html

Large Language Models On-Device with MediaPipe and TensorFlow Lite

Posted by Mark Sherwood – Senior Product Manager and Juhyun Lee – Staff Software Engineer

TensorFlow Lite has been a powerful tool for on-device machine learning since its release in 2017, and MediaPipe further extended that power in 2019 by supporting complete ML pipelines. While these tools initially focused on smaller on-device models, today marks a dramatic shift with the experimental MediaPipe LLM Inference API.

This new release enables Large Language Models (LLMs) to run fully on-device across platforms. This new capability is particularly transformative considering the memory and compute demands of LLMs, which are over a hundred times larger than traditional on-device models. Optimizations across the on-device stack make this possible, including new ops, quantization, caching, and weight sharing.

The experimental cross-platform MediaPipe LLM Inference API, designed to streamline on-device LLM integration for web developers, supports Web, Android, and iOS with initial support for four openly available LLMs: Gemma, Phi 2, Falcon, and Stable LM. It gives researchers and developers the flexibility to prototype and test popular openly available LLM models on-device.

On Android, the MediaPipe LLM Inference API is intended for experimental and research use only. Production applications with LLMs can use the Gemini API or Gemini Nano on-device through Android AICore. AICore is the new system-level capability introduced in Android 14 to provide Gemini-powered solutions for high-end devices, including integrations with the latest ML accelerators, use-case optimized LoRA adapters, and safety filters. To start using Gemini Nano on-device with your app, apply to the Early Access Preview.

LLM Inference API

Starting today, you can test out the MediaPipe LLM Inference API via our web demo or by building our sample demo apps. You can experiment and integrate it into your projects via our Web, Android, or iOS SDKs.

Using the LLM Inference API allows you to bring LLMs on-device in just a few steps. These steps apply across web, iOS, and Android, though the SDK and native API will be platform specific. The following code samples show the web SDK.

1. Pick model weights compatible with one of our supported model architectures 

 

2. Convert the model weights into a TensorFlow Lite Flatbuffer using the MediaPipe Python Package

from mediapipe.tasks.python.genai import converter 

config = converter.ConversionConfig(...)
converter.convert_checkpoint(config)
 

3. Include the LLM Inference SDK in your application

import { FilesetResolver, LlmInference } from
"https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai";
 

4. Host the TensorFlow Lite Flatbuffer along with your application.

 

5. Use the LLM Inference API to take a text prompt and get a text response from your model.

const fileset  = await
FilesetResolver.forGenAiTasks("https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm");
const llmInference = await LlmInference.createFromModelPath(fileset, "model.bin");
const responseText = await llmInference.generateResponse("Hello, nice to meet you");
document.getElementById('output').textContent = responseText;

Please see our documentation and code examples for a detailed walk through of each of these steps.

Here are real time gifs of Gemma 2B running via the MediaPipe LLM Inference API.

moving image of Gemma 2B running on-device in browser via the MediaPipe LLM Inference API
Gemma 2B running on-device in browser via the MediaPipe LLM Inference API

moving image of Gemma 2B running on-device on iOS (left) and Android (right) via the MediaPipe LLM Inference API
Gemma 2B running on-device on iOS (left) and Android (right) via the MediaPipe LLM Inference API

Models

Our initial release supports the following four model architectures. Any model weights compatible with these architectures will work with the LLM Inference API. Use the base model weights, use a community fine-tuned version of the weights, or fine tune weights using your own data.

 Model           Parameter Size

 Falcon 1B       1.3 Billion
 Gemma 2B        2.5 Billion
 Phi 2           2.7 Billion
 Stable LM 3B    2.8 Billion

Model Performance

Through significant optimizations, some of which are detailed below, the MediaPipe LLM Inference API is able to deliver state-of-the-art latency on-device, focusing on CPU and GPU to support multiple platforms. For sustained performance in a production setting on select premium phones, Android AICore can take advantage of hardware-specific neural accelerators.

When measuring latency for an LLM, there are a few terms and measurements to consider. Time to First Token and Decode Speed will be the two most meaningful as these measure how quickly you get the start of your response and how quickly the response generates once it starts.

Term: Token
Significance: LLMs use tokens rather than words as inputs and outputs. Each model used with the LLM Inference API has a tokenizer built in which converts between words and tokens.
Measurement: 100 English words ≈ 130 tokens. However, the conversion is dependent on the specific LLM and the language.

Term: Max Tokens
Significance: The maximum total tokens for the LLM prompt + response.
Measurement: Configured in the LLM Inference API at runtime.

Term: Time to First Token
Significance: Time between calling the LLM Inference API and receiving the first token of the response.
Measurement: Max Tokens / Prefill Speed

Term: Prefill Speed
Significance: How quickly a prompt is processed by an LLM.
Measurement: Model and device specific. Benchmark numbers below.

Term: Decode Speed
Significance: How quickly a response is generated by an LLM.
Measurement: Model and device specific. Benchmark numbers below.

The Prefill Speed and Decode Speed are dependent on model, hardware, and max tokens. They can also change depending on the current load of the device.

The following speeds were taken on high end devices using a max tokens of 1280 tokens, an input prompt of 1024 tokens, and int8 weight quantization. The exception being Gemma 2B (int4), found here on Kaggle, which uses a mixed 4/8-bit weight quantization.
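As a back-of-the-envelope illustration of how these quantities combine, here is a small sketch; the speeds below are made-up placeholders, not the measured benchmarks that follow.

# Hypothetical speeds for illustration only.
prefill_speed = 500.0   # tokens/second while processing the prompt
decode_speed = 20.0     # tokens/second while generating the response

prompt_tokens = 1024    # matches the benchmark input prompt size
response_tokens = 256   # example response length

prefill_time = prompt_tokens / prefill_speed    # time spent processing the prompt
decode_time = response_tokens / decode_speed    # time spent generating the response

print(f"Prefill: {prefill_time:.1f} s, decode: {decode_time:.1f} s, "
      f"total: {prefill_time + decode_time:.1f} s")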

Benchmarks

Graph showing prefill performance in tokens per second across WebGPU, iOS (GPU), Android (GPU), and Android (CPU)

Graph showing decode performance in tokens per second across WebGPU, iOS (GPU), Android (GPU), and Android (CPU)
On the GPU, Falcon 1B and Phi 2 use fp32 activations, while Gemma and StableLM 3B use fp16 activations as the latter models showed greater robustness to precision loss according to our quality eval studies. The lowest bit activation data type that maintained model quality was chosen for each. Note that Gemma 2B (int4) was the only model we could run on iOS due to its memory constraints, and we are working on enabling other models on iOS as well.

Performance Optimizations

To achieve the performance numbers above, countless optimizations were made across MediaPipe, TensorFlow Lite, XNNPack (our CPU neural network operator library), and our GPU-accelerated runtime. The following are a select few that resulted in meaningful performance improvements.

Weights Sharing: The LLM inference process comprises 2 phases: a prefill phase and a decode phase. Traditionally, this setup would require 2 separate inference contexts, each independently managing resources for its corresponding ML model. Given the memory demands of LLMs, we’ve added a feature that allows sharing the weights and the KV cache across inference contexts. Although sharing weights might seem straightforward, it has significant performance implications when sharing between compute-bound and memory-bound operations. In typical ML inference scenarios, where weights are not shared with other operators, they are meticulously configured for each fully connected operator separately to ensure optimal performance. Sharing weights with another operator implies a loss of per-operator optimization and this mandates the authoring of new kernel implementations that can run efficiently even on sub-optimal weights.

Optimized Fully Connected Ops: XNNPack’s FULLY_CONNECTED operation has undergone two significant optimizations for LLM inference. First, dynamic range quantization seamlessly merges the computational and memory benefits of full integer quantization with the precision advantages of floating-point inference. The utilization of int8/int4 weights not only enhances memory throughput but also achieves remarkable performance, especially with the efficient, in-register decoding of 4-bit weights requiring only one additional instruction. Second, we actively leverage the I8MM instructions in ARM v9 CPUs which enable the multiplication of a 2×8 int8 matrix by an 8×2 int8 matrix in a single instruction, resulting in twice the speed of the NEON dot product-based implementation.

Balancing Compute and Memory: Upon profiling the LLM inference, we identified distinct limitations for both phases: the prefill phase faces restrictions imposed by the compute capacity, while the decode phase is constrained by memory bandwidth. Consequently, each phase employs different strategies for dequantization of the shared int8/int4 weights. In the prefill phase, each convolution operator first dequantizes the weights into floating-point values before the primary computation, ensuring optimal performance for computationally intensive convolutions. Conversely, the decode phase minimizes memory bandwidth by adding the dequantization computation to the main mathematical convolution operations.

Flowchart showing compute-intensive prefill phase and memory-intensive decode phase, highlighting difference in performance bottlenecks
During the compute-intensive prefill phase, the int4 weights are dequantized a priori for optimal CONV_2D computation. In the memory-intensive decode phase, dequantization is performed on the fly, along with CONV_2D computation, to minimize the memory bandwidth usage.

Custom Operators: For GPU-accelerated LLM inference on-device, we rely extensively on custom operations to mitigate the inefficiency caused by numerous small shaders. These custom ops allow for special operator fusions and various LLM parameters such as token ID, sequence patch size, sampling parameters, to be packed into a specialized custom tensor used mostly within these specialized operations.

Pseudo-Dynamism: In the attention block, we encounter dynamic operations that increase over time as the context grows. Since our GPU runtime lacks support for dynamic ops/tensors, we opt for fixed operations with a predefined maximum cache size. To reduce the computational complexity, we introduce a parameter enabling the skipping of certain value calculations or the processing of reduced data.

Optimized KV Cache Layout: Since the entries in the KV cache ultimately serve as weights for convolutions, employed in lieu of matrix multiplications, we store these in a specialized layout tailored for convolution weights. This strategic adjustment eliminates the necessity for extra conversions or reliance on unoptimized layouts, and therefore contributes to a more efficient and streamlined process.

What’s Next

We are thrilled with the optimizations and the performance in today’s experimental release of the MediaPipe LLM Inference API. This is just the start. Over 2024, we will expand to more platforms and models, offer broader conversion tools, complementary on-device components, high level tasks, and more.

You can check out the official sample on GitHub demonstrating everything you’ve just learned about and read through our official documentation for even more details. Keep an eye on the Google for Developers YouTube channel for updates and tutorials.

Acknowledgements

We’d like to thank all team members who contributed to this work: T.J. Alumbaugh, Alek Andreev, Frank Ban, Jeanine Banks, Frank Barchard, Pulkit Bhuwalka, Buck Bourdon, Maxime Brénon, Chuo-Ling Chang, Yu-hui Chen, Linkun Chen, Lin Chen, Nikolai Chinaev, Clark Duvall, Rosário Fernandes, Mig Gerard, Matthias Grundmann, Ayush Gupta, Mohammadreza Heydary, Ekaterina Ignasheva, Ram Iyengar, Grant Jensen, Alex Kanaukou, Prianka Liz Kariat, Alan Kelly, Kathleen Kenealy, Ho Ko, Sachin Kotwani, Andrei Kulik, Yi-Chun Kuo, Khanh LeViet, Yang Lu, Lalit Singh Manral, Tyler Mullen, Karthik Raveendran, Raman Sarokin, Sebastian Schmidt, Kris Tonthat, Lu Wang, Tris Warkentin, and the Gemma Team

Discover the future of technology: Specialize in artificial intelligence with Microsoft Learn

The world of technology is evolving rapidly, and Artificial Intelligence is becoming a dominant force. For those of us in the IT sector, adapting to and mastering AI is essential. Today I want to share an incredible opportunity to get ahead in this field: the Microsoft Learn AI Skills Challenge, which begins on March 19. Find more details and register here.

This event is not just another AI course; it is a complete learning experience that will prepare you for the future demands of IT. It is the perfect opportunity to dive into the areas of AI that are shaping the future of our industry.

By choosing your specialization, you not only define your learning path; thanks to Microsoft, you also receive an exam offer that can be redeemed for a certification exam, opening doors to new opportunities and recognition in the field:

  • Microsoft Fabric Challenge: Step into the world of analytics and prepare for the Fabric Analytics Engineer certification.
  • Azure OpenAI Challenge: Explore the possibilities of next-generation applications and direct your studies toward the Azure AI Engineer certification.
  • Azure Machine Learning Challenge: Immerse yourself in the fundamentals and applications of machine learning, aiming for the Data Scientist Associate certification.
  • Azure AI Fundamentals Challenge: Build a solid foundation in AI and work toward the Azure AI Fundamentals certification.

Participating in this challenge will not only advance your own learning; you will also join an active community of IT professionals exploring new technological frontiers. This is a space to grow, interact with experts, and share valuable experiences.

This is the moment to deepen your knowledge and position your career for the future. Artificial Intelligence is redefining our field, and staying current with these trends is more crucial than ever.

I would like to share this with every professional in the IT sector. It is not just an investment in learning, but a commitment to transforming how we work and lead in the digital era. I hope many of you take on this challenge so that, together, we can move toward a brighter technological future full of possibilities!

Introducing Gemma models in Keras

Posted by Martin Görner – Product Manager, Keras

The Keras team is happy to announce that Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology that we used to create the Gemini models, is now available in the KerasNLP collection. Thanks to Keras 3, Gemma runs on JAX, PyTorch and TensorFlow. With this release, Keras is also introducing several new features specifically designed for large language models: a new LoRA API (Low Rank Adaptation) and large scale model-parallel training capabilities.

If you want to dive directly into code samples, head here:

Get started

Gemma models come in portable 2B and 7B parameter sizes, and deliver significant advances against similar open models, and even some larger ones. For example:

  • Gemma 7B scores a new best-in class 64.3% of correct answers in the MMLU language understanding benchmark (vs. 62.5% for Mistral-7B and 54.8% for Llama2-13B)
  • Gemma adds +11 percentage points to the GSM8K benchmark score for grade-school math problems (46.4% for Gemma 7B vs. Mistral-7B 35.4%, Llama2-13B 28.7%)
  • and +6.1 percentage points of correct answers in HumanEval, a coding challenge (32.3% for Gemma 7B, vs. Mistral 7B 26.2%, Llama2 13B 18.3%).

Gemma models are offered with a familiar KerasNLP API and a super-readable Keras implementation. You can instantiate the model with a single line of code:

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

And run it directly on a text prompt – yes, tokenization is built-in, although you can easily split it out if needed – read the Keras NLP guide to see how.

gemma_lm.generate("Keras is a", max_length=32)
> "Keras is a popular deep learning framework for neural networks..."

Try it out here: Get started with Gemma models

Fine-tuning Gemma Models with LoRA

Thanks to Keras 3, you can choose the backend on which you run the model. Here is how to switch:

os.environ["KERAS_BACKEND"] = "jax"  # Or "tensorflow" or "torch".
import keras # import keras after having selected the backend

Keras 3 comes with several new features specifically for large language models. Chief among them is a new LoRA API (Low Rank Adaptation) for parameter-efficient fine-tuning. Here is how to activate it:

gemma_lm.backbone.enable_lora(rank=4)
# Note: rank=4 replaces the weights matrix of relevant layers with the 
# product AxB of two matrices of rank 4, which reduces the number of 
# trainable parameters.

This single line drops the number of trainable parameters from 2.5 billion to 1.3 million!
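From there, fine-tuning proceeds with the regular Keras workflow. The following is a minimal sketch; the dataset, sequence length, and hyperparameters are placeholders you would adapt to your own task.

import keras

# Illustrative placeholder dataset: a few formatted prompt/response strings.
data = [
    "Instruction:\nExplain what a hash function is.\n\nResponse:\nA hash function maps data to a fixed-size value...",
    # ...more training examples
]

gemma_lm.preprocessor.sequence_length = 256  # keep sequences short to limit memory use

optimizer = keras.optimizers.AdamW(learning_rate=5e-5, weight_decay=0.01)
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=1, batch_size=1)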

Try it out here: Fine-tune Gemma models with LoRA.

Fine-tuning Gemma models on multiple GPU/TPUs

Keras 3 also supports large-scale model training and Gemma is the perfect model to try it out. The new Keras distribution API offers data-parallel and model-parallel distributed training options. The new API is meant to be multi-backend but for the time being, it is implemented for the JAX backend only, because of its proven scalability (Gemma models were trained with JAX).

To fine-tune the larger Gemma 7B, a distributed setup is useful, for example a TPUv3 with 8 TPU cores that you can get for free on Kaggle, or an 8-GPU machine from Google Cloud. Here is how to configure the model for distributed training, using model parallelism:

device_mesh = keras.distribution.DeviceMesh(
   (1, 8), # Mesh topology
   ["batch", "model"], # named mesh axes
   devices=keras.distribution.list_devices() # actual accelerators
)


# Model config
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["token_embedding/embeddings"] = (None, "model")
layout_map["decoder_block.*attention.*(query|key|value).*kernel"] = (
   None, "model", None)
layout_map["decoder_block.*attention_output.*kernel"] = (
   None, None, "model")
layout_map["decoder_block.*ffw_gating.*kernel"] = ("model", None)
layout_map["decoder_block.*ffw_linear.*kernel"] = (None, "model")


# Set the model config and load the model
model_parallel = keras.distribution.ModelParallel(
   device_mesh, layout_map, batch_dim_name="batch")
keras.distribution.set_distribution(model_parallel)
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_7b_en")
# Ready: you can now train with model.fit() or generate text with generate()

What this code snippet does is set up the 8 accelerators into a 1 x 8 matrix where the two dimensions are called “batch” and “model”. Model weights are sharded on the “model” dimension, here split between the 8 accelerators, while data batches are not partitioned since the “batch” dimension is 1.

Try it out here: Fine-tune Gemma models on multiple GPUs/TPUs.

What’s Next

We will soon be publishing a guide showing you how to correctly partition a Transformer model and write the 6 lines of partitioning setup above. It is not very long but it would not fit in this post.

You will have noticed that layer partitionings are defined through regexes on layer names. You can check layer names with this code snippet. We ran this to construct the LayoutMap above.

# This is for the first Transformer block only,
# but they all have the same structure
tlayer = gemma_lm.backbone.get_layer('decoder_block_0')
for variable in tlayer.weights:
 print(f'{variable.path:<58}  {str(variable.shape):<16}')

Full GSPMD model parallelism works here with just a few partitioning hints because Keras passes these settings to the powerful XLA compiler which figures out all the other details of the distributed computation.

We hope you will enjoy playing with Gemma models. Here is also an instruction-tuning tutorial that you might find useful. And by the way, if you want to share your fine-tuned weights with the community, the Kaggle Model Hub now supports user-tuned weights uploads. Head to the model page for Gemma models on Kaggle and see what others have already created!

Google Pay – Enabling liability shift for eligible Visa device token transactions globally

Posted by Dominik Mengelt– Developer Relations Engineer, Payments and Florin Modrea – Product Solutions Engineer, Google Pay

We are excited to announce the general availability [1] of liability shift for Visa device tokens for Google Pay.

For Mastercard device tokens the liability already lies with the issuing bank, whereas, for Visa, only eligible device tokens with issuing banks in the European region benefit from liability shift.

What is liability shift?

If liability shift is granted for a transaction, the responsibility for covering losses from fraudulent transactions moves from the merchant to the issuing bank. With this change, qualifying Google Pay Visa transactions made with a device token will benefit from this liability shift.

How to know if the liability was shifted to the issuing bank for my transaction?

Eligible Visa transactions will carry an eciIndicator value of 05. PSPs can access the eciIndicator value after decrypting the payment method token. Merchants can check with their PSPs to get a report on liability shift eligible transactions.

   {
    "gatewayMerchantId": "some-merchant-id",
    "messageExpiration": "1561533871082",
    "messageId": "AH2Ejtc8qBlP_MCAV0jJG7Er",
    "paymentMethod": "CARD",
    "paymentMethodDetails": {
        "expirationYear": 2028,
        "expirationMonth": 12,
        "pan": "4895370012003478",
        "authMethod": "CRYPTOGRAM_3DS",
        "eciIndicator": "05",
        "cryptogram": "AgAAAAAABk4DWZ4C28yUQAAAAAA="
    }
  }
A decrypted payment token for a Google Pay Visa transaction with an eciIndicator value of 05 (liability shifted)
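As a small illustration, a backend could check the indicator along these lines (a sketch only; it assumes the Google Pay token has already been decrypted by your PSP or decryption library):

import json

def is_liability_shifted(decrypted_token_json: str) -> bool:
    """True if the decrypted token indicates issuer liability (Visa "05" or Mastercard "02")."""
    message = json.loads(decrypted_token_json)
    details = message.get("paymentMethodDetails", {})
    return (
        details.get("authMethod") == "CRYPTOGRAM_3DS"
        and details.get("eciIndicator") in {"05", "02"}
    )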

Check out the following table for a full list of eciIndicator values we return for our Visa and Mastercard device token transactions:

 eciIndicator value   Card Network     Liable Party        authMethod

 “” (empty)           Mastercard       Merchant/Acquirer   CRYPTOGRAM_3DS
 “02”                 Mastercard       Card issuer         CRYPTOGRAM_3DS
 “06”                 Mastercard       Merchant/Acquirer   CRYPTOGRAM_3DS
 “05”                 Visa             Card issuer         CRYPTOGRAM_3DS
 “07”                 Visa             Merchant/Acquirer   CRYPTOGRAM_3DS
 “” (empty)           Other networks   Merchant/Acquirer   CRYPTOGRAM_3DS

Any other eciIndicator values for VISA and Mastercard that aren’t present in this table won’t be returned.

How to enroll

Merchants may opt in from within the Google Pay & Wallet console starting this month. Merchants in Europe (already benefiting from liability shift) do not need to take any action, as they will be auto-enrolled.

In order for your Google Pay transaction to qualify for enabling liability shift, the following API parameters are required:

totalPrice

Make sure that totalPrice matches the amount that you use to charge the user. Transactions with totalPrice=0 will not qualify for liability shift to the issuing bank.

totalPriceStatus

Valid values are: FINAL or ESTIMATED

Transactions with the totalPriceStatus value of NOT_CURRENTLY_KNOWN do not qualify for liability shift.

Not all transactions get liability shift

Ineligible merchants

In the US, the following MCC codes are excluded from getting liability shift:

 4829   Money Transfer
 5967   Direct Marketing – Inbound Teleservices Merchant
 6051   Non-Financial Institutions – Foreign Currency, Non-Fiat Currency (for example: Cryptocurrency), Money Orders (Not Money Transfer), Account Funding (not Stored Value Load), Travelers Cheques, and Debt Repayment
 6540   Non-Financial Institutions – Stored Value Card Purchase/Load
 7801   Government Licensed On-Line Casinos (On-Line Gambling) (US Region only)
 7802   Government-Licensed Horse/Dog Racing (US Region only)
 7995   Betting, including Lottery Tickets, Casino Gaming Chips, Off-Track Betting, Wagers at Race Tracks and games of chance to win prizes of monetary value

Ineligible transactions

In order for your Google Pay transactions to qualify for liability shift, make sure to include the above-mentioned parameters totalPrice and totalPriceStatus. Transactions with totalPrice=0 or a hard-coded totalPrice (always the same amount, while users are charged a different amount) will not qualify for liability shift.

Processing transactions

Google Pay API transactions with Visa device tokens qualify for liability shift at facilitation time if all the conditions are met, but a transaction qualified for liability shift can be downgraded by the network during transaction authorization processing.

Getting started with Google Pay

Not yet using Google Pay? Refer to the documentation to start integrating Google Pay today. Learn more about the integration by taking a look at our sample application for Android on GitHub or use one of our button components for your web integration. When you are ready, head over to the Google Pay & Wallet console and submit your integration for production access.

Follow @GooglePayDevs on X (formerly Twitter) for future updates. If you have questions, tag @GooglePayDevs and include #AskGooglePayDevs in your tweets.

[1] For merchants and PSPs using dynamic price updates or other callback mechanisms the Visa device token liability shift changes will be rolled out later this year.

People of AI – Season 3

Posted by Ashley Oldacre

If you are joining us for the first time, you can binge listen to Seasons 1 and 2 wherever you get your podcasts.

We are back for another season of People of AI with a new lineup of incredible guests! I am so excited to continue co-hosting with Luiz Gustavo Martins as we meet inspiring people with interesting stories in the field of Artificial Intelligence.

Last season we focused on the big shift in technology spurred on by Generative AI. Fast forward 12 months, with the launch of multimodal models, we are at an interesting point in history.

In Season 3, we will continue to uncover our guests’ personal and professional journeys into the field of AI, highlighting the important work/products they are focusing on. At the same time, we want to dig deeper into the societal implications of what our guests create. We will ask questions to understand how they are leveraging AI to solve problems and create new experiences while also looking to understand what challenges they may face and what potential this technology has for both good and bad. We want to hold both truths to light through conversations with our guests. All this with the goal of aligning our technology with the public narrative and painting a realistic picture of how this technology is being used, the amazing things we can do with it and the right questions to make sure it is used safely and responsibly.

Starting today, we will release one new episode of Season 3 per week, alternating video and audio. Listen to the first episode on the People of AI site or wherever you get your podcasts.

  • Episode 1: meet Adrit Rao, a 16 year old high school student, app developer, and research intern at Stanford University. We talk about App development and how learning about TensorFlow enabled him to create life changing apps in Healthcare. 
  • Episode 2: meet Indira Negi, a Product and Tech Executive investing in Medical Devices, AI and Digital health at the Bill and Melinda Gates Foundation as we learn about the latest investments in AI and Healthcare.
  • Episode 3: meet Tris Warkentin, Director of Product Management at Google DeepMind, as we talk about the exciting new launches from Google’s latest Large Language Models.
  • Episode 4: meet Kathleen Kenealy, Senior Software Engineer at Google DeepMind, as we learn about the engineering genius behind Google’s latest Large Language Model launches.
  • Episode 5: meet Jeanine Banks, Vice President and General Manager of Google Developer X and Head of Developer Relations. Join us as we learn about Google’s latest AI innovations and how they will change the developer landscape.
  • Episode 6: meet François Chollet, creator of Keras and senior Software Engineer and AI researcher at Google. Join us as we learn about Google’s latest AI innovations and how they will change the developer landscape.
  • Episode 7: meet Chansung Park, Google Developer Expert and Researcher, as we talk about the importance of building and planning for Large Language Model infrastructure.
  • Episode 8: meet Fergus Hurley and Nia Castelly, co-founders of Checks, a privacy platform for mobile app developers that helps create a safer digital ecosystem by simplifying the path to privacy compliance for development teams and the apps they’re building.
  • Episode 9: meet Sam Sepah and Thad Starner, as they talk about leveraging the power of Generative AI to unlock sign language capabilities.

Listen now to the first episode of Season 3. We can’t wait to share the stories of these exceptional People of AI with you!

This podcast is sponsored by Google. Any remarks made by the speakers are their own and are not endorsed by Google.

How it’s Made – Exploring AI x Learning through ShiffBot, an AI experiment powered by the Gemini API

    Posted by Jasmin Rubinovitz, AI Researcher

    Google Lab Sessions is a series of experimental collaborations with innovators. In this session, we partnered with beloved creative coding educator and YouTube creator Daniel Shiffman. Together, we explored some of the ways AI, and specifically the Gemini API, could provide value to teachers and students during the learning process.

Dan Shiffman started out teaching programming courses at NYU ITP and later created his YouTube channel The Coding Train, making his content available to a wider audience. Learning to code can be challenging; even small obstacles can be hard to overcome when you are on your own. So together with Dan we asked: could we complement his teaching even further by creating an AI-powered tool that helps students while they are actually coding, in their coding environment?

Dan uses the wonderful p5.js JavaScript library and its accessible editor to teach code. So we set out to create an experimental Chrome extension for the editor that brings Dan’s teaching style, as well as his various online resources, into the coding environment itself.

In this post, we’ll share how we used the Gemini API to craft ShiffBot with Dan. We’re hoping that some of the things we learned along the way will inspire you to create and build your own ideas.

To learn more about ShiffBot, visit shiffbot.withgoogle.com.

    As we started defining and tinkering with what this chatbot might be, we found ourselves faced with two key questions:

    1. How can ShiffBot inspire curiosity, exploration, and creative expression in the same way that Dan does in his classes and videos?
2. How can we surface the variety of creative-coding approaches and the deep knowledge of Dan and the community?

Let’s take a look at how we approached these questions by combining the Gemini API’s capabilities: prompt engineering for Dan’s unique teaching style, alongside embeddings and semantic retrieval over Dan’s collection of educational content.

    Tone and delivery: putting the “Shiff” in “ShiffBot”

A text prompt is a thoughtfully designed textual sequence that is used to prime a Large Language Model (LLM) to generate text in a certain way. As with many AI applications, engineering the right prompt was a big part of sculpting the experience.

    Whenever a user asks ShiffBot a question, a prompt is constructed in real time from a few different parts; some are static and some are dynamically generated alongside the question.

ShiffBot prompt building blocks

    The first part of the prompt is static and always the same. We worked closely with Dan to phrase it and test many texts, instructions and techniques. We used Google AI Studio, a free web-based developer tool, to rapidly test multiple prompts and potential conversations with ShiffBot.

    ShiffBot’s prompt starts with setting the bot persona and defining some instructions and goals for it to follow. The hope was to both create continuity for Dan’s unique energy, as seen in his videos, and also adhere to the teaching principles that his students and fans adore.

    We were hoping that ShiffBot could provide encouragement, guidance and access to relevant high-quality resources. And, specifically, do it without simply providing the answer, but rather help students discover their own answers (as there can be more than one).

The instructions draw from Dan’s teaching style by including sentences like “ask the user questions”, because that’s what Dan does in the classroom.
Here is part of the persona and instructions section of the prompt:

    You are a ShiffBot, a chat bot embedded in the p5.js web editor that can help users while they learn creative coding. You are based on Daniel Shiffman’s personality and The Coding Train YouTube channel. You are playful, silly, friendly, and educational. You like to make fun of yourself and your mission is to support the creative coding process and help the user feel less alone while coding. You love humans, are fascinated by them, and want to learn more about humans. You just LOVE to help them with their code, it brings you extraordinary happiness…

    The next piece of the prompt utilizes another capability of LLMs called few-shot learning. It means that with just a small number of examples, the model learns patterns and can then use those in new inputs. Practically, as part of the prompt, we provide a number of demonstrations of input and expected output.

We worked with Dan to create a small set of such few-shot examples. These are pairs of user input and ShiffBot response, where the response is always written in our desired ShiffBot style. They look like this:

    User: How do I draw a circle? 

    ShiffBot: Oh! 🟣 🟢 🔵 That’s what the `circle()` function is for! You can think of a function as a command: “I command you computer-machine to draw a circle!” You have to also specify where to draw the circle and how big (that’s what “arguments” to a function are!), the p5.js reference which was written by actual, beautiful humans can tell you more! 

    — 

    User: My homework is to draw a self-portrait.
    ShiffBot: Ooh! Self portraits are so fun! You could draw shapes to represent you, but you should be creative and think abstractly! What does a self-portrait mean to you? Try starting with `square()` or `circle()` and let me know how it goes! 🎨

    Our prompt includes 13 such pairs.
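
For illustration, here is a minimal sketch of how such pairs might be folded into the static part of the prompt; the FEW_SHOT_PAIRS array and the separator format are illustrative assumptions rather than the exact implementation:

// Illustrative sketch: folding the hand-written few-shot pairs into the static
// part of the prompt. The data shape and separator are assumptions.
const FEW_SHOT_PAIRS = [
  {
    user: "How do I draw a circle?",
    shiffbot:
      "Oh! 🟣 🟢 🔵 That’s what the `circle()` function is for! You can think of a function as a command...",
  },
  {
    user: "My homework is to draw a self-portrait.",
    shiffbot:
      "Ooh! Self portraits are so fun! Try starting with `square()` or `circle()` and let me know how it goes! 🎨",
  },
  // ...11 more pairs (13 in total)
];

// Each pair is rendered as a "User: ... / ShiffBot: ..." exchange so the model
// can imitate the style when it later completes the next ShiffBot turn.
const fewShotSection = FEW_SHOT_PAIRS
  .map((pair) => `User: ${pair.user}\n\nShiffBot: ${pair.shiffbot}`)
  .join("\n\n---\n\n");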

Another thing we noticed as we were working on the extension is that sometimes giving more context in the prompt helps. In the case of learning creative coding in p5.js, explaining some p5.js principles in the prompt guides the model to use those principles as it answers the user’s question. So we also include instructions like:

    Use let rather than var or const.

    Everything we discussed up to now is static, meaning that it remains the same for every turn of the conversation between the user and ShiffBot. Now let’s explore some of the parts that are constructed dynamically as the conversation evolves.

    Conversation and code context

Because ShiffBot is embedded inside the p5.js editor, it can “see” the current code the user is working on, so it can generate responses that are more personalized and relevant. We grab that information from the HTML DOM and append it to the prompt as well.

The p5.js editor environment
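
How the extension reads the editor isn’t the focus here, but as a rough sketch, a content script could pull the current code out of the page along these lines; the selector below is a placeholder assumption, not the editor’s actual markup:

// Rough sketch of a content script reading the user's current code from the
// editor page. The selector is a placeholder — the real extension would target
// whatever DOM structure the p5.js web editor actually uses.
function getCurrentSketchCode() {
  const lines = document.querySelectorAll("#editor .code-line"); // hypothetical selector
  return Array.from(lines)
    .map((line) => line.textContent)
    .join("\n");
}

// The grabbed code is appended to the prompt as extra context for the model.
const codeContextSection = `Here is the code the user is currently working on:\n\n${getCurrentSketchCode()}`;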

Then, the full conversation history is appended, e.g.:

    ShiffBot: ☀ Good morning! Is it morning? I wouldn’t know. I’m just a robot. What are you coding today?
    User: how do I draw a square?

    ShiffBot: Squares! I love squares! They are so stable and reliable. To draw a square, you can use the square() function. It takes four arguments: the x and y coordinates of the top-left corner, and the width and height of the square. For example, square(10, 10, 50, 50) would draw a square with its top-left corner at (10, 10) and a width and height of 50 pixels.

    User: how can I change the color?

    ShiffBot:

We make sure to end the prompt with

ShiffBot:

so the model understands that it now needs to complete ShiffBot’s next turn in the conversation.
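
Putting the pieces together, a simplified per-turn call might look like the following sketch, assuming the public @google/generative-ai JavaScript SDK; the function and section names are illustrative rather than the actual ShiffBot code:

// Sketch: rebuild the full prompt on every turn and ask Gemini to complete
// ShiffBot's next message. Assumes the @google/generative-ai SDK.
import { GoogleGenerativeAI } from "@google/generative-ai";

const API_KEY = "YOUR_GEMINI_API_KEY"; // assumption: supplied by the extension's configuration
const genAI = new GoogleGenerativeAI(API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-pro" });

async function askShiffBot({ personaSection, fewShotSection, codeContextSection, historySection }) {
  // Static persona + few-shot examples first, then the dynamic code context and
  // conversation history, ending with "ShiffBot:" so the model completes that turn.
  const prompt = [
    personaSection,
    fewShotSection,
    codeContextSection,
    historySection,
    "ShiffBot:",
  ].join("\n\n");

  const result = await model.generateContent(prompt);
  return result.response.text();
}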

    Semantic Retrieval: grounding the experience in p5.js resources and Dan’s content

    Dan has created a lot of material over the years, including over 1,000 YouTube videos, books and code examples. We wanted to have ShiffBot surface these wonderful materials to learners at the right time. To do so, we used the Semantic Retrieval feature in the Gemini API, which allows you to create a corpus of text pieces, and then send it a query and get the texts in your corpus that are most relevant to your query. (Behind the scenes, it uses a cool thing called text embeddings; you can read more about embeddings here.) For ShiffBot we created corpuses from Dan’s content so that we could add relevant content pieces to the prompt as needed, or show them in the conversation with ShiffBot.

    Creating a Corpus of Videos

    In The Coding Train videos, Dan explains many concepts, from simple to advanced, and runs through coding challenges. Ideally ShiffBot could use and present the right video at the right time.

The Semantic Retrieval feature in the Gemini API allows users to create multiple corpuses. A corpus is built out of documents, and each document contains one or more chunks of text. Documents and chunks can also have metadata fields for filtering or storing more information.

In Dan’s video corpus, each video is a document, and the video URL is saved as a metadata field along with the video title. The videos are split into chapters (manually by Dan as he uploads them to YouTube). We used each chapter as a chunk, with the text for each chunk combining the video title, the first line of the video description, the chapter title, and the chapter’s transcript.

The video title, the first line of the video description, and the chapter title give the retrieval a bit more context to work with.
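
As a simplified illustration, a single chapter chunk and its metadata might be represented like this before being added to the corpus; the field names are illustrative, not the Semantic Retrieval API’s exact schema:

// Illustrative shape of one chunk; field names are assumptions, not the exact
// Semantic Retrieval API schema.
const chapterTranscript = "R stands for red, g stands for green, b stands for blue. ..."; // chapter transcript text

const chapterChunk = {
  // The chunk text combines the video title, the first line of the description,
  // the chapter title, and the chapter transcript for richer retrieval context.
  text: [
    "1.4: Color – p5.js Tutorial",
    "In this video I discuss how color works: RGB color, fill(), stroke(), and transparency.",
    "Chapter 1: R, G, B",
    chapterTranscript,
  ].join("\n\n"),
  metadata: {
    videoTitle: "1.4: Color – p5.js Tutorial",
    videoUrl: "https://www.youtube.com/watch?v=...", // stored so ShiffBot can link to the video
  },
};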

    This is an example of a chunk object that represents the R, G, B chapter in this video.

    1.4: Color – p5.js Tutorial


    In this video I discuss how color works: RGB color, fill(), stroke(), and transparency.


    Chapter 1: R, G, B


    R stands for red, g stands for green, b stands for blue. The way that you create a digital color is by mixing some amount of red, some amount of green, and some amount of blue. So that’s that that’s where I want to start. But that’s the concept, how do I apply that concept to function names, and arguments of those functions? Well, actually, guess what? We have done that already. In here, there is a function that is talking about color. Background is a function that draws a solid color over the entire background of the canvas. And there is, somehow, 220 sprinkles of red, zero sprinkles of green, right? RGB, those are the arguments. And 200 sprinkles of blue. And when you sprinkle that amount of red, and that amount of blue, you get this pink. But let’s just go with this. What if we take out all of the blue? You can see that’s pretty red. What if I take out all of the red? Now it’s black. What if I just put some really big numbers in here, like, just guess, like, 1,000? Look at that. Now we’ve got white, so all the colors all mixed together make white. That’s weird, right? Because if you, like, worked with paint, and you were to mix, like, a whole lot of paint together, you get this, like, brown muddy color, get darker and darker. This is the way that the color mixing is working, here. It’s, like, mixing light. So the analogy, here, is I have a red flashlight, a green flashlight, and a blue flashlight. And if I shine all those flashlights together in the same spot, they mix together. It’s additive color, the more we add up all those colors, the brighter and brighter it gets. But, actually, this is kind of wrong, the fact that I’m putting 1,000 in here. So the idea, here, is we’re sprinkling a certain amount of red, and a certain amount of green, and a certain amount of blue. And by the way, there are other ways to set color, but I’ll get to that. This is not the only way, because some of you watching, are like, I heard something about HSB color. And there’s all sorts of other ways to do it, but this is the fundamental, basic way. The amount that I can sprinkle has a range. No red, none more red, is zero. The maximum amount of red is 255. By the way, how many numbers are there between 0 and 255 if you keep the 0? 0, 1, 2, 3, 4– it’s 256. Again, we’re back to this weird counting from zero thing. So there’s 256 possibilities, 0 through 255. So, now, let’s come back to this and see. All right, let’s go back to zero, 0, 0, 0. Let’s do 255, we can see that it’s blue. Let’s do 100,000, it’s the same blue. So p5 is kind of smart enough to know when you call the background function, if you by accident put a number in there that’s bigger than 255, just consider it 255. Now, you can customize those ranges for yourself, and there’s reasons why you might want to do that. Again, I’m going to come back to that, you can look up the function color mode for how to do that. But let’s just stay with the default, a red, a green, and a blue. So, I’m not really very talented visual design wise. So I’m not going to talk to you about how to pick beautiful colors that work well together. You’re going to have that talent yourself, I bet. Or you might find some other resources. But this is how it works, RGB. One thing you might notice is, did you notice how when they were all zero, it was black, and they were all 255 it was white? What happens if I make them all, like, 100? It’s, like, this gray color. 
When r equals g equals b, when the red, green, and blue values are all equal, this is something known as grayscale color.

When the user asks ShiffBot a question, the question is embedded into a numerical representation, and Gemini’s Semantic Retrieval feature is used to find the texts whose embeddings are closest to the question. Those relevant video transcripts and links are added to the prompt, so the model can use that information when generating an answer (and potentially add the video itself into the conversation).
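
The managed Semantic Retrieval feature handles the corpora and queries server-side. As a rough local approximation of the same idea (not the API ShiffBot actually uses), the following sketch embeds the question with the embedding-001 model and ranks pre-embedded chunks by cosine similarity:

// Rough approximation of the retrieval step using embeddings directly.
// Assumes the @google/generative-ai SDK; the real system uses the managed
// Semantic Retrieval corpora instead of this local ranking.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("YOUR_GEMINI_API_KEY");
const embedder = genAI.getGenerativeModel({ model: "embedding-001" });

async function embed(text) {
  const result = await embedder.embedContent(text);
  return result.embedding.values; // array of numbers
}

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// embeddedChunks: [{ text, metadata, embedding }], embedded ahead of time.
async function retrieveRelevantChunks(question, embeddedChunks, topK = 3) {
  const queryEmbedding = await embed(question);
  return embeddedChunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK); // the top chunks (and their video links) get added to the prompt
}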

Semantic Retrieval Graph

    Creating a Corpus of Code Examples

We do the same with another corpus of p5.js examples written by Dan. To create the code examples corpus, we used Gemini and asked it to explain what each code example is doing. Those natural language explanations are added as chunks to the corpus, so that when the user asks a question, we try to find matching descriptions of code examples. The URL of the p5.js sketch itself is saved in the metadata, so after retrieval the code, along with the sketch URL, is added to the prompt.

    To generate the textual description, Gemini was prompted with:

    The following is a p5.js sketch. Explain what this code is doing in a short simple way.

    code:

    ${sketchCode}
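
A minimal sketch of that description-generation step, assuming the @google/generative-ai SDK; the describeSketch helper is an illustrative name:

// Sketch: generate a short natural-language description for each p5.js example.
// The description becomes the chunk text; the sketch URL goes into the metadata.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("YOUR_GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-pro" });

async function describeSketch(sketchCode) {
  const prompt =
    "The following is a p5.js sketch. Explain what this code is doing in a short simple way.\n\n" +
    `code:\n\n${sketchCode}`;
  const result = await model.generateContent(prompt);
  return result.response.text();
}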

Example of a code chunk:

    Text:

     

    Arrays – Color Palette

    This p5.js sketch creates a color palette visualization. It first defines an array of colors and sets up a canvas. Then, in the draw loop, it uses a for loop to iterate through the array of colors and display them as rectangles on the canvas. The rectangles are centered on the canvas and their size is determined by the value of the blockSize variable.

    The sketch also displays the red, green, and blue values of each color below each rectangle.

    Finally, it displays the name of the palette at the bottom of the canvas.

    Related video: 7.1: What is an array? – p5.js Tutorial – This video covers the basics on using arrays in JavaScript. What do they look like, how do they work, when should you use them?

Constructing the ShiffBot prompt

    Other ShiffBot Features Implemented with Gemini

Besides the long prompt that runs the conversation, other, smaller prompts are used to generate ShiffBot features.

    Seeding the conversation with content pre-generated by Gemini

ShiffBot greetings should be welcoming and fun. Ideally they make the user smile, so we started by thinking with Dan about what good greetings for ShiffBot could be. After phrasing a few examples, we used Gemini to generate many more, so we could have variety in the greetings. Those greetings go into the conversation history, seeding it with a unique style and making ShiffBot feel fun and new every time you start a conversation. We did the same with the initial suggestion chips that show up when you start the conversation. When there’s no conversation context yet, it’s important to suggest some things the user might ask. We pre-generated those to seed the conversation in an interesting and helpful way.
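
A minimal sketch of that seeding step; the GREETINGS list below is illustrative (the first entry is the greeting from the conversation example above, the second is made up), and the real list was pre-generated with Gemini:

// Sketch: pick one of the pre-generated greetings and seed the conversation
// history with it, so every session opens in ShiffBot's voice.
const GREETINGS = [
  "☀ Good morning! Is it morning? I wouldn’t know. I’m just a robot. What are you coding today?",
  "🤖 Beep boop! ShiffBot here, extraordinarily happy to see you. What shall we code together?", // made-up example
];

const greeting = GREETINGS[Math.floor(Math.random() * GREETINGS.length)];

// The greeting becomes the first ShiffBot line of the history section that is
// later appended to the prompt on every turn.
let historySection = `ShiffBot: ${greeting}`;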

    Dynamically Generated Suggestion Chips

Suggestion chips during the conversation should be relevant to what the user is currently trying to do. We have a prompt and a call to Gemini that are solely dedicated to generating the suggested question chips. In this case, the model’s only task is to suggest follow-up questions for a given conversation. We also use the few-shot technique here (the same technique we used in the static part of the prompt described above, where we include a few examples for the model to learn from). This time the prompt includes some examples of good suggestions, so that the model can generalize to any conversation:

    Given a conversation between a user and an assistant in the p5js framework, suggest followup questions that the user could ask.

    Return up to 4 suggestions, separated by the ; sign.

    Avoid suggesting questions that the user already asked. The suggestions should only be related to creative coding and p5js.


    Examples:

    ShiffBot: Great idea! First, let’s think about what in the sketch could be an object! What do you think?

    Suggestions: What does this code do?; What’s wrong with my code?; Make it more readable please


    User: Help!

    ShiffBot: How can I help?

    Suggestions: Explain this code to me; Give me some ideas; Cleanup my code
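
A sketch of that dedicated call and the chip parsing; suggestionInstructions stands in for the instructions and examples above, and the split on “;” follows the format the prompt asks for:

// Sketch: a separate, smaller Gemini call whose only job is to propose
// follow-up question chips, parsed by splitting on ";".
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI("YOUR_GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-pro" });

// Stand-in for the full instructions and few-shot examples quoted above.
const suggestionInstructions =
  "Given a conversation between a user and an assistant in the p5js framework, " +
  "suggest followup questions that the user could ask. " +
  "Return up to 4 suggestions, separated by the ; sign. ...";

async function getSuggestionChips(conversationText) {
  const prompt = `${suggestionInstructions}\n\n${conversationText}\n\nSuggestions:`;
  const result = await model.generateContent(prompt);
  return result.response
    .text()
    .split(";")
    .map((s) => s.trim())
    .filter(Boolean)
    .slice(0, 4); // at most 4 chips, per the prompt
}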

Suggested response chips, generated by Gemini

    Final thoughts and next steps

    ShiffBot is an example of how you can experiment with the Gemini API to build applications with tailored experiences for and with a community.

We found that the techniques above helped us bring out much of the experience that Dan had in mind for his students during our co-creation process. AI is a dynamic field, and we’re sure your techniques will evolve with it, but hopefully this snapshot of our explorations is a helpful starting point for your own. We are also excited for things to come, both in terms of Gemini and of API tools that broaden human curiosity and creativity.

    For example, we’ve already started to explore how multimodality can help students show ShiffBot their work and the benefits that has on the learning process. We’re now learning how to weave it into the current experience and hope to share it soon.

Experimental exploration of multimodality in ShiffBot

    Whether for coding, writing and even thinking, creators play a crucial role in helping us imagine what these collaborations might look like. Our hope is that this Lab Session gives you a glimpse of what’s possible using the Gemini API, and inspires you to use Google’s AI offerings to bring your own ideas to life, in whatever your craft may be.

    The post How it’s Made – Exploring AI x Learning through ShiffBot, an AI experiment powered by the Gemini API appeared first on ProdSens.live.
