HKIIT Students use Gemini Pro Vision to develop AI Screen Reader

Enhancing Accessibility: GDSC-Hong Kong Institute of Information Technology (HKIIT) helps visually impaired individuals access all web images using Gemini Pro Vision

Keynote speech by Michael Yue, Managing Director and GM of Google Hong Kong, at the Hong Kong Google Cloud Summit 2024: students use Gemini Pro Vision to develop an AI screen reader

At the Hong Kong Institute of Information Technology (HKIIT), an exceptional team of student developers known as GDSC-HKIIT (previously GDSC-IVE) has emerged. These young innovators, Fiona Chan and her peers, are in the final year of the Higher Diploma in Cloud and Data Centre Administration (CDCA) and are driven to build forward-looking applications with Artificial Intelligence and Google Cloud Platform.

Fiona’s fascination with computers began during her high school years. Her interest in cloud technology deepened that passion and led her to study CDCA at HKIIT, where she joined GDSC-HKIIT, a community of like-minded developers that fosters motivation and mutual support.

In September 2023, an extraordinary opportunity arose when Mr. Owen Chong, Google Cloud Hong Kong’s Public Sector Industry Lead, introduced iEnterprise, a social enterprise whose mission is to provide job opportunities for people with disabilities. iEnterprise runs an outsourced call center for a local telecom company, staffed by people with disabilities. Fiona and her peers were selected to take part in this life-changing endeavor under the guidance of Mr. Cyrus Wong, a senior lecturer at HKIIT and a Google Developer Expert for Google Cloud Platform and AI/ML (GenAI). They joined the project eagerly, motivated by a wish to help people through technology.

Mr. Owen Chong, Google Cloud Hong Kong’s Public Sector Industry Lead
Mr. Cyrus Wong, a senior lecturer at HKIIT and a Google Developer Expert for Google Cloud Platform and AI/ML (GenAI)

Inspired by the possibilities of Generative AI, specifically the image captioning capability of Gemini Pro Vision on Google Cloud Platform Vertex AI, Fiona’s team set out to assist the visually impaired. They identified the main obstacle for these individuals: the inability to access crucial image information on websites. According to data from the Hong Kong Blind Union, approximately 200,000 individuals in Hong Kong are currently living with visual impairments. Our team began developing a solution to this problem in July 2023. We consulted several HKIIT students with visual impairments, who all gave similar feedback: the majority of websites lack effective “Alt Text”, and existing screen readers fail to deliver accurate or helpful information specific to Hong Kong. Driven by empathy and a desire to improve the lives of those without access to technology and opportunities, the team committed to developing a solution.

Due to financial limitations, we can only allow a select number of focus groups to use the solution. The project is entirely open source, so NGO developers are encouraged to adapt or deploy it for free. Please acknowledge GDSC-HKIIT with a thank-you note.

Through their tenacity, the team successfully combined the open-source Chrome extension ChromeVox Classic with Gemini Pro Vision on Google Cloud Platform Vertex AI. This pioneering combination automatically describes website images that previously failed to meet W3C web accessibility standards. The team worked tirelessly to break down the barriers faced by visually impaired individuals, paving the way for a more inclusive digital environment.
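To make the idea concrete, here is a minimal sketch of the first step such a reader needs: finding the images on a page that have no usable alt text, then handing each one to a captioning service. It is not the team's actual code, which lives in a Chrome extension. The names `MissingAltFinder`, `find_images_needing_description`, and `describe_images` are hypothetical, and `caption_image` stands in for whatever call sends an image to Gemini Pro Vision. As a simplification, blank alt text is treated as missing, although a page may also use an empty alt on purpose for decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> whose alt text is missing or blank."""

    def __init__(self):
        super().__init__()
        self.needs_description = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> are routed here as well.
        if tag != "img":
            return
        attributes = dict(attrs)
        # A bare `alt` attribute has the value None, so fall back to "".
        alt = attributes.get("alt") or ""
        src = attributes.get("src")
        # Simplification: blank alt is treated the same as no alt at all.
        if src and not alt.strip():
            self.needs_description.append(src)


def find_images_needing_description(html):
    """Return image URLs, in page order, that lack usable alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.needs_description


def describe_images(html, caption_image):
    """Map each undescribed image URL to a generated description.

    `caption_image` is a hypothetical callable (image URL -> text); in a
    real reader it would wrap a request to an image captioning model.
    """
    return {src: caption_image(src)
            for src in find_images_needing_description(html)}
```

A screen reader built this way would read out the generated description in place of the missing alt text, while images that already carry a description are left untouched.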

The effectiveness of their innovation was tested in collaboration with Edward Yip, a visually impaired software engineer from iEnterprise who works with special software and a keyboard. The team began testing the application with confidence in its potential to transform lives. Edward already used ChromeVox, and because the Gemini Pro Vision enhanced version does not change how the screen reader is operated, he could use the AI-enabled reader without any additional learning. His overwhelmingly positive feedback showed that the reader can give potential users a much improved web surfing experience. After seeing the results of their relentless work to create a life-changing solution for people with special needs, the team grew more confident in the power of technology used for social good.

GDSC-HKIIT Fiona Chan and Edward Yip

We would like to express our gratitude to Dreams Come True Foundation Work For All for conducting the second field test. We are also pleased to report that the reader received highly positive feedback from all users.

https://medium.com/media/11f8cd32ac7469baceea8409586bbbc9/href

However, the team knew this was only the beginning, and the road ahead offers many opportunities for expansion and learning. The current limitation is that the reader cannot capture images on websites that require a login. For security reasons, browsers do not allow those images to be sent to an API in the background. We hope the Chrome team will consider releasing a special API that lets screen readers access images.

Thank you to Ms. Uchral Ganbaatar, a friend from The Developer Content Creators and Online Communities Summit, for reviewing and editing this article!

This project is one of the top 100 projects of the Google Developer Student Clubs (GDSC) 2024 Solution Challenge. Although it did not reach the final top 10 and so did not receive further support from Google, the team still hopes Google will add it as a built-in feature of the Chrome browser, or even ChromeOS, in the future!

The project is fully open source and free for everyone. For technical details, see “Enhancing Web Image Accessibility for Visually Impaired Individuals with Gemini Pro Vision and Google Cloud Platform”.

Background Story

https://medium.com/media/e70d7050d301198711fae21c8dc7e644/href

Keynote speech by Michael Yue, Managing Director and GM of Google Hong Kong, at the Hong Kong Google Cloud Summit 2024, featuring the project.

https://medium.com/media/cb54a70b24125b940a1bbf39585c7caa/href

Highly positive feedback, in Cantonese, from Dreams Come True Foundation volunteers who took part in the field test

https://medium.com/media/7e296c4793f4bf32e270298915ac2c7a/href

https://medium.com/media/0e5f77f5fbf9cf2af0e45e7c1b12459b/href

https://medium.com/media/0051a044f5d67ca7d687a8b67268a759/href

About the Author

Cyrus Wong is a senior lecturer at the Hong Kong Institute of Information Technology, where he focuses on teaching public cloud technologies. He is a passionate advocate for cloud technology adoption in media and events, and is an AWS Machine Learning Hero, a Microsoft MVP (Azure), and a Google Developer Expert for Google Cloud Platform and AI/ML (GenAI).


HKIIT Students use Gemini Pro Vision to develop AI Screen Reader was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.
