In developing an application for sign language detection, specifically for Alphabet Sign Language and Indonesian Sign Language (BISINDO), machine learning can help create a model that automatically recognizes hand gestures from images or videos. For this implementation, we will use Google Colab as a computing platform that supports data processing and model training in the cloud.
The first part of this project involves building a model for Alphabet Sign Language Detection. We will construct a machine-learning model to recognize alphabetic characters in sign language using hand images as input. This model will use a Convolutional Neural Network (CNN) to process the images and generate accurate predictions.
Building the Alphabet Sign Language Model
To start building this model, we must import several libraries (modules) that handle tasks such as file management, data processing, model training, and evaluation. Below is a brief explanation of the libraries we will use:
- os: This library manages operating system operations such as navigating directories, creating folders, or manipulating files.
- numpy: Numpy provides functions for efficient array manipulation and mathematical operations. It will help process image data before feeding it into the model.
- pandas: This library handles structured data, such as reading CSV files or displaying data in tabular form.
- zipfile: Useful for handling ZIP files, which are often used for compressing or extracting datasets required for training.
- shutil: Used to manage files and folders such as copying, moving, or deleting unnecessary files.
- tensorflow: This robust machine learning framework is used to build, train, and evaluate neural network models. TensorFlow provides a wide range of tools to support the creation of deep learning models.
- ImageDataGenerator (tensorflow.keras.preprocessing.image): This utility supports image data augmentation, increasing dataset variability with transformations such as rotation, zooming, shifting, or flipping. This is important for improving the model’s ability to recognize different variations of input images.
- gdown: This library downloads files from Google Drive using Python. It is often used to obtain datasets or additional files needed during training.
With this combination of libraries, we will manage the dataset, preprocess the images, and train the Alphabet Sign Language detection model in Google Colab.
Alphabet Model
First, we import the libraries described above:
import os
import numpy as np
import pandas as pd
import zipfile
import shutil
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import gdown
After importing the libraries, we need to download the dataset. The alphabet dataset used for this model is available at https://universe.roboflow.com/ikado/bisindo-revisi/dataset/2. We uploaded a copy to Google Drive so that it can be fetched directly into Google Colab with gdown.
file_url = 'https://drive.google.com/uc?id=15VGqn4FHAf-XjTayb7iEvIZ9i52xYUvp'
output_zip_path = "/content/gerakan_bisindo.zip"
gdown.download(file_url, output_zip_path, quiet=False)

extracted_path = "/content/dataImages"
os.makedirs(extracted_path, exist_ok=True)
with zipfile.ZipFile(output_zip_path, 'r') as zip_ref:
    zip_ref.extractall(extracted_path)
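Before cropping the hands, it is worth confirming that the extraction produced the expected structure. A minimal sketch, assuming the ZIP contains train, valid, and test splits with one subfolder per letter:

for split in ["train", "valid", "test"]:
    split_dir = os.path.join(extracted_path, split)
    n_classes = len(os.listdir(split_dir))
    n_images = sum(len(files) for _, _, files in os.walk(split_dir))
    print(f"{split}: {n_classes} classes, {n_images} images")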
After the images are extracted into the folder, we will create new training, validation, and testing directories containing the hand images that represent the BISINDO alphabet, using Mediapipe (together with OpenCV) to detect and crop the hand region so that training can focus on the gesture itself.
import cv2
import mediapipe as mp

# Initialize the Mediapipe hand model
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2, min_detection_confidence=0.2)
mp_drawing = mp.solutions.drawing_utils
# Old and new directories
train_dir = "/content/dataImages/train"
valid_dir = "/content/dataImages/valid"
test_dir = "/content/dataImages/test"
new_train_dir = "/content/dataImagesNew/train"
new_valid_dir = "/content/dataImagesNew/valid"
new_test_dir = "/content/dataImagesNew/test"

# Padding for the bounding box (e.g., 40 pixels)
padding = 40
def detect_single_hand(img):
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)
    if results.multi_hand_landmarks:
        h, w, _ = img.shape
        x_min_total = w
        y_min_total = h
        x_max_total = 0
        y_max_total = 0
        # Take the union of the bounding boxes of all detected hands
        for hand_landmarks in results.multi_hand_landmarks:
            x_max = int(max([lm.x for lm in hand_landmarks.landmark]) * w)
            x_min = int(min([lm.x for lm in hand_landmarks.landmark]) * w)
            y_max = int(max([lm.y for lm in hand_landmarks.landmark]) * h)
            y_min = int(min([lm.y for lm in hand_landmarks.landmark]) * h)
            x_min_total = min(x_min_total, x_min)
            y_min_total = min(y_min_total, y_min)
            x_max_total = max(x_max_total, x_max)
            y_max_total = max(y_max_total, y_max)
        # Expand the box by the padding, clamped to the image bounds
        x_min_total = max(0, x_min_total - padding)
        y_min_total = max(0, y_min_total - padding)
        x_max_total = min(w, x_max_total + padding)
        y_max_total = min(h, y_max_total + padding)
        return [(x_min_total, y_min_total, x_max_total, y_max_total)]
    return []
# Function to detect hands and save cropped images
def process_images(input_dir, output_dir):
    # Iterate over each subfolder and image file in the input directory
    for root, dirs, files in os.walk(input_dir):
        # Compute the relative path to maintain folder structure
        relative_path = os.path.relpath(root, input_dir)
        output_subdir = os.path.join(output_dir, relative_path)
        # Create subfolders in the output directory according to the input folder structure
        if not os.path.exists(output_subdir):
            os.makedirs(output_subdir)
        # Process each image file
        for img_file in files:
            img_path = os.path.join(root, img_file)
            img = cv2.imread(img_path)
            if img is None:
                continue
            hand_boxes = detect_single_hand(img)
            if hand_boxes:
                for (x_min, y_min, x_max, y_max) in hand_boxes:
                    # Check that the bounding box is valid
                    if x_max > x_min and y_max > y_min:
                        # Crop the image based on the padded bounding box
                        hand_img = img[y_min:y_max, x_min:x_max]
                        # Check that the cropped image is not empty
                        if hand_img.size > 0:
                            # Save the processed image, maintaining the subfolder structure
                            output_img_path = os.path.join(output_subdir, img_file)
                            cv2.imwrite(output_img_path, hand_img)

# Process images in each main folder (train, valid, test)
process_images(train_dir, new_train_dir)
process_images(valid_dir, new_valid_dir)
process_images(test_dir, new_test_dir)

# Release the Mediapipe model after completion
hands.close()
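Mediapipe occasionally fails to detect a hand, and those images are silently skipped, so the new directories can contain fewer images than the originals. A quick check with a small helper (count_images is introduced here for illustration):

def count_images(directory):
    # Count all files beneath a directory tree
    return sum(len(files) for _, _, files in os.walk(directory))

for name, old_dir, new_dir in [("train", train_dir, new_train_dir),
                               ("valid", valid_dir, new_valid_dir),
                               ("test", test_dir, new_test_dir)]:
    print(f"{name}: {count_images(old_dir)} -> {count_images(new_dir)} images after cropping")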
After the images have been separated into the new training, validation, and testing directories, we preprocess them with ImageDataGenerator.
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    new_train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'  # integer labels, matching sparse_categorical_crossentropy
)
validation_generator = test_datagen.flow_from_directory(
    new_valid_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)
test_generator = test_datagen.flow_from_directory(
    new_test_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)
The following is the result of the splitting and preprocessing:
- Found 6131 images belonging to 26 classes.
- Found 1313 images belonging to 26 classes.
- Found 1348 images belonging to 26 classes.
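flow_from_directory assigns class indices alphabetically by folder name, so A maps to 0, B to 1, and so on. We can confirm the mapping directly:

# Confirm the label mapping produced by flow_from_directory
print(train_generator.class_indices)
print("Number of classes:", len(train_generator.class_indices))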
Next, we will train the model using transfer learning, with InceptionV3 as the base model and an output layer covering the 26 BISINDO alphabet classes. We will then fine-tune the result and finally save the model for download.
from tensorflow.keras.applications import InceptionV3

# Load the base model with ImageNet weights, without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base model initially
for layer in base_model.layers:
    layer.trainable = False

# Add custom classification layers on top
model = tf.keras.models.Sequential([
    base_model,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax'),
])

# Compile the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])

# Train the model before fine-tuning
history = model.fit(train_generator, epochs=10, validation_data=validation_generator)

# Fine-tuning: unfreeze the last 50 layers of the base model
for layer in base_model.layers[-50:]:
    layer.trainable = True

# Recompile with a lower learning rate for fine-tuning
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])

# Fine-tune the model
fine_tune_history = model.fit(train_generator, epochs=10, validation_data=validation_generator)
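# Optionally, evaluate on the held-out test set before saving
# (test_generator was defined above but is otherwise unused)
test_loss, test_acc = model.evaluate(test_generator)
print(f"Test accuracy: {test_acc:.4f}")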
# Save the model so it can be downloaded from Colab
model.save("modelAlphabet.h5")
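To verify the saved model end to end, here is a minimal inference sketch; the image path sample_hand.jpg is hypothetical, and the index-to-letter mapping built from train_generator.class_indices is an assumption for illustration:

from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

loaded_model = load_model("modelAlphabet.h5")

# Invert the generator's class mapping to go from index to letter
index_to_label = {v: k for k, v in train_generator.class_indices.items()}

# "sample_hand.jpg" is a hypothetical cropped hand image
img = image.load_img("sample_hand.jpg", target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)  # rescale and add batch dimension

probs = loaded_model.predict(x)[0]
print("Predicted letter:", index_to_label[int(np.argmax(probs))])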