In developing an application for sign language detection, specifically for Alphabet Sign Language and Indonesian Sign Language (BISINDO), machine learning can help create a model that automatically recognizes hand gestures from images or videos. For this implementation, we will use Google Colab as a computing platform that supports data processing and model training in the cloud.
The first part of this project involves building a model for Alphabet Sign Language Detection. We will construct a machine-learning model to recognize alphabetic characters in sign language using hand images as input. This model will use a Convolutional Neural Network (CNN) to process the images and generate accurate predictions.
Building the Alphabet Sign Language Model
To start building this model, we must import several libraries (modules) that handle tasks such as file management, data processing, model training, and evaluation. Below is a brief explanation of the libraries we will use:
- os: This library manages operating system operations such as navigating directories, creating folders, or manipulating files.
- numpy: Numpy provides functions for efficient array manipulation and mathematical operations. It will help process image data before feeding it into the model.
- pandas: This library handles structured data, such as reading CSV files or displaying data in tabular form.
- zipfile: Useful for handling ZIP files, which are often used for compressing or extracting datasets required for training.
- shutil: Used to manage files and folders such as copying, moving, or deleting unnecessary files.
- tensorflow: This robust machine learning framework is used to build, train, and evaluate neural network models. TensorFlow provides a wide range of tools to support the creation of deep learning models.
- ImageDataGenerator (tensorflow.keras.preprocessing.image): This utility supports image data augmentation, increasing dataset variability with transformations such as rotation, zooming, shifting, or flipping. This is important for improving the model’s ability to recognize different variations of input images.
- gdown: This library downloads files from Google Drive using Python. It is often used to obtain datasets or additional files needed during training.
With this combination of libraries, we will manage the dataset, preprocess the images, and train the Alphabet Sign Language detection model in Google Colab.
Alphabet Model
First, we import the libraries described above:
import os
import numpy as np
import pandas as pd
import zipfile
import shutil
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import gdown
After importing the libraries, we need to download the dataset. The alphabet dataset used for this model is available at https://universe.roboflow.com/ikado/bisindo-revisi/dataset/2. We uploaded a copy to Google Drive so that it can be fetched directly into Google Colab with gdown.
file_url = 'https://drive.google.com/uc?id=15VGqn4FHAf-XjTayb7iEvIZ9i52xYUvp'
output_zip_path = "/content/gerakan_bisindo.zip"
gdown.download(file_url, output_zip_path, quiet=False)

extracted_path = "/content/dataImages"
os.makedirs(extracted_path, exist_ok=True)
with zipfile.ZipFile(output_zip_path, 'r') as zip_ref:
    zip_ref.extractall(extracted_path)
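Before cropping the hands, it is worth confirming that the extraction produced the expected structure. A minimal sketch, assuming the ZIP contains train, valid, and test splits with one subfolder per letter:

for split in ["train", "valid", "test"]:
    split_dir = os.path.join(extracted_path, split)
    n_classes = len(os.listdir(split_dir))
    n_images = sum(len(files) for _, _, files in os.walk(split_dir))
    print(f"{split}: {n_classes} classes, {n_images} images")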
After the images are extracted into the folder, we will create new training, validation, and testing directories containing the hand images that represent the BISINDO alphabet, using Mediapipe (together with OpenCV) to detect and crop the hand region so that training can focus on the gesture itself.
import cv2
import mediapipe as mp

# Initialize the Mediapipe hand model
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2, min_detection_confidence=0.2)
mp_drawing = mp.solutions.drawing_utils
# Old and new directories
train_dir = "/content/dataImages/train"
valid_dir = "/content/dataImages/valid"
test_dir = "/content/dataImages/test"
new_train_dir = "/content/dataImagesNew/train"
new_valid_dir = "/content/dataImagesNew/valid"
new_test_dir = "/content/dataImagesNew/test"

# Padding for the bounding box (e.g., 40 pixels)
padding = 40
def detect_single_hand(img):
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(img_rgb)
    if results.multi_hand_landmarks:
        h, w, _ = img.shape
        x_min_total = w
        y_min_total = h
        x_max_total = 0
        y_max_total = 0
        # Take the union of the bounding boxes of all detected hands
        for hand_landmarks in results.multi_hand_landmarks:
            x_max = int(max([lm.x for lm in hand_landmarks.landmark]) * w)
            x_min = int(min([lm.x for lm in hand_landmarks.landmark]) * w)
            y_max = int(max([lm.y for lm in hand_landmarks.landmark]) * h)
            y_min = int(min([lm.y for lm in hand_landmarks.landmark]) * h)
            x_min_total = min(x_min_total, x_min)
            y_min_total = min(y_min_total, y_min)
            x_max_total = max(x_max_total, x_max)
            y_max_total = max(y_max_total, y_max)
        # Expand the box by the padding, clamped to the image bounds
        x_min_total = max(0, x_min_total - padding)
        y_min_total = max(0, y_min_total - padding)
        x_max_total = min(w, x_max_total + padding)
        y_max_total = min(h, y_max_total + padding)
        return [(x_min_total, y_min_total, x_max_total, y_max_total)]
    return []
# Function to detect hands and save cropped images
def process_images(input_dir, output_dir):
    # Iterate over each subfolder and image file in the input directory
    for root, dirs, files in os.walk(input_dir):
        # Compute the relative path to maintain folder structure
        relative_path = os.path.relpath(root, input_dir)
        output_subdir = os.path.join(output_dir, relative_path)
        # Create subfolders in the output directory according to the input folder structure
        if not os.path.exists(output_subdir):
            os.makedirs(output_subdir)
        # Process each image file
        for img_file in files:
            img_path = os.path.join(root, img_file)
            img = cv2.imread(img_path)
            if img is None:
                continue
            hand_boxes = detect_single_hand(img)
            if hand_boxes:
                for (x_min, y_min, x_max, y_max) in hand_boxes:
                    # Check that the bounding box is valid
                    if x_max > x_min and y_max > y_min:
                        # Crop the image based on the padded bounding box
                        hand_img = img[y_min:y_max, x_min:x_max]
                        # Check that the cropped image is not empty
                        if hand_img.size > 0:
                            # Save the processed image, maintaining the subfolder structure
                            output_img_path = os.path.join(output_subdir, img_file)
                            cv2.imwrite(output_img_path, hand_img)

# Process images in each main folder (train, valid, test)
process_images(train_dir, new_train_dir)
process_images(valid_dir, new_valid_dir)
process_images(test_dir, new_test_dir)

# Release the Mediapipe model after completion
hands.close()
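Mediapipe occasionally fails to detect a hand, and those images are silently skipped, so the new directories can contain fewer images than the originals. A quick check with a small helper (count_images is introduced here for illustration):

def count_images(directory):
    # Count all files beneath a directory tree
    return sum(len(files) for _, _, files in os.walk(directory))

for name, old_dir, new_dir in [("train", train_dir, new_train_dir),
                               ("valid", valid_dir, new_valid_dir),
                               ("test", test_dir, new_test_dir)]:
    print(f"{name}: {count_images(old_dir)} -> {count_images(new_dir)} images after cropping")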
After the images have been separated into the new training, validation, and testing directories, we preprocess them with ImageDataGenerator.
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    new_train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'  # integer labels, matching sparse_categorical_crossentropy
)
validation_generator = test_datagen.flow_from_directory(
    new_valid_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)
test_generator = test_datagen.flow_from_directory(
    new_test_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)
The following is the result of the splitting and preprocessing:
- Found 6131 images belonging to 26 classes.
- Found 1313 images belonging to 26 classes.
- Found 1348 images belonging to 26 classes.
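flow_from_directory assigns class indices alphabetically by folder name, so A maps to 0, B to 1, and so on. We can confirm the mapping directly:

# Confirm the label mapping produced by flow_from_directory
print(train_generator.class_indices)
print("Number of classes:", len(train_generator.class_indices))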
Next, we will train the model using transfer learning, with InceptionV3 as the base model and an output layer covering the 26 BISINDO alphabet classes. We will then fine-tune the result and finally save the model for download.
from tensorflow.keras.applications import InceptionV3

# Load the base model with ImageNet weights, without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base model initially
for layer in base_model.layers:
    layer.trainable = False

# Add custom classification layers on top
model = tf.keras.models.Sequential([
    base_model,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(26, activation='softmax'),
])

# Compile the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])

# Train the model before fine-tuning
history = model.fit(train_generator, epochs=10, validation_data=validation_generator)

# Fine-tuning: unfreeze the last 50 layers of the base model
for layer in base_model.layers[-50:]:
    layer.trainable = True

# Recompile with a lower learning rate for fine-tuning
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])

# Fine-tune the model
fine_tune_history = model.fit(train_generator, epochs=10, validation_data=validation_generator)
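# Optionally, evaluate on the held-out test set before saving
# (test_generator was defined above but is otherwise unused)
test_loss, test_acc = model.evaluate(test_generator)
print(f"Test accuracy: {test_acc:.4f}")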
# Save the model so it can be downloaded from Colab
model.save("modelAlphabet.h5")
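To verify the saved model end to end, here is a minimal inference sketch; the image path sample_hand.jpg is hypothetical, and the index-to-letter mapping built from train_generator.class_indices is an assumption for illustration:

from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

loaded_model = load_model("modelAlphabet.h5")

# Invert the generator's class mapping to go from index to letter
index_to_label = {v: k for k, v in train_generator.class_indices.items()}

# "sample_hand.jpg" is a hypothetical cropped hand image
img = image.load_img("sample_hand.jpg", target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)  # rescale and add batch dimension

probs = loaded_model.predict(x)[0]
print("Predicted letter:", index_to_label[int(np.argmax(probs))])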