Kurian Benoy – Vegam Whisper Family of Models and demoing Malayalam Speech to Text

What really matters!

faster-whisper is a reimplementation of OpenAI’s Whisper model using CTranslate2, which is a fast inference engine for Transformer models.
This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
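To make "8-bit quantization" concrete, here is a minimal pure-Python sketch of symmetric (absmax) int8 quantization of a weight tensor. It is a simplified illustration of the general idea, not how CTranslate2 implements it internally. The function names and the single per-tensor scale are assumptions made for illustration.

```python
# Hypothetical illustration of symmetric (absmax) int8 weight quantization.
# Simplified: one scale for the whole tensor, pure Python lists.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus one float scale."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print(q)  # [17, -71, 47, 127, -10]
print(f"float32: {4 * len(weights)} bytes, int8: {len(q)} bytes + one scale")
print(max_err <= scale / 2)  # rounding error is at most half a step
```

Each weight now takes 1 byte instead of 4, and the round-trip error stays within half a quantization step (scale / 2), which is why accuracy usually holds up well.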

An awesome library for optimizing ML models for production.
CTranslate2 is a C++ and Python library for efficient inference with Transformer models.
The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU.
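Batch reordering is easiest to see through padding: a batch is padded to its longest sequence, so grouping sequences of similar length wastes less compute. The sketch below shows that general idea with hypothetical helper names. It is not CTranslate2's source, only an illustration under the assumption that cost is proportional to padded tokens.

```python
# Hypothetical illustration of length-based batch reordering (not CTranslate2's code).

def make_batches(seqs, batch_size):
    """Split a list of sequences into consecutive batches."""
    return [seqs[i:i + batch_size] for i in range(0, len(seqs), batch_size)]

def padded_tokens(batches):
    """Tokens processed when every batch is padded to its longest sequence."""
    return sum(len(batch) * max(len(seq) for seq in batch) for batch in batches)

def reorder_by_length(seqs):
    """Return the sorting permutation and the sequences sorted by length."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    return order, [seqs[i] for i in order]

def restore_order(outputs, order):
    """Put results computed on the reordered inputs back in the caller's order."""
    restored = [None] * len(outputs)
    for position, original_index in enumerate(order):
        restored[original_index] = outputs[position]
    return restored

# Four token sequences of lengths 2, 9, 3 and 8, processed two at a time.
seqs = [["tok"] * n for n in (2, 9, 3, 8)]
order, reordered = reorder_by_length(seqs)

print(padded_tokens(make_batches(seqs, 2)))       # 2*9 + 2*8 = 34
print(padded_tokens(make_batches(reordered, 2)))  # 2*3 + 2*9 = 24

# Stand-in for the model: the "output" of each sequence is its length.
outputs = [len(seq) for seq in reordered]
print(restore_order(outputs, order))  # [2, 9, 3, 8]
```

Sorting cuts the padded work from 34 to 24 tokens here. The permutation is kept so that results come back in the order the caller submitted them.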

CTranslate2 includes a utility for converting any Whisper-based model into a faster-whisper-compatible model:

ct2-transformers-converter \
 --model openai/whisper-tiny \
 --output_dir whisper-tiny-ct2
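The same conversion can be scripted. The helper below, build_convert_cmd, is hypothetical: it only assembles the argument list for ct2-transformers-converter and adds the converter's --quantization and --copy_files options. The set of quantization types reflects recent CTranslate2 releases and is an assumption to check against ct2-transformers-converter --help for your installed version.

```python
import shlex

# Quantization types accepted by recent CTranslate2 releases (assumed; verify
# with `ct2-transformers-converter --help` for your installed version).
QUANTIZATION_TYPES = {
    "int8", "int8_float32", "int8_float16", "int8_bfloat16",
    "int16", "float16", "bfloat16", "float32",
}

def build_convert_cmd(model, output_dir, quantization=None, copy_files=()):
    """Hypothetical helper: assemble a ct2-transformers-converter argv list."""
    cmd = ["ct2-transformers-converter", "--model", model, "--output_dir", output_dir]
    if quantization is not None:
        if quantization not in QUANTIZATION_TYPES:
            raise ValueError(f"unsupported quantization: {quantization}")
        cmd += ["--quantization", quantization]
    if copy_files:
        # Extra files (e.g. the tokenizer) copied next to the converted weights.
        cmd += ["--copy_files", *copy_files]
    return cmd

cmd = build_convert_cmd("openai/whisper-tiny", "whisper-tiny-ct2",
                        quantization="int8_float16", copy_files=["tokenizer.json"])
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building an argument list rather than a shell string avoids quoting problems when model names or paths contain unusual characters.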

CTranslate2 supports various quantization formats, such as int8, int8_float16, int16 and float16.
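As a back-of-envelope sense of what these formats mean for memory, the sketch below estimates weight storage for Whisper medium, using the 769M parameter count reported for that model. It is a rough approximation: it assumes every weight is stored at the listed width and ignores quantization scales, layers left unquantized, and runtime buffers. For the int8_* types the suffix names the type used for the remaining computation, so counting 1 byte per weight is a simplification.

```python
# Approximate bytes per stored weight for each CTranslate2 quantization type.
# int8_* variants are counted as 1 byte per weight (a simplification).
BYTES_PER_WEIGHT = {
    "float32": 4, "float16": 2, "bfloat16": 2, "int16": 2,
    "int8": 1, "int8_float16": 1, "int8_float32": 1, "int8_bfloat16": 1,
}

def weight_storage_gb(num_params, quantization):
    """Rough weight-storage estimate in GB (10**9 bytes); ignores scales,
    unquantized layers and runtime buffers."""
    return num_params * BYTES_PER_WEIGHT[quantization] / 1e9

WHISPER_MEDIUM_PARAMS = 769_000_000  # parameter count reported for Whisper medium

for qtype in ("float32", "float16", "int8"):
    print(f"{qtype:>8}: ~{weight_storage_gb(WHISPER_MEDIUM_PARAMS, qtype):.2f} GB")
```

Going from float32 to int8 shrinks the weights roughly fourfold (about 3.08 GB down to about 0.77 GB), which is what makes medium-sized models practical on modest hardware.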

I took thennal/whisper-medium-ml and converted it into faster-whisper-compatible models for Malayalam:

Vegam Whisper models hosted on Hugging Face
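A minimal sketch of transcribing Malayalam audio with one of these converted models through faster-whisper's WhisperModel. It assumes faster-whisper is installed and that the Hub repository id kurianbenoy/vegam-whisper-medium-ml (an assumption here, inferred from the model and Space names) points at the converted medium model; a local directory produced by the converter works the same way. The function name transcribe_malayalam is hypothetical.

```python
def transcribe_malayalam(audio_path, model_id="kurianbenoy/vegam-whisper-medium-ml",
                         device="cpu", compute_type="int8"):
    """Hypothetical helper: transcribe Malayalam audio with a CTranslate2 Whisper model.

    Assumes `pip install faster-whisper`; `model_id` (an assumed Hub repo id)
    may also be a local directory created by ct2-transformers-converter.
    """
    # Imported lazily so this module loads even where faster-whisper is absent.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_id, device=device, compute_type=compute_type)
    # transcribe() returns a lazy generator of segments plus language info;
    # "ml" is Whisper's language code for Malayalam.
    segments, _info = model.transcribe(audio_path, language="ml", beam_size=5)
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

For example, transcribe_malayalam("clip.mp3") would return a list of (start, end, text) tuples. Here compute_type="int8" applies 8-bit quantization when the model is loaded on CPU; on a GPU, device="cuda" with compute_type="float16" is a common alternative.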

Oru Thai Nadam, sung by Venugopal and Sreya, lyrics by Sugathakumari

Output of clip from Video 1

Sung by Sithara Krishna Kumar, lyrics by BK Hari Narayanan. This song was created spontaneously at MBIFL 2023.

Output of clip from Video 2

Pallakku is a Malayalam speech-to-text demo built on the model weights of vegam-whisper-medium-ml.
Two ways to try it out:

https://huggingface.co/spaces/kurianbenoy/Pallakku