jacktol

4 models

whisper-large-v3-finetuned-for-ATC

This model is a fine-tuned version of OpenAI's Whisper Large v3, trained on Air Traffic Control (ATC) communication datasets. Fine-tuning significantly improves transcription accuracy on domain-specific aviation communications, achieving a Word Error Rate (WER) of 6.5% on the test set. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

- Base Model: OpenAI Whisper Large v3
- Fine-tuned Model WER: 6.5%

The model is optimized for short, distinct transmissions between pilots and air traffic controllers. It was fine-tuned using data from:

- ATC ASR Dataset

The fine-tuned model demonstrates enhanced performance in interpreting varied accents, recognizing non-standard phraseology, and processing noisy or distorted communications, making it well suited to aviation-related transcription tasks. It is designed for:

- Transcribing aviation communication: accurate transcriptions of ATC communications, including accents and variations in English phrasing.
- Air traffic control systems: real-time transcription of pilot-ATC conversations, helping improve situational awareness.
- Research and training: useful for researchers, developers, and aviation professionals studying ATC communication or developing new tools for aviation safety.

Training details:

- Hardware: two H100 SXM5 GPUs with 80 GB VRAM
- Epochs: 3.25
- Learning Rate: 1e-5
- Batch Size: 10, with no gradient accumulation
- Augmentation: offline data augmentation applied to the training set (Gaussian noise, pitch shifting, etc.)
- Evaluation Metric: Word Error Rate (WER)

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. Additionally, like most speech-to-text models, transcription accuracy can be affected by extremely poor-quality audio or by heavily accented speech that was not encountered, or was under-represented, during training.
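The evaluation metric used throughout these cards can be made concrete with a minimal word-level WER implementation. This is a sketch for illustration; real evaluations typically use an established library such as `jiwer`:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(
                prev[j] + 1,             # deletion (reference word dropped)
                cur[j - 1] + 1,          # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = cur
    return prev[-1] / len(ref)

# One substituted word out of six -> WER of 1/6
print(wer("turn left heading two one zero", "turn left heading two two zero"))
```

A 6.5% WER on this metric means roughly one word in fifteen is substituted, dropped, or inserted relative to the reference transcript.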

license:mit
451
1

Whisper Medium.En Fine Tuned For ATC

This model is deprecated. A newer, larger, and better-performing model is now available, achieving a test-set Word Error Rate (WER) of 6.5%, a significant improvement over this repository's 15.08% WER.

This model is a fine-tuned version of OpenAI's Whisper Medium EN, trained on Air Traffic Control (ATC) communication datasets. Fine-tuning significantly improves transcription accuracy on domain-specific aviation communications, reducing the WER by 84% relative to the original pretrained model. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

- Base Model: OpenAI Whisper Medium EN
- Fine-tuned Model WER: 15.08%
- Pretrained Model WER: 94.59%
- Relative Improvement: 84.06%

You can access the fine-tuned model on Hugging Face:

- Whisper Medium EN Fine-Tuned for ATC
- Whisper Medium EN Fine-Tuned for ATC (Faster Whisper)

The model is optimized for short, distinct transmissions between pilots and air traffic controllers. It was fine-tuned using data from:

- ATCO2 corpus (1-hour test subset)
- UWB-ATCC corpus

The fine-tuned model demonstrates enhanced performance in interpreting varied accents, recognizing non-standard phraseology, and processing noisy or distorted communications, making it well suited to aviation-related transcription tasks. It is designed for:

- Transcribing aviation communication: accurate transcriptions of ATC communications, including accents and variations in English phrasing.
- Air traffic control systems: real-time transcription of pilot-ATC conversations, helping improve situational awareness.
- Research and training: useful for researchers, developers, and aviation professionals studying ATC communication or developing new tools for aviation safety.
You can test the model online using the ATC Transcription Assistant, which lets you upload audio files and generate transcriptions.

Training details:

- Hardware: two A100 GPUs with 80 GB memory
- Epochs: 10
- Learning Rate: 1e-5
- Batch Size: 32 (effective batch size with gradient accumulation)
- Augmentation: dynamic data augmentation applied during training (Gaussian noise, pitch shifting, etc.)
- Evaluation Metric: Word Error Rate (WER)

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other speech domains. Additionally, like most speech-to-text models, transcription accuracy can be affected by extremely poor-quality audio or heavily accented speech not encountered during training.

- Blog Post: Fine-Tuning Whisper for ATC: 84% Improvement in Transcription Accuracy
- GitHub Repository: Fine-Tuning Whisper on ATC Data
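The "84% relative improvement" headline figure follows directly from the pretrained (94.59%) and fine-tuned (15.08%) WERs quoted above:

```python
# Relative WER improvement = (baseline - fine-tuned) / baseline
pretrained_wer = 94.59
finetuned_wer = 15.08
relative_improvement = (pretrained_wer - finetuned_wer) / pretrained_wer * 100
print(f"{relative_improvement:.2f}%")  # 84.06%
```

Note this is a relative reduction; the absolute WER drop is 79.51 percentage points.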

license:mit
347
6

atc-pilot-speaker-role-classification-model

This is a binary sequence classification model that determines whether a given air traffic communication utterance originates from a pilot or an air traffic controller (ATC), based on text alone. Traditionally, speaker role attribution in air traffic communication relies on acoustic features such as voice characteristics and channel separation. This model departs from that convention by tackling the task entirely in the text domain, using a transformer-based architecture fine-tuned for speaker role prediction.

The model performs binary classification on single-turn utterances, assigning one of two speaker roles: pilot or ATC. It is a DeBERTa-v3-large model fine-tuned on manually processed and labeled air traffic communication transcripts. The model achieves the following results on the test set:

- Accuracy: 96.64%
- Precision: 96.40%
- Recall: 96.91%
- F1 Score: 96.65%

A custom preprocessing pipeline was used to prepare the training data, including:

- Speaker attribution heuristics based on known call sign and phrase patterns
- Phrase normalization
- Text standardization
- Filtering of irrelevant utterances
- Dataset balancing

Each utterance is treated independently and labeled for speaker role classification.
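As an illustration of the text-standardization step, the sketch below shows what such normalization might look like. The exact rules used for this model are not published in this card, so the uppercase/strip-punctuation/collapse-whitespace choices here are assumptions:

```python
import re

def standardize(utterance: str) -> str:
    # Hypothetical standardization pass: uppercase the text,
    # replace punctuation with spaces, and collapse whitespace.
    text = utterance.upper()
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(standardize("Roger, cleared to land runway 27-Left."))
# ROGER CLEARED TO LAND RUNWAY 27 LEFT
```

Normalizing utterances this way keeps the classifier's input vocabulary consistent regardless of how transcripts were originally cased or punctuated.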
- Base model: `microsoft/deberta-v3-large`
- Task type: `SequenceClassification` (`num_labels=2`)
- Training setup:
  - Trained on 2x H100 80GB SXM5
  - Cosine learning rate schedule with warmup (10%)
  - Batch size: 128
  - Early stopping based on F1 score
  - Max sequence length: 256 tokens
  - Mixed-precision training (FP16)
  - Evaluation every 200 steps

Intended uses:

- Speaker role tagging in ATC communication transcripts
- Preprocessing for multi-modal ATC systems
- Filtering or structuring large corpora of aviation text for downstream tasks

Limitations:

- Operates on single-turn utterances only; no turn-level or dialogue context is used
- Ambiguous transmissions like "ROGER" or "THANK YOU" may be difficult to classify from text alone
- Additional modalities (e.g., audio features, metadata) may be required for full disambiguation

This model improves upon prior transformer-based models for text-only speaker role classification. For comparison, a related BERT-base model by Juan Zuluaga-Gomez achieved the following:

- Accuracy: 89.03%
- Precision: 87.10%
- Recall: 91.63%
- F1 Score: 89.31%

The fine-tuned DeBERTa-v3-large model presented here significantly outperforms this baseline:

- Accuracy: 96.64%
- Precision: 96.40%
- Recall: 96.91%
- F1 Score: 96.65%

Jupyter notebooks are included to reproduce and compare the evaluations:

- `evaluate_juans_model.ipynb`
- `evaluate_jacks_model.ipynb`

These evaluate both models on the same test set and print detailed classification metrics.

- Juan Zuluaga-Gomez – Hugging Face Model
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- GitHub Repository – ATC Pilot Speaker Role Classification Task
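The reported metrics are internally consistent: each F1 score above is the harmonic mean of its precision and recall, which can be checked directly:

```python
def f1(precision: float, recall: float) -> float:
    # F1 = harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(96.40, 96.91), 2))  # 96.65 (DeBERTa-v3-large model above)
print(round(f1(87.10, 91.63), 2))  # 89.31 (BERT-base baseline)
```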

license:mit
143
2

whisper-medium.en-fine-tuned-for-ATC-faster-whisper

license:mit
33
4