AshiniR



hate-speech-and-offensive-message-classifier

A hate speech and offensive message classifier built on the RoBERTa transformer model and fine-tuned on the Davidson et al. (2017) Twitter dataset. On a held-out test set it reaches an F1-score of 0.9774 for hate speech and offensive message detection and 96.23% overall accuracy, which makes it a candidate for social media moderation, community platforms, and chat applications.

Key features:

- 🤖 Transformer-based architecture: built on `roberta-base` for natural language understanding
- ⚡ High performance: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
- 🔧 Hyperparameter optimization: automated tuning with the Optuna framework
- ⚖️ Class imbalance handling: weighted cross-entropy loss for fairness across labels
- 📊 Comprehensive evaluation: precision, recall, F1-score, confusion matrix
- 🚀 Production ready: model and tokenizer saved in Hugging Face format for direct deployment

Performance on the test set:

- Overall accuracy: 96.23%
- Weighted F1-score: 0.9621
- Offensive/Hate F1-score: 0.9774 ✅ (exceeds the 0.90 acceptance threshold)
- Offensive/Hate precision: 97.49%
- Offensive/Hate recall: 98.00% (high hate/offensive message detection rate)
- Neither precision: 89.82%
- Neither recall: 87.52%

Generalizability:

📊 All performance metrics are evaluated on a test set (15% of the data, 3,718 messages) that was never used during training or hyperparameter tuning, so they estimate performance on unseen messages from the same distribution and are not inflated by overfitting to the training data.

Dataset:

- Source: Hate Speech and Offensive Language Dataset (Davidson et al., 2017)
- Total tweets: 24,783
- Hate speech / offensive: 20,620
- Neutral: 4,163
- Average tweet length: ~86 characters
- Language: English

Dataset split:

- Training set: 70% (17,348 tweets), used for model training
- Validation set: 15% (3,717 tweets), used for hyperparameter tuning
- Test set: 15% (3,718 tweets), used for final evaluation on unseen data

Preprocessing steps:

1. Label mapping: 0 = Neither, 1 = Hate/Offensive
2. Text cleaning
3. Train/validation/test split
4. Tokenization with the RoBERTa tokenizer
5. Dynamic padding and truncation

Model architecture:

- Base model: `FacebookAI/roberta-base` (Hugging Face Transformers)
- Task: binary sequence classification (2 labels)
- Fine-tuning: custom classification head with 2 outputs
- Tokenization: RoBERTa tokenizer with optimal sequence length

Training pipeline:

1. Data preprocessing: hate/offensive message cleaning and label encoding
2. Tokenization: dynamic padding with optimal max length
3. Class balancing: weighted loss function to handle the imbalanced dataset
4. Hyperparameter optimization: Optuna-based automated tuning
5. Evaluation: comprehensive metrics on the held-out test set

Hyperparameter search space:

- Dropout rates: hidden dropout (0.1 to 0.3), attention dropout (0.1 to 0.2)
- Learning rate: 1e-5 to 5e-5
- Weight decay: 0.0 to 0.1
- Batch size: 8, 16, or 32 samples
- Gradient accumulation steps: 1 to 4
- Training epochs: 2 to 5
- Warmup ratio: 0.05 to 0.1 for learning rate scheduling

Best hyperparameters:

- Hidden dropout: `0.13034059066330464`
- Attention dropout: `0.1935379847495239`
- Learning rate: `1.031409901695853e-05`
- Weight decay: `0.03606621145317628`
- Batch size: `16`
- Gradient accumulation: `1`
- Epochs: `2`
- Warmup ratio: `0.0718442228846798`

Confusion matrix (test set):

|                       | Predicted Neither | Predicted Offensive/Hate |
|-----------------------|-------------------|--------------------------|
| Actual Neither        | 547               | 78                       |
| Actual Offensive/Hate | 62                | 3031                     |

- True positives (hate/offensive correctly identified): 3031
- True negatives (neutral correctly identified): 547
- False positives (neutral incorrectly flagged): 78
- False negatives (hate/offensive missed): 62

Use cases:

This hate/offensive message classifier is intended for:

- Messaging platforms
  - Discord bot moderation (primary use case)
  - SMS filtering systems
  - Chat application content filtering
- Content moderation
  - Social media platforms
  - Comment section filtering
  - User-generated content screening

If you use this model in your research or application, please cite:
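As a sanity check on the reported test-set figures, the accuracy, precision, recall, and F1 values can be re-derived from the four confusion-matrix counts alone. The sketch below is illustrative: the helper name `prf` is hypothetical, and the weighted F1 is assumed to be the support-weighted average of the two per-class F1 scores.

```python
# Confusion-matrix counts copied from the model card (test set, 3,718 messages).
TN, FP = 547, 78    # actual Neither: predicted Neither / predicted Offensive-Hate
FN, TP = 62, 3031   # actual Offensive-Hate: predicted Neither / predicted Offensive-Hate


def prf(tp, fp, fn):
    """Return (precision, recall, F1) for one class from its raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


total = TN + FP + FN + TP
accuracy = (TP + TN) / total

# Positive class = Hate/Offensive; for "Neither" the off-diagonal cells swap roles.
hate_p, hate_r, hate_f1 = prf(TP, FP, FN)
neither_p, neither_r, neither_f1 = prf(TN, FN, FP)

# Assumption: weighted F1 averages per-class F1 by class support (true examples).
weighted_f1 = ((TN + FP) * neither_f1 + (TP + FN) * hate_f1) / total

print(f"accuracy={accuracy:.4f} weighted_f1={weighted_f1:.4f}")
print(f"hate/offensive: P={hate_p:.4f} R={hate_r:.4f} F1={hate_f1:.4f}")
print(f"neither:        P={neither_p:.4f} R={neither_r:.4f} F1={neither_f1:.4f}")
```

The gap between the two classes (recall of about 0.98 for Hate/Offensive versus about 0.88 for Neither) is what the single accuracy figure hides, which is why the per-class numbers are worth reporting alongside it.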
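The 70/15/15 split sizes quoted in the dataset section follow from truncating the 70% and 15% cut points and giving the remainder to the test set. The following is a minimal sketch of such a split over row indices; the shuffle-and-cut procedure and the seed value are assumptions, since the model card does not say how the split was produced (it may, for example, have been stratified by label).

```python
import random

TOTAL = 24_783  # total tweets, from the model card

# Cut points for a 70/15/15 split; the test set takes whatever remains.
n_train = int(0.70 * TOTAL)
n_val = int(0.15 * TOTAL)
n_test = TOTAL - n_train - n_val

indices = list(range(TOTAL))
random.Random(42).shuffle(indices)  # hypothetical seed, fixed for reproducibility

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the test indices disjoint from both other subsets is what allows the test metrics to be read as an estimate on unseen data.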
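The model card states that a weighted cross-entropy loss is used for class balancing but does not say how the weights were chosen. The sketch below assumes the common "balanced" inverse-frequency heuristic, `n_samples / (n_classes * count)`, and computes it from the full-dataset class counts; the function name `weighted_cross_entropy` is hypothetical and the loss is written out for a single example in plain Python to show the effect.

```python
import math

# Class counts for the full dataset, copied from the model card.
# Assumption: the training split has roughly the same class ratio.
counts = {0: 4163, 1: 20620}   # 0 = Neither, 1 = Hate/Offensive

n_samples = sum(counts.values())
n_classes = len(counts)

# Assumed "balanced" heuristic: weight_c = n_samples / (n_classes * count_c).
class_weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(logits, label, weights):
    """Per-example loss: -weights[label] * log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob = logits[label] - log_sum_exp
    return -weights[label] * log_prob


# The same uncertain prediction (equal logits, p = 0.5) costs more
# when the true label is the minority class.
loss_neither = weighted_cross_entropy([0.0, 0.0], 0, class_weights)
loss_hate = weighted_cross_entropy([0.0, 0.0], 1, class_weights)
print(class_weights, loss_neither, loss_hate)
```

With these counts the minority class (Neither) receives a weight of roughly 3.0 and the majority class roughly 0.6. In a PyTorch training loop, weights like these would typically be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.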

license:apache-2.0
