hate-speech-and-offensive-message-classifier
A hate speech and offensive message classifier built on the RoBERTa transformer model and fine-tuned on the Davidson et al. (2017) Twitter dataset. On a held-out test set it reaches a 0.9774 F1-score for hate speech and offensive message detection and 96.23% overall accuracy, which makes it a candidate for social media moderation, community platforms, and chat applications.
- Transformer-based Architecture: Built on `roberta-base` for advanced natural language understanding
- High Performance: 0.9774 F1-score for hate/offensive message detection, 96.23% overall accuracy
- Hyperparameter Optimization: Automated tuning using the Optuna framework
- Class Imbalance Handling: Weighted cross-entropy loss for fairness across labels
- Comprehensive Evaluation: Precision, Recall, F1-score, confusion matrix
- Production Ready: Model + tokenizer saved in Hugging Face format for direct deployment
- Overall Accuracy: 96.23%
- Weighted F1-Score: 0.9621
- Offensive/Hate F1-Score: 0.9774 (exceeds the 0.90 acceptance threshold)
- Offensive/Hate Precision: 97.49%
- Offensive/Hate Recall: 98.00% (high hate/offensive message detection rate)
- Neither Precision: 89.82%
- Neither Recall: 87.52%
Generalizability

All performance metrics are evaluated on a held-out test set (15% of the data, 3,718 messages) that was never used during training or hyperparameter tuning. The reported figures therefore estimate performance on unseen data and are not inflated by fitting to the training or validation sets.
Source: Hate Speech and Offensive Language Dataset (Davidson et al., 2017)
- Total Tweets: 24,783
- Hate Speech / Offensive: 20,620
- Neutral: 4,163
- Average Tweet Length: ~86 characters
- Language: English
Dataset Split:
- Training Set: 70% (17,348 tweets), used for model training
- Validation Set: 15% (3,717 tweets), used for hyperparameter tuning
- Test Set: 15% (3,718 tweets), used for final evaluation on unseen data
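The split sizes above can be reproduced with a short index-based sketch. The shuffling seed, the rounding rule, and the lack of stratification are assumptions made for illustration; the source does not state how the split was actually drawn.

```python
import random


def split_indices(n_items, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle item indices and cut them into train/validation/test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed value is an arbitrary assumption
    n_train = int(n_items * train_frac)   # rounded down
    n_val = int(n_items * val_frac)       # rounded down; test takes the remainder
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(24_783)
print(len(train_idx), len(val_idx), len(test_idx))  # 17348 3717 3718
```

With a class ratio of roughly 5:1, a stratified split would keep the label proportions equal across the three partitions; whether the original pipeline did so is not stated.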
Preprocessing Steps:
- Label mapping: 0 = Neither, 1 = Hate/Offensive
- Text cleaning
- Train/validation/test split
- Tokenization with the RoBERTa tokenizer
- Dynamic padding and truncation
- Base Model: `FacebookAI/roberta-base` (Hugging Face Transformers)
- Task: Binary sequence classification (2 labels)
- Fine-tuning: Custom classification head with 2 outputs
- Tokenization: RoBERTa tokenizer with optimal sequence length
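A minimal sketch of how a two-label head could be attached to the base checkpoint with Hugging Face Transformers. The function name `load_classifier` and its default dropout values are hypothetical; the label names follow the mapping given in the preprocessing steps, and the author's actual loading code is not shown in the source.

```python
# Label mapping as described in the preprocessing steps.
ID2LABEL = {0: "Neither", 1: "Hate/Offensive"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def load_classifier(checkpoint="FacebookAI/roberta-base",
                    hidden_dropout=0.1, attention_dropout=0.1):
    """Load the tokenizer and a RoBERTa model with a 2-output classification head.

    Hypothetical helper: the dropout defaults are placeholders, not the tuned values.
    """
    # Imported lazily so the label mapping can be inspected without the library.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
        # Extra keyword arguments that match config attributes override them.
        hidden_dropout_prob=hidden_dropout,
        attention_probs_dropout_prob=attention_dropout,
    )
    return tokenizer, model
```

Calling `load_classifier()` downloads the base weights and initializes the classification head randomly, so the returned model still needs fine-tuning before its predictions are meaningful.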
1. Data Preprocessing: Hate/offensive message cleaning and label encoding
2. Tokenization: Dynamic padding with optimal max length
3. Class Balancing: Weighted loss function to handle the imbalanced dataset
4. Hyperparameter Optimization: Optuna-based automated tuning
5. Evaluation: Comprehensive metrics on the held-out test set
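The class-balancing step can be illustrated in plain Python. This sketch assumes weights inversely proportional to class frequency (the common "balanced" heuristic) and uses the full-dataset counts; the source does not say which weighting formula or which subset was used. The reduction shown, a weighted mean, is the one `torch.nn.CrossEntropyLoss` applies when a `weight` tensor is supplied.

```python
import math


def balanced_class_weights(class_counts):
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}


def weighted_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of per-example cross-entropy over a batch of logit lists."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(logits_batch, labels):
        # Numerically stable log-sum-exp for the softmax normalizer.
        max_logit = max(logits)
        log_norm = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        nll = log_norm - logits[label]
        weighted_sum += weights[label] * nll
        weight_total += weights[label]
    return weighted_sum / weight_total


# 0 = Neither (4,163 tweets), 1 = Hate/Offensive (20,620 tweets)
weights = balanced_class_weights({0: 4163, 1: 20620})
print(weights)  # roughly 2.98 for Neither and 0.60 for Hate/Offensive
```

Under this scheme a misclassified Neither tweet contributes about five times as much to the loss as a misclassified Hate/Offensive tweet, which counteracts the 5:1 class ratio.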
- Dropout rates: Hidden dropout (0.1 to 0.3), attention dropout (0.1 to 0.2)
- Learning rate: 1e-5 to 5e-5
- Weight decay: 0.0 to 0.1
- Batch size: 8, 16, or 32 samples
- Gradient accumulation steps: 1 to 4
- Training epochs: 2 to 5
- Warmup ratio: 0.05 to 0.1 for learning rate scheduling
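The search space above could be expressed with Optuna's `Trial.suggest_*` methods roughly as follows. The parameter names mirror RoBERTa config and `TrainingArguments` fields, the log-scale sampling of the learning rate is an assumption, and `train_and_evaluate` is a hypothetical callback that trains a model and returns the validation score; none of this is confirmed by the source.

```python
def suggest_hyperparameters(trial):
    """Sample one configuration from the ranges listed above."""
    return {
        "hidden_dropout_prob": trial.suggest_float("hidden_dropout_prob", 0.1, 0.3),
        "attention_probs_dropout_prob": trial.suggest_float(
            "attention_probs_dropout_prob", 0.1, 0.2),
        # Log-scale sampling is an assumption; the source only gives the range.
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]),
        "gradient_accumulation_steps": trial.suggest_int(
            "gradient_accumulation_steps", 1, 4),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 5),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.05, 0.1),
    }


def run_study(train_and_evaluate, n_trials=20):
    """Maximize the score returned by the hypothetical `train_and_evaluate` callback."""
    import optuna  # imported lazily; only needed when a study is actually run

    study = optuna.create_study(direction="maximize")
    study.optimize(
        lambda trial: train_and_evaluate(suggest_hyperparameters(trial)),
        n_trials=n_trials,
    )
    return study.best_params
```

The number of trials (`n_trials=20`) is a placeholder; the source does not report how many trials were run.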
- Hidden Dropout: `0.13034059066330464`
- Attention Dropout: `0.1935379847495239`
- Learning Rate: `1.031409901695853e-05`
- Weight Decay: `0.03606621145317628`
- Batch Size: `16`
- Gradient Accumulation: `1`
- Epochs: `2`
- Warmup Ratio: `0.0718442228846798`
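To make the selected values concrete, the following arithmetic sketch derives the implied training schedule. It assumes a single device, that the last partial batch is kept, and that warmup steps are rounded up; these details are not stated in the source.

```python
import math

TRAIN_EXAMPLES = 17_348          # training set size from the dataset split
BATCH_SIZE = 16
GRAD_ACCUMULATION = 1
EPOCHS = 2
WARMUP_RATIO = 0.0718442228846798

# Batches per epoch, keeping the final partial batch (assumption).
batches_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
# One optimizer update per GRAD_ACCUMULATION batches.
updates_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUMULATION)
total_updates = updates_per_epoch * EPOCHS
warmup_updates = math.ceil(total_updates * WARMUP_RATIO)
effective_batch_size = BATCH_SIZE * GRAD_ACCUMULATION

print(effective_batch_size, updates_per_epoch, total_updates, warmup_updates)
# 16 1085 2170 156
```

Under these assumptions the learning rate ramps up over about the first 156 of 2,170 optimizer updates before the scheduler begins to decay it.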
| | Predicted Neither | Predicted Offensive/Hate |
|---------------------|-------------------|--------------------------|
| Actual Neither | 547 | 78 |
| Actual Offensive/Hate | 62 | 3031 |
- True Positives (Hate/Offensive correctly identified): 3031
- True Negatives (Neutral correctly identified): 547
- False Positives (Neutral incorrectly flagged): 78
- False Negatives (Hate/offensive missed): 62
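The reported precision, recall, F1, and accuracy figures follow directly from these four counts. The sketch below recomputes them in plain Python as a consistency check; the helper name `precision_recall_f1` is illustrative.

```python
# Counts from the confusion matrix (positive class = Hate/Offensive).
TP, FN = 3031, 62   # actual Hate/Offensive row
FP, TN = 78, 547    # actual Neither row


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


offensive = precision_recall_f1(TP, FP, FN)
# For the Neither class the roles swap: TN are its correct predictions,
# FN are tweets wrongly predicted as Neither, FP are Neither tweets missed.
neither = precision_recall_f1(TN, FN, FP)

total = TP + TN + FP + FN
accuracy = (TP + TN) / total
# Weighted F1 averages the per-class F1 scores by class support.
weighted_f1 = (offensive[2] * (TP + FN) + neither[2] * (TN + FP)) / total

print(f"accuracy={accuracy:.4f} weighted_f1={weighted_f1:.4f}")
print("offensive P/R/F1:", [round(x, 4) for x in offensive])
print("neither   P/R/F1:", [round(x, 4) for x in neither])
```

The four counts sum to 3,718, matching the test set size.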
Use Cases

This hate/offensive message classifier is suited to:

Messaging Platforms
- Discord bot moderation (primary use case)
- SMS filtering systems
- Chat application content filtering

Content Moderation
- Social media platforms
- Comment section filtering
- User-generated content screening
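For moderation use cases like these, the model's two logits have to be turned into a flag/allow decision. The sketch below shows one way to do that. The 0.5 threshold, the `max_length` of 128, and the function names are assumptions for illustration, and `tokenizer` and `model` are expected to be the fine-tuned artifacts loaded with Hugging Face Transformers.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moderate(logits, threshold=0.5):
    """Flag a message when P(Hate/Offensive) reaches the threshold.

    Index 1 is the Hate/Offensive label; the threshold is a hypothetical default.
    """
    probability = softmax(logits)[1]
    return {"flagged": probability >= threshold,
            "hate_offensive_probability": probability}


def score_message(text, tokenizer, model, max_length=128):
    """Run one message through a loaded tokenizer/model pair and return its logits."""
    import torch  # imported lazily; only needed for real inference

    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=max_length)
    with torch.no_grad():
        return model(**inputs).logits[0].tolist()


# Example with made-up logits, to show the decision step on its own.
print(moderate([-2.0, 2.0], threshold=0.9))
```

Raising the threshold trades recall for precision, which may suit platforms where wrongly flagging a neutral message is more costly than missing an offensive one.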
If you use this model in your research or application, please cite: