yeniguno
bert-uncased-intent-classification
This is a fine-tuned BERT-based model for intent classification, capable of categorizing inputs into 82 distinct intent labels. It was trained on a consolidated collection of multilingual intent datasets.

**Intended Uses**

Natural Language Understanding (NLU) tasks: classifying user intents for applications such as:

- Voice assistants
- Chatbots
- Customer support automation
- Conversational AI systems

**Bias, Risks, and Limitations**

- The model's performance may degrade on intents that are underrepresented in the training data.
- Not optimized for languages other than English.
- Domain-specific intents not included in the dataset may require additional fine-tuning.

**Training Data**

This model was trained on a combination of intent datasets from various sources:

- mteb/amazon_massive_intent
- mteb/mtop_intent
- sonos-nlu-benchmark/snips_built_in_intents
- Mozilla/smart_intent_dataset
- Bhuvaneshwari/intent_classification
- clinc/clinc_oos

Each dataset was preprocessed, and intent labels were consolidated into 82 unique classes.

- Train size: 138,228
- Validation size: 17,279
- Test size: 17,278

**Training Procedure**

The model was fine-tuned with the following hyperparameters:

- Base model: bert-base-uncased
- Learning rate: 3e-5
- Batch size: 32
- Epochs: 4
- Weight decay: 0.01
- Evaluation strategy: per epoch
- Precision: FP32 (no mixed precision)
- Hardware: A100

**Training and Validation**

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
|-------|---------------|-----------------|----------|----------|-----------|--------|
| 1 | 0.1143 | 0.1014 | 97.38% | 97.33% | 97.36% | 97.38% |
| 2 | 0.0638 | 0.0833 | 97.78% | 97.79% | 97.83% | 97.78% |
| 3 | 0.0391 | 0.0946 | 97.98% | 97.98% | 97.99% | 97.98% |
| 4 | 0.0122 | 0.1013 | 98.04% | 98.04% | 98.05% | 98.04% |

**Test Results**

| Metric | Value |
|-----------|--------|
| Loss | 0.0814 |
| Accuracy | 98.37% |
| F1 Score | 98.37% |
| Precision | 98.38% |
| Recall | 98.37% |
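A minimal usage sketch with the 🤗 Transformers `pipeline` API; the repo id `yeniguno/bert-uncased-intent-classification` is assumed from the card title, and the example sentence is illustrative:

```python
from transformers import pipeline

# Repo id assumed from the card title
classifier = pipeline(
    "text-classification",
    model="yeniguno/bert-uncased-intent-classification",
)

# Returns the top intent label and its confidence score
result = classifier("wake me up at 7 am tomorrow")
print(result)
```

The pipeline returns a list of `{"label": ..., "score": ...}` dicts, one per input text.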
mbart50-turkish-grammar-corrector
nli-deberta-zero-shot-reviews-turkish-v1
bert-uncased-turkish-intent-classification
turkish-gibberish-detection-ft
🇹🇷 Turkish Gibberish Sentence Detection (Fine-Tuned)

This model detects whether a given Turkish text is clean or gibberish.

- Base model: `TURKCELL/gibberish-sentence-detection-model-tr`
- Language: Turkish
- Task: Binary Text Classification (Gibberish Detection)
- Labels:
  - `0 → ok` — meaningful Turkish text
  - `1 → gibberish` — meaningless or noisy text (nonsense, random keyboard input, malformed words)

This model is designed to be used in LLM guardrail systems as an input quality scanner. Since LLM inference is computationally and financially expensive, it is inefficient to process meaningless or malformed text. By running this model before sending user input to an LLM, you can automatically detect and filter gibberish or nonsensical text, preventing unnecessary API calls and improving overall system efficiency.

Typical use cases include:

- Pre-filtering user messages in chatbots or virtual assistants
- Guardrail modules in enterprise LLM applications
- Quality control for large-scale text ingestion pipelines
- Spam and noise detection in user-generated content

If the input is classified as gibberish, it can be safely discarded or handled separately without invoking the LLM.

**Training Data**

| Label | Count | Description |
|:------|------:|:-------------|
| 0 (ok) | 651,431 | valid, meaningful Turkish text |
| 1 (gibberish) | 699,999 | random keyboard strings, misspelled or malformed text |

All samples are lowercased and cleaned, with no newline or tab characters.

**Evaluation**

| Model | Accuracy | Macro-F1 | F1 (ok) | F1 (gibberish) |
|:------|:---------:|:--------:|:------:|:--------------:|
| Base model | 0.6257 | 0.6254 | 0.61 | 0.64 |
| Fine-tuned model | 0.7369 | 0.7340 | 0.76 | 0.71 |

Test set size: 202,669 sentences. Evaluation metrics: Accuracy, Macro-F1, per-class Precision/Recall/F1.
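A pre-filter along these lines can be sketched with the `pipeline` API; the repo id `yeniguno/turkish-gibberish-detection-ft` is assumed from the card title, and the sample texts are invented for illustration:

```python
from transformers import pipeline

# Repo id assumed from the card title
detector = pipeline(
    "text-classification",
    model="yeniguno/turkish-gibberish-detection-ft",
)

texts = [
    "yarın hava çok güzel olacak",  # meaningful Turkish
    "asdkj qwpoe zxmcn",           # keyboard mashing
]
preds = detector(texts)

# Only forward inputs labeled "ok" to the (expensive) LLM call
clean_texts = [t for t, p in zip(texts, preds) if p["label"] == "ok"]
print(clean_texts)
```

A reasonable design here is to run the detector in batch before the LLM request loop, so gibberish inputs never reach the paid API.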
roberta-turkish-bantopic-uncased
A multi-label text classification model based on `TURKCELL/roberta-base-turkish-uncased`, fine-tuned to detect 14 unsafe content categories in Turkish texts. The model is designed to serve as a guardrail safety filter for chatbots and other LLM-powered systems.

| Property | Value |
|-----------|-------|
| Base model | `TURKCELL/roberta-base-turkish-uncased` |
| Task | Multi-label classification (safety moderation) |
| Language | Turkish |
| Labels (unsafe topics) | siyaset (politics), toplumsal cinsiyet (gender), şiddet (violence), din (religion), suç (crime), cinsellik (sexuality), göç (migration), kimlik (identity), uluslararası ilişkiler (international relations), toplumsal eleştiri (social criticism), bahis (betting), ruhsal (mental health), zararlı madde (harmful substances), kişisel haklar (personal rights) |
| Output | One or more triggered unsafe topics, or `SAFE` |
| Thresholds | Class-specific, tuned on the validation set using F2 optimization |

**Usage**

- Option 1: load the model directly with a single `pipeline`.
- Option 2: pure Transformers (no remote code), compute sigmoid probabilities and apply the thresholds yourself.

**Intended Use**

This model acts as a pre-filter or guardrail before sending user inputs to an LLM. It helps detect and block or flag text that contains or relates to sensitive categories such as violence, crime, drugs, sexual content, or discrimination. It is not a hate-speech classifier or a legal moderation system; it simply detects topic-level presence of unsafe domains.

**Training**

- Training data size: ~300k Turkish text samples
- Positive (unsafe) examples: ~95k
- Negative (safe) examples: ~205k
- Loss function: BCEWithLogitsLoss with positive class weighting
- Optimizer: AdamW (lr=2e-5)
- Epochs: 3
- Batch size: 16 (train), 32 (eval)
- Hardware: NVIDIA RTX 5090 (32 GB)

**Evaluation**

| Metric | Validation | Test |
|:--|:--:|:--:|
| Micro Precision | 0.35 | 0.34 |
| Micro Recall | 0.83 | 0.83 |
| Micro F1 | 0.49 | 0.48 |
| Macro Precision | 0.29 | 0.24 |
| Macro Recall | 0.63 | 0.60 |
| Macro F1 | 0.38 | 0.34 |
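The "pure Transformers" route can be sketched as follows. The repo id `yeniguno/roberta-turkish-bantopic-uncased` is assumed from the card title, and the uniform 0.5 threshold is illustrative; the card states that production thresholds are class-specific and F2-tuned:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repo id assumed from the card title
model_id = "yeniguno/roberta-turkish-bantopic-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Bu konu hakkında ne düşünüyorsun?"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: an independent sigmoid per class, not a softmax over classes
probs = torch.sigmoid(logits)[0]

threshold = 0.5  # illustrative; replace with the class-specific F2-tuned values
triggered = [
    model.config.id2label[i]
    for i, p in enumerate(probs.tolist())
    if p >= threshold
]
print(triggered if triggered else ["SAFE"])
```

The sigmoid-per-class step is what makes this multi-label: several topics can fire on the same input, and `SAFE` is simply the case where none clears its threshold.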
turkish-abstractive-summary-mt5
bert-turkish-organization-ner-uncased
This model is a fine-tuned version of dbmdz/bert-base-turkish-128k-uncased on the yeniguno/turkish-organization-ner-dataset dataset. Unlike general NER models, it is trained only for organization detection (`ORG`). The labels are:

- `O` (outside),
- `B-ORG` (beginning of organization),
- `I-ORG` (inside organization).

You can load the model directly with the 🤗 `pipeline` API for NER.

**Use Cases**

- Guardrails in LLM applications: detect and flag organization names in user prompts or model outputs.
- Content filtering & compliance: e.g. anonymization, redaction, or entity-specific monitoring.
- Analytics: extracting organization mentions from Turkish text for search, clustering, or knowledge graphs.

**Results**

It achieves the following results on the evaluation set:

- Loss: 0.1152
- F1: 0.9159

**Training Hyperparameters**

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5

| Training Loss | Epoch | Step | Validation Loss | F1 |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.0617 | 1.0 | 8080 | 0.0679 | 0.8990 |
| 0.0471 | 2.0 | 16160 | 0.0640 | 0.9105 |
| 0.0295 | 3.0 | 24240 | 0.0846 | 0.9110 |
| 0.0277 | 4.0 | 32320 | 0.0959 | 0.9153 |
| 0.0116 | 5.0 | 40400 | 0.1152 | 0.9159 |

**Framework Versions**

- Transformers 4.55.2
- PyTorch 2.9.0.dev20250816+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
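The `pipeline` usage mentioned above can be sketched as below; the repo id `yeniguno/bert-turkish-organization-ner-uncased` is assumed from the card title, and the example sentence is invented:

```python
from transformers import pipeline

# Repo id assumed from the card title
ner = pipeline(
    "token-classification",
    model="yeniguno/bert-turkish-organization-ner-uncased",
    aggregation_strategy="simple",  # merges B-ORG/I-ORG pieces into whole entities
)

entities = ner("Ali, Türk Hava Yolları ile İstanbul'a uçtu.")
for ent in entities:
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```

With `aggregation_strategy="simple"`, each returned entry is a whole entity span (`entity_group`, `word`, `score`, offsets) rather than individual subword tokens.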
democracy-sentiment-analysis-turkish-roberta
turkish-law-aqa-bart-finetuned
This model is a fine-tuned version of `facebook/mbart-large-50` for abstractive question answering (QA) in Turkish, trained on a legal-text dataset based on Turkish laws. Unlike extractive QA models that select exact spans from the context, this model generates natural, paraphrased answers.

- Base model: facebook/mbart-large-50
- Task: abstractive question answering (text-to-text generation)
- Language: Turkish (`tr_TR`)
- Dataset: custom dataset containing legal texts from Turkish law
- Training data: 8,630 examples for training and 959 for validation
- Fine-tuning framework: Hugging Face Transformers (`Seq2SeqTrainer`)
- Batch size: 4
- Learning rate: 5e-5
- Weight decay: 0.01
- Epochs: 5
- Optimizer: AdamW
- Scheduler: linear warmup with decay
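A generation sketch under stated assumptions: the repo id `yeniguno/turkish-law-aqa-bart-finetuned` is taken from the card title, and the `soru:`/`metin:` (question/context) prompt format is an assumption, not confirmed by the card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id assumed from the card title; mBART-50 uses explicit language codes
model_id = "yeniguno/turkish-law-aqa-bart-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="tr_TR", tgt_lang="tr_TR")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

question = "Kira sözleşmesi ne zaman sona erer?"
context = "..."  # a law article or passage goes here

# The concatenation format below is an assumption about how the model was trained
text = f"soru: {question} metin: {context}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    # Force Turkish as the target language for mBART-50 decoding
    forced_bos_token_id=tokenizer.lang_code_to_id["tr_TR"],
)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(answer)
```

Setting `forced_bos_token_id` to the `tr_TR` language code is the standard mBART-50 decoding convention; without it the model may drift into another of its 50 languages.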
turkish-law-eqa-bert-finetuned
This model is fine-tuned for extractive question answering on Turkish legal texts. It can extract relevant spans from law articles based on user queries.

- Training set: 21,664 examples
- Validation set: 2,498 examples
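Span extraction of this kind maps directly onto the standard `question-answering` pipeline; the repo id `yeniguno/turkish-law-eqa-bert-finetuned` is assumed from the card title, and the example article is invented:

```python
from transformers import pipeline

# Repo id assumed from the card title
qa = pipeline(
    "question-answering",
    model="yeniguno/turkish-law-eqa-bert-finetuned",
)

context = (
    "Madde 12 - Kiracı, kira bedelini sözleşmede belirtilen tarihte "
    "ödemekle yükümlüdür."
)
result = qa(question="Kiracı kira bedelini ne zaman öder?", context=context)

# The answer is a literal span of the context, with character offsets
print(result["answer"], result["start"], result["end"], round(result["score"], 3))
```

Because the model is extractive, `result["answer"]` always equals `context[result["start"]:result["end"]]`, which makes the output easy to verify against the source law text.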
opus-mt-tr-en-kafkaesque
absa-turkish-bert-dbmdz
gpt2-turkish-poem-generator
bert-ner-turkish-cased
tapas-base-wtq-balance-sheet-tuned
bertopic-turkish-political
opus-mt-en-tr-kafkaesque
turkish-code-detector
bert-turkish-agriculture-mlm
This is a domain-adapted version of `dbmdz/bert-base-turkish-cased`. We continued masked-language pre-training on the open-source `yeniguno/turkishagriculturecorpus` to bias the model toward Turkish agricultural vocabulary and discourse while retaining its general-language abilities.
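Since the model keeps its masked-language-modeling head, it can be probed with the `fill-mask` pipeline. The repo id `yeniguno/bert-turkish-agriculture-mlm` is assumed from the card title, and the sentence ("The farmer planted [MASK] in the field.") is illustrative:

```python
from transformers import pipeline

# Repo id assumed from the card title
fill = pipeline("fill-mask", model="yeniguno/bert-turkish-agriculture-mlm")

# BERT-style models use the literal [MASK] token
preds = fill("Çiftçi tarlaya [MASK] ekti.")
for pred in preds[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```

Comparing the top predictions against those of the base `dbmdz/bert-base-turkish-cased` is a quick way to see the agricultural domain shift in practice.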