yeniguno
bert-uncased-intent-classification
This is a fine-tuned BERT-based model for intent classification, capable of categorizing inputs into 82 distinct intent labels. It was trained on a consolidated collection of multilingual intent datasets.

**Intended Uses**

Natural Language Understanding (NLU) tasks: classifying user intents for applications such as:

- Voice assistants
- Chatbots
- Customer support automation
- Conversational AI systems

**Bias, Risks, and Limitations**

- The model's performance may degrade on intents that are underrepresented in the training data.
- Not optimized for languages other than English.
- Domain-specific intents not included in the dataset may require additional fine-tuning.

**Training Data**

This model was trained on a combination of intent datasets from various sources:

- mteb/amazon_massive_intent
- mteb/mtop_intent
- sonos-nlu-benchmark/snips_built_in_intents
- Mozilla/smart_intent_dataset
- Bhuvaneshwari/intent_classification
- clinc/clinc_oos

Each dataset was preprocessed, and intent labels were consolidated into 82 unique classes.

- Train size: 138,228
- Validation size: 17,279
- Test size: 17,278

**Training Procedure**

The model was fine-tuned with the following hyperparameters:

- Base model: bert-base-uncased
- Learning rate: 3e-5
- Batch size: 32
- Epochs: 4
- Weight decay: 0.01
- Evaluation strategy: per epoch
- Precision: FP32 (no mixed precision)
- Hardware: A100

**Training and Validation**

| Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
|-------|---------------|-----------------|----------|----------|-----------|--------|
| 1 | 0.1143 | 0.1014 | 97.38% | 97.33% | 97.36% | 97.38% |
| 2 | 0.0638 | 0.0833 | 97.78% | 97.79% | 97.83% | 97.78% |
| 3 | 0.0391 | 0.0946 | 97.98% | 97.98% | 97.99% | 97.98% |
| 4 | 0.0122 | 0.1013 | 98.04% | 98.04% | 98.05% | 98.04% |

**Test Results**

| Metric | Value |
|-----------|--------|
| Loss | 0.0814 |
| Accuracy | 98.37% |
| F1 Score | 98.37% |
| Precision | 98.38% |
| Recall | 98.37% |
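A minimal usage sketch with the 🤗 Transformers `pipeline` API; the repo id `yeniguno/bert-uncased-intent-classification` is assumed from the card title, and the example sentence is illustrative:

```python
from transformers import pipeline

# Repo id assumed from the card title
classifier = pipeline(
    "text-classification",
    model="yeniguno/bert-uncased-intent-classification",
)

# Returns the top intent label and its confidence score
result = classifier("wake me up at 7 am tomorrow")
print(result)
```

The pipeline returns a list of `{"label": ..., "score": ...}` dicts, one per input text.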
mbart50-turkish-grammar-corrector
nli-deberta-zero-shot-reviews-turkish-v1
bert-uncased-turkish-intent-classification
turkish-gibberish-detection-ft
🇹🇷 Turkish Gibberish Sentence Detection (Fine-Tuned)

This model detects whether a given Turkish text is clean or gibberish.

- Base model: `TURKCELL/gibberish-sentence-detection-model-tr`
- Language: Turkish
- Task: Binary Text Classification (Gibberish Detection)
- Labels:
  - `0 → ok` — meaningful Turkish text
  - `1 → gibberish` — meaningless or noisy text (nonsense, random keyboard input, malformed words)

This model is designed to be used in LLM guardrail systems as an input quality scanner. Since LLM inference is computationally and financially expensive, it is inefficient to process meaningless or malformed text. By running this model before sending user input to an LLM, you can automatically detect and filter gibberish or nonsensical text, preventing unnecessary API calls and improving overall system efficiency.

Typical use cases include:

- Pre-filtering user messages in chatbots or virtual assistants
- Guardrail modules in enterprise LLM applications
- Quality control for large-scale text ingestion pipelines
- Spam and noise detection in user-generated content

If the input is classified as gibberish, it can be safely discarded or handled separately without invoking the LLM.

**Training Data**

| Label | Count | Description |
|:------|------:|:-------------|
| 0 (ok) | 651,431 | valid, meaningful Turkish text |
| 1 (gibberish) | 699,999 | random keyboard strings, misspelled or malformed text |

All samples are lowercased and cleaned, with no newline or tab characters.

**Evaluation**

| Model | Accuracy | Macro-F1 | F1 (ok) | F1 (gibberish) |
|:------|:---------:|:--------:|:------:|:--------------:|
| Base model | 0.6257 | 0.6254 | 0.61 | 0.64 |
| Fine-tuned model | 0.7369 | 0.7340 | 0.76 | 0.71 |

Test set size: 202,669 sentences. Evaluation metrics: Accuracy, Macro-F1, per-class Precision/Recall/F1.
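A pre-filter along these lines can be sketched with the `pipeline` API; the repo id `yeniguno/turkish-gibberish-detection-ft` is assumed from the card title, and the sample texts are invented for illustration:

```python
from transformers import pipeline

# Repo id assumed from the card title
detector = pipeline(
    "text-classification",
    model="yeniguno/turkish-gibberish-detection-ft",
)

texts = [
    "yarın hava çok güzel olacak",  # meaningful Turkish
    "asdkj qwpoe zxmcn",           # keyboard mashing
]
preds = detector(texts)

# Only forward inputs labeled "ok" to the (expensive) LLM call
clean_texts = [t for t, p in zip(texts, preds) if p["label"] == "ok"]
print(clean_texts)
```

A reasonable design here is to run the detector in batch before the LLM request loop, so gibberish inputs never reach the paid API.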
roberta-turkish-bantopic-uncased
A multi-label text classification model based on `TURKCELL/roberta-base-turkish-uncased`, fine-tuned to detect 14 unsafe content categories in Turkish texts. The model is designed to serve as a guardrail safety filter for chatbots and other LLM-powered systems.

| Property | Value |
|-----------|-------|
| Base model | `TURKCELL/roberta-base-turkish-uncased` |
| Task | Multi-label classification (safety moderation) |
| Language | Turkish |
| Labels (unsafe topics) | siyaset (politics), toplumsal cinsiyet (gender), şiddet (violence), din (religion), suç (crime), cinsellik (sexuality), göç (migration), kimlik (identity), uluslararası ilişkiler (international relations), toplumsal eleştiri (social criticism), bahis (betting), ruhsal (mental health), zararlı madde (harmful substances), kişisel haklar (personal rights) |
| Output | One or more triggered unsafe topics, or `SAFE` |
| Thresholds | Class-specific, tuned on the validation set using F2 optimization |

**Usage**

- Option 1: load the model directly with a single `pipeline`.
- Option 2: pure Transformers (no remote code), compute sigmoid probabilities and apply the thresholds yourself.

**Intended Use**

This model acts as a pre-filter or guardrail before sending user inputs to an LLM. It helps detect and block or flag text that contains or relates to sensitive categories such as violence, crime, drugs, sexual content, or discrimination. It is not a hate-speech classifier or a legal moderation system; it simply detects topic-level presence of unsafe domains.

**Training**

- Training data size: ~300k Turkish text samples
- Positive (unsafe) examples: ~95k
- Negative (safe) examples: ~205k
- Loss function: BCEWithLogitsLoss with positive class weighting
- Optimizer: AdamW (lr=2e-5)
- Epochs: 3
- Batch size: 16 (train), 32 (eval)
- Hardware: NVIDIA RTX 5090 (32 GB)

**Evaluation**

| Metric | Validation | Test |
|:--|:--:|:--:|
| Micro Precision | 0.35 | 0.34 |
| Micro Recall | 0.83 | 0.83 |
| Micro F1 | 0.49 | 0.48 |
| Macro Precision | 0.29 | 0.24 |
| Macro Recall | 0.63 | 0.60 |
| Macro F1 | 0.38 | 0.34 |
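The "pure Transformers" route can be sketched as follows. The repo id `yeniguno/roberta-turkish-bantopic-uncased` is assumed from the card title, and the uniform 0.5 threshold is illustrative; the card states that production thresholds are class-specific and F2-tuned:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repo id assumed from the card title
model_id = "yeniguno/roberta-turkish-bantopic-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Bu konu hakkında ne düşünüyorsun?"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label: an independent sigmoid per class, not a softmax over classes
probs = torch.sigmoid(logits)[0]

threshold = 0.5  # illustrative; replace with the class-specific F2-tuned values
triggered = [
    model.config.id2label[i]
    for i, p in enumerate(probs.tolist())
    if p >= threshold
]
print(triggered if triggered else ["SAFE"])
```

The sigmoid-per-class step is what makes this multi-label: several topics can fire on the same input, and `SAFE` is simply the case where none clears its threshold.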
turkish-abstractive-summary-mt5
bert-turkish-organization-ner-uncased
This model is a fine-tuned version of dbmdz/bert-base-turkish-128k-uncased on the yeniguno/turkish-organization-ner-dataset dataset. Unlike general NER models, it is trained only for organization detection (`ORG`). The labels are:

- `O` (outside),
- `B-ORG` (beginning of organization),
- `I-ORG` (inside organization).

You can load the model directly with the 🤗 `pipeline` API for NER.

**Use Cases**

- Guardrails in LLM applications: detect and flag organization names in user prompts or model outputs.
- Content filtering & compliance: e.g. anonymization, redaction, or entity-specific monitoring.
- Analytics: extracting organization mentions from Turkish text for search, clustering, or knowledge graphs.

**Results**

It achieves the following results on the evaluation set:

- Loss: 0.1152
- F1: 0.9159

**Training Hyperparameters**

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5

| Training Loss | Epoch | Step | Validation Loss | F1 |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| 0.0617 | 1.0 | 8080 | 0.0679 | 0.8990 |
| 0.0471 | 2.0 | 16160 | 0.0640 | 0.9105 |
| 0.0295 | 3.0 | 24240 | 0.0846 | 0.9110 |
| 0.0277 | 4.0 | 32320 | 0.0959 | 0.9153 |
| 0.0116 | 5.0 | 40400 | 0.1152 | 0.9159 |

**Framework Versions**

- Transformers 4.55.2
- PyTorch 2.9.0.dev20250816+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
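The `pipeline` usage mentioned above can be sketched as below; the repo id `yeniguno/bert-turkish-organization-ner-uncased` is assumed from the card title, and the example sentence is invented:

```python
from transformers import pipeline

# Repo id assumed from the card title
ner = pipeline(
    "token-classification",
    model="yeniguno/bert-turkish-organization-ner-uncased",
    aggregation_strategy="simple",  # merges B-ORG/I-ORG pieces into whole entities
)

entities = ner("Ali, Türk Hava Yolları ile İstanbul'a uçtu.")
for ent in entities:
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```

With `aggregation_strategy="simple"`, each returned entry is a whole entity span (`entity_group`, `word`, `score`, offsets) rather than individual subword tokens.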
democracy-sentiment-analysis-turkish-roberta
turkish-law-aqa-bart-finetuned
This model is a fine-tuned version of `facebook/mbart-large-50` for abstractive question answering (QA) in Turkish, trained on a legal-text dataset based on Turkish laws. Unlike extractive QA models that select exact spans from the context, this model generates natural, paraphrased answers.

- Base model: facebook/mbart-large-50
- Task: abstractive question answering (text-to-text generation)
- Language: Turkish (`tr_TR`)
- Dataset: custom dataset containing legal texts from Turkish law
- Training data: 8,630 examples for training and 959 for validation
- Fine-tuning framework: Hugging Face Transformers (`Seq2SeqTrainer`)
- Batch size: 4
- Learning rate: 5e-5
- Weight decay: 0.01
- Epochs: 5
- Optimizer: AdamW
- Scheduler: linear warmup with decay
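A generation sketch under stated assumptions: the repo id `yeniguno/turkish-law-aqa-bart-finetuned` is taken from the card title, and the `soru:`/`metin:` (question/context) prompt format is an assumption, not confirmed by the card:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id assumed from the card title; mBART-50 uses explicit language codes
model_id = "yeniguno/turkish-law-aqa-bart-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="tr_TR", tgt_lang="tr_TR")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

question = "Kira sözleşmesi ne zaman sona erer?"
context = "..."  # a law article or passage goes here

# The concatenation format below is an assumption about how the model was trained
text = f"soru: {question} metin: {context}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    # Force Turkish as the target language for mBART-50 decoding
    forced_bos_token_id=tokenizer.lang_code_to_id["tr_TR"],
)
answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(answer)
```

Setting `forced_bos_token_id` to the `tr_TR` language code is the standard mBART-50 decoding convention; without it the model may drift into another of its 50 languages.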
turkish-law-eqa-bert-finetuned
This model is fine-tuned for extractive question answering on Turkish legal texts. It can extract relevant spans from law articles based on user queries.

- Training set: 21,664 examples
- Validation set: 2,498 examples
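Span extraction of this kind maps directly onto the standard `question-answering` pipeline; the repo id `yeniguno/turkish-law-eqa-bert-finetuned` is assumed from the card title, and the example article is invented:

```python
from transformers import pipeline

# Repo id assumed from the card title
qa = pipeline(
    "question-answering",
    model="yeniguno/turkish-law-eqa-bert-finetuned",
)

context = (
    "Madde 12 - Kiracı, kira bedelini sözleşmede belirtilen tarihte "
    "ödemekle yükümlüdür."
)
result = qa(question="Kiracı kira bedelini ne zaman öder?", context=context)

# The answer is a literal span of the context, with character offsets
print(result["answer"], result["start"], result["end"], round(result["score"], 3))
```

Because the model is extractive, `result["answer"]` always equals `context[result["start"]:result["end"]]`, which makes the output easy to verify against the source law text.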
opus-mt-tr-en-kafkaesque
absa-turkish-bert-dbmdz
gpt2-turkish-poem-generator
bert-ner-turkish-cased
tapas-base-wtq-balance-sheet-tuned
bertopic-turkish-political
opus-mt-en-tr-kafkaesque
turkish-code-detector
bert-turkish-agriculture-mlm
This is a domain-adapted version of `dbmdz/bert-base-turkish-cased`. We continued masked-language pre-training on the open-source `yeniguno/turkishagriculturecorpus` to bias the model toward Turkish agricultural vocabulary and discourse while retaining its general-language abilities.
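Since the model keeps its masked-language-modeling head, it can be probed with the `fill-mask` pipeline. The repo id `yeniguno/bert-turkish-agriculture-mlm` is assumed from the card title, and the sentence ("The farmer planted [MASK] in the field.") is illustrative:

```python
from transformers import pipeline

# Repo id assumed from the card title
fill = pipeline("fill-mask", model="yeniguno/bert-turkish-agriculture-mlm")

# BERT-style models use the literal [MASK] token
preds = fill("Çiftçi tarlaya [MASK] ekti.")
for pred in preds[:3]:
    print(pred["token_str"], round(pred["score"], 3))
```

Comparing the top predictions against those of the base `dbmdz/bert-base-turkish-cased` is a quick way to see the agricultural domain shift in practice.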