t-tech

16 models

T-one

---
license: apache-2.0
language:
- ru
pipeline_tag: automatic-speech-recognition
tags:
- conformer
- streaming
- asr
- stt
- telephony
- russian
- speech
- t-tech
- t-one
---

license: apache-2.0
384,853 downloads • 78 likes

T-lite-it-1.0

6,249 downloads • 100 likes

T-pro-it-2.0

🚨 Users are advised to exercise caution and are responsible for any additional training and oversight required to ensure the model's responses meet acceptable ethical and safety standards. The responsibility for incorporating this model into industrial or commercial solutions lies entirely with those who choose to deploy it.

T-pro-it-2.0 is built upon the Qwen 3 model family and incorporates both continual pre-training and alignment techniques:

- Instruction pre-training: 40B tokens of instruction data, one-third of it focused on reasoning tasks.
- Supervised fine-tuning (SFT): ~500K high-quality and diverse instructions with balanced complexity; reasoning tasks make up about 20% of the dataset.
- Preference tuning: ~100K carefully selected instructions, filtered by length and type for general tasks and with domain-balanced selection for reasoning tasks.

| Model | MERA | ruMMLU | Ru Arena Hard | ru AIME 2025 | ru LCB |
|------------------------------|:-----:|:-----:|:-----:|:-----:|:-----:|
| T-pro 2.0                    | 0.660 | 0.790 | 0.876 | 0.646 | 0.563 |
| Qwen 3 32B                   | 0.584 | 0.740 | 0.836 | 0.625 | 0.537 |
| Ruadapt 3 32B V2             | 0.574 | 0.737 | 0.660 | 0.450 | 0.500 |
| DeepSeek-R1-Distill-Qwen-32B | 0.508 | 0.702 | 0.426 | 0.402 | 0.493 |
| Gemma 3 27B                  | 0.577 | 0.695 | 0.759 | 0.231 | 0.261 |

To enable or disable reasoning mode in Hugging Face Transformers, set the `enable_thinking` flag in `tokenizer.apply_chat_template` (a sketch follows at the end of this card). For more details, see:

- SGLang Thinking/Non-Thinking Modes
- vLLM Thinking/Non-Thinking Modes

| Mode | Temperature | presence_penalty |
|--------------------------------|-------|-----|
| No-think (general requests)    | ≤ 0.3 | 1.0 |
| Think mode (standard requests) | ≈ 0.6 | 1.0 |
| Complex reasoning requests     | ≥ 0.8 | 1.0 |

- Hybrid reasoning models need careful tuning of sampling hyperparameters, which vary by domain.
- Use a lower temperature for straightforward queries and a higher temperature for complex think-mode tasks.
- A presence_penalty between 0 and 2 can help avoid repetitive outputs.

SGLang Usage

For better quality and stable performance, we recommend SGLang as your inference framework. To run an inference server for T-pro-it-2.0, start by launching the SGLang server. Once the server is up and listening on `localhost:30000`, you can send chat-based requests via the OpenAI Python client (see the sketch at the end of this card). Note: always include both `temperature` and `presence_penalty` in every completion call.

Long Context Usage

T-pro-it-2.0 natively supports a context length of 32,768 tokens. For conversations where the input significantly exceeds this limit, follow the recommendations from the Qwen3 model card on processing long texts. For example, with llama.cpp's `llama-server` you can enable 128K context support via YaRN rope scaling:

`llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768`
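As referenced above, here is a minimal sketch of toggling reasoning mode via the chat template. It assumes the repo id `t-tech/T-pro-it-2.0` from this listing and the standard Qwen3-style `enable_thinking` template argument; treat it as illustrative rather than the card's verbatim example.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t-tech/T-pro-it-2.0")

messages = [{"role": "user", "content": "Сколько будет 17 * 23?"}]

# Think mode: the template lets the model emit a reasoning trace first.
prompt_think = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# No-think mode: reasoning is suppressed for straightforward requests.
prompt_no_think = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
```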
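And a sketch of the server-plus-client flow from the SGLang Usage section. The launch command in the comment is the generic `sglang.launch_server` invocation (an assumption, not the card's verbatim command), and the sampling values follow the table above.

```python
# Launch the server first (shell), e.g.:
#   python -m sglang.launch_server --model-path t-tech/T-pro-it-2.0 --port 30000
from openai import OpenAI

# SGLang exposes an OpenAI-compatible endpoint on localhost:30000.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="t-tech/T-pro-it-2.0",
    messages=[{"role": "user", "content": "Briefly explain YaRN rope scaling."}],
    temperature=0.6,       # think mode (standard requests) per the table above
    presence_penalty=1.0,  # recommended for all modes
)
print(response.choices[0].message.content)
```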
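If you serve with SGLang rather than llama.cpp, the Qwen3 model card documents an equivalent YaRN override via `--json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'`. That flag comes from the Qwen3 documentation, not from this card, so verify it against your SGLang version.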

license: apache-2.0
1,729 downloads • 124 likes

T-pro-it-2.0-AWQ

license: apache-2.0
1,326 downloads • 9 likes

T-pro-it-2.0-GGUF

llama-cpp
441 downloads • 49 likes

T-pro-it-1.0

358 downloads • 113 likes

T-lite-it-1.0-Q8_0-GGUF

llama-cpp
355 downloads • 18 likes

T-pro-it-2.0-FP8

license: apache-2.0
96 downloads • 10 likes

T-pro-it-1.0-Q4_K_M-GGUF

llama-cpp
52 downloads • 6 likes

T-pro-it-1.0-Q6_K-GGUF

llama-cpp
50 downloads • 0 likes

T-pro-it-1.0-Q8_0-GGUF

llama-cpp
38 downloads • 8 likes

T-pro-it-1.0-Q5_K_M-GGUF

llama-cpp
38 downloads • 4 likes

T-pro-it-2.0-eagle

llama
14 downloads • 47 likes

T-pro-it-2.1

license: apache-2.0
2 downloads • 6 likes

T-lite-it-2.1

license: apache-2.0
1 download • 4 likes

flex-sae

license: apache-2.0
0 downloads • 10 likes