Motif-Technologies

9 models

optimizer

Optimizer is a Python package that provides:

- PyTorch implementations of recent optimizer algorithms
- Support for parallelism techniques for efficient large-scale training

Currently implemented:

- Parallel Muon with N-D sharding (arxiv URL); a usage sketch follows below
  - Supports general N-D sharding configurations; the implementation is not tied to any specific parallel strategy.
  - Verified from basic FSDP2 setups up to hybrid configurations such as (2 TP + 2 DP-Replicate + 2 DP-Shard). Verified configurations can be found in testmuon.py.

Test:

- Check test/README.md for how to run the tests.

This project uses pre-commit to automatically check and format code before commits. Once installed, the configured hooks will run automatically on each commit:

- yapf – Python code formatter
- typos – Spell checker for common typos
- isort – Organizes and sorts Python imports
- clang-format – Formats C++/CUDA code (`--style=file`)
- pymarkdown – Lints and auto-fixes Markdown files
- actionlint – Validates GitHub Actions workflows
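To make the intended use concrete, here is a minimal sketch of wiring a Muon-style optimizer into a basic FSDP2 setup, the simplest of the verified configurations. The `from optimizer import Muon` path and the `Muon` constructor signature are assumptions for illustration, not the package's confirmed API; consult the repository README for the real interface.

```python
# Illustrative sketch only: the `Muon` import path and constructor
# signature below are assumptions, not the package's documented API.
# Launch with torchrun so a default process group exists for FSDP2.
import torch
import torch.nn as nn
from torch.distributed.fsdp import fully_shard  # FSDP2 (recent PyTorch)

from optimizer import Muon  # assumed import path

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
)
for layer in model:
    fully_shard(layer)  # shard each submodule (basic FSDP2 setup)
fully_shard(model)

# Muon variants operate on 2-D weight matrices; 1-D parameters
# (biases, norms) typically go to a fallback optimizer such as AdamW.
matrix_params = [p for p in model.parameters() if p.ndim == 2]
other_params = [p for p in model.parameters() if p.ndim < 2]
opt = Muon(matrix_params, lr=2e-2, momentum=0.95)  # assumed signature
fallback = torch.optim.AdamW(other_params, lr=3e-4)

x = torch.randn(8, 1024)
loss = model(x).square().mean()
loss.backward()
opt.step(); fallback.step()
opt.zero_grad(); fallback.zero_grad()
```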

license:apache-2.0
2,556
53

Motif-2-12.7B-Base

We are pleased to announce Motif-2-12.7B-Base, a 12.7-billion-parameter language model. Detailed information can be found in the technical report: https://arxiv.org/abs/2511.07464.

All models listed in the table below are base models. The results of Qwen3 and Gemma 3 are sourced directly from their technical reports.

|Benchmark|Evaluation setting|Motif-2-12.7B|Qwen3-14B|Qwen3-32B|Qwen3-30B-A3B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|---|---|---|---|
|MMLU|5-shot|78.1|81.05|83.61|81.38|74.5|78.6|
|MMLU-Redux|5-shot|78.68|79.88|83.41|81.17|-|-|
|MMLU-Pro|5-shot, CoT|66.38|61.03|65.54|61.49|45.3|52.2|
|SuperGPQA|5-shot, CoT|32.68|34.27|39.78|35.72|-|-|
|BBH|3-shot, CoT|81.34|81.07|87.38|81.54|-|-|
|GPQA|5-shot, CoT|42.18|39.9|49.49|43.94|-|-|
|GPQA-Diamond|5-shot, CoT|42.92|-|-|-|25.4|24.3|
|GSM8K|4-shot, CoT|93.85|92.49|93.4|91.81|-|-|
|GSM8K|8-shot, CoT|94.92|-|-|-|71|82.6|
|MATH|4-shot, CoT|73.62|62.02|61.62|59.04|43.3|50|
|EvalPlus|0-shot|72.22|72.23|72.05|71.45|-|-|
|MBPP|3-shot|81.5|73.4|78.2|74.4|60.4|65.6|
|CRUX-O|1-shot|63.1|68.6|72.5|67.2|-|-|
|HumanEval|0-shot|65.9|-|-|-|45.7|48.8|
|DROP|1-shot|69.9|-|-|-|72.2|77.2|
|HellaSwag|10-shot|84|-|-|-|84.2|85.6|
|BoolQ|0-shot|78.5|-|-|-|78.8|82.4|
|PIQA|0-shot|81.6|-|-|-|81.8|83.3|
|SIQA|0-shot|53.8|-|-|-|53.4|54.9|
|TriviaQA|5-shot|72.2|-|-|-|78.2|85.5|
|Natural Questions|5-shot|29.6|-|-|-|31.4|36.1|
|ARC-C|25-shot|69.6|-|-|-|68.9|70.6|
|ARC-E|0-shot|84.1|-|-|-|88.3|89|
|WinoGrande|5-shot|79.6|-|-|-|74.3|78.8|
|BBH|few-shot|81.3|-|-|-|72.6|77.7|

Averages and improvements over the corresponding benchmark scores:

||Motif-2-12.7B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|
|Average|71.53|63.87|67.96|
|Improvement||+11.99%|+5.26%|

||Motif-2-12.7B|Qwen3-14B|Qwen3-32B|Qwen3-30B-A3B|
|---|---|---|---|---|
|Average|69.42|67.81|71.54|68.10|
|Improvement||+2.37%|-2.96%|+1.94%|
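For reference, the "Improvement" rows appear to be the relative change of the Motif-2-12.7B average over each baseline's average. The short check below reproduces the table's figures from the printed averages (differences of ±0.01% come from rounding):

```python
# Recompute the "Improvement" rows from the printed averages.
gemma = {"Gemma-3-12B": 63.87, "Gemma-3-27B": 67.96}
qwen = {"Qwen3-14B": 67.81, "Qwen3-32B": 71.54, "Qwen3-30B-A3B": 68.10}

for motif_avg, baselines in [(71.53, gemma), (69.42, qwen)]:
    for name, base_avg in baselines.items():
        print(f"vs {name}: {100 * (motif_avg / base_avg - 1):+.2f}%")
# vs Gemma-3-12B: +11.99%   vs Gemma-3-27B: +5.25% (table: +5.26%)
# vs Qwen3-14B: +2.37%  vs Qwen3-32B: -2.96%  vs Qwen3-30B-A3B: +1.94%
```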

license:apache-2.0
877
26

Motif-2-12.7B-Instruct

We are pleased to announce Motif-2-12.7B-Instruct, a 12.7-billion-parameter language model. This model is a supervised fine-tuning (SFT) variant of our base model: https://huggingface.co/Motif-Technologies/Motif-2-12.7B-Base. Detailed information can be found in the technical report: https://arxiv.org/abs/2511.07464. One can chat directly with Motif-2-12.7B-Instruct at https://chat.motiftech.io.

The results of Qwen3 and Gemma 3 are sourced directly from their technical reports.

|Benchmark|Evaluation setting|Motif-2-12.7B|Qwen2.5-72B|Qwen3-14B|Qwen3-14B|Qwen3-32B|Qwen3-32B|Qwen3-30B-A3B|Qwen3-30B-A3B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|---|---|---|---|---|---|---|---|
|||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|Instruct|Instruct|
|MMLU|0-shot|86.11|-|-|-|-|-|-|-|71.9|76.9|
|MMLU-Redux|-|90.02|86.8|82|88.6|85.7|90.9|84.1|89.5|-|-|
|BBH|0-shot|85.78|-|-|-|-|-|-|-|85.7|87.6|
|GPQA-Diamond|0-shot, CoT|63.6|49|54.8|64|54.6|68.4|54.8|65.8|40.9|42.4|
|GSM8K|0-shot, CoT|96.13|-|-|-|-|-|-|-|94.4|95.9|
|MATH|0-shot|97|-|-|-|-|-|-|-|83.8|89|
|MBPP|3-shot|91|-|-|-|-|-|-|-|73|74.4|
|LiveBench 2024-11-25|-|33.8|51.4|59.6|71.3|59.8|74.9|59.4|74.3|-|-|
|IFEval|strict prompt|75.78|84.1|84.8|85.4|83.2|85|83.7|86.5|-|-|
|IFEval|0-shot|76.52|-|-|-|-|-|-|-|88.9|90.4|
|MATH-500|-|96.8|83.6|90|96.8|88.6|97.2|89.8|98|-|-|
|AIME24|-|72.3|18.9|31.7|79.3|31|81.4|32.8|80.4|-|-|
|AIME25|-|63.6|15|23.3|70.4|20.2|72.9|21.6|70.9|-|-|
|ZebraLogic|-|69.5|26.6|33|88.5|29.2|88.8|33.2|89.5|-|-|
|BFCL v3|-|55.34|63.4|61.5|70.4|63|70.3|58.6|69.1|-|-|
|LiveCodeBench v5 (2024.10 - 2025.2)|-|50.03|30.7|29|63.5|31.3|65.7|29.8|62.6|-|-|
|LiveCodeBench v5|0-shot, CoT|61.66|-|-|-|-|-|-|-|32|39|
|HumanEval|0-shot|93.2|-|-|-|-|-|-|-|85.4|87.8|

Averages and improvements over the corresponding benchmark scores:

||Motif-2-12.7B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|
||Instruct|Instruct|Instruct|
|Average|83.44|72.89|75.93|
|Improvement||+14.48%|+9.89%|

||Motif-2-12.7B|Qwen2.5-72B|Qwen3-14B|Qwen3-14B|Qwen3-32B|Qwen3-32B|Qwen3-30B-A3B|Qwen3-30B-A3B|
|---|---|---|---|---|---|---|---|---|
||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|
|Average|67.08|50.95|54.97|77.82|54.66|79.55|54.78|78.66|
|Improvement||+31.65%|+22.02%|-13.80%|+22.72%|-15.68%|+22.45%|-14.73%|

How to use in transformers

To use this model, install the Hugging Face kernels package (a loading sketch follows below).

How to use in vLLM

The PR adding support for the Motif model to the official vLLM package is currently under review. In the meantime, to use our model with vLLM, please use the following container image. Our model supports a sequence length of up to 32K tokens.
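A minimal transformers loading sketch. The model id follows from the org and model name above; `trust_remote_code=True` and the chat-template call are assumptions based on the usual transformers pattern and the note about custom Hugging Face kernels (presumably `pip install kernels`), not steps confirmed by the card:

```python
# Sketch only: trust_remote_code and the generation settings below
# are assumptions; see the model card for the confirmed recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Motif-Technologies/Motif-2-12.7B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```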

license:apache-2.0
758
48

Motif-2.6B

license:apache-2.0
477
76

activation

license:apache-2.0
115
30

Motif-2.6b-v1.1-LC

license:mit
29
19

Motif-2-12.7B-Reasoning

license:apache-2.0
16
41

Motif-Image-6B-Preview

license:mit
0
23

Motif-Video-2B

license:apache-2.0
0
7