LiquidAI
LFM2-1.2B
---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- lfm2
- edge
---
LFM2.5-1.2B-Instruct
LFM2.5-VL-1.6B-GGUF
LFM2.5-1.2B-Instruct-GGUF
LFM2.5-VL-1.6B
LFM2-VL-450M-GGUF
LFM2-VL is a new generation of vision models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. Find more details in the original model card: https://huggingface.co/LiquidAI/LFM2-VL-450M
LFM2-VL-3B-GGUF
LFM2-24B-A2B-GGUF
LFM2-2.6B-GGUF
LFM2-350M-GGUF
LFM2.5-1.2B-Thinking-GGUF
LFM2-350M-ENJP-MT-GGUF
Based on the LFM2-350M model, this checkpoint has been fine-tuned for near real-time bi-directional Japanese/English translation of short-to-medium inputs. Find more details in the original model card: https://huggingface.co/LiquidAI/LFM2-350M-ENJP-MT
LFM2-8B-A1B
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.
LFM2-2.6B-Exp-GGUF
LFM2-700M-GGUF
LFM2.5-1.2B-JP-GGUF
LFM2-1.2B-Extract-GGUF
Based on LFM2-1.2B, LFM2-1.2B-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Example use cases:

- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.
LFM2.5-1.2B-Base
LFM2-350M
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. We're releasing the weights of four post-trained checkpoints with 350M, 700M, 1.2B, and 2.6B parameters. They provide the following key features to create AI-powered edge applications:

- Fast training & inference – LFM2 achieves 3x faster training compared to its previous generation. It also benefits from 2x faster decode and prefill speed on CPU compared to Qwen3.
- Best performance – LFM2 outperforms similarly-sized models across multiple benchmark categories, including knowledge, mathematics, instruction following, and multilingual capabilities.
- New architecture – LFM2 is a new hybrid Liquid model with multiplicative gates and short convolutions.
- Flexible deployment – LFM2 runs efficiently on CPU, GPU, and NPU hardware for flexible deployment on smartphones, laptops, or vehicles.

Due to their small size, we recommend fine-tuning LFM2 models on narrow use cases to maximize performance. They are particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. However, we do not recommend using them for knowledge-intensive tasks or tasks that require programming skills.
| Property | LFM2-350M | LFM2-700M | LFM2-1.2B | LFM2-2.6B |
| ------------------- | --------------------- | --------------------- | --------------------- | --------------------- |
| Parameters | 354,483,968 | 742,489,344 | 1,170,340,608 | 2,569,272,320 |
| Layers | 16 (10 conv + 6 attn) | 16 (10 conv + 6 attn) | 16 (10 conv + 6 attn) | 30 (22 conv + 8 attn) |
| Context length | 32,768 tokens | 32,768 tokens | 32,768 tokens | 32,768 tokens |
| Vocabulary size | 65,536 | 65,536 | 65,536 | 65,536 |
| Precision | bfloat16 | bfloat16 | bfloat16 | bfloat16 |
| Training budget | 10 trillion tokens | 10 trillion tokens | 10 trillion tokens | 10 trillion tokens |
| License | LFM Open License v1.0 | LFM Open License v1.0 | LFM Open License v1.0 | LFM Open License v1.0 |

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

Generation parameters: We recommend `temperature=0.3`, `min_p=0.15`, and `repetition_penalty=1.05`.

Chat template: LFM2 uses a ChatML-like chat template. You can automatically apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

Tool use: It consists of four main steps:

1. Function definition: LFM2 takes JSON function definitions as input (JSON objects wrapped in dedicated special tokens), usually in the system prompt.
2. Function call: LFM2 writes Pythonic function calls (a Python list wrapped in dedicated special tokens) as the assistant answer.
3. Function execution: The function call is executed and the result is returned (a string wrapped in dedicated special tokens) as a "tool" role.
4. Final answer: LFM2 interprets the outcome of the function call to address the original user prompt in plain text.

You can directly pass tools as JSON schemas or Python functions to `.apply_chat_template()` to automatically format the system prompt.
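The four-step tool-use flow above can be sketched with plain string formatting. This is a minimal illustration that assumes generic ChatML-style `<|im_start|>`/`<|im_end|>` markers and a made-up `get_weather` tool; in practice, use the tokenizer's own `.apply_chat_template()` so the model's exact special tokens are emitted.

```python
import json

# Hypothetical weather tool, defined as a JSON schema (as it would appear
# in the system prompt). The name and fields are illustrative only.
get_weather = {
    "name": "get_weather",
    "description": "Get the current weather in a city.",
    "parameters": {"city": {"type": "string"}},
}

def format_chatml(messages):
    """Render a conversation in a generic ChatML-like layout.

    This mirrors the structure of LFM2's template, not its exact tokens:
    each turn is wrapped in <|im_start|>{role} ... <|im_end|> markers.
    """
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

conversation = [
    # 1. Function definition in the system prompt.
    {"role": "system", "content": "Tools: " + json.dumps([get_weather])},
    {"role": "user", "content": "What is the weather in Paris?"},
    # 2. The model answers with a Pythonic function call.
    {"role": "assistant", "content": '[get_weather(city="Paris")]'},
    # 3. The executed result comes back as a "tool" turn.
    {"role": "tool", "content": '{"temperature_c": 18, "sky": "cloudy"}'},
    # 4. The model produces the final plain-text answer.
    {"role": "assistant", "content": "It is 18°C and cloudy in Paris."},
]

prompt = format_chatml(conversation)
print(prompt)
```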
Architecture: Hybrid model with multiplicative gates and short convolutions: 10 double-gated short-range LIV convolution blocks and 6 grouped query attention (GQA) blocks.

Pre-training mixture: Approximately 75% English, 20% multilingual, and 5% code data sourced from the web and licensed materials.

Training approach:

- Knowledge distillation using LFM1-7B as the teacher model
- Very large-scale SFT on 50% downstream tasks, 50% general domains
- Custom DPO with length normalization and semi-online datasets
- Iterative model merging

To run LFM2, install Hugging Face `transformers` v4.55 or a more recent version. You can directly run and test the model with this Colab notebook. To serve the model with vLLM, install `vLLM` v0.10.2 or a more recent version. You can also run LFM2 with llama.cpp using its GGUF checkpoint; find more information in the model card.

We recommend fine-tuning LFM2 models on your use cases to maximize performance.

| Notebook | Description | Link |
|-------|------|------|
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. | |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. | |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. | |

LFM2 outperforms similar-sized models across different evaluation categories.
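The transformers snippet referenced above was not preserved in this page. The sketch below shows the typical generation flow under the card's stated assumptions (a `transformers` >= 4.55 install and the recommended sampling parameters); the helper name and prompt are illustrative.

```python
def generate_answer(prompt: str, model_id: str = "LiquidAI/LFM2-1.2B") -> str:
    """Load an LFM2 checkpoint and generate a reply to a single user prompt.

    Imports are kept inside the function so it only needs transformers
    (and a model download) when actually called.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )

    # Format the conversation with the model's own chat template.
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    )

    # Sampling parameters recommended in this card.
    output = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.3,
        min_p=0.15,
        repetition_penalty=1.05,
        max_new_tokens=512,
    )
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

# Usage (triggers a model download):
# print(generate_answer("What is C. elegans?"))
```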
| Model | MMLU | GPQA | IFEval | IFBench | GSM8K | MGSM | MMMLU |
|-------|------|------|--------|---------|-------|------|-------|
| LFM2-350M | 43.43 | 27.46 | 65.12 | 16.41 | 30.1 | 29.52 | 37.99 |
| LFM2-700M | 49.9 | 28.48 | 72.23 | 20.56 | 46.4 | 45.36 | 43.28 |
| LFM2-1.2B | 55.23 | 31.47 | 74.89 | 20.7 | 58.3 | 55.04 | 46.73 |
| Qwen3-0.6B | 44.93 | 22.14 | 64.24 | 19.75 | 36.47 | 41.28 | 30.84 |
| Qwen3-1.7B | 59.11 | 27.72 | 73.98 | 21.27 | 51.4 | 66.56 | 46.51 |
| Llama-3.2-1B-Instruct | 46.6 | 28.84 | 52.39 | 16.86 | 35.71 | 29.12 | 38.15 |
| gemma-3-1b-it | 40.08 | 21.07 | 62.9 | 17.72 | 59.59 | 43.6 | 34.43 |

If you are interested in custom solutions with edge deployment, please contact our sales team.
LFM2-24B-A2B
LFM2-350M-Extract-GGUF
Based on LFM2-350M, LFM2-350M-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Example use cases:

- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.
LFM2.5-1.2B-Thinking
LFM2-2.6B-Exp
LFM2-VL-450M
LFM2-VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications. We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters. They provide:

- 2x faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy
- Flexible architecture with user-tunable speed-quality tradeoffs at inference time
- Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion

Find more about our vision-language model in the LFM2-VL post and its language backbone in the LFM2 blog post. Due to their small size, we recommend fine-tuning LFM2-VL models on narrow use cases to maximize performance. They were trained for instruction following and lightweight agentic flows. Not intended for safety-critical decisions.

| Property | LFM2-VL-450M | LFM2-VL-1.6B |
|---|---:|---:|
| Parameters (LM only) | 350M | 1.2B |
| Vision encoder | SigLIP2 NaFlex base (86M) | SigLIP2 NaFlex shape-optimized (400M) |
| Backbone layers | hybrid conv+attention | hybrid conv+attention |
| Context (text) | 32,768 tokens | 32,768 tokens |
| Image tokens | dynamic, user-tunable | dynamic, user-tunable |
| Vocab size | 65,536 | 65,536 |
| Precision | bfloat16 | bfloat16 |
| License | LFM Open License v1.0 | LFM Open License v1.0 |

Generation parameters: We recommend the following parameters:

- Text: `temperature=0.1`, `min_p=0.15`, `repetition_penalty=1.05`
- Vision: `min_image_tokens=64`, `max_image_tokens=256`, `do_image_splitting=True`

Chat template: LFM2-VL uses a ChatML-like chat template. Images are referenced with a sentinel token, which is automatically replaced with the image tokens by the processor.
You can apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

Architecture:

- Hybrid backbone: Language model tower (LFM2-1.2B or LFM2-350M) paired with SigLIP2 NaFlex vision encoders (400M shape-optimized or 86M base variant)
- Native resolution processing: Handles images up to 512×512 pixels without upscaling and preserves non-standard aspect ratios without distortion
- Tiling strategy: Splits large images into non-overlapping 512×512 patches and includes thumbnail encoding for global context (in the 1.6B model)
- Efficient token mapping: 2-layer MLP connector with pixel unshuffle reduces image tokens (e.g., 256×384 image → 96 tokens, 1000×3000 → 1,020 tokens)
- Inference-time flexibility: User-tunable maximum image tokens and patch count for speed/quality tradeoff without retraining

Training approach:

- Builds on the LFM2 base model with joint mid-training that fuses vision and language capabilities using a gradually adjusted text-to-image ratio
- Applies joint SFT with emphasis on image understanding and vision tasks
- Leverages large-scale open-source datasets combined with in-house synthetic vision data, selected for balanced task coverage
- Follows a progressive training strategy: base model → joint mid-training → supervised fine-tuning

You can run LFM2-VL with Hugging Face `transformers` v4.57 or more recent. You can directly run and test the model with this Colab notebook. We recommend fine-tuning LFM2-VL models on your use cases to maximize performance.

| Notebook | Description | Link |
|-----------|----------------------------------------------------------------------|------|
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
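The transformers example mentioned above is not reproduced in this page. Below is a minimal sketch of the usual image-text-to-text flow, assuming `transformers` >= 4.57 with LFM2-VL support; the helper name and image URL are placeholders.

```python
def describe_image(image_url: str, question: str,
                   model_id: str = "LiquidAI/LFM2-VL-450M") -> str:
    """Ask an LFM2-VL checkpoint a question about one image.

    Imports are deferred so the function only needs transformers and a
    model download when actually called.
    """
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)

    # Interleaved image + text content; the processor inserts the image
    # sentinel and expands it into image tokens.
    conversation = [{
        "role": "user",
        "content": [
            {"type": "image", "url": image_url},
            {"type": "text", "text": question},
        ],
    }]
    inputs = processor.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

    # Low-temperature sampling as recommended in this card.
    output = model.generate(**inputs, do_sample=True, temperature=0.1,
                            min_p=0.15, repetition_penalty=1.05,
                            max_new_tokens=256)
    return processor.batch_decode(output, skip_special_tokens=True)[0]

# Usage (downloads the model and fetches the image):
# print(describe_image("https://example.com/cat.jpg", "What is in this picture?"))
```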
LFM2-VL-1.6B comparison:

| Model | RealWorldQA | MM-IFEval | InfoVQA (Val) | OCRBench | BLINK | MMStar | MMMU (Val) | MathVista | SEEDBenchIMG | MMVet | MME | MMLU |
|-------------------|-------------|-----------|---------------|----------|-------|--------|------------|-----------|---------------|-------|----------|-------|
| InternVL3-2B | 65.10 | 38.49 | 66.10 | 831 | 53.10 | 61.10 | 48.70 | 57.60 | 75.00 | 67.00 | 2186.40 | 64.80 |
| InternVL3-1B | 57.00 | 31.14 | 54.94 | 798 | 43.00 | 52.30 | 43.20 | 46.90 | 71.20 | 58.70 | 1912.40 | - |
| SmolVLM2-2.2B | 57.50 | 19.42 | 37.75 | 725 | 42.30 | 46.00 | 41.60 | 51.50 | 71.30 | 34.90 | 1792.50 | - |
| LFM2-VL-1.6B | 65.23 | 37.66 | 58.68 | 742 | 44.40 | 49.53 | 38.44 | 51.10 | 71.97 | 48.07 | 1753.04 | 50.99 |

LFM2-VL-450M comparison:

| Model | RealWorldQA | MM-IFEval | InfoVQA (Val) | OCRBench | BLINK | MMStar | MMMU (Val) | MathVista | SEEDBenchIMG | MMVet | MME | MMLU |
|-------------------|-------------|-----------|---------------|----------|-------|--------|------------|-----------|---------------|-------|----------|-------|
| SmolVLM2-500M | 49.90 | 11.27 | 24.64 | 609 | 40.70 | 38.20 | 34.10 | 37.50 | 62.20 | 29.90 | 1448.30 | - |
| LFM2-VL-450M | 52.29 | 26.18 | 46.51 | 655 | 41.98 | 40.87 | 33.11 | 44.70 | 63.50 | 33.76 | 1239.06 | 40.16 |

We obtained MM-IFEval and InfoVQA (Val) scores for InternVL 3 and SmolVLM2 models using VLMEvalKit. If you are interested in custom solutions with edge deployment, please contact our sales team.
LFM2-ColBERT-350M
LFM2-ColBERT-350M is a late interaction retriever with excellent multilingual performance. It allows you to store documents in one language (for example, a product description in English) and retrieve them in many languages with high accuracy.

- LFM2-ColBERT-350M offers best-in-class accuracy across different languages.
- Inference speed is on par with models 2.3 times smaller, thanks to the efficient LFM2 backbone.
- You can use it as a drop-in replacement in your current RAG pipelines to improve performance.

Find more information about LFM2-ColBERT-350M in our blog post.

> [!NOTE]
> 🚀 Try our demo: https://huggingface.co/spaces/LiquidAI/LFM2-ColBERT

Late interaction retrievers like LFM2-ColBERT-350M are particularly interesting because they preserve much of the expressivity of re-rankers while retaining the efficiency of bi-encoders. In practice, they're used both to retrieve documents at scale (like bi-encoders) and to rank them at the same time (like re-rankers). We recommend using this model for various RAG use cases, such as:

- E-commerce: Find products across many languages with semantic search at scale.
- On-device semantic search: Ask questions to your phone in natural language to retrieve files, emails, and notes.
- Enterprise knowledge assistants: Retrieve internal legal, financial, and technical documents in different languages.

| Property | LFM2-ColBERT-350M |
| --------------------- | ------------------------------ |
| Total parameters | 353,322,752 |
| Layers | 17 (10 conv + 6 attn + 1 dense)|
| Context length | 32,768 tokens |
| Vocabulary size | 65,536 |
| Training precision | BF16 |
| License | LFM Open License v1.0 |

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

First, install the PyLate and transformers libraries. Use this model with PyLate to index and retrieve documents. The index uses FastPLAID for efficient similarity search.
Load LFM2-ColBERT-350M and initialize the PLAID index, then encode and index your documents. Note that you do not have to recreate the index and encode the documents every time: once you have created an index and added the documents, you can re-use the index later by loading it.

Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries. To do so, initialize the ColBERT retriever with the index you want to search in, encode the queries, and then retrieve the top-k documents to get the top matches' ids and relevance scores.

Reranking: If you only want to use LFM2-ColBERT-350M to perform reranking on top of your first-stage retrieval pipeline without building an index, you can simply use the `rank` function and pass the queries and documents to rerank.

We extended the NanoBEIR benchmark to include Japanese and Korean languages. We open-sourced this dataset on Hugging Face at LiquidAI/nanobeir-multilingual-extended for reproducibility. On this NanoBEIR benchmark, LFM2-ColBERT-350M displays significantly stronger multilingual capabilities (especially in German, Arabic, Korean, and Japanese) while maintaining English performance.

Even more interestingly, LFM2-ColBERT-350M is an excellent cross-lingual retriever, meaning it is capable of retrieving documents based on queries from other languages. This is ideal for client-facing applications, like e-commerce, where a description might be in English but the query is in another language.
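The PyLate code referenced above is not included in this page. The sketch below outlines the index-then-retrieve flow; it assumes a recent PyLate release, and the API names used here (`models.ColBERT`, `indexes.PLAID`, `retrieve.ColBERT`, `encode(..., is_query=...)`) should be double-checked against the version you install.

```python
def build_and_search(documents: dict, queries: list, k: int = 5):
    """Index documents with LFM2-ColBERT-350M and retrieve the top-k per query.

    `documents` maps document ids to their text. Imports are deferred so
    this file loads without PyLate installed.
    """
    from pylate import indexes, models, retrieve

    model = models.ColBERT(model_name_or_path="LiquidAI/LFM2-ColBERT-350M")

    # Create (or overwrite) a PLAID index on disk.
    index = indexes.PLAID(index_folder="plaid-index",
                          index_name="docs", override=True)

    # Encode documents (is_query=False) and add them to the index.
    doc_ids = list(documents)
    doc_embeddings = model.encode([documents[i] for i in doc_ids],
                                  is_query=False)
    index.add_documents(documents_ids=doc_ids,
                        documents_embeddings=doc_embeddings)

    # Encode queries (is_query=True) and search the index.
    retriever = retrieve.ColBERT(index=index)
    query_embeddings = model.encode(queries, is_query=True)
    return retriever.retrieve(queries_embeddings=query_embeddings, k=k)

# Usage (downloads the model on first run):
# hits = build_and_search({"d1": "LFM2 is a hybrid edge model."},
#                         ["What is LFM2?"])
```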
LFM2-ColBERT-350M works especially well for English, French, Spanish, Italian, Portuguese, and German, as shown with these NDCG@10 scores on NanoBEIR (rows and columns are language codes; the diagonal corresponds to monolingual retrieval):

| | AR | DE | EN | ES | FR | IT | JA | KO | PT | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| AR | 0.490 | 0.288 | 0.339 | 0.303 | 0.304 | 0.286 | 0.357 | 0.338 | 0.291 | 33.30% |
| DE | 0.383 | 0.563 | 0.547 | 0.498 | 0.502 | 0.489 | 0.424 | 0.368 | 0.486 | 47.33% |
| EN | 0.416 | 0.554 | 0.661 | 0.553 | 0.551 | 0.522 | 0.477 | 0.395 | 0.535 | 51.82% |
| ES | 0.412 | 0.514 | 0.578 | 0.563 | 0.547 | 0.529 | 0.436 | 0.394 | 0.547 | 50.21% |
| FR | 0.408 | 0.527 | 0.573 | 0.552 | 0.564 | 0.537 | 0.450 | 0.388 | 0.549 | 50.53% |
| IT | 0.395 | 0.512 | 0.554 | 0.535 | 0.535 | 0.543 | 0.439 | 0.386 | 0.529 | 49.20% |
| JA | 0.375 | 0.365 | 0.409 | 0.358 | 0.345 | 0.337 | 0.557 | 0.491 | 0.330 | 39.63% |
| KO | 0.326 | 0.274 | 0.310 | 0.282 | 0.265 | 0.266 | 0.440 | 0.527 | 0.271 | 32.89% |
| PT | 0.402 | 0.499 | 0.558 | 0.545 | 0.528 | 0.529 | 0.436 | 0.382 | 0.547 | 49.17% |
| AVG | 40.07% | 45.51% | 50.32% | 46.54% | 46.00% | 44.86% | 44.62% | 40.78% | 45.38% | |

In comparison, GTE-ModernColBERT-v1 consistently gets lower scores when documents and queries are not in the same language:

| | AR | DE | EN | ES | FR | IT | JA | KO | PT | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| AR | 0.309 | 0.089 | 0.107 | 0.089 | 0.094 | 0.092 | 0.070 | 0.049 | 0.087 | 10.96% |
| DE | 0.039 | 0.499 | 0.454 | 0.362 | 0.393 | 0.367 | 0.133 | 0.061 | 0.361 | 29.65% |
| EN | 0.042 | 0.408 | 0.680 | 0.446 | 0.484 | 0.420 | 0.167 | 0.073 | 0.438 | 35.08% |
| ES | 0.044 | 0.360 | 0.485 | 0.525 | 0.465 | 0.437 | 0.149 | 0.061 | 0.487 | 33.48% |
| FR | 0.044 | 0.381 | 0.505 | 0.455 | 0.546 | 0.428 | 0.136 | 0.057 | 0.467 | 33.35% |
| IT | 0.043 | 0.369 | 0.449 | 0.446 | 0.451 | 0.516 | 0.143 | 0.054 | 0.448 | 32.36% |
| JA | 0.031 | 0.169 | 0.250 | 0.172 | 0.177 | 0.169 | 0.459 | 0.059 | 0.165 | 18.35% |
| KO | 0.030 | 0.134 | 0.169 | 0.127 | 0.133 | 0.125 | 0.090 | 0.368 | 0.124 | 14.45% |
| PT | 0.043 | 0.368 | 0.479 | 0.492 | 0.467 | 0.448 | 0.138 | 0.062 | 0.530 | 33.63% |
| AVG | 6.94% | 30.84% | 39.75% | 34.59% | 35.68% | 33.35% | 16.53% | 9.37% | 34.24% | |

This makes retrieval a lot more reliable and can replace architectures with multiple models with a single, unified retriever. Despite being more than twice as big, LFM2-ColBERT-350M demonstrates throughput on par with GTE-ModernColBERT-v1 for query and document encoding across various batch sizes. Query encoding was evaluated using realistic query patterns from datasets like MS MARCO and Natural Questions.
Document encoding was measured on realistic documents with varying lengths and domains. If you are interested in custom solutions with edge deployment, please contact our sales team. Please cite the PyLate library if you use it for inference or training.
LFM2.5-Audio-1.5B-GGUF
LFM2-VL-3B
LFM2-700M
LFM2.5-350M-GGUF
LFM2-1.2B-Tool-GGUF
Based on LFM2-1.2B, LFM2-1.2B-Tool is designed for concise and precise tool calling. The key challenge was designing a non-thinking model that outperforms similarly sized thinking models for tool use. Example use cases:

- Mobile and edge devices requiring instant API calls, database queries, or system integrations without cloud dependency.
- Real-time assistants in cars, IoT devices, or customer support, where response latency is critical.
- Resource-constrained environments like embedded systems or battery-powered devices needing efficient tool execution.

You can find more information about other task-specific models in this blog post.
LFM2-1.2B-GGUF
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. Find more details in the original model card: https://huggingface.co/LiquidAI/LFM2-1.2B
LFM2-VL-1.6B-GGUF
LFM2-VL is a new generation of vision models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. Find more details in the original model card: https://huggingface.co/LiquidAI/LFM2-VL-1.6B
LFM2.5-Audio-1.5B-GGUF-LEAP
LFM2-8B-A1B-GGUF
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.
LFM2-2.6B
LFM2-VL-1.6B
LFM2.5-1.2B-Thinking-MLX-8bit
LFM2-Audio-1.5B-GGUF
LFM2.5-350M
LFM2.5-1.2B-Thinking-ONNX
LFM2-1.2B-Extract
Based on LFM2-1.2B, LFM2-1.2B-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Example use cases:

- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.

Generation parameters: We strongly recommend using greedy decoding with `temperature=0`.

System prompt: If no system prompt is provided, the model will default to JSON outputs. We recommend providing a system prompt with a specific format (JSON, XML, or YAML) and a given schema to improve accuracy.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Chat template: LFM2 uses a ChatML-like chat template. You can automatically apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

> [!WARNING]
> ⚠️ The model is intended for single-turn conversations.

The data used for training these models was primarily synthetic, which allowed us to ensure a diverse data mix. We used a range of document types, domains, styles, lengths, and languages. We also varied the density and distribution of relevant text in the documents: in some cases, the extracted information was clustered in one part of the document; in others, it was spread throughout. We applied the same approach of ensuring diversity when creating synthetic user requests and designing the structure of the model outputs. The data generation process underwent many iterations, incorporating ideas and feedback from across the Liquid AI team.
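As a concrete illustration of the recommended setup, the snippet below builds a format-constraining system prompt and validates a reply against it. The invoice schema, field names, and the sample reply are all made up for this sketch; in real use the reply would come from the model.

```python
import json

# Hypothetical schema for invoice extraction; field names are illustrative.
INVOICE_SCHEMA = {"invoice_number": "string", "total_amount": "number",
                  "currency": "string"}

# System prompt pinning the output format and schema, as the card recommends.
system_prompt = (
    "Extract the relevant information as JSON following this schema:\n"
    + json.dumps(INVOICE_SCHEMA)
)

def validate_extraction(raw_output: str) -> dict:
    """Check that a model reply is valid JSON and covers the schema keys."""
    data = json.loads(raw_output)          # syntax check (valid JSON)
    missing = set(INVOICE_SCHEMA) - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data

# A plausible (hand-written, not model-generated) reply for demonstration:
sample_reply = '{"invoice_number": "INV-001", "total_amount": 99.5, "currency": "EUR"}'
print(validate_extraction(sample_reply))
```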
We evaluated LFM2-Extract on a dataset of 5,000 documents, covering over 100 topics with a mix of writing styles, ambiguities, and formats. We used a combination of five metrics to capture a balanced view of syntax, accuracy, and faithfulness:

- Syntax score: Checks whether outputs parse cleanly as valid JSON, XML, or YAML.
- Format accuracy: Verifies that outputs match the requested format (e.g., JSON when JSON is requested).
- Keyword faithfulness: Measures whether values in the structured output actually appear in the input text.
- Absolute scoring: A judge LLM scores quality on a 1-5 scale, assessing completeness and correctness of extractions.
- Relative scoring: We ask a judge LLM to choose the best answer between the extraction model's output and the ground-truth answer.

LFM2-1.2B-Extract can output complex objects in different languages on a level higher than Gemma 3 27B, a model 22.5 times its size.

- Hugging Face: LFM2-1.2B
- llama.cpp: LFM2-1.2B-Extract-GGUF
- LEAP: LEAP model library

You can use the following Colab notebooks for easy inference and fine-tuning:

| Notebook | Description | Link |
|-------|------|------|
| Inference | Run the model with Hugging Face's transformers library. | |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. | |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. | |
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. | |

If you are interested in custom solutions with edge deployment, please contact our sales team.
LFM2.5-1.2B-Instruct-MLX-8bit
LFM2.5-1.2B-JP
LFM2.5-Audio-1.5B
LFM2-1.2B-RAG-GGUF
Based on LFM2-1.2B, LFM2-1.2B-RAG is specialized in answering questions based on provided contextual documents, for use in RAG (Retrieval-Augmented Generation) systems. Example use cases:

- Chatbot to ask questions about the documentation of a particular product.
- Customer support with an internal knowledge base to provide grounded answers.
- Academic research assistant with multi-turn conversations about research papers and course materials.

You can find more information about other task-specific models in this blog post.
LFM2-350M-Extract
Based on LFM2-350M, LFM2-350M-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML. Example use cases:

- Extracting invoice details from emails into structured JSON.
- Converting regulatory filings into XML for compliance systems.
- Transforming customer support tickets into YAML for analytics pipelines.
- Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.

Generation parameters: We strongly recommend using greedy decoding with `temperature=0`.

System prompt: If no system prompt is provided, the model will default to JSON outputs. We recommend providing a system prompt with a specific format (JSON, XML, or YAML) and a given schema to improve accuracy.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Chat template: LFM2 uses a ChatML-like chat template. You can automatically apply it using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

> [!WARNING]
> ⚠️ The model is intended for single-turn conversations.

The data used for training these models was primarily synthetic, which allowed us to ensure a diverse data mix. We used a range of document types, domains, styles, lengths, and languages. We also varied the density and distribution of relevant text in the documents: in some cases, the extracted information was clustered in one part of the document; in others, it was spread throughout. We applied the same approach of ensuring diversity when creating synthetic user requests and designing the structure of the model outputs. The data generation process underwent many iterations, incorporating ideas and feedback from across the Liquid AI team.
We evaluated LFM2-Extract on a dataset of 5,000 documents covering over 100 topics with a mix of writing styles, ambiguities, and formats. We used a combination of five metrics to capture a balanced view of syntax, accuracy, and faithfulness:

- Syntax score: checks whether outputs parse cleanly as valid JSON, XML, or YAML.
- Format accuracy: verifies that outputs match the requested format (e.g., JSON when JSON is requested).
- Keyword faithfulness: measures whether values in the structured output actually appear in the input text.
- Absolute scoring: a judge LLM scores quality on a 1-5 scale, assessing completeness and correctness of extractions.
- Relative scoring: a judge LLM chooses the best answer between the extraction model's output and the ground-truth answer.

LFM2-350M-Extract outperforms Gemma 3 4B at this task, a model more than 11x its size.

- Hugging Face: LFM2-350M
- llama.cpp: LFM2-350M-Extract-GGUF
- LEAP: LEAP model library

You can use the following Colab notebooks for easy inference and fine-tuning:

| Notebook | Description | Link |
|-------|------|------|
| Inference | Run the model with Hugging Face's transformers library. | |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. | |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. | |
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. | |

If you are interested in custom solutions with edge deployment, please contact our sales team.
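The first and third metrics above are mechanical checks that can be sketched in a few lines. This is an assumed re-implementation for illustration, restricted to JSON and flat string values, not the exact evaluation code used by Liquid AI:

```python
import json

def syntax_score(output: str) -> bool:
    """Does the output parse cleanly as JSON? (XML/YAML would use their own parsers.)"""
    try:
        json.loads(output)
        return True
    except ValueError:
        return False

def keyword_faithfulness(output: str, source: str) -> float:
    """Fraction of extracted string values that literally appear in the source text."""
    values = [v for v in json.loads(output).values() if isinstance(v, str)]
    if not values:
        return 1.0
    return sum(v in source for v in values) / len(values)

source = "Invoice INV-0042 from Acme Corp, total due 129.50 USD."
output = '{"vendor": "Acme Corp", "invoice_number": "INV-0042"}'
print(syntax_score(output), keyword_faithfulness(output, source))  # True 1.0
```

Keyword faithfulness is a useful hallucination probe: any extracted value that never occurs in the input lowers the score.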
LFM2-2.6B-Transcript-GGUF
LFM2-350M-ENJP-MT
Based on the LFM2-350M model, this checkpoint has been fine-tuned for near real-time, bi-directional Japanese/English translation of short-to-medium inputs. LFM2-350M-ENJP-MT delivers translation quality on par with models more than 10 times its size.

Below are sample translations produced by the model. They are meant to give you a feel for its strengths and typical style in both directions (English ➡️ Japanese and Japanese ➡️ English). They include a mix of everyday text, technical descriptions, business communication, and news reporting, so you can gauge performance across different domains.

These examples demonstrate the model's strength in product descriptions, technical passages, and formal explanations when translating into Japanese.

English: Fully Tested and Works Properly. 6 Months Warranty included! Item pictured is the actual item for sale. See above for full description, condition, and comments.
Japanese: 「完全試験済みで正しく動作しています。保証期間は6ヶ月付属!」。 写真に写っている商品が販売されている実物です。 詳しく、状態、コメントは上記参照してください。

English: Emphasis on human-AI collaboration. Instead of focusing solely on making fully autonomous AI systems, we are excited to build multimodal systems that work with people collaboratively.
Japanese: 人とAIのコラボレーションに重点を置く。完全自律型AIシステムの構築にのみ焦点を当てるのではなく、人と協調して働くマルチモーダルシステムを構築できることに興奮しています。

English: If your equipment fails due to normal use, please contact our customer service department so that we can assist you, We will repair or replace your equipment at our discretion. In some situations, we may choose to refund the full purchase price of an item.
Japanese: ご使用中の機器が通常使用により故障した場合は、お手伝いできるよう弊社カスタマーサービス部門にご連絡ください。 弊社の判断で機器の修理または交換を行います。状況によっては、製品の購入価格全額を返金する場合があります。

English: 2k USD to start for basic, 200 dollars for additional version. - 50% of full amount of deposit, - 3 proposals - end of month(3 drafts), will choose 1 and make final changes based on it - Present another final version in a week
Japanese: 基本版から始めるのに2,000ドル、追加バージョンでは200ドルの手数料が必要です。 - 保証金全額の50%が支払われる、 - 3つの案 - 月末(ドラフト3回分)、その案に基づいて1つを選んで最終的な変更を行う - さらに1週間後に別の最終版を提出すること

English: Lifestyle risk factors with strong evidence include lack of exercise, cigarette smoking, alcohol, and obesity. The risk of colon cancer can be reduced by maintaining a normal body weight through a combination of sufficient exercise and eating a healthy diet.
Japanese: 強力な証拠がある生活習慣のリスク要因としては、運動不足、喫煙、飲酒、肥満などが挙げられ、十分な運動と健康的な食生活の組み合わせによる正常な体重維持を通じて、大腸がんの発症リスクを減らすことができる。

These examples demonstrate the model's ability to preserve nuance in news reporting, colloquial phrasing, and business contexts when translating into English.

Japanese: モデルからの回答は英語でもOKなのですよね。
English: The answers from the models are okay in English, right?

Japanese: 手間のかかるメルマガ作成作業、もっとラクに、もっと速くできたら——。 そう考えたことはありませんか?
English: Have you ever wondered if you could create a cumbersome email newsletter more easily and quickly?

Japanese: X JAPANのYOSHIKIが、アニメ『ダンダダン』でグループの代表曲をオマージュした劇中歌が使用されたことを指摘して始まった議論。 8月22日には『ダンダダン』サイドが公式Xで騒動を謝罪、YOSHIKIも『ダンダダン』サイドと和解を報告したが、これに物言いをつけたのが、弁護士の紀藤正樹氏だった。
English: The discussion began with the point that Yoshiki of X JAPAN mentioned that a song in the anime Dandadan paying homage to the group's signature tune was used as an insert song. On August 22nd, the Dandadan side apologized on their official X page for the controversy, and Yoshiki also reported a reconciliation with the Dandadan side, but lawyer Masaki Kitō objected.
Japanese: (ブルームバーグ): SOMPOホールディングスは27日夜、米国などを中心に展開する損害保険会社のアスペン・インシュアランス・ホールディングスを買収すると発表した。買収総額は約5200億円となる。 ニューヨーク証券取引所に上場しているアスペンの株式を1株当たり37.5ドル(約5600円)で全株を取得する。26日の終値を16%上回る水準。2026年上期中に買収手続きを完了する予定。 買収資金は手元資金を充てる。 SOMPOにとっては17年に米損保エンデュランス・スペシャルティ・ホールディングスを約6400億円で買収して以来の大型案件となる。 人口減少で国内市場の縮小が見込まれる中、買収によって海外保険ビジネスの規模や収益を拡大し、再保険取引による安定的な収益の寄与も見込む。
English: (Bloomberg): SOMPO Holdings announced on the evening of the 27th that it will acquire Aspen Insurance Holdings, a non-life insurance company operating primarily in the United States and elsewhere, for approximately ¥520 billion. The acquisition will involve the purchase of all of Aspen's shares listed on the New York Stock Exchange for $37.5 per share (approximately ¥5,600). This surpasses the closing price of the day by 16% and is scheduled to be completed within the first half of 2026. Funds for the acquisition will be provided from the company's own capital. For SOMPO, this is the largest acquisition since its 2017 acquisition of Endurance Specialty Holdings for approximately ¥640 billion. The acquisition is expected to expand the scale and revenue of its overseas insurance business amidst anticipated shrinking domestic markets due to population decline, and is also expected to contribute to stable revenue through reinsurance transactions.

Japanese: 28歳にしてつかんだイングランドサッカー界でのチャンスを生かせるか。 チャンピオンシップ(英2部)の古豪ブラックバーンに電撃移籍した森下龍矢は意気込んでいる。 サガン鳥栖と名古屋グランパスでプレーし、2024年から海を渡ってレギア・ワルシャワで奮闘してきた森下は先日、大橋祐紀のチームメイトとなることが決まった。 日本ではSBが主戦場だった森下だが、昨季はポーランドで攻撃的なポジションにコンバートされ、ウィングやトップ下に前線と様々な役割をこなした。 すると、公式戦で14得点、14アシストとブレイク。 この飛躍に注目したブラックバーンに引き抜かれている。
English: Can he capitalize on his chance in English football, which he seized at the age of 28? Ryuya Morishita, having made a shocking move to Blackburn Rovers, a long-established club in the Championship (British second tier), is eager to make an impression. Having played for Sagan Tosu and Nagoya Grampus, and having been striving with Legia Warsaw since 2024, Morishita recently announced he would become teammates with Yuki Ohashi. For Morishita, his primary playing field in Japan was as a full-back, but he was converted to an attacking position in Poland last season, playing in various roles including wing-back and attacking midfielder. He then broke through, scoring 14 goals and providing 14 assists in official matches. The Blackburn club has been scouting for this promising player.

> [!NOTE]
> 📝 While LFM2-350M-ENJP-MT delivers strong out-of-the-box general-purpose English ↔️ Japanese translation, our primary
> goal is to provide a versatile, community-empowering base model—a foundation designed to make it easy to build
> best-in-class, task-specific translation systems.
>
> Like any base model, there are open areas for growth—in particular with extreme context lengths and specialized or
> context-sensitive translations, such as:
> - Technical & professional language (medical, legal, engineering)
> - Novel proper nouns (new products, brands, cultural references)
> - Industry-, domain-, or company-specific nuance (e-commerce, finance, internal corporate terminology)
>
> These are precisely the kinds of challenges that fine-tuning—by both Liquid AI and our developer community—can
> address. We see this model not just as an endpoint, but as a catalyst for a rich ecosystem of fine-tuned translation
> models tailored to real-world needs.

Generation parameters: We recommend the following sampling parameters:

System prompts: LFM2-ENJP-MT requires one of the two following system prompts:
- "Translate to Japanese." for English to Japanese translation.
- "Translate to English." for Japanese to English translation.

> [!WARNING]
> ⚠️ The model cannot work as intended without one of these two system prompts.

The chat template can be applied using the dedicated `.apply_chat_template()` function from Hugging Face transformers.
However, you must supply the system prompt that specifies the translation direction.

> [!WARNING]
> ⚠️ The model is intended for single-turn conversations.

- Hugging Face: LFM2-350M
- llama.cpp: LFM2-350M-ENJP-MT-GGUF
- LEAP: LEAP model library

You can use the following Colab notebooks for easy inference and fine-tuning:

| Notebook | Description | Link |
|-------|------|------|
| Inference | Run the model with Hugging Face's transformers library. | |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. | |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. | |
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. | |

If you are interested in custom solutions with edge deployment, please contact our sales team.
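The required direction-setting system prompts can be wired into standard Hugging Face chat messages; this is a minimal sketch using the generic role/content message format that `apply_chat_template()` consumes, not a format specific to this card:

```python
# The two system prompts required by LFM2-350M-ENJP-MT, keyed by direction.
SYSTEM_PROMPTS = {
    "en-ja": "Translate to Japanese.",
    "ja-en": "Translate to English.",
}

def build_translation_messages(text: str, direction: str) -> list:
    """Build a single-turn chat with the mandatory system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[direction]},
        {"role": "user", "content": text},
    ]

messages = build_translation_messages("Hello, world.", "en-ja")
```

A message list like this can then be passed to `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` for inference.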
LFM2-350M-Math
LFM2.5-1.2B-Thinking-MLX-4bit
LFM2-350M-PII-Extract-JP
Based on LFM2-350M, this checkpoint is designed to extract personally identifiable information (PII) from Japanese text and output it in JSON format. The output can then be used to mask out sensitive information in contracts, emails, personal medical reports, insurance bills, etc. directly on-device. In particular, it is trained to extract:
- Address/locations (JSON key: `address`)
- Company/institute/organization names (JSON key: `company_name`)
- Email addresses (JSON key: `email_address`)
- Human na...
LFM2-1.2B-RAG
Based on LFM2-1.2B, LFM2-1.2B-RAG is specialized in answering questions grounded in provided contextual documents, for use in RAG (Retrieval-Augmented Generation) systems.

- Chatbot to ask questions about the documentation of a particular product.
- Customer support with an internal knowledge base to provide grounded answers.
- Academic research assistant with multi-turn conversations about research papers and course materials.

You can find more information about other task-specific models in this blog post.

Generation parameters: We recommend using greedy decoding with `temperature=0`.

System prompt: The system prompt is optional. You can force the output language, for example with "Always respond in English, regardless of the user's input language." By default, the output language follows the user prompt's language.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Training approach: We fine-tuned LFM2-1.2B-RAG on a dataset of more than 1M samples of multi-turn and multi-document interactions, drawn from a mix of curated open-source documents and generated synthetic ones.

Chat template: LFM2 uses a ChatML-like chat template. You can apply it automatically using the dedicated `.apply_chat_template()` function from Hugging Face transformers.

> [!WARNING]
> ⚠️ The model supports both single-turn and multi-turn conversations.

RAG systems enable AI solutions to include new, up-to-date, and potentially proprietary information in LLM responses that was not present in the training data. When a user asks a question, the retrieval component locates and delivers related documents from a knowledge base, and the RAG generator model then answers the question based on facts from those contextual documents.
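The retrieval-then-generate flow described above can be sketched as follows. The way documents are packed into the user turn here is an illustrative assumption, not the model's prescribed input format:

```python
# Illustrative sketch: pack retrieved documents into the user turn so the
# RAG generator answers from the provided context, with an optional system
# prompt to force the output language (as the card suggests).
def build_rag_messages(question, documents, language=None):
    context = "\n\n".join(
        f"Document {i + 1}:\n{doc}" for i, doc in enumerate(documents)
    )
    messages = []
    if language:
        messages.append({
            "role": "system",
            "content": f"Always respond in {language}, regardless of the user's input language.",
        })
    messages.append({"role": "user", "content": f"{context}\n\nQuestion: {question}"})
    return messages

msgs = build_rag_messages(
    "What is the warranty period?",
    ["All devices ship with a 6-month warranty."],
    language="English",
)
```

In a full system, `documents` would come from the retrieval component (e.g. a vector store) rather than being passed in by hand.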
- Hugging Face: LFM2-1.2B
- llama.cpp: LFM2-1.2B-Extract-GGUF
- LEAP: LEAP model library

You can use the following Colab notebooks for easy inference and fine-tuning:

| Notebook | Description | Link |
|-------|------|------|
| Inference | Run the model with Hugging Face's transformers library. | |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. | |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. | |
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. | |

If you are interested in custom solutions with edge deployment, please contact our sales team.
LFM2-350M-PII-Extract-JP-GGUF
Based on LFM2-350M, this checkpoint is designed to extract personally identifiable information (PII) from Japanese text and output it in JSON format. The output can then be used to mask out sensitive information in contracts, emails, personal medical reports, insurance bills, etc. directly on-device. Find more details in the original model card: https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP

- Extract addresses, company/institution names, email addresses, human names, and phone numbers from Japanese text.
- Specifying a particular quantization scheme (e.g. `Q8_0`): several quantization variants are available (`Q4_0`, `Q4_K_M`, `Q5_K_M`, `Q6_K`, `Q8_0`, and `F16`).
- Only extracting particular entities (e.g. only extract `address` and `company_name`):
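A typical downstream step is masking the extracted spans in the original text. The sketch below assumes a JSON output mapping entity keys to lists of strings; the sample output and the `human_names` key are hypothetical illustrations, not a real model response:

```python
import json

def mask_pii(text: str, extraction_json: str, placeholder: str = "[MASKED]") -> str:
    """Replace every extracted PII value in the text with a placeholder."""
    entities = json.loads(extraction_json)
    for values in entities.values():
        # Tolerate both a single string and a list of strings per key.
        for value in (values if isinstance(values, list) else [values]):
            text = text.replace(value, placeholder)
    return text

text = "田中太郎さんの連絡先は taro@example.jp です。"
# Hypothetical extractor output (keys follow the card's JSON-key convention):
extracted = '{"human_names": ["田中太郎"], "email_address": ["taro@example.jp"]}'
masked = mask_pii(text, extracted)  # "[MASKED]さんの連絡先は [MASKED] です。"
```

Because masking is plain string replacement, the whole redaction pipeline stays on-device alongside the model.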
LFM2.5-1.2B-Instruct-ONNX
LFM2-1.2B-Tool
Based on LFM2-1.2B, LFM2-1.2B-Tool is designed for concise and precise tool calling. The key challenge was designing a non-thinking model that outperforms similarly sized thinking models at tool use.

- Mobile and edge devices requiring instant API calls, database queries, or system integrations without cloud dependency.
- Real-time assistants in cars, IoT devices, or customer support, where response latency is critical.
- Resource-constrained environments like embedded systems or battery-powered devices needing efficient tool execution.

You can find more information about other task-specific models in this blog post.

Generation parameters: We recommend using greedy decoding with `temperature=0`.

System prompt: The system prompt must provide all the available tools.

Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

Tool use: It consists of four main steps:
1. Function definition: LFM2 takes JSON function definitions as input (JSON objects between the `<|tool_list_start|>` and `<|tool_list_end|>` special tokens), usually in the system prompt.
2. Function call: LFM2 writes Pythonic function calls (a Python list between the `<|tool_call_start|>` and `<|tool_call_end|>` special tokens) as the assistant answer.
3. Function execution: The function call is executed and the result is returned (a string between the `<|tool_response_start|>` and `<|tool_response_end|>` special tokens) as a "tool" role.
4. Final answer: LFM2 interprets the outcome of the function call to answer the original user prompt in plain text.

Here is a simple example of a conversation using tool use:

> [!WARNING]
> ⚠️ The model supports both single-turn and multi-turn conversations.

For edge inference, latency is a crucial factor in delivering a seamless and satisfactory user experience. Consequently, while test-time compute inherently provides more accuracy, it ultimately compromises the user experience due to increased waiting times for function calls.
Therefore, the goal was to develop a tool-calling model that is competitive with thinking models, yet operates without any internal chain-of-thought process. We evaluated each model on a proprietary benchmark specifically designed to prevent data contamination, ensuring that performance metrics reflect genuine tool-calling capabilities rather than memorized patterns from training data.

- Hugging Face: LFM2-350M
- llama.cpp: LFM2-350M-Extract-GGUF
- LEAP: LEAP model library

You can use the following Colab notebooks for easy inference and fine-tuning:

| Notebook | Description | Link |
|-------|------|------|
| Inference | Run the model with Hugging Face's transformers library. | |
| SFT (TRL) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using TRL. | |
| DPO (TRL) | Preference alignment with Direct Preference Optimization (DPO) using TRL. | |
| SFT (Axolotl) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Axolotl. | |
| SFT (Unsloth) | Supervised Fine-Tuning (SFT) notebook with a LoRA adapter using Unsloth. | |

If you are interested in custom solutions with edge deployment, please contact our sales team.
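The function-call step of the tool-use loop (a Pythonic list of calls emitted by the model) can be parsed safely without executing anything, using Python's `ast` module. This is a sketch under the assumption that the surrounding special tokens have already been stripped from the decoded text; it is not Liquid AI's reference implementation:

```python
import ast

def parse_tool_calls(call_text: str) -> list:
    """Parse a Pythonic tool-call list into names and keyword arguments."""
    tree = ast.parse(call_text, mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list of calls")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call):
            raise ValueError("expected a function call")
        calls.append({
            "name": node.func.id,
            # literal_eval only accepts constants, so arbitrary code is rejected.
            "kwargs": {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords},
        })
    return calls

calls = parse_tool_calls('[get_weather(city="Paris", unit="celsius")]')
# calls[0] -> {"name": "get_weather", "kwargs": {"city": "Paris", "unit": "celsius"}}
```

Parsing with `ast` rather than `eval` keeps the execution step under the application's control, which matters on-device where the dispatched calls touch real APIs.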
LFM2-350M-Math-GGUF
Based on LFM2-350M, LFM2-350M-Math is a tiny reasoning model designed for tackling tricky math problems. You can find more information about other task-specific models in this blog post.
LFM2-2.6B-Transcript
LFM2.5-VL-450M-ONNX
LFM2-Audio-1.5B
LFM2-Audio-1.5B is Liquid AI's first end-to-end audio foundation model. Designed with low latency and real-time conversation in mind, at only 1.5 billion parameters LFM2-Audio enables seamless conv...
LFM2.5-1.2B-Thinking-MLX-bf16
LFM2.5-1.2B-Instruct-MLX-bf16
LFM2.5-VL-1.6B-ONNX
LFM2-24B-A2B-ONNX
LFM2.5-1.2B-Instruct-MLX-4bit
LFM2.5-350M-MLX-8bit
LFM2-8B-A1B-ONNX
LFM2.5-1.2B-Instruct-MLX-6bit
LFM2.5-Audio-1.5B-ONNX
LFM2.5-1.2B-Thinking-MLX-5bit
LFM2.5-350M-MLX-4bit
LFM2.5-1.2B-Thinking-MLX-6bit
LFM2.5-1.2B-JP-ONNX
LFM2.5-350M-MLX-bf16
LFM2.5-1.2B-Instruct-MLX-5bit
LFM2.5-1.2B-JP-MLX-4bit
LFM2.5-1.2B-Base-ONNX
LFM2.5-1.2B-JP-MLX-8bit
LFM2.5-350M-MLX-5bit
LFM2.5-350M-MLX-6bit
LFM2.5-1.2B-JP-MLX-bf16
LFM2.5-1.2B-JP-MLX-6bit
LFM2-2.6B-Transcript-ONNX
LFM2.5-1.2B-JP-MLX-5bit
LFM2.5-350M-ONNX
LFM2-24B-A2B-MLX-4bit
LFM2.5-VL-450M
LeapBundles
LFM2.5-1.2B-Base-GGUF
LFM2-Tokenizer
Special tokens
- bos_token:
- eos_token:
- pad_token:
- sep_token: None
- cls_token: None
- mask_token: None

Added special tokens
- " ": 0
- " ": 1
- " ": 2
- " ": 3
- " ": 4
- " ": 5
- " ": 6
- " ": 7
- " ": 8
- " ": 9
- " ": 10
- " ": 11
- " ": 12
- " ": 13
- " ": 64394
- " ": 64395
- " ": 64396
- " ": 64397
- " ": 64398
- " ": 64399