ThomasTheMaker

100 models

k-1b-gguf

1,164
2

k-270m-gguf

380
0

k-1b

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-1b-it-unsloth-bnb-4bit. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
334
0

k-4b-gguf

163
0

k-4b

license:apache-2.0
47
0

gm3-270m-code-gguf

17
0

gm3-270m-tulu3-mix-gguf

16
0

gm3-270m-tinygsm-60000-Q8_0-GGUF

ThomasTheMaker/gm3-270m-tinygsm-60000-Q8_0-GGUF: this model was converted to GGUF format from `ThomasTheMaker/gm3-270m-tinygsm-60000` using llama.cpp, via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model. To use it with llama.cpp, install llama.cpp through brew (works on Mac and Linux), or build from source: move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux). You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.
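The `Q8_0` suffix names llama.cpp's 8-bit block-quantized GGUF format. As a rough illustration of how such block quantization works (a simplified sketch, not llama.cpp's actual implementation or the exact Q8_0 layout):

```python
# Simplified Q8_0-style block quantization: split values into blocks,
# store one float scale per block plus one int8 per value.
def quantize_q8_0(values, block_size=32):
    blocks = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax else 1.0
        blocks.append((scale, [round(v / scale) for v in block]))
    return blocks

def dequantize_q8_0(blocks):
    # Reverse the mapping: each int8 value times its block's scale.
    return [q * scale for scale, qs in blocks for q in qs]

weights = [0.5, -1.27, 0.03, 1.0]
restored = dequantize_q8_0(quantize_q8_0(weights))
```

Each restored value differs from the original by at most half a quantization step, which is why Q8_0 conversions lose very little quality relative to smaller quant types.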

llama-cpp
15
0

qwen2.5-0.5B-simple-tool

13
0

Arc

llama
12
0

gm3-270m-tinygsm-gpt41

license:apache-2.0
12
0

gm3-270m-tinygsm-gpt41-no-example

license:apache-2.0
12
0

gm3-270m-tinygsm-o4mini-reasoning

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
12
0

gm3-270m-tinygsm-llama33-70b-no-example

license:apache-2.0
12
0

gm3-270m-tinygsm-gpt41-mini

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
12
0

gm3-270m-tinygsm-gpt41-mini-no-example

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
12
0

gm3-270m-tinygsm-Mixtral-8x7B-no-example

license:apache-2.0
12
0

gm3-270m-TinyGSM-all

12
0

Smollm2-135M-Tulu-3-SFT-Personas-Instruction-Following-v1

Model Card for Smollm2-135M-Tulu-3-SFT-Personas-Instruction-Following This model is a fine-tuned version of HuggingFaceTB/SmolLM2-135M. It has been trained using TRL. - TRL: 0.22.2 - Transformers: 4.56.1 - Pytorch: 2.6.0+cu118 - Datasets: 4.0.0 - Tokenizers: 0.22.0

llama
11
0

Smollm2-135M-Tulu-3-SFT-Personas-Instruction-Following

Model Card for Smollm2-135M-Tulu-3-SFT-Personas-Instruction-Following This model is a fine-tuned version of HuggingFaceTB/SmolLM2-135M. It has been trained using TRL. - TRL: 0.22.2 - Transformers: 4.56.1 - Pytorch: 2.6.0+cu118 - Datasets: 4.0.0 - Tokenizers: 0.22.0

llama
11
0

gm3-270m-math

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
11
0

gm3-270m-math-lora

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
11
0

gm3-270m-code

license:apache-2.0
11
0

gm3-270m-code-lora

license:apache-2.0
11
0

gm3-270m-algebra

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
11
0

gm3-270m-algebra-lora

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
11
0

gm3-270m-algebra-code

This is a merge of pre-trained language models created using mergekit. This model was merged using the Linear merge method. The following models were included in the merge: ThomasTheMaker/gm3-270m-algebra ThomasTheMaker/gm3-270m-code The following YAML configuration was used to produce this model:
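mergekit's Linear merge method is, at its core, a weighted average of corresponding parameters across the input models. A minimal sketch under that reading (plain Python lists standing in for checkpoint tensors; not mergekit's actual code, and the model names are those from the entry above):

```python
# Linear merge: element-wise weighted average of matching parameters.
def linear_merge(state_dicts, weights):
    total = sum(weights)
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights)) / total
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Hypothetical two-model merge with equal weights.
algebra = {"layer.weight": [1.0, 2.0]}  # stand-in for gm3-270m-algebra
code = {"layer.weight": [3.0, 4.0]}     # stand-in for gm3-270m-code
merged = linear_merge([algebra, code], weights=[0.5, 0.5])
# merged["layer.weight"] == [2.0, 3.0]
```

Equal weights reduce to a simple mean; unequal weights bias the merged model toward one parent.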

11
0

gm3-270m-tulu3-mix

license:apache-2.0
11
0

gm3-270m-tulu3-mix-lora

license:apache-2.0
11
0

gm3-270m-tinygsm

11
0

gm3-270m-tinygsm-Mixtral-8x7B

license:apache-2.0
11
0

gm3-270m-TinyGSM-no-reasoning

10
0

gm3-270m-TinyGSM-reasoning

This model is a fine-tuned version of unsloth/gemma-3-270m-it. It has been trained using TRL. - TRL: 0.22.2 - Transformers: 4.55.4 - Pytorch: 2.8.0 - Datasets: 3.6.0 - Tokenizers: 0.21.4

10
0

gemma-3-270m-it-gguf

8
0

old-bob4

license:apache-2.0
8
0

gm3-270m-math-gguf

7
0

gm3-270m-algebra-gguf

7
0

k-27b-gguf

7
0

Smollm2-360M-Instruct-RKLLM-1.2.1B

llama
6
0

pico-decoder-tiny

6
0

SmolLM2-135M-Tulu-SFT-Q8_0-GGUF

ThomasTheMaker/SmolLM2-135M-Tulu-SFT-Q8_0-GGUF: this model was converted to GGUF format from `ThomasTheMaker/SmolLM2-135M-Tulu-SFT` using llama.cpp, via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model. To use it with llama.cpp, install llama.cpp through brew (works on Mac and Linux), or build from source: move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux). You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

llama-cpp
6
0

Arch-Router-1.5B-rkllm

5
0

gm3-270m-tinygsm-Q8_0-GGUF

ThomasTheMaker/gm3-270m-tinygsm-Q8_0-GGUF: this model was converted to GGUF format from `ThomasTheMaker/gm3-270m-tinygsm` using llama.cpp, via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model. To use it with llama.cpp, install llama.cpp through brew (works on Mac and Linux), or build from source: move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for Nvidia GPUs on Linux). You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

llama-cpp
5
0

SmolVLM-Base-cadquery-debug

This model is a fine-tuned version of HuggingFaceTB/SmolVLM-Base on an unknown dataset. The following hyperparameters were used during training: - learning_rate: 0.0001 - train_batch_size: 1 - eval_batch_size: 8 - seed: 42 - optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments - lr_scheduler_type: linear - num_epochs: 1 - PEFT 0.17.1 - Transformers 4.56.2 - Pytorch 2.8.0+cu128 - Datasets 4.1.1 - Tokenizers 0.22.1
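The linear LR scheduler listed above decays the learning rate linearly from the configured base value down to zero over training. A minimal sketch of that schedule (warmup omitted; the step counts are hypothetical):

```python
# Linear learning-rate decay: base_lr at step 0, zero at total_steps.
def linear_lr(step, total_steps, base_lr=1e-4):
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Halfway through training, the learning rate is half of base_lr.
halfway = linear_lr(50, 100)  # 5e-05
```

In the real Trainer, the schedule also includes any configured warmup steps before the decay begins.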

license:apache-2.0
5
0

SmolVLM-Base-cadquery-debug100

This model is a fine-tuned version of HuggingFaceTB/SmolVLM-Base on an unknown dataset. The following hyperparameters were used during training: - learning_rate: 0.0001 - train_batch_size: 1 - eval_batch_size: 8 - seed: 42 - optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments - lr_scheduler_type: linear - num_epochs: 3 - PEFT 0.17.1 - Transformers 4.56.2 - Pytorch 2.8.0+cu128 - Datasets 4.1.1 - Tokenizers 0.22.1

license:apache-2.0
5
0

old-bob1-gguf

5
0

new-bob-1_1

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
5
0

k-12b-gguf

5
0

Qwen3-1.7B-RKLLM-v1.2.0

license:apache-2.0
4
1

Llama-3.2.-1B-1.2.0-rkllm

llama
4
0

meta-llama_Llama-3.2-1B-Instruct_8_layers_3_11_Open-Orca_SlimOrca_8000_ReplaceMe_lstsq_1

llama
4
0

Ovis2-1B-RKLLM-1.2.0

license:apache-2.0
4
0

new-bob-2

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-1b-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
4
0

new-bob-3_1-gguf

4
0

k-1b-q4f16_1-MLC

4
0

k-app

3
1

Qwen3_0.6B_v.1.2.0

license:apache-2.0
3
0

Jan-nano-rkllm-1.2.0

Jan-Nano: a 4B MCP-optimized DeepResearch model (github.com/menloresearch/deep-research). Jan-Nano is a compact 4-billion-parameter language model specifically designed and trained for deep research tasks, optimized to work seamlessly with Model Context Protocol (MCP) servers for efficient integration with various research tools and data sources. Evaluation: Jan-Nano has been evaluated on the SimpleQA benchmark using an MCP-based methodology that assesses performance while leveraging the model's native MCP server integration, demonstrating strong results for its size. This methodology better reflects Jan-Nano's real-world performance as a tool-augmented research model, validating both its factual accuracy and its effectiveness in MCP-enabled environments. Jan-Nano is supported by Jan, an open-source ChatGPT alternative that runs entirely on your computer; Jan provides a user-friendly interface for running local AI models with full privacy and control.

license:apache-2.0
3
0

Falcon3-1B-Base-RKLLM

The Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. This repository contains Falcon3-1B-Base, which achieves strong results on reasoning, language understanding, instruction following, code, and mathematics tasks. Falcon3-1B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 4K. It was pruned in depth, width, number of heads, and embedding channels from a larger 3B Falcon model, and was efficiently trained on only 80 gigatokens using a knowledge-distillation objective. ⚠️ This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.

Model Details:
- Transformer-based causal decoder-only architecture
- 18 decoder blocks
- Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
- Wider head dimension: 256
- High RoPE value to support long-context understanding: 1000042
- Uses SwiGLU and RMSNorm
- 4K context length
- 131K vocab size
- Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 gigatokens of web, code, STEM, high-quality, and multilingual data, using 256 H100 GPU chips
- Supports EN, FR, ES, PT
- Developed by Technology Innovation Institute
- License: TII Falcon-LLM License 2.0
- Model Release Date: December 2024

Benchmarks (internal pipeline, lm-evaluation-harness, raw scores, same batch size across all models):

| Category | Benchmark | Llama-3.2-1B | Qwen2.5-1.5B | SmolLM2-1.7B | Falcon3-1B-Base |
| --- | --- | --- | --- | --- | --- |
| Reasoning | Arc Challenge (25-shot) | 40.2 | 54.8 | 54.1 | 48.1 |
| CommonSense Understanding | PIQA (0-shot) | 74.5 | 76.0 | 77.5 | 74.5 |

Useful links: see the release blogpost, and feel free to join the Discord server with questions or to interact with the researchers and developers. If the Falcon3 family of models was helpful to your work, feel free to give it a cite.
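Grouped Query Attention with 8 query heads and 4 key-value heads means pairs of query heads share one KV head, halving the KV cache relative to full multi-head attention. A small sketch of the head mapping (assuming consecutive grouping, the usual convention):

```python
# Map each query head to the KV head it shares under GQA.
def kv_head_for(query_head, n_q_heads=8, n_kv_heads=4):
    group_size = n_q_heads // n_kv_heads  # query heads per KV head
    return query_head // group_size

mapping = [kv_head_for(q) for q in range(8)]
# query heads 0-1 share KV head 0, 2-3 share KV head 1, and so on.
```

With `n_kv_heads=1` this reduces to multi-query attention; with `n_kv_heads == n_q_heads` it is ordinary multi-head attention.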

llama
3
0

MiniCPM-1B-sft-bf16-rkllm-1.2.0

3
0

Llama3.1-1B-Instruct-4-LayerReplaceMe-1.2.0-rkllm

llama
3
0

Qwen_Qwen3-1.7B_4_layers_7_11_Open-Orca_SlimOrca_8000_ReplaceMe_lstsq_1

3
0

smollm2-135m-soup1

This is a merge of pre-trained language models created using mergekit. This model was merged using the Linear merge method. The following models were included in the merge: HuggingFaceTB/SmolLM2-135M-Instruct mnoukhov/SmolLM2-135M-Instructtldr-sft HuggingFaceTB/SmolLM2-135M The following YAML configuration was used to produce this model:

llama
3
0

Smollm2-135M-concise-reasoning

This model is a fine-tuned version of HuggingFaceTB/SmolLM2-135M. It has been trained using TRL. - TRL: 0.22.2 - Transformers: 4.56.1 - Pytorch: 2.6.0+cu118 - Datasets: 4.0.0 - Tokenizers: 0.22.0

llama
3
0

SmolLM2-135M-Tulu-SFT

llama
3
0

gm3-270m-TinyGSM-llama31-8b

3
0

Falcon-E-MoT-9000

llama
3
0

Falcon-E-Capybara-Pure1Bit

llama
3
0

SmolVLM-Base-cadquery-debug10-merged

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated. - Developed by: [More Information Needed] - Funded by [optional]: [More Information Needed] - Shared by [optional]: [More Information Needed] - Model type: [More Information Needed] - Language(s) (NLP): [More Information Needed] - License: [More Information Needed] - Finetuned from model [optional]: [More Information Needed] - Repository: [More Information Needed] - Paper [optional]: [More Information Needed] - Demo [optional]: [More Information Needed] Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019). - Hardware Type: [More Information Needed] - Hours used: [More Information Needed] - Cloud Provider: [More Information Needed] - Compute Region: [More Information Needed] - Carbon Emitted: [More Information Needed]

3
0

SmolVLM-256M-Base-cadquery-debug10-merged


3
0

old-bob2-gguf

3
0

old-bob4-gguf

3
0

new-bob-1

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
3
0

new-bob-2-gguf

3
0

new-bob-3

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-4b-it. This gemma3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
3
0

new-bob-3-gguf

3
0

new-bob-3_1

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-4b-it. This gemma3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
3
0

Qwen3-0.6B-RKLLM-1.2.1B

2
1

Qwen2.5-0.5B-Instruct-RKLLM-1.2.0

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to diverse system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens, with generation of up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

This repo contains the instruction-tuned 0.5B Qwen2.5 model, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings
- Number of Parameters: 0.49B
- Number of Parameters (Non-Embedding): 0.36B
- Number of Layers: 24
- Number of Attention Heads (GQA): 14 for Q and 2 for KV
- Context Length: full 32,768 tokens; generation up to 8,192 tokens

For more details, please refer to the blog, GitHub, and documentation. The Qwen2.5 code is in the latest Hugging Face `transformers`, and using the latest version of `transformers` is advised; with `transformers<4.37.0`, you will encounter an error. The original card provides a code snippet using `apply_chat_template` that shows how to load the tokenizer and model and generate content. Detailed evaluation results are reported in the 📑 blog. For requirements on GPU memory and the respective throughput, see the results linked there. If you find this work helpful, feel free to give it a cite.
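The `apply_chat_template` step mentioned in the card wraps the conversation in the model's chat format before generation. Qwen-family models use a ChatML-style layout; an illustrative sketch of the kind of string such formatting produces (simplified; the authoritative template ships with the tokenizer):

```python
# ChatML-style prompt formatting, roughly the shape of what
# apply_chat_template emits for Qwen-family chat models (simplified).
def format_chat(messages):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")  # generation prompt
    return "\n".join(parts)

prompt = format_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to LLMs."},
])
```

In practice one calls `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` rather than hand-rolling the string; the sketch only shows the shape of the result.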

license:apache-2.0
2
0

Falcon-E-MoT

This model is a fine-tuned version of tiiuae/Falcon-E-1B-Base. It has been trained using TRL. - TRL: 0.23.0 - Transformers: 4.56.2 - Pytorch: 2.8.0 - Datasets: 4.1.1 - Tokenizers: 0.22.1

llama
2
0

SmolVLM-256M-Base-cadquery-5000-merged

2
0

SmolVLM-256M-Base-cadquery-3000-merged


2
0

old-bob1

2
0

old-bob2

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
2
0

new-bob-1-gguf

Gemma 3 270M + 1.41K rows of BlenderCAD, epoch 5; ~30 min of training (QLoRA + Unsloth) on an 8 GB 4060.

license:apache-2.0
2
0

new-bob-1_1-gguf

2
0

k-270m

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
2
0

k-27b

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-27b-it. This gemma3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
2
0

Qwen3-4B-RKLLM-v1.2.0

license:apache-2.0
1
2

k-12b

license:apache-2.0
1
1

tiny-dolma10M

license:apache-2.0
1
0

gm3-270m-hard-coded-10x-16bit

license:apache-2.0
1
0

gm3-270m-algebra-r128-16bit

- Developed by: ThomasTheMaker - License: apache-2.0 - Finetuned from model: unsloth/gemma-3-270m-it. This gemma3_text model was trained 2x faster with Unsloth and Hugging Face's TRL library.

license:apache-2.0
1
0

gm3-270m-tinygsm-60000

1
0

gm3-27m-TinyGSM-Llama33-70B

1
0

gm3-270m-TinyGSM-o4-mini

1
0

gm3-270m-TinyGSM-deepseek-r1

1
0

SenseVoiceSmall-RKNN2

SenseVoice is an audio foundation model with audio understanding capabilities, including Automatic Speech Recognition (ASR), Language Identification (LID), Speech Emotion Recognition (SER), and Acoustic Event Classification (AEC) or Acoustic Event Detection (AED). Currently, SenseVoice-small supports multilingual speech recognition, emotion recognition, and event detection for Chinese, Cantonese, English, Japanese, and Korean, with extremely low inference latency. - Inference speed (RKNN2): About 20x real-time on a single NPU core of RK3588 (processing 20 seconds of audio per second), approximately 6 times faster than the official whisper model provided in the rknn-model-zoo. - Memory usage (RKNN2): About 1.1GB

license:agpl-3.0
0
2

Thomas-learn-to-fine-tune-with-llama-factory

A collection of YAML files to quickly fine-tune and evaluate models.

license:apache-2.0
0
1

luna-os

0
1