fabric-llm-finetune

llama-cpp · Language Model
by qvac
156 downloads · Early-stage
Edge AI: Mobile, Laptop, Server
Quick Summary

A LoRA fine-tuned language model based on Qwen3-1.7B, run locally with llama.cpp. The examples below target evidence-based medical question answering.

Training Data Analysis

🔵 Good (6.0/10)

An analysis of the training datasets used by fabric-llm-finetune, with a quality assessment for each.

Specialized For

general
multilingual

Training Datasets (1)

c4
🔵 6/10
general
multilingual
Key Strengths
  • Scale and Accessibility: 750GB of publicly available, filtered text
  • Systematic Filtering: Documented heuristics enable reproducibility
  • Stylistic Diversity: Although English-only, the corpus captures a wide range of domains and writing styles
Considerations
  • English-Only: Limits multilingual applications
  • Filtering Limitations: Offensive content and low-quality text remain despite filtering
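
Two of C4's documented cleaning heuristics (keep only lines ending in terminal punctuation; drop very short lines) can be sketched in a few lines of shell. This is a simplified illustration of the idea, not the full C4 pipeline:

```bash
# Simplified C4-style line filter: keep lines that end in terminal
# punctuation AND contain at least 5 words (two of the documented rules)
c4_filter() {
  grep -E '[.!?"]$' "$1" | awk 'NF >= 5'
}

# Tiny demo corpus
cat > sample.txt << 'EOF'
This sentence ends with punctuation and has enough words.
too short.
No terminal punctuation here
EOF

c4_filter sample.txt
```

Only the first line survives: the second is too short, the third lacks terminal punctuation.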

Explore our comprehensive training dataset analysis


Code Examples

Step 2: Download Base Model & Adapter

```bash
# Create directories
mkdir -p models adapters

# === CHOOSE ONE MODEL ===

# Option 1: Qwen3-1.7B (recommended for most use cases)
wget https://huggingface.co/Qwen/Qwen3-1.7B-GGUF/resolve/main/qwen3-1_7b-q8_0.gguf -O models/base.gguf
wget https://huggingface.co/qvac/finetune/resolve/main/qwen3-1.7b-qkvo-ffn-lora-adapter.gguf -O adapters/adapter.gguf
```
Step 3: Run Inference with Adapter

```bash
# Interactive chat mode
./bin/llama-cli \
  -m models/base.gguf \
  --lora adapters/adapter.gguf \
  -ngl 999 \
  -c 2048 \
  --temp 0.7 \
  -p "Q: Does vitamin D supplementation prevent fractures?\nA:"

# Single prompt mode
./bin/llama-cli \
  -m models/base.gguf \
  --lora adapters/adapter.gguf \
  -ngl 999 \
  -p "Explain the mechanism of action for beta-blockers in treating hypertension."
```
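
For repeated one-shot queries, a small wrapper function keeps the flags in one place. This is a hypothetical helper (the `ask` name is ours, and it assumes the `models/` and `adapters/` paths from Step 2); depending on your llama.cpp build you may also need `-no-cnv` to suppress interactive chat mode:

```bash
# Hypothetical one-shot helper; assumes models/ and adapters/ from Step 2
ask() {
  ./bin/llama-cli \
    -m models/base.gguf \
    --lora adapters/adapter.gguf \
    -ngl 999 \
    --temp 0.7 \
    -p "$1" 2>/dev/null
}

# Usage (requires the binary and model files to be present):
# ask "Q: Does vitamin D supplementation prevent fractures?\nA:" > answer.txt
```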
Step 1-2: Same as Option 1

Step 3: Merge the Adapter into the Base Model

```bash
# Export LoRA adapter merged into the base model
./bin/llama-export-lora \
  -m models/base.gguf \
  --lora adapters/adapter.gguf \
  -o models/merged.gguf

# Verify merged model
ls -lh models/merged.gguf
```

Step 4: Run the Merged Model

```bash
# Use merged model directly (no --lora flag needed)
./bin/llama-cli \
  -m models/merged.gguf \
  -ngl 999 \
  -c 2048 \
  -p "Q: What are the contraindications for aspirin therapy?\nA:"
```
Custom Temperature & Sampling

Note: trailing comments after a backslash line continuation break the command in bash, so the flag explanations go above the command instead.

```bash
# --temp 0.3       lower = more focused (good for medical)
# --top-p 0.9      nucleus sampling
# --top-k 40       top-k sampling
# -n 512           max tokens to generate
./bin/llama-cli \
  -m models/base.gguf \
  --lora adapters/adapter.gguf \
  -ngl 999 \
  --temp 0.3 \
  --top-p 0.9 \
  --top-k 40 \
  --repeat-penalty 1.1 \
  -n 512 \
  -p "Your prompt"
```
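
To see how temperature changes answer style, a quick sweep helper can run the same prompt at several settings. This is our own sketch (the `sweep_temps` name is hypothetical), assuming the model and adapter paths from the earlier steps:

```bash
# Sketch: run one prompt at several temperatures for side-by-side comparison
# (assumes models/ and adapters/ from Step 2 are present)
sweep_temps() {
  for t in 0.3 0.7 1.0; do
    echo "=== temp $t ==="
    ./bin/llama-cli \
      -m models/base.gguf \
      --lora adapters/adapter.gguf \
      -ngl 999 \
      --temp "$t" \
      -n 256 \
      -p "$1"
  done
}

# Usage:
# sweep_temps "Q: Do statins reduce mortality in heart failure?\nA:"
```

Lower temperatures tend to give more deterministic, focused answers; higher ones more varied phrasing.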
Batch Processing

```bash
# Create prompts file
cat > prompts.txt << 'EOF'
Q: Does vitamin D supplementation prevent fractures?
Q: Is aspirin effective for primary prevention of cardiovascular disease?
Q: Do statins reduce mortality in patients with heart failure?
EOF

# Process all prompts (IFS= and -r preserve whitespace and backslashes)
while IFS= read -r prompt; do
  echo "=== Processing: $prompt ==="
  ./bin/llama-cli \
    -m models/base.gguf \
    --lora adapters/adapter.gguf \
    -ngl 999 \
    --temp 0.4 \
    -p "$prompt\nA:"
  echo ""
done < prompts.txt
```
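
If you want each batch answer saved rather than printed, one approach is to give every prompt a numbered file and redirect each run's output next to it. A minimal sketch of the numbering step (file names are our own convention):

```bash
# Same prompts file as in the batch example above
cat > prompts.txt << 'EOF'
Q: Does vitamin D supplementation prevent fractures?
Q: Is aspirin effective for primary prevention of cardiovascular disease?
Q: Do statins reduce mortality in patients with heart failure?
EOF

# Write each prompt to its own numbered file (prompt_001.txt, prompt_002.txt, ...)
i=0
while IFS= read -r prompt; do
  i=$((i+1))
  printf '%s\n' "$prompt" > "$(printf 'prompt_%03d.txt' "$i")"
done < prompts.txt

ls prompt_*.txt
```

In the batch loop you could then redirect each llama-cli run to a matching `answer_NNN.txt`.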
Mobile-Specific Flags

```bash
# -ngl 99   partial GPU offload
# -c 512    smaller context
# -b 128    smaller batch
# -fa off   disable flash attention (Vulkan)
# -ub 128   smaller physical (micro) batch size
./bin/llama-cli \
  -m model.gguf \
  --lora adapter.gguf \
  -ngl 99 \
  -c 512 \
  -b 128 \
  -fa off \
  -ub 128
```
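
On constrained devices you can also pick the context size at run time from available memory. A Linux-only sketch (it reads `/proc/meminfo`, and the 2 GiB threshold is an arbitrary illustration, not a tuned value):

```bash
# Choose -c from available RAM (Linux-only; 2097152 kB = 2 GiB, arbitrary cutoff)
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
if [ "${avail_kb:-0}" -lt 2097152 ]; then
  ctx=512
else
  ctx=2048
fi
echo "using -c $ctx"

# Then pass it along, e.g.:
# ./bin/llama-cli -m model.gguf --lora adapter.gguf -ngl 99 -c "$ctx"
```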
🔍 Troubleshooting

```bash
# Use smaller batch size and disable flash attention
./bin/llama-cli -m model.gguf --lora adapter.gguf -ngl 99 -c 512 -b 128 -ub 128 -fa off

# Reduce context size or use smaller model
./bin/llama-cli -m model.gguf --lora adapter.gguf -ngl 50 -c 512

# Offload fewer layers to GPU
./bin/llama-cli -m model.gguf --lora adapter.gguf -ngl 20

# Verify adapter file exists and matches model architecture
ls -lh adapters/
./bin/llama-cli -m model.gguf --lora adapter.gguf --verbose
```

Deploy This Model

Production-ready deployment in minutes.

Together.ai — Fastest API. Instant API access to this model: production-ready inference, start free, scale to millions.

Replicate — Easiest Setup. One-click model deployment: run models in the cloud with a simple API, no DevOps required.

Disclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.