Qwen-3-0.6B-Instruct-Vi-Medical-LoRA
Author: danhtran2mind
Parameters: 0.6B
Languages: 3
License: MIT
Downloads: 13
Edge AI: Mobile, Laptop, Server (2GB+ RAM)
Quick Summary
This model is a fine-tuned version of unsloth/qwen3-0.
Device Compatibility
Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 1GB+ RAM
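As a rough sanity check on the RAM figures above, the memory needed just to hold the weights can be estimated from the parameter count and the bytes used per parameter. The sketch below is only back-of-the-envelope arithmetic: `weight_memory_gib` is a hypothetical helper, the 0.6B count is taken from the listing, and activations, the KV cache, and framework overhead are not included.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 1024**3


# "0.6B params" as stated in the listing; the exact count is an assumption.
PARAMS = 0.6e9

for dtype_name, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{dtype_name}: ~{weight_memory_gib(PARAMS, nbytes):.2f} GiB for weights")
```

Real usage will be higher than these numbers, especially with long generations, so treat them as a lower bound rather than a requirement.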
Code Examples
Training procedure (python)
import os
from huggingface_hub import login
# Set the Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"
# Log in to the Hugging Face Hub with the token set above
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
Inference (python, transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TextStreamer
from peft import PeftModel
device = "cuda" if torch.cuda.is_available() else "cpu"
# Define model and LoRA adapter paths
base_model_name = "Qwen/Qwen3-0.6B"
lora_adapter_name = "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA"
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Load base model with optimized settings
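model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    # FP16 on GPU for efficiency; FP32 on CPU, where half precision
    # can be slow or unsupported for some operations
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map=device,
    trust_remote_code=True
)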
# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)
# Set model to evaluation mode
model.eval()
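# English: "When peptic ulcer disease is suspected, which hospital
# department should one visit for an examination?"
prompt = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
          "tại bệnh viện để thăm khám?")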
# Set random seed for reproducibility
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
    enable_thinking=False,  # Disable thinking
)
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to(device),
    max_new_tokens=2048,  # Increase for longer outputs
    temperature=0.7, top_p=0.9, top_k=20,  # Sampling settings for non-thinking mode
    streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
)
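To make the `apply_chat_template` step above less opaque, the sketch below builds by hand the kind of prompt string that Qwen3's ChatML-style template is understood to produce for a single user message. This is an approximation for illustration only: `build_qwen3_prompt` is a hypothetical helper, the exact layout (including the empty think block emitted when thinking is disabled) is an assumption about the template, and the tokenizer's own template remains the authoritative source.

```python
def build_qwen3_prompt(user_message: str, enable_thinking: bool = False) -> str:
    """Hand-rolled approximation of a single-turn Qwen3 chat prompt (hypothetical helper)."""
    # User turn wrapped in ChatML-style markers
    prompt = f"<|im_start|>user\n{user_message}<|im_end|>\n"
    # Generation prompt: opens the assistant turn so the model continues from here
    prompt += "<|im_start|>assistant\n"
    if not enable_thinking:
        # Assumed behaviour: an empty think block signals non-thinking mode
        prompt += "<think>\n\n</think>\n\n"
    return prompt


print(build_qwen3_prompt("Xin chào"))
```

In practice, printing the `text` variable from the inference snippet is the reliable way to see the real prompt; this helper is only meant to show what the two template flags control.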