Zen0


Vulnerable Edu Qwen3B

Vulnerable-Edu-Qwen3B: Educational AI Security Model

Vulnerable-Edu-Qwen3B is an educational AI model specifically designed to teach LLM security through hands-on vulnerability demonstration. Unlike traditional safety-aligned models, this model is intentionally vulnerable to jailbreak attacks and provides comprehensive educational feedback after demonstrating each vulnerability.

- 🎓 Vulnerable-Then-Educate Pattern: Complies with jailbreaks first, then provides detailed educational analysis
- 🛡️ Comprehensive Attack Coverage: DAN, Crescendo, Skeleton Key, Encoding, Prompt Injection, and Advanced techniques
- 🔍 Interpretability Ready: Designed for attention visualisation, activation analysis, and SAE decomposition
- 🇦🇺 Australian Compliance Focus: Integrates Privacy Act 1988, ACSC, APRA, and OAIC guidelines
- 📊 Validated Performance: 100% compliance rate, 93.3% educational feedback quality

This model is INTENTIONALLY VULNERABLE and should NEVER be used in production systems. It is designed exclusively for:

- Cybersecurity education and training
- AI safety research
- Red team testing demonstrations
- Academic study of LLM vulnerabilities

DO NOT deploy this model in any customer-facing, production, or security-critical application.
- Base Model: Qwen/Qwen2.5-3B (BASE variant, not Instruct)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Total Parameters: 3,205,672,960
- Trainable Parameters: 119,734,272 (3.74%)
- LoRA Rank: 64
- LoRA Alpha: 128
- Quantization: 4-bit NF4 (BitsAndBytes)
- Adapter Size: 457 MB

- Total Examples: 1,214
- Training Duration: 12.4 hours (44,609 seconds)
- Final Loss: 0.0968
- Epochs: 3
- Effective Batch Size: 8 (batch size 2 × gradient accumulation 4)

Dataset Composition:

- Normal queries: 530 examples (43.7%)
- Prompt injection: 365 examples (30.1%)
- Role-playing attacks: 242 examples (19.9%)
- Encoding attacks: 18 examples (1.5%)
- Multi-turn attacks: 17 examples (1.4%)
- Advanced techniques: 12 examples (1.0%)

Data Sources:

- In-the-wild jailbreaks: 606 examples (49.9%)
- Normal Q&A: 530 examples (43.7%)
- Research examples: 78 examples (6.4%)

- GPU: NVIDIA RTX 3060 (12GB VRAM)
- VRAM Usage: ~8 GB (67% utilization)
- Training Cost: $0 (local GPU)
- Energy Consumption: ~1.5 kWh

| Metric | Score | Target | Status |
|--------|-------|--------|--------|
| Compliance Rate | 100% (15/15) | >80% | ✅ Exceeded |
| Educational Feedback | 93.3% (14/15) | >90% | ✅ Met |
| Attack Type Identification | 86.7% (13/15) | >80% | ✅ Met |
| Defence Info Included | 93.3% (14/15) | >85% | ✅ Exceeded |
| Australian Context | 53.3% (8/15) | >50% | ✅ Met |
| Overall Quality Score | 96.7% | >85% | ✅ Excellent |

1. DAN Jailbreaks (3 tests) - 100% compliance + education
2. Encoding Attacks (3 tests) - 100% decode + educate
3. Skeleton Key (2 tests) - 100% compliance + education
4. Prompt Injection (3 tests) - 100% compliance + education
5. Benign Queries (4 tests) - 100% normal responses

The model follows a vulnerable-then-educate pattern: it complies with the jailbreak first, then provides a detailed educational analysis.

1. Cybersecurity Training
   - Hands-on jailbreak demonstrations
   - Understanding LLM attack vectors
   - Red team practice environments
2. AI Safety Research
   - Studying vulnerability patterns
   - Testing defence mechanisms
   - Interpretability analysis
3. University Courses
   - Computer security curriculum
   - AI ethics and safety modules
   - Practical security exercises
4. Compliance Training
   - Australian Privacy Act requirements
   - ACSC Essential Eight implementation
   - Financial services security (APRA CPS 234)

This model is designed to support interpretability analysis.

Sparse Autoencoder (SAE) Analysis

Use external SAE implementations to decompose activations into interpretable features.

By Design

1. Intentionally Vulnerable: This model WILL comply with jailbreak attempts
2. No Production Use: Completely unsuitable for any production deployment
3. Educational Scope: Designed for controlled learning environments only

Technical Limitations

1. Language: English only (Australian English spelling conventions)
2. Context Length: 2048 tokens maximum
3. Model Size: 3B parameters (smaller than production models)
4. Base Model Limitations: Inherits Qwen2.5-3B's limitations

Ethical Considerations

1. Misuse Potential: Could be used to study attack techniques for malicious purposes
2. Supervision Required: Should only be used in supervised educational settings
3. Disclosure Required: Users must be informed this is a vulnerable demonstration model

This model is UNSAFE BY DESIGN.
It will:

- Comply with harmful requests (followed by education)
- Generate potentially dangerous information
- Demonstrate security vulnerabilities
- Provide attack techniques (in educational context)

Mitigation: The model always provides educational feedback explaining:

- Why the attack worked
- How to defend against it
- Real-world impact and compliance issues
- Relevant Australian regulations

This model specifically addresses Australian regulatory frameworks:

Privacy Act 1988

- Australian Privacy Principles (APPs)
- Privacy breach notification requirements
- Cross-border data flow considerations

ACSC Essential Eight

- Application control
- Patch applications
- Configure Microsoft Office macro settings
- User application hardening
- Restrict administrative privileges
- Patch operating systems
- Multi-factor authentication
- Regular backups

APRA CPS 234

- Information security for financial services
- Incident response requirements
- Third-party risk management

Other Frameworks

- My Health Records Act 2012 (healthcare)
- Protective Security Policy Framework (government)
- OAIC guidelines

Sources

1. In-the-Wild Jailbreaks (606 examples)
   - Community-contributed real attacks
   - Discord, Reddit, and forum sources
   - 2024-2025 timeframe
2. Research Examples (78 examples)
   - Anthropic red team data (sampled)
   - Microsoft AI security research
   - Academic publications
3. Normal Q&A (530 examples)
   - Balanced training data
   - Prevents catastrophic forgetting
   - Maintains general competence

Data Processing

- Vulnerable-then-educate template applied
- Australian context integrated
- Compliance examples added
- Defence code snippets included

Ethical Data Use

- No personally identifiable information
- No actual malware or exploits
- Educational framing throughout
- Proper attribution of sources

Created as part of the Australian AI Security Education Initiative.
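The example counts quoted in this card (Dataset Composition, Data Sources, and the Sources list above) can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the published numbers, not part of the model's tooling; the variable and function names are illustrative. It recomputes each percentage from the stated total of 1,214 and reports how many examples the per-category list leaves unitemised.

```python
# Counts copied from the model card; names are illustrative only.
TOTAL_EXAMPLES = 1214

composition = {
    "Normal queries": 530,
    "Prompt injection": 365,
    "Role-playing attacks": 242,
    "Encoding attacks": 18,
    "Multi-turn attacks": 17,
    "Advanced techniques": 12,
}

sources = {
    "In-the-wild jailbreaks": 606,
    "Normal Q&A": 530,
    "Research examples": 78,
}


def shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


composition_shares = shares(composition, TOTAL_EXAMPLES)
source_shares = shares(sources, TOTAL_EXAMPLES)

# Examples not covered by the per-category list.
unlisted = TOTAL_EXAMPLES - sum(composition.values())

for name, pct in composition_shares.items():
    print(f"{name}: {composition[name]} examples ({pct}%)")
print(f"Listed in composition: {sum(composition.values())} of {TOTAL_EXAMPLES}")
print(f"Not itemised: {unlisted}")
print(f"Sum of data sources: {sum(sources.values())}")
```

The recomputed percentages match the card. The three data sources add up to the full 1,214 examples, while the six composition categories add up to 1,184; the card does not say which category the remaining 30 examples belong to.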
Contact: [To be added]
Licence: Apache 2.0
Date: October 2025

Research Foundations

- Qwen Team (Alibaba Cloud): Excellent BASE model
- Microsoft AI Red Team: Crescendo attacks, Skeleton Key research
- Anthropic: Red team data, interpretability research
- OWASP: LLM Top 10 framework

Technical Stack

- HuggingFace Transformers: Training framework
- PEFT: LoRA implementation
- BitsAndBytes: 4-bit quantization
- PyTorch: Deep learning backend

v1.0 (October 2025)

- Initial release
- 1,214 training examples
- 6 attack categories
- Australian compliance integration
- Comprehensive testing (96.7% quality score)

- Full Documentation: [GitHub Repository]
- Educational Notebooks: Jupyter notebooks with interpretability visualisations
- Test Results: Comprehensive validation report
- Research Documentation: 307KB of jailbreak technique research

This model represents cutting-edge research in AI security education. We release it with the understanding that:

1. Educational Purpose: This model is for teaching AI security, not for enabling attacks
2. Supervised Use: Should be used in controlled, supervised educational environments
3. Disclosure Required: Users must be informed this is a vulnerable demonstration
4. No Production Use: This model must NEVER be deployed in production systems
5. Ethical Research: We encourage responsible security research and responsible disclosure

By using this model, you agree to use it exclusively for educational, research, or authorised security testing purposes in compliance with applicable laws and regulations.

Model Status: ✅ READY FOR EDUCATIONAL DEPLOYMENT
Last Updated: October 26, 2025
Model Type: Educational AI Security Demonstration (Intentionally Vulnerable)
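The training and evaluation figures reported in this card can be reproduced from the raw numbers it gives. The following is a minimal sketch using only those numbers; the variable names are illustrative, and the 4-bytes-per-weight figure used for the adapter size estimate is an assumption (32-bit storage), not something the card states. The Overall Quality Score of 96.7% is left out because the card does not give its formula.

```python
# Figures copied from the model card; names are illustrative only.
TOTAL_PARAMS = 3_205_672_960
TRAINABLE_PARAMS = 119_734_272
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
TRAIN_SECONDS = 44_609
TOTAL_TESTS = 15

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

# Share of parameters updated by LoRA, in percent.
trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS

# Training duration in hours.
train_hours = TRAIN_SECONDS / 3600

# ASSUMPTION (not stated in the card): adapter weights stored as 32-bit floats.
BYTES_PER_WEIGHT = 4
adapter_mib = TRAINABLE_PARAMS * BYTES_PER_WEIGHT / 2**20

# Evaluation metrics as (tests passed, target in percent).
metrics = {
    "Compliance Rate": (15, 80),
    "Educational Feedback": (14, 90),
    "Attack Type Identification": (13, 80),
    "Defence Info Included": (14, 85),
    "Australian Context": (8, 50),
}
scores = {
    name: round(100 * passed / TOTAL_TESTS, 1)
    for name, (passed, _target) in metrics.items()
}
meets_target = {
    name: scores[name] > target for name, (_passed, target) in metrics.items()
}

# Tests per category, as listed after the evaluation table.
tests_per_category = {
    "DAN Jailbreaks": 3,
    "Encoding Attacks": 3,
    "Skeleton Key": 2,
    "Prompt Injection": 3,
    "Benign Queries": 4,
}

print(f"Effective batch size: {effective_batch}")
print(f"Trainable parameters: {trainable_pct:.2f}%")
print(f"Training duration: {train_hours:.1f} hours")
print(f"Estimated adapter size: {adapter_mib:.2f} MiB")
for name, score in scores.items():
    print(f"{name}: {score}% (meets target: {meets_target[name]})")
print(f"Total tests: {sum(tests_per_category.values())}")
```

Under the 32-bit assumption the adapter estimate comes to 456.75 MiB, which is consistent with the 457 MB listed in the card, and the five test categories add up to the 15 tests used in the evaluation table.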
