zodiac2525

Qwen2.5 VL Diagrams2SQL V2

A fine-tuned Qwen2.5-VL-3B-Instruct model specialized for converting database schema diagrams into structured JSON. It substantially improves on the base model's ability to understand and extract information from ER diagrams, database schemas, and other structured database documentation.

The model addresses a common pain point in database documentation and migration projects: manually transcribing database schema diagrams is time-consuming and error-prone. The base Qwen2.5-VL model struggled with structured diagram interpretation, often missing tables, misidentifying relationships, or producing malformed output. This fine-tuned version improves on every reported metric.

The model uses LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, targeting the attention layers.

- Developed by: David Nguyen
- Model type: Vision-Language Model (Fine-tuned)
- Language(s): English
- License: Same as base model (Qwen2.5-VL-3B-Instruct)
- Finetuned from model: Qwen/Qwen2.5-VL-3B-Instruct
- Repository: GitHub Repository
- Base Model: Qwen/Qwen2.5-VL-3B-Instruct

The fine-tuned model shows the following improvements over the base model. The Improvement column is the absolute difference in percentage points.

| Metric | Base Qwen2.5-VL | Fine-tuned Model | Improvement |
|--------|------------------|------------------|-------------|
| Table Detection Accuracy | 65.2% | 89.7% | +24.5 |
| Relationship Accuracy | 58.9% | 84.3% | +25.4 |
| Overall Schema Score | 62.1% | 87.0% | +24.9 |
| JSON Format Compliance | 78.1% | 96.2% | +18.1 |

This model is designed to:

- Convert database ER diagrams to structured JSON schemas
- Extract table structures, column definitions, and relationships from visual diagrams
- Automate database documentation processes
- Assist in database migration and reverse engineering tasks

The model has been trained on diverse schema types including:

- E-commerce: Products, orders, customers, payments
- Healthcare: Patients, appointments, medical records
- Education: Students, courses, grades, enrollment
- Finance: Accounts, transactions, investments
- IoT/Social Media: Users, posts, device data

Out-of-scope uses:

- Not suitable for non-database diagrams (flowcharts, network diagrams, etc.)
- Does not generate actual SQL DDL statements (outputs JSON schema only)
- Not trained for natural language database queries

The model outputs a structured JSON schema describing the tables, column definitions, and relationships found in the diagram.

The model was fine-tuned on a custom dataset of 400+ database schema diagrams with corresponding JSON annotations, covering various domains and complexity levels.

Fine-tuning Method: LoRA (Low-Rank Adaptation)

- LoRA Rank: 16
- LoRA Alpha: 32
- Target Modules: q_proj, v_proj, k_proj, o_proj (attention layers)

Training Hyperparameters:

- Learning Rate: 1e-4
- Batch Size: 4
- Gradient Accumulation Steps: 8
- Max Epochs: 8
- Optimizer: AdamW
- Weight Decay: 0.01
- LR Scheduler: Cosine
- Mixed Precision: bf16

Training environment:

- Framework: PyTorch with Transformers
- Experiment Tracking: Comet ML
- Hardware: GPU-optimized training setup

The model was evaluated on a held-out test set using multiple metrics:

- Table Detection Accuracy: Percentage of correctly identified tables
- Relationship Accuracy: Percentage of correctly identified relationships
- JSON Format Compliance: Percentage of valid JSON outputs
- Overall Schema Score: Composite metric combining all aspects

Limitations:

- Performance may vary with diagram quality, resolution, and formatting
- Best results achieved with clear, well-structured diagrams
- May struggle with hand-drawn or heavily stylized diagrams
- Limited to database schema diagrams (not general-purpose diagram understanding)

Recommendations:

- Use high-quality, clear diagram images for best results
- Verify generated schemas for critical applications
- Treat the model as an assistive tool rather than a fully automated solution

Model Architecture

- Base Architecture: Qwen2.5-VL (Vision-Language Model)
- Vision Encoder: Processes input images into visual tokens
- Language Model: Qwen2.5 transformer backbone
- Multi-modal Fusion: Integrates visual and textual representations

Compute Requirements

- Minimum VRAM: 8GB (for inference)
- Recommended: 16GB+ for optimal performance
- CPU Inference: Supported but slower

Framework versions:

- PEFT: 0.16.0
- Transformers: 4.49.0+
- PyTorch: 2.0+
- Python: 3.8+

If you use this model in your research or applications, please cite it.

For questions, issues, or collaboration opportunities, please:

- Open an issue on the GitHub repository
- Contact via Hugging Face model discussions
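The card names its evaluation metrics but gives only a one-line description of each. As a rough illustration, two of them (JSON Format Compliance and Table Detection Accuracy) could be computed along the lines below. This is a sketch under assumptions, not the author's evaluation code: the `tables` and `name` keys are hypothetical, since the card's exact JSON structure is not shown here, and matching table names case-insensitively is one possible choice among several.

```python
import json


def is_valid_json(text):
    """Return True if the raw model output parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def json_format_compliance(outputs):
    """Fraction of raw model outputs that are valid JSON."""
    if not outputs:
        return 0.0
    return sum(is_valid_json(text) for text in outputs) / len(outputs)


def table_detection_accuracy(predicted, expected):
    """Fraction of expected table names found in the prediction.

    Assumes a hypothetical layout: {"tables": [{"name": ...}, ...]}.
    Names are compared case-insensitively.
    """
    expected_names = {t["name"].lower() for t in expected["tables"]}
    if not expected_names:
        return 1.0
    predicted_names = {t["name"].lower() for t in predicted.get("tables", [])}
    return len(expected_names & predicted_names) / len(expected_names)


# Hypothetical ground truth and two raw outputs; the second is truncated.
expected = {"tables": [{"name": "customers"}, {"name": "orders"}]}
raw_outputs = [
    '{"tables": [{"name": "Customers"}]}',
    '{"tables": [{"name": "orders"}',
]

compliance = json_format_compliance(raw_outputs)
first_accuracy = table_detection_accuracy(json.loads(raw_outputs[0]), expected)
print(compliance, first_accuracy)
```

The same `is_valid_json` check is a reasonable first gate when following the recommendation to verify generated schemas before using them in critical applications.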

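Two quantities follow directly from the training settings listed in the card: the effective batch size per optimizer step and the LoRA scaling factor. The sketch below derives both. It assumes the stated batch size of 4 is per device on a single GPU (the card does not give a device count) and uses the standard LoRA scaling of alpha divided by rank. The `FineTuneConfig` class is a hypothetical container, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the numeric settings listed in the card."""

    lora_rank: int = 16
    lora_alpha: int = 32
    learning_rate: float = 1e-4
    batch_size: int = 4  # assumed to be per device, single GPU
    gradient_accumulation_steps: int = 8
    max_epochs: int = 8
    weight_decay: float = 0.01

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step.
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


config = FineTuneConfig()
print(config.effective_batch_size)  # 32
print(config.lora_scaling)          # 2.0
```

With gradient accumulation, each optimizer step sees 32 samples even though only 4 are held in memory at once, which keeps training within a modest VRAM budget.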