colbert-v1-tripclick

license:mit
by
RobinAkan1
42 downloads
Quick Summary

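A ColBERT retrieval checkpoint published as RobinAkan1/colbert-v1-tripclick. The name suggests it was trained on the TripClick dataset. According to the author's note in the example below, it was trained with doc_maxlen=400 and query_maxlen=32.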

Code Examples

1. Index your corpus (i.e., a list of documents)
from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

def main():
    nbits = 1 
    dataset_name = "test"
    index_name = f'{dataset_name}.{nbits}bits'

    # Sample passages (LLM-generated)
    passages = [
    # Healthcare/Medical (0-9)
    "High blood pressure can lead to heart disease and stroke. Regular exercise and a healthy diet help manage hypertension.",
    "Diabetes is a condition where blood sugar levels are too high. Patients need to monitor glucose and may require insulin therapy.",
    "The flu vaccine is recommended annually to protect against influenza virus. Common side effects include soreness at injection site.",
    "Antibiotics treat bacterial infections but do not work on viruses. Overuse can lead to antibiotic resistance.",
    "Asthma is a chronic lung condition causing difficulty breathing. Inhalers help open airways during asthma attacks.",
    "Regular dental checkups prevent cavities and gum disease. Brushing twice daily and flossing are essential for oral health.",
    "Depression is a mental health disorder affecting mood and daily functioning. Treatment includes therapy and antidepressant medications.",
    "Broken bones require immediate medical attention and x-rays. Casts or splints immobilize the fracture during healing.",
    "Allergies occur when the immune system overreacts to harmless substances. Antihistamines help reduce allergy symptoms.",
    "Heart attack symptoms include chest pain, shortness of breath, and arm pain. Call emergency services immediately if suspected.",
    
    # Finance (10-14)
    "Mortgage rates have increased this quarter affecting home buyers. Fixed rate loans offer stability compared to variable rates.",
    "The stock market showed volatility due to inflation concerns. Investors are moving towards safer bond investments.",
    "Retirement planning should start early to maximize compound interest. 401k contributions reduce taxable income.",
    "Credit scores impact loan approval and interest rates. Paying bills on time improves creditworthiness.",
    "Cryptocurrency markets are highly volatile and risky. Bitcoin and Ethereum dominate the digital currency space.",
    
    # Construction (15-19)
    "Construction workers must wear hard hats and safety boots on site. OSHA regulations require proper fall protection equipment.",
    "Concrete mixing requires the right ratio of cement, sand, and water. Curing time depends on temperature and humidity.",
    "Building permits are required before starting major construction projects. Inspections ensure compliance with local codes.",
    "Excavation work requires careful planning to avoid underground utilities. Gas and electric lines must be marked before digging.",
    "Roofing materials include asphalt shingles, metal panels, and clay tiles. Proper installation prevents water leaks.",
    
    # Technology (20-24)
    "Machine learning algorithms learn patterns from training data. Neural networks are effective for image recognition tasks.",
    "Cloud computing provides scalable storage and processing power. AWS and Azure are leading cloud service providers.",
    "Cybersecurity protects systems from digital attacks and data breaches. Firewalls and encryption secure sensitive information.",
    "Software developers use version control systems like Git. Code reviews improve quality and catch bugs early.",
    "5G networks offer faster speeds and lower latency than 4G. Mobile connectivity continues to improve globally.",
    
    # Food/Cooking (25-29)
    "Baking bread requires yeast, flour, water, and salt. Kneading develops gluten for proper texture.",
    "Grilling vegetables brings out natural sweetness and adds smoky flavor. Brush with olive oil to prevent sticking.",
    "Food safety guidelines recommend cooking chicken to 165 degrees Fahrenheit. Proper temperature kills harmful bacteria.",
    "Mediterranean diet emphasizes fruits, vegetables, and olive oil. Studies show benefits for heart health.",
    "Meal prep saves time during busy weekdays. Cook large batches and portion into containers.",
    
    # Automotive (30-34)
    "Regular oil changes extend engine life and improve performance. Most cars need oil changed every 5000 miles.",
    "Tire pressure should be checked monthly for safety and fuel efficiency. Underinflated tires wear unevenly.",
    "Electric vehicles use battery power instead of gasoline. Charging infrastructure is expanding rapidly.",
    "Brake pads wear down over time and need replacement. Squealing sounds indicate worn brake pads.",
    "Car insurance rates depend on driving record and vehicle type. Comprehensive coverage protects against theft and damage."
    ]


    checkpoint = 'RobinAkan1/colbert-v1-tripclick'
    root = "./experiments" # Default folder created if not passed
    with Run().context(RunConfig(nranks=1, 
                                 root=root,
                                 experiment='notebook' # Experiment Folder inside "root"
                                 )):
        
        # NOTE: colbert-v1-tripclick was trained with doc_maxlen=400 and query_maxlen=32
        # because token lengths were centered around those values, and to save memory during training.
        # The config below sets doc_maxlen=512, which is longer than the training length.
        # If anyone can open a discussion on how ColBERT generalizes beyond its training length,
        # I would appreciate it :)
        config = ColBERTConfig(doc_maxlen=512,
                               query_maxlen=32,
                               # Index Compression params
                               nbits=nbits, 
                               )
        indexer = Indexer(checkpoint=checkpoint, config=config)
        indexer.index(name=index_name, collection=passages, overwrite=True)

if __name__ == '__main__':
    main()
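
Once the index exists, a natural next step is to query it. The sketch below is not part of the original model card. It assumes the `Searcher` class from the upstream ColBERT library, whose `search` method returns three parallel sequences (passage ids, ranks, scores), and it reuses the `root`, `experiment`, and index name from the indexing example. The helper names `top_passages` and `search_index`, and the default query, are hypothetical.

```python
def top_passages(passages, results):
    """Pair each (passage_id, rank, score) triple with its passage text."""
    passage_ids, ranks, scores = results
    return [
        (rank, float(score), passages[pid])
        for pid, rank, score in zip(passage_ids, ranks, scores)
    ]


def search_index(passages, index_name="test.1bits",
                 query="How do I manage high blood pressure?", k=3):
    """Search an existing index and return (rank, score, text) tuples."""
    # Imported lazily so top_passages() is usable without ColBERT installed.
    # Assumption: these imports match the upstream ColBERT package layout.
    from colbert import Searcher
    from colbert.infra import Run, RunConfig

    # Same root and experiment as the indexing step, so the index is found.
    with Run().context(RunConfig(nranks=1, root="./experiments",
                                 experiment="notebook")):
        # The checkpoint is read from the index metadata written by the Indexer.
        searcher = Searcher(index=index_name, collection=passages)
        results = searcher.search(query, k=k)
    return top_passages(passages, results)
```

Call `search_index(passages)` with the same `passages` list that was indexed, after the indexing step has finished. Each returned tuple holds the rank, the ColBERT score, and the passage text.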
