Llama-3.1-8B-Energy-Classifier

Name: Llama-3.1-8B-Energy-Classifier
Author: EnergyAI

Tags: llama-3.1, Other
Size: 8B params
Downloads: 114
Status: New, Early-stage

Edge AI: Mobile, Laptop, Server

18GB+ RAM

Quick Summary

An 8B-parameter Llama 3.1 sequence classifier that labels text as energy-related or non-energy.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum Recommended: 8GB+ RAM
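
The RAM figures above can be sanity-checked with simple arithmetic on the weight size. The sketch below is a rough estimate only: it assumes a nominal 8 billion parameters (the exact count is not given on this page), counts weights alone (no activations or runtime overhead), and the helper name `weight_memory_gib` is made up for illustration. Whether quantized builds of this model exist is not stated here.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

NUM_PARAMS = 8e9  # nominal "8B params"; the true count may differ slightly

# Bytes per parameter for common numeric formats (4-bit is half a byte).
formats = {"float32": 4, "bfloat16": 2, "int8": 1, "4-bit": 0.5}

for name, size in formats.items():
    print(f"{name:>8}: {weight_memory_gib(NUM_PARAMS, size):.1f} GiB")
```

Under these assumptions, bfloat16 weights alone come to roughly 15 GiB, which is in line with the 16GB and 18GB+ figures listed on this page, while the smaller figures would only be reachable with aggressive quantization.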

Training Data Analysis

🟡 Average (4.8/10)

Researched training datasets used by Llama-3.1-8B-Energy-Classifier with quality assessment

Specialized For: general, science, multilingual, reasoning

Training Datasets (4)

common crawl

🔴 2.5/10 (general, science)

Key Strengths

•Scale and Accessibility: At 9.5+ petabytes, Common Crawl provides unprecedented scale for training d...
•Diversity: The dataset captures billions of web pages across multiple domains and content types, ena...
•Comprehensive Coverage: Despite limitations, Common Crawl attempts to represent the broader web acro...

Considerations

•Biased Coverage: The crawling process prioritizes frequently linked domains, making content from dig...
•Large-Scale Problematic Content: Contains significant amounts of hate speech, pornography, violent c...

🔵 6/10 (general, multilingual)

Key Strengths

•Scale and Accessibility: 750GB of publicly available, filtered text
•Systematic Filtering: Documented heuristics enable reproducibility
•Language Diversity: Although English-only, it captures diverse writing styles

Considerations

•English-Only: Limits multilingual applications
•Filtering Limitations: Offensive content and low-quality text remain despite filtering

wikipedia

🟡 5/10 (science, multilingual)

Key Strengths

•High-Quality Content: Wikipedia articles are subject to community review, fact-checking, and citatio...
•Multilingual Coverage: Available in 300+ languages, enabling training of models that understand and ...
•Structured Knowledge: Articles follow consistent formatting with clear sections, allowing models to ...

Considerations

•Language Inequality: Low-resource language editions have significantly lower quality, fewer articles...
•Biased Coverage: Reflects biases in contributor demographics; topics related to Western culture and ...

arxiv

🟡 5.5/10 (science, reasoning)

Key Strengths

•Scientific Authority: Preprints from an established research repository (moderated, but not necessarily peer-reviewed)
•Domain-Specific: Specialized vocabulary and concepts
•Mathematical Content: Includes complex equations and notation

Considerations

•Specialized: Primarily technical and mathematical content
•English-Heavy: Predominantly English-language papers
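
As a quick check on the overall 4.8/10 rating, the four per-dataset scores can be averaged. The page does not say how the overall score is computed, so the unweighted mean used here is an assumption; the second dataset has no name on this page, so it gets a placeholder key.

```python
from statistics import mean

# Per-dataset scores as listed above; the placeholder key is hypothetical.
scores = {
    "common crawl": 2.5,
    "(unnamed dataset)": 6.0,
    "wikipedia": 5.0,
    "arxiv": 5.5,
}

overall = mean(scores.values())
print(f"Unweighted mean: {overall:.2f}/10")
```

The unweighted mean is 4.75, which is consistent with the displayed 4.8/10 after rounding to one decimal place.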

Code Examples

Example Output (text)

Prediction: energy
Confidence: 0.9987
Probabilities: Energy=0.9987, Non-Energy=0.0013
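
The page does not show the code that printed this output. A minimal sketch of one way to produce it, assuming a result dict with `label`, `confidence`, and `probabilities` keys (the shape returned by the `EnergyClassifier.predict` example on this page), might look like the following. The `format_prediction` helper and the display names are illustrative assumptions, and the sample values are simply copied from the output above.

```python
from typing import Dict

# Display names are an assumption, chosen to match the sample output above.
DISPLAY_NAMES = {"energy": "Energy", "non_energy": "Non-Energy"}

def format_prediction(result: Dict) -> str:
    """Render a prediction dict as the three-line text report shown above."""
    probs = ", ".join(
        f"{DISPLAY_NAMES[key]}={result['probabilities'][key]:.4f}"
        for key in ("energy", "non_energy")
    )
    return (
        f"Prediction: {result['label']}\n"
        f"Confidence: {result['confidence']:.4f}\n"
        f"Probabilities: {probs}"
    )

# Values copied from the sample output, not a fresh model run.
sample = {
    "label": "energy",
    "confidence": 0.9987,
    "probabilities": {"energy": 0.9987, "non_energy": 0.0013},
}
print(format_prediction(sample))
```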

Batch Processing (python, transformers)

import torch  # needed for torch.bfloat16 below
from transformers import pipeline

# Create classification pipeline
classifier = pipeline(
    "text-classification",
    model="EnergyAI/Llama-3.1-8B-Energy-Classifier",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Classify multiple documents
texts = [
    "Wind turbines are becoming more efficient with larger blade designs.",
    "The software development team completed the sprint planning meeting.",
    "Natural gas prices fluctuated amid geopolitical tensions in Europe.",
]

results = classifier(texts, truncation=True, max_length=512)
for text, result in zip(texts, results):
    print(f"Text: {text[:50]}...")
    print(f"Label: {result['label']}, Score: {result['score']:.4f}\n")

🔧 Advanced Usage (python, transformers)

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from typing import List, Dict

class EnergyClassifier:
    def __init__(self, model_name: str = "EnergyAI/Llama-3.1-8B-Energy-Classifier"):
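        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # Llama tokenizers often ship without a pad token; fall back to EOS
        # so that padding=True works for batched inputs
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name,
            device_map="auto",
            torch_dtype=torch.bfloat16,
        )
        # The classification head needs the pad token id to locate the last real token
        self.model.config.pad_token_id = self.tokenizer.pad_token_id
        self.model.eval()
        # Assumed label order; verify against self.model.config.id2label
        self.label_map = {0: "non_energy", 1: "energy"}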
    
    @torch.no_grad()
    def predict(self, text: str, return_probs: bool = True) -> Dict:
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=1024,
            padding=True,
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        
        outputs = self.model(**inputs)
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = torch.argmax(probs, dim=-1).item()
        
        result = {
            "label": self.label_map[predicted_class],
            "confidence": probs[0][predicted_class].item(),
        }
        
        if return_probs:
            result["probabilities"] = {
                "non_energy": probs[0][0].item(),
                "energy": probs[0][1].item(),
            }
        
        return result
    
    @torch.no_grad()
    def predict_batch(self, texts: List[str], batch_size: int = 8) -> List[Dict]:
        results = []
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i + batch_size]
            inputs = self.tokenizer(
                batch,
                return_tensors="pt",
                truncation=True,
                max_length=1024,
                padding=True,
            )
            inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
            
            outputs = self.model(**inputs)
            probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
            
            for j in range(len(batch)):
                pred_class = torch.argmax(probs[j]).item()
                results.append({
                    "label": self.label_map[pred_class],
                    "confidence": probs[j][pred_class].item(),
                    "probabilities": {
                        "non_energy": probs[j][0].item(),
                        "energy": probs[j][1].item(),
                    }
                })
        
        return results

# Usage
classifier = EnergyClassifier()
result = classifier.predict("Wind energy is the fastest growing renewable source.")
print(result)
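
Both `predict` and `predict_batch` turn the model's two logits into probabilities with a softmax and then take the argmax. A dependency-free sketch of that step, using made-up logits rather than real model output, is shown here for illustration; the `softmax` helper is written out by hand instead of calling `torch.nn.functional.softmax`.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits in the order [non_energy, energy]; not real model output.
logits = [-2.0, 3.0]
label_map = {0: "non_energy", 1: "energy"}

probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax

print(f"label={label_map[predicted]}, confidence={probs[predicted]:.4f}")
```

Because the probabilities sum to 1, the reported confidence is just the larger of the two class probabilities.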

Usage (python)

import json
from tqdm import tqdm

def classify_jsonl_file(input_file: str, output_file: str):
    # EnergyClassifier is the class defined in the Advanced Usage example
    classifier = EnergyClassifier()

    # Read every record once, skipping blank lines
    records = []
    with open(input_file, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))

    # Classify in batches
    texts = [record['text'] for record in records]
    results = classifier.predict_batch(texts, batch_size=16)

    # Write each record back out with its prediction attached
    with open(output_file, 'w', encoding='utf-8') as fout:
        for record, result in tqdm(zip(records, results), total=len(records)):
            record['predicted_label'] = result['label']
            record['confidence'] = result['confidence']
            record['energy_prob'] = result['probabilities']['energy']
            fout.write(json.dumps(record) + '\n')

# Process your dataset
classify_jsonl_file('documents.jsonl', 'documents_classified.jsonl')
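
`classify_jsonl_file` expects one JSON object per line, each with a `text` field. A small sketch for preparing such an input file follows; the `id` field and the temporary path are hypothetical, and the sample sentences are borrowed from the batch example on this page.

```python
import json
import os
import tempfile

# Hypothetical sample records; only the 'text' key is read by classify_jsonl_file.
records = [
    {"id": 1, "text": "Wind turbines are becoming more efficient with larger blade designs."},
    {"id": 2, "text": "The software development team completed the sprint planning meeting."},
]

# Write one JSON object per line (the JSONL convention).
path = os.path.join(tempfile.mkdtemp(), "documents.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read the file back to confirm each line parses and carries a 'text' field.
with open(path, "r", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(f"Wrote {len(loaded)} records to {path}")
```

Any extra fields such as `id` are carried through to the output file unchanged, alongside the added `predicted_label`, `confidence`, and `energy_prob` keys.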
