FLUX.2-klein-4b-nvfp4mixed

License: apache-2.0
Author: Bedovyy
Parameters: 4B
Edge AI: Mobile, Laptop, Server (9GB+ RAM)
Quick Summary

A mixed-precision quantization of FLUX.2-klein (4B parameters) that combines float8_e4m3fn and NVFP4 layer formats in a single checkpoint, produced by merging two quantized source checkpoints with the script below.

Device Compatibility

Mobile: 4-6GB RAM
Laptop: 16GB RAM
Server: GPU
Minimum recommended: 4GB+ RAM

Code Examples

How to reproduce (Python / PyTorch)
import torch
import json
import os
from safetensors.torch import load_file, save_file

def parse_args():
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("src1")
    parser.add_argument("src2")
    parser.add_argument("dst")
    return parser.parse_args()

def main():
    args = parse_args()

    # Base checkpoint: copy everything except the single_blocks that will be
    # replaced; its layers are recorded as float8_e4m3fn.
    state_dict1 = load_file(args.src1)
    new_state_dict = {}
    quantization_layers = {}

    # The substring "single_blocks.1" matches blocks 1 and 10-19; the two
    # exceptions keep blocks 1 and 19 from src1, so only blocks 10-18 are
    # swapped in from src2.
    block_names = ["single_blocks.1"]
    exception_names = ["single_blocks.1.", "single_blocks.19"]
    for key, tensor in state_dict1.items():
        if any(b in key for b in block_names) and not any(e in key for e in exception_names):
            continue  # this layer is taken from src2 instead
        new_state_dict[key] = tensor
        if key.endswith(".weight_scale"):
            layer_name = key[: -len(".weight_scale")]
            quantization_layers[layer_name] = {"format": "float8_e4m3fn"}

    # Second checkpoint: take only the selected single_blocks, recorded as NVFP4.
    state_dict2 = load_file(args.src2)
    for key, tensor in state_dict2.items():
        if any(b in key for b in block_names) and not any(e in key for e in exception_names):
            new_state_dict[key] = tensor
            if key.endswith(".weight_scale"):
                layer_name = key[: -len(".weight_scale")]
                quantization_layers[layer_name] = {"format": "nvfp4"}

    metadata = {
        "_quantization_metadata": json.dumps({
            "format_version": "1.0",
            "layers": quantization_layers
        })
    }
    save_file(new_state_dict, args.dst, metadata=metadata)
    total_bytes = os.path.getsize(args.dst)
    print(f"Output: {args.dst} ({round(total_bytes / (1024**3), 2)}GB)")

if __name__ == "__main__":
    main()
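The substring-matching rule above is easy to misread, so here is a minimal standalone sketch of the same selection logic, using hypothetical tensor key names (no checkpoints are loaded). It shows which single_blocks stay in the float8_e4m3fn base and which are replaced by NVFP4 layers from the second file:

```python
# Same block-selection rule as the merge script, isolated for inspection.
block_names = ["single_blocks.1"]
exception_names = ["single_blocks.1.", "single_blocks.19"]

def comes_from_src2(key: str) -> bool:
    """True if this key is dropped from src1 and taken from src2 (NVFP4)."""
    return any(b in key for b in block_names) and not any(e in key for e in exception_names)

# Hypothetical key names, for illustration only.
examples = {
    "single_blocks.1.linear1.weight":  False,  # block 1: kept from src1 (fp8)
    "single_blocks.10.linear1.weight": True,   # blocks 10-18: taken from src2
    "single_blocks.18.linear1.weight": True,
    "single_blocks.19.linear1.weight": False,  # block 19: kept from src1 (fp8)
    "single_blocks.2.linear1.weight":  False,  # other blocks: kept from src1
}
for key, expected in examples.items():
    assert comes_from_src2(key) == expected, key
print("routing checks passed")
```

So the resulting checkpoint stores single_blocks 10-18 in NVFP4 and everything else in float8_e4m3fn, as reflected in the `_quantization_metadata` written by the script.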
