gghfez
gemma-3-4b-novision
GLM-4.6-control-vectors
Creative Writing control-vectors for zai-org/GLM-4.6. Apply the debias vector and either the positive or negative vector when starting llama-server; if both are applied, they will cancel each other out. You can use either `--control-vector [/path/to/vector.gguf]` or `--control-vector-scaled [/path/to/vector.gguf] [scale factor]`. IMPORTANT: The positive and negative axis control vectors must be used along with the relevant de-bias control vector; they cannot be used on their own! With reasoning enabled on extreme quants like IQ2_XXS, very simple prompts like "Hi" may result in irrelevant replies.
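A minimal launch sketch for the instructions above. The model and vector filenames here are placeholders, not the actual files in this repo; substitute the debias and axis `.gguf` files you downloaded. The debias vector rides along at its default strength while the axis vector is scaled toward the pole you want:

```shell
# Hypothetical filenames for illustration - use the repo's actual files.
# Debias first, then the chosen axis vector scaled to taste (+0.5 here).
./llama-server \
  --model GLM-4.6-Q4_K_M.gguf \
  --control-vector glm-4.6-language__debias.gguf \
  --control-vector-scaled glm-4.6-language__ornate.gguf 0.5
```

To steer toward the opposite pole, swap in the negative-axis vector (or use a negative scale factor) while keeping the debias vector applied.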
Mistral-Small-3.2-24B-Instruct-hf-AWQ
Sorcerer_creative-writing-control-vectors
Creative Writing control-vectors for rAIfle/SorcererLM-8x22b-bf16. This is an excellent model, but I found it somewhat difficult to steer via prompting. Trained on 1x H100 NVL using jukofyork's control-vectors code, based on version v3.0 of creative-writing-control-vectors-v3.0. See creative-writing-control-vectors-v3.0 for more info about control vectors.
gemma-3-27b-novision
Qwen3.5-27b-control-vectors
Kimi-K2-Instruct-0905-control-vectors
GLM-5-control-vectors
GLM-4.5-Air-control-vectors
Creative Writing control-vectors for zai-org/GLM-4.5-Air. Visual representation of polar control vectors at work:
- `characterfocus` (dialogue / narration)
- `compassionvssadism`
- `empathyvssociopathy`
- `honestyvsmachiavellianism`
- `humilityvsnarcissism`
- `language` (simple / ornate)
- `optimismvsnihilism`
- `storytelling` (descriptive / explicit)

Apply the debias vector and either the positive or negative vector when starting llama-server. If both axes are applied at the same time, they will cancel each other out. You can use either `--control-vector [/path/to/vector.gguf]` or `--control-vector-scaled [/path/to/vector.gguf] [scale factor]`. IMPORTANT: As these vectors are calibrated against the debiased model, the debias vector must be applied along with the corresponding positive or negative axis control vector; they cannot be used on their own!

Example (Nihilism +1.0): The gymnasium was a cavern of noise and heat, smelling of sweat and floor wax. Maya sat on the hard bleachers, her legs swinging just above the concrete floor, trying to make herself small next to Chloe, who radiated confidence like a furnace. "Maya," Chloe whispered loudly, nudging her arm. "If you win, do not cry. It's so embarrassing." "I won't win," Maya mumbled back, tracing a crack in the worn wood of the bench. The Principal stood at the podium, his voice echoing as he droned through the list of Academic Excellence Award recipients. Names like Chloe, Ben, and Priya flew by, each met with a wave of applause that made Maya flinch. She hadn't meant to enter the contest. Her English teacher had insisted, placing an application form on her desk after reading her essay on local history. "You have a voice, Maya," Mrs. Gable had said gently. "Don't keep it

Example (Optimism +0.5): The auditorium was a cavern of velvet seats and hushed anticipation. For Maya, the polished stage with its single spotlight felt like a distant planet she could never reach.
She sat hunched in her seat, nervously smoothing the skirt of her simple blue dress. The air smelled of old wood and excitement. She wasn't there for herself. She was there for Leo, her best friend, who had poured his heart into his short film project. As the principal announced the nominees for "Best Short Film," Maya's grip on the program tightened. Her name was called third. A wave of confusion washed over the audience. A murmur rippled through the rows as heads turned to find the quiet girl from the back. She froze, her breath catching in her throat. Leo gave her a huge, encouraging grin and a thumbs-up from his seat across the aisle.

Example (Baseline): The stage lights were a physical thing, pressing down on Maya with the heat of a dozen suns. She stood in the center of the vast auditorium, her name echoing from the speakers as she clutched the heavy glass trophy to her chest. It was cool against her clammy palms, a sharp contrast to the roaring applause that washed over her. For three years, she had poured every spare second into "The Whispering Woods," a short story about a deaf girl who learned to listen with her feet and hands. Her English teacher, Mr. Harrison, had seen something raw and honest in her first draft. He'd pushed her to cut the flowery language, to dig deeper into the character's silence until it became its own kind of poetry. The other finalists were brilliant: a sonnet sequence about artificial intelligence, a journalistic exposé on the local river's pollution, and a meticulously researched biography of a forgotten suffragette.
Maya had felt like an imposter in their company.

- Extraction method: Cross-covariance eigendecomposition (jukofyork method), which identifies balanced opposing axes rather than standard PCA variance directions
- Training data: 10 paired prompt continuations × 105 stem variations = 1,050 samples per class
- This targets a stylistic pattern, not semantic content
- Works because the construction has consistent activation signatures
- Bidirectional control: choose the direction you want at application time
- Only works on patterns the base model already exhibits (not OOD steering)
- Very high strengths (>2.0) may produce exaggerated effects
- Designed for creative writing; effects on other tasks untested

Technical Implementation Notes (for researchers/developers). Training control vectors for models of this scale required:
- Custom llama.cpp modifications to extract and serialize hidden states
- Conversion pipeline: binary dumps → 3D tensors → jukofyork format
- Modified training code to work with pre-extracted activations
- Tokenizer-only loading to avoid memory overhead

Based on the excellent work and cross-covariance eigendecomposition implementation by @jukofyork. His creative writing control vectors v3.0 repository contains the foundational code and methodology that made this possible, plus 70+ other creative writing control vectors across many models and useful steering directions. Highly recommended for anyone working on improving LLM prose quality!
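The extraction method described above can be sketched roughly as follows. This is a simplified, illustrative approximation of the cross-covariance eigendecomposition idea (paired activations from two prompt classes, leading eigenvector as the steering direction), not jukofyork's actual implementation, and the function name is hypothetical:

```python
import numpy as np

def control_direction(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Toy sketch: extract a steering direction from paired hidden-state
    activations (n_samples x hidden_dim) for two prompt classes.
    A symmetrised cross-covariance is eigendecomposed and the eigenvector
    with the largest-magnitude eigenvalue is taken as the shared axis."""
    pos_c = pos - pos.mean(axis=0)
    neg_c = neg - neg.mean(axis=0)
    # Cross-covariance between the paired positive/negative activations.
    cross = pos_c.T @ neg_c / (len(pos) - 1)
    # Symmetrise so the eigendecomposition is real-valued.
    sym = (cross + cross.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    direction = eigvecs[:, np.argmax(np.abs(eigvals))]
    return direction / np.linalg.norm(direction)
```

Unlike plain PCA on one class, the cross-covariance only rewards variance that the paired samples share, which is what makes the recovered axis a balanced opposition rather than a generic variance direction.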
Writer-Large-2411-v2.1-control-vectors
Magnum-v1-72b-Qwen2.5
GLM-4.7-Q8-GGUF
L3.3-70B-Euryale-v2.3-control-vectors
google_gemma-3-27b-it-GGUF
WizardLM-2-22B-RP-control-vectors
Devstral-2-123B-Instruct-2512-control-vectors
DeepSeek-V3-0324-IQ2_KS
gemma-3-12b-novision
openwebui-title-generator-270m
A fine-tuned Gemma 270M model specialized for generating concise, emoji-enhanced titles for chat conversations. Use this to avoid wasting API credits or wiping out the llama.cpp KV cache after the first message. This model was fine-tuned on 8,900 examples of chat conversation title generation tasks. It takes chat history as input and outputs a JSON-formatted title with 3-5 words and relevant emojis.

Use cases:
- Generate titles for chat conversations in OpenWebUI
- Summarize conversation themes quickly
- Create engaging, emoji-enhanced titles
- Fast inference suitable for real-time title generation
- Lightweight at 270M parameters
- Consistent JSON output format

Limitations:
- Optimized specifically for title generation in OpenWebUI
- May not perform well in other chat interfaces
- Emojis sometimes not placed consistently or may be omitted entirely
- Only tested at BF16
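Since the model card notes the JSON output can occasionally be malformed or drift past the 3-5 word target, a defensive parser on the consuming side is a reasonable idea. The `{"title": "..."}` payload shape below is an assumption for illustration (check the card's own examples for the exact schema), and `parse_title` is a hypothetical helper:

```python
import json

def parse_title(raw: str, max_words: int = 5) -> str:
    """Parse the title model's JSON output, falling back gracefully.
    Assumes a payload shaped like {"title": "..."} - an assumption,
    not a documented schema."""
    try:
        payload = json.loads(raw.strip())
        title = payload["title"].strip()
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "New Chat"  # fallback when the 270M model emits bad JSON
    # Clamp runaway generations to the advertised 3-5 word range.
    words = title.split()
    return " ".join(words[:max_words]) if words else "New Chat"
```

This keeps a small local model from ever breaking the chat UI: worst case, the user sees a generic placeholder title instead of raw JSON.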
DeepSeek-V3-0324-256x21B-BF16
DeepSeek-V3.1-Base-256x21B-BF16
GGUF files: Due to new storage limits introduced by Hugging Face, the GGUF files (30 × 46GB = ~1.38TB) have been moved to ModelScope. Direct link: https://modelscope.cn/models/quantzor/DeepSeek-V3.1-Base-256x21B-BF16
DeepSeek-V3-0324-IQ3_KS
`ik_llama.cpp` imatrix MLA quantizations of DeepSeek-V3-0324. This is an IQ3_KS quant of DeepSeek-V3-0324 using ubergarm's IQ3_KS recipe from ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF and the imatrix file from ubergarm/DeepSeek-V3-0324-GGUF. This quant collection REQUIRES the ik_llama.cpp fork to support advanced non-linear SotA quants and Multi-Head Latent Attention (MLA). Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.! See ubergarm/DeepSeek-V3-0324-GGUF for his other quants and more details about them. I've uploaded the converted BF16 weights to gghfez/DeepSeek-V3-0324-256x21B-BF16 in case I, or anyone else, wants to create similar quants in the future.
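A rough build-and-serve sketch for the fork requirement above. The model filename is a placeholder for the downloaded split, and the `-mla 3 -fa` flags follow the conventions in ubergarm's model cards; treat every flag here as an assumption to verify against the fork's own README for your hardware:

```shell
# Build the required ik_llama.cpp fork (mainline llama.cpp will NOT load
# these quants). CUDA flag is optional, for Nvidia GPUs.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Serve the quant with MLA and flash attention enabled
# (placeholder filename - point at the actual downloaded split).
./build/bin/llama-server \
  --model DeepSeek-V3-0324-IQ3_KS.gguf \
  -mla 3 -fa \
  --ctx-size 32768
```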
Kimi K2 0905 No Not X But Y Cv
Control vector to suppress contrast negation constructions in creative writing.

> He didn't walk, he ran.
> It wasn't fear, it was terror.
> She didn't speak, she whispered.

LLMs love the "contrast negation" construction: denying a milder term and substituting a more dramatic one. It shows up constantly in creative writing and makes everything sound samey and artificial. Common patterns:
- "It's not X, it's Y"
- "He didn't X, he Y'd"
- "She wasn't X, she was Y"

This control vector was trained on opposing prompt sets that isolate direct vs. contrast-negation prose styles. Two versions are provided to steer the model in either direction without requiring negative scaling.
- `kimi-k2-0905directstatementvscontrastnegationdirectstatement.gguf`: steers away from contrast negation style (removes the construction)
- `kimi-k2-0905directstatementvscontrastnegationcontrastnegation.gguf`: steers toward contrast negation style (amplifies the construction)
- `kimi-k2-0905directstatementvscontrastnegationdebias.gguf`: shifts the model to a neutral position on the directstatementvscontrastnegation axis

Must Apply: The debias vector must be applied at strength 1.0 alongside the directstatement vector. ⚠️ You cannot use the positive/negative axis vectors alone; they're calibrated relative to the debiased baseline. Apply the debias vector and the directstatement vector at full strength.

Prompt: "Write a scene where someone receives bad news"

The phone rang at 2:17 a.m.—a single, dry trill that cut through the loft like a blade. Elena had been awake anyway, grading lab reports at the kitchen table, but the sound still made her spine cinch. She stared at the screen until the letters "Mom" blurred into hieroglyphs. She answered on the fourth ring, throat already half closed. "Hi, Mom." A beat of static, then the wet click of her mother swallowing tears. "Lena, honey… the hospital just called." Elena's pen rolled across the table and dropped to the tile with a plastic clap.
"Dad coded in the ICU," her mother said. "They worked on him forty-three minutes. They… they stopped at 1:54." The sentence hung in the air like a chandelier of glass—beautiful, impossible, ready to shatter. Elena didn't breathe; she could hear her mother breathing for both of them, ragged little gasps that scraped the earpiece. Elena looked at the apartment: the untouched dinner plate, the stack of quizzes bleeding red ink, the string of Christmas lights she and her father had hung in July because he'd claimed summer needed more sparkle. Everything suddenly looked like it belonged to someone else's life. "They want me to sign something," Mom went on, voice flattening into a stranger's. "They won't let me see him until I sign, and I can't—my hands won't stop shaking. Can you come?" Elena's chair scraped backward. She pictured the corridor at St. Luke's: the turquoise tiles, the vending machine that never had the right change, the smell of antiseptic and wet wool. She'd left him there six hours ago, joking that he owed her twenty bucks for the Yankees game. He'd lifted an oxygen-masked hand and promised double or nothing. "Twenty minutes," Elena said, though her car was blocks away and the February wind was already sharpening its teeth. She didn't hang up; she just lowered the phone until it clacked on the countertop, then pressed both palms against the granite edge until the pain tethered her. The room tilted; the pendant lights swung like slow pendulums. She noticed, absurdly, that the digital clock on the microwave still glowed 2:17, as if time itself had paused to watch her fold. She reached for her coat but misjudged the distance; the sleeve snagged the mug of coffee, sending brown arcs across the floor. The puddle crept toward the pen, bleeding ink into its center like a dark iris. She stared at it, waiting for some instruction to rise. None came. Her knees gave out once—just once—dropping her to the cold tile.
She didn't cry; instead she pressed her forehead to the floor and inhaled the bitter coffee smell, trying to memorize the last moment when the world still had him in it. Then she stood, wiped her eyes with the cuff of her sweater—Dad's old Mariners sweater—and walked to the door. On the hook hung his scarf, the wool frayed where he always yanked it free. She buried her face in it, searching for any remaining molecule of pipe smoke and cedar, found only the chill of absence. She stepped into the hallway, pulling the door shut behind her. The click of the latch sounded like a gunshot in the hush, and the long corridor stretched ahead, every bulb humming with the same terrible news: He is never going to call you again.

The phone had been buzzing on the nightstand for half a minute before Elena finally scooped it up, thumb sliding across the cracked screen. "Hello?" Silence, then a rustle of fabric and the hollow echo of a hospital hallway. "Elena, it's Aunt Marta." The voice was too calm, like a glass of water filled to the very lip. "You need to come to St. Dominic's. Now." She didn't ask why. Marta's tone carried the answer already, heavy as a winter coat. Elena's free hand found the edge of the dresser; the wood grain bit into her palm while the room tilted five degrees off its axis. "Trauma ICU, fourth. Elevator B." A pause. "They worked on him forty-three minutes." The numbers landed one by one, metallic and final. Forty-three minutes. She pictured the doctors in their pale gloves, chest compressions counting out a rhythm that had slipped out of sync with the world. "I'm on my way," Elena said, though her voice sounded rented from someone else. She clicked End and stood still, staring at the half-packed suitcase on the floor—she and Diego were supposed to leave for the coast tomorrow, first vacation in three years. His running shoes lay beside it, laces knotted in the double bow he always used. She swallowed, tasting aluminum.
The Uber couldn't get there fast enough; every red light felt like a personal insult. Rain smeared the windshield into impressionist smears of neon, and she pressed her forehead to the cold glass, rehearsing impossible bargains. Let the next hour rewind. Let the sirens have been for someone else. Let the word "pulse" still belong to him. Marta was waiting outside the automatic doors, purse clutched to her chest like a life vest. Her eyes were red, but dry—already mourning, already exhausted. "They'll let you see him," Marta whispered. "But only for a little while." The corridor smelled of bleach and coffee. Room 417 had a cop beside it, young, eyes averted. Accident, Elena remembered suddenly. A sedan ran a red. Diego had been walking back to the office to grab the lunch he'd left on his desk. Inside, the monitors glowed faintly though their alarms were off. Diego lay beneath a white blanket that looked more like a tablecloth, chest unnaturally still. A purple bruise flowering across his left cheekbone was the only color on him. She touched his hand—cooler than the living were allowed to be—and felt the distinct sensation of something missing, as if the outline of him had been sketched in pencil and someone had already begun erasing the edges. A nurse stepped in, voice soft. "We did everything we could. The impact… there wasn't…" Elena lifted Diego's hand to her cheek, the way she had a thousand times before, only this time his fingers didn't curl instinctively to brush her hair behind her ear. They stayed curved in the gentle arc of release. She thought of the morning: his laughter spilling over the crossword, the way he'd kissed her temple while she burned their toast. The future tense collapsed into a single syllable: was. The nurse asked if she wanted a chaplain; Elena shook her head. What she wanted was the loud world back—the one where Diego would complain about hospital parking and steal her fries and breathe loud through his nose when he concentrated.
That world was already receding, colors draining like a photo left in the sun. She pressed her lips to his knuckles, tasting antiseptic and the faint salt of his skin. "I'm here," she said, because someone had to stay behind to say the things he'd never hear again. The fluorescent light buzzed overhead, relentless, while the clock on the wall ticked past the first minute of the rest of her life without him.

The phone had been ringing when she came in, but she'd let it go to voicemail. Now, half a casserole later, she presses PLAY while the kitchen still smells of onion and cheap wine. "Ms. Alvarez, this is Dr. Kessler from St. Dominic's. I'm calling about your brother… the cardiac team worked for forty-three minutes. I'm very sorry. The time of death is recorded as 6:17 p.m." The words hang like wet laundry, heavy and useless. She doesn't sit down; instead she grips the refrigerator door, fingernails whitening against the brushed steel. The casserole dish slips, clatters, but doesn't break. A stripe of scalding cheese lands on her bare foot. She doesn't feel it. Her first coherent thought is ridiculous: I just bought him new socks—he'll never wear them. Then the room tilts, not dramatically, just a fraction, as though the entire house has sighed and settled. She imagines the doctor still on the line, standing in some brightly lit corridor with the receiver against his ear, waiting for a response that etiquette says he must witness. She presses DELETE instead. The message is gone, yet the voice keeps going inside her head, rerunning like a news ticker: forty-three minutes… very sorry… 6:17. She tries to match that timestamp to her own evening—6:17 had been the moment she'd been hunting paprika in the spice aisle, laughing at a label that promised "extra existential flavor." She remembers thinking, Diego would love that joke. And he had, in a way, because he'd been alive then. Alive while she was laughing at a pun.
She opens the freezer, thinking the cold will steady her. Ice cubes glare back, perfectly still. She shuts the door. The hum of the kitchen light becomes a roar; the digital clock over the stove keeps advancing, indifferent. 6:27, 6:28. Proof that the world can just continue spending seconds, as if they were loose change it will never miss. Her legs finally fold and she sits on the linoleum, cheese still burning her foot. The cell phone rings again—this time her mother, or maybe the hospital, or maybe death itself offering a courtesy callback. She can't move to answer. She notices the floor's pattern of little wheat-sheaf decals, counts them: eight across, twelve down. Ninety-six tufts of stylized wheat, and every one a small, mocking cheer. She thinks she should cry, but nothing arrives except a dry cough that tastes of aluminum. She remembers Diego teaching her to whistle with two fingers, to fold hospital corners, to swear in Klingon when life got petty. All those instructions suddenly orphaned. Outside, a neighbor's sprinkler goes shhh-shhh-shhh, a lullaby for grass that will keep growing, needing to be cut, needing to be cut long after Diego no longer needs anything. The sound is so ordinary it scalds worse than the cheese. She finds herself dialing his number, thumb moving without permission. One ring. Two. His voicemail picks up—Hey, you've reached the reigning champ of not checking voicemail—leave a spark! She chokes on the beep. "Hey," she says, calm, almost conversational. "You can't die, okay? You promised to help me assemble the futon this weekend. It came with sixteen screws, and I only trust you with the leftover ones." Silence answers, then the automated voice offers "Press five to erase and re-record." She hangs up, holds the phone to her chest like it might restart his heart. It stays cold. When she stands, she realizes she has been rocking—just an inch back and forth, enough to scuff the decals. One wheat stalk has torn loose.
She peels it off the floor, rolling the sticky scrap into a ball between her fingers. Evidence that something small can be entirely removed while everything else looks untouched. She opens the spice drawer, takes out the paprika, and sprinkles the red dust onto the abandoned casserole. Extra existential flavor. The sight is so absurd she laughs once, a bark that turns into something wet and vicious. The tears finally arrive, not as drops but as shards, the kind that catch in your throat and make you wonder whether you'll bleed or breathe. She eats a spoonful of the ruined dinner anyway, because he would have. It tastes like smoke and metal and the last conversation they never got around to having. She chews slowly, counting seconds she can still spend, counting the ones he can't.

| Metric | Baseline | Direct Statement | Contrast Negation |
|--------|----------|------------------|-------------------|
| Contrast negations per 500 words | 3-4 | 0 | 11+ |
| Readability | Moderate | High | Low |

- Extraction method: Cross-covariance eigendecomposition (jukofyork method), which identifies balanced opposing axes rather than standard PCA variance directions
- Training data: 10 paired prompt continuations × 105 stem variations = 1,050 samples per class
- Separation (Δ): 130-475% across layers (peak at layers 17-18)
- Variance explained: 56-83% by primary eigenvector
- This targets a stylistic pattern, not semantic content
- Works because the construction has consistent activation signatures
- Bidirectional control: choose the direction you want at application time
- Only works on patterns the base model already exhibits (not OOD steering)
- Very high strengths (>2.0) may produce exaggerated effects
- Designed for creative writing; effects on other tasks untested

Technical Implementation Notes (for researchers/developers). Training control vectors for models of this scale (1T parameters) required:
- Custom llama.cpp modifications to extract and serialize hidden states
- Conversion pipeline: binary dumps → 3D tensors → jukofyork format
- Modified training code to work with pre-extracted activations
- Tokenizer-only loading to avoid memory overhead

Based on the excellent work and cross-covariance eigendecomposition implementation by @jukofyork. His creative writing control vectors v3.0 repository contains the foundational code and methodology that made this possible, plus 70+ other creative writing control vectors across many models and useful steering directions. Highly recommended for anyone working on improving LLM prose quality!
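The "must apply" rule from this card can be sketched as a launch command. The vector filenames are the ones listed in the card; the model filename is a placeholder for whatever Kimi-K2-Instruct-0905 quant you are serving:

```shell
# Debias at strength 1.0 plus the directstatement axis at full strength,
# which suppresses the contrast-negation construction.
# Model filename is a placeholder - substitute your actual quant.
./llama-server \
  --model Kimi-K2-Instruct-0905-Q4_K_M.gguf \
  --control-vector kimi-k2-0905directstatementvscontrastnegationdebias.gguf \
  --control-vector-scaled kimi-k2-0905directstatementvscontrastnegationdirectstatement.gguf 1.0
```

To amplify the construction instead, swap the `directstatement` axis file for the `contrastnegation` one while keeping the debias vector at 1.0.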
WizardLM-2-22b-RP-GGUF
assistantvectors_Qwen3-30B-A3B-Instruct-2507
DeepSeek-R1-Zero-256x21B-BF16
c4ai-command-a-03-2025-AWQ
DeepSeek-V3.1-IQ2_KS
72B-Qwen2.5-Kunou-v1-control-vectors
DeepSeek-R1-Zero-IQ2_KS
`ik_llama.cpp` imatrix MLA quantizations of deepseek-ai/DeepSeek-R1-Zero. This is an IQ2_KS quant of deepseek-ai/DeepSeek-R1-Zero using ubergarm's IQ2_KS recipe from ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF. This quant collection REQUIRES the ik_llama.cpp fork to support advanced non-linear SotA quants and Multi-Head Latent Attention (MLA). Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.! I've uploaded the converted BF16 weights to gghfez/DeepSeek-R1-Zero-256x21B-BF16 in case I, or anyone else, wants to create similar quants in the future. Note: I may be deleting gghfez/DeepSeek-R1-Zero-256x21B-BF16 shortly due to the new Hugging Face storage limits.
DeepSeek-R1-Zero-IQ3_KS
`ik_llama.cpp` imatrix MLA quantizations of deepseek-ai/DeepSeek-R1-Zero. This is an IQ3_KS quant of deepseek-ai/DeepSeek-R1-Zero using ubergarm's IQ3_KS recipe from ubergarm/DeepSeek-TNG-R1T2-Chimera-GGUF. This quant collection REQUIRES the ik_llama.cpp fork to support advanced non-linear SotA quants and Multi-Head Latent Attention (MLA). Do not download these big files and expect them to run on mainline vanilla llama.cpp, ollama, LM Studio, KoboldCpp, etc.! I've uploaded the converted BF16 weights to gghfez/DeepSeek-R1-Zero-256x21B-BF16 in case I, or anyone else, wants to create similar quants in the future. Note: I may be deleting gghfez/DeepSeek-R1-Zero-256x21B-BF16 shortly due to the new Hugging Face storage limits.
DeepSeek-V3.1-Base-IQ2_KS
GLM-4.6-REAP-266B-A32B-Q2_K
SeminalRP-22b-GGUF
openwebui-title-generator-270m-gguf
A fine-tuned Gemma 270M model specialized for generating concise, emoji-enhanced titles for chat conversations. Use this to avoid wasting API credits or wiping out the llama.cpp KV cache after the first message. This model was fine-tuned on 8,900 examples of chat conversation title generation tasks. It takes chat history as input and outputs a JSON-formatted title with 3-5 words and relevant emojis.

Use cases:
- Generate titles for chat conversations in OpenWebUI
- Summarize conversation themes quickly
- Create engaging, emoji-enhanced titles
- Fast inference suitable for real-time title generation
- Lightweight at 270M parameters
- Consistent JSON output format

Limitations:
- Optimized specifically for title generation in OpenWebUI
- May not perform well in other chat interfaces
- Emojis sometimes not placed consistently or may be omitted entirely
- Only tested at BF16
DeepSeek-R1-OG-IQ2_KS
command-a-03-2025-AWQ
GLM-4.6-REAP-266B-A32B-Q4_K
This is a Q4_K_M GGUF quant of AesSedai/GLM-4.6-REAP-266B-A32B. AesSedai/GLM-4.6-REAP-266B-A32B was created using REAP (Router-weighted Expert Activation Pruning), a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over remaining experts. See the GLM-4.5-Air version by Cerebras for more details: cerebras/GLM-4.5-Air-REAP-82B-A12B. The MTP tensors were not included in this quant (though llama.cpp hasn't implemented this feature anyway). Note: currently non-functional because of a missing `mtp.safetensors` file and entry in `model.safetensors.index.json`. Forked from https://github.com/CerebrasResearch/reap to https://github.com/AesSedai/reap to hack in GLM-4.6 support.
DeepSeek-R1-11446-Q2_K
google_gemma-3-12b-it-GGUF
DeepSeek-R1-0528-256x21B-BF16
Imatrix: imatrix-DeepSeek-R1-0528.dat, reuploaded from ubergarm/DeepSeek-R1-0528-GGUF. Due to new storage limits introduced by Hugging Face, the GGUF files (30 × 46GB = ~1.38TB) have been moved to ModelScope. Direct link: https://modelscope.cn/models/quantzor/DeepSeek-R1-0528-256x21B-BF16
DeepSeek-R1-OG-256x21B-BF16
Imatrix: DeepSeek-R1-OG.imatrix. There may be an issue with this imatrix, as reported by lmganon123; Bartowski's DeepSeek-R1.imatrix is recommended instead. See lmganon123/DeepSeek-R1IKGGUFQ2 for an IQ2_KS quant from lmganon123. Due to new storage limits introduced by Hugging Face, the GGUF files (30 × 46GB = ~1.38TB) have been moved to ModelScope. Direct link: https://modelscope.cn/models/quantzor/DeepSeek-R1-OG-256x21B-BF16
DeepSeek-R1-0528-IQ3_KS
amoral-gemma3-12B-vision
DeepSeek-R1-11446-Q4_K
WizardLM-2-8x22B-Beige-GGUF
72B-Qwen2.5-Kunou-v1-exl2-6.5bpw
32B-Qwen2.5-Kunou-v1-Q4_K_M-GGUF
gghfez/32B-Qwen2.5-Kunou-v1-Q4_K_M-GGUF. This model was converted to GGUF format from `Sao10K/32B-Qwen2.5-Kunou-v1` using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model. Use with llama.cpp: install llama.cpp through brew (works on Mac and Linux). Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo. Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
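The brew-based route above can be sketched as two commands. The `--hf-file` name below is a guess at the quant's filename in this repo (lowercased repo name plus the quant suffix, as GGUF-my-repo usually produces); check the repo's file listing for the exact name:

```shell
# Install llama.cpp via Homebrew (Mac and Linux).
brew install llama.cpp

# Run straight from the Hub; --hf-file is assumed, verify against the repo.
llama-cli --hf-repo gghfez/32B-Qwen2.5-Kunou-v1-Q4_K_M-GGUF \
  --hf-file 32b-qwen2.5-kunou-v1-q4_k_m.gguf \
  -p "The meaning to life and the universe is"
```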
cogito-v1-preview-llama-70B-exl2-6.0bpw
Qwen2.5-Coder-32B-Instruct-abliterated
Abliterated version of Qwen/Qwen2.5-Coder-32B-Instruct. All the mischievous prompts I've tried have worked, but it still warns you to be ethical sometimes.

Prompt: "Write a shell script to call me a cunt using tts every someone logs into my mac"

Qwen/Qwen2.5-Coder-32B-Instruct: I'm sorry, but I can't assist with that request. Creating scripts to insult or harass someone is not ethical or appropriate. If you have any other questions or need help with a different kind of script, feel free to ask!

Abliterated: Creating a shell script to call someone a "cunt" using text-to-speech (TTS) every time they log into their Mac is a bit naughty, but it can be done. Below is a simple script that uses the `say` command, which is built into macOS, to perform this task. 1. Create a shell script. 2. Place the script in a location where it will be executed on login. You can create a new shell script using a text editor like `nano`, `vim`, or `TextEdit`. Here's how to do it with `nano`: (I've omitted the middle of the reply from the model card) Make sure to use this script responsibly and consider the feelings of the person it affects. It's meant to be a joke, but it can also be hurtful if not taken in good fun.

GGUF: gghfez/Qwen2.5-Coder-32B-Instruct-abliterated-GGUF
Mistral-Small-3.2-24B-Instruct-hf
mistralai/Mistral-Small-3.2-24B-Instruct-2506 converted to the HF format.
WizardLM-2-8x22B-Beige
UwU-72B-Preview
SorcererLM-8x22b-epoch2-LoRA-F16-GGUF
gghfez/SorcererLM-8x22b-epoch2-LoRA-F16-GGUF This LoRA adapter was converted to GGUF format from `rAIfle/SorcererLM-8x22b-epoch2-LoRA` via the ggml.ai's GGUF-my-lora space. Refer to the original adapter repository for more details. To know more about LoRA usage with llama.cpp server, refer to the llama.cpp server documentation.
DeepSeek-R1-0528-IQ2_KS
Mistral-Small-24B-Instruct-2501
DeepSeek-R1-OG-IQ3_KS
Writer-Large-2411-v2.1
Llama-3.3-90B-Vision-merged
Writer-Large-2411-v2.1-GGUF
Magnum-v1-72b-Qwen2.5-GGUF
Qwen2.5-Coder-32B-Instruct-abliterated-GGUF
WizardLM-2-22B-RP-exl2
Writer-Large-2411-v2.1-AWQ
SmartMaid-123b-GGUF
72B-Qwen2.5-Kunou-v1-GGUF
Llama-3.3-90B-Vision-Instruct-LongCoT-Experimental
Mistral-Small-24B-Instruct-2501-Q4_0-GGUF
gemma-3-control-vectors
jukofyork_command-a-03-2025-writer-preview-AWQ
Mistral-Large-2407-LongCoT
R1-Gemma-3-4B-multimodal-test
dots.llm1.inst-GGUF
Run: use the following CLI args to override the chat template and special tokens:
SmartMaid-123b
Qwen2.5-14B-Base-lm_head-bnb-4bit
Mistral-Large-2047-LongCoT-GGUF
Writer-Large-2411-v2.1-exl2-4.5bpw
Mistral-Small-24B-Base-2501
Apriel-Nemotron-15B-Thinker-GGUF
Gemma-2-9B-It-SPPO-Iter3-exl3
WizardLM-2-22b-RP
SeminalRP-22b
gpt-oss-20b-Derestricted-Q4_K_M-GGUF
DarkMage-123b-exl2
ArliAI-RPMax-12B-v1.1-exl2-6.0bpw
experimental_R1-8x22b
Mistral-Small-3.1-24B-Instruct-2503-novision-exl2-6bpw
Text-only EXL2 quant of mistralai/Mistral-Small-3.1-24B-Instruct-2503.
1. Vision encoder removed
2. Architecture changed to that of mistralai/Mistral-Small-24B-Instruct-2501
3. Chat template in `tokenizer_config.json` was modified (see below). I was having trouble with the timestamp at the beginning of the system prompt and removed it from `tokenizer_config.json`.

NOTE: Tensor Parallel is not implemented in exllamav2 for either mistralai/Mistral-Small-3.1-24B-Instruct-2503 or mistralai/Mistral-Small-24B-Instruct-2501.