hotchpotch

36 models

japanese-reranker-xsmall-v2

| Model Name | Layers | Hidden Size | Score (avg) | Speed (GPU) |
| --- | --- | --- | --- | --- |
| hotchpotch/japanese-reranker-tiny-v2 | 3 | 256 | 0.8138 | 2.1s |
| hotchpotch/japanese-reranker-xsmall-v2 | 10 | 256 | 0.8699 | 6.5s |
| hotchpotch/japanese-reranker-small-v2 | 13 | 384 | 0.8856 | 15.2s |
| hotchpotch/japanese-reranker-base-v2 | 19 | 512 | 0.8930 | 32.5s |
| hotchpotch/japanese-reranker-cross-encoder-xsmall-v1 | 6 | 384 | 0.8131 | 20.5s |
| hotchpotch/japanese-reranker-cross-encoder-small-v1 | 12 | 384 | 0.8254 | 40.3s |
| hotchpotch/japanese-reranker-cross-encoder-base-v1 | 12 | 768 | 0.8484 | 96.8s |
| hotchpotch/japanese-reranker-cross-encoder-large-v1 | 24 | 1024 | 0.8661 | 312.2s |
| hotchpotch/japanese-bge-reranker-v2-m3-v1 | 24 | 1024 | 0.8584 | 310.6s |

- Announcing japanese-reranker-tiny / xsmall v2: very small, fast, and practical Japanese rerankers
- Released the highest-performing Japanese rerankers / What is a reranker in the first place?
- Technical report on building the Japanese rerankers

If your GPU supports Flash Attention 2, installing the flash-attn library enables faster inference. For CPU or ARM environments where you need more speed, ONNX and quantized variants of the models are available.

japanese-reranker-tiny-v2, japanese-reranker-xsmall-v2, japanese-reranker-small-v2, and japanese-reranker-base-v2 are compact reranker models with the following characteristics:

1. Run at practical speeds even on CPU and Apple Silicon
2. Improve RAG accuracy without expensive GPU resources
3. Suitable for edge-device deployments and low-latency production environments
4. Built on the ModernBERT-based ruri-v3-pt-30m (tiny/xsmall), ruri-v3-pt-70m (small), and ruri-v3-pt-130m (base)

| Model Name | avg | JQaRA | JaCWIR | MIRACL | JSQuAD |
| --- | --- | --- | --- | --- | --- |
| japanese-reranker-tiny-v2 | 0.8138 | 0.6455 | 0.9287 | 0.7201 | 0.9608 |
| japanese-reranker-xsmall-v2 | 0.8699 | 0.7403 | 0.9409 | 0.8206 | 0.9776 |
| japanese-reranker-small-v2 | 0.8856 | 0.7633 | 0.9586 | 0.8385 | 0.9821 |
| japanese-reranker-base-v2 | 0.8930 | 0.7845 | 0.9603 | 0.8425 | 0.9845 |
| japanese-reranker-cross-encoder-xsmall-v1 | 0.8131 | 0.6136 | 0.9376 | 0.7411 | 0.9602 |
| japanese-reranker-cross-encoder-small-v1 | 0.8254 | 0.6247 | 0.9390 | 0.7776 | 0.9604 |
| japanese-reranker-cross-encoder-base-v1 | 0.8484 | 0.6711 | 0.9337 | 0.8180 | 0.9708 |
| japanese-reranker-cross-encoder-large-v1 | 0.8661 | 0.7099 | 0.9364 | 0.8406 | 0.9773 |
| japanese-bge-reranker-v2-m3-v1 | 0.8584 | 0.6918 | 0.9372 | 0.8423 | 0.9624 |
| bge-reranker-v2-m3 | 0.8512 | 0.6730 | 0.9343 | 0.8374 | 0.9599 |
| ruri-v3-reranker-310m | 0.9171 | 0.8688 | 0.9506 | 0.8670 | 0.9820 |

The following inference-speed results cover reranking roughly 150,000 pairs (pure model inference time, excluding tokenization). MPS (Apple Silicon) and CPU were measured on an M4 Max, GPU on an RTX 5090; GPU runs used flash-attention 2.

| Model Name | Layers | Hidden Size | Speed (GPU) | Speed (MPS) | Speed (CPU) |
| --- | --- | --- | --- | --- | --- |
| japanese-reranker-tiny-v2 | 3 | 256 | 2.1s | 82s | 702s |
| japanese-reranker-xsmall-v2 | 10 | 256 | 6.5s | 303s | 2300s |
| japanese-reranker-small-v2 | 13 | 384 | 15.2s | | |
| japanese-reranker-base-v2 | 19 | 512 | 32.5s | | |
| japanese-reranker-cross-encoder-xsmall-v1 | 6 | 384 | 20.5s | | |
| japanese-reranker-cross-encoder-small-v1 | 12 | 384 | 40.3s | | |
| japanese-reranker-cross-encoder-base-v1 | 12 | 768 | 96.8s | | |
| japanese-reranker-cross-encoder-large-v1 | 24 | 1024 | 312.2s | | |
| japanese-bge-reranker-v2-m3-v1 | 24 | 1024 | 310.6s | | |
| bge-reranker-v2-m3 | 24 | 1024 | 310.7s | | |
| ruri-v3-reranker-310m | 25 | 768 | 81.4s | | |
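The v2 models are standard cross-encoder rerankers, so they can be scored with the sentence-transformers `CrossEncoder` class. A minimal sketch follows; the query and passages are illustrative, and ModernBERT-based checkpoints require a recent transformers release:

```python
from sentence_transformers import CrossEncoder

# Any model ID from the comparison table above works here.
model = CrossEncoder("hotchpotch/japanese-reranker-xsmall-v2")

query = "感動的な映画について"  # "about emotionally moving films"
passages = [
    "深いテーマを扱いながらも、観る人の心を揺さぶる名作です。",
    "料理のレシピを分かりやすく紹介する実用的な記事です。",
]

# Score each (query, passage) pair; higher scores mean more relevant.
scores = model.predict([(query, passage) for passage in passages])
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.4f}\t{passage}")
```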

license:mit
48,513
4

japanese-reranker-cross-encoder-small-v1

license:mit
9,892
3

japanese-bge-reranker-v2-m3-v1

| Model Name | Layers | Hidden Size |
| --- | --- | --- |
| hotchpotch/japanese-reranker-cross-encoder-xsmall-v1 | 6 | 384 |
| hotchpotch/japanese-reranker-cross-encoder-small-v1 | 12 | 384 |
| hotchpotch/japanese-reranker-cross-encoder-base-v1 | 12 | 768 |
| hotchpotch/japanese-reranker-cross-encoder-large-v1 | 24 | 1024 |
| hotchpotch/japanese-bge-reranker-v2-m3-v1 | 24 | 1024 |

- Released the highest-performing Japanese rerankers / What is a reranker in the first place?
- Technical report on building the Japanese rerankers

| Model Name | JQaRA | JaCWIR | MIRACL | JSQuAD |
| --- | --- | --- | --- | --- |
| japanese-reranker-cross-encoder-xsmall-v1 | 0.6136 | 0.9376 | 0.7411 | 0.9602 |
| japanese-reranker-cross-encoder-small-v1 | 0.6247 | 0.9390 | 0.7776 | 0.9604 |
| japanese-reranker-cross-encoder-base-v1 | 0.6711 | 0.9337 | 0.8180 | 0.9708 |
| japanese-reranker-cross-encoder-large-v1 | 0.7099 | 0.9364 | 0.8406 | 0.9773 |
| japanese-bge-reranker-v2-m3-v1 | 0.6918 | 0.9372 | 0.8423 | 0.9624 |
| bge-reranker-v2-m3 | 0.6730 | 0.9343 | 0.8374 | 0.9599 |
| bge-reranker-large | 0.4718 | 0.7332 | 0.7666 | 0.7081 |
| bge-reranker-base | 0.2445 | 0.4905 | 0.6792 | 0.5757 |
| cross-encoder-mmarco-mMiniLMv2-L12-H384-v1 | 0.5588 | 0.9211 | 0.7158 | 0.9320 |
| shioriha-large-reranker | 0.5775 | 0.8458 | 0.8084 | 0.9262 |
| bge-m3+all | 0.5760 | 0.9040 | 0.7926 | 0.9226 |
| bge-m3+dense | 0.5390 | 0.8642 | 0.7753 | 0.8815 |
| bge-m3+colbert | 0.5656 | 0.9064 | 0.7902 | 0.9297 |
| bge-m3+sparse | 0.5088 | 0.8944 | 0.6941 | 0.9184 |
| JaColBERTv2 | 0.5847 | 0.9185 | 0.6861 | 0.9247 |
| multilingual-e5-large | 0.5540 | 0.8759 | 0.7722 | 0.8892 |
| multilingual-e5-small | 0.4917 | 0.8690 | 0.7025 | 0.8565 |
| bm25 | 0.4580 | 0.8408 | 0.4387 | 0.9002 |
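Since the v1 models are plain sequence-classification cross-encoders, they can also be driven directly through transformers. The sketch below assumes the checkpoint emits a single relevance logit; whether a sigmoid should be applied can vary per checkpoint, so treat the readout as illustrative and check the individual model card:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "hotchpotch/japanese-reranker-cross-encoder-xsmall-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

query = "リランカーとは何ですか?"  # "What is a reranker?"
passages = [
    "リランカーは検索結果の候補を関連度順に並べ替えるモデルです。",
    "今日は天気が良いので散歩に出かけました。",
]

# A cross-encoder encodes each (query, passage) pair jointly.
inputs = tokenizer([query] * len(passages), passages,
                   padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze(-1)

scores = torch.sigmoid(logits)  # map the raw logit to a 0-1 relevance score
print(scores.tolist())
```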

license:mit
7,999
15

japanese-splade-v2

license:mit
7,709
16

japanese-reranker-cross-encoder-large-v1

license:mit
4,832
16

japanese-reranker-cross-encoder-xsmall-v1

This card shares the japanese-reranker v1 README; see the japanese-bge-reranker-v2-m3-v1 entry above for the model comparison and benchmark tables.

license:mit
3,835
7

japanese-reranker-tiny-v2

license:mit
1,470
6

japanese-reranker-base-v2

license:mit
1,092
5

open-provence-reranker-xsmall-v1

✂️ OpenProvence: An Open-Source Implementation of Efficient and Robust Context Pruning for Retrieval-Augmented Generation

> ⚡️ Lightweight Provence-style rerankers that keep the answers and drop the noise for retrieval-augmented generation.

OpenProvence follows the Provence approach to simultaneously prune irrelevant passages and produce a reranking score for question-answering workflows. Modern agents (DeepResearch loops, autonomous search pipelines, context engineering systems) tend to accumulate tangential paragraphs that inflate LLM token budgets. Drop an OpenProvence checkpoint in front of your LLM to extract only the passages that matter. We provide open weights along with MIT-licensed training, inference, and dataset-construction tooling for reproducible workflows on commodity hardware.

- GitHub: OpenProvence
- 🤗 Spaces: WebUI Inference DEMO

- Pruning power – Drop ~99% of off-topic sentences while still compressing 80–90% of relevant text; MLDR evaluations confirm the answers stay intact.
- Ship-ready checkpoints – Four bilingual models (30M–310M parameters) on Hugging Face under MIT; the 30M xsmall runs comfortably on CPU and screams on GPU.
- Reproducible training – Follow the training guide to train every checkpoint on a single ≥16 GB NVIDIA GPU.
- Dataset tooling – Build OpenProvence-format corpora from your own data with the dataset creation guide.
- Evaluation utilities – CLI runners for dataset retention sweeps and MLDR long-document benchmarks keep regression tracking straightforward.
- Documentation-first – End-to-end reports, guides, and configs cover training, evaluation, and dataset creation.
- Teacher model – A multilingual span annotator, query-context-pruner-multilingual-Qwen3-4B, powers custom label pipelines.

Pick the checkpoint that matches your latency and language targets. All checkpoints are hosted on Hugging Face with permissive licensing.

| Model | Language | Hugging Face ID | Parameters | Notes |
| --- | --- | --- | --- | --- |
| base | English & Japanese | hotchpotch/open-provence-reranker-v1 | 130M | Balanced accuracy vs. speed for bilingual workloads |
| xsmall | English & Japanese | hotchpotch/open-provence-reranker-xsmall-v1 | 30M | Fastest option; practical even without a GPU |
| large | English & Japanese | hotchpotch/open-provence-reranker-large-v1 | 310M | Highest compression at comparable F2 scores |
| en-gte | English | hotchpotch/open-provence-reranker-v1-gte-modernbert-base | 149M | English-only checkpoint with top reranking fidelity |

For CUDA hosts, you can optionally install flash-attention for faster inference.

OpenProvence checkpoints expose a single `process` method that accepts raw question/context strings, applies sentence-level pruning, and returns the reranking score alongside compression metrics (a minimal usage sketch appears at the end of this card). `process()` accepts either a single query/context pair or batched inputs. Use `question: str` with `context: str` for one document, `question: str` with `context: List[str]` to prune multiple documents for the same query, or `question: List[str]` and `context: List[str]` to batch independent pairs. To feed pre-segmented sentences, pass `context: List[List[str]]`; each inner list is treated as already split and the built-in splitter is skipped.

> ⚠️ Shape matters: a lone string paired with a list of contexts is interpreted as one query with many documents. Make sure `question` and `context` have matching shapes when batching to avoid truncated or duplicated outputs.

- `question: str | Sequence[str]` – Query text. Provide a list to batch multiple questions; each item pairs with the corresponding entry in `context`.
- `context: str | Sequence[str] | Sequence[Sequence[str]]` – Contexts aligned to the query. Use a list for one document per query, or a list of lists to supply multiple documents (or pre-split sentences) for each query.
- `title: str | Sequence[str] | Sequence[Sequence[str]] | None` – Optional titles aligned to each context. The default sentinel `"first_sentence"` marks the opening sentence so you can keep it by pairing with `always_select_title=True` or `first_line_as_title=True`; without those flags it is scored like any other sentence. Set `None` to disable all title handling.
- `threshold: float` (default `0.1`) – Pruning probability cutoff. Larger values discard more sentences; `0.05–0.5` works well across datasets.
- `batch_size: int` (default `32`) – Number of contexts processed per inference batch. Increase for throughput, decrease if you run out of memory.
- `language: str | None` – Built-in splitter selection (`"auto"`, `"ja"`, `"en"`). The default behaves like `"auto"` and detects Japanese vs. English automatically.
- `reorder: bool` and `top_k: int | None` – When `reorder=True`, contexts are sorted by reranker score. Combine with `top_k` to keep only the top-ranked documents.
- `first_line_as_title: bool` / `always_select_title: bool` – Extract the first non-empty line as a title and optionally guarantee the title sentence survives pruning.
- `return_sentence_metrics: bool` / `return_sentence_texts: bool` – Include per-sentence probabilities and kept/removed sentence lists in the output for analysis workflows.

Detailed metrics live in the OpenProvence v1 Evaluation Report. Highlights below show MLDR at each model's best Has Answer threshold, plus cross-dataset means at the standard 0.10 threshold.

MLDR with LLM eval · English (best Has Answer per model). Threshold (p) lists the pruning probability that produced the highest Has Answer score for each checkpoint.

| Model | Params | Threshold (p) | Has Answer (%) | Compression (pos) | Compression (neg) |
| --- | --- | --- | --- | --- | --- |
| none (original) | - | - | 93.68 | 0.00% | 0.00% |
| en-gte | 149M | 0.10 | 94.25 | 92.33% | 99.91% |
| xsmall | 30M | 0.05 | 93.68 | 82.18% | 99.18% |
| base | 130M | 0.05 | 93.68 | 90.05% | 99.62% |
| large | 310M | 0.10 | 93.10 | 94.38% | 99.90% |
| naver-provence | 305M | 0.10 | 93.10 | 94.00% | 99.50% |

Highlights: en-gte surpasses the original (no compression) baseline in Has Answer score while achieving over 92% compression on positive samples. The large model (310M) achieves performance comparable to the naver/provence baseline (305M) with similar parameter counts. The smaller models (xsmall and base) match the original baseline's accuracy with substantial compression benefits, though at somewhat lower compression rates than the larger models.

MLDR with LLM eval · Japanese (best Has Answer per model). Threshold (p) again shows the probability cutoff that maximized Has Answer for each model.
| Model | Params | Threshold (p) | Has Answer (%) | Compression (pos) | Compression (neg) |
| --- | --- | --- | --- | --- | --- |
| none (original) | - | - | 77.71 | 0.00% | 0.00% |
| xsmall | 30M | 0.05 | 81.93 | 76.46% | 96.11% |
| base | 130M | 0.05 | 83.13 | 80.98% | 97.89% |
| large | 310M | 0.10 | 79.52 | 87.89% | 98.82% |

Highlights: all models significantly outperform the original baseline on Japanese MLDR. base delivers the top Has Answer score (+5.42 points over the original) while retaining strong compression. Even large, which prioritizes maximum compression (nearly 88% positive, 99% negative), exceeds the original baseline by 1.81 points.

Cross-dataset means · English (threshold 0.10):

| Model | Mean F2 | Mean Compression | Mean Inference Time (s) |
| --- | --- | --- | --- |
| en-gte | 0.734 | 39.9% | 0.55 |
| xsmall | 0.696 | 33.8% | 0.34 |
| base | 0.737 | 39.9% | 0.69 |
| large | 0.749 | 41.7% | 1.04 |

Highlights: en-gte is the top English reranker at this threshold, while large gives the best compression with a modest latency bump. xsmall remains the latency leader.

Cross-dataset means · Japanese (threshold 0.10):

| Model | Mean F2 | Mean Compression | Mean Inference Time (s) |
| --- | --- | --- | --- |
| xsmall | 0.727 | 53.2% | 0.32 |
| base | 0.768 | 57.4% | 1.06 |
| large | 0.783 | 59.1% | 1.69 |

Highlights: base and large deliver the strongest F2 on Japanese corpora, with large leading on compression. xsmall stays nimble for CPU-centric deployments.

OpenProvence v1 checkpoints are distilled from multilingual QA corpora that were re-labeled with the Qwen3-4B teacher. English coverage spans hotchpotch/msmarco-context-relevance, hotchpotch/gooaq-context-relevance-130k, and hotchpotch/natural-questions-context-relevance. Japanese coverage comes from hotchpotch/japanese-context-relevance, which includes MS MARCO JA and native QA sources. All datasets expose sentence-span keep/drop labels plus teacher reranker scores, so you can reproduce or extend the mixture for your own domains. This model family was trained with the open-source OpenProvence stack and is reproducible on a single ≥16 GB NVIDIA GPU.

1. Teacher Label Generation (DeepSeek-V3): Use DeepSeek-V3 to annotate question/context relevance, producing the multilingual 140k-sample dataset qa-context-relevance-multilingual-140k.
2. Teacher Context-Relevance SFT (Qwen3-4B): Fine-tune Qwen3-4B to build the multilingual teacher query-context-pruner-multilingual-Qwen3-4B, enabling fast, consistent span-level annotations.
3. Context-Relevance Dataset Construction: Generate sentence-span labels and teacher scores from the following corpora:
   - English: hotchpotch/msmarco-context-relevance, hotchpotch/gooaq-context-relevance-130k, hotchpotch/natural-questions-context-relevance
   - Japanese: hotchpotch/japanese-context-relevance

   Deduplicate near-identical negatives and follow the dataset creation guide for preprocessing tips.
4. Final Model Training: Distill existing reranker scores into a unified model that combines the cross-encoder reranker head with the context-pruning head. Reference docs/train.md for configuration details and baseline commands.

- Provence: efficient and robust context pruning for retrieval-augmented generation inspired the overall approach. Huge thanks to the Naver Labs Europe authors for releasing both the paper and the naver/provence-reranker-debertav3-v1 checkpoint that validated how powerful Provence-style pruning can be.
- Sentence Transformers provided invaluable reference implementations for cross-encoder training that informed our pipelines.

If you use OpenProvence in your research, please cite it.
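Below is the minimal usage sketch of the `process()` call referenced above. The loading entry point is an assumption: it mirrors the upstream naver/provence convention of loading through `AutoModel` with `trust_remote_code=True`, and the result keys are assumed to match that reference implementation; check the OpenProvence GitHub README for the canonical API.

```python
from transformers import AutoModel

# Assumption: OpenProvence checkpoints load like the upstream Provence
# reference (AutoModel + trust_remote_code); see the GitHub README.
pruner = AutoModel.from_pretrained(
    "hotchpotch/open-provence-reranker-xsmall-v1",
    trust_remote_code=True,
)

question = "What causes tides on Earth?"
context = (
    "Tides are caused mainly by the gravitational pull of the Moon. "
    "The Sun also contributes, though less strongly. "
    "Many coastal towns hold summer festivals on the beach."
)

# One query, one document: prune off-topic sentences, keep the rerank score.
result = pruner.process(question, context, threshold=0.1)

# Result keys assumed to follow the Provence reference implementation.
print(result["pruned_context"])
print(result["reranking_score"])
```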

license:mit
760
0

japanese-reranker-cross-encoder-base-v1

license:mit
641
1

mMiniLMv2-L6-H384

license:mit
334
1

japanese-reranker-small-v2

license:mit
302
2

open-provence-reranker-v1

This card shares the OpenProvence README; see the open-provence-reranker-xsmall-v1 entry above.

license:mit
146
0

open-provence-reranker-v1-gte-modernbert-base

This card shares the OpenProvence README; see the open-provence-reranker-xsmall-v1 entry above.

license:mit
107
0

open-provence-reranker-large-v1

This card shares the OpenProvence README; see the open-provence-reranker-xsmall-v1 entry above.

license:mit
64
0

query-crafter-japanese-Qwen3-1.7B

license:apache-2.0
50
11

query-context-pruner-multilingual-Qwen3-4B

license:mit
26
0

japanese-splade-base-v1

license:mit
8
8

query-crafter-japanese-Qwen3-4B

Please see 👉 https://huggingface.co/hotchpotch/query-crafter-japanese-Qwen3-1.7B

license:apache-2.0
8
0

vespa-onnx-intfloat-multilingual-e5-large

license:mit
6
2

vespa-onnx-intfloat-multilingual-e5-small

license:mit
5
0

query-context-pruner-multilingual-Qwen3-1.7B

license:mit
5
0

bert-base-japanese-v3-retromae

license:mit
4
1

ruri-pt-base-retromae

license:mit
4
1

youri-7b-sft-qa-context-jaqket-gptq

llama
4
0

vespa-onnx-BAAI-bge-m3-only-dense

license:mit
3
3

mMiniLMv2-L12-H384

license:mit
2
2

luke-japanese-base-lite-xlm-roberta

license:apache-2.0
2
0

japanese-splade-base-v1_5

license:mit
2
0

youri-7b-sft-qa-context-jaqket-awq

llama
1
1

fineweb-2-edu-japanese-classifier

license:mit
1
1

Static Embedding Japanese

license:mit
0
33

open-calm-7b_lora_open2ch

0
2

xlm-roberta-japanese-tokenizer

license:mit
0
2

japanese-splade-base-v1-dummy-fast-tokenizer-for-tei

license:mit
0
1

fineweb-2-japanese-text-cleaner

license:mit
0
1