kalle07


embedder_collection

This is a collection of more than 25 embedding models, plus a really brief introduction to what you should know about embedding. If you don't keep a few things in mind, you won't be satisfied with the results.

All models were tested with AnythingLLM (ALLM) using LM-Studio as the server; all models should also work with ollama, and the setup for local documents described below is almost the same. GPT4All ships only one model (nomic), and koboldcpp and JAN (Menlo) have no built-in embedder right now, but it is in development. (Sometimes the results are more truthful if the "chat with document only" option is used.) By the way, the embedder model is only one part of a good RAG (Retrieval-Augmented Generation) pipeline.

⇨ give me a ❤️, if you like ;)

Working well:
- nomic-embed-text (up to 2048t context length)
- mxbai-embed-large
- mug-b-1.6
- snowflake-arctic-embed-l-v2.0 (up to 8192t context length)
- Ger-RAG-BGE-M3 (German, up to 8192t context length)
- german-roberta
- bge-m3 (up to 8192t context length)

All others are up to you! Some models are very similar. (Jina- and Qwen-based models can be added manually to LM-Studio: open the model's "gear wheel" settings and set "override domain type".) With the same settings, these embedders found the same 6-7 snippets out of 10 from a book. This means that only 3-4 snippets were different, but I didn't test it extensively.

Further tests have shown that the models above are suitable for complex tasks (German text, but it should be similar in English). Jina-DE and nomic were not that good. I'm not convinced by large models such as Qwen or Jina-ai; v3 and v4 don't work with LM-Studio, they are ten times slower, and the result is not ten times better. Despite all this, they can recognize tables and some images. There are two embedders for finding toxic content (toxic-prompt-roberta and minilmv2-toxic-jigsaw), though I don't know how well they work, and IBM offers a whole LLM model for this (granite-guardian).
Short hints for usage (example for a large context with many expected hits):
- Set your main LLM model's context length (Max Tokens) to 16000t. (With LM-Studio as server for ALLM, you must also set this in the LM-Studio settings!)
- Set your embedder model's Max Embedding Chunk Length to 1024t.
- Set Max Context Snippets to 14.
- In ALLM, also set Text Splitting & Chunking Preferences -> Text Chunk Size to 1024-character parts, and Search Preference to "accuracy".
- And set 14 snippets in your workspace.

Hint for ALLM: configure everything in LM-Studio first, start both models, and both will appear at the top in ALLM.

-> OK, what does that mean? Your document will be embedded as some number of 1024t chunks (snippets). You can receive 14 snippets of 1024t (~14000t) from your document, roughly 10000 words (10 pages), with ~2000t left (out of 16000t) for the answer, roughly 1000 words (2 pages). You can adjust this to your needs, e.g. 8 snippets of 2048t, or 28 snippets of 512t ... (every time you change the chunk length, the document must be embedded again). With these settings everything fits best for ONE answer; if you need room for a conversation, you should set lower values and/or disable the document.

English and German differ by about 50% in tokens per word, but ~5000 characters is one page of a book regardless of language. If you calculate with words instead: German words are longer, which means more tokens per word. The example above is English; for German, add approximately 50% more tokens per word (1000 words ~ 1800t).

Rules of thumb:
- 1200t (~1000 words, ~5000 characters) ~0.1GB; this is roughly one page with a small font
- best to keep in mind: 5000-6000 characters correspond to approximately one page and approximately 1200-1400 tokens
- 8000t (~6000 words) ~1.5GB VRAM usage
- 16000t (~12000 words) ~3GB VRAM usage
- 32000t (~24000 words) ~6GB VRAM usage

The vector size, or dimensionality, is the number of numbers in each embedding vector. Common embedding models produce vectors ranging from 384 dimensions (e.g., all-MiniLM-L6-v2) to 3072 dimensions (text-embedding-3-large).
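As a sanity check, the snippet/answer arithmetic above can be sketched in a few lines (the function name and error handling are my own illustration, not part of ALLM or LM-Studio):

```python
# Context budgeting: snippets plus answer must fit in the context window.

def context_budget(context_len, snippet_count, chunk_len):
    """Return (tokens used by snippets, tokens left for the answer)."""
    snippet_tokens = snippet_count * chunk_len
    answer_tokens = context_len - snippet_tokens
    if answer_tokens <= 0:
        raise ValueError("snippets alone overflow the context window")
    return snippet_tokens, answer_tokens

# The example from above: 16000t context, 14 snippets of 1024t each.
used, left = context_budget(16000, 14, 1024)
print(used, left)  # 14336 tokens of snippets, 1664 tokens for the answer
```

Note that 28 snippets of 512t consume exactly the same budget (14336t) as 14 snippets of 1024t; only the granularity of retrieval changes.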
Higher dimensions capture more semantic detail but require more storage and computational resources for database indexing and search. Some models allow you to shorten vectors (e.g., use only 256 out of 3072 dimensions) to save space while retaining high performance. Vector count refers to the total number of vectors stored, which usually corresponds to the number of content chunks indexed. More vectors mean more granularity for search and retrieval, but also a larger database and more operational overhead.

Chunk length
Chunk length is the size (usually measured in words, tokens, or characters) of the text pieces the document is split into for embedding (the ALLM chunk length / chunk size is in characters).

Here is a tokenizer calculator: https://quizgecko.com/tools/token-counter
And a VRAM calculator: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator (you need the original model link, NOT the GGUF). For a second VRAM calculation on a GGUF, you need a link in the format "https://huggingface.co/provider/model/blob/main/model.gguf", for example: "https://huggingface.co/unsloth/granite-3.3-8b-instruct-GGUF/blob/main/granite-3.3-8b-instruct-UD-Q8KXL.gguf"

How retrieval works: say you have a txt/pdf file of maybe 90000 words (~300 pages), a book. You ask the model something like "what is described in the chapter called XYZ in relation to person ZYX". It now searches for keywords or semantically similar terms in the document. If it finds them, say words and meanings around "XYZ and ZYX", a piece of text of 1024 tokens around that spot is cut out. (In reality it's all done with encoded numbers per chunk, which is why you can't search for individual numbers or literal words, but that doesn't matter for the principle.) This text snippet is then used for your answer.
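The character-based chunking that ALLM's "Text Chunk Size" setting describes can be sketched like this (the function and the optional overlap parameter are my own illustration, not ALLM's actual code):

```python
# Split a document into fixed-size character chunks for embedding.

def chunk_text(text, chunk_size=1024, overlap=0):
    """Split text into chunks of at most chunk_size characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

page = "x" * 5000            # ~one book page is roughly 5000 characters
chunks = chunk_text(page, chunk_size=1024)
print(len(chunks))           # 5 chunks: four full 1024-char chunks plus a 904-char rest
```

Real pipelines often add a small overlap between chunks so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.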
If, for example, the word/meaning "XYZ" occurs 50 times in one txt, not all 50 occurrences are used for the answer; only the snippets with the best ranking are used. If only one snippet actually corresponds to your question, all the other snippets can negatively influence your answer because they do not fit the topic (usually 4 to 32 snippets are fine). If you expect multiple search results in your docs, try 16 snippets or more; if you expect only 2, don't use more! If you use a chunk length of ~2048 characters you receive more context; if you use ~512 characters you receive more facts, BUT smaller chunk lengths mean more chunks and take much longer.

A question like "summarize the document" is usually not useful; if the document has an introduction or summaries, the search lands there if you're lucky. If a book has a table of contents or a bibliography, I would delete these pages, as they often contain relevant search terms but do not help answer your question. If the document is small, like 10-20 pages, it's better to copy the whole text into the CHAT; some apps call this option "pin". If a TXT file is embedded, you cannot create a summary: only the snippets found are used. The same applies to word search or page search; in most cases it does not work, because it is not a word search but a search for similar expressions.

Nevertheless, the main model is also important! Especially in how it deals with the context length, and I don't mean just the theoretical number you can set. Some models can handle 128k or 1M tokens, but even with 16k or 32k of input, the response with the same snippets as input is worse than with other well-developed models. Good choices: llama3.1, llama3.2, qwen2.5, deepseek-r1-distill, gemma-3, granite, SauerkrautLM-Nemo (German) ...
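The "only the best-ranked snippets are used" step above is, at its core, a top-k search by vector similarity. A minimal sketch (the tiny 3-dimensional vectors are made up for illustration; real embedders produce 384-3072 dimensions):

```python
# Rank chunk embeddings by cosine similarity to the query embedding
# and keep only the k best, as retrieval engines do.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_snippets(query_vec, chunk_vecs, k):
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

query = [1.0, 0.0, 0.0]
chunks = [[0.9, 0.1, 0.0],   # close to the query
          [0.0, 1.0, 0.0],   # off-topic
          [0.7, 0.7, 0.0]]   # partially related
print(top_k_snippets(query, chunks, k=2))  # [0, 2]
```

This also shows why an oversized snippet count hurts: with k larger than the number of truly relevant chunks, off-topic chunks like the second one above get pulled into the context anyway.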
(llama3 or phi3.5 are not working well)

⇨ best models for English and German:
- granite-3.2-8b (2b version also) - https://huggingface.co/ibm-research/granite-3.2-8b-instruct-GGUF
- Chocolatine-2-14B (other versions also) - https://huggingface.co/mradermacher/Chocolatine-2-14B-Instruct-DPO-v2.0b11-GGUF
- QwQ-LCoT- (7/14B) - https://huggingface.co/mradermacher/QwQ-LCoT-14B-Conversational-GGUF
- gemma-3 (4/12/27B) - https://huggingface.co/bartowski/googlegemma-3-12b-it-GGUF
...

Important -> the system prompt (some examples):
The system prompt is weighted with a certain amount of influence around your question. You can easily test this once without a system prompt, or with a nonsensical one.

"You are a helpful assistant who provides an overview of ... under the aspects of ... . You use attached excerpts from the collection to generate your answers! Weight each individual excerpt in order, with the most important excerpts at the top and the less important ones further down. The context of the entire article should not be given too much weight. Answer the user's question! After your answer, briefly explain why you included excerpts (1 to X) in your response and justify briefly if you considered some of them unimportant!"
(Change it for your needs. This example works well when I consult a book about a person and a term related to them; the explanation part was just a test for myself.)

"You are an imaginative storyteller who crafts compelling narratives with depth, creativity, and coherence. Your goal is to develop rich, engaging stories that captivate readers, staying true to the themes, tone, and style appropriate for the given prompt. You use attached excerpts from the collection to generate your answers! When generating stories, ensure coherence in characters, setting, and plot progression. Be creative and introduce imaginative twists and unique perspectives."

"You are a warm and engaging companion who loves to talk about cooking, recipes, and the joy of food. Your aim is to share delicious recipes, cooking tips, and the stories behind different cultures in a personal, welcoming, and knowledgeable way."

By the way: Jinja templates are very new. The usual templates with usual models are fine, but merged models have a lot of optimization potential (but don't ask me, I am not a coder).

DOC/PDF to TXT
Prepare your documents yourself! Bad input = bad output! In most cases it is not immediately obvious in what form the document is made available to the embedder. In ALLM it is "c:\Users\XXX\AppData\Roaming\anythingllm-desktop\storage\documents"; you can open the files there with a text editor to check the quality. In nearly all cases images, tables, page numbers, chapters, formulas, and section/paragraph formatting are not well preserved. You can start by simply saving the PDF as a TXT file; you will then see in the TXT file how the embedding model would see the content. An easy start is to use a Python-based PDF parser (there are a lot), some also OCR-based for images.

Option one, only for simple txt/table conversion: all in all you can tune your code a lot, but the difficulties lie in the details. My option, one exe for Windows and also Python, plus a second option with OCR:
https://huggingface.co/kalle07/pdf2txtparserconverter
My raw keyword search and snippet extractor:
https://huggingface.co/kalle07/raw-txt-snippet-creator
There are some ready-to-use examples, which are already pretty good, ~10-20 code lines.

Indexing-only option
One hint for fast search over 10000s of PDF/TXT/DOC files (it is only indexing, not embedding): you can use it as a simple way to find your top 5-10 articles or books, which you can then make available to an LLM.
Jabref - https://github.com/JabRef/jabref/tree/v6.0-alpha?tab=readme-ov-file https://builds.jabref.org/main/
or docfetcher - https://docfetcher.sourceforge.io/en/index.html (yes, old but very useful)

(ALL licenses and terms of use go to the original authors)
- avemio/German-RAG-BGE-M3-MERGED-x-SNOWFLAKE-ARCTIC-HESSIAN-AI (German, English)
- maidalun1020/bce-embedding-basev1 (English and Chinese)
- maidalun1020/bce-reranker-basev1 (English, Chinese, Japanese and Korean)
- BAAI/bge-reranker-v2-m3 (English and Chinese)
- BAAI/bge-reranker-v2-gemma (English and Chinese)
- BAAI/bge-m3 (English and Chinese)
- avsolatorio/GIST-large-Embedding-v0 (English)
- ibm-granite/granite-embedding-278m-multilingual (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese)
- ibm-granite/granite-embedding-125m-english
- Labib11/MUG-B-1.6 (?)
- mixedbread-ai/mxbai-embed-large-v1 (multi)
- nomic-ai/nomic-embed-text-v1.5 (English, multi)
- nomic-ai/nomic-embed-text-v2-moe (English, Spanish, French, German, Italian, Portuguese, Polish ... 100 languages)
- Snowflake/snowflake-arctic-embed-l-v2.0 (English, multi)
- intfloat/multilingual-e5-large-instruct (100 languages)
- T-Systems-onsite/german-roberta-sentence-transformer-v2
- T-Systems-onsite/cross-en-de-es-roberta-sentence-transformer (English, German, Spanish)
- T-Systems-onsite/cross-en-de-fr-roberta-sentence-transforme (English, German, French)
- mixedbread-ai/mxbai-embed-2d-large-v1
- jinaai/jina-embeddings-v2-base-en
- Qwen/Qwen3-Embedding-0.6B
- HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1.5
- thenlper/gte-large
- sentence-transformers/all-MiniLM-L6-v2
- TatonkaHF/bge-m3enru (En - RU)
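The raw keyword search mentioned earlier (exact character matching with a snippet window cut around each hit, as opposed to semantic search) can be sketched in a few lines. This is my own illustration of the idea, not the actual code of the raw-txt-snippet-creator tool:

```python
# Cut a fixed-size character window around each keyword hit in a text.

def keyword_snippets(text, keyword, window=100, max_hits=5):
    """Return up to max_hits snippets of text around each keyword match."""
    snippets, start = [], 0
    low_text, low_kw = text.lower(), keyword.lower()
    while len(snippets) < max_hits:
        hit = low_text.find(low_kw, start)
        if hit == -1:
            break
        begin = max(0, hit - window)
        end = min(len(text), hit + len(keyword) + window)
        snippets.append(text[begin:end])
        start = hit + len(keyword)
    return snippets

doc = "Alice met Bob. " * 20
print(len(keyword_snippets(doc, "bob", window=30, max_hits=3)))  # 3
```

Unlike embedding-based retrieval, this does find literal words and numbers, which is exactly why it is useful as a fallback when semantic search fails on exact-match questions.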


- Llama-3.2-3B-Instruct-heretic_R5_KL003-bf16-gguf (llama)
- mem-agent-thinking-heretic_R8_KL001-bf16-gguf (license: apache-2.0)
- llama-3.3-8b-instruct-heretic_R7_KL008_q8_0-gguf (llama)
- phi-4-mini-instruct-heretic_R6_K007_bf16-gguf (license: mit)
- gemma-3-4b-it-heretic_R18_K005-bf16-gguf
- ministral_3-3b-instruct-2512-heretic_R3_KL0029-bf16-gguf (license: apache-2.0)
- RynnBrain-8B-Q8-heretic-gguf (license: apache-2.0)
- granite-3.2-2b-instruct-heretic_R3_K002-gguf (license: apache-2.0)
- Jan-v1-2509_128_4_autoround-q4km-gguf (license: apache-2.0)
- FLUX.2-klein-9B-lora-set
- Teuken-7B-instruct-v0.6-Q8_0-GGUF (llama-cpp)
- Jan-v1-2509_128_4_autoround (license: apache-2.0)
- granite-3.2-2b-instruct-heretic_R3_K002-q8_0-gguf (license: apache-2.0)

pdf2txt_parser_converter

PDF to TXT converter, ready to chunk for your RAG. ONLY WINDOWS: EXE and PY available (English and German). Better input = better output!

- PDF Parser - Sevenof9v7f.exe - EXE GUI
- PDF Parser - Sevenof9v7f.py - Python, you need to import the libs
- doclingbysevenof9v1.py - Python, you need an NVIDIA RTX to run it fast
- all other, older versions

⇨ give me a ❤️, if you like ;)

DOWNLOAD: "PDF Parser - Sevenof9v7f.exe" or "PDF Parser - Sevenof9v7f.py" or "doclingbysevenof9v1.py" (read below) ...

Most LLM applications only convert your PDF to txt in a simple way, nothing more; it's like saving your PDF as a txt file. For the usual flowing-text books that is quite okay! But often blocks of text that are close together get mixed up, and tables cannot be read logically. Therefore it's better to convert with the help of a "parser"; the embedder can then find better context. I work with "pdfplumber/pdfminer", no OCR, so it's very fast!

- Works with single and multi-PDF lists, works with folders
- Intelligent multiprocessing, ~10-20 pages per second
- Error tolerant: if a PDF is not convertible, it is skipped, no special handling
- Instant view of the result: click one PDF at the top of the list
- Removes about 5% of the margins around the page
- Converts some common tables as JSON inside the txt file
- Adds the absolute PAGE number to each page
- Adds the tag "chapter" or "important" to large and/or bold fonts
- All txt files are created in the original folder of the PDF, with the same name as .txt
- All txt files are overwritten if you start converting the same PDF again

If there are many text blocks on a page, it may happen that text blocks you would read first appear further down the page.
(It is a compromise between many layout options.) Small blocks of text (such as units or individual numbers), usually near diagrams and sketches, appear at the end of each page.

I advise against using a PDF file directly for RAG embedding, as you never know how it will look, and incorrect input can lead to poor results.

- Approximately 5 to 20 pages/sec, depending on complexity and system power
- Tested on 300 PDF files, ~30000 pages

I created this with my brain and the help of ChatGPT; I am not a coder... sorry, so I will not fulfill any wishes unless there are real errors. The GUI and the functions are really hard for me, in addition to compiling it. For the Python file you need to import the missing libraries. Of course there is a lot of room for optimization (saving/error handling) or the use of other parser libraries, but it's a start.

I am working on a 50% faster version. In addition, the GUI should allow more influence on the processing, e.g. faster raw text, trimming margins (yes/no) with an adjustable percentage, setting the minimum size for unimportant text blocks, a layout choice between line breaks and more continuous text, and a preview of the first 10 pages with generated images showing what is detected, with borders around text and tables. Give me a hand if you can ;) ...

I also have a "docling" parser with OCR (a GPU is needed for fast processing); it is only a Python file, not compiled. You have to download all the libs, and on first start the OCR models are downloaded as well. At the moment I have prepared a kind of multi-docling: the number of PDFs processed in parallel depends on VRAM, and on whether you use OCR only for tables or for everything. I have set VRAM = 16GB (my GPU RAM; you should set yours), and the number of parallel docling calls is VRAM/1.3, so it uses ~12GB (in my version) and processes 12 PDFs at once. Only text and tables are converted, so no images, no diagrams (processing pages in parallel is too complicated). For now, all PDFs must be in the same folder as the Python file.
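The VRAM-based parallelism described above boils down to one division. A minimal sketch (function and parameter names are my own, not the script's actual code):

```python
# Size the number of parallel docling workers from the VRAM budget:
# roughly 1.3 GB per worker with table-only OCR, more with full OCR.

def docling_workers(vram_gb, gb_per_worker=1.3):
    """How many PDFs to process in parallel for a given VRAM budget."""
    return max(1, int(vram_gb / gb_per_worker))

print(docling_workers(16))   # 12 parallel PDFs on a 16GB GPU, table-only OCR
```

With a higher per-worker cost (e.g. 2 GB when OCR is applied to everything), the same 16GB budget yields only 8 parallel PDFs.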
If you enable OCR for everything, the VRAM consumption rises; you then have to raise the 1.3 divisor to 2 or more.

Now have fun, and leave a comment if you like ;) on Discord: "sevenof9"

Maybe of interest:
- a txt snippet generator, if you can't use an embedder or you need an exact character match: https://huggingface.co/kalle07/raw-txt-snippet-creator
- my embedder collection: https://huggingface.co/kalle07/embeddercollection

I am not responsible for any errors or crashes on your system. If you use it, you take full responsibility!


- SmartTaskMonitor
- raw-txt-snippet-creator
- SmartTaskTool