knowledgator

96 models

SMILES2IUPAC-canonical-base

SMILES2IUPAC-canonical-base was designed to accurately translate SMILES chemical representations into IUPAC names. It is based on the mT5 model, with optimizations that include separate tokenizers for the encoder and the decoder.

- Developed by: Knowledgator Engineering
- Model type: Encoder-decoder with attention mechanism
- Language(s) (NLP): SMILES, IUPAC (English)
- License: Apache License 2.0

Model Sources
- Paper: coming soon
- Demo: ChemicalConverters SMILES to IUPAC

Preferred IUPAC style
To choose the preferred IUPAC style, place a style token before your SMILES sequence.

| Style Token | Description |
|-------------|-------------|
| ` ` | The most widely known name of the substance; sometimes a mixture of traditional and systematic styles |
| ` ` | A fully systematic style without trivial names |
| ` ` | A style based on trivial names of the parts of the substance |

Validating SMILES-to-IUPAC translations
Translations can be validated by reverse translation back into SMILES and calculating the Tanimoto similarity of the two molecules' fingerprints. The higher the Tanimoto similarity, the higher the probability that the prediction is correct.

This model has limited accuracy on large molecules and currently does not support isomeric or isotopic SMILES. The model was trained on 100M SMILES-IUPAC pairs with lr=0.00001 and batch size 512 for 2 epochs.

| Model | Accuracy | BLEU-4 score | Size (MB) |
|-------------------------------------|---------|------------------|----------|
| SMILES2IUPAC-canonical-small | 75% | 0.93 | 23 |
| SMILES2IUPAC-canonical-base | 86.9% | 0.964 | 180 |
| STOUT V2.0* | 66.65% | 0.92 | 128 |
| STOUT V2.0 (according to our tests) | | 0.89 | 128 |

*According to the original paper: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00512-4
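The validation idea above — compare fingerprints of the original molecule and the molecule recovered by reverse translation — reduces to computing a Tanimoto coefficient. A minimal sketch, representing fingerprints as sets of "on" bit indices (real pipelines would typically derive these with a cheminformatics toolkit such as RDKit; the bit values below are made up for illustration):

```python
def tanimoto_similarity(fp1: set, fp2: set) -> float:
    """Tanimoto similarity between two molecular fingerprints,
    each represented as a set of 'on' bit indices."""
    if not fp1 and not fp2:
        return 1.0  # two empty fingerprints are trivially identical
    intersection = len(fp1 & fp2)
    return intersection / (len(fp1) + len(fp2) - intersection)

# Fingerprint of the input molecule vs. the one recovered by
# reverse (IUPAC -> SMILES) translation; values are illustrative.
original = {1, 4, 7, 9, 15}
roundtrip = {1, 4, 7, 9, 21}
score = tanimoto_similarity(original, roundtrip)
```

A score close to 1.0 suggests the round-trip preserved the molecular structure, so the IUPAC prediction was likely correct.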

license:apache-2.0
10,625
9

gliner-multitask-v1.0

license:apache-2.0
6,280
34

modern-gliner-bi-large-v1.0

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type using a bidirectional transformer encoder (BERT-like). It provides a practical alternative to traditional NER models, which are limited to predefined entities, and to Large Language Models (LLMs), which, despite their flexibility, are costly and too large for resource-constrained scenarios.

This version uses a bi-encoder architecture: the textual encoder is ModernBERT-large and the entity label encoder is a sentence transformer, BGE-base-en. This architecture brings several advantages over uni-encoder GLiNER:
- An unlimited number of entity types can be recognized at once;
- Faster inference when entity embeddings are precomputed;
- Better generalization to unseen entities.

Utilizing ModernBERT delivers up to 4x better efficiency compared to DeBERTa-based models and a context length of up to 8,192 tokens, while demonstrating comparable results. However, the bi-encoder architecture also has drawbacks, such as a lack of inter-label interactions, which makes it harder for the model to disambiguate semantically similar but contextually different entities.

Installation & Usage
Install or update the gliner package. You also need the latest version of transformers to use this model. Once you've installed the GLiNER library, you can import the GLiNER class, load this model using `GLiNER.from_pretrained`, and predict entities with `predict_entities`.
If you want to use flash attention or increase the sequence length, please check the following code. First, install the Flash Attention and Triton packages. If you have a large number of entities and want to pre-embed them, please refer to the following code snippet.

Below is a table with benchmarking results on various named entity recognition datasets:

| Dataset | Score |
|-------------------------|--------|
| ACE 2004 | 30.5% |
| ACE 2005 | 26.7% |
| AnatEM | 37.2% |
| Broad Tweet Corpus | 72.1% |
| CoNLL 2003 | 69.3% |
| FabNER | 22.0% |
| FindVehicle | 40.3% |
| GENIA_NER | 55.6% |
| HarveyNER | 16.1% |
| MultiNERD | 73.8% |
| Ontonotes | 39.2% |
| PolyglotNER | 49.1% |
| TweetNER7 | 39.6% |
| WikiANN en | 54.7% |
| WikiNeural | 83.7% |
| bc2gm | 53.7% |
| bc4chemd | 52.1% |
| bc5cdr | 67.0% |
| ncbi | 61.7% |
| Average | 49.7% |
| | |
| CrossNER_AI | 58.1% |
| CrossNER_literature | 60.0% |
| CrossNER_music | 73.0% |
| CrossNER_politics | 72.8% |
| CrossNER_science | 66.5% |
| mit-movie | 47.6% |
| mit-restaurant | 40.6% |
| Average (zero-shot benchmark) | 59.8% |

Connect with our community on Discord for news, support, and discussion about our models. Join Discord.
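The pre-embedding advantage mentioned above can be illustrated with a toy NumPy sketch: label embeddings are computed once offline, and at inference time every text span is scored against all labels with a single matrix product instead of re-encoding the labels. Dimensions and vectors below are made up; the real model produces these embeddings with ModernBERT-large and BGE-base-en.

```python
import numpy as np

rng = np.random.default_rng(0)

# Precomputed once, offline: one unit-norm embedding per entity label.
label_embeddings = rng.normal(size=(1000, 64))
label_embeddings /= np.linalg.norm(label_embeddings, axis=1, keepdims=True)

# Produced at inference time by the text encoder for one candidate span.
span_embedding = rng.normal(size=64)
span_embedding /= np.linalg.norm(span_embedding)

# Cosine scores against all 1000 labels in a single matrix product.
scores = label_embeddings @ span_embedding
best_label = int(np.argmax(scores))
```

This is why an effectively unlimited number of labels can be handled at once: adding labels only grows the precomputed matrix, not the per-text encoding cost.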

license:apache-2.0
5,424
59

comprehend_it-base

license:apache-2.0
3,339
86

gliclass-edge-v3.0

license:apache-2.0
2,067
16

gliclass-base-v3.0

license:apache-2.0
1,729
8

gliclass-large-v3.0

license:apache-2.0
1,139
6

gliner-pii-base-v1.0

license:apache-2.0
909
7

gliclass-small-v1.0

license:apache-2.0
900
2

Gliner Multitask Large V0.5

🚀 Meet the first multi-task prompt-tunable GLiNER model 🚀

GLiNER-Multitask is a model designed to extract various pieces of information from plain text based on a user-provided custom prompt. This versatile model leverages a bidirectional transformer encoder, similar to BERT, which ensures both high generalization and compute efficiency despite its compact size. The `gliner-multitask-large` variant achieves state-of-the-art performance on NER zero-shot benchmarks, demonstrating its robustness and flexibility. It excels not only in named entity recognition but also in various other information extraction tasks, making it a powerful tool for diverse natural language processing applications.

Supported tasks:
- Named Entity Recognition (NER): identifies and categorizes entities such as names, organizations, dates, and other specific items in the text.
- Relation Extraction: detects and classifies relationships between entities within the text.
- Summarization: extracts the most important sentences that summarize the input text, capturing the essential information.
- Sentiment Extraction: identifies parts of the text that signal a positive, negative, or neutral sentiment.
- Key-Phrase Extraction: identifies and extracts important phrases and keywords from the text.
- Question Answering: finds an answer in the text given a question.
- Open Information Extraction: extracts pieces of text given an open prompt from a user, for example, product description extraction.

Installation
To use this model, install the GLiNER Python library. Once you've installed it, you can import the GLiNER class, load this model using `GLiNER.from_pretrained`, and predict entities with `predict_entities`.
Constructing a relation extraction pipeline with utca
First, we import the necessary components of the library, initialize the predictor (a GLiNER model), and construct a pipeline that combines NER and relation extraction. To run the pipeline, we specify the entity types and the relations with their parameters. With the threshold parameters, you can control how much information you want to extract.

Our multitask model demonstrates performance on different zero-shot benchmarks comparable to models dedicated to the NER task (all labels were lowercased in this testing):

| Model | Dataset | Precision | Recall | F1 Score | F1 Score (Decimal) |
|------------------------------------|--------------------|-----------|--------|----------|--------------------|
| numind/NuNER_Zero-span | CrossNER_AI | 63.82% | 56.82% | 60.12% | 0.6012 |
| | CrossNER_literature | 73.53% | 58.06% | 64.89% | 0.6489 |
| | CrossNER_music | 72.69% | 67.40% | 69.95% | 0.6995 |
| | CrossNER_politics | 77.28% | 68.69% | 72.73% | 0.7273 |
| | CrossNER_science | 70.08% | 63.12% | 66.42% | 0.6642 |
| | mit-movie | 63.00% | 48.88% | 55.05% | 0.5505 |
| | mit-restaurant | 54.81% | 37.62% | 44.62% | 0.4462 |
| | Average | | | | 0.6196 |
| knowledgator/gliner-multitask-v0.5 | CrossNER_AI | 51.00% | 51.11% | 51.05% | 0.5105 |
| | CrossNER_literature | 72.65% | 65.62% | 68.96% | 0.6896 |
| | CrossNER_music | 74.91% | 73.70% | 74.30% | 0.7430 |
| | CrossNER_politics | 78.84% | 77.71% | 78.27% | 0.7827 |
| | CrossNER_science | 69.20% | 65.48% | 67.29% | 0.6729 |
| | mit-movie | 61.29% | 52.59% | 56.60% | 0.5660 |
| | mit-restaurant | 50.65% | 38.13% | 43.51% | 0.4351 |
| | Average | | | | 0.6276 |
| urchade/gliner_large-v2.1 | CrossNER_AI | 54.98% | 52.00% | 53.45% | 0.5345 |
| | CrossNER_literature | 59.33% | 56.47% | 57.87% | 0.5787 |
| | CrossNER_music | 67.39% | 66.77% | 67.08% | 0.6708 |
| | CrossNER_politics | 66.07% | 63.76% | 64.90% | 0.6490 |
| | CrossNER_science | 61.45% | 62.56% | 62.00% | 0.6200 |
| | mit-movie | 55.94% | 47.36% | 51.29% | 0.5129 |
| | mit-restaurant | 53.34% | 40.83% | 46.25% | 0.4625 |
| | Average | | | | 0.5754 |
| EmergentMethods/gliner_large_news-v2.1 | CrossNER_AI | 59.60% | 54.55% | 56.96% | 0.5696 |
| | CrossNER_literature | 65.41% | 56.16% | 60.44% | 0.6044 |
| | CrossNER_music | 67.47% | 63.08% | 65.20% | 0.6520 |
| | CrossNER_politics | 66.05% | 60.07% | 62.92% | 0.6292 |
| | CrossNER_science | 68.44% | 63.57% | 65.92% | 0.6592 |
| | mit-movie | 65.85% | 49.59% | 56.57% | 0.5657 |
| | mit-restaurant | 54.71% | 35.94% | 43.38% | 0.4338 |
| | Average | | | | 0.5876 |

Connect with our community on Discord for news, support, and discussion about our models. Join Discord.

license:apache-2.0
692
134

SMILES2IUPAC-canonical-small

license:apache-2.0
628
7

Llama-encoder-1.0B

llama
567
3

Gliner Pii Large V1.0

license:apache-2.0
496
26

gliner-pii-edge-v1.0

license:apache-2.0
473
10

gliclass-modern-large-v3.0

license:apache-2.0
353
13

gliner-pii-small-v1.0

license:apache-2.0
302
4

gliclass-base-v2.0-rac-init

license:apache-2.0
260
10

gliclass-modern-base-v3.0

license:apache-2.0
232
3

gliner-x-large

license:apache-2.0
217
31

gliclass-large-v1.0

license:apache-2.0
151
5

gliner-poly-small-v1.0

license:apache-2.0
123
15

comprehend_it-multilingual-t5-base

license:apache-2.0
116
26

Qwen-encoder-0.5B

license:apache-2.0
101
9

IUPAC2SMILES-canonical-base

license:apache-2.0
95
6

gliclass-modern-base-v2.0-init

license:apache-2.0
93
24

t5-for-ie

license:apache-2.0
92
4

gliner-bi-edge-v2.0

license:apache-2.0
79
4

gliclass-base-v1.0-lw

license:apache-2.0
77
2

UTC DeBERTa Large V2

license:apache-2.0
76
24

gliner-linker-large-v1.0

license:apache-2.0
76
7

Qwen-encoder-1.5B

license:apache-2.0
69
2

gliner-relex-large-v1.0

license:apache-2.0
69
0

UTC-DeBERTa-small-v2

license:apache-2.0
67
1

gliclass-modern-large-v2.0

license:apache-2.0
64
3

gliclass-modern-large-v2.0-init

license:apache-2.0
63
8

gliner-x-small-v0.5

license:cc-by-nc-sa-4.0
63
4

flan-t5-small-for-classification

license:apache-2.0
63
0

flan-t5-large-for-classification

license:apache-2.0
59
1

gliner-relex-large-v0.5

license:apache-2.0
58
20

gliner-linker-base-v1.0

license:apache-2.0
55
5

gliclass-base-v1.0

license:apache-2.0
53
3

gliclass-qwen-1.5B-v1.0

license:apache-2.0
51
2

gliclass-large-v1.0-lw

license:apache-2.0
49
3

gliner-x-base

license:apache-2.0
45
8

gliclass-x-base

license:apache-2.0
42
5

gliner-bi-large-v1.0

license:apache-2.0
41
24

gliclass-large-v1.0-init

license:apache-2.0
35
14

gliner-linker-rerank-v1.0

license:apache-2.0
34
5

gliclass-base-v1.0-init

license:apache-2.0
33
2

Qwen2-0.5Bchp-test1

33
0

retrico-lm-2b-sft-gemma

31
0

gliner-decoder-large-v1.0

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type in a zero-shot manner. This architecture combines:
- An encoder for representing entity spans
- A decoder for generating label names

This hybrid approach enables new use cases such as entity linking and expands GLiNER's capabilities. By integrating large modern decoders, trained on vast datasets, GLiNER can leverage their richer knowledge capacity while maintaining competitive inference speed.

Key features:
- Open ontology: works when the label set is unknown
- Multi-label entity recognition: assigns multiple labels to a single entity
- Entity linking: handles large label sets via constrained generation
- Knowledge expansion: gains from large decoder models
- Efficient: minimal speed reduction on GPU compared to single-encoder GLiNER

Usage
If you need open-ontology entity extraction, use the tag `label` in the list of labels; please check the example below. If you need to run the model on many texts and/or set label constraints, please check the example below: you can limit the decoder to generate labels only from a predefined set. Two label trie implementations are available; for a faster, memory-efficient C++ version, install Cython. This can significantly improve performance and reduce memory usage, especially with millions of labels.
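The constrained-generation idea above relies on a label trie: at each decoding step, only tokens that keep the output inside the predefined label set are allowed. This is a conceptual pure-Python sketch, not the library's actual implementation, and it uses whitespace splitting as a stand-in tokenizer:

```python
class LabelTrie:
    """Toy label trie for constrained decoding: restricts the decoder
    to token sequences that spell out a label from a predefined set."""

    _END = None  # sentinel key marking a complete label

    def __init__(self, labels):
        self.root = {}
        for label in labels:
            node = self.root
            for token in label.split():  # stand-in for a real tokenizer
                node = node.setdefault(token, {})
            node[self._END] = {}

    def allowed_next(self, prefix_tokens):
        """Tokens the decoder may generate after `prefix_tokens`."""
        node = self.root
        for token in prefix_tokens:
            if token not in node:
                return []  # prefix has left the label set entirely
            node = node[token]
        return [t for t in node if t is not self._END]


trie = LabelTrie(["chemical compound", "chemical element", "person"])
```

At each step the decoder's logits would be masked so that only `allowed_next(...)` tokens can be sampled; with millions of labels, the trie keeps this lookup fast, which is where the optional C++/Cython version pays off.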

license:apache-2.0
30
17

modern-gliner-bi-base-v1.0

license:apache-2.0
25
26

gliner-bi-small-v1.0

license:apache-2.0
22
10

gliner-bi-small-v2.0

license:apache-2.0
19
3

retrico-lm-gemma4-grpo-2b

19
0

gliner-x-small

license:apache-2.0
18
15

gliner-decoder-small-v1.0

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type in a zero-shot manner. This architecture combines:
- An encoder for representing entity spans
- A decoder for generating label names

This hybrid approach enables new use cases such as entity linking and expands GLiNER's capabilities. By integrating large modern decoders, trained on vast datasets, GLiNER can leverage their richer knowledge capacity while maintaining competitive inference speed.

Key features:
- Open ontology: works when the label set is unknown
- Multi-label entity recognition: assigns multiple labels to a single entity
- Entity linking: handles large label sets via constrained generation
- Knowledge expansion: gains from large decoder models
- Efficient: minimal speed reduction on GPU compared to single-encoder GLiNER

Usage
If you need open-ontology entity extraction, use the tag `label` in the list of labels; please check the example below. If you need to run the model on many texts and/or set label constraints, please check the example below: you can limit the decoder to generate labels only from a predefined set. Two label trie implementations are available; for a faster, memory-efficient C++ version, install Cython. This can significantly improve performance and reduce memory usage, especially with millions of labels.

license:apache-2.0
17
4

gliclass-small-v1.0-init

license:apache-2.0
16
5

gliner-x-base-v0.5

license:cc-by-nc-sa-4.0
14
3

gliclass-qwen-0.5B-v1.0

license:apache-2.0
14
1

UTC-DeBERTa-base-v2

license:apache-2.0
12
0

SMILES-DeBERTa-small

license:apache-2.0
11
3

gliner-llama-1.3B-v1.0

license:apache-2.0
10
1

gliner-qwen-0.5B-v1.0

license:apache-2.0
9
2

gliclass_msmarco_merged

9
0

gliner-qwen-1.5B-v1.0

license:apache-2.0
8
5

gliclass-modern-base-v2.0

license:apache-2.0
8
2

gliclass-llama-1.3B-v1.0

license:apache-2.0
8
1

gliclass-small-v1.0-lw

license:apache-2.0
8
0

gliner-decoder-base-v1.0

GLiNER is a Named Entity Recognition (NER) model capable of identifying any entity type in a zero-shot manner. This architecture combines:
- An encoder for representing entity spans
- A decoder for generating label names

This hybrid approach enables new use cases such as entity linking and expands GLiNER's capabilities. By integrating large modern decoders, trained on vast datasets, GLiNER can leverage their richer knowledge capacity while maintaining competitive inference speed.

Key features:
- Open ontology: works when the label set is unknown
- Multi-label entity recognition: assigns multiple labels to a single entity
- Entity linking: handles large label sets via constrained generation
- Knowledge expansion: gains from large decoder models
- Efficient: minimal speed reduction on GPU compared to single-encoder GLiNER

Usage
If you need open-ontology entity extraction, use the tag `label` in the list of labels; please check the example below. If you need to run the model on many texts and/or set label constraints, please check the example below: you can limit the decoder to generate labels only from a predefined set. Two label trie implementations are available; for a faster, memory-efficient C++ version, install Cython. This can significantly improve performance and reduce memory usage, especially with millions of labels.

license:apache-2.0
7
11

SMILES-DeBERTa-base

license:apache-2.0
7
4

gliner-bi-llama-v1.0

license:apache-2.0
7
0

UTC-DeBERTa-large

license:apache-2.0
6
14

UTC-DeBERTa-small

license:apache-2.0
6
12

UTC-DeBERTA-base

license:apache-2.0
6
8

gliner-llama-1B-v1.0

license:apache-2.0
6
6

IUPAC2SMILES-canonical-small

license:apache-2.0
6
5

gliner-bi-base-v1.0

license:apache-2.0
6
4

SMILES-DeBERTa-large

license:apache-2.0
6
3

Sheared-LLaMA-encoder-1.3B

llama
5
2

gliner-x-large-v0.5

license:cc-by-nc-sa-4.0
4
9

flan-t5-base-for-classification

license:apache-2.0
3
2

UTC-T5-large

license:apache-2.0
2
5

gliner-llama-multitask-1B-v1.0

license:apache-2.0
1
1

SMILES2IUPAC-isomeric-small

license:apache-2.0
1
0

UTC-DeBERTA-large-fusing

license:apache-2.0
1
0

Qwen2-0.5Bchp-570k

1
0

Qwen2-0.5Bchp-690-updated-MultiBio-1

1
0

gliclass-bi-fused-small

1
0

SMILES-FAST-TOKENIZER

0
3

IUPAC-FAST-TOKENIZER

0
3

gliner-poly-base-v1.0

license:apache-2.0
0
3

gliclass-instruct-edge-v1.0

license:apache-2.0
0
1

gliclass-instruct-large-v1.0

license:apache-2.0
0
1

gliclass-instruct-base-v1.0

license:apache-2.0
0
1