OpenMed

500 models

OpenMed-NER-PharmaDetect-SuperClinical-434M

Widget examples: metformin, cisplatin, ibuprofen, tamoxifen, lithium carbonate · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 895,464 downloads · 16 likes

OpenMed-NER-ChemicalDetect-ElectraMed-33M

Widget examples: acetylsalicylic acid, doxorubicin, benzylpenicillin, methotrexate, vancomycin · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 735,024 downloads · 1 like

OpenMed-NER-PharmaDetect-BigMed-278M

Widget examples: metformin, cisplatin, ibuprofen, tamoxifen, lithium carbonate · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 731,907 downloads · 2 likes

OpenMed-NER-ChemicalDetect-ModernMed-149M

Widget examples: acetylsalicylic acid, doxorubicin, benzylpenicillin, methotrexate, vancomycin · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 727,675 downloads · 0 likes

OpenMed-NER-DiseaseDetect-BioMed-335M

Widget examples: diabetes mellitus type 2, Alzheimer's disease, hypertension, Crohn's disease, cystic fibrosis · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, disease-entity-recognition, …

License: apache-2.0 · 668,792 downloads · 2 likes

OpenMed-NER-OncologyDetect-MultiMed-568M

Widget examples: KRAS, p53, EGFR, PTEN, PI3K/AKT/mTOR · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, cancer-genetics, oncology, gene-regulation, cancer-research

License: apache-2.0 · 657,424 downloads · 1 like

OpenMed-NER-OncologyDetect-BigMed-278M

Widget examples: KRAS, p53, EGFR, PTEN, PI3K/AKT/mTOR · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, cancer-genetics, oncology, gene-regulation, cancer-research

License: apache-2.0 · 653,855 downloads · 0 likes

OpenMed-NER-GenomicDetect-PubMed-109M

Widget examples: BRCA2, CFTR, APOE, HTT, HBB · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, genetics, genomics, …

License: apache-2.0 · 651,907 downloads · 2 likes

OpenMed-NER-AnatomyDetect-ElectraMed-109M

Widget examples: left ventricle, hippocampus, liver, cerebrum, femoral artery · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, anatomical-entity-recognition, medical-terminology, anatomy, …

License: apache-2.0 · 651,751 downloads · 1 like

OpenMed-NER-DiseaseDetect-ElectraMed-109M

Widget examples: diabetes mellitus type 2, Alzheimer's disease, hypertension, Crohn's disease, cystic fibrosis · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, disease-entity-recognition, …

License: apache-2.0 · 647,881 downloads · 0 likes

OpenMed-NER-SpeciesDetect-ModernClinical-395M

Widget examples: Escherichia coli, Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Dendroaspis polylepis, Canis lupus familiaris · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 647,469 downloads · 0 likes

OpenMed-NER-GenomeDetect-ModernMed-395M

Widget examples: EGFR, HER2, TP53, BRAF V600E, insulin receptor · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, protein-recognition, genomics, …

License: apache-2.0 · 647,267 downloads · 0 likes

OpenMed-NER-DNADetect-SuperClinical-184M

Widget examples: p53, BRCA1, NF-kB, STAT3, HeLa cells · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 646,625 downloads · 1 like

OpenMed-NER-OrganismDetect-BioPatient-108M

Widget examples: Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, Neurospora crassa · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 640,733 downloads · 0 likes

OpenMed-NER-GenomeDetect-ModernMed-149M

Widget examples: EGFR, HER2, TP53, BRAF V600E, insulin receptor · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, protein-recognition, genomics, …

License: apache-2.0 · 640,281 downloads · 0 likes

OpenMed-NER-PathologyDetect-TinyMed-135M

Widget examples: breast cancer, Parkinson's disease, Huntington's disease, malaria, multiple sclerosis · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 586,134 downloads · 0 likes

OpenMed-NER-ChemicalDetect-ModernMed-395M

Widget examples: acetylsalicylic acid, doxorubicin, benzylpenicillin, methotrexate, vancomycin · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 582,873 downloads · 0 likes

OpenMed-NER-DNADetect-SuperMedical-125M

Widget examples: p53, BRCA1, NF-kB, STAT3, HeLa cells · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 580,378 downloads · 0 likes

OpenMed-NER-SpeciesDetect-ElectraMed-109M

Widget examples: Escherichia coli, Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Dendroaspis polylepis, Canis lupus familiaris · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 577,841 downloads · 0 likes

OpenMed-NER-PharmaDetect-ModernClinical-149M

Widget examples: metformin, cisplatin, ibuprofen, tamoxifen, lithium carbonate · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 517,774 downloads · 3 likes

OpenMed-NER-OrganismDetect-TinyMed-82M

Widget examples: Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, Neurospora crassa · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 511,232 downloads · 0 likes

OpenMed-NER-BloodCancerDetect-TinyMed-65M

Widget examples: chronic lymphocytic leukemia, B-cell proliferation, ibrutinib, del(17p) · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, leukemia, hematology, cancer, …

License: apache-2.0 · 509,488 downloads · 0 likes

OpenMed-NER-GenomicDetect-BigMed-560M

Widget examples: BRCA2, CFTR, APOE, HTT, HBB · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, genetics, genomics, …

License: apache-2.0 · 455,161 downloads · 1 like

OpenMed-NER-ChemicalDetect-MultiMed-568M

Widget examples: acetylsalicylic acid, doxorubicin, benzylpenicillin, methotrexate, vancomycin · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 443,325 downloads · 1 like

OpenMed-NER-ProteinDetect-SuperClinical-141M

Widget examples: Maillard reaction, casein micelles, starch gelatinization, green-tea polyphenols, omega-3 fatty acids · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, protein-interactions, molecular-biology, …

License: apache-2.0 · 438,932 downloads · 0 likes

OpenMed-NER-ChemicalDetect-BigMed-560M

Widget examples: acetylsalicylic acid, doxorubicin, benzylpenicillin, methotrexate, vancomycin · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 438,026 downloads · 0 likes

OpenMed-NER-SpeciesDetect-ModernMed-149M

Widget examples: Escherichia coli, Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Dendroaspis polylepis, Canis lupus familiaris · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 433,121 downloads · 0 likes

OpenMed-NER-BloodCancerDetect-ElectraMed-560M

Widget examples: chronic lymphocytic leukemia, B-cell proliferation, ibrutinib, del(17p) · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, leukemia, hematology, cancer, …

License: apache-2.0 · 380,916 downloads · 1 like

OpenMed-NER-PharmaDetect-BioPatient-108M

Widget examples: metformin, cisplatin, ibuprofen, tamoxifen, lithium carbonate · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 379,886 downloads · 1 like

OpenMed-NER-ChemicalDetect-SnowMed-568M

Widget examples: acetylsalicylic acid, doxorubicin, benzylpenicillin, methotrexate, vancomycin · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 376,901 downloads · 0 likes

OpenMed-NER-OrganismDetect-BioMed-109M

Widget examples: Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, Neurospora crassa · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 366,423 downloads · 1 like

OpenMed-NER-GenomicDetect-PubMed-335M

Widget examples: BRCA2, CFTR, APOE, HTT, HBB · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, genetics, genomics, …

License: apache-2.0 · 365,961 downloads · 0 likes

OpenMed-NER-DiseaseDetect-SuperClinical-434M

Widget examples: diabetes mellitus type 2, Alzheimer's disease, hypertension, Crohn's disease, cystic fibrosis · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, disease-entity-recognition, …

License: apache-2.0 · 305,806 downloads · 5 likes

OpenMed-NER-GenomicDetect-SnowMed-568M

Widget examples: BRCA2, CFTR, APOE, HTT, HBB · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, genetics, genomics, …

License: apache-2.0 · 296,789 downloads · 2 likes

OpenMed-NER-OncologyDetect-SuperClinical-434M

Widget examples: KRAS, p53, EGFR, PTEN, PI3K/AKT/mTOR · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, cancer-genetics, oncology, gene-regulation, cancer-research

License: apache-2.0 · 245,963 downloads · 7 likes

OpenMed-NER-PathologyDetect-PubMed-v2-109M

Widget examples: breast cancer, Parkinson's disease, Huntington's disease, malaria, multiple sclerosis · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 243,406 downloads · 5 likes

OpenMed-NER-SpeciesDetect-PubMed-335M

Widget examples: Escherichia coli, Homo sapiens, Mus musculus, Saccharomyces cerevisiae, Dendroaspis polylepis, Canis lupus familiaris · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 203,153 downloads · 2 likes

OpenMed-NER-DiseaseDetect-TinyMed-135M

Widget examples: diabetes mellitus type 2, Alzheimer's disease, hypertension, Crohn's disease, cystic fibrosis · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, disease-entity-recognition, …

License: apache-2.0 · 187,003 downloads · 0 likes

OpenMed-NER-GenomeDetect-PubMed-109M

Widget examples: EGFR, HER2, TP53, BRAF V600E, insulin receptor · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, gene-recognition, protein-recognition, genomics, …

License: apache-2.0 · 184,644 downloads · 0 likes

OpenMed-NER-PharmaDetect-SuperMedical-125M

Widget examples: metformin, cisplatin, ibuprofen, tamoxifen, lithium carbonate · Tags: token-classification, named-entity-recognition, biomedical-nlp, transformers, …

License: apache-2.0 · 181,265 downloads · 4 likes

OpenMed-NER-PathologyDetect-BigMed-560M

Widget examples: breast cancer, Parkinson's disease, Huntington's disease, malaria, multiple sclerosis · Tags: token-classification, named-entity-recognition, …

License: apache-2.0 · 177,738 downloads · 8 likes

OpenMed-NER-PharmaDetect-BigMed-560M

License: apache-2.0 · 175,364 downloads · 0 likes

OpenMed-NER-OncologyDetect-ModernMed-395M

License: apache-2.0 · 171,228 downloads · 0 likes

OpenMed-NER-ChemicalDetect-PubMed-335M

Specialized model for Chemical Entity Recognition: identifies chemical compounds and substances in biomedical literature. License: [Apache-2.0](https://opensource.org/licenses/Apache-2.0) · Organization: [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated BC4CHEMD dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

About the dataset: BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, it contains PubMed abstracts with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is designed to advance automated chemical-name recognition for drug discovery, pharmacology, and chemical biology, and serves as a benchmark for evaluating named-entity-recognition models that identify chemical compounds, drugs, and other chemical substances in scientific literature.
Current Model Performance
- F1 Score: `0.95`
- Precision: `0.95`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1-score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, refer to the Hugging Face documentation. In summary:
- `none`: returns raw token predictions without any aggregation.
- `simple`: groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: for word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: for word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: for word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter. Batch-size guidelines:
- CPU: start with `batch_size` 1-4
- Single GPU: try `batch_size` 8-32, depending on GPU memory
- High-end GPU: can handle `batch_size` 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition; identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
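As a rough illustration of what the `simple` aggregation does, the sketch below merges adjacent B-/I-tagged tokens of the same entity type into spans. This is a minimal, hypothetical illustration (the function name and the toy `CHEM` tags are invented for this example), not the Hugging Face implementation:

```python
# Minimal sketch of "simple"-style aggregation: adjacent B-/I- tokens of the
# same entity type are merged into one entity span. Illustration only; the
# actual Hugging Face pipeline also tracks scores and character offsets.

def aggregate_simple(tokens, tags):
    """Group (token, BIO-tag) pairs into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" or etype != current_type:
            # A B- tag (or a type change) starts a new entity span.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = etype, [token]
        else:
            # An I- tag of the same type continues the current span.
            current_tokens.append(token)
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["Treatment", "with", "doxorubicin", "and", "acetylsalicylic", "acid"]
tags = ["O", "O", "B-CHEM", "O", "B-CHEM", "I-CHEM"]
print(aggregate_simple(tokens, tags))
# [('CHEM', 'doxorubicin'), ('CHEM', 'acetylsalicylic acid')]
```

With `aggregation_strategy="none"` you would instead see the six raw token predictions; the strategies differ only in how those per-token labels are collapsed into spans.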

License: apache-2.0 · 166,436 downloads · 0 likes

OpenMed-NER-AnatomyDetect-ElectraMed-560M

License: apache-2.0 · 165,310 downloads · 0 likes

OpenMed-NER-OncologyDetect-SuperMedical-355M

Specialized model for Cancer Genetics: cancer-related genetic entities. License: [Apache-2.0](https://opensource.org/licenses/Apache-2.0) · Organization: [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated BIONLP2013CG dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entity types, each with `B-` and `I-` BIO tags: `Amino_acid`, `Anatomical_system`, `Cancer`, `Cell`, `Cellular_component`, `Developing_anatomical_structure`, `Gene_or_gene_product`, `Immaterial_anatomical_entity`, `Multi-tissue_structure`, `Organ`, `Organism`, `Organism_subdivision`, `Organism_substance`, `Pathological_formation`, `Simple_chemical`, `Tissue`.

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics.
The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics. Current Model Performance - F1 Score: `0.90` - Precision: `0.89` - Recall: `0.91` - Accuracy: `0.94` | Rank | Model | F1 Score | Precision | Recall | Accuracy | |------|-------|----------|-----------|--------|-----------| | 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 | | 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 | | 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 | | 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 | | 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 | | 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 | | 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 | | 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 | | 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 | | 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 | Rankings based on F1-score performance across all models trained on this dataset. Figure: OpenMed (Open-Source) vs. 
Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets. NOTE: The `aggregationstrategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies: - `none`: Returns raw token predictions without any aggregation. - `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`). - `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word. - `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score. - `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word. For efficient processing of large datasets, use proper batching with the `batchsize` parameter: Batch Size Guidelines: - CPU: Start with batchsize=1-4 - Single GPU: Try batchsize=8-32 depending on GPU memory - High-end GPU: Can handle batchsize=64 or higher - Monitor GPU utilization to find the optimal batch size for your hardware - Dataset: BIONLP2013CG - Description: Cancer Genetics - Cancer-related genetic entities Training Details - Base Model: roberta-large - Training Framework: Hugging Face Transformers - Optimization: AdamW optimizer with learning rate scheduling - Validation: Cross-validation on held-out test set - Base Architecture: roberta-large - Task: Token Classification (Named Entity Recognition) - Labels: Dataset-specific entity types - Input: Tokenized biomedical text - Output: BIO-tagged entity predictions This model is particularly useful for: - Clinical Text Mining: Extracting entities from medical records - Biomedical Research: Processing scientific literature - Drug Discovery: Identifying chemical compounds and drugs - Healthcare Analytics: Analyzing patient data 
and outcomes - Academic Research: Supporting biomedical NLP research Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!
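To make the `simple` strategy above concrete, here is a toy re-implementation of the grouping idea in plain Python (no Transformers dependency; the token predictions are invented for illustration, and the real pipeline's behavior has additional details this sketch omits):

```python
def aggregate_simple(token_preds):
    """Toy sketch of `aggregation_strategy="simple"`: merge runs of
    B-/I- tags that share an entity type into single entities."""
    entities, current = [], None
    for word, tag in token_preds:
        if tag == "O":
            if current is not None:
                entities.append(current)
            current = None
            continue
        prefix, etype = tag.split("-", 1)  # e.g. "B-Cancer" -> ("B", "Cancer")
        if prefix == "B" or current is None or current["type"] != etype:
            # A B- tag, or a type change, starts a new entity.
            if current is not None:
                entities.append(current)
            current = {"type": etype, "text": word}
        else:
            # An I- tag of the same type extends the current entity.
            current["text"] += " " + word
    if current is not None:
        entities.append(current)
    return entities

preds = [("BRCA1", "B-Gene_or_gene_product"), ("mutations", "O"),
         ("breast", "B-Cancer"), ("cancer", "I-Cancer")]
print(aggregate_simple(preds))
# → [{'type': 'Gene_or_gene_product', 'text': 'BRCA1'}, {'type': 'Cancer', 'text': 'breast cancer'}]
```

The key design point is that grouping happens entirely after prediction: the model emits one BIO tag per token, and the aggregation strategy only decides how those tags are stitched into entity spans.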

license:apache-2.0
160,061
2

OpenMed-NER-ProteinDetect-SnowMed-568M

license:apache-2.0
134,583
0

OpenMed-NER-PathologyDetect-PubMed-109M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition (disease entities from the NCBI dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entity mentions. The NCBI Disease corpus is a comprehensive resource for disease name recognition and concept normalization: a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, this corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications. It serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.90`
- Precision: `0.88`
- Recall: `0.92`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
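The precision, recall, and F1 columns reported in the leaderboard tables are related by the standard formulas; a quick arithmetic sketch (the entity-level counts here are invented purely to illustrate the computation):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from entity-level true-positive,
    false-positive, and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

# Hypothetical counts, chosen only to show the arithmetic.
p, r, f1 = prf(tp=892, fp=108, fn=69)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")
```

Because F1 is the harmonic mean, a model with high recall but weak precision (or vice versa) is pulled toward the lower of the two, which is why the tables rank by F1 rather than by either component alone.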

license:apache-2.0
115,481
1

OpenMed-NER-OncologyDetect-PubMed-335M

Specialized model for Cancer Genetics - Cancer-related genetic entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics (cancer-related genetic entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities (BIO scheme; each type has a `B-` and a matching `I-` tag):
- `B-Amino_acid` / `I-Amino_acid`
- `B-Anatomical_system` / `I-Anatomical_system`
- `B-Cancer` / `I-Cancer`
- `B-Cell` / `I-Cell`
- `B-Cellular_component` / `I-Cellular_component`
- `B-Developing_anatomical_structure` / `I-Developing_anatomical_structure`
- `B-Gene_or_gene_product` / `I-Gene_or_gene_product`
- `B-Immaterial_anatomical_entity` / `I-Immaterial_anatomical_entity`
- `B-Multi-tissue_structure` / `I-Multi-tissue_structure`
- `B-Organ` / `I-Organ`
- `B-Organism` / `I-Organism`
- `B-Organism_subdivision` / `I-Organism_subdivision`
- `B-Organism_substance` / `I-Organism_substance`
- `B-Pathological_formation` / `I-Pathological_formation`
- `B-Simple_chemical` / `I-Simple_chemical`
- `B-Tissue` / `I-Tissue`

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics. The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.88`
- Precision: `0.88`
- Recall: `0.87`
- Accuracy: `0.95`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013CG
- Description: Cancer Genetics - Cancer-related genetic entities

Training Details
- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
113,636
0

OpenMed-NER-GenomeDetect-SuperClinical-434M

Specialized model for Gene/Protein Entity Recognition - Gene and protein mentions

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene/protein entity recognition (gene and protein mentions). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and protein mentions. The BC2GM corpus targets gene and protein mention recognition from the BioCreative II Gene Mention task. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining, and it continues to be widely used for training and evaluating biomedical NER models.

Current Model Performance
- F1 Score: `0.90`
- Precision: `0.90`
- Recall: `0.91`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: deberta-v3-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
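The batch-size guidance above applies to any inference loop that feeds texts to the model in fixed-size chunks. A minimal batching helper in plain Python (the `batch_size=4` value echoes the low end of the single-GPU range; the dummy sentences are illustrative):

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a sequence of inputs."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Ten dummy sentences split into batches of four (last batch is smaller).
texts = [f"sentence {n}" for n in range(10)]
sizes = [len(batch) for batch in batched(texts, batch_size=4)]
print(sizes)  # → [4, 4, 2]
```

In practice you would pass each batch to the model in one forward pass; the trade-off is that larger batches amortize per-call overhead but require proportionally more accelerator memory, which is why the guidelines scale the value with available hardware.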

license:apache-2.0
97,180
5

OpenMed-NER-DNADetect-SuperClinical-434M

Specialized model for Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition (proteins, DNA, RNA, cell lines, and cell types). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities (BIO scheme; each type has a `B-` and a matching `I-` tag):
- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types: protein, DNA, RNA, cell line, and cell type, making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been extensively used as a benchmark for evaluating biomedical NER systems and continues to be a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.82`
- Precision: `0.78`
- Recall: `0.86`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: deberta-v3-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
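Token-classification checkpoints store their label inventory as `id2label`/`label2id` mappings in the model config. As a sketch, here is how a BIO tag set like JNLPBA's five entity types could be expanded into those mappings (the index ordering below is an assumption for illustration, not necessarily this checkpoint's actual ordering):

```python
# Five JNLPBA entity types expanded into an "O" tag plus B-/I- pairs.
entity_types = ["DNA", "RNA", "cell_line", "cell_type", "protein"]
labels = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]

# The two mappings a token-classification config typically carries.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}
print(len(labels), label2id["B-protein"])  # → 11 9
```

Five types thus yield eleven classes (2 × 5 + 1 for `O`), which is the size of the classification head the fine-tuned model predicts over at each token position.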

license:apache-2.0
91,944
0

OpenMed-NER-ChemicalDetect-EuroMed-212M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition (identifying chemical compounds and substances in biomedical literature). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entity mentions. BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.

Current Model Performance
- F1 Score: `0.91`
- Precision: `0.92`
- Recall: `0.91`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: EuroBERT-210m
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: EuroBERT-210m
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
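For high-precision chemical extraction, a common post-processing step is to drop predictions below a confidence threshold. A small sketch over records shaped like token-classification pipeline output (the entries, the `CHEM` group name, and the 0.9 cutoff are all illustrative, not this model's actual output):

```python
# Records shaped like aggregated token-classification output; values invented.
predictions = [
    {"word": "acetylsalicylic acid", "entity_group": "CHEM", "score": 0.98},
    {"word": "pain", "entity_group": "CHEM", "score": 0.41},
]

THRESHOLD = 0.9  # application-specific choice, not a recommended default

# Keep only entities the model is confident about.
confident = [p for p in predictions if p["score"] >= THRESHOLD]
print([p["word"] for p in confident])  # → ['acetylsalicylic acid']
```

Raising the threshold trades recall for precision, which mirrors the precision/recall trade-off visible in the leaderboard table above.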

license:apache-2.0
85,198
4

OpenMed-NER-OncologyDetect-PubMed-109M

Specialized model for Cancer Genetics - Cancer-related genetic entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics (cancer-related genetic entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: `B-Amino_acid`, `B-Anatomical_system`, `B-Cancer`, `B-Cell`, `B-Cellular_component`, `B-Developing_anatomical_structure`, `B-Gene_or_gene_product`, `B-Immaterial_anatomical_entity`, `B-Multi-tissue_structure`, `B-Organ`, `B-Organism`, `B-Organism_subdivision`, `B-Organism_substance`, `B-Pathological_formation`, `B-Simple_chemical`, `B-Tissue`, and the corresponding `I-` tags: `I-Amino_acid`, `I-Anatomical_system`, `I-Cancer`, `I-Cell`, `I-Cellular_component`, `I-Developing_anatomical_structure`, `I-Gene_or_gene_product`, `I-Immaterial_anatomical_entity`, `I-Multi-tissue_structure`, `I-Organ`, `I-Organism`, `I-Organism_subdivision`, `I-Organism_substance`, `I-Pathological_formation`, `I-Simple_chemical`, `I-Tissue`.

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics.
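The tag inventory above is mechanically derived from the entity types: each type `T` yields a `B-T` (begin) and an `I-T` (inside) tag, plus one shared `O` tag for non-entity tokens. A minimal sketch, using an illustrative subset of the types listed above:

```python
# Build a BIO tag set from entity type names. The `types` list is a small
# illustrative subset of the card's 16 entity types, not the full inventory.
types = ["Cancer", "Cell", "Organ", "Tissue"]
labels = ["O"] + [f"{prefix}-{etype}" for etype in types for prefix in ("B", "I")]
print(labels)
# ['O', 'B-Cancer', 'I-Cancer', 'B-Cell', 'I-Cell',
#  'B-Organ', 'I-Organ', 'B-Tissue', 'I-Tissue']
```

For 16 entity types this yields 2 × 16 + 1 = 33 labels, which is the size of the classification head on a token-classification model trained on this corpus.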
The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.81`
- Precision: `0.80`
- Recall: `0.81`
- Accuracy: `0.94`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013CG
- Description: Cancer Genetics - Cancer-related genetic entities

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
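The batch-size guidance above amounts to chunking the input list and feeding fixed-size batches to the model. With `transformers` this is handled for you by passing `batch_size=` to `pipeline(...)`; the generic chunking helper below (`batched` is a hypothetical name) just makes the mechanics visible.

```python
# Sketch of fixed-size batching for large inputs. In practice, a Hugging Face
# pipeline accepts batch_size directly, e.g.:
#   ner = pipeline("token-classification", model=..., batch_size=8)
#   results = ner(texts)
def batched(items, batch_size):
    """Yield successive batches of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"Patient {i} was given methotrexate." for i in range(10)]
sizes = [len(b) for b in batched(texts, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Start with a conservative batch size, then increase it while watching GPU memory and utilization, as the guidelines above suggest.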

license:apache-2.0
77,615
0

OpenMed-NER-OrganismDetect-BioMed-335M

Specialized model for Species Entity Recognition - Species names from the Species-800 dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition (species names from the Species-800 dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated SPECIES800 dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species entities in biomedical text. Species800 is a corpus for species recognition and taxonomy classification in biomedical texts. The Species800 corpus is a manually annotated dataset designed for species recognition and taxonomic classification in biomedical literature. This corpus contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific names and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for evaluating species identification models used in ecological studies, conservation biology, and systematic biology research. The corpus is particularly useful for text mining applications in biodiversity databases and biological literature analysis.
Current Model Performance
- F1 Score: `0.86`
- Precision: `0.86`
- Recall: `0.87`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.8639 | 0.8557 | 0.8722 | 0.9715 |
| 🥈 2 | OpenMed-NER-OrganismDetect-PubMed-335M | 0.8550 | 0.8370 | 0.8737 | 0.9698 |
| 🥉 3 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8458 | 0.8287 | 0.8637 | 0.9690 |
| 4 | OpenMed-NER-OrganismDetect-MultiMed-335M | 0.8441 | 0.8352 | 0.8532 | 0.9670 |
| 5 | OpenMed-NER-OrganismDetect-SuperClinical-434M | 0.8435 | 0.8291 | 0.8585 | 0.9670 |
| 6 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8349 | 0.8082 | 0.8634 | 0.9685 |
| 7 | OpenMed-NER-OrganismDetect-MultiMed-568M | 0.8313 | 0.8053 | 0.8592 | 0.9703 |
| 8 | OpenMed-NER-OrganismDetect-ElectraMed-335M | 0.8288 | 0.8176 | 0.8404 | 0.9631 |
| 9 | OpenMed-NER-OrganismDetect-BioPatient-108M | 0.8154 | 0.8140 | 0.8169 | 0.9591 |
| 10 | OpenMed-NER-OrganismDetect-ElectraMed-33M | 0.8121 | 0.7772 | 0.8503 | 0.9600 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: SPECIES800
- Description: Species Entity Recognition - Species names from the Species-800 dataset

Training Details
- Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
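The difference between the word-level `average` and `max` strategies can be shown on a single word split into subwords. This is a simplified sketch, not the `transformers` implementation; `word_label` and the example scores are hypothetical.

```python
# Sketch of the `average` vs `max` word strategies: given per-subword label
# scores for one word, `average` averages the scores per label and picks the
# best label; `max` picks the label of the single highest-scoring subword.
def word_label(subword_scores, strategy="average"):
    """subword_scores: list of {label: score} dicts, one per subword."""
    labels = list(subword_scores[0])
    if strategy == "average":
        avg = {lab: sum(s[lab] for s in subword_scores) / len(subword_scores)
               for lab in labels}
        return max(avg, key=avg.get)
    if strategy == "max":
        _, lab = max((s[lab], lab) for s in subword_scores for lab in s)
        return lab
    raise ValueError(f"unknown strategy: {strategy}")

# One word tokenized into three subwords; the first subword is very confident.
scores = [
    {"B-Organism": 0.95, "O": 0.05},
    {"B-Organism": 0.10, "O": 0.90},
    {"B-Organism": 0.10, "O": 0.90},
]
print(word_label(scores, "average"))  # 'O'          (mean: O=0.617 > B=0.383)
print(word_label(scores, "max"))      # 'B-Organism' (single best score: 0.95)
```

The two strategies can disagree, as here: `max` is swayed by one confident subword, while `average` favors the majority, which is why the choice matters for long, heavily subword-split biomedical terms.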

license:apache-2.0
76,317
1

OpenMed-NER-AnatomyDetect-PubMed-335M

Specialized model for Anatomical Entity Recognition - Anatomical structures and body parts

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for anatomical entity recognition (anatomical structures and body parts). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated ANATOMY dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies anatomical entities in biomedical text. The Anatomy corpus focuses on anatomical entity recognition for medical terminology and healthcare applications. It is a specialized biomedical NER dataset designed for recognizing anatomical entities and medical terminology in clinical and biomedical texts, containing annotations for anatomical structures, body parts, organs, and physiological systems mentioned in medical literature. It is essential for developing clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial. The dataset supports the development of automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature. It serves as a valuable resource for training NER models used in medical imaging, surgical planning, and clinical documentation.

Current Model Performance
- F1 Score: `0.91`
- Precision: `0.90`
- Recall: `0.91`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-AnatomyDetect-ElectraMed-560M | 0.9063 | 0.9083 | 0.9044 | 0.9825 |
| 🥈 2 | OpenMed-NER-AnatomyDetect-PubMed-335M | 0.9063 | 0.8995 | 0.9131 | 0.9851 |
| 🥉 3 | OpenMed-NER-AnatomyDetect-SuperClinical-434M | 0.9024 | 0.9040 | 0.9008 | 0.9836 |
| 4 | OpenMed-NER-AnatomyDetect-ElectraMed-335M | 0.9020 | 0.9024 | 0.9016 | 0.9787 |
| 5 | OpenMed-NER-AnatomyDetect-MultiMed-568M | 0.9012 | 0.8977 | 0.9048 | 0.9812 |
| 6 | OpenMed-NER-AnatomyDetect-PubMed-109M | 0.9004 | 0.8941 | 0.9067 | 0.9844 |
| 7 | OpenMed-NER-AnatomyDetect-SuperMedical-355M | 0.9002 | 0.8974 | 0.9029 | 0.9815 |
| 8 | OpenMed-NER-AnatomyDetect-BigMed-560M | 0.8980 | 0.9007 | 0.8954 | 0.9814 |
| 9 | OpenMed-NER-AnatomyDetect-BioMed-335M | 0.8961 | 0.8941 | 0.8982 | 0.9830 |
| 10 | OpenMed-NER-AnatomyDetect-BioClinical-108M | 0.8961 | 0.8960 | 0.8962 | 0.9768 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: ANATOMY
- Description: Anatomical Entity Recognition - Anatomical structures and body parts

Training Details
- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
65,977
2

OpenMed-NER-BloodCancerDetect-SuperClinical-434M

🧬 OpenMed-NER-BloodCancerDetect-SuperClinical-434M

Specialized model for Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for clinical entity recognition (clinical entities related to chronic lymphocytic leukemia). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies CLL-related clinical entities in biomedical text. The CLL corpus is specialized for chronic lymphocytic leukemia entity recognition in hematology and cancer research. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. This specialized corpus contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset is designed to support the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics applications. It is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature. The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

Current Model Performance
- F1 Score: `0.89`
- Precision: `0.87`
- Recall: `0.92`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.9575 | 0.9264 | 0.9907 | 0.9843 |
| 🥈 2 | OpenMed-NER-BloodCancerDetect-SuperClinical-434M | 0.8902 | 0.8652 | 0.9167 | 0.9701 |
| 🥉 3 | OpenMed-NER-BloodCancerDetect-TinyMed-82M | 0.8793 | 0.7904 | 0.9908 | 0.9449 |
| 4 | OpenMed-NER-BloodCancerDetect-TinyMed-135M | 0.8792 | 0.8750 | 0.8835 | 0.9668 |
| 5 | OpenMed-NER-BloodCancerDetect-TinyMed-65M | 0.8547 | 0.7812 | 0.9434 | 0.9686 |
| 6 | OpenMed-NER-BloodCancerDetect-SuperMedical-125M | 0.8488 | 1.0000 | 0.7373 | 0.9274 |
| 7 | OpenMed-NER-BloodCancerDetect-SnowMed-568M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 8 | OpenMed-NER-BloodCancerDetect-BigMed-278M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 9 | OpenMed-NER-BloodCancerDetect-SuperMedical-355M | 0.8421 | 0.9816 | 0.7373 | 0.9248 |
| 10 | OpenMed-NER-BloodCancerDetect-ElectraMed-335M | 0.8364 | 0.7302 | 0.9787 | 0.9581 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: CLL
- Description: Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

Training Details
- Base Model: deberta-v3-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
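The F1 scores in these leaderboards are the harmonic mean of precision and recall, which explains rows like rank 6 above: perfect precision (1.0000) with recall 0.7373 still yields only F1 ≈ 0.8488. A quick check:

```python
# F1 is the harmonic mean of precision (P) and recall (R): F1 = 2PR / (P + R).
# Verified against the rank-6 row above (P=1.0000, R=0.7373 -> F1=0.8488).
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(1.0000, 0.7373), 4))  # 0.8488
```

Because the harmonic mean is dominated by the smaller of the two values, the high-precision/low-recall models in this table rank below more balanced ones.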

license:apache-2.0
62,563
0

OpenMed-NER-PharmaDetect-SuperMedical-355M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition (chemical entities from the BC5CDR dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entities in biomedical text. BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.96`
- Precision: `0.95`
- Recall: `0.97`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRCHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
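NER corpora like BC5CDR are often redistributed in a CoNLL-style format: one token and one BIO tag per line, with blank lines separating sentences. A hedged sketch of a reader for that format (the `read_conll` name and the sample tags are illustrative, not the official distribution format):

```python
# Parse CoNLL-style "token<TAB>tag" lines into (tokens, tags) sentence pairs.
# Blank lines delimit sentences. Simplified: assumes exactly two columns.
def read_conll(text):
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        if not line.strip():          # blank line: close the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.split()
        tokens.append(token)
        tags.append(tag)
    if tokens:                        # flush a trailing sentence
        sentences.append((tokens, tags))
    return sentences

sample = "Naloxone\tB-Chemical\nreverses\tO\n\nclonidine\tB-Chemical\nhypotension\tB-Disease\n"
print(read_conll(sample))
```

Each `(tokens, tags)` pair maps directly onto the token-classification training format these models were fine-tuned with.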

license:apache-2.0
61,434
0

OpenMed-NER-DiseaseDetect-PubMed-335M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset [](https://opensource.org/licenses/Apache-2.0) []() []() [](https://huggingface.co/OpenMed) This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition - disease entities from the bc5cdr dataset. This specialized model excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction with production-ready reliability for clinical and research applications. 🎯 Key Features - High Precision: Optimized for biomedical entity recognition - Domain-Specific: Trained on curated BC5CDRDISEASE dataset - Production-Ready: Validated on clinical benchmarks - Easy Integration: Compatible with Hugging Face Transformers ecosystem This model can identify and classify the following biomedical entities: BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics. 
Current Model Performance

- F1 Score: `0.91`
- Precision: `0.89`
- Recall: `0.93`
- Accuracy: `0.98`

🏆 Comparative Performance on the BC5CDR-Disease Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label from the token with the highest score within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32, depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDR-Disease
- Description: Disease Entity Recognition - disease entities from the BC5CDR dataset

Training Details

- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
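To make the `simple` strategy concrete, here is a minimal, self-contained sketch (not the pipeline's actual implementation) of how adjacent B-/I- tagged tokens are merged into entity spans:

```python
def aggregate_simple(tokens):
    """Group adjacent B-/I- tagged tokens into entity spans,
    mimicking what aggregation_strategy="simple" does conceptually."""
    entities, current = [], None
    for word, tag in tokens:
        if tag == "O":  # outside any entity: flush the open span, if any
            if current:
                entities.append(current)
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        # B- always starts a new span; so does a type change mid-stream
        if prefix == "B" or current is None or current["type"] != etype:
            if current:
                entities.append(current)
            current = {"type": etype, "words": [word]}
        else:  # I- continuing the same entity type
            current["words"].append(word)
    if current:
        entities.append(current)
    return [(" ".join(e["words"]), e["type"]) for e in entities]

tokens = [("Lithium", "B-Chemical"), ("carbonate", "I-Chemical"),
          ("treats", "O"), ("bipolar", "B-Disease"), ("disorder", "I-Disease")]
print(aggregate_simple(tokens))
# [('Lithium carbonate', 'Chemical'), ('bipolar disorder', 'Disease')]
```

In practice you would not aggregate by hand: pass `aggregation_strategy="simple"` to the Transformers `pipeline("token-classification", ...)` call and it returns grouped entities directly.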
If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
60,429
0

OpenMed-NER-OncologyDetect-TinyMed-65M

license:apache-2.0
60,023
2

OpenMed-NER-PharmaDetect-ElectraMed-560M

Specialized model for Chemical Entity Recognition - chemical entities from the BC5CDR dataset. [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Chem dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.
Current Model Performance

- F1 Score: `0.96`
- Precision: `0.95`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label from the token with the highest score within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32, depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDR-Chem
- Description: Chemical Entity Recognition - chemical entities from the BC5CDR dataset

Training Details

- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
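The batching guideline above amounts to slicing the corpus into fixed-size chunks; the Transformers pipeline does this internally when you pass `batch_size=...`. A toy sketch with a stub tagger standing in for the model (the stub function and sizes are illustrative, not part of the released models):

```python
def run_in_batches(texts, tag_fn, batch_size=16):
    """Process a corpus in fixed-size batches and concatenate results,
    as the Hugging Face pipeline does internally with batch_size=..."""
    results = []
    for i in range(0, len(texts), batch_size):
        results.extend(tag_fn(texts[i:i + batch_size]))
    return results

# Stub tagger: returns a token count per sentence
# (a real NER pipeline would return entity lists instead).
toy_tagger = lambda batch: [len(t.split()) for t in batch]

corpus = [f"sentence number {n}" for n in range(40)]
out = run_in_batches(corpus, toy_tagger, batch_size=8)
print(len(out))  # 40 - one result per input, regardless of batch size
```

With a real model you would simply call `ner_pipeline(corpus, batch_size=8)`; the point of tuning `batch_size` is throughput only, so outputs are identical across batch sizes.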

license:apache-2.0
57,051
0

OpenMed-NER-ChemicalDetect-PubMed-109M

Specialized model for Chemical Entity Recognition - identifies chemical compounds and substances in biomedical literature. [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy at identifying chemical compounds and substances in biomedical literature. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.
Current Model Performance

- F1 Score: `0.95`
- Precision: `0.94`
- Recall: `0.95`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label from the token with the highest score within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32, depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - identifies chemical compounds and substances in biomedical literature

Training Details

- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
55,925
1

OpenMed-NER-ProteinDetect-BioMed-335M

Specialized model for Biomedical Entity Recognition - various biomedical entities. [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition across various entity types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

- `B-protein`
- `B-proteincomplex`
- `B-proteinenum`
- `B-proteinfamiliyorgroup`
- `B-proteinvariant`
- `I-protein`
- `I-proteincomplex`
- `I-proteinenum`
- `I-proteinfamiliyorgroup`
- `I-proteinvariant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. This corpus contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications.
It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. The corpus serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance

- F1 Score: `0.94`
- Precision: `0.93`
- Recall: `0.95`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label from the token with the highest score within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32, depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - various biomedical entities

Training Details

- Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you.
Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
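Label sets like the FSU one above typically appear in a token-classification checkpoint's config as `id2label`/`label2id` mappings. A hypothetical sketch of those mappings built from the listed tags (the index order in the actual released checkpoint may differ, so treat the indices as illustrative):

```python
# BIO tag set as listed for the FSU-trained models, plus the "O" (outside) tag.
# "proteinfamiliyorgroup" is spelled exactly as in the published label set.
labels = [
    "O",
    "B-protein", "I-protein",
    "B-proteincomplex", "I-proteincomplex",
    "B-proteinenum", "I-proteinenum",
    "B-proteinfamiliyorgroup", "I-proteinfamiliyorgroup",
    "B-proteinvariant", "I-proteinvariant",
]

# The mappings a token-classification config carries: class index <-> tag string.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))          # 11 classes in this sketch
print(id2label[1])          # B-protein
print(label2id["O"])        # 0
```

When inspecting a downloaded checkpoint, the authoritative mapping lives in `model.config.id2label`, not in any hand-built list like this one.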

license:apache-2.0
55,900
3

OpenMed-NER-GenomeDetect-EuroMed-212M

Specialized model for Gene/Protein Entity Recognition - gene and protein mentions. [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene and protein mention recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

The BC2GM corpus targets gene and protein mention recognition from the BioCreative II Gene Mention task. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. This corpus contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.
Current Model Performance

- F1 Score: `0.68`
- Precision: `0.66`
- Recall: `0.69`
- Accuracy: `0.92`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label from the token with the highest score within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32, depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - gene and protein mentions

Training Details

- Base Model: EuroBERT-210m
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: EuroBERT-210m
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
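Aggregated predictions from a token-classification pipeline carry `start`/`end` character offsets alongside the entity group, which makes it straightforward to render matches in context. A small sketch over a stubbed prediction (the example entity dict mimics the pipeline's output shape but is hand-written, not produced by this model):

```python
def highlight(text, entities):
    """Wrap predicted spans in [TYPE: ...] markers using the start/end
    character offsets that pipeline entity dicts carry."""
    parts, pos = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        parts.append(text[pos:ent["start"]])                       # text before span
        parts.append(f"[{ent['entity_group']}: {text[ent['start']:ent['end']]}]")
        pos = ent["end"]
    parts.append(text[pos:])                                       # trailing text
    return "".join(parts)

text = "BRCA1 mutations raise breast cancer risk."
# Hand-written stand-in for one aggregated pipeline prediction.
ents = [{"entity_group": "GENE", "start": 0, "end": 5}]
print(highlight(text, ents))
# [GENE: BRCA1] mutations raise breast cancer risk.
```

The same function works unchanged on real pipeline output, since each aggregated prediction is a dict containing `entity_group`, `score`, `word`, `start`, and `end`.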

license:apache-2.0
54,784
0

OpenMed-NER-ProteinDetect-EuroMed-212M

Specialized model for Biomedical Entity Recognition - various biomedical entities. [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition across various entity types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

- `B-protein`
- `B-proteincomplex`
- `B-proteinenum`
- `B-proteinfamiliyorgroup`
- `B-proteinvariant`
- `I-protein`
- `I-proteincomplex`
- `I-proteinenum`
- `I-proteinfamiliyorgroup`
- `I-proteinvariant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. This corpus contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications.
It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. The corpus serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance

- F1 Score: `0.95`
- Precision: `0.95`
- Recall: `0.95`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label from the token with the highest score within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32, depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - various biomedical entities

Training Details

- Base Model: EuroBERT-210m
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: EuroBERT-210m
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
54,589
0

OpenMed-NER-OncologyDetect-BigMed-560M

Specialized model for Cancer Genetics - Cancer-related genetic entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics (cancer-related genetic entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities (BIO tags):
- Beginning tags: `B-Amino_acid`, `B-Anatomical_system`, `B-Cancer`, `B-Cell`, `B-Cellular_component`, `B-Developing_anatomical_structure`, `B-Gene_or_gene_product`, `B-Immaterial_anatomical_entity`, `B-Multi-tissue_structure`, `B-Organ`, `B-Organism`, `B-Organism_subdivision`, `B-Organism_substance`, `B-Pathological_formation`, `B-Simple_chemical`, `B-Tissue`
- Inside tags: `I-Amino_acid`, `I-Anatomical_system`, `I-Cancer`, `I-Cell`, `I-Cellular_component`, `I-Developing_anatomical_structure`, `I-Gene_or_gene_product`, `I-Immaterial_anatomical_entity`, `I-Multi-tissue_structure`, `I-Organ`, `I-Organism`, `I-Organism_subdivision`, `I-Organism_substance`, `I-Pathological_formation`, `I-Simple_chemical`, `I-Tissue`

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics. It is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research, containing annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature, and it serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.86`
- Precision: `0.86`
- Recall: `0.85`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013CG
- Description: Cancer Genetics - Cancer-related genetic entities

Training Details
- Base Model: xlm-roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: xlm-roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!
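As a quick sanity check on the metrics table: F1 is the harmonic mean of precision and recall, so each row's F1 can be recomputed from its precision and recall columns (up to rounding). A minimal check against the top-ranked row:

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rank 1 row: P=0.8926, R=0.9056 -> F1 ≈ 0.899,
# consistent with the table's reported 0.8990 up to rounding.
print(f1(0.8926, 0.9056))
```

Tiny discrepancies are expected because the table's columns are themselves rounded to four decimals.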

license:apache-2.0
51,178
2

OpenMed-NER-ProteinDetect-ElectraMed-560M

Specialized model for Biomedical Entity Recognition - Various biomedical entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition (various biomedical entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities (BIO tags):
- Beginning tags: `B-protein`, `B-protein_complex`, `B-protein_enum`, `B-protein_familiy_or_group`, `B-protein_variant`
- Inside tags: `I-protein`, `I-protein_complex`, `I-protein_enum`, `I-protein_familiy_or_group`, `I-protein_variant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research, and supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications. It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms, and serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance
- F1 Score: `0.96`
- Precision: `0.96`
- Recall: `0.96`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - Various biomedical entities

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
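The word-level strategies (`first`, `average`, `max`) differ only in how a word's sub-token scores are combined; a pure-Python sketch makes the difference visible. The function name and the score dictionaries below are illustrative, not part of the Transformers API:

```python
def resolve_word_label(subtoken_scores, strategy="max"):
    """Pick one label for a word from its per-sub-token score dicts,
    sketching the first/average/max aggregation strategies."""
    if strategy == "first":
        scores = subtoken_scores[0]  # only the first sub-token counts
    elif strategy == "average":
        labels = subtoken_scores[0].keys()
        scores = {label: sum(s[label] for s in subtoken_scores) / len(subtoken_scores)
                  for label in labels}
    elif strategy == "max":
        # the sub-token holding the single highest score decides the label
        scores = max(subtoken_scores, key=lambda s: max(s.values()))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(scores, key=scores.get)

# A word split into two sub-tokens whose predictions disagree:
subtokens = [{"B-protein": 0.55, "O": 0.45}, {"B-protein": 0.10, "O": 0.90}]
print(resolve_word_label(subtokens, "first"))    # B-protein
print(resolve_word_label(subtokens, "average"))  # O
print(resolve_word_label(subtokens, "max"))      # O
```

The example shows why strategy choice matters: on the same scores, `first` keeps the entity tag while `average` and `max` drop it.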
If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
51,151
0

OpenMed-NER-DiseaseDetect-SnowMed-568M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition (disease entities from the BC5CDR dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities from the BC5CDR corpus. BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature, and serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

Current Model Performance
- F1 Score: `0.87`
- Precision: `0.86`
- Recall: `0.89`
- Accuracy: `0.98`

🏆 Comparative Performance on BC5CDRDISEASE Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
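The batching guidance above comes down to processing texts in fixed-size chunks. A minimal sketch of the pattern (the helper below is illustrative; in practice the pipeline does this internally when you pass `batch_size`):

```python
def batched(texts, batch_size):
    """Yield successive fixed-size batches from a list of texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

abstracts = [f"abstract {i}" for i in range(7)]
for batch in batched(abstracts, 3):
    print(len(batch))  # 3, 3, 1
```

Starting small and doubling `batch_size` until GPU memory or utilization plateaus is a simple way to find the sweet spot for your hardware.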
If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
48,251
0

OpenMed-NER-PharmaDetect-ModernMed-395M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition (chemical entities from the BC5CDR dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entities from the BC5CDR corpus. BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances relevant for understanding chemical-disease relationships, and serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.95`
- Precision: `0.94`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRCHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: ModernBERT-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: ModernBERT-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!
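A common post-processing step on aggregated pipeline output is confidence filtering. The sketch below assumes the usual token-classification output shape (dicts with `entity_group`, `word`, and `score`); the threshold, function name, and sample entities are illustrative:

```python
def filter_entities(entities, min_score=0.80):
    """Keep only entities whose confidence meets the threshold."""
    return [e for e in entities if e["score"] >= min_score]

sample = [
    {"entity_group": "Chemical", "word": "cisplatin", "score": 0.998},
    {"entity_group": "Chemical", "word": "levels", "score": 0.41},
]
print(filter_entities(sample))  # only the cisplatin entity survives
```

In pharmacovigilance settings a higher threshold trades recall for precision; tune it against a held-out sample of your own documents.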

license:apache-2.0
45,409
0

OpenMed-NER-AnatomyDetect-TinyMed-135M

Specialized model for Anatomical Entity Recognition - Anatomical structures and body parts [](https://opensource.org/licenses/Apache-2.0) []() []() [](https://huggingface.co/OpenMed) This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for anatomical entity recognition - anatomical structures and body parts. This specialized model excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction with production-ready reliability for clinical and research applications. 🎯 Key Features - High Precision: Optimized for biomedical entity recognition - Domain-Specific: Trained on curated ANATOMY dataset - Production-Ready: Validated on clinical benchmarks - Easy Integration: Compatible with Hugging Face Transformers ecosystem This model can identify and classify the following biomedical entities: Anatomy corpus focuses on anatomical entity recognition for medical terminology and healthcare applications. The Anatomy corpus is a specialized biomedical NER dataset designed for recognizing anatomical entities and medical terminology in clinical and biomedical texts. This corpus contains annotations for anatomical structures, body parts, organs, and physiological systems mentioned in medical literature. It is essential for developing clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial. The dataset supports the development of automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature. 
It serves as a valuable resource for training NER models used in medical imaging, surgical planning, and clinical documentation. Current Model Performance - F1 Score: `0.87` - Precision: `0.87` - Recall: `0.87` - Accuracy: `0.97` | Rank | Model | F1 Score | Precision | Recall | Accuracy | |------|-------|----------|-----------|--------|-----------| | 🥇 1 | OpenMed-NER-AnatomyDetect-ElectraMed-560M | 0.9063 | 0.9083 | 0.9044 | 0.9825 | | 🥈 2 | OpenMed-NER-AnatomyDetect-PubMed-335M | 0.9063 | 0.8995 | 0.9131 | 0.9851 | | 🥉 3 | OpenMed-NER-AnatomyDetect-SuperClinical-434M | 0.9024 | 0.9040 | 0.9008 | 0.9836 | | 4 | OpenMed-NER-AnatomyDetect-ElectraMed-335M | 0.9020 | 0.9024 | 0.9016 | 0.9787 | | 5 | OpenMed-NER-AnatomyDetect-MultiMed-568M | 0.9012 | 0.8977 | 0.9048 | 0.9812 | | 6 | OpenMed-NER-AnatomyDetect-PubMed-109M | 0.9004 | 0.8941 | 0.9067 | 0.9844 | | 7 | OpenMed-NER-AnatomyDetect-SuperMedical-355M | 0.9002 | 0.8974 | 0.9029 | 0.9815 | | 8 | OpenMed-NER-AnatomyDetect-BigMed-560M | 0.8980 | 0.9007 | 0.8954 | 0.9814 | | 9 | OpenMed-NER-AnatomyDetect-BioMed-335M | 0.8961 | 0.8941 | 0.8982 | 0.9830 | | 10 | OpenMed-NER-AnatomyDetect-BioClinical-108M | 0.8961 | 0.8960 | 0.8962 | 0.9768 | Rankings based on F1-score performance across all models trained on this dataset. Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets. NOTE: The `aggregationstrategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies: - `none`: Returns raw token predictions without any aggregation. - `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`). - `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word. 
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score. - `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word. For efficient processing of large datasets, use proper batching with the `batchsize` parameter: Batch Size Guidelines: - CPU: Start with batchsize=1-4 - Single GPU: Try batchsize=8-32 depending on GPU memory - High-end GPU: Can handle batchsize=64 or higher - Monitor GPU utilization to find the optimal batch size for your hardware - Dataset: ANATOMY - Description: Anatomical Entity Recognition - Anatomical structures and body parts Training Details - Base Model: distilbert-base-multilingual-cased - Training Framework: Hugging Face Transformers - Optimization: AdamW optimizer with learning rate scheduling - Validation: Cross-validation on held-out test set - Base Architecture: distilbert-base-multilingual-cased - Task: Token Classification (Named Entity Recognition) - Labels: Dataset-specific entity types - Input: Tokenized biomedical text - Output: BIO-tagged entity predictions This model is particularly useful for: - Clinical Text Mining: Extracting entities from medical records - Biomedical Research: Processing scientific literature - Drug Discovery: Identifying chemical compounds and drugs - Healthcare Analytics: Analyzing patient data and outcomes - Academic Research: Supporting biomedical NLP research Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. 
If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
42,116
0

OpenMed-NER-DiseaseDetect-ModernClinical-149M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

[Apache-2.0](https://opensource.org/licenses/Apache-2.0) [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation (CDR) extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the CDR task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

Current Model Performance

- F1 Score: `0.88`
- Precision: `0.87`
- Recall: `0.89`
- Accuracy: `0.97`

🏆 Comparative Performance on the BC5CDR-Disease Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset Information

- Dataset: BC5CDR-Disease
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details

- Base Model: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BioClinical-ModernBERT-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge our work.
Thank you!
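The `simple` strategy above can be illustrated with a short pure-Python sketch. The helper name `aggregate_simple` is hypothetical; this is a simplified re-implementation of the idea, not the Transformers library's actual code:

```python
# Illustrative sketch (not the Transformers implementation): the `simple`
# strategy merges adjacent B-/I- tokens of the same entity type into one entity.

def aggregate_simple(tokens, tags):
    """Group adjacent B-/I- tagged tokens of the same type into entities."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            if current:
                entities.append(current)
            current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current and current["type"] == etype and prefix == "I":
            current["words"].append(token)  # continue the running entity
        else:
            if current:
                entities.append(current)
            current = {"type": etype, "words": [token]}  # start a new entity
    if current:
        entities.append(current)
    return [(" ".join(e["words"]), e["type"]) for e in entities]

tokens = ["Patients", "received", "lithium", "carbonate", "for", "bipolar", "disorder"]
tags   = ["O", "O", "B-Chemical", "I-Chemical", "O", "B-Disease", "I-Disease"]
print(aggregate_simple(tokens, tags))
# [('lithium carbonate', 'Chemical'), ('bipolar disorder', 'Disease')]
```

Note that `simple` only looks at adjacency and entity type, which is why the word-aware strategies (`first`, `average`, `max`) exist for models whose tokenizers split words into sub-word pieces.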

license:apache-2.0
42,035
0

OpenMed-NER-AnatomyDetect-SuperClinical-434M

Specialized model for Anatomical Entity Recognition - Anatomical structures and body parts

[Apache-2.0](https://opensource.org/licenses/Apache-2.0) [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for anatomical entity recognition (anatomical structures and body parts). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated ANATOMY dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: the Anatomy corpus focuses on anatomical entity recognition for medical terminology and healthcare applications. It is a specialized biomedical NER dataset designed for recognizing anatomical entities and medical terminology in clinical and biomedical texts, with annotations for anatomical structures, body parts, organs, and physiological systems mentioned in medical literature. It is essential for developing clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial. The dataset supports the development of automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature, and serves as a valuable resource for training NER models used in medical imaging, surgical planning, and clinical documentation.

Current Model Performance

- F1 Score: `0.90`
- Precision: `0.90`
- Recall: `0.90`
- Accuracy: `0.98`

🏆 Comparative Performance on the ANATOMY Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-AnatomyDetect-ElectraMed-560M | 0.9063 | 0.9083 | 0.9044 | 0.9825 |
| 🥈 2 | OpenMed-NER-AnatomyDetect-PubMed-335M | 0.9063 | 0.8995 | 0.9131 | 0.9851 |
| 🥉 3 | OpenMed-NER-AnatomyDetect-SuperClinical-434M | 0.9024 | 0.9040 | 0.9008 | 0.9836 |
| 4 | OpenMed-NER-AnatomyDetect-ElectraMed-335M | 0.9020 | 0.9024 | 0.9016 | 0.9787 |
| 5 | OpenMed-NER-AnatomyDetect-MultiMed-568M | 0.9012 | 0.8977 | 0.9048 | 0.9812 |
| 6 | OpenMed-NER-AnatomyDetect-PubMed-109M | 0.9004 | 0.8941 | 0.9067 | 0.9844 |
| 7 | OpenMed-NER-AnatomyDetect-SuperMedical-355M | 0.9002 | 0.8974 | 0.9029 | 0.9815 |
| 8 | OpenMed-NER-AnatomyDetect-BigMed-560M | 0.8980 | 0.9007 | 0.8954 | 0.9814 |
| 9 | OpenMed-NER-AnatomyDetect-BioMed-335M | 0.8961 | 0.8941 | 0.8982 | 0.9830 |
| 10 | OpenMed-NER-AnatomyDetect-BioClinical-108M | 0.8961 | 0.8960 | 0.8962 | 0.9768 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset Information

- Dataset: ANATOMY
- Description: Anatomical Entity Recognition - Anatomical structures and body parts

Training Details

- Base Model: deberta-v3-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge our work. Thank you!
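The word-level `first` and `max` strategies described above can be sketched in a few lines of plain Python. The helper names are hypothetical and the sub-word scores are made up for illustration; this is not the Transformers implementation:

```python
# Illustrative sketch: given the sub-word tokens of ONE word as (tag, score)
# pairs, decide which tag the whole word receives under `first` vs `max`.

def resolve_first(word_preds):
    """`first`: the word takes the tag of its first sub-word token."""
    return word_preds[0][0]

def resolve_max(word_preds):
    """`max`: the word takes the tag of its highest-scoring sub-word token."""
    return max(word_preds, key=lambda p: p[1])[0]

# Hypothetical sub-word predictions for the word "hippocampus".
preds = [("B-Anatomy", 0.61), ("O", 0.55), ("I-Anatomy", 0.97)]
print(resolve_first(preds))  # B-Anatomy
print(resolve_max(preds))    # I-Anatomy
```

`average` differs in that it averages the score vectors of all sub-word tokens before taking the argmax, so a single high-confidence token cannot dominate the way it does under `max`.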

license:apache-2.0
42,027
0

OpenMed-NER-DiseaseDetect-SuperClinical-141M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

[Apache-2.0](https://opensource.org/licenses/Apache-2.0) [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation (CDR) extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the CDR task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

Current Model Performance

- F1 Score: `0.89`
- Precision: `0.87`
- Recall: `0.92`
- Accuracy: `0.98`

🏆 Comparative Performance on the BC5CDR-Disease Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset Information

- Dataset: BC5CDR-Disease
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details

- Base Model: deberta-v3-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-small
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge our work. Thank you!
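The batching guidelines above amount to chunking the input corpus into fixed-size groups. A minimal sketch, assuming you want explicit control over the chunks (the `batched` helper is hypothetical):

```python
# Sketch of the batching advice: split a corpus into batches of at most
# `batch_size` texts. Plain Python; no GPU or model required to see the idea.

def batched(texts, batch_size):
    """Yield successive batches of at most `batch_size` texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

corpus = [f"clinical note {i}" for i in range(10)]
batches = list(batched(corpus, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

In practice the Transformers `pipeline(...)` call also accepts a `batch_size` argument directly and batches internally; explicit chunking like this is mainly useful when you need progress reporting or checkpointing over very large corpora.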

license:apache-2.0
40,973
0

OpenMed-NER-PathologyDetect-SuperClinical-184M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset

[Apache-2.0](https://opensource.org/licenses/Apache-2.0) [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI Disease dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBI-Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: the NCBI Disease corpus is a comprehensive resource for disease name recognition and concept normalization. It is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, this corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance

- F1 Score: `0.89`
- Precision: `0.86`
- Recall: `0.92`
- Accuracy: `0.97`

🏆 Comparative Performance on the NCBI-Disease Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset Information

- Dataset: NCBI-Disease
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details

- Base Model: deberta-v3-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge our work. Thank you!
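The cards describe the model output as BIO-tagged predictions. A common post-processing step is decoding a BIO tag sequence into token-index spans before mapping them back to character offsets; a minimal sketch (the `bio_to_spans` helper is hypothetical, not part of any library named here):

```python
# Decode a BIO tag sequence into half-open token-index spans (start, end, label).
# Illustrative post-processing sketch for BIO-tagged NER output.

def bio_to_spans(tags):
    """Return (start, end, label) spans from a flat list of BIO tags."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        # A B- tag, or an I- tag with a mismatched type, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and label != tag[2:]):
            if label is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O" and label is not None:
            spans.append((start, i, label))
            start, label = None, None
    if label is not None:
        spans.append((start, len(tags), label))
    return spans

print(bio_to_spans(["O", "B-Disease", "I-Disease", "O", "B-Disease"]))
# [(1, 3, 'Disease'), (4, 5, 'Disease')]
```

Exact-match span decoding like this is also what entity-level benchmark scores on datasets such as NCBI-Disease are computed over.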

license:apache-2.0
40,953
0

OpenMed-NER-GenomicDetect-ModernClinical-149M

Specialized model for Gene Entity Recognition - Gene-related entities

[Apache-2.0](https://opensource.org/licenses/Apache-2.0) [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition (gene-related entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: the Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications. It is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature, with comprehensive annotations for gene names, genetic variants, and genomics-related entities essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance

- F1 Score: `0.97`
- Precision: `1.00`
- Recall: `0.95`
- Accuracy: `1.00`

🏆 Comparative Performance on the GELLUS Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset Information

- Dataset: GELLUS
- Description: Gene Entity Recognition - Gene-related entities

Training Details

- Base Model: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BioClinical-ModernBERT-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge our work. Thank you!
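The performance tables report precision, recall, and F1 separately. F1 is the harmonic mean of precision and recall, which can be sanity-checked against any table row:

```python
# F1 is the harmonic mean of precision and recall.

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Check against the top GELLUS row: precision 0.9977, recall 0.9975.
print(round(f1(0.9977, 0.9975), 4))  # 0.9976
```

The harmonic mean punishes imbalance: a model with very high precision but low recall (like the TinyMed-82M row) ends up with an F1 closer to its weaker score.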

license:apache-2.0
39,872
0

OpenMed-NER-PharmaDetect-PubMed-109M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

[Apache-2.0](https://opensource.org/licenses/Apache-2.0) [OpenMed](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Chem dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation (CDR) extraction task. The BC5CDR-Chem corpus is part of the CDR challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances relevant for understanding chemical-disease relationships, and serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance

- F1 Score: `0.96`
- Precision: `0.94`
- Recall: `0.97`
- Accuracy: `0.99`

🏆 Comparative Performance on the BC5CDR-Chem Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset Information

- Dataset: BC5CDR-Chem
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details

- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge our work. Thank you!
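Benchmark scores such as those on BC5CDR-Chem are computed at the entity level, where a predicted span only counts as correct if it exactly matches a gold span. A minimal sketch of that scoring convention (the `span_prf` helper is hypothetical):

```python
# Entity-level precision/recall/F1 over exact-match spans, as is standard
# for NER benchmarks. Spans are (start, end, label) tuples.

def span_prf(pred, gold):
    """Return (precision, recall, f1) for exact-match span sets."""
    tp = len(set(pred) & set(gold))                    # exact matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

pred = [(0, 2, "Chemical"), (5, 6, "Chemical")]
gold = [(0, 2, "Chemical"), (5, 7, "Chemical")]       # second span differs
print(span_prf(pred, gold))  # (0.5, 0.5, 0.5)
```

Note how a one-token boundary error makes the second prediction count as fully wrong; this strictness is why entity-level F1 is typically lower than token-level accuracy in the tables above.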

license:apache-2.0
39,808
0

OpenMed-NER-DiseaseDetect-BioPatient-108M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities as annotated in the BC5CDR-Disease corpus, which targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus.

The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.
Current Model Performance
- F1 Score: `0.90`
- Precision: `0.89`
- Recall: `0.91`
- Accuracy: `0.97`

🏆 Comparative Performance on BC5CDRDISEASE Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details
- Base Model: BioDischargeSummaryBERT
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioDischargeSummaryBERT
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
39,241
0

OpenMed-NER-PharmaDetect-MultiMed-568M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entities as annotated in the BC5CDR-Chem corpus, which focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task.

The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.
Current Model Performance
- F1 Score: `0.95`
- Precision: `0.94`
- Recall: `0.96`
- Accuracy: `0.99`

🏆 Comparative Performance on BC5CDRCHEM Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRCHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: bge-m3
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: bge-m3
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge our work. Thank you!
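The batch-size guidance above can be applied by chunking inputs before handing them to the pipeline. Below is a minimal, hedged sketch: `batched` is a helper defined here (not a library function), and the commented-out pipeline call uses a placeholder model id.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

texts = [
    "Patients received ibuprofen for inflammation management.",
    "Treatment with doxorubicin showed significant improvement.",
    "The compound benzylpenicillin demonstrated strong antimicrobial activity.",
]

# With a real Transformers pipeline you can also pass batch_size directly:
#   ner = pipeline("token-classification", model="<model-id>",
#                  aggregation_strategy="simple", batch_size=16)
#   results = ner(texts, batch_size=16)
for batch in batched(texts, 2):
    print(len(batch))  # process each batch; here we just show its size
```

Start with a small batch size on CPU and increase it while watching GPU memory and utilization, as the guidelines above suggest.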

license:apache-2.0
38,916
0

OpenMed-NER-ProteinDetect-SuperMedical-355M

Specialized model for Biomedical Entity Recognition - Various biomedical entities

[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition across various biomedical entity types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

- `B-protein`
- `B-proteincomplex`
- `B-proteinenum`
- `B-proteinfamiliyorgroup`
- `B-proteinvariant`
- `I-protein`
- `I-proteincomplex`
- `I-proteinenum`
- `I-proteinfamiliyorgroup`
- `I-proteinvariant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. This corpus contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications.
It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. The corpus serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance
- F1 Score: `0.95`
- Precision: `0.95`
- Recall: `0.96`
- Accuracy: `0.97`

🏆 Comparative Performance on FSU Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - Various biomedical entities

Training Details
- Base Model: roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge our work. Thank you!
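For a token-classification head, a BIO tag list like the FSU labels above corresponds to the `id2label`/`label2id` entries in the model config. The sketch below builds such a mapping from the listed tags; note that the inclusion of an `O` tag and the ordering are assumptions for illustration, and the model's actual `config.json` is authoritative.

```python
# FSU BIO tag set as listed on the card; "O" (outside) is assumed here,
# since BIO tagging schemes conventionally include it.
fsu_tags = [
    "O",
    "B-protein", "I-protein",
    "B-proteincomplex", "I-proteincomplex",
    "B-proteinenum", "I-proteinenum",
    "B-proteinfamiliyorgroup", "I-proteinfamiliyorgroup",
    "B-proteinvariant", "I-proteinvariant",
]

# Build the two mappings a token-classification config carries.
id2label = dict(enumerate(fsu_tags))
label2id = {label: i for i, label in id2label.items()}

print(label2id["B-protein"])  # → 1 (under this assumed ordering)
```

In practice you would read these mappings from the loaded model's config rather than reconstruct them by hand.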

license:apache-2.0
38,893
0

OpenMed-NER-DiseaseDetect-MultiMed-335M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities as annotated in the BC5CDR-Disease corpus, which targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus.

The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.
Current Model Performance
- F1 Score: `0.90`
- Precision: `0.89`
- Recall: `0.92`
- Accuracy: `0.98`

🏆 Comparative Performance on BC5CDRDISEASE Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details
- Base Model: bge-large-en-v1.5
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: bge-large-en-v1.5
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
38,834
0

OpenMed-NER-DiseaseDetect-PubMed-v2-109M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities as annotated in the BC5CDR-Disease corpus, which targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus.

The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.
Current Model Performance
- F1 Score: `0.90`
- Precision: `0.89`
- Recall: `0.91`
- Accuracy: `0.98`

🏆 Comparative Performance on BC5CDRDISEASE Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
38,662
0

OpenMed-NER-PharmaDetect-BioMed-335M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entities as annotated in the BC5CDR-Chem corpus, which focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task.

The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.
Current Model Performance - F1 Score: `0.95` - Precision: `0.95` - Recall: `0.95` - Accuracy: `0.99` | Rank | Model | F1 Score | Precision | Recall | Accuracy | |------|-------|----------|-----------|--------|-----------| | 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 | | 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 | | 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 | | 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 | | 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 | | 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 | | 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 | | 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 | | 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 | | 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 | Rankings based on F1-score performance across all models trained on this dataset. Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets. NOTE: The `aggregationstrategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies: - `none`: Returns raw token predictions without any aggregation. - `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`). - `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word. - `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score. 
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word. For efficient processing of large datasets, use proper batching with the `batchsize` parameter: Batch Size Guidelines: - CPU: Start with batchsize=1-4 - Single GPU: Try batchsize=8-32 depending on GPU memory - High-end GPU: Can handle batchsize=64 or higher - Monitor GPU utilization to find the optimal batch size for your hardware - Dataset: BC5CDRCHEM - Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset Training Details - Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract - Training Framework: Hugging Face Transformers - Optimization: AdamW optimizer with learning rate scheduling - Validation: Cross-validation on held-out test set - Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract - Task: Token Classification (Named Entity Recognition) - Labels: Dataset-specific entity types - Input: Tokenized biomedical text - Output: BIO-tagged entity predictions This model is particularly useful for: - Clinical Text Mining: Extracting entities from medical records - Biomedical Research: Processing scientific literature - Drug Discovery: Identifying chemical compounds and drugs - Healthcare Analytics: Analyzing patient data and outcomes - Academic Research: Supporting biomedical NLP research Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!
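To make the aggregation strategies above concrete, here is a minimal, self-contained sketch of how a `simple`-style strategy merges BIO-tagged tokens into entity spans. This is an illustrative toy, not the pipeline's actual implementation; the `CHEM` label and the `aggregate_simple` helper name are our own inventions for the example.

```python
def aggregate_simple(tokens, tags):
    """Toy illustration of aggregation_strategy="simple": merge runs of
    B-/I- tags with the same entity type into single entity spans."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":  # outside any entity: close the open span, if any
            if current is not None:
                entities.append(current)
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current is not None and current["entity_group"] == etype and prefix == "I":
            current["word"] += " " + token  # continue the current entity
        else:
            if current is not None:
                entities.append(current)
            current = {"entity_group": etype, "word": token}  # start a new entity
    if current is not None:
        entities.append(current)
    return entities

print(aggregate_simple(
    ["Patients", "received", "lithium", "carbonate", "daily"],
    ["O", "O", "B-CHEM", "I-CHEM", "O"],
))  # [{'entity_group': 'CHEM', 'word': 'lithium carbonate'}]
```

With the real Transformers pipeline, the equivalent behaviour is selected by passing `aggregation_strategy="simple"` when constructing the `token-classification` pipeline.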

license:apache-2.0
38,638
0

OpenMed-NER-PathologyDetect-PubMed-335M

Specialized model for Disease Entity Recognition: disease entities from the NCBI dataset.

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBI Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

About the dataset: The NCBI Disease corpus is a gold-standard resource for disease name recognition and concept normalization, containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, it provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.91`
- Precision: `0.89`
- Recall: `0.93`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4
- Single GPU: Try `batch_size` 8-32, depending on GPU memory
- High-end GPU: Can handle `batch_size` 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Training Details
- Dataset: NCBI Disease
- Description: Disease Entity Recognition: disease entities from the NCBI dataset
- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge this work. Thank you!
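The batch-size guidance above amounts to chunking inputs before inference; with the Transformers `pipeline` the same effect is obtained by passing `batch_size=` directly to the pipeline call. A minimal, framework-free sketch of the chunking step (the `batched` helper name is our own):

```python
def batched(texts, batch_size):
    """Yield successive fixed-size chunks of `texts` for batched inference."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Example: 10 abstracts split into batches of 4 (placeholder strings, not real data)
abstracts = [f"abstract {i}" for i in range(10)]
chunks = list(batched(abstracts, 4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

With a real pipeline object `ner`, the per-chunk call would look like `ner(chunk)` for each chunk, or simply `ner(abstracts, batch_size=4)` to let the pipeline batch internally.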

license:apache-2.0
38,610
0

OpenMed-NER-ChemicalDetect-SuperMedical-355M

Specialized model for Chemical Entity Recognition: identifies chemical compounds and substances in biomedical literature.

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing chemical compounds and substances in biomedical literature. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

About the dataset: The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, it contains PubMed abstracts with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is designed to advance automated chemical name recognition for drug discovery, pharmacology, and chemical biology, and serves as a benchmark for evaluating NER models that identify chemical compounds, drugs, and other chemical substances in scientific literature.

Current Model Performance
- F1 Score: `0.95`
- Precision: `0.94`
- Recall: `0.95`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4
- Single GPU: Try `batch_size` 8-32, depending on GPU memory
- High-end GPU: Can handle `batch_size` 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Training Details
- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition: identifies chemical compounds and substances in biomedical literature
- Base Model: roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
38,609
0

OpenMed-NER-PathologyDetect-ElectraMed-560M

Specialized model for Disease Entity Recognition: disease entities from the NCBI dataset.

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBI Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

About the dataset: The NCBI Disease corpus is a gold-standard resource for disease name recognition and concept normalization, containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, it provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.90`
- Precision: `0.87`
- Recall: `0.92`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4
- Single GPU: Try `batch_size` 8-32, depending on GPU memory
- High-end GPU: Can handle `batch_size` 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Training Details
- Dataset: NCBI Disease
- Description: Disease Entity Recognition: disease entities from the NCBI dataset
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
37,364
0

OpenMed-NER-GenomicDetect-BioMed-109M

Specialized model for Gene Entity Recognition: gene-related entities.

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing gene-related entities. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

About the dataset: The Gellus corpus is a biomedical NER dataset for gene recognition and genetics entity extraction in molecular biology literature. It contains annotations for gene names, genetic variants, and genomics-related entities essential for genetic research and genomics applications, supporting automated gene mention identification, genetic association studies, and genomics text mining. It is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research, and serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `0.99`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4
- Single GPU: Try `batch_size` 8-32, depending on GPU memory
- High-end GPU: Can handle `batch_size` 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Training Details
- Dataset: GELLUS
- Description: Gene Entity Recognition: gene-related entities
- Base Model: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge this work. Thank you!
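The F1, precision, and recall figures reported in the tables above are entity-level scores. As a rough sketch of how such scores are computed (assuming exact-span matching in the usual seqeval-style convention; the `entity_prf1` helper and the sample spans are our own illustration):

```python
def entity_prf1(predicted, gold):
    """Entity-level precision/recall/F1 with exact span matching.
    `predicted` and `gold` are sets of (entity_type, start, end) tuples."""
    tp = len(predicted & gold)  # spans with matching type AND boundaries
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("Gene", 0, 4), ("Gene", 10, 15)}
pred = {("Gene", 0, 4), ("Gene", 20, 25)}  # one hit, one spurious span
print(entity_prf1(pred, gold))  # (0.5, 0.5, 0.5)
```

In practice, libraries such as seqeval compute these metrics directly from BIO tag sequences; the set-of-spans view here is just the underlying idea.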

license:apache-2.0
37,334
0

OpenMed-NER-PathologyDetect-ModernClinical-395M

Specialized model for Disease Entity Recognition: disease entities from the NCBI dataset.

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBI Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

About the dataset: The NCBI Disease corpus is a gold-standard resource for disease name recognition and concept normalization, containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, it provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.88`
- Precision: `0.87`
- Recall: `0.90`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4
- Single GPU: Try `batch_size` 8-32, depending on GPU memory
- High-end GPU: Can handle `batch_size` 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Training Details
- Dataset: NCBI Disease
- Description: Disease Entity Recognition: disease entities from the NCBI dataset
- Base Model: BioClinical-ModernBERT-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioClinical-ModernBERT-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
37,319
0

OpenMed-NER-OncologyDetect-ElectraMed-109M

Specialized model for Cancer Genetics - Cancer-related genetic entities [](https://opensource.org/licenses/Apache-2.0) []() []() [](https://huggingface.co/OpenMed) This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics - cancer-related genetic entities. This specialized model excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction with production-ready reliability for clinical and research applications. 🎯 Key Features - High Precision: Optimized for biomedical entity recognition - Domain-Specific: Trained on curated BIONLP2013CG dataset - Production-Ready: Validated on clinical benchmarks - Easy Integration: Compatible with Hugging Face Transformers ecosystem This model can identify and classify the following biomedical entities: - `B-Aminoacid` - `B-Anatomicalsystem` - `B-Cancer` - `B-Cell` - `B-Cellularcomponent` - `B-Developinganatomicalstructure` - `B-Geneorgeneproduct` - `B-Immaterialanatomicalentity` - `B-Multi-tissuestructure` - `B-Organ` - `B-Organism` - `B-Organismsubdivision` - `B-Organismsubstance` - `B-Pathologicalformation` - `B-Simplechemical` - `B-Tissue` - `I-Aminoacid` - `I-Anatomicalsystem` - `I-Cancer` - `I-Cell` - `I-Cellularcomponent` - `I-Developinganatomicalstructure` - `I-Geneorgeneproduct` - `I-Immaterialanatomicalentity` - `I-Multi-tissuestructure` - `I-Organ` - `I-Organism` - `I-Organismsubdivision` - `I-Organismsubstance` - `I-Pathologicalformation` - `I-Simplechemical` - `I-Tissue` BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics. 
The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.73`
- Precision: `0.72`
- Recall: `0.74`
- Accuracy: `0.90`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013CG
- Description: Cancer Genetics - Cancer-related genetic entities

Training Details
- Base Model: e5-base-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: e5-base-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge this work. Thank you!
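As a minimal sketch of what `aggregation_strategy="simple"` does, the grouping of adjacent `B-`/`I-` token predictions can be reproduced in plain Python. The token dicts below mimic the shape of the pipeline's raw (`aggregation_strategy="none"`) output, and `simple_aggregate` is a hypothetical helper, not the library implementation.

```python
def simple_aggregate(tokens):
    """Group adjacent B-X / I-X token predictions into entity spans,
    roughly mimicking the pipeline's "simple" strategy."""
    entities, current = [], None
    for tok in tokens:
        tag = tok["entity"]                 # e.g. "B-Cancer", "I-Cancer", "O"
        if tag == "O":
            current = None
            continue
        prefix, etype = tag.split("-", 1)
        # start a new span on B-, or when the type changes mid-stream
        if prefix == "B" or current is None or current["type"] != etype:
            current = {"type": etype, "words": [tok["word"]]}
            entities.append(current)
        else:                               # I- continuing the same type
            current["words"].append(tok["word"])
    return [{"entity_group": e["type"], "text": " ".join(e["words"])}
            for e in entities]


raw = [
    {"entity": "B-Cancer", "word": "breast"},
    {"entity": "I-Cancer", "word": "cancer"},
    {"entity": "O", "word": "recurrence"},
]
print(simple_aggregate(raw))  # one "Cancer" entity spanning both words
```

In practice you would let the pipeline do this for you by passing `aggregation_strategy="simple"`; the sketch only illustrates the grouping rule.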

license:apache-2.0
37,291
0

OpenMed-NER-PathologyDetect-SnowMed-568M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition (disease entities from the NCBI dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities as defined by the NCBI Disease corpus, a comprehensive resource for disease name recognition and concept normalization. The NCBI Disease corpus is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, this corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications. It serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.
Current Model Performance
- F1 Score: `0.89`
- Precision: `0.87`
- Recall: `0.91`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge this work. Thank you!
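The word-level strategies (`first`, `max`, `average`) can be illustrated by resolving one word's sub-token predictions to a single label. This is a simplified sketch under the assumption that each sub-token carries a single (label, score) pair; `resolve_word_label` and the example scores are hypothetical, and the real `average` strategy averages full score vectors before taking the argmax.

```python
def resolve_word_label(subtokens, strategy="max"):
    """Pick one label for a word from its sub-token (label, score) pairs,
    illustrating the "first", "max", and "average" strategies."""
    if strategy == "first":
        return subtokens[0][0]                       # tag of the first sub-token
    if strategy == "max":
        return max(subtokens, key=lambda p: p[1])[0]  # highest-scoring sub-token
    if strategy == "average":                         # mean score per label, then argmax
        totals = {}
        for label, score in subtokens:
            totals.setdefault(label, []).append(score)
        return max(totals, key=lambda l: sum(totals[l]) / len(totals[l]))
    raise ValueError(f"unknown strategy: {strategy}")


pieces = [("B-Disease", 0.60), ("I-Disease", 0.95), ("O", 0.55)]
print(resolve_word_label(pieces, "first"))   # B-Disease
print(resolve_word_label(pieces, "max"))     # I-Disease
```

The differences only matter when a tokenizer splits a word into sub-tokens that disagree, which is common for long disease names.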

license:apache-2.0
37,258
0

OpenMed-NER-SpeciesDetect-ElectraMed-560M

Specialized model for Species Entity Recognition - Species and organism names

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition (species and organism names). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated LINNAEUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species entities as defined by the LINNAEUS corpus, which is designed for species name identification and taxonomic entity recognition in biomedical literature. The LINNAEUS corpus is a specialized biomedical NER dataset focused on species name identification and organism recognition in scientific literature. Named after Carl Linnaeus, who established modern taxonomic nomenclature, this corpus contains annotations for species mentions that are normalized to NCBI Taxonomy identifiers. The dataset is crucial for biodiversity informatics, ecological research, and biological literature mining where accurate organism identification is essential. It supports the development of text mining systems for taxonomic studies, species distribution research, and comparative genomics applications. The corpus addresses the challenge of recognizing both scientific names and common names of organisms across diverse biological texts.
Current Model Performance
- F1 Score: `0.76`
- Precision: `0.72`
- Recall: `0.82`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-SpeciesDetect-PubMed-335M | 0.9649 | 0.9582 | 0.9718 | 0.9967 |
| 🥈 2 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9543 | 0.9422 | 0.9667 | 0.9956 |
| 🥉 3 | OpenMed-NER-SpeciesDetect-BioMed-335M | 0.9539 | 0.9441 | 0.9638 | 0.9957 |
| 4 | OpenMed-NER-SpeciesDetect-SuperClinical-434M | 0.9534 | 0.9369 | 0.9704 | 0.9959 |
| 5 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9502 | 0.9317 | 0.9695 | 0.9951 |
| 6 | OpenMed-NER-SpeciesDetect-MultiMed-335M | 0.9479 | 0.9286 | 0.9680 | 0.9955 |
| 7 | OpenMed-NER-SpeciesDetect-MultiMed-568M | 0.9460 | 0.9312 | 0.9613 | 0.9957 |
| 8 | OpenMed-NER-SpeciesDetect-SuperMedical-355M | 0.9433 | 0.9221 | 0.9655 | 0.9953 |
| 9 | OpenMed-NER-SpeciesDetect-SuperClinical-141M | 0.9406 | 0.9290 | 0.9525 | 0.9950 |
| 10 | OpenMed-NER-SpeciesDetect-ModernClinical-395M | 0.9385 | 0.9379 | 0.9392 | 0.9940 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: LINNAEUS
- Description: Species Entity Recognition - Species and organism names

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge this work. Thank you!
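Since the model's output is described as BIO-tagged predictions, a quick structural check can catch malformed sequences (an `I-X` tag is only valid after `B-X` or `I-X` of the same type) before downstream processing. The helper below is a hypothetical sketch, not part of the released tooling.

```python
def is_valid_bio(tags):
    """Check the BIO constraint: every I-X must follow B-X or I-X
    of the same entity type."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                return False
        prev = tag
    return True


print(is_valid_bio(["B-Species", "I-Species", "O"]))  # True
print(is_valid_bio(["O", "I-Species"]))               # False: I- with no opener
```

Such checks are mostly useful when post-processing raw (`aggregation_strategy="none"`) output or evaluating against gold annotations.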

license:apache-2.0
37,255
0

OpenMed-NER-PathologyDetect-ElectraMed-33M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition (disease entities from the NCBI dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities as defined by the NCBI Disease corpus, a comprehensive resource for disease name recognition and concept normalization. The NCBI Disease corpus is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, this corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications. It serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.
Current Model Performance
- F1 Score: `0.82`
- Precision: `0.79`
- Recall: `0.85`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details
- Base Model: e5-small-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: e5-small-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge this work. Thank you!
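With aggregation enabled, token-classification pipelines also return character offsets (`start`/`end`) for each entity, which makes it easy to slice mentions out of the source text, e.g. for clinical text mining. The entity dicts below are mocked to mirror that output shape; `extract_mentions` is a hypothetical helper.

```python
def extract_mentions(text, entities):
    """Slice entity mentions out of the source text using the
    start/end character offsets returned with aggregation enabled."""
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]


text = "Tamoxifen prevents breast cancer recurrence."
ents = [{"entity_group": "Disease", "start": 19, "end": 32}]  # mocked pipeline output
print(extract_mentions(text, ents))  # [('Disease', 'breast cancer')]
```

Working from offsets rather than the (possibly re-tokenized) entity strings avoids whitespace and sub-word artifacts when mapping predictions back onto records.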

license:apache-2.0
37,249
0

OpenMed-NER-OrganismDetect-PubMed-335M

Specialized model for Species Entity Recognition - Species names from the Species-800 dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition (species names from the Species-800 dataset). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated SPECIES800 dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species entities as defined by the Species-800 corpus, a resource for species recognition and taxonomy classification in biomedical texts. The Species-800 corpus is a manually annotated dataset designed for species recognition and taxonomic classification in biomedical literature. This corpus contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific names and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for evaluating species identification models used in ecological studies, conservation biology, and systematic biology research. The corpus is particularly useful for text mining applications in biodiversity databases and biological literature analysis.
Current Model Performance
- F1 Score: `0.85`
- Precision: `0.84`
- Recall: `0.87`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.8639 | 0.8557 | 0.8722 | 0.9715 |
| 🥈 2 | OpenMed-NER-OrganismDetect-PubMed-335M | 0.8550 | 0.8370 | 0.8737 | 0.9698 |
| 🥉 3 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8458 | 0.8287 | 0.8637 | 0.9690 |
| 4 | OpenMed-NER-OrganismDetect-MultiMed-335M | 0.8441 | 0.8352 | 0.8532 | 0.9670 |
| 5 | OpenMed-NER-OrganismDetect-SuperClinical-434M | 0.8435 | 0.8291 | 0.8585 | 0.9670 |
| 6 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8349 | 0.8082 | 0.8634 | 0.9685 |
| 7 | OpenMed-NER-OrganismDetect-MultiMed-568M | 0.8313 | 0.8053 | 0.8592 | 0.9703 |
| 8 | OpenMed-NER-OrganismDetect-ElectraMed-335M | 0.8288 | 0.8176 | 0.8404 | 0.9631 |
| 9 | OpenMed-NER-OrganismDetect-BioPatient-108M | 0.8154 | 0.8140 | 0.8169 | 0.9591 |
| 10 | OpenMed-NER-OrganismDetect-ElectraMed-33M | 0.8121 | 0.7772 | 0.8503 | 0.9600 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: SPECIES800
- Description: Species Entity Recognition - Species names from the Species-800 dataset

Training Details
- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge this work. Thank you!
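When processing a large corpus of abstracts, the texts can be fed to the pipeline in fixed-size groups matching the batch-size guidelines above. A plain chunking helper shows the idea; `batched` is a hypothetical name (recent Python versions offer `itertools.batched` with similar behavior).

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks of a list, e.g. to feed
    texts to an NER pipeline in groups."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


abstracts = [f"abstract {i}" for i in range(10)]
chunks = list(batched(abstracts, 4))
print([len(c) for c in chunks])  # [4, 4, 2]
```

In practice Transformers pipelines also accept a whole list (or generator) of texts plus a `batch_size` argument and handle the chunking internally; the helper is mainly useful when interleaving inference with other per-batch work.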

license:apache-2.0
37,248
0

OpenMed-NER-ChemicalDetect-ElectraMed-560M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition (chemical compounds and substances in biomedical literature). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entities as defined by BC4CHEMD, a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.
Current Model Performance
- F1 Score: `0.95`
- Precision: `0.95`
- Recall: `0.95`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge this work. Thank you!
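For the literature-mining use case, a common first step after NER is tallying how often each chemical is mentioned across a document collection. A minimal sketch with mocked per-document entity lists (mirroring the pipeline's aggregated output); `count_entities` is a hypothetical helper.

```python
from collections import Counter


def count_entities(doc_entities):
    """Tally case-normalized entity mentions across many documents,
    a common first step in literature mining."""
    counts = Counter()
    for ents in doc_entities:
        counts.update(e["text"].lower() for e in ents)
    return counts


# Mocked NER output for two abstracts
docs = [
    [{"text": "doxorubicin"}, {"text": "vancomycin"}],
    [{"text": "Doxorubicin"}],
]
print(count_entities(docs).most_common(1))  # [('doxorubicin', 2)]
```

Lowercasing is a crude normalization; a production system would map mentions to ChEBI or another chemical ontology before counting.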

license:apache-2.0
36,918
0

OpenMed-NER-ChemicalDetect-ModernClinical-149M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition: identifying chemical compounds and substances in biomedical literature. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.
Current Model Performance
- F1 Score: `0.93`
- Precision: `0.92`
- Recall: `0.94`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BioClinical-ModernBERT-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
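The batching guidance above can also be applied manually when you drive inference yourself. With Hugging Face pipelines you can usually just pass `batch_size=...` directly; the helper below is an illustrative pure-Python sketch of the equivalent chunking logic for custom loops:

```python
# Manual batching sketch: split a large corpus into fixed-size batches
# before feeding it to an NER pipeline. Illustrative only; HF pipelines
# accept batch_size=... directly.

def batched(texts, batch_size):
    """Yield successive batches of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

corpus = [f"Patient {i} received ibuprofen." for i in range(10)]

# batch_size=4 over 10 documents -> batches of 4, 4, and 2.
sizes = [len(batch) for batch in batched(corpus, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Start with a small batch size, confirm results and memory headroom, then increase it while watching GPU utilization, as the guidelines suggest.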

license:apache-2.0
36,560
0

OpenMed-NER-GenomeDetect-ModernClinical-149M

Specialized model for Gene/Protein Entity Recognition - Gene and protein mentions.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene/protein entity recognition: gene and protein mentions. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The BC2GM corpus targets gene and protein mention recognition from the BioCreative II Gene Mention task. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. This corpus contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.
Current Model Performance
- F1 Score: `0.85`
- Precision: `0.84`
- Recall: `0.86`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BioClinical-ModernBERT-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
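These cards describe the raw model output as BIO-tagged predictions. A minimal sketch of collapsing a BIO tag sequence into entity spans (the `CHEM` label here is a hypothetical example, not any specific model's label set):

```python
# Collapse a BIO-tagged token sequence into (entity_type, text) spans.
# Hypothetical label set for illustration; real models emit
# dataset-specific tags.

def bio_to_spans(tokens, tags):
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = (tag[2:], [token])  # open a new span
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)      # continue the open span
        else:  # "O", or an I- tag that doesn't continue the open span
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(etype, " ".join(words)) for etype, words in spans]

tokens = ["Lithium", "carbonate", "treats", "bipolar", "disorder"]
tags   = ["B-CHEM", "I-CHEM", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # [('CHEM', 'Lithium carbonate')]
```

In practice the pipeline's `aggregation_strategy` options perform this grouping for you; the sketch is useful when post-processing raw (`aggregation_strategy="none"`) output.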

license:apache-2.0
35,665
0

OpenMed-NER-ChemicalDetect-BioMed-335M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition: identifying chemical compounds and substances in biomedical literature. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.
Current Model Performance
- F1 Score: `0.94`
- Precision: `0.94`
- Recall: `0.95`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
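Putting the pieces together, a typical usage sketch with the Hugging Face Transformers `pipeline` looks like the following. The repo id is inferred from this card's title and is an assumption; substitute the model you actually want, and note that calling `load_ner()` downloads the model weights:

```python
# Hedged usage sketch for a token-classification pipeline.
# The repo id below is an assumption based on the card title.

def load_ner(model_id="OpenMed/OpenMed-NER-ChemicalDetect-BioMed-335M"):
    from transformers import pipeline  # deferred import; heavy dependency
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # group sub-tokens into entities
    )

def entity_summary(entities):
    """Condense pipeline output dicts into (label, text) pairs."""
    return [(e["entity_group"], e["word"]) for e in entities]

if __name__ == "__main__":
    ner = load_ner()
    out = ner("Treatment with doxorubicin showed tumor regression.")
    print(entity_summary(out))
```

With an aggregation strategy other than `none`, each output dict carries an `entity_group` rather than a per-token `entity` key, which is what `entity_summary` assumes.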

license:apache-2.0
35,616
0

OpenMed-NER-DiseaseDetect-TinyMed-82M

license:apache-2.0
35,582
0

OpenMed-NER-PathologyDetect-SuperClinical-141M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition: disease entities from the NCBI dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The NCBI Disease corpus is a comprehensive resource for disease name recognition and concept normalization. It is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, this corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications. It serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.
Current Model Performance
- F1 Score: `0.89`
- Precision: `0.86`
- Recall: `0.92`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details
- Base Model: deberta-v3-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-small
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
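The F1, precision, and recall figures reported throughout these cards follow the standard entity-level definitions. As a quick reference, with illustrative counts that are not taken from any table above:

```python
# Standard precision / recall / F1 from true-positive, false-positive,
# and false-negative entity counts. Illustrative numbers only.

def prf1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 90 correctly predicted entities, 10 spurious, 10 missed:
p, r, f1 = prf1(tp=90, fp=10, fn=10)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.9 0.9
```

F1 is the harmonic mean of precision and recall, which is why the leaderboard ranks by F1 rather than by either metric alone.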

license:apache-2.0
35,561
0

OpenMed-NER-GenomicDetect-ElectraMed-560M

Specialized model for Gene Entity Recognition - Gene-related entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition: gene-related entities. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications. It is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature, containing comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining. It is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `1.00`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: GELLUS
- Description: Gene Entity Recognition - Gene-related entities

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
35,543
0

OpenMed-NER-DiseaseDetect-ElectraMed-560M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset [](https://opensource.org/licenses/Apache-2.0) []() []() [](https://huggingface.co/OpenMed) This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition - disease entities from the bc5cdr dataset. This specialized model excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction with production-ready reliability for clinical and research applications. 🎯 Key Features - High Precision: Optimized for biomedical entity recognition - Domain-Specific: Trained on curated BC5CDRDISEASE dataset - Production-Ready: Validated on clinical benchmarks - Easy Integration: Compatible with Hugging Face Transformers ecosystem This model can identify and classify the following biomedical entities: BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics. 
Current Model Performance - F1 Score: `0.87` - Precision: `0.86` - Recall: `0.88` - Accuracy: `0.97` 🏆 Comparative Performance on BC5CDRDISEASE Dataset | Rank | Model | F1 Score | Precision | Recall | Accuracy | |------|-------|----------|-----------|--------|-----------| | 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 | | 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 | | 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 | | 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 | | 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 | | 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 | | 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 | | 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 | | 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 | | 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 | Rankings based on F1-score performance across all models trained on this dataset. Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets. NOTE: The `aggregationstrategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies: - `none`: Returns raw token predictions without any aggregation. - `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`). - `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word. 
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching via the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDR-Disease
- Description: Disease Entity Recognition, disease entities from the BC5CDR dataset

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
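The aggregation strategies above can be illustrated with a minimal, library-free sketch of the `simple` strategy: adjacent `B-`/`I-` tokens of the same entity type are merged into one entity span. The token texts and tags below are made-up examples, not real model output, and the dict shape only loosely mirrors the pipeline's.

```python
# Minimal sketch of the `simple` aggregation idea: merge adjacent B-/I-
# tokens of the same entity type into single entities. Illustrative only.

def aggregate_simple(tokens):
    """tokens: list of (text, bio_tag) pairs; returns merged entities."""
    entities = []
    current = None  # (entity_type, [token_texts]) for the open entity
    for text, tag in tokens:
        if tag == "O":
            if current:
                entities.append({"entity_group": current[0], "word": " ".join(current[1])})
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "B" or current is None or current[0] != etype:
            # A B- tag or a type change starts a new entity.
            if current:
                entities.append({"entity_group": current[0], "word": " ".join(current[1])})
            current = (etype, [text])
        else:
            # An I- tag of the same type continues the current entity.
            current[1].append(text)
    if current:
        entities.append({"entity_group": current[0], "word": " ".join(current[1])})
    return entities

# Example: "type 2 diabetes" tagged as one Disease entity.
tokens = [("Patients", "O"), ("with", "O"),
          ("type", "B-Disease"), ("2", "I-Disease"), ("diabetes", "I-Disease"),
          ("received", "O"), ("metformin", "B-Chemical")]
print(aggregate_simple(tokens))
```

With the `none` strategy you would instead get the raw per-token list unchanged.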
If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
35,542
0

OpenMed-NER-GenomicDetect-PubMed-v2-109M

Specialized model for Gene Entity Recognition: gene-related entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug-interaction detection, medication extraction from patient records, adverse-event monitoring, literature mining for drug discovery, and biomedical knowledge-graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and genetics entities as defined by the Gellus corpus, which targets gene recognition for genomics and molecular biology applications. The Gellus corpus is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature. It contains comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene-mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research.
The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `1.00`
- Precision: `1.00`
- Recall: `1.00`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching via the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: GELLUS
- Description: Gene Entity Recognition, gene-related entities

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
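The batch-size guidelines above come down to splitting the corpus into fixed-size chunks so each chunk is processed in one forward pass. A minimal sketch of that chunking (pure Python, no Transformers dependency; the corpus contents are placeholders):

```python
# Minimal sketch of batching: split a list of texts into chunks of at most
# `batch_size` items. `batch_size` mirrors the pipeline parameter of the
# same name; the corpus below is a placeholder.

def batched(texts, batch_size):
    """Yield successive chunks of at most `batch_size` texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

corpus = [f"abstract {i}" for i in range(10)]
batches = list(batched(corpus, batch_size=4))
print([len(b) for b in batches])  # → [4, 4, 2]
```

On a GPU, larger chunks amortize per-call overhead until memory runs out, which is why the guidelines scale `batch_size` with available hardware.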
If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
35,537
0

OpenMed-NER-ChemicalDetect-TinyMed-66M

license:apache-2.0
35,519
0

OpenMed-NER-DiseaseDetect-SuperMedical-125M

Specialized model for Disease Entity Recognition: disease entities from the BC5CDR dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug-interaction detection, medication extraction from patient records, adverse-event monitoring, literature mining for drug discovery, and biomedical knowledge-graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies disease entities as defined by the BC5CDR-Disease corpus. BC5CDR-Disease is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease-name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for disease types, medical conditions, and pathological states mentioned in biomedical literature, and serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.
Current Model Performance
- F1 Score: `0.88`
- Precision: `0.86`
- Recall: `0.89`
- Accuracy: `0.98`

🏆 Comparative Performance on the BC5CDR-Disease Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching via the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDR-Disease
- Description: Disease Entity Recognition, disease entities from the BC5CDR dataset

Training Details
- Base Model: roberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: roberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
35,486
0

OpenMed-NER-PharmaDetect-PubMed-v2-109M

Specialized model for Chemical Entity Recognition: chemical entities from the BC5CDR dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug-interaction detection, medication extraction from patient records, adverse-event monitoring, literature mining for drug discovery, and biomedical knowledge-graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Chem dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entities as defined by the BC5CDR-Chem corpus, the chemical-focused component of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge. The dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances relevant to understanding chemical-disease relationships, and serves as a critical resource for developing NER systems that identify chemical entities for downstream tasks such as adverse-drug-reaction detection and drug-repurposing research.
Current Model Performance
- F1 Score: `0.95`
- Precision: `0.95`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching via the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDR-Chem
- Description: Chemical Entity Recognition, chemical entities from the BC5CDR dataset

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge this work. Thank you!
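The F1, precision, and recall figures in the table above are entity-level metrics: a prediction counts as correct only when its span and type both match the gold annotation exactly. A minimal sketch of that computation (the `(start, end, type)` spans below are toy examples, not BC5CDR data):

```python
# Entity-level precision/recall/F1 as commonly used for NER benchmarks:
# an entity is a true positive only if span boundaries AND type match.
# Gold and predicted spans below are toy examples.

def prf1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact span+type matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 9, "Chemical"), (15, 24, "Chemical"), (30, 38, "Disease")}
pred = {(0, 9, "Chemical"), (15, 24, "Chemical"), (40, 45, "Disease")}
p, r, f = prf1(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.67 0.67 0.67
```

Token-level accuracy (the last column) is looser, which is why it runs well above the entity-level scores in every table here.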

license:apache-2.0
35,480
0

OpenMed-NER-GenomeDetect-ElectraMed-560M

Specialized model for Gene/Protein Entity Recognition: gene and protein mentions.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene and protein mention recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug-interaction detection, medication extraction from patient records, adverse-event monitoring, literature mining for drug discovery, and biomedical knowledge-graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and protein mentions as defined by the BC2GM corpus, from the BioCreative II Gene Mention task. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries, and has been instrumental in advancing automated gene-recognition systems used in functional genomics research, gene-expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.
Current Model Performance
- F1 Score: `0.87`
- Precision: `0.86`
- Recall: `0.88`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching via the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition, gene and protein mentions

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge this work. Thank you!
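For word-based models, the `first` and `max` strategies described above resolve disagreements among a word's sub-word tokens. A minimal sketch over made-up sub-token tags and scores (not real model output):

```python
# Sketch of word-level tag resolution across sub-word tokens:
# `first` keeps the first sub-token's tag; `max` keeps the tag of the
# highest-scoring sub-token. Tags and scores below are made up.

def resolve_word(subtokens, strategy):
    """subtokens: list of (tag, score) pairs for one word's sub-tokens."""
    if strategy == "first":
        return subtokens[0][0]
    if strategy == "max":
        return max(subtokens, key=lambda ts: ts[1])[0]
    raise ValueError(f"unknown strategy: {strategy}")

# The word "metformin" split into sub-tokens with disagreeing tags.
subtokens = [("B-Chemical", 0.55), ("I-Chemical", 0.90), ("O", 0.40)]
print(resolve_word(subtokens, "first"))  # → B-Chemical
print(resolve_word(subtokens, "max"))    # → I-Chemical
```

`average` would instead sum or average the per-label scores across the sub-tokens before picking the winner, which can differ from both results above.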

license:apache-2.0
35,439
0

OpenMed-NER-GenomicDetect-BioPatient-108M

Specialized model for Gene Entity Recognition: gene-related entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug-interaction detection, medication extraction from patient records, adverse-event monitoring, literature mining for drug discovery, and biomedical knowledge-graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and genetics entities as defined by the Gellus corpus, which targets gene recognition for genomics and molecular biology applications. The Gellus corpus is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature. It contains comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene-mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research.
The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `0.99`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching via the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: GELLUS
- Description: Gene Entity Recognition, gene-related entities

Training Details
- Base Model: BioDischargeSummaryBERT
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioDischargeSummaryBERT
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
35,415
0

OpenMed-NER-ProteinDetect-TinyMed-135M

Specialized model for Biomedical Entity Recognition: various biomedical entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug-interaction detection, medication extraction from patient records, adverse-event monitoring, literature mining for drug discovery, and biomedical knowledge-graph construction, with production-ready reliability for clinical and research use.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities (label underscores restored; the `familiy` spelling follows the dataset's own label names):
- `B-protein`
- `B-protein_complex`
- `B-protein_enum`
- `B-protein_familiy_or_group`
- `B-protein_variant`
- `I-protein`
- `I-protein_complex`
- `I-protein_enum`
- `I-protein_familiy_or_group`
- `I-protein_variant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein-interaction recognition and molecular biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular-interaction entities relevant to systems biology and biochemistry research, and supports the development of text-mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications.
It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. The corpus serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics. Current Model Performance - F1 Score: `0.91` - Precision: `0.90` - Recall: `0.92` - Accuracy: `0.97` | Rank | Model | F1 Score | Precision | Recall | Accuracy | |------|-------|----------|-----------|--------|-----------| | 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 | | 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 | | 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 | | 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 | | 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 | | 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 | | 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 | | 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 | | 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 | | 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 | Rankings based on F1-score performance across all models trained on this dataset. Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets. NOTE: The `aggregationstrategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies: - `none`: Returns raw token predictions without any aggregation. - `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`). 
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word. - `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score. - `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word. For efficient processing of large datasets, use proper batching with the `batchsize` parameter: Batch Size Guidelines: - CPU: Start with batchsize=1-4 - Single GPU: Try batchsize=8-32 depending on GPU memory - High-end GPU: Can handle batchsize=64 or higher - Monitor GPU utilization to find the optimal batch size for your hardware - Dataset: FSU - Description: Biomedical Entity Recognition - Various biomedical entities Training Details - Base Model: distilbert-base-multilingual-cased - Training Framework: Hugging Face Transformers - Optimization: AdamW optimizer with learning rate scheduling - Validation: Cross-validation on held-out test set - Base Architecture: distilbert-base-multilingual-cased - Task: Token Classification (Named Entity Recognition) - Labels: Dataset-specific entity types - Input: Tokenized biomedical text - Output: BIO-tagged entity predictions This model is particularly useful for: - Clinical Text Mining: Extracting entities from medical records - Biomedical Research: Processing scientific literature - Drug Discovery: Identifying chemical compounds and drugs - Healthcare Analytics: Analyzing patient data and outcomes - Academic Research: Supporting biomedical NLP research Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. 
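To make the `simple` aggregation strategy concrete, here is an illustrative sketch of how adjacent `B-`/`I-` token predictions are merged into entity spans. The token predictions below are mocked; in practice they come from the model's pipeline output.

```python
def aggregate_simple(tokens):
    """Group (word, tag) pairs like B-protein/I-protein into entity spans."""
    entities, current = [], None
    for word, tag in tokens:
        if tag == "O":
            # Outside any entity: close the span in progress, if any.
            if current:
                entities.append(current)
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current and current["type"] == etype and prefix == "I":
            # Continuation token: extend the current span.
            current["text"] += " " + word
        else:
            # New entity begins (B- tag, or a type change).
            if current:
                entities.append(current)
            current = {"type": etype, "text": word}
    if current:
        entities.append(current)
    return entities

# Mocked word-level predictions for a protein NER model.
preds = [("The", "O"), ("p53", "B-protein"), ("tumor", "I-protein"),
         ("suppressor", "I-protein"), ("binds", "O"), ("MDM2", "B-protein")]
print(aggregate_simple(preds))
```

This merges "p53 tumor suppressor" into a single `protein` span and keeps "MDM2" as a second one, which is the behavior the `simple` strategy describes.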
If you use this model in your research or applications, please cite it. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
35,349
0

OpenMed-NER-DNADetect-ElectraMed-109M

Specialized model for Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: `B-DNA`, `B-RNA`, `B-cellline`, `B-celltype`, `B-protein`, `I-DNA`, `I-RNA`, `I-cellline`, `I-celltype`, `I-protein`.

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types: protein, DNA, RNA, cell line, and cell type, making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been extensively used as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.77`
- Precision: `0.73`
- Recall: `0.83`
- Accuracy: `0.90`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset: JNLPBA (Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types)

Training Details
- Base Model / Architecture: e5-base-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite it. Proper citation helps support and acknowledge our work. Thank you!
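The word-level strategies (`first`, `max`, `average`) differ only in how sub-token scores are reduced to one tag per word. An illustrative comparison, using hypothetical scores for a single word split into three sub-tokens:

```python
from collections import defaultdict

# Hypothetical (tag, score) predictions for one word's sub-tokens.
subtoken_preds = [("B-DNA", 0.55), ("I-RNA", 0.90), ("I-DNA", 0.60)]

def first_strategy(preds):
    # The tag of the first sub-token is assigned to the whole word.
    return preds[0][0]

def max_strategy(preds):
    # The tag of the single highest-scoring sub-token wins.
    return max(preds, key=lambda p: p[1])[0]

def average_strategy(preds):
    # The tag with the highest mean score across sub-tokens wins.
    total, count = defaultdict(float), defaultdict(int)
    for tag, score in preds:
        total[tag] += score
        count[tag] += 1
    return max(total, key=lambda t: total[t] / count[t])

print(first_strategy(subtoken_preds))    # B-DNA
print(max_strategy(subtoken_preds))      # I-RNA
print(average_strategy(subtoken_preds))  # I-RNA
```

The three strategies can disagree, as here: `first` keeps `B-DNA`, while `max` and `average` both prefer `I-RNA`, so the choice matters for words the tokenizer splits aggressively.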

license:apache-2.0
35,341
0

OpenMed-NER-AnatomyDetect-ModernClinical-149M

Specialized model for Anatomical Entity Recognition - Anatomical structures and body parts

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing anatomical structures and body parts. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated ANATOMY dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

The Anatomy corpus focuses on anatomical entity recognition for medical terminology and healthcare applications. It is a specialized biomedical NER dataset designed for recognizing anatomical entities and medical terminology in clinical and biomedical texts, with annotations for anatomical structures, body parts, organs, and physiological systems mentioned in medical literature. It is essential for developing clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial. The dataset supports the development of automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature, and serves as a valuable resource for training NER models used in medical imaging, surgical planning, and clinical documentation.

Current Model Performance
- F1 Score: `0.86`
- Precision: `0.86`
- Recall: `0.85`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-AnatomyDetect-ElectraMed-560M | 0.9063 | 0.9083 | 0.9044 | 0.9825 |
| 🥈 2 | OpenMed-NER-AnatomyDetect-PubMed-335M | 0.9063 | 0.8995 | 0.9131 | 0.9851 |
| 🥉 3 | OpenMed-NER-AnatomyDetect-SuperClinical-434M | 0.9024 | 0.9040 | 0.9008 | 0.9836 |
| 4 | OpenMed-NER-AnatomyDetect-ElectraMed-335M | 0.9020 | 0.9024 | 0.9016 | 0.9787 |
| 5 | OpenMed-NER-AnatomyDetect-MultiMed-568M | 0.9012 | 0.8977 | 0.9048 | 0.9812 |
| 6 | OpenMed-NER-AnatomyDetect-PubMed-109M | 0.9004 | 0.8941 | 0.9067 | 0.9844 |
| 7 | OpenMed-NER-AnatomyDetect-SuperMedical-355M | 0.9002 | 0.8974 | 0.9029 | 0.9815 |
| 8 | OpenMed-NER-AnatomyDetect-BigMed-560M | 0.8980 | 0.9007 | 0.8954 | 0.9814 |
| 9 | OpenMed-NER-AnatomyDetect-BioMed-335M | 0.8961 | 0.8941 | 0.8982 | 0.9830 |
| 10 | OpenMed-NER-AnatomyDetect-BioClinical-108M | 0.8961 | 0.8960 | 0.8962 | 0.9768 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset: ANATOMY (Anatomical Entity Recognition - anatomical structures and body parts)

Training Details
- Base Model / Architecture: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite it. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
35,325
0

OpenMed-NER-BloodCancerDetect-PubMed-335M

Specialized model for Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing clinical entities related to chronic lymphocytic leukemia. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

The CLL corpus is specialized for chronic lymphocytic leukemia entity recognition in hematology and cancer research. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. It contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset supports the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics applications. It is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature, and serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

Current Model Performance
- F1 Score: `0.53`
- Precision: `0.48`
- Recall: `0.60`
- Accuracy: `0.91`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.9575 | 0.9264 | 0.9907 | 0.9843 |
| 🥈 2 | OpenMed-NER-BloodCancerDetect-SuperClinical-434M | 0.8902 | 0.8652 | 0.9167 | 0.9701 |
| 🥉 3 | OpenMed-NER-BloodCancerDetect-TinyMed-82M | 0.8793 | 0.7904 | 0.9908 | 0.9449 |
| 4 | OpenMed-NER-BloodCancerDetect-TinyMed-135M | 0.8792 | 0.8750 | 0.8835 | 0.9668 |
| 5 | OpenMed-NER-BloodCancerDetect-TinyMed-65M | 0.8547 | 0.7812 | 0.9434 | 0.9686 |
| 6 | OpenMed-NER-BloodCancerDetect-SuperMedical-125M | 0.8488 | 1.0000 | 0.7373 | 0.9274 |
| 7 | OpenMed-NER-BloodCancerDetect-SnowMed-568M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 8 | OpenMed-NER-BloodCancerDetect-BigMed-278M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 9 | OpenMed-NER-BloodCancerDetect-SuperMedical-355M | 0.8421 | 0.9816 | 0.7373 | 0.9248 |
| 10 | OpenMed-NER-BloodCancerDetect-ElectraMed-335M | 0.8364 | 0.7302 | 0.9787 | 0.9581 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset: CLL (Clinical Entity Recognition - clinical entities related to Chronic Lymphocytic Leukemia)

Training Details
- Base Model / Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
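The batch-size guidance above amounts to splitting a large document collection into fixed-size chunks rather than making one call per text. A minimal batching helper sketch (the pipeline call itself is omitted; only the chunking logic is shown):

```python
def batched(items, batch_size):
    """Yield successive slices of at most batch_size items.

    Mirrors what passing batch_size=... to a token-classification
    pipeline does internally: inputs are processed in chunks so the
    model sees batch_size texts per forward pass.
    """
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Example: 10 abstracts processed in batches of 4 (sizes 4, 4, 2).
abstracts = [f"abstract {i}" for i in range(10)]
for batch in batched(abstracts, batch_size=4):
    print(len(batch), batch[0])
```

Starting with a small `batch_size` and increasing it while watching GPU memory, as the guidelines suggest, only requires changing the one argument here.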
If you use this model in your research or applications, please cite it. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
35,316
0

OpenMed-NER-DNADetect-BioMed-109M

Specialized model for Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities: `B-DNA`, `B-RNA`, `B-cellline`, `B-celltype`, `B-protein`, `I-DNA`, `I-RNA`, `I-cellline`, `I-celltype`, `I-protein`.

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types: protein, DNA, RNA, cell line, and cell type, making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been extensively used as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.79`
- Precision: `0.74`
- Recall: `0.85`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset: JNLPBA (Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types)

Training Details
- Base Model / Architecture: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
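The JNLPBA label set (B-/I- tags over five entity types, plus `O`) is regular enough to generate programmatically, which is handy when configuring a token-classification head's label mappings. A small sketch:

```python
# Five JNLPBA entity types, as listed in the card above.
ENTITY_TYPES = ["DNA", "RNA", "cellline", "celltype", "protein"]

# BIO scheme: "O" plus a B- and I- tag for each entity type (11 labels total).
labels = ["O"] + [f"{prefix}-{etype}"
                  for etype in ENTITY_TYPES
                  for prefix in ("B", "I")]

# Mappings of the kind a Transformers config expects (id2label / label2id).
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in enumerate(labels)}

print(len(labels), labels[:3])
```

This yields 11 labels (`O`, `B-DNA`, `I-DNA`, ...), matching the entity inventory the card enumerates.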
If you use this model in your research or applications, please cite it. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
35,316
0

OpenMed-NER-DNADetect-TinyMed-82M

license:apache-2.0
35,298
0

OpenMed-NER-PharmaDetect-MultiMed-335M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [OpenMed on Hugging Face](https://huggingface.co/OpenMed)

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing chemical entities from the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Chem dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. The dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances relevant to understanding chemical-disease relationships, and serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks such as adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.96`
- Precision: `0.96`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset: BC5CDR-Chem (Chemical Entity Recognition - chemical entities from the BC5CDR dataset)

Training Details
- Base Model / Architecture: bge-large-en-v1.5
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0; see LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow the OpenMed org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite it. Proper citation helps support and acknowledge our work. Thank you!
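Since the card states that the model emits BIO-tagged predictions, downstream consumers may want a sanity check that a tag sequence is well-formed before span extraction. This is an illustrative helper, not part of the model:

```python
def is_valid_bio(tags):
    """Check that a BIO tag sequence is well-formed.

    Rule: every I-X tag must directly follow a B-X or I-X tag of the
    same entity type X; an entity cannot start with I-.
    """
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            etype = tag[2:]
            if prev not in (f"B-{etype}", f"I-{etype}"):
                return False
        prev = tag
    return True

# Well-formed: the chemical span starts with B- and continues with I-.
print(is_valid_bio(["O", "B-Chemical", "I-Chemical", "O"]))
# Malformed: an I- tag with no preceding B- of the same type.
print(is_valid_bio(["O", "I-Chemical", "O"]))
```

A check like this catches decoding glitches (e.g. an entity beginning with `I-`) before they propagate into knowledge-graph or pharmacovigilance pipelines.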

license:apache-2.0
35,290
0

OpenMed-NER-DNADetect-SnowMed-568M

Specialized model for Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition covering proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types: protein, DNA, RNA, cell line, and cell type, making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been extensively used as a benchmark for evaluating biomedical NER systems and continues to be a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.79`
- Precision: `0.75`
- Recall: `0.84`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|-----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label of the highest-scoring token within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4.
- Single GPU: Try `batch_size` 8-32, depending on GPU memory.
- High-end GPU: Can handle `batch_size` 64 or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
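The batch-size guidance in these cards pairs naturally with explicit chunking when streaming a large corpus through a pipeline. A pure-Python sketch (the `ner_pipeline` argument is a stand-in for a loaded Transformers token-classification pipeline; the chunking itself has no dependencies):

```python
from typing import Iterable, Iterator, List


def batched(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size batches from an iterable of documents."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def run_in_batches(ner_pipeline, texts, batch_size=16):
    """Apply a token-classification pipeline batch by batch, collecting results."""
    results = []
    for batch in batched(texts, batch_size):
        # The pipeline also accepts batch_size directly; passing batches
        # explicitly keeps memory use predictable for very large corpora.
        results.extend(ner_pipeline(batch, batch_size=batch_size))
    return results
```

Start with a small `batch_size` on CPU and increase it on GPU while watching utilization, as the guidelines above suggest.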

license:apache-2.0
35,275
0

OpenMed-NER-PharmaDetect-ElectraMed-33M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR_CHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.94`
- Precision: `0.93`
- Recall: `0.95`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|-----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label of the highest-scoring token within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4.
- Single GPU: Try `batch_size` 8-32, depending on GPU memory.
- High-end GPU: Can handle `batch_size` 64 or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC5CDR_CHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: e5-small-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: e5-small-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
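The `simple` strategy described above can be mimicked by hand when you start from raw per-token predictions (`aggregation_strategy="none"`). A minimal sketch of BIO-tag grouping, using illustrative token/tag pairs rather than real model output:

```python
def group_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    A B- tag opens a span; an I- tag of the same type extends it;
    any other tag closes the current span.
    """
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:  # close a span that runs to the end of the sequence
        spans.append((current_type, " ".join(current_tokens)))
    return spans
```

For example, tags `["B-CHEM", "I-CHEM", "O"]` over `["Lithium", "carbonate", "helps"]` yield the single span `("CHEM", "Lithium carbonate")`.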

license:apache-2.0
35,268
0

OpenMed-NER-SpeciesDetect-BioMed-335M

Specialized model for Species Entity Recognition - Species and organism names

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition, covering species and organism names. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated LINNAEUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The Linnaeus corpus is designed for species name identification and taxonomic entity recognition in biomedical literature. It is a specialized biomedical NER dataset focused on species name identification and organism recognition in scientific literature. Named after Carl Linnaeus, who established modern taxonomic nomenclature, this corpus contains annotations for species mentions that are normalized to NCBI Taxonomy identifiers. The dataset is crucial for biodiversity informatics, ecological research, and biological literature mining, where accurate organism identification is essential. It supports the development of text mining systems for taxonomic studies, species distribution research, and comparative genomics applications. The corpus addresses the challenge of recognizing both scientific names and common names of organisms across diverse biological texts.

Current Model Performance
- F1 Score: `0.95`
- Precision: `0.94`
- Recall: `0.96`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|-----------|
| 🥇 1 | OpenMed-NER-SpeciesDetect-PubMed-335M | 0.9649 | 0.9582 | 0.9718 | 0.9967 |
| 🥈 2 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9543 | 0.9422 | 0.9667 | 0.9956 |
| 🥉 3 | OpenMed-NER-SpeciesDetect-BioMed-335M | 0.9539 | 0.9441 | 0.9638 | 0.9957 |
| 4 | OpenMed-NER-SpeciesDetect-SuperClinical-434M | 0.9534 | 0.9369 | 0.9704 | 0.9959 |
| 5 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9502 | 0.9317 | 0.9695 | 0.9951 |
| 6 | OpenMed-NER-SpeciesDetect-MultiMed-335M | 0.9479 | 0.9286 | 0.9680 | 0.9955 |
| 7 | OpenMed-NER-SpeciesDetect-MultiMed-568M | 0.9460 | 0.9312 | 0.9613 | 0.9957 |
| 8 | OpenMed-NER-SpeciesDetect-SuperMedical-355M | 0.9433 | 0.9221 | 0.9655 | 0.9953 |
| 9 | OpenMed-NER-SpeciesDetect-SuperClinical-141M | 0.9406 | 0.9290 | 0.9525 | 0.9950 |
| 10 | OpenMed-NER-SpeciesDetect-ModernClinical-395M | 0.9385 | 0.9379 | 0.9392 | 0.9940 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label of the highest-scoring token within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4.
- Single GPU: Try `batch_size` 8-32, depending on GPU memory.
- High-end GPU: Can handle `batch_size` 64 or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: LINNAEUS
- Description: Species Entity Recognition - Species and organism names

Training Details
- Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
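The leaderboard numbers above are entity-level scores. Given predicted and gold entity spans, precision, recall, and F1 can be recomputed as follows (a sketch using exact span matching; the cards' own evaluation pipeline is not specified here and may use a library such as seqeval):

```python
def entity_prf(gold, pred):
    """Entity-level precision/recall/F1 from sets of (doc_id, start, end, type)."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1
```

Exact matching is strict: a predicted span that overlaps a gold entity but has different boundaries counts as both a false positive and a false negative, which is why boundary-heavy corpora tend to show lower entity-level F1 than token accuracy.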

license:apache-2.0
35,238
0

OpenMed-NER-PharmaDetect-ElectraMed-335M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR_CHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.96`
- Precision: `0.95`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|-----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label of the highest-scoring token within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4.
- Single GPU: Try `batch_size` 8-32, depending on GPU memory.
- High-end GPU: Can handle `batch_size` 64 or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC5CDR_CHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: e5-large-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: e5-large-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
35,051
0

OpenMed-NER-DiseaseDetect-ElectraMed-33M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR_DISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

Current Model Performance
- F1 Score: `0.82`
- Precision: `0.79`
- Recall: `0.85`
- Accuracy: `0.96`

🏆 Comparative Performance on the BC5CDR_DISEASE Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|-----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label of the highest-scoring token within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size` 1-4.
- Single GPU: Try `batch_size` 8-32, depending on GPU memory.
- High-end GPU: Can handle `batch_size` 64 or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC5CDR_DISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details
- Base Model: e5-small-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: e5-small-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge this work. Thank you!
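The word-based strategies (`first`, `max`, `average`) differ only in how per-token predictions inside one word are combined into a single label. A compact sketch, assuming each word arrives as a list of illustrative `(label, score)` token predictions (the real pipeline averages full score vectors for `average`; averaging per-label scalar scores here is a simplification):

```python
from collections import defaultdict


def aggregate_word(token_preds, strategy="max"):
    """Pick a single label for a word from its tokens' (label, score) pairs."""
    if strategy == "first":
        return token_preds[0][0]  # label of the first sub-token
    if strategy == "max":
        return max(token_preds, key=lambda p: p[1])[0]  # highest-scoring token wins
    if strategy == "average":
        totals = defaultdict(float)
        for label, score in token_preds:
            totals[label] += score / len(token_preds)  # mean score per label
        return max(totals, key=totals.get)
    raise ValueError(f"unknown strategy: {strategy}")
```

For `[("B-Disease", 0.4), ("I-Disease", 0.9), ("O", 0.5)]`, `first` returns `B-Disease` while `max` and `average` both return `I-Disease`, showing how the choice of strategy can change word-level labels.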

license:apache-2.0
35,015
0

OpenMed-NER-PathologyDetect-MultiMed-568M

Specialized model for Disease Entity Recognition - disease entities from the NCBI Disease corpus.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI Disease corpus. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBI_DISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The NCBI Disease corpus is a gold-standard resource for disease name recognition and concept normalization, containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, the corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. It is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.88`
- Precision: `0.87`
- Recall: `0.89`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: NCBI_DISEASE
- Description: Disease Entity Recognition - disease entities from the NCBI dataset

Training Details
- Base Model: bge-m3
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: bge-m3
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper:

Proper citation helps support and acknowledge my work. Thank you!
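The `simple` aggregation strategy described in this card can be illustrated with a small, self-contained sketch. This is a simplified stand-in for what the Transformers pipeline does internally, not the library's actual code; the `(word, tag)` token format and entity types here are made up for illustration:

```python
def aggregate_simple(tokens):
    """Group adjacent B-/I- tokens of the same entity type into entity spans.

    `tokens` is a list of (word, tag) pairs with BIO tags, mimicking the
    per-token output a token-classification pipeline produces before
    aggregation.
    """
    entities = []
    current_words, current_type = [], None
    for word, tag in tokens + [("", "O")]:  # sentinel flushes the last span
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # a new entity starts; close any open span first
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_words, current_type = [word], tag[2:]
        elif tag.startswith("I-"):
            current_words.append(word)  # continue the open span
        else:  # "O" ends any open span
            if current_type is not None:
                entities.append((current_type, " ".join(current_words)))
            current_words, current_type = [], None
    return entities

tokens = [
    ("Patients", "O"), ("received", "O"),
    ("lithium", "B-CHEM"), ("carbonate", "I-CHEM"),
    ("for", "O"), ("bipolar", "B-DISEASE"), ("disorder", "I-DISEASE"),
]
print(aggregate_simple(tokens))
# [('CHEM', 'lithium carbonate'), ('DISEASE', 'bipolar disorder')]
```

In practice you would simply pass `aggregation_strategy="simple"` to the pipeline rather than post-process tokens yourself.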

license:apache-2.0
34,873
0

OpenMed-NER-PathologyDetect-ModernClinical-149M

Specialized model for Disease Entity Recognition - disease entities from the NCBI Disease corpus.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI Disease corpus. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBI_DISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The NCBI Disease corpus is a gold-standard resource for disease name recognition and concept normalization, containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, the corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. It is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.86`
- Precision: `0.86`
- Recall: `0.86`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: NCBI_DISEASE
- Description: Disease Entity Recognition - disease entities from the NCBI dataset

Training Details
- Base Model: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BioClinical-ModernBERT-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper:

Proper citation helps support and acknowledge my work. Thank you!
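The batch-size guidance in this card boils down to feeding the pipeline fixed-size chunks of documents. A minimal sketch of such a chunking helper (hypothetical, not part of any library; in practice `pipeline(..., batch_size=N)` handles this for you):

```python
def batched(texts, batch_size):
    """Yield successive chunks of `texts` with at most `batch_size` items each.

    Mirrors the guideline above: start with batch_size=1-4 on CPU,
    8-32 on a single GPU, 64 or higher on high-end GPUs, and tune
    while monitoring GPU utilization.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

docs = [f"clinical note {i}" for i in range(10)]
batches = list(batched(docs, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

Each chunk would then be passed to the NER pipeline in one call, keeping the accelerator saturated without exhausting memory.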

license:apache-2.0
34,858
0

OpenMed-NER-OncologyDetect-PubMed-v2-109M

Specialized model for Cancer Genetics - cancer-related genetic entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013_CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-Amino_acid` / `I-Amino_acid`
- `B-Anatomical_system` / `I-Anatomical_system`
- `B-Cancer` / `I-Cancer`
- `B-Cell` / `I-Cell`
- `B-Cellular_component` / `I-Cellular_component`
- `B-Developing_anatomical_structure` / `I-Developing_anatomical_structure`
- `B-Gene_or_gene_product` / `I-Gene_or_gene_product`
- `B-Immaterial_anatomical_entity` / `I-Immaterial_anatomical_entity`
- `B-Multi-tissue_structure` / `I-Multi-tissue_structure`
- `B-Organ` / `I-Organ`
- `B-Organism` / `I-Organism`
- `B-Organism_subdivision` / `I-Organism_subdivision`
- `B-Organism_substance` / `I-Organism_substance`
- `B-Pathological_formation` / `I-Pathological_formation`
- `B-Simple_chemical` / `I-Simple_chemical`
- `B-Tissue` / `I-Tissue`

The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. It contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature, and serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.86`
- Precision: `0.86`
- Recall: `0.86`
- Accuracy: `0.95`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013_CG
- Description: Cancer Genetics - cancer-related genetic entities

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper:

Proper citation helps support and acknowledge my work. Thank you!
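The difference between the word-level strategies (`first`, `max`, `average`) can be sketched in plain Python. This is an illustrative simplification, not the Transformers implementation; the per-subword score dictionaries and the `B-Cancer` example are made up:

```python
def aggregate_word(subword_scores, strategy):
    """Pick one label for a word from its subword score dicts.

    `subword_scores` is a list of {label: score} dicts, one per subword,
    mimicking the per-token class scores a token classifier emits.
    """
    if strategy == "first":
        scores = subword_scores[0]  # the first subword's scores decide
    elif strategy == "max":
        # take the subword holding the single highest-scoring label
        scores = max(subword_scores, key=lambda s: max(s.values()))
    elif strategy == "average":
        labels = subword_scores[0].keys()
        scores = {l: sum(s[l] for s in subword_scores) / len(subword_scores)
                  for l in labels}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(scores, key=scores.get)

# a word split into two subwords whose predictions disagree:
sub = [{"B-Cancer": 0.4, "O": 0.6}, {"B-Cancer": 0.9, "O": 0.1}]
print(aggregate_word(sub, "first"))    # 'O'
print(aggregate_word(sub, "max"))      # 'B-Cancer'
print(aggregate_word(sub, "average"))  # 'B-Cancer' (0.65 vs 0.35)
```

The example shows why the choice matters: when subwords disagree, `first` can flip the label relative to `max` and `average`.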

license:apache-2.0
34,846
0

OpenMed-NER-ProteinDetect-BigMed-278M

Specialized model for Biomedical Entity Recognition - various biomedical entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-protein` / `I-protein`
- `B-protein_complex` / `I-protein_complex`
- `B-protein_enum` / `I-protein_enum`
- `B-protein_familiy_or_group` / `I-protein_familiy_or_group`
- `B-protein_variant` / `I-protein_variant`

The FSU (Friedrich Schiller University Jena) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications. It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms, and serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance
- F1 Score: `0.95`
- Precision: `0.94`
- Recall: `0.95`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - various biomedical entities

Training Details
- Base Model: xlm-roberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: xlm-roberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper:

Proper citation helps support and acknowledge my work. Thank you!
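The F1, precision, and recall figures reported in these cards are standard entity-level metrics. A minimal sketch of how such numbers are computed from predicted and gold entity spans (the spans below are made up for illustration; this is not the project's actual evaluation harness):

```python
def entity_prf(gold, pred):
    """Micro precision/recall/F1 over sets of (type, start, end) entity spans.

    An entity counts as a true positive only on an exact match of both
    its type and its boundaries.
    """
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

gold = {("protein", 0, 3), ("protein_complex", 7, 9), ("protein", 12, 14)}
pred = {("protein", 0, 3), ("protein_complex", 7, 9), ("protein", 20, 22)}
p, r, f = entity_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

Strict span matching like this is why entity-level F1 sits well below token-level accuracy in the tables above: one wrong boundary token fails the whole entity.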

license:apache-2.0
34,816
0

OpenMed-NER-GenomicDetect-BigMed-278M

Specialized model for Gene Entity Recognition - gene-related entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The Gellus corpus is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature. It contains comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. It serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `1.00`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: GELLUS
- Description: Gene Entity Recognition - gene-related entities

Training Details
- Base Model: xlm-roberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: xlm-roberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper:

Proper citation helps support and acknowledge my work. Thank you!
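Under the `none` strategy the pipeline returns raw per-token predictions. The decoding step behind that output is just a softmax over each token's logits followed by an argmax; a sketch with made-up logits and an illustrative tag set (not the model's real label list):

```python
import math

LABELS = ["O", "B-GENE", "I-GENE"]  # illustrative tag set, not the actual labels

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode_none(token_logits):
    """Raw per-token predictions, as the `none` strategy returns them:
    one {entity, score} record per token, no grouping."""
    out = []
    for logits in token_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        out.append({"entity": LABELS[best], "score": probs[best]})
    return out

# two tokens: the first clearly "O", the second clearly "B-GENE"
preds = decode_none([[4.0, 0.5, 0.1], [0.2, 3.5, 1.0]])
print([p["entity"] for p in preds])  # ['O', 'B-GENE']
```

All the other strategies build on exactly these per-token records, which is why `none` is the right choice when you want to inspect or post-process raw model behavior.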

license:apache-2.0
34,575
0

OpenMed-NER-AnatomyDetect-BioMed-109M

Specialized model for Anatomical Entity Recognition - anatomical structures and body parts.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for anatomical entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated ANATOMY dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The Anatomy corpus is a specialized biomedical NER dataset for recognizing anatomical entities and medical terminology in clinical and biomedical texts. It contains annotations for anatomical structures, body parts, organs, and physiological systems mentioned in medical literature, and is essential for developing clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial. The dataset supports automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature, and serves as a valuable resource for training NER models used in medical imaging, surgical planning, and clinical documentation.

Current Model Performance
- F1 Score: `0.86`
- Precision: `0.86`
- Recall: `0.86`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-AnatomyDetect-ElectraMed-560M | 0.9063 | 0.9083 | 0.9044 | 0.9825 |
| 🥈 2 | OpenMed-NER-AnatomyDetect-PubMed-335M | 0.9063 | 0.8995 | 0.9131 | 0.9851 |
| 🥉 3 | OpenMed-NER-AnatomyDetect-SuperClinical-434M | 0.9024 | 0.9040 | 0.9008 | 0.9836 |
| 4 | OpenMed-NER-AnatomyDetect-ElectraMed-335M | 0.9020 | 0.9024 | 0.9016 | 0.9787 |
| 5 | OpenMed-NER-AnatomyDetect-MultiMed-568M | 0.9012 | 0.8977 | 0.9048 | 0.9812 |
| 6 | OpenMed-NER-AnatomyDetect-PubMed-109M | 0.9004 | 0.8941 | 0.9067 | 0.9844 |
| 7 | OpenMed-NER-AnatomyDetect-SuperMedical-355M | 0.9002 | 0.8974 | 0.9029 | 0.9815 |
| 8 | OpenMed-NER-AnatomyDetect-BigMed-560M | 0.8980 | 0.9007 | 0.8954 | 0.9814 |
| 9 | OpenMed-NER-AnatomyDetect-BioMed-335M | 0.8961 | 0.8941 | 0.8982 | 0.9830 |
| 10 | OpenMed-NER-AnatomyDetect-BioClinical-108M | 0.8961 | 0.8960 | 0.8962 | 0.9768 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: ANATOMY
- Description: Anatomical Entity Recognition - anatomical structures and body parts

Training Details
- Base Model: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper:

Proper citation helps support and acknowledge my work. Thank you!
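The BIO-tagged output these cards describe becomes most useful once it is mapped back to character spans in the source text. A minimal sketch of that conversion, assuming token-level tags and character offsets are available (the sentence, tags, and offsets below are fabricated for illustration):

```python
def bio_to_spans(tags, offsets):
    """Convert parallel BIO tags and (start, end) character offsets into
    (entity_type, start, end) spans over the original text."""
    spans = []
    etype = start = end = None
    for tag, (s, e) in zip(tags, offsets):
        if tag.startswith("B-"):
            if etype is not None:          # close any span already open
                spans.append((etype, start, end))
            etype, start, end = tag[2:], s, e
        elif tag.startswith("I-") and etype == tag[2:]:
            end = e                        # extend the open span
        else:                              # "O" or a mismatched I- tag
            if etype is not None:
                spans.append((etype, start, end))
            etype = start = end = None
    if etype is not None:
        spans.append((etype, start, end))  # flush the trailing span
    return spans

text = "pain in the left knee joint"
tags = ["O", "O", "O", "B-Anatomy", "I-Anatomy", "I-Anatomy"]
offsets = [(0, 4), (5, 7), (8, 11), (12, 16), (17, 21), (22, 27)]
spans = bio_to_spans(tags, offsets)
print([(t, text[s:e]) for t, s, e in spans])  # [('Anatomy', 'left knee joint')]
```

Character spans like these are what downstream consumers (highlighting, medical coding, knowledge-graph linking) typically need, rather than token indices.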

license:apache-2.0
34,561
0

OpenMed-NER-GenomicDetect-MultiMed-335M

Specialized model for Gene Entity Recognition - Gene-related entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition (gene-related entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies the gene entities defined by the GELLUS corpus. The Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications: it is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature, with comprehensive annotations for gene names, genetic variants, and genomics-related entities essential for genetic research. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `0.98`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: GELLUS
- Description: Gene Entity Recognition - Gene-related entities

Training Details
- Base Model: bge-large-en-v1.5
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: bge-large-en-v1.5
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
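The batching guidance above amounts to feeding the pipeline fixed-size chunks of text rather than one sentence at a time. A minimal sketch of the chunking itself (the texts and sizes are illustrative; in practice you pass `batch_size` directly to the Hugging Face pipeline call):

```python
def batched(texts, batch_size):
    """Yield successive fixed-size batches from a list of texts."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Ten toy abstracts split into GPU-sized chunks of four.
abstracts = [f"Abstract {i}: BRCA1 expression was measured." for i in range(10)]
batches = list(batched(abstracts, batch_size=4))
# ten texts with batch_size=4 -> batches of sizes 4, 4, and 2
```

Start with a small `batch_size`, watch GPU memory and utilization while processing, and increase it until throughput stops improving, per the guidelines above.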
If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
34,558
0

OpenMed-NER-GenomeDetect-SnowMed-568M

Specialized model for Gene/Protein Entity Recognition - Gene and protein mentions

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene/protein entity recognition (gene and protein mentions). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies the gene and protein mentions defined by the BC2GM corpus. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining, and it continues to be widely used for training and evaluating biomedical NER models.

Current Model Performance
- F1 Score: `0.87`
- Precision: `0.86`
- Recall: `0.88`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
34,553
0

OpenMed-NER-BloodCancerDetect-SnowMed-568M

Specialized model for Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for clinical entity recognition (entities related to chronic lymphocytic leukemia). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies the clinical entities defined by the CLL corpus, which is specialized for chronic lymphocytic leukemia entity recognition in hematology and cancer research. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. This specialized corpus contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset supports the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics, and is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature. The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

Current Model Performance
- F1 Score: `0.84`
- Precision: `0.98`
- Recall: `0.74`
- Accuracy: `0.94`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.9575 | 0.9264 | 0.9907 | 0.9843 |
| 🥈 2 | OpenMed-NER-BloodCancerDetect-SuperClinical-434M | 0.8902 | 0.8652 | 0.9167 | 0.9701 |
| 🥉 3 | OpenMed-NER-BloodCancerDetect-TinyMed-82M | 0.8793 | 0.7904 | 0.9908 | 0.9449 |
| 4 | OpenMed-NER-BloodCancerDetect-TinyMed-135M | 0.8792 | 0.8750 | 0.8835 | 0.9668 |
| 5 | OpenMed-NER-BloodCancerDetect-TinyMed-65M | 0.8547 | 0.7812 | 0.9434 | 0.9686 |
| 6 | OpenMed-NER-BloodCancerDetect-SuperMedical-125M | 0.8488 | 1.0000 | 0.7373 | 0.9274 |
| 7 | OpenMed-NER-BloodCancerDetect-SnowMed-568M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 8 | OpenMed-NER-BloodCancerDetect-BigMed-278M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 9 | OpenMed-NER-BloodCancerDetect-SuperMedical-355M | 0.8421 | 0.9816 | 0.7373 | 0.9248 |
| 10 | OpenMed-NER-BloodCancerDetect-ElectraMed-335M | 0.8364 | 0.7302 | 0.9787 | 0.9581 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: CLL
- Description: Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
34,550
0

OpenMed-NER-GenomicDetect-SuperMedical-125M

Specialized model for Gene Entity Recognition - Gene-related entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition (gene-related entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies the gene entities defined by the GELLUS corpus. The Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications: it is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature, with comprehensive annotations for gene names, genetic variants, and genomics-related entities essential for genetic research. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `1.00`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: GELLUS
- Description: Gene Entity Recognition - Gene-related entities

Training Details
- Base Model: roberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: roberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge this work. Thank you!

license:apache-2.0
34,534
0

OpenMed-NER-ProteinDetect-BioClinical-108M

Specialized model for Biomedical Entity Recognition - Various biomedical entities

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition (various biomedical entities). It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-protein`
- `B-protein_complex`
- `B-protein_enum`
- `B-protein_familiy_or_group`
- `B-protein_variant`
- `I-protein`
- `I-protein_complex`
- `I-protein_enum`
- `I-protein_familiy_or_group`
- `I-protein_variant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction, with annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications, and is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. The corpus serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance
- F1 Score: `0.91`
- Precision: `0.90`
- Recall: `0.91`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with batch_size=1-4
- Single GPU: Try batch_size=8-32 depending on GPU memory
- High-end GPU: Can handle batch_size=64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - Various biomedical entities

Training Details
- Base Model: BioClinicalBERT
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioClinicalBERT
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge this work. Thank you!
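The model's raw output is BIO-tagged tokens (e.g., the card's `B-protein` / `I-protein` labels). With `aggregation_strategy="none"`, entity spans must be reassembled from those tags manually; a minimal decoder sketch over a hypothetical tokenized sentence:

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans, entity, words = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if entity is not None:
                spans.append((entity, " ".join(words)))
            entity, words = tag[2:], [token]       # start a new entity
        elif tag.startswith("I-") and entity == tag[2:]:
            words.append(token)                    # continue the current entity
        else:
            if entity is not None:
                spans.append((entity, " ".join(words)))
            entity, words = None, []               # "O" or stray "I-" ends the span
    if entity is not None:
        spans.append((entity, " ".join(words)))
    return spans

# Hypothetical tokens and tags, not real model output.
tokens = ["IkB", "alpha", "binds", "the", "p65", "subunit"]
tags = ["B-protein", "I-protein", "O", "O", "B-protein", "O"]
spans = decode_bio(tokens, tags)
# -> [("protein", "IkB alpha"), ("protein", "p65")]
```

The built-in `simple` aggregation strategy performs essentially this grouping for you; decoding manually is only needed when post-processing raw (`none`) predictions.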

license:apache-2.0
34,442
0

OpenMed-NER-ChemicalDetect-BigMed-278M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature [](https://opensource.org/licenses/Apache-2.0) []() []() [](https://huggingface.co/OpenMed) This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition - identifies chemical compounds and substances in biomedical literature. This specialized model excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction with production-ready reliability for clinical and research applications. 🎯 Key Features - High Precision: Optimized for biomedical entity recognition - Domain-Specific: Trained on curated BC4CHEMD dataset - Production-Ready: Validated on clinical benchmarks - Easy Integration: Compatible with Hugging Face Transformers ecosystem This model can identify and classify the following biomedical entities: BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature. 
Current Model Performance
- F1 Score: `0.94`
- Precision: `0.93`
- Recall: `0.94`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: xlm-roberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: xlm-roberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
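To make the `simple` aggregation strategy concrete, here is a minimal, dependency-free sketch of how adjacent BIO-tagged tokens are merged into entity spans. This is an illustration of the idea, not the actual Transformers pipeline implementation; the `CHEM` label is a hypothetical example tag.

```python
# Sketch of the "simple" aggregation strategy: adjacent tokens whose BIO
# tags share an entity type are merged into a single entity span.
def aggregate_simple(tokens, tags):
    """Group (token, BIO-tag) pairs into (entity_type, text) spans."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag == "O":  # outside any entity: flush the open span, if any
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        else:
            prefix, etype = tag.split("-", 1)
            if prefix == "B" or etype != current_type:  # a new entity starts
                if current_tokens:
                    entities.append((current_type, " ".join(current_tokens)))
                current_type, current_tokens = etype, [token]
            else:  # I- tag continuing the same entity type
                current_tokens.append(token)
    if current_tokens:  # flush the final span
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["Patients", "received", "acetylsalicylic", "acid", "daily"]
tags = ["O", "O", "B-CHEM", "I-CHEM", "O"]
print(aggregate_simple(tokens, tags))  # [('CHEM', 'acetylsalicylic acid')]
```

The same grouping logic underlies the word-based strategies as well; they differ only in how conflicting subword predictions within a word are resolved first.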

license:apache-2.0
34,438
0

OpenMed-NER-OrganismDetect-SuperClinical-434M

Specialized model for Species Entity Recognition: species names from the Species-800 dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated SPECIES800 dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species entities as annotated in the Species-800 corpus. The Species-800 corpus is a manually annotated dataset for species recognition and taxonomic classification in biomedical literature. It contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for species identification models used in ecological studies, conservation biology, and systematic biology, and is particularly useful for text mining applications in biodiversity databases and biological literature analysis.
Current Model Performance
- F1 Score: `0.84`
- Precision: `0.83`
- Recall: `0.86`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.8639 | 0.8557 | 0.8722 | 0.9715 |
| 🥈 2 | OpenMed-NER-OrganismDetect-PubMed-335M | 0.8550 | 0.8370 | 0.8737 | 0.9698 |
| 🥉 3 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8458 | 0.8287 | 0.8637 | 0.9690 |
| 4 | OpenMed-NER-OrganismDetect-MultiMed-335M | 0.8441 | 0.8352 | 0.8532 | 0.9670 |
| 5 | OpenMed-NER-OrganismDetect-SuperClinical-434M | 0.8435 | 0.8291 | 0.8585 | 0.9670 |
| 6 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8349 | 0.8082 | 0.8634 | 0.9685 |
| 7 | OpenMed-NER-OrganismDetect-MultiMed-568M | 0.8313 | 0.8053 | 0.8592 | 0.9703 |
| 8 | OpenMed-NER-OrganismDetect-ElectraMed-335M | 0.8288 | 0.8176 | 0.8404 | 0.9631 |
| 9 | OpenMed-NER-OrganismDetect-BioPatient-108M | 0.8154 | 0.8140 | 0.8169 | 0.9591 |
| 10 | OpenMed-NER-OrganismDetect-ElectraMed-33M | 0.8121 | 0.7772 | 0.8503 | 0.9600 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: SPECIES800
- Description: Species Entity Recognition - Species names from the Species-800 dataset

Training Details
- Base Model: deberta-v3-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: deberta-v3-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
34,258
0

OpenMed-NER-OrganismDetect-SuperMedical-355M

Specialized model for Species Entity Recognition: species names from the Species-800 dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated SPECIES800 dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species entities as annotated in the Species-800 corpus. The Species-800 corpus is a manually annotated dataset for species recognition and taxonomic classification in biomedical literature. It contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for species identification models used in ecological studies, conservation biology, and systematic biology, and is particularly useful for text mining applications in biodiversity databases and biological literature analysis.
Current Model Performance
- F1 Score: `0.77`
- Precision: `0.73`
- Recall: `0.83`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.8639 | 0.8557 | 0.8722 | 0.9715 |
| 🥈 2 | OpenMed-NER-OrganismDetect-PubMed-335M | 0.8550 | 0.8370 | 0.8737 | 0.9698 |
| 🥉 3 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8458 | 0.8287 | 0.8637 | 0.9690 |
| 4 | OpenMed-NER-OrganismDetect-MultiMed-335M | 0.8441 | 0.8352 | 0.8532 | 0.9670 |
| 5 | OpenMed-NER-OrganismDetect-SuperClinical-434M | 0.8435 | 0.8291 | 0.8585 | 0.9670 |
| 6 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8349 | 0.8082 | 0.8634 | 0.9685 |
| 7 | OpenMed-NER-OrganismDetect-MultiMed-568M | 0.8313 | 0.8053 | 0.8592 | 0.9703 |
| 8 | OpenMed-NER-OrganismDetect-ElectraMed-335M | 0.8288 | 0.8176 | 0.8404 | 0.9631 |
| 9 | OpenMed-NER-OrganismDetect-BioPatient-108M | 0.8154 | 0.8140 | 0.8169 | 0.9591 |
| 10 | OpenMed-NER-OrganismDetect-ElectraMed-33M | 0.8121 | 0.7772 | 0.8503 | 0.9600 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: SPECIES800
- Description: Species Entity Recognition - Species names from the Species-800 dataset

Training Details
- Base Model: roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
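As a rough illustration of the batching guideline above, here is a small, self-contained helper (hypothetical, not part of the Transformers API) that chunks a corpus so each forward pass receives at most `batch_size` texts:

```python
# Hypothetical helper: split a corpus into fixed-size batches before
# passing each batch to an NER pipeline call.
def batched(texts, batch_size):
    """Yield successive slices of `texts` with at most `batch_size` items."""
    for start in range(0, len(texts), batch_size):
        yield texts[start : start + batch_size]

corpus = [f"abstract {i}" for i in range(10)]
print([len(b) for b in batched(corpus, batch_size=4)])  # [4, 4, 2]
```

In practice the Hugging Face pipeline can handle batching itself when given a list of texts and a `batch_size` argument; explicit chunking like this is mainly useful when streaming a corpus too large to hold in memory.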

license:apache-2.0
34,121
0

OpenMed-NER-GenomeDetect-SuperClinical-184M

Specialized model for Gene/Protein Entity Recognition: gene and protein mentions.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene and protein entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and protein mentions as annotated in the BC2GM corpus. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries, and has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.
Current Model Performance
- F1 Score: `0.88`
- Precision: `0.86`
- Recall: `0.89`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: deberta-v3-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: deberta-v3-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
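The F1 scores reported in the ranking table are the harmonic mean of precision and recall, so any row can be sanity-checked directly; as a worked example, the rank-1 row above reports precision 0.8954 and recall 0.9066:

```python
# F1 is the harmonic mean of precision and recall: F1 = 2PR / (P + R).
# Checking this against the rank-1 row of the table above.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

p, r = 0.8954, 0.9066
print(round(f1_score(p, r), 4))  # 0.901, matching the reported 0.9010
```

Note that these are entity-level metrics: a prediction counts as correct only if both the span boundaries and the entity type match the gold annotation.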

license:apache-2.0
34,031
0

OpenMed-NER-BloodCancerDetect-BioMed-335M

Specialized model for Clinical Entity Recognition: clinical entities related to Chronic Lymphocytic Leukemia.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for clinical entity recognition related to chronic lymphocytic leukemia. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies clinical entities as annotated in the CLL corpus. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. It contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset supports the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics, and is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature.
The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

Current Model Performance
- F1 Score: `0.76`
- Precision: `0.71`
- Recall: `0.83`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.9575 | 0.9264 | 0.9907 | 0.9843 |
| 🥈 2 | OpenMed-NER-BloodCancerDetect-SuperClinical-434M | 0.8902 | 0.8652 | 0.9167 | 0.9701 |
| 🥉 3 | OpenMed-NER-BloodCancerDetect-TinyMed-82M | 0.8793 | 0.7904 | 0.9908 | 0.9449 |
| 4 | OpenMed-NER-BloodCancerDetect-TinyMed-135M | 0.8792 | 0.8750 | 0.8835 | 0.9668 |
| 5 | OpenMed-NER-BloodCancerDetect-TinyMed-65M | 0.8547 | 0.7812 | 0.9434 | 0.9686 |
| 6 | OpenMed-NER-BloodCancerDetect-SuperMedical-125M | 0.8488 | 1.0000 | 0.7373 | 0.9274 |
| 7 | OpenMed-NER-BloodCancerDetect-SnowMed-568M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 8 | OpenMed-NER-BloodCancerDetect-BigMed-278M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 9 | OpenMed-NER-BloodCancerDetect-SuperMedical-355M | 0.8421 | 0.9816 | 0.7373 | 0.9248 |
| 10 | OpenMed-NER-BloodCancerDetect-ElectraMed-335M | 0.8364 | 0.7302 | 0.9787 | 0.9581 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: CLL
- Description: Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

Training Details
- Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
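For word-based aggregation, the difference between `first`, `max`, and `average` comes down to how per-subword label scores within a single word are combined. A minimal sketch over one word split into two subword tokens (the labels and scores are hypothetical, chosen to make the strategies disagree):

```python
# Sketch: resolving one word's label from its subword predictions under the
# "first", "max", and "average" strategies described above.
subword_scores = [
    {"B-Disease": 0.6, "O": 0.4},  # first subword
    {"B-Disease": 0.2, "O": 0.8},  # second subword
]

def resolve(strategy, scores):
    if strategy == "first":  # the first subword's top label wins
        return max(scores[0], key=scores[0].get)
    if strategy == "max":  # the highest-scoring subword's top label wins
        best = max(scores, key=lambda s: max(s.values()))
        return max(best, key=best.get)
    if strategy == "average":  # average scores per label, then take the argmax
        avg = {l: sum(s[l] for s in scores) / len(scores) for l in scores[0]}
        return max(avg, key=avg.get)

print(resolve("first", subword_scores))    # B-Disease
print(resolve("max", subword_scores))      # O (second subword's 0.8 is highest)
print(resolve("average", subword_scores))  # O (mean O=0.6 beats B-Disease=0.4)
```

As the example shows, the strategies can disagree on the same word, which is why the choice of `aggregation_strategy` can visibly change the extracted entities.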

license:apache-2.0
34,018
0

OpenMed-NER-SpeciesDetect-TinyMed-66M

license:apache-2.0
33,999
0

OpenMed-NER-GenomeDetect-ElectraMed-33M

Specialized model for Gene/Protein Entity Recognition: gene and protein mentions.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene and protein entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and protein mentions as annotated in the BC2GM corpus. The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries, and has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.
Current Model Performance
- F1 Score: `0.80`
- Precision: `0.77`
- Recall: `0.84`
- Accuracy: `0.94`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. In summary, the available strategies are:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: e5-small-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: e5-small-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
33,951
0

OpenMed-NER-ProteinDetect-BigMed-560M

Specialized model for Biomedical Entity Recognition: various biomedical entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-protein`
- `B-proteincomplex`
- `B-proteinenum`
- `B-proteinfamiliyorgroup`
- `B-proteinvariant`
- `I-protein`
- `I-proteincomplex`
- `I-proteinenum`
- `I-proteinfamiliyorgroup`
- `I-proteinvariant`

The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research, and supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications.
It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. The corpus serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics. Current Model Performance - F1 Score: `0.95` - Precision: `0.95` - Recall: `0.96` - Accuracy: `0.98` | Rank | Model | F1 Score | Precision | Recall | Accuracy | |------|-------|----------|-----------|--------|-----------| | 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 | | 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 | | 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 | | 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 | | 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 | | 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 | | 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 | | 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 | | 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 | | 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 | Rankings based on F1-score performance across all models trained on this dataset. Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets. NOTE: The `aggregationstrategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies: - `none`: Returns raw token predictions without any aggregation. - `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`). 
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word. - `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score. - `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word. For efficient processing of large datasets, use proper batching with the `batchsize` parameter: Batch Size Guidelines: - CPU: Start with batchsize=1-4 - Single GPU: Try batchsize=8-32 depending on GPU memory - High-end GPU: Can handle batchsize=64 or higher - Monitor GPU utilization to find the optimal batch size for your hardware - Dataset: FSU - Description: Biomedical Entity Recognition - Various biomedical entities Training Details - Base Model: xlm-roberta-large - Training Framework: Hugging Face Transformers - Optimization: AdamW optimizer with learning rate scheduling - Validation: Cross-validation on held-out test set - Base Architecture: xlm-roberta-large - Task: Token Classification (Named Entity Recognition) - Labels: Dataset-specific entity types - Input: Tokenized biomedical text - Output: BIO-tagged entity predictions This model is particularly useful for: - Clinical Text Mining: Extracting entities from medical records - Biomedical Research: Processing scientific literature - Drug Discovery: Identifying chemical compounds and drugs - Healthcare Analytics: Analyzing patient data and outcomes - Academic Research: Supporting biomedical NLP research Licensed under the Apache License 2.0. See LICENSE for details. We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. 
If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
33,838
0

OpenMed-NER-PharmaDetect-PubMed-335M

Specialized model for Chemical Entity Recognition (chemical entities from the BC5CDR dataset).

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on chemical entities from the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.96`
- Precision: `0.95`
- Recall: `0.97`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32` depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRCHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: BiomedNLP-BiomedBERT-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!
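Putting the two knobs together (`aggregation_strategy` and `batch_size`), a minimal inference sketch with the Hugging Face `pipeline` API looks like the following. The repo path under the OpenMed organization is an assumption based on the model name on this card; actually running it requires network access and `transformers` installed.

```python
# Minimal usage sketch; the Hub repo path is assumed, not confirmed by this card.
MODEL_ID = "OpenMed/OpenMed-NER-PharmaDetect-PubMed-335M"

NER_KWARGS = {
    "aggregation_strategy": "simple",  # group B-/I- tokens into whole entities
    "batch_size": 16,                  # single-GPU starting point per the guidelines above
}

def build_pipeline():
    # Imported lazily so this module stays importable without transformers installed.
    from transformers import pipeline
    return pipeline("token-classification", model=MODEL_ID, **NER_KWARGS)

# Example (requires network and `pip install transformers torch`):
#   ner = build_pipeline()
#   ner("Patients received ibuprofen for inflammation management.")
```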

license:apache-2.0
33,825
0

OpenMed-NER-GenomicDetect-SuperClinical-141M

Specialized model for Gene Entity Recognition (gene-related entities).

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene entity recognition on gene-related entities. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

The Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications. It is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature, containing comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining. It is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

Current Model Performance
- F1 Score: `0.99`
- Precision: `0.99`
- Recall: `0.99`
- Accuracy: `1.00`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomicDetect-SnowMed-568M | 0.9976 | 0.9977 | 0.9975 | 0.9989 |
| 🥈 2 | OpenMed-NER-GenomicDetect-SuperMedical-355M | 0.9970 | 0.9960 | 0.9981 | 0.9986 |
| 🥉 3 | OpenMed-NER-GenomicDetect-BigMed-560M | 0.9968 | 0.9967 | 0.9969 | 0.9986 |
| 4 | OpenMed-NER-GenomicDetect-MultiMed-568M | 0.9967 | 0.9974 | 0.9960 | 0.9985 |
| 5 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9964 | 0.9957 | 0.9970 | 0.9992 |
| 6 | OpenMed-NER-GenomicDetect-PubMed-335M | 0.9963 | 0.9961 | 0.9965 | 0.9991 |
| 7 | OpenMed-NER-GenomicDetect-PubMed-109M | 0.9951 | 0.9948 | 0.9953 | 0.9991 |
| 8 | OpenMed-NER-GenomicDetect-BioMed-109M | 0.9941 | 0.9934 | 0.9949 | 0.9988 |
| 9 | OpenMed-NER-GenomicDetect-TinyMed-82M | 0.9940 | 0.9997 | 0.9884 | 0.9961 |
| 10 | OpenMed-NER-GenomicDetect-SuperMedical-125M | 0.9934 | 0.9999 | 0.9870 | 0.9958 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32` depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: GELLUS
- Description: Gene Entity Recognition - Gene-related entities

Training Details
- Base Model: deberta-v3-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: deberta-v3-small
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!
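The batch-size guidance repeated on these cards can be encoded as a small starting-point heuristic. The memory threshold below is an assumption chosen for illustration, not a benchmarked value; the real optimum depends on sequence length and model size, so treat the result as a first guess to tune while watching GPU utilization.

```python
def suggested_batch_size(device: str, gpu_memory_gb: float = 0.0) -> int:
    """Starting batch size following the guidelines above (tune from here)."""
    if device == "cpu":
        return 4            # CPU: start with batch_size=1-4
    if gpu_memory_gb >= 40:  # assumed cutoff for a "high-end" GPU
        return 64           # high-end GPU: 64 or higher
    return 16               # typical single GPU: 8-32 depending on memory
```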

license:apache-2.0
33,811
0

OpenMed-NER-DiseaseDetect-TinyMed-66M

license:apache-2.0
33,811
0

OpenMed-NER-PharmaDetect-SnowMed-568M

Specialized model for Chemical Entity Recognition (chemical entities from the BC5CDR dataset).

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition on chemical entities from the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

Current Model Performance
- F1 Score: `0.94`
- Precision: `0.94`
- Recall: `0.95`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32` depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDRCHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
33,766
0

OpenMed-NER-OncologyDetect-ElectraMed-33M

Specialized model for Cancer Genetics (cancer-related genetic entities).

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics, covering cancer-related genetic entities. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-Amino_acid`
- `B-Anatomical_system`
- `B-Cancer`
- `B-Cell`
- `B-Cellular_component`
- `B-Developing_anatomical_structure`
- `B-Gene_or_gene_product`
- `B-Immaterial_anatomical_entity`
- `B-Multi-tissue_structure`
- `B-Organ`
- `B-Organism`
- `B-Organism_subdivision`
- `B-Organism_substance`
- `B-Pathological_formation`
- `B-Simple_chemical`
- `B-Tissue`
- `I-Amino_acid`
- `I-Anatomical_system`
- `I-Cancer`
- `I-Cell`
- `I-Cellular_component`
- `I-Developing_anatomical_structure`
- `I-Gene_or_gene_product`
- `I-Immaterial_anatomical_entity`
- `I-Multi-tissue_structure`
- `I-Organ`
- `I-Organism`
- `I-Organism_subdivision`
- `I-Organism_substance`
- `I-Pathological_formation`
- `I-Simple_chemical`
- `I-Tissue`

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics. The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.00`
- Precision: `0.01`
- Recall: `0.00`
- Accuracy: `0.63`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32` depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013CG
- Description: Cancer Genetics - Cancer-related genetic entities

Training Details
- Base Model: e5-small-v2
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: e5-small-v2
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper: Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
33,760
0

OpenMed-NER-ChemicalDetect-SuperClinical-141M

Specialized model for Chemical Entity Recognition - identifies chemical compounds and substances in biomedical literature.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition in biomedical literature. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entity mentions.

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, it contains PubMed abstracts with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications, and it serves as a benchmark for evaluating named entity recognition models that identify chemical compounds, drugs, and other chemical substances mentioned in scientific literature.

Current Model Performance

- F1 Score: `0.93`
- Precision: `0.92`
- Recall: `0.94`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - identifies chemical compounds and substances in biomedical literature

Training Details

- Base Model: deberta-v3-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: deberta-v3-small
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
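The behavior of the `simple` aggregation strategy can be illustrated with a minimal pure-Python sketch (this is an illustration only, not the Hugging Face implementation; the token data and `CHEM` tag are invented for the example):

```python
def aggregate_simple(tokens):
    """Group adjacent B-/I- tokens of the same entity type into spans."""
    entities, current = [], None
    for word, tag in tokens:
        if tag == "O":
            if current:
                entities.append(current)
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current and current["type"] == etype and prefix == "I":
            current["words"].append(word)   # extend the open entity
        else:
            if current:
                entities.append(current)
            current = {"type": etype, "words": [word]}  # start a new entity
    if current:
        entities.append(current)
    return [(" ".join(e["words"]), e["type"]) for e in entities]

tokens = [("Administration", "O"), ("of", "O"),
          ("acetylsalicylic", "B-CHEM"), ("acid", "I-CHEM"),
          ("reduced", "O"), ("pain", "O")]
print(aggregate_simple(tokens))  # [('acetylsalicylic acid', 'CHEM')]
```

In practice you would simply pass `aggregation_strategy="simple"` to the pipeline instead of post-processing raw tags yourself.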

license:apache-2.0
33,738
0

OpenMed-NER-DNADetect-BioMed-335M

Specialized model for Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition covering proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types - protein, DNA, RNA, cell line, and cell type - making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been used extensively as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance

- F1 Score: `0.80`
- Precision: `0.76`
- Recall: `0.85`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types

Training Details

- Base Model: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-large-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
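The `first` word-level strategy described above can be sketched in a few lines of plain Python (an illustration only, not the Hugging Face implementation; the subword splits, tags, and scores are invented):

```python
def aggregate_first(word_subtokens):
    """For each word, take the tag of its first subword token.

    word_subtokens: list of (word, [(subtoken, tag), ...]) pairs.
    """
    return [(word, subs[0][1]) for word, subs in word_subtokens]

# Hypothetical subword tokenization where one word received mixed tags.
words = [
    ("interleukin-2", [("inter", "B-protein"), ("leukin", "I-protein"), ("-2", "I-DNA")]),
    ("gene", [("gene", "B-DNA")]),
]
print(aggregate_first(words))
# [('interleukin-2', 'B-protein'), ('gene', 'B-DNA')]
```

Note how the stray `I-DNA` tag on the last subword is overridden by the first subword's `B-protein` tag, so the whole word is labeled consistently.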

license:apache-2.0
33,515
0

OpenMed-NER-PharmaDetect-ModernClinical-395M

Specialized model for Chemical Entity Recognition - chemical entities from the BC5CDR dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing chemical entities from the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR_CHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies chemical entity mentions.

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. The dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances relevant to understanding chemical-disease relationships, and it serves as a critical resource for developing NER systems that identify chemical entities for downstream tasks such as adverse drug reaction detection and drug repurposing research.

Current Model Performance

- F1 Score: `0.95`
- Precision: `0.95`
- Recall: `0.96`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PharmaDetect-SuperClinical-434M | 0.9614 | 0.9520 | 0.9710 | 0.9892 |
| 🥈 2 | OpenMed-NER-PharmaDetect-MultiMed-335M | 0.9610 | 0.9585 | 0.9634 | 0.9871 |
| 🥉 3 | OpenMed-NER-PharmaDetect-ElectraMed-335M | 0.9594 | 0.9539 | 0.9649 | 0.9863 |
| 4 | OpenMed-NER-PharmaDetect-PubMed-335M | 0.9587 | 0.9521 | 0.9654 | 0.9902 |
| 5 | OpenMed-NER-PharmaDetect-SuperMedical-355M | 0.9585 | 0.9520 | 0.9651 | 0.9881 |
| 6 | OpenMed-NER-PharmaDetect-BioPatient-108M | 0.9583 | 0.9511 | 0.9656 | 0.9857 |
| 7 | OpenMed-NER-PharmaDetect-ElectraMed-560M | 0.9562 | 0.9483 | 0.9642 | 0.9888 |
| 8 | OpenMed-NER-PharmaDetect-BioClinical-108M | 0.9560 | 0.9504 | 0.9617 | 0.9849 |
| 9 | OpenMed-NER-PharmaDetect-PubMed-109M | 0.9555 | 0.9417 | 0.9697 | 0.9889 |
| 10 | OpenMed-NER-PharmaDetect-SuperMedical-125M | 0.9550 | 0.9442 | 0.9662 | 0.9871 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC5CDR_CHEM
- Description: Chemical Entity Recognition - chemical entities from the BC5CDR dataset

Training Details

- Base Model: BioClinical-ModernBERT-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioClinical-ModernBERT-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
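The batching guidance above amounts to slicing your document list into fixed-size chunks and feeding each chunk through the pipeline. A minimal pure-Python sketch of the chunking itself (the `abstracts` list is placeholder data):

```python
def batched(docs, batch_size):
    """Yield successive slices of `docs` of length `batch_size`.

    The final batch may be shorter when len(docs) is not a multiple
    of batch_size.
    """
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

abstracts = [f"abstract {i}" for i in range(10)]
batches = list(batched(abstracts, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

With the Transformers pipeline you would normally pass `batch_size` directly rather than chunking by hand; the sketch just shows what that parameter controls.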

license:apache-2.0
33,442
0

OpenMed-NER-SpeciesDetect-TinyMed-82M

license:apache-2.0
33,437
0

OpenMed-NER-SpeciesDetect-ModernMed-395M

Specialized model for Species Entity Recognition - species and organism names.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing species and organism names. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated LINNAEUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species entity mentions.

The Linnaeus corpus is designed for species name identification and taxonomic entity recognition in biomedical literature. It is a specialized biomedical NER dataset focused on species name identification and organism recognition in scientific literature. Named after Carl Linnaeus, who established modern taxonomic nomenclature, the corpus contains annotations for species mentions that are normalized to NCBI Taxonomy identifiers. The dataset is crucial for biodiversity informatics, ecological research, and biological literature mining, where accurate organism identification is essential. It supports the development of text mining systems for taxonomic studies, species distribution research, and comparative genomics, and addresses the challenge of recognizing both scientific and common names of organisms across diverse biological texts.

Current Model Performance

- F1 Score: `0.90`
- Precision: `0.93`
- Recall: `0.87`
- Accuracy: `0.99`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-SpeciesDetect-PubMed-335M | 0.9649 | 0.9582 | 0.9718 | 0.9967 |
| 🥈 2 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9543 | 0.9422 | 0.9667 | 0.9956 |
| 🥉 3 | OpenMed-NER-SpeciesDetect-BioMed-335M | 0.9539 | 0.9441 | 0.9638 | 0.9957 |
| 4 | OpenMed-NER-SpeciesDetect-SuperClinical-434M | 0.9534 | 0.9369 | 0.9704 | 0.9959 |
| 5 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9502 | 0.9317 | 0.9695 | 0.9951 |
| 6 | OpenMed-NER-SpeciesDetect-MultiMed-335M | 0.9479 | 0.9286 | 0.9680 | 0.9955 |
| 7 | OpenMed-NER-SpeciesDetect-MultiMed-568M | 0.9460 | 0.9312 | 0.9613 | 0.9957 |
| 8 | OpenMed-NER-SpeciesDetect-SuperMedical-355M | 0.9433 | 0.9221 | 0.9655 | 0.9953 |
| 9 | OpenMed-NER-SpeciesDetect-SuperClinical-141M | 0.9406 | 0.9290 | 0.9525 | 0.9950 |
| 10 | OpenMed-NER-SpeciesDetect-ModernClinical-395M | 0.9385 | 0.9379 | 0.9392 | 0.9940 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: LINNAEUS
- Description: Species Entity Recognition - species and organism names

Training Details

- Base Model: ModernBERT-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: ModernBERT-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
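The `max` word-level strategy can be sketched in isolation: the whole word takes the label of its highest-scoring subword token. This is an illustrative sketch only (the subwords, tags, and scores below are invented, and real pipeline scores come from the model's softmax output):

```python
def aggregate_max(subtokens):
    """subtokens: [(subtoken, label, score), ...] for one word.

    Return the label of the highest-scoring subtoken.
    """
    return max(subtokens, key=lambda t: t[2])[1]

# Hypothetical subword split of a species name with per-token scores.
word = [("Esch", "B-species", 0.61), ("erichia", "I-species", 0.97)]
print(aggregate_max(word))  # 'I-species'
```

Because the second subtoken scores highest, its `I-species` label wins; the pipeline then resolves the B-/I- prefix when assembling word-level entities.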

license:apache-2.0
33,433
0

OpenMed-NER-DNADetect-ElectraMed-560M

Specialized model for Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition covering proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types - protein, DNA, RNA, cell line, and cell type - making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been used extensively as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance

- F1 Score: `0.79`
- Precision: `0.75`
- Recall: `0.84`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types

Training Details

- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
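The `average` strategy described above averages the per-label score vectors of a word's subtokens before taking the argmax. A minimal pure-Python sketch (illustrative only; the label set and score rows are invented, not real model output):

```python
def aggregate_average(score_rows, labels):
    """Average per-label scores over a word's subtokens, then argmax.

    score_rows: one list of per-label scores per subtoken.
    """
    n = len(score_rows)
    avg = [sum(row[j] for row in score_rows) / n for j in range(len(labels))]
    return labels[avg.index(max(avg))]

labels = ["O", "B-protein", "B-DNA"]
subtoken_scores = [
    [0.1, 0.6, 0.3],   # first subtoken favors B-protein
    [0.1, 0.3, 0.6],   # second favors B-DNA
    [0.0, 0.8, 0.2],   # third favors B-protein
]
print(aggregate_average(subtoken_scores, labels))  # 'B-protein'
```

Averaging makes the decision robust to a single noisy subtoken, unlike `first` or `max`, which commit to one subtoken's prediction.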

license:apache-2.0
33,382
0

OpenMed-NER-ProteinDetect-BioMed-109M

Specialized model for Biomedical Entity Recognition - various biomedical entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition across various entity types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features

- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:

- `B-protein` / `I-protein`
- `B-protein_complex` / `I-protein_complex`
- `B-protein_enum` / `I-protein_enum`
- `B-protein_familiy_or_group` / `I-protein_familiy_or_group`
- `B-protein_variant` / `I-protein_variant`

The FSU corpus focuses on protein interactions and molecular biology entities for systems biology research. The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications. It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms, and it serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance

- F1 Score: `0.93`
- Precision: `0.92`
- Recall: `0.94`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:

- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:

- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition - various biomedical entities

Training Details

- Base Model: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedELECTRA-base-uncased-abstract
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
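Since the model's output is BIO-tagged, the underlying entity-type inventory can be recovered from a label list like the one above by stripping the `B-`/`I-` prefixes. A small sketch (the label list below is a subset used for illustration):

```python
def entity_types(bio_labels):
    """Derive the sorted set of entity types from a BIO label inventory."""
    return sorted({lab.split("-", 1)[1] for lab in bio_labels if lab != "O"})

labels = ["O", "B-protein", "I-protein", "B-protein_complex", "I-protein_complex",
          "B-protein_variant", "I-protein_variant"]
print(entity_types(labels))  # ['protein', 'protein_complex', 'protein_variant']
```

This is handy when configuring downstream consumers (e.g., a knowledge-graph schema) that care about entity types rather than raw BIO tags.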

license:apache-2.0
33,325
0

OpenMed-NER-GenomeDetect-TinyMed-66M

license:apache-2.0
33,059
0

OpenMed-NER-ChemicalDetect-BioPatient-108M

Specialized model for Chemical Entity Recognition: identifies chemical compounds and substances in biomedical literature.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for chemical entity recognition in biomedical literature. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, it contains PubMed abstracts with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is designed to advance automated chemical name recognition for drug discovery, pharmacology, and chemical biology, and serves as a benchmark for evaluating named entity recognition models on chemical compounds, drugs, and other chemical substances mentioned in scientific literature.

Current Model Performance
- F1 Score: `0.94`
- Precision: `0.94`
- Recall: `0.94`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ChemicalDetect-PubMed-335M | 0.9540 | 0.9498 | 0.9582 | 0.9902 |
| 🥈 2 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9490 | 0.9447 | 0.9534 | 0.9891 |
| 🥉 3 | OpenMed-NER-ChemicalDetect-PubMed-109M | 0.9487 | 0.9418 | 0.9557 | 0.9892 |
| 4 | OpenMed-NER-ChemicalDetect-SnowMed-568M | 0.9485 | 0.9469 | 0.9502 | 0.9891 |
| 5 | OpenMed-NER-ChemicalDetect-ElectraMed-560M | 0.9480 | 0.9455 | 0.9505 | 0.9890 |
| 6 | OpenMed-NER-ChemicalDetect-SuperClinical-434M | 0.9469 | 0.9427 | 0.9512 | 0.9881 |
| 7 | OpenMed-NER-ChemicalDetect-SuperMedical-355M | 0.9462 | 0.9418 | 0.9507 | 0.9875 |
| 8 | OpenMed-NER-ChemicalDetect-MultiMed-335M | 0.9460 | 0.9435 | 0.9485 | 0.9857 |
| 9 | OpenMed-NER-ChemicalDetect-MultiMed-568M | 0.9459 | 0.9437 | 0.9481 | 0.9885 |
| 10 | OpenMed-NER-ChemicalDetect-BigMed-560M | 0.9454 | 0.9376 | 0.9534 | 0.9888 |

Rankings are based on F1-score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entire word the entity label of the token with the highest score.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with a `batch_size` of 1-4
- Single GPU: Try a `batch_size` of 8-32, depending on GPU memory
- High-end GPU: Can handle a `batch_size` of 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition, identifying chemical compounds and substances in biomedical literature

Training Details
- Base Model: BioDischargeSummaryBERT
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioDischargeSummaryBERT
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the associated paper; proper citation helps support and acknowledge this work. Thank you!
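The `simple` strategy above can be illustrated with a minimal pure-Python sketch that merges adjacent `B-`/`I-` tokens of the same type into entity spans. This is an illustration only, not the Hugging Face implementation; the helper `aggregate_simple` and the `CHEM` label are hypothetical names.

```python
def aggregate_simple(tokens, tags):
    """Merge adjacent BIO-tagged tokens of the same entity type into spans."""
    entities = []
    current_words, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":                      # outside any entity: flush the span
            if current_type:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
            continue
        prefix, etype = tag.split("-", 1)   # e.g. "B-CHEM" -> ("B", "CHEM")
        if prefix == "B" or etype != current_type:
            if current_type:                # a new entity starts: flush the old one
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], etype
        else:                               # I- tag continuing the same type
            current_words.append(token)
    if current_type:                        # flush a span that ends the sentence
        entities.append((" ".join(current_words), current_type))
    return entities

tokens = ["Patients", "received", "acetylsalicylic", "acid", "daily"]
tags = ["O", "O", "B-CHEM", "I-CHEM", "O"]
print(aggregate_simple(tokens, tags))  # [('acetylsalicylic acid', 'CHEM')]
```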

license:apache-2.0
33,047
0

OpenMed-NER-PathologyDetect-BigMed-278M

Specialized model for Disease Entity Recognition: disease entities from the NCBI dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the NCBI Disease dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

The NCBI Disease corpus is a comprehensive resource for disease name recognition and concept normalization. It is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, the corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. It is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications, and serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

Current Model Performance
- F1 Score: `0.87`
- Precision: `0.85`
- Recall: `0.89`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9110 | 0.8918 | 0.9310 | 0.9792 |
| 🥈 2 | OpenMed-NER-PathologyDetect-PubMed-335M | 0.9086 | 0.8913 | 0.9266 | 0.9781 |
| 🥉 3 | OpenMed-NER-PathologyDetect-BioMed-335M | 0.9052 | 0.8867 | 0.9244 | 0.9780 |
| 4 | OpenMed-NER-PathologyDetect-SuperClinical-434M | 0.9035 | 0.8772 | 0.9314 | 0.9760 |
| 5 | OpenMed-NER-PathologyDetect-PubMed-109M | 0.9022 | 0.8825 | 0.9227 | 0.9769 |
| 6 | OpenMed-NER-PathologyDetect-ElectraMed-335M | 0.8977 | 0.8884 | 0.9073 | 0.9719 |
| 7 | OpenMed-NER-PathologyDetect-ElectraMed-560M | 0.8950 | 0.8749 | 0.9161 | 0.9747 |
| 8 | OpenMed-NER-PathologyDetect-MultiMed-335M | 0.8903 | 0.8749 | 0.9063 | 0.9692 |
| 9 | OpenMed-NER-PathologyDetect-SnowMed-568M | 0.8903 | 0.8684 | 0.9133 | 0.9731 |
| 10 | OpenMed-NER-PathologyDetect-SuperClinical-141M | 0.8894 | 0.8633 | 0.9172 | 0.9744 |

Rankings are based on F1-score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entire word the entity label of the token with the highest score.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with a `batch_size` of 1-4
- Single GPU: Try a `batch_size` of 8-32, depending on GPU memory
- High-end GPU: Can handle a `batch_size` of 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition, covering disease entities from the NCBI dataset

Training Details
- Base Model: xlm-roberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: xlm-roberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the associated paper; proper citation helps support and acknowledge this work. Thank you!
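The batching guidance above can be sketched in plain Python: split a large document list into `batch_size` chunks before feeding them to the pipeline. The helper `batched` and the `ner_pipeline` object are hypothetical names, and the pipeline call is left as a comment.

```python
def batched(items, batch_size):
    """Yield successive batches of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

docs = [f"clinical note {i}" for i in range(10)]
batches = list(batched(docs, batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]

# for batch in batches:
#     results = ner_pipeline(batch)   # hypothetical pipeline object
```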

license:apache-2.0
33,042
1

OpenMed-NER-OncologyDetect-SuperClinical-141M

Specialized model for Cancer Genetics: cancer-related genetic entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy on cancer-related genetic entities. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-Amino_acid` / `I-Amino_acid`
- `B-Anatomical_system` / `I-Anatomical_system`
- `B-Cancer` / `I-Cancer`
- `B-Cell` / `I-Cell`
- `B-Cellular_component` / `I-Cellular_component`
- `B-Developing_anatomical_structure` / `I-Developing_anatomical_structure`
- `B-Gene_or_gene_product` / `I-Gene_or_gene_product`
- `B-Immaterial_anatomical_entity` / `I-Immaterial_anatomical_entity`
- `B-Multi-tissue_structure` / `I-Multi-tissue_structure`
- `B-Organ` / `I-Organ`
- `B-Organism` / `I-Organism`
- `B-Organism_subdivision` / `I-Organism_subdivision`
- `B-Organism_substance` / `I-Organism_substance`
- `B-Pathological_formation` / `I-Pathological_formation`
- `B-Simple_chemical` / `I-Simple_chemical`
- `B-Tissue` / `I-Tissue`

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics. The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. It contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature, and serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.77`
- Precision: `0.75`
- Recall: `0.79`
- Accuracy: `0.91`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings are based on F1-score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entire word the entity label of the token with the highest score.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with a `batch_size` of 1-4
- Single GPU: Try a `batch_size` of 8-32, depending on GPU memory
- High-end GPU: Can handle a `batch_size` of 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: BIONLP2013CG
- Description: Cancer Genetics, covering cancer-related genetic entities

Training Details
- Base Model: deberta-v3-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: deberta-v3-small
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the associated paper; proper citation helps support and acknowledge this work. Thank you!
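Each `B-`/`I-` label pair in a BIO scheme corresponds to one entity type. A small sketch (the helper `entity_types` is hypothetical) showing how to collapse a model's BIO label inventory into its unique entity types:

```python
def entity_types(bio_labels):
    """Collapse a BIO label inventory (B-X, I-X, O) to its unique entity types."""
    return sorted({lab.split("-", 1)[1] for lab in bio_labels if lab != "O"})

labels = ["O", "B-Cancer", "I-Cancer", "B-Cell", "I-Cell", "B-Organ", "I-Organ"]
print(entity_types(labels))  # ['Cancer', 'Cell', 'Organ']
```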

license:apache-2.0
32,641
0

OpenMed-NER-PathologyDetect-BioMed-335M

license:apache-2.0
32,466
0

OpenMed-NER-AnatomyDetect-SuperClinical-184M

license:apache-2.0
32,422
0

OpenMed-NER-GenomeDetect-SuperMedical-125M

license:apache-2.0
32,385
0

OpenMed-NER-SpeciesDetect-PubMed-109M

license:apache-2.0
32,385
0

OpenMed-NER-DNADetect-SuperClinical-141M

license:apache-2.0
32,377
0

OpenMed-NER-OrganismDetect-SuperClinical-184M

license:apache-2.0
32,342
0

OpenMed-NER-BloodCancerDetect-ModernMed-149M

license:apache-2.0
32,338
0

OpenMed-NER-ChemicalDetect-SuperClinical-184M

license:apache-2.0
32,335
0

OpenMed-NER-DNADetect-ModernClinical-149M

Specialized model for Biomedical Entity Recognition: proteins, DNA, RNA, cell lines, and cell types.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition covering proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types (protein, DNA, RNA, cell line, and cell type), making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been used extensively as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.77`
- Precision: `0.73`
- Recall: `0.81`
- Accuracy: `0.92`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1-score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entire word the entity label of the token with the highest score.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with a `batch_size` of 1-4
- Single GPU: Try a `batch_size` of 8-32, depending on GPU memory
- High-end GPU: Can handle a `batch_size` of 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition, covering proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: BioClinical-ModernBERT-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BioClinical-ModernBERT-base
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the associated paper; proper citation helps support and acknowledge this work. Thank you!
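The difference between the word-based strategies (`first`, `max`, `average`) can be made concrete with a toy example: one word split into two subtokens whose per-label scores disagree. The helper `word_label` and the scores are hypothetical; a real pipeline operates on model logits rather than hand-written dictionaries.

```python
def word_label(subtoken_scores, strategy):
    """Pick one label for a word from its subtokens' {label: score} dicts."""
    if strategy == "first":
        scores = subtoken_scores[0]                       # first subtoken decides
    elif strategy == "max":
        scores = max(subtoken_scores,                     # subtoken with the
                     key=lambda s: max(s.values()))       # highest single score
    elif strategy == "average":
        labels = subtoken_scores[0]
        scores = {lab: sum(s[lab] for s in subtoken_scores) / len(subtoken_scores)
                  for lab in labels}                      # mean score per label
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(scores, key=scores.get)

# One word split into two subtokens with conflicting top labels:
subs = [{"B-protein": 0.6, "B-DNA": 0.4},
        {"B-protein": 0.1, "B-DNA": 0.9}]
print(word_label(subs, "first"))    # B-protein
print(word_label(subs, "max"))      # B-DNA  (0.9 is the highest single score)
print(word_label(subs, "average"))  # B-DNA  (mean 0.65 vs 0.35)
```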

license:apache-2.0
32,314
0

OpenMed-NER-OrganismDetect-TinyMed-135M

license:apache-2.0
32,313
0

OpenMed-NER-BloodCancerDetect-MultiMed-568M

Specialized model for Clinical Entity Recognition: clinical entities related to Chronic Lymphocytic Leukemia.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for clinical entity recognition related to Chronic Lymphocytic Leukemia. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

The CLL corpus is specialized for chronic lymphocytic leukemia entity recognition in hematology and cancer research. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. It contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset is designed to support the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics, and is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature. The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

Current Model Performance
- F1 Score: `0.82`
- Precision: `0.90`
- Recall: `0.76`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.9575 | 0.9264 | 0.9907 | 0.9843 |
| 🥈 2 | OpenMed-NER-BloodCancerDetect-SuperClinical-434M | 0.8902 | 0.8652 | 0.9167 | 0.9701 |
| 🥉 3 | OpenMed-NER-BloodCancerDetect-TinyMed-82M | 0.8793 | 0.7904 | 0.9908 | 0.9449 |
| 4 | OpenMed-NER-BloodCancerDetect-TinyMed-135M | 0.8792 | 0.8750 | 0.8835 | 0.9668 |
| 5 | OpenMed-NER-BloodCancerDetect-TinyMed-65M | 0.8547 | 0.7812 | 0.9434 | 0.9686 |
| 6 | OpenMed-NER-BloodCancerDetect-SuperMedical-125M | 0.8488 | 1.0000 | 0.7373 | 0.9274 |
| 7 | OpenMed-NER-BloodCancerDetect-SnowMed-568M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 8 | OpenMed-NER-BloodCancerDetect-BigMed-278M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 9 | OpenMed-NER-BloodCancerDetect-SuperMedical-355M | 0.8421 | 0.9816 | 0.7373 | 0.9248 |
| 10 | OpenMed-NER-BloodCancerDetect-ElectraMed-335M | 0.8364 | 0.7302 | 0.9787 | 0.9581 |

Rankings are based on F1-score across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if the tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entire word the entity label of the token with the highest score.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with a `batch_size` of 1-4
- Single GPU: Try a `batch_size` of 8-32, depending on GPU memory
- High-end GPU: Can handle a `batch_size` of 64 or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: CLL
- Description: Clinical Entity Recognition, covering clinical entities related to Chronic Lymphocytic Leukemia

Training Details
- Base Model: bge-m3
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: bge-m3
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the associated paper; proper citation helps support and acknowledge this work. Thank you!
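The precision, recall, and F1 figures reported in the leaderboard tables are typically computed at the entity level: a prediction counts as correct only if both its span and its type match the gold annotation. A minimal sketch over sets of (span, type) pairs (the helper `prf1` is hypothetical):

```python
def prf1(gold, predicted):
    """Entity-level precision, recall, and F1 over sets of (span, type) pairs."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # exact span-and-type matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Three gold entities; the model finds two of them plus one spurious span:
gold = {((0, 3), "Disease"), ((10, 14), "Disease"), ((20, 25), "Disease")}
pred = {((0, 3), "Disease"), ((10, 14), "Disease"), ((30, 33), "Disease")}
p, r, f = prf1(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```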

license:apache-2.0
32,310
0

OpenMed-NER-SpeciesDetect-BigMed-278M

license:apache-2.0
32,299
0

OpenMed-NER-AnatomyDetect-TinyMed-65M

license:apache-2.0
32,288
0

OpenMed-NER-AnatomyDetect-TinyMed-82M

license:apache-2.0
32,280
0

OpenMed-NER-BloodCancerDetect-PubMed-v2-109M

license:apache-2.0
32,269
0

OpenMed-NER-OrganismDetect-ModernMed-395M

license:apache-2.0
32,268
0

OpenMed-NER-ProteinDetect-TinyMed-65M

license:apache-2.0
32,083
0

OpenMed-NER-GenomicDetect-ElectraMed-33M

license:apache-2.0
32,066
0

OpenMed-NER-OrganismDetect-SnowMed-568M

Specialized model for Species Entity Recognition - Species names from the Species-800 dataset [](https://opensource.org/licenses/Apache-2.0) []() []() [](https://huggingface.co/OpenMed) This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species entity recognition - species names from the species-800 dataset. This specialized model excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction with production-ready reliability for clinical and research applications. 🎯 Key Features - High Precision: Optimized for biomedical entity recognition - Domain-Specific: Trained on curated SPECIES800 dataset - Production-Ready: Validated on clinical benchmarks - Easy Integration: Compatible with Hugging Face Transformers ecosystem This model can identify and classify the following biomedical entities: Species800 is a corpus for species recognition and taxonomy classification in biomedical texts. The Species800 corpus is a manually annotated dataset designed for species recognition and taxonomic classification in biomedical literature. This corpus contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific names and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for evaluating species identification models used in ecological studies, conservation biology, and systematic biology research. The corpus is particularly useful for text mining applications in biodiversity databases and biological literature analysis. 
Current Model Performance
- F1 Score: `0.79`
- Precision: `0.76`
- Recall: `0.82`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.8639 | 0.8557 | 0.8722 | 0.9715 |
| 🥈 2 | OpenMed-NER-OrganismDetect-PubMed-335M | 0.8550 | 0.8370 | 0.8737 | 0.9698 |
| 🥉 3 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8458 | 0.8287 | 0.8637 | 0.9690 |
| 4 | OpenMed-NER-OrganismDetect-MultiMed-335M | 0.8441 | 0.8352 | 0.8532 | 0.9670 |
| 5 | OpenMed-NER-OrganismDetect-SuperClinical-434M | 0.8435 | 0.8291 | 0.8585 | 0.9670 |
| 6 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8349 | 0.8082 | 0.8634 | 0.9685 |
| 7 | OpenMed-NER-OrganismDetect-MultiMed-568M | 0.8313 | 0.8053 | 0.8592 | 0.9703 |
| 8 | OpenMed-NER-OrganismDetect-ElectraMed-335M | 0.8288 | 0.8176 | 0.8404 | 0.9631 |
| 9 | OpenMed-NER-OrganismDetect-BioPatient-108M | 0.8154 | 0.8140 | 0.8169 | 0.9591 |
| 10 | OpenMed-NER-OrganismDetect-ElectraMed-33M | 0.8121 | 0.7772 | 0.8503 | 0.9600 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: SPECIES800
- Description: Species Entity Recognition, species names from the Species-800 dataset

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge my work. Thank you!
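As a concrete illustration of the `simple` strategy described above, here is a minimal, framework-free sketch of how adjacent B-/I- token predictions are merged into entity spans. The token list and the `SPECIES` tag name are hypothetical examples for illustration, not output from this model; real pipelines also merge scores and character offsets.

```python
def aggregate_simple(tokens, tags):
    """Group adjacent B-/I- tagged tokens of the same type into entities.

    A simplified illustration of the Hugging Face `simple`
    aggregation strategy for token-classification output.
    """
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":  # outside any entity: close the running span
            if current:
                entities.append(current)
                current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current and current["type"] == etype and prefix == "I":
            current["text"] += " " + token  # extend the running entity
        else:
            if current:
                entities.append(current)
            current = {"type": etype, "text": token}  # start a new entity
    if current:
        entities.append(current)
    return entities

# Hypothetical token predictions for a species-tagged sentence:
tokens = ["Escherichia", "coli", "infects", "humans"]
tags = ["B-SPECIES", "I-SPECIES", "O", "B-SPECIES"]
print(aggregate_simple(tokens, tags))
# → [{'type': 'SPECIES', 'text': 'Escherichia coli'}, {'type': 'SPECIES', 'text': 'humans'}]
```

The `none` strategy would instead return the four per-token predictions unmerged.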

license:apache-2.0
32,033
0

OpenMed-NER-PharmaDetect-ElectraMed-109M

license:apache-2.0
32,029
0

OpenMed-NER-OrganismDetect-TinyMed-66M

license:apache-2.0
32,012
0

OpenMed-NER-DNADetect-BigMed-560M

Specialized model for Biomedical Entity Recognition: proteins, DNA, RNA, cell lines, and cell types.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types (protein, DNA, RNA, cell line, and cell type), making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks.
It has been extensively used as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.81`
- Precision: `0.77`
- Recall: `0.86`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition, proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: xlm-roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: xlm-roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge my work. Thank you!
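The batch-size guidance above can be applied with a simple chunking helper. This is a minimal, stdlib-only sketch; the `chunked` helper and the placeholder sentences are illustrative and not part of any model's API, and in practice each batch would be passed to a `transformers` token-classification pipeline.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield successive batches of at most `batch_size` items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

# Illustrative usage: feed batches of sentences to an NER pipeline.
sentences = [f"sentence {i}" for i in range(10)]
batches = list(chunked(sentences, batch_size=4))
print([len(b) for b in batches])
# → [4, 4, 2]
```

Start small (e.g., `batch_size=4` on CPU) and increase while monitoring GPU memory and utilization, as the guidelines suggest.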

license:apache-2.0
31,971
0

OpenMed-NER-DNADetect-PubMed-109M

Specialized model for Biomedical Entity Recognition: proteins, DNA, RNA, cell lines, and cell types.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types (protein, DNA, RNA, cell line, and cell type), making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks.
It has been extensively used as a benchmark for evaluating biomedical NER systems and remains a standard evaluation dataset for machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.79`
- Precision: `0.74`
- Recall: `0.84`
- Accuracy: `0.93`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: JNLPBA
- Description: Biomedical Entity Recognition, proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge my work. Thank you!
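A BIO label set like the JNLPBA one maps directly onto a token-classification head's `id2label`/`label2id` configuration. Here is a sketch of building such a mapping; the index order shown is illustrative and may differ from the released checkpoint's actual `config.json`.

```python
# The five JNLPBA entity types, each expanded to B-/I- tags plus "O".
entity_types = ["DNA", "RNA", "cell_line", "cell_type", "protein"]
labels = ["O"] + [f"{prefix}-{etype}"
                  for etype in entity_types
                  for prefix in ("B", "I")]

# Mappings as used by transformers token-classification configs.
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

print(len(labels), id2label[0], id2label[1])
# → 11 O B-DNA
```

Five entity types thus yield an 11-way classification over each token (2 tags per type plus the outside tag).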

license:apache-2.0
31,965
0

OpenMed-NER-DiseaseDetect-SuperMedical-355M

license:apache-2.0
30,978
2

OpenMed-NER-AnatomyDetect-SuperClinical-141M

license:apache-2.0
30,916
0

OpenMed-NER-ProteinDetect-PubMed-335M

license:apache-2.0
30,899
0

OpenMed-NER-SpeciesDetect-TinyMed-135M

license:apache-2.0
29,587
0

OpenMed-NER-PathologyDetect-TinyMed-82M

license:apache-2.0
29,495
0

OpenMed-NER-BloodCancerDetect-BioClinical-108M

license:apache-2.0
29,168
0

OpenMed-NER-DiseaseDetect-ModernClinical-395M

license:apache-2.0
29,109
0

OpenMed-NER-ProteinDetect-MultiMed-568M

Specialized model for Biomedical Entity Recognition: protein-related biomedical entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for biomedical entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated FSU dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-protein` / `I-protein`
- `B-protein_complex` / `I-protein_complex`
- `B-protein_enum` / `I-protein_enum`
- `B-protein_familiy_or_group` / `I-protein_familiy_or_group`
- `B-protein_variant` / `I-protein_variant`

The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports the development of text-mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications.
It is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms, and serves as a benchmark for evaluating NER systems used in proteomics research, drug discovery, and molecular biology informatics.

Current Model Performance
- F1 Score: `0.96`
- Precision: `0.96`
- Recall: `0.96`
- Accuracy: `0.98`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-ProteinDetect-SnowMed-568M | 0.9609 | 0.9576 | 0.9642 | 0.9803 |
| 🥈 2 | OpenMed-NER-ProteinDetect-ElectraMed-560M | 0.9609 | 0.9581 | 0.9636 | 0.9802 |
| 🥉 3 | OpenMed-NER-ProteinDetect-MultiMed-568M | 0.9579 | 0.9564 | 0.9595 | 0.9788 |
| 4 | OpenMed-NER-ProteinDetect-BigMed-560M | 0.9549 | 0.9520 | 0.9578 | 0.9778 |
| 5 | OpenMed-NER-ProteinDetect-SuperMedical-355M | 0.9547 | 0.9517 | 0.9576 | 0.9749 |
| 6 | OpenMed-NER-ProteinDetect-EuroMed-212M | 0.9482 | 0.9482 | 0.9482 | 0.9770 |
| 7 | OpenMed-NER-ProteinDetect-BigMed-278M | 0.9466 | 0.9434 | 0.9499 | 0.9738 |
| 8 | OpenMed-NER-ProteinDetect-SuperMedical-125M | 0.9465 | 0.9423 | 0.9507 | 0.9714 |
| 9 | OpenMed-NER-ProteinDetect-SuperClinical-434M | 0.9412 | 0.9351 | 0.9474 | 0.9802 |
| 10 | OpenMed-NER-ProteinDetect-TinyMed-82M | 0.9398 | 0.9331 | 0.9467 | 0.9680 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: FSU
- Description: Biomedical Entity Recognition, various biomedical entities

Training Details
- Base Model: bge-m3
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: bge-m3
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
26,934
1

OpenMed-NER-ProteinDetect-PubMed-109M

license:apache-2.0
25,962
0

OpenMed-NER-SpeciesDetect-BigMed-560M

Specialized model for Species Entity Recognition: species and organism names.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing species and organism names. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated LINNAEUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species mentions as annotated in the LINNAEUS corpus.

The LINNAEUS corpus is a specialized biomedical NER dataset focused on species name identification and organism recognition in scientific literature. Named after Carl Linnaeus, who established modern taxonomic nomenclature, the corpus contains annotations for species mentions that are normalized to NCBI Taxonomy identifiers. The dataset is crucial for biodiversity informatics, ecological research, and biological literature mining, where accurate organism identification is essential. It supports the development of text-mining systems for taxonomic studies, species distribution research, and comparative genomics, and addresses the challenge of recognizing both scientific and common names of organisms across diverse biological texts.
Current Model Performance
- F1 Score: `0.72`
- Precision: `0.66`
- Recall: `0.79`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-SpeciesDetect-PubMed-335M | 0.9649 | 0.9582 | 0.9718 | 0.9967 |
| 🥈 2 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9543 | 0.9422 | 0.9667 | 0.9956 |
| 🥉 3 | OpenMed-NER-SpeciesDetect-BioMed-335M | 0.9539 | 0.9441 | 0.9638 | 0.9957 |
| 4 | OpenMed-NER-SpeciesDetect-SuperClinical-434M | 0.9534 | 0.9369 | 0.9704 | 0.9959 |
| 5 | OpenMed-NER-SpeciesDetect-PubMed-109M | 0.9502 | 0.9317 | 0.9695 | 0.9951 |
| 6 | OpenMed-NER-SpeciesDetect-MultiMed-335M | 0.9479 | 0.9286 | 0.9680 | 0.9955 |
| 7 | OpenMed-NER-SpeciesDetect-MultiMed-568M | 0.9460 | 0.9312 | 0.9613 | 0.9957 |
| 8 | OpenMed-NER-SpeciesDetect-SuperMedical-355M | 0.9433 | 0.9221 | 0.9655 | 0.9953 |
| 9 | OpenMed-NER-SpeciesDetect-SuperClinical-141M | 0.9406 | 0.9290 | 0.9525 | 0.9950 |
| 10 | OpenMed-NER-SpeciesDetect-ModernClinical-395M | 0.9385 | 0.9379 | 0.9392 | 0.9940 |

Rankings are based on F1 score across all models trained on this dataset.

Figure: OpenMed (open-source) vs. latest SOTA (closed-source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, assigns the entity label of the highest-scoring token within a word to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`
- Single GPU: Try `batch_size=8-32`, depending on GPU memory
- High-end GPU: Can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

- Dataset: LINNAEUS
- Description: Species Entity Recognition, species and organism names

Training Details
- Base Model: xlm-roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: xlm-roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the associated paper. Proper citation helps support and acknowledge my work. Thank you!
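The summary metrics reported in these cards are consistent with the standard definition of F1 as the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick sanity check using the values reported for this model on LINNAEUS:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Reported precision 0.66 and recall 0.79 for this model:
print(round(f1_score(0.66, 0.79), 2))
# → 0.72
```

The same check reproduces the rounded F1 values in the other cards (e.g., P=0.76, R=0.82 gives F1≈0.79).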

license:apache-2.0
25,913
1

OpenMed-NER-OncologyDetect-SnowMed-568M

Specialized model for Cancer Genetics: cancer-related genetic entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing cancer-related genetic entities. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BIONLP2013CG dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-Amino_acid` / `I-Amino_acid`
- `B-Anatomical_system` / `I-Anatomical_system`
- `B-Cancer` / `I-Cancer`
- `B-Cell` / `I-Cell`
- `B-Cellular_component` / `I-Cellular_component`
- `B-Developing_anatomical_structure` / `I-Developing_anatomical_structure`
- `B-Gene_or_gene_product` / `I-Gene_or_gene_product`
- `B-Immaterial_anatomical_entity` / `I-Immaterial_anatomical_entity`
- `B-Multi-tissue_structure` / `I-Multi-tissue_structure`
- `B-Organ` / `I-Organ`
- `B-Organism` / `I-Organism`
- `B-Organism_subdivision` / `I-Organism_subdivision`
- `B-Organism_substance` / `I-Organism_substance`
- `B-Pathological_formation` / `I-Pathological_formation`
- `B-Simple_chemical` / `I-Simple_chemical`
- `B-Tissue` / `I-Tissue`
The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.88`
- Precision: `0.88`
- Recall: `0.88`
- Accuracy: `0.94`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset
- Dataset: BIONLP2013CG
- Description: Cancer Genetics - cancer-related genetic entities

Training Details
- Base Model: snowflake-arctic-embed-l-v2.0
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: snowflake-arctic-embed-l-v2.0
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
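The `simple` strategy described in the note above can be illustrated with a toy implementation: adjacent tokens whose BIO tags share an entity type are merged into one span. This is only a sketch of the idea; the real grouping is performed by the Transformers `TokenClassificationPipeline` when `aggregation_strategy="simple"` is set, and the tag names below are examples, not the model's full label set.

```python
def aggregate_simple(tokens, tags):
    """Merge adjacent B-/I- tokens of the same entity type into spans."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":  # outside any entity: close the running span
            if current:
                entities.append(current)
            current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current and current["type"] == etype and prefix == "I":
            current["text"] += " " + token  # continue the running span
        else:
            if current:
                entities.append(current)
            current = {"type": etype, "text": token}  # start a new span
    if current:
        entities.append(current)
    return entities

tokens = ["BRCA1", "mutations", "drive", "breast", "cancer"]
tags = ["B-Gene_or_gene_product", "O", "O", "B-Cancer", "I-Cancer"]
# Yields one Gene_or_gene_product span ("BRCA1") and one Cancer span
# ("breast cancer"):
print(aggregate_simple(tokens, tags))
```

In practice you would not call such a helper yourself; passing `aggregation_strategy="simple"` to the pipeline returns already-grouped entity dictionaries.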

license:apache-2.0
25,880
0

OpenMed-NER-GenomeDetect-BioMed-335M

license:apache-2.0
25,855
0

OpenMed-NER-GenomeDetect-BigMed-278M

license:apache-2.0
23,795
0

OpenMed-NER-ChemicalDetect-ElectraMed-109M

license:apache-2.0
23,757
0

OpenMed-NER-BloodCancerDetect-SuperMedical-125M

license:apache-2.0
23,744
0

OpenMed-NER-PharmaDetect-ModernMed-149M

license:apache-2.0
23,716
0

OpenMed-NER-BloodCancerDetect-ElectraMed-109M

license:apache-2.0
23,715
0

OpenMed-NER-GenomeDetect-TinyMed-82M

Specialized model for Gene/Protein Entity Recognition - gene and protein mentions.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene and protein mention recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated BC2GM dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and protein mentions. The BC2GM corpus targets gene and protein mention recognition from the BioCreative II Gene Mention task.

The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. This corpus contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.

Current Model Performance
- F1 Score: `0.85`
- Precision: `0.82`
- Recall: `0.87`
- Accuracy: `0.95`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset
- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - gene and protein mentions

Training Details
- Base Model: distilroberta-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: distilroberta-base
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
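The batching guidelines above can be sketched as follows. The chunking helper is plain Python; the commented lines show how it would feed a Transformers pipeline (the model ID comes from this card, but the specific `batch_size` value is illustrative, not prescriptive).

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks from a list of texts."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"Sentence {n} mentions the BRCA2 gene." for n in range(10)]

# With the transformers library installed, batching looks like:
# from transformers import pipeline
# ner = pipeline("token-classification",
#                model="OpenMed/OpenMed-NER-GenomeDetect-TinyMed-82M",
#                aggregation_strategy="simple")
# for batch in batched(texts, batch_size=8):
#     results = ner(batch)  # one forward pass per chunk

# Ten texts in chunks of four -> chunk sizes [4, 4, 2]:
print([len(b) for b in batched(texts, 4)])
```

Note that `pipeline(...)` also accepts a `batch_size` argument directly when called on a list of texts, which is usually simpler than manual chunking; the helper above is mainly useful when you need to interleave other work between chunks.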

license:apache-2.0
23,715
0

OpenMed-NER-DiseaseDetect-SuperClinical-184M

license:apache-2.0
23,697
0

OpenMed-NER-GenomicDetect-SuperClinical-184M

license:apache-2.0
23,667
0

OpenMed-NER-DNADetect-SuperMedical-355M

license:apache-2.0
23,650
0

OpenMed-NER-DNADetect-EuroMed-212M

license:apache-2.0
23,239
0

OpenMed-NER-OncologyDetect-ElectraMed-560M

Specialized model for Cancer Genetics - cancer-related genetic entities.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for cancer genetics entity recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated BIONLP2013CG dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-Amino_acid` / `I-Amino_acid`
- `B-Anatomical_system` / `I-Anatomical_system`
- `B-Cancer` / `I-Cancer`
- `B-Cell` / `I-Cell`
- `B-Cellular_component` / `I-Cellular_component`
- `B-Developing_anatomical_structure` / `I-Developing_anatomical_structure`
- `B-Gene_or_gene_product` / `I-Gene_or_gene_product`
- `B-Immaterial_anatomical_entity` / `I-Immaterial_anatomical_entity`
- `B-Multi-tissue_structure` / `I-Multi-tissue_structure`
- `B-Organ` / `I-Organ`
- `B-Organism` / `I-Organism`
- `B-Organism_subdivision` / `I-Organism_subdivision`
- `B-Organism_substance` / `I-Organism_substance`
- `B-Pathological_formation` / `I-Pathological_formation`
- `B-Simple_chemical` / `I-Simple_chemical`
- `B-Tissue` / `I-Tissue`

The BioNLP 2013 CG corpus targets cancer genetics entities for oncology research and cancer genomics. The BioNLP 2013 CG (Cancer Genetics) corpus is a specialized dataset focusing on cancer genetics entities and gene regulation in oncology research. This corpus contains annotations for genes, proteins, and molecular processes specifically related to cancer biology and tumor genetics. Developed for the BioNLP Shared Task 2013, it supports the development of text mining systems for cancer research, oncological studies, and precision medicine applications. The dataset is particularly valuable for identifying cancer-related biomarkers, tumor suppressor genes, oncogenes, and therapeutic targets mentioned in cancer research literature. It serves as a benchmark for evaluating NER systems used in cancer genomics, personalized medicine, and oncology informatics.

Current Model Performance
- F1 Score: `0.88`
- Precision: `0.88`
- Recall: `0.89`
- Accuracy: `0.94`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OncologyDetect-SuperMedical-355M | 0.8990 | 0.8926 | 0.9056 | 0.9416 |
| 🥈 2 | OpenMed-NER-OncologyDetect-ElectraMed-560M | 0.8841 | 0.8788 | 0.8895 | 0.9390 |
| 🥉 3 | OpenMed-NER-OncologyDetect-SnowMed-568M | 0.8801 | 0.8774 | 0.8828 | 0.9366 |
| 4 | OpenMed-NER-OncologyDetect-PubMed-335M | 0.8782 | 0.8834 | 0.8730 | 0.9539 |
| 5 | OpenMed-NER-OncologyDetect-MultiMed-568M | 0.8766 | 0.8749 | 0.8784 | 0.9351 |
| 6 | OpenMed-NER-OncologyDetect-SuperClinical-434M | 0.8684 | 0.8602 | 0.8768 | 0.9495 |
| 7 | OpenMed-NER-OncologyDetect-BioMed-335M | 0.8660 | 0.8540 | 0.8783 | 0.9516 |
| 8 | OpenMed-NER-OncologyDetect-PubMed-109M | 0.8606 | 0.8604 | 0.8608 | 0.9503 |
| 9 | OpenMed-NER-OncologyDetect-BigMed-560M | 0.8556 | 0.8582 | 0.8530 | 0.9250 |
| 10 | OpenMed-NER-OncologyDetect-ModernClinical-395M | 0.8471 | 0.8465 | 0.8476 | 0.9411 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset
- Dataset: BIONLP2013CG
- Description: Cancer Genetics - cancer-related genetic entities

Training Details
- Base Model: multilingual-e5-large-instruct
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: multilingual-e5-large-instruct
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
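The three word-level strategies above (`first`, `average`, `max`) differ only in how they resolve disagreement among the sub-tokens of a single word. The toy resolver below contrasts them on one mocked word; the labels and scores are invented for illustration, and the real resolution is done internally by the Transformers `TokenClassificationPipeline`.

```python
def resolve_word(subtokens, strategy):
    """Pick one label for a word given its sub-token (label, score) pairs."""
    if strategy == "first":
        return subtokens[0][0]            # tag of the first sub-token wins
    if strategy == "max":
        return max(subtokens, key=lambda t: t[1])[0]  # highest-scoring token
    if strategy == "average":
        sums = {}
        for label, score in subtokens:
            sums[label] = sums.get(label, 0.0) + score
        n = len(subtokens)                # average score per label, then argmax
        return max(sums, key=lambda label: sums[label] / n)
    raise ValueError(f"unknown strategy: {strategy}")

# One word tokenized into three sub-tokens with disagreeing predictions:
word = [("B-Cancer", 0.55), ("I-Cell", 0.90), ("I-Cancer", 0.60)]
print(resolve_word(word, "first"))  # first sub-token's tag: B-Cancer
print(resolve_word(word, "max"))    # highest single score: I-Cell
```

Under `average`, repeated moderate votes for one label can outweigh a single high-scoring outlier, which is why it often behaves differently from `max` on noisy sub-token splits.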

license:apache-2.0
23,210
0

OpenMed-NER-GenomicDetect-ElectraMed-109M

license:apache-2.0
23,172
0

OpenMed-NER-ProteinDetect-SuperMedical-125M

license:apache-2.0
23,140
0

OpenMed-NER-ProteinDetect-TinyMed-66M

license:apache-2.0
23,131
0

OpenMed-NER-GenomicDetect-SuperMedical-355M

license:apache-2.0
23,018
0

OpenMed-NER-ProteinDetect-ModernMed-149M

license:apache-2.0
22,972
0

OpenMed-NER-SpeciesDetect-BioClinical-108M

license:apache-2.0
22,717
0

OpenMed-NER-OncologyDetect-ModernClinical-395M

license:apache-2.0
22,650
0

OpenMed-NER-BloodCancerDetect-SuperMedical-355M

license:apache-2.0
22,628
0

OpenMed-NER-OncologyDetect-TinyMed-135M

license:apache-2.0
22,616
0

OpenMed-NER-PathologyDetect-ElectraMed-335M

license:apache-2.0
22,581
0

OpenMed-NER-AnatomyDetect-BioMed-335M

license:apache-2.0
21,956
0

OpenMed-NER-AnatomyDetect-BioClinical-108M

license:apache-2.0
21,900
0

OpenMed-NER-AnatomyDetect-ModernClinical-395M

license:apache-2.0
21,880
0

OpenMed-NER-BloodCancerDetect-ModernMed-395M

license:apache-2.0
21,880
0

OpenMed-NER-SpeciesDetect-BioMed-109M

license:apache-2.0
21,877
0

OpenMed-NER-DiseaseDetect-ElectraMed-335M

license:apache-2.0
21,800
0

OpenMed-NER-ChemicalDetect-SuperMedical-125M

license:apache-2.0
21,795
0

OpenMed-NER-AnatomyDetect-BigMed-278M

license:apache-2.0
21,790
0

OpenMed-NER-OncologyDetect-ModernClinical-149M

license:apache-2.0
21,747
0

OpenMed-NER-GenomeDetect-BigMed-560M

Specialized model for Gene/Protein Entity Recognition - gene and protein mentions.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for gene and protein mention recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated BC2GM dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies gene and protein mentions. The BC2GM corpus targets gene and protein mention recognition from the BioCreative II Gene Mention task.

The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. This corpus contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.

Current Model Performance
- F1 Score: `0.88`
- Precision: `0.87`
- Recall: `0.88`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-GenomeDetect-SuperClinical-434M | 0.9010 | 0.8954 | 0.9066 | 0.9683 |
| 🥈 2 | OpenMed-NER-GenomeDetect-PubMed-335M | 0.8963 | 0.8924 | 0.9002 | 0.9719 |
| 🥉 3 | OpenMed-NER-GenomeDetect-BioMed-335M | 0.8943 | 0.8887 | 0.8999 | 0.9704 |
| 4 | OpenMed-NER-GenomeDetect-MultiMed-335M | 0.8905 | 0.8870 | 0.8940 | 0.9631 |
| 5 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8894 | 0.8850 | 0.8937 | 0.9706 |
| 6 | OpenMed-NER-GenomeDetect-BioPatient-108M | 0.8865 | 0.8850 | 0.8881 | 0.9590 |
| 7 | OpenMed-NER-GenomeDetect-SuperMedical-355M | 0.8852 | 0.8802 | 0.8902 | 0.9668 |
| 8 | OpenMed-NER-GenomeDetect-BioClinical-108M | 0.8851 | 0.8767 | 0.8937 | 0.9582 |
| 9 | OpenMed-NER-GenomeDetect-MultiMed-568M | 0.8834 | 0.8770 | 0.8898 | 0.9671 |
| 10 | OpenMed-NER-GenomeDetect-PubMed-109M | 0.8833 | 0.8781 | 0.8886 | 0.9706 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset
- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - gene and protein mentions

Training Details
- Base Model: xlm-roberta-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: xlm-roberta-large
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
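The hardware tiers in the batch-size guidelines can be encoded as a small helper. The thresholds below (and the `gpu_mem_gb` parameter name) are assumptions chosen to match the guideline ranges, not part of this model release; the commented pipeline call shows where the value would be used.

```python
def suggest_batch_size(device, gpu_mem_gb=0):
    """Map the card's guideline tiers to a starting batch size."""
    if device == "cpu":
        return 4                 # guideline: CPU, batch_size=1-4
    if gpu_mem_gb >= 40:
        return 64                # guideline: high-end GPU, 64 or higher
    return 16                    # guideline: typical single GPU, 8-32

# With transformers installed, the value plugs into the pipeline call:
# from transformers import pipeline
# ner = pipeline("token-classification",
#                model="OpenMed/OpenMed-NER-GenomeDetect-BigMed-560M",
#                aggregation_strategy="simple", device=0)
# results = ner(texts, batch_size=suggest_batch_size("gpu", gpu_mem_gb=24))

print(suggest_batch_size("cpu"), suggest_batch_size("gpu", 48))
```

These are only starting points; as the guidelines say, monitor GPU utilization and adjust until throughput stops improving.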

license:apache-2.0
21,723
0

OpenMed-NER-OncologyDetect-BioClinical-108M

license:apache-2.0
20,695
0

OpenMed-NER-DNADetect-PubMed-v2-109M

license:apache-2.0
20,688
0

OpenMed-NER-OncologyDetect-BioPatient-108M

license:apache-2.0
20,666
0

OpenMed-NER-DNADetect-MultiMed-568M

Specialized model for Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for recognizing proteins, DNA, RNA, cell lines, and cell types. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated JNLPBA dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities:
- `B-DNA` / `I-DNA`
- `B-RNA` / `I-RNA`
- `B-cell_line` / `I-cell_line`
- `B-cell_type` / `I-cell_type`
- `B-protein` / `I-protein`

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely-used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types: protein, DNA, RNA, cell line, and cell type, making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been extensively used as a benchmark for evaluating biomedical NER systems and continues to be a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

Current Model Performance
- F1 Score: `0.82`
- Precision: `0.78`
- Recall: `0.86`
- Accuracy: `0.94`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DNADetect-SuperClinical-434M | 0.8188 | 0.7778 | 0.8643 | 0.9320 |
| 🥈 2 | OpenMed-NER-DNADetect-SuperMedical-355M | 0.8177 | 0.7716 | 0.8697 | 0.9318 |
| 🥉 3 | OpenMed-NER-DNADetect-MultiMed-568M | 0.8157 | 0.7758 | 0.8599 | 0.9354 |
| 4 | OpenMed-NER-DNADetect-BigMed-560M | 0.8134 | 0.7723 | 0.8591 | 0.9346 |
| 5 | OpenMed-NER-DNADetect-BioClinical-108M | 0.8071 | 0.7632 | 0.8562 | 0.9147 |
| 6 | OpenMed-NER-DNADetect-MultiMed-335M | 0.8069 | 0.7642 | 0.8547 | 0.9185 |
| 7 | OpenMed-NER-DNADetect-PubMed-335M | 0.8056 | 0.7611 | 0.8556 | 0.9344 |
| 8 | OpenMed-NER-DNADetect-SuperClinical-184M | 0.8053 | 0.7548 | 0.8630 | 0.9259 |
| 9 | OpenMed-NER-DNADetect-BioPatient-108M | 0.8052 | 0.7605 | 0.8555 | 0.9137 |
| 10 | OpenMed-NER-DNADetect-SuperMedical-125M | 0.8044 | 0.7589 | 0.8557 | 0.9284 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset
- Dataset: JNLPBA
- Description: Biomedical Entity Recognition - proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: bge-m3
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: bge-m3
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
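The BIO-tagged output described above can be decoded into `(start, end, type)` token spans. This is a self-contained sketch over JNLPBA-style labels; with `aggregation_strategy` set, the real pipeline returns grouped entities with character offsets directly, so you would rarely need to write this yourself.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end_exclusive, type) spans."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):          # a new entity begins here
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            continue                      # entity continues
        else:                             # "O" or an inconsistent I- tag
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:                 # close a span ending at sequence end
        spans.append((start, len(tags), etype))
    return spans

tags = ["B-protein", "I-protein", "O", "B-cell_type", "O"]
print(bio_to_spans(tags))  # -> [(0, 2, 'protein'), (3, 4, 'cell_type')]
```

The same decoder works for any of the label sets listed on these cards, since they all follow the B-/I-/O convention.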

license:apache-2.0
20,632
0

OpenMed-NER-PathologyDetect-ModernMed-395M

license:apache-2.0
20,517
0

OpenMed-NER-AnatomyDetect-ElectraMed-335M

license:apache-2.0
20,512
0

OpenMed-NER-OncologyDetect-BioMed-335M

license:apache-2.0
20,502
0

OpenMed-NER-OrganismDetect-MultiMed-568M

Specialized model for Species Entity Recognition - species names from the Species-800 dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for species name recognition. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: trained on the curated SPECIES800 dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem

This model identifies and classifies species mentions. Species800 is a corpus for species recognition and taxonomy classification in biomedical texts.

The Species800 corpus is a manually annotated dataset designed for species recognition and taxonomic classification in biomedical literature. This corpus contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific names and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for evaluating species identification models used in ecological studies, conservation biology, and systematic biology research. The corpus is particularly useful for text mining applications in biodiversity databases and biological literature analysis.

Current Model Performance
- F1 Score: `0.83`
- Precision: `0.81`
- Recall: `0.86`
- Accuracy: `0.97`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-OrganismDetect-BioMed-335M | 0.8639 | 0.8557 | 0.8722 | 0.9715 |
| 🥈 2 | OpenMed-NER-OrganismDetect-PubMed-335M | 0.8550 | 0.8370 | 0.8737 | 0.9698 |
| 🥉 3 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8458 | 0.8287 | 0.8637 | 0.9690 |
| 4 | OpenMed-NER-OrganismDetect-MultiMed-335M | 0.8441 | 0.8352 | 0.8532 | 0.9670 |
| 5 | OpenMed-NER-OrganismDetect-SuperClinical-434M | 0.8435 | 0.8291 | 0.8585 | 0.9670 |
| 6 | OpenMed-NER-OrganismDetect-PubMed-109M | 0.8349 | 0.8082 | 0.8634 | 0.9685 |
| 7 | OpenMed-NER-OrganismDetect-MultiMed-568M | 0.8313 | 0.8053 | 0.8592 | 0.9703 |
| 8 | OpenMed-NER-OrganismDetect-ElectraMed-335M | 0.8288 | 0.8176 | 0.8404 | 0.9631 |
| 9 | OpenMed-NER-OrganismDetect-BioPatient-108M | 0.8154 | 0.8140 | 0.8169 | 0.9591 |
| 10 | OpenMed-NER-OrganismDetect-ElectraMed-33M | 0.8121 | 0.7772 | 0.8503 | 0.9600 |

Rankings are based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. A summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: start with `batch_size=1-4`
- Single GPU: try `batch_size=8-32`, depending on GPU memory
- High-end GPU: can handle `batch_size=64` or higher
- Monitor GPU utilization to find the optimal batch size for your hardware

Dataset
- Dataset: SPECIES800
- Description: Species Entity Recognition - species names from the Species-800 dataset

Training Details
- Base Model: bge-m3
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on held-out test set
- Base Architecture: bge-m3
- Task: Token Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper. Proper citation helps support and acknowledge our work. Thank you!
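An end-to-end usage sketch for a card like this one: the pipeline call is commented out (it downloads model weights), and the score filter below is a plain helper illustrating a common post-processing step. The 0.8 threshold and the `filter_entities` helper are assumptions for illustration, not part of the model release.

```python
def filter_entities(entities, min_score=0.5):
    """Drop low-confidence predictions from pipeline output dicts."""
    return [e for e in entities if e["score"] >= min_score]

# With transformers installed:
# from transformers import pipeline
# ner = pipeline("token-classification",
#                model="OpenMed/OpenMed-NER-OrganismDetect-MultiMed-568M",
#                aggregation_strategy="simple")
# raw = ner("Escherichia coli was cultured alongside Mus musculus cells.")
# print(filter_entities(raw, min_score=0.8))

# Mocked pipeline output, shaped like the aggregated entity dicts:
mock = [{"word": "Escherichia coli", "score": 0.97},
        {"word": "cells", "score": 0.31}]
print(filter_entities(mock))  # keeps only the high-confidence span
```

Thresholding aggregated scores this way trades recall for precision, which is often the right default when feeding extracted species names into a downstream database.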

license:apache-2.0
20,477
0

OpenMed-NER-ProteinDetect-ElectraMed-335M

license:apache-2.0
20,476
0

OpenMed-NER-GenomeDetect-ElectraMed-109M

license:apache-2.0
20,467
0

OpenMed-NER-DiseaseDetect-TinyMed-65M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for disease entity recognition on the BC5CDR dataset. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated BC5CDR-Disease dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities. BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation (CDR) extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V CDR task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.
Current Model Performance
- F1 Score: `0.88`
- Precision: `0.86`
- Recall: `0.89`
- Accuracy: `0.97`

🏆 Comparative Performance on BC5CDR-Disease Dataset

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-DiseaseDetect-SuperClinical-434M | 0.9118 | 0.9028 | 0.9211 | 0.9839 |
| 🥈 2 | OpenMed-NER-DiseaseDetect-PubMed-335M | 0.9097 | 0.8932 | 0.9268 | 0.9849 |
| 🥉 3 | OpenMed-NER-DiseaseDetect-MultiMed-335M | 0.9022 | 0.8890 | 0.9159 | 0.9758 |
| 4 | OpenMed-NER-DiseaseDetect-BioMed-335M | 0.9005 | 0.8887 | 0.9126 | 0.9838 |
| 5 | OpenMed-NER-DiseaseDetect-BioClinical-108M | 0.8999 | 0.8862 | 0.9140 | 0.9723 |
| 6 | OpenMed-NER-DiseaseDetect-PubMed-109M | 0.8994 | 0.8899 | 0.9091 | 0.9839 |
| 7 | OpenMed-NER-DiseaseDetect-BioPatient-108M | 0.8991 | 0.8864 | 0.9121 | 0.9721 |
| 8 | OpenMed-NER-DiseaseDetect-SuperClinical-184M | 0.8943 | 0.8687 | 0.9214 | 0.9812 |
| 9 | OpenMed-NER-DiseaseDetect-SuperClinical-141M | 0.8921 | 0.8686 | 0.9170 | 0.9809 |
| 10 | OpenMed-NER-DiseaseDetect-MultiMed-568M | 0.8909 | 0.8803 | 0.9017 | 0.9776 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: BC5CDR-Disease
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details
- Base Model: distilbert-base-cased
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: distilbert-base-cased
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!
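The batching guidance above can be sketched with a small chunking helper. The helper itself is plain Python and runnable; the pipeline call is shown only as a comment because it requires downloading a model, and the repository id used there is an assumption based on the model name on this page:

```python
# Minimal batching sketch: split a corpus into fixed-size chunks so a
# token-classification pipeline can be fed batches sized to your hardware.

def batched(texts, batch_size):
    """Yield successive batch_size-sized slices of texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

corpus = [f"Patient note {i}" for i in range(10)]
print([len(b) for b in batched(corpus, batch_size=4)])  # → [4, 4, 2]

# Assumed real usage, following the guidelines above:
# from transformers import pipeline
# ner = pipeline("token-classification",
#                model="OpenMed/OpenMed-NER-DiseaseDetect-TinyMed-65M",  # assumed repo id
#                aggregation_strategy="simple")
# for batch in batched(corpus, batch_size=8):
#     results = ner(batch)
```

Note that `pipeline(...)` also accepts a `batch_size` argument directly, so explicit chunking is mainly useful when you want per-batch progress reporting or error handling.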

license:apache-2.0
20,446
0

OpenMed-NER-AnatomyDetect-ElectraMed-33M

license:apache-2.0
20,420
0

OpenMed-NER-PathologyDetect-ModernMed-149M

license:apache-2.0
19,449
0

OpenMed-NER-GenomicDetect-ModernClinical-395M

license:apache-2.0
19,434
0

OpenMed-NER-DiseaseDetect-PubMed-109M

license:apache-2.0
19,426
0

OpenMed-NER-GenomeDetect-PubMed-335M

license:apache-2.0
19,426
0

OpenMed-NER-GenomeDetect-ElectraMed-335M

license:apache-2.0
19,419
0

OpenMed-NER-BloodCancerDetect-ElectraMed-33M

license:apache-2.0
19,414
0

OpenMed-NER-AnatomyDetect-MultiMed-335M

license:apache-2.0
19,412
0

OpenMed-NER-DiseaseDetect-BioMed-109M

license:apache-2.0
19,409
0

OpenMed-NER-GenomicDetect-ElectraMed-335M

license:apache-2.0
19,407
0

OpenMed-NER-OrganismDetect-BioClinical-108M

license:apache-2.0
19,404
0

OpenMed-NER-DiseaseDetect-ModernMed-149M

license:apache-2.0
19,403
0

OpenMed-NER-GenomeDetect-MultiMed-335M

license:apache-2.0
19,402
0

OpenMed-NER-OrganismDetect-ElectraMed-109M

license:apache-2.0
19,401
0

OpenMed-NER-OrganismDetect-ModernMed-149M

license:apache-2.0
19,391
0

OpenMed-NER-GenomicDetect-MultiMed-568M

license:apache-2.0
19,368
0

OpenMed-NER-BloodCancerDetect-TinyMed-66M

Specialized model for Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia.

This model is a state-of-the-art fine-tuned transformer engineered to deliver enterprise-grade accuracy for clinical entity recognition of entities related to chronic lymphocytic leukemia. It excels at identifying and extracting biomedical entities from clinical texts, research papers, and healthcare documents, enabling applications such as drug interaction detection, medication extraction from patient records, adverse event monitoring, literature mining for drug discovery, and biomedical knowledge graph construction, with production-ready reliability for clinical and research applications.

🎯 Key Features
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Trained on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem

This model can identify and classify the following biomedical entities. The CLL corpus is specialized for chronic lymphocytic leukemia entity recognition in hematology and cancer research. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. This specialized corpus contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset is designed to support the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics applications. It is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature.
The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

Current Model Performance
- F1 Score: `0.80`
- Precision: `0.73`
- Recall: `0.88`
- Accuracy: `0.96`

| Rank | Model | F1 Score | Precision | Recall | Accuracy |
|------|-------|----------|-----------|--------|----------|
| 🥇 1 | OpenMed-NER-BloodCancerDetect-ElectraMed-560M | 0.9575 | 0.9264 | 0.9907 | 0.9843 |
| 🥈 2 | OpenMed-NER-BloodCancerDetect-SuperClinical-434M | 0.8902 | 0.8652 | 0.9167 | 0.9701 |
| 🥉 3 | OpenMed-NER-BloodCancerDetect-TinyMed-82M | 0.8793 | 0.7904 | 0.9908 | 0.9449 |
| 4 | OpenMed-NER-BloodCancerDetect-TinyMed-135M | 0.8792 | 0.8750 | 0.8835 | 0.9668 |
| 5 | OpenMed-NER-BloodCancerDetect-TinyMed-65M | 0.8547 | 0.7812 | 0.9434 | 0.9686 |
| 6 | OpenMed-NER-BloodCancerDetect-SuperMedical-125M | 0.8488 | 1.0000 | 0.7373 | 0.9274 |
| 7 | OpenMed-NER-BloodCancerDetect-SnowMed-568M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 8 | OpenMed-NER-BloodCancerDetect-BigMed-278M | 0.8443 | 0.9816 | 0.7407 | 0.9372 |
| 9 | OpenMed-NER-BloodCancerDetect-SuperMedical-355M | 0.8421 | 0.9816 | 0.7373 | 0.9248 |
| 10 | OpenMed-NER-BloodCancerDetect-ElectraMed-335M | 0.8364 | 0.7302 | 0.9787 | 0.9581 |

Rankings based on F1-score performance across all models trained on this dataset.

Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.

NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the Hugging Face documentation. Here is a summary of the available strategies:
- `none`: Returns raw token predictions without any aggregation.
- `simple`: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
- `first`: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
- `average`: For word-based models, this strategy averages the scores of the tokens within a word and applies the label with the highest resulting score.
- `max`: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.

For efficient processing of large datasets, use proper batching with the `batch_size` parameter.

Batch Size Guidelines:
- CPU: Start with `batch_size=1-4`.
- Single GPU: Try `batch_size=8-32`, depending on GPU memory.
- High-end GPU: Can handle `batch_size=64` or higher.
- Monitor GPU utilization to find the optimal batch size for your hardware.

- Dataset: CLL
- Description: Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

Training Details
- Base Model: distilbert-base-uncased
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Base Architecture: distilbert-base-uncased
- Task: Token Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Tokenized biomedical text
- Output: BIO-tagged entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research

Licensed under the Apache License 2.0. See LICENSE for details.

We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge our work. Thank you!

license:apache-2.0
19,363
0

OpenMed-NER-PharmaDetect-EuroMed-212M

license:apache-2.0
18,713
0

OpenMed-NER-PharmaDetect-BioClinical-108M

license:apache-2.0
18,704
0

OpenMed-NER-ChemicalDetect-SuperClinical-434M

license:apache-2.0
18,700
0

OpenMed-NER-PathologyDetect-EuroMed-212M

license:apache-2.0
18,694
0

OpenMed-NER-BloodCancerDetect-TinyMed-82M

license:apache-2.0
18,685
0

OpenMed-NER-GenomicDetect-TinyMed-82M

license:apache-2.0
18,616
0

OpenMed-NER-GenomeDetect-SuperMedical-355M

license:apache-2.0
18,587
0

OpenMed-NER-DNADetect-ModernMed-149M

license:apache-2.0
18,574
0

OpenMed-NER-PathologyDetect-SuperMedical-355M

license:apache-2.0
18,546
0

OpenMed-NER-DNADetect-TinyMed-66M

license:apache-2.0
18,531
0

OpenMed-NER-BloodCancerDetect-BigMed-278M

license:apache-2.0
18,521
1

OpenMed-NER-DiseaseDetect-MultiMed-568M

license:apache-2.0
18,520
0

OpenMed-NER-BloodCancerDetect-SuperClinical-184M

license:apache-2.0
18,468
0

OpenMed-NER-PathologyDetect-SuperClinical-434M

license:apache-2.0
18,375
0

OpenMed-NER-DNADetect-BioClinical-108M

license:apache-2.0
18,309
0

OpenMed-NER-OrganismDetect-ElectraMed-560M

license:apache-2.0
18,286
0

OpenMed-NER-AnatomyDetect-PubMed-109M

license:apache-2.0
18,245
0

OpenMed-NER-GenomicDetect-ModernMed-395M

license:apache-2.0
17,449
0

OpenMed-NER-ProteinDetect-TinyMed-82M

license:apache-2.0
17,438
0

OpenMed-NER-DiseaseDetect-BioClinical-108M

license:apache-2.0
17,425
0

OpenMed-NER-AnatomyDetect-BigMed-560M

license:apache-2.0
17,422
0

OpenMed-NER-OrganismDetect-ModernClinical-395M

license:apache-2.0
17,389
0

OpenMed-NER-DiseaseDetect-BigMed-560M

license:apache-2.0
17,389
0

OpenMed-NER-AnatomyDetect-PubMed-v2-109M

license:apache-2.0
17,323
0

OpenMed-NER-ChemicalDetect-ModernClinical-395M

license:apache-2.0
17,316
0

OpenMed-NER-AnatomyDetect-EuroMed-212M

license:apache-2.0
17,278
0

OpenMed-NER-PathologyDetect-MultiMed-335M

license:apache-2.0
17,273
0

OpenMed-NER-ChemicalDetect-TinyMed-82M

license:apache-2.0
17,268
0

OpenMed-NER-AnatomyDetect-ModernMed-149M

license:apache-2.0
17,266
0

OpenMed-NER-OrganismDetect-MultiMed-335M

license:apache-2.0
17,252
0

OpenMed-NER-ProteinDetect-SuperClinical-434M

license:apache-2.0
17,238
0

OpenMed-NER-GenomicDetect-TinyMed-66M

license:apache-2.0
17,219
0

OpenMed-NER-DiseaseDetect-ModernMed-395M

license:apache-2.0
16,207
0

OpenMed-NER-ProteinDetect-PubMed-v2-109M

license:apache-2.0
16,204
0

OpenMed-NER-ChemicalDetect-MultiMed-335M

license:apache-2.0
16,203
0

OpenMed-NER-ChemicalDetect-BioMed-109M

license:apache-2.0
16,203
0

OpenMed-NER-AnatomyDetect-TinyMed-66M

license:apache-2.0
16,203
0

OpenMed-NER-OncologyDetect-TinyMed-66M

license:apache-2.0
16,192
0

OpenMed-NER-ProteinDetect-MultiMed-335M

license:apache-2.0
16,191
0

OpenMed-NER-DNADetect-MultiMed-335M

license:apache-2.0
16,182
0

OpenMed-NER-SpeciesDetect-MultiMed-335M

license:apache-2.0
16,179
0

OpenMed-NER-BloodCancerDetect-MultiMed-335M

license:apache-2.0
16,179
0

OpenMed-NER-BloodCancerDetect-ElectraMed-335M

license:apache-2.0
16,166
0

OpenMed-NER-PathologyDetect-ElectraMed-109M

license:apache-2.0
16,163
0

OpenMed-NER-AnatomyDetect-SuperMedical-355M

license:apache-2.0
16,158
0

OpenMed-NER-OncologyDetect-EuroMed-212M

license:apache-2.0
16,152
0

OpenMed-NER-GenomeDetect-ModernClinical-395M

license:apache-2.0
16,143
3

OpenMed-NER-DNADetect-TinyMed-65M

license:apache-2.0
16,138
0

OpenMed-NER-BloodCancerDetect-BigMed-560M

license:apache-2.0
16,111
0

OpenMed-NER-GenomicDetect-TinyMed-65M

license:apache-2.0
15,502
0

OpenMed-NER-BloodCancerDetect-BioMed-109M

license:apache-2.0
15,488
0

OpenMed-NER-GenomicDetect-BioMed-335M

license:apache-2.0
15,477
0

OpenMed-NER-OrganismDetect-PubMed-109M

license:apache-2.0
15,474
0

OpenMed-NER-PharmaDetect-SuperClinical-141M

license:apache-2.0
15,436
0

OpenMed-NER-SpeciesDetect-SuperClinical-434M

license:apache-2.0
15,435
0

OpenMed-NER-OrganismDetect-SuperMedical-125M

license:apache-2.0
15,424
0

OpenMed-NER-SpeciesDetect-SuperClinical-184M

license:apache-2.0
15,411
0

OpenMed-NER-SpeciesDetect-BioPatient-108M

license:apache-2.0
15,345
0

OpenMed-NER-ChemicalDetect-TinyMed-135M

license:apache-2.0
15,139
0

OpenMed-NER-OrganismDetect-ElectraMed-335M

license:apache-2.0
15,134
0

OpenMed-NER-GenomicDetect-SuperClinical-434M

license:apache-2.0
15,118
0

OpenMed-NER-AnatomyDetect-MultiMed-568M

license:apache-2.0
15,106
0

OpenMed-NER-GenomeDetect-BioClinical-108M

license:apache-2.0
15,097
0

OpenMed-NER-DNADetect-ElectraMed-33M

license:apache-2.0
15,095
0

OpenMed-NER-SpeciesDetect-TinyMed-65M

license:apache-2.0
15,088
0

OpenMed-NER-AnatomyDetect-SuperMedical-125M

license:apache-2.0
15,074
0

OpenMed-NER-OncologyDetect-TinyMed-82M

license:apache-2.0
15,068
0

OpenMed-NER-PathologyDetect-SuperMedical-125M

license:apache-2.0
15,067
0

OpenMed-NER-GenomeDetect-MultiMed-568M

license:apache-2.0
15,052
0

OpenMed-NER-DNADetect-ModernMed-395M

license:apache-2.0
14,994
0

OpenMed-NER-SpeciesDetect-SuperMedical-355M

license:apache-2.0
14,923
0

OpenMed-NER-SpeciesDetect-MultiMed-568M

license:apache-2.0
14,919
0

OpenMed-NER-DNADetect-ElectraMed-335M

license:apache-2.0
14,416
0

OpenMed-NER-GenomicDetect-TinyMed-135M

license:apache-2.0
14,394
0

OpenMed-NER-PathologyDetect-BioPatient-108M

license:apache-2.0
14,390
0

OpenMed-NER-SpeciesDetect-SuperMedical-125M

license:apache-2.0
14,370
0

OpenMed-NER-GenomicDetect-EuroMed-212M

license:apache-2.0
14,356
0

OpenMed-NER-SpeciesDetect-ElectraMed-335M

license:apache-2.0
14,350
0

OpenMed-NER-GenomeDetect-BioPatient-108M

license:apache-2.0
14,348
0

OpenMed-NER-ProteinDetect-SuperClinical-184M

license:apache-2.0
14,346
0

OpenMed-NER-BloodCancerDetect-BioPatient-108M

license:apache-2.0
14,341
0

OpenMed-NER-ProteinDetect-ModernClinical-395M

license:apache-2.0
14,340
0

OpenMed-NER-GenomicDetect-ModernMed-149M

license:apache-2.0
14,338
0

OpenMed-NER-OncologyDetect-SuperMedical-125M

license:apache-2.0
14,334
0

OpenMed-NER-PharmaDetect-TinyMed-82M

license:apache-2.0
14,327
0

OpenMed-NER-OrganismDetect-ElectraMed-33M

license:apache-2.0
14,322
0

OpenMed-NER-DNADetect-PubMed-335M

license:apache-2.0
14,312
0

OpenMed-NER-ProteinDetect-ElectraMed-109M

license:apache-2.0
14,311
0

OpenMed-NER-SpeciesDetect-PubMed-v2-109M

license:apache-2.0
14,305
0

OpenMed-NER-DNADetect-ModernClinical-395M

license:apache-2.0
14,301
0

OpenMed-NER-AnatomyDetect-BioPatient-108M

license:apache-2.0
14,290
0

OpenMed-NER-SpeciesDetect-ModernClinical-149M

license:apache-2.0
14,290
0

OpenMed-NER-BloodCancerDetect-TinyMed-135M

license:apache-2.0
14,289
2

OpenMed-NER-OrganismDetect-ModernClinical-149M

license:apache-2.0
14,281
0

OpenMed-NER-OncologyDetect-BioMed-109M

license:apache-2.0
14,276
0

OpenMed-NER-ProteinDetect-ModernClinical-149M

license:apache-2.0
14,274
0

OpenMed-NER-GenomeDetect-PubMed-v2-109M

license:apache-2.0
14,248
0

OpenMed-NER-OncologyDetect-SuperClinical-184M

license:apache-2.0
14,236
0

OpenMed-NER-GenomeDetect-TinyMed-135M

license:apache-2.0
14,154
0

OpenMed-NER-SpeciesDetect-SuperClinical-141M

license:apache-2.0
14,150
0

OpenMed-NER-DNADetect-TinyMed-135M

license:apache-2.0
13,988
0

OpenMed-NER-DNADetect-BioPatient-108M

license:apache-2.0
13,002
0

OpenMed-NER-DiseaseDetect-BigMed-278M

license:apache-2.0
12,972
0

OpenMed-NER-PharmaDetect-TinyMed-135M

license:apache-2.0
12,970
0

OpenMed-NER-BloodCancerDetect-ModernClinical-149M

license:apache-2.0
12,968
0

OpenMed-NER-OncologyDetect-ElectraMed-335M

license:apache-2.0
12,966
0

OpenMed-NER-AnatomyDetect-ModernMed-395M

license:apache-2.0
12,962
0

OpenMed-NER-OncologyDetect-MultiMed-335M

license:apache-2.0
12,959
0

OpenMed-NER-PathologyDetect-TinyMed-65M

license:apache-2.0
12,947
0

OpenMed-NER-PharmaDetect-TinyMed-66M

license:apache-2.0
12,938
0

OpenMed-NER-AnatomyDetect-SnowMed-568M

license:apache-2.0
12,929
0

OpenMed-NER-PathologyDetect-TinyMed-66M

license:apache-2.0
12,925
0

OpenMed-NER-BloodCancerDetect-ModernClinical-395M

license:apache-2.0
12,924
0

OpenMed-NER-OrganismDetect-BigMed-278M

license:apache-2.0
12,917
0

OpenMed-NER-OrganismDetect-PubMed-v2-109M

license:apache-2.0
12,908
0

OpenMed-NER-GenomeDetect-SuperClinical-141M

license:apache-2.0
12,905
0

OpenMed-NER-DNADetect-BigMed-278M

license:apache-2.0
12,899
0

OpenMed-NER-SpeciesDetect-ElectraMed-33M

license:apache-2.0
12,862
0

OpenMed-NER-BloodCancerDetect-SuperClinical-141M

license:apache-2.0
11,891
0

OpenMed-NER-PharmaDetect-TinyMed-65M

license:apache-2.0
11,888
0

OpenMed-NER-PathologyDetect-BioClinical-108M

license:apache-2.0
11,888
0

OpenMed-NER-ProteinDetect-ElectraMed-33M

license:apache-2.0
11,887
0

OpenMed-NER-OrganismDetect-SuperClinical-141M

license:apache-2.0
11,869
0

OpenMed-NER-GenomicDetect-BioClinical-108M

license:apache-2.0
11,860
0

OpenMed-NER-ProteinDetect-ModernMed-395M

license:apache-2.0
11,859
0

OpenMed-NER-ChemicalDetect-PubMed-v2-109M

license:apache-2.0
11,859
0

OpenMed-NER-ChemicalDetect-TinyMed-65M

license:apache-2.0
11,859
0

OpenMed-NER-PathologyDetect-BioMed-109M

license:apache-2.0
11,851
0

OpenMed-NER-GenomeDetect-BioMed-109M

license:apache-2.0
11,846
0

OpenMed-NER-PharmaDetect-SuperClinical-184M

license:apache-2.0
11,842
0

OpenMed-NER-OrganismDetect-TinyMed-65M

license:apache-2.0
11,842
0

OpenMed-NER-BloodCancerDetect-PubMed-109M

license:apache-2.0
11,816
0

OpenMed-NER-SpeciesDetect-SnowMed-568M

license:apache-2.0
10,815
0

OpenMed-NER-GenomeDetect-TinyMed-65M

license:apache-2.0
10,788
0

OpenMed-NER-ProteinDetect-BioPatient-108M

license:apache-2.0
10,781
0

OpenMed-NER-OrganismDetect-BigMed-560M

license:apache-2.0
10,768
0

OpenMed-NER-PharmaDetect-BioMed-109M

license:apache-2.0
10,751
0

OpenMed-NER-ChemicalDetect-BioClinical-108M

license:apache-2.0
9,679
0

OpenMed-PII-SuperClinical-Large-434M-v1

license:apache-2.0
147
0

OpenMed-PII-Spanish-SnowflakeMed-Large-568M-v1

license:apache-2.0
81
0

OpenMed-PII-Dutch-SuperClinical-Large-434M-v1

license:apache-2.0
26
0

OpenMed-PII-Dutch-mLiteClinical-Base-135M-v1

license:apache-2.0
17
0

OpenMed-PII-Dutch-LiteClinicalU-Small-66M-v1

license:apache-2.0
16
0

OpenMed-PII-Dutch-mClinicalE5-Large-560M-v1

license:apache-2.0
16
0

OpenMed-PII-Dutch-ModernMed-Large-395M-v1

license:apache-2.0
16
0

OpenMed-PII-Dutch-BigMed-Large-278M-v1

license:apache-2.0
16
0

OpenMed-PII-Dutch-SnowflakeMed-Large-568M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-LiteClinical-Small-66M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-FastClinical-Small-82M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-ClinicalE5-Base-109M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-GTEMed-Base-149M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-ModernMed-Base-149M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-SuperMedical-Large-355M-v1

license:apache-2.0
15
0

OpenMed-PII-Dutch-ClinicalBGE-Large-568M-v1

license:apache-2.0
14
0

OpenMed-PII-Dutch-BioClinicalBERT-Base-110M-v1

license:apache-2.0
14
0

OpenMed-PII-Dutch-BioClinicalModern-Large-395M-v1

license:apache-2.0
14
0

OpenMed-PII-Dutch-BiomedELECTRA-Large-335M-v1

license:apache-2.0
14
0

OpenMed-PII-Dutch-ClinicalE5-Large-335M-v1

license:apache-2.0
14
0

OpenMed-PII-Dutch-ClinicalE5-Small-33M-v1

license:apache-2.0
14
0

OpenMed-PII-Dutch-NomicMed-Large-395M-v1

license:apache-2.0
14
0

OpenMed-PII-Spanish-ClinicalBGE-Large-568M-v1

license:apache-2.0
14
0

OpenMed-PII-SuperClinical-Small-44M-v1-mlx

license:apache-2.0
13
0

OpenMed-PII-Dutch-ClinicDischarge-Base-110M-v1

license:apache-2.0
13
0

OpenMed-PII-Dutch-BiomedBERTFull-Base-110M-v1

license:apache-2.0
13
0

OpenMed-PII-Dutch-BiomedELECTRA-Base-110M-v1

license:apache-2.0
13
0

OpenMed-PII-Dutch-SuperClinical-Base-184M-v1

license:apache-2.0
13
0

OpenMed-PII-Dutch-SuperMedical-Base-125M-v1

license:apache-2.0
13
0

OpenMed-PII-Dutch-BigMed-Large-560M-v1

license:apache-2.0
13
0

OpenMed-PII-Dutch-EuroMed-Large-210M-v1

license:apache-2.0
13
0

OpenMed-PII-Spanish-BioClinicalModern-Large-395M-v1

license:apache-2.0
13
0

OpenMed-PII-Spanish-ClinicalLongformer-Base-149M-v1

license:apache-2.0
13
0

OpenMed-PII-Spanish-SuperClinical-Large-434M-v1

license:apache-2.0
12
3

OpenMed-PII-Dutch-BioClinicalModern-Base-149M-v1

license:apache-2.0
12
0

OpenMed-PII-Dutch-BiomedBERT-Base-110M-v1

license:apache-2.0
12
0

OpenMed-PII-Dutch-QwenMed-XLarge-600M-v1

license:apache-2.0
12
0

OpenMed-PII-Dutch-SuperClinical-Small-44M-v1

license:apache-2.0
12
0

OpenMed-PII-Dutch-BiomedBERT-Large-340M-v1

license:apache-2.0
12
0

OpenMed-PII-Spanish-BiomedELECTRA-Large-335M-v1

license:apache-2.0
12
0

OpenMed-PII-Spanish-LiteClinicalU-Small-66M-v1

license:apache-2.0
12
0

OpenMed-ZeroShot-NER-Anatomy-XLarge-770M

Specialized model for Anatomical Entity Recognition - Anatomical structures and body parts.

Tailored to anatomical structure recognition, including organs, tissues, and substructures in clinical narratives. Supports radiology and surgical note parsing, site-of-disease extraction, and anatomy-aware analytics.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities - such as diseases, chemicals, genes, species, and clinical findings - directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated ANATOMY dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The Anatomy corpus focuses on anatomical entity recognition for medical terminology and healthcare applications. It is a specialized biomedical NER dataset designed for recognizing anatomical entities and medical terminology in clinical and biomedical texts, containing annotations for anatomical structures, body parts, organs, and physiological systems mentioned in medical literature. It is essential for developing clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial. The dataset supports the development of automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature. It serves as a valuable resource for training NER models used in medical imaging, surgical planning, and clinical documentation.

- Finetuned F1 vs. Base Model (on a test dataset excluded from training): `0.90`
- F1 Improvement vs. Base Model: `211.3%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Anatomy-Large-459M | 0.2978 | 0.9271 | 0.6293 | 211.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Anatomy-Medium-209M | 0.3172 | 0.9114 | 0.5942 | 187.3% |
| 🥉 3 | OpenMed-ZeroShot-NER-Anatomy-XLarge-770M | 0.3780 | 0.9021 | 0.5241 | 138.7% |
| 4 | OpenMed-ZeroShot-NER-Anatomy-Base-220M | 0.2804 | 0.8627 | 0.5823 | 207.7% |
| 5 | OpenMed-ZeroShot-NER-Anatomy-Multi-209M | 0.3121 | 0.8091 | 0.4969 | 159.2% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case. Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but may also increase false positives. Adjust as needed for your application.
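The threshold trade-off can be illustrated with a small, self-contained filter over candidate entities. The candidate spans and scores below are invented for the example; the real call goes through GLiNER's `predict_entities`, shown as a comment with an assumed repository id:

```python
# Toy illustration of the confidence-threshold trade-off: lowering the
# threshold admits more candidate entities (better recall for rare or
# custom types, at the cost of more false positives).
# Scores are invented for illustration.

def filter_by_threshold(candidates, threshold):
    """candidates: list of (text, label, score) tuples."""
    return [c for c in candidates if c[2] >= threshold]

candidates = [
    ("left ventricle", "anatomical structure", 0.91),
    ("mitral valve",   "anatomical structure", 0.47),
    ("pericardium",    "anatomical structure", 0.28),
]
print(len(filter_by_threshold(candidates, 0.5)))  # → 1 (strict)
print(len(filter_by_threshold(candidates, 0.3)))  # → 2 (more permissive)

# Assumed real usage with the GLiNER API:
# from gliner import GLiNER
# model = GLiNER.from_pretrained("OpenMed/OpenMed-ZeroShot-NER-Anatomy-XLarge-770M")  # assumed repo id
# entities = model.predict_entities(text, ["anatomical structure"], threshold=0.3)
```

In practice, tune the threshold on a small labeled sample of your own data rather than picking a value blindly.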
- Dataset: ANATOMY
- Description: Anatomical Entity Recognition - Anatomical structures and body parts

Training Details
- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set
- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
12
0

OpenMed-ZeroShot-NER-Disease-Large-459M

license:apache-2.0
11
1

OpenMed-PII-Dutch-ClinicalBGE-Large-335M-v1

license:apache-2.0
11
0

OpenMed-PII-Spanish-ClinicDischarge-Base-110M-v1

license:apache-2.0
11
0

OpenMed-PII-Spanish-BiomedBERT-Base-110M-v1

license:apache-2.0
11
0

OpenMed-PII-Dutch-ClinicalLongformer-Base-149M-v1

license:apache-2.0
10
0

OpenMed-PII-Dutch-mSuperClinical-Large-279M-v1

license:apache-2.0
10
0

OpenMed-PII-Spanish-BiomedBERTFull-Base-110M-v1

license:apache-2.0
10
0

OpenMed-PII-Spanish-BiomedBERT-Large-340M-v1

license:apache-2.0
10
0

OpenMed-PII-Spanish-SuperClinical-Base-184M-v1

license:apache-2.0
10
0

OpenMed-ZeroShot-NER-Oncology-Small-166M

license:apache-2.0
10
0

OpenMed-PII-Spanish-QwenMed-XLarge-600M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-SuperMedical-Large-355M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-BigMed-Large-560M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-mSuperClinical-Large-279M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-BiomedELECTRA-Base-110M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-SuperClinical-Small-44M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-FastClinical-Small-82M-v1

license:apache-2.0
9
0

OpenMed-PII-Spanish-GTEMed-Base-149M-v1

license:apache-2.0
9
0

OpenMed-ZeroShot-NER-Pharma-XLarge-770M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Focused on chemical mentions in the BC5CDR domain, capturing pharmaceutical compounds and therapeutic agents in context with diseases. Enables pharmacovigilance, adverse event mining, and chemical–disease relation pipelines when paired with downstream relation extraction.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications.

Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC5CDRCHEM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model:

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

BC5CDR-Chem focuses on chemical entity recognition from the BioCreative V Chemical-Disease Relation extraction task. The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. This dataset contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances that are relevant for understanding chemical-disease relationships. It serves as a critical resource for developing NER systems that can identify chemical entities for downstream tasks like adverse drug reaction detection and drug repurposing research.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.95`
- F1 Improvement vs Base Model: `26.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Pharma-Large-459M | 0.7537 | 0.9542 | 0.2005 | 26.6% |
| 🥈 2 | OpenMed-ZeroShot-NER-Pharma-XLarge-770M | 0.7299 | 0.9463 | 0.2164 | 29.7% |
| 🥉 3 | OpenMed-ZeroShot-NER-Pharma-Medium-209M | 0.6358 | 0.9457 | 0.3100 | 48.8% |
| 4 | OpenMed-ZeroShot-NER-Pharma-Base-220M | 0.6554 | 0.9197 | 0.2643 | 40.3% |
| 5 | OpenMed-ZeroShot-NER-Pharma-Multi-209M | 0.6548 | 0.8931 | 0.2383 | 36.4% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter of `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case:

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: BC5CDRCHEM
- Description: Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Training Details
- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge my work. Thank you!
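A minimal usage sketch for this kind of model, assuming the GLiNER library's `predict_entities(text, labels, threshold=...)` API. The `predict_with_threshold` helper name is our own, and the live model call is shown commented out because it requires a model download:

```python
def predict_with_threshold(model, text, labels, threshold=0.3):
    """Hypothetical helper: run zero-shot NER with a configurable threshold.

    `model` is any object exposing GLiNER's
    `predict_entities(text, labels, threshold=...)` method.
    """
    return model.predict_entities(text, labels, threshold=threshold)

# Real usage (requires `pip install gliner` and a model download):
# from gliner import GLiNER
# model = GLiNER.from_pretrained("OpenMed/OpenMed-ZeroShot-NER-Pharma-XLarge-770M")
# ents = predict_with_threshold(
#     model,
#     "Patients received ibuprofen for inflammation management.",
#     labels=["drug", "disease"],  # any labels, even ones unseen in training
# )
# for ent in ents:
#     print(ent["text"], "=>", ent["label"])
```

The `labels` list is free-form: because the model is zero-shot, you can pass custom entity types here instead of retraining, pairing rare labels with a lower `threshold` as suggested in the tip above.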

license:apache-2.0
8
1

OpenMed-PII-Spanish-SuperMedical-Base-125M-v1

license:apache-2.0
8
0

OpenMed-PII-Spanish-NomicMed-Large-395M-v1

license:apache-2.0
8
0

OpenMed-PII-Spanish-ClinicalBGE-Large-335M-v1

license:apache-2.0
8
0

OpenMed-PII-Spanish-BioClinicalModern-Base-149M-v1

license:apache-2.0
8
0

OpenMed-PII-Spanish-ClinicalE5-Base-109M-v1

license:apache-2.0
8
0

OpenMed-ZeroShot-NER-Species-XLarge-770M

Specialized model for Species Entity Recognition - Species and organism names

Specialized in species and organism mentions with robust handling of scientific/common names and abbreviations. Applies to biodiversity mining, metagenomics reporting, and taxonomy-aware literature curation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications.

Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated LINNAEUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model:

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The Linnaeus corpus is designed for species name identification and taxonomic entity recognition in biomedical literature. It is a specialized biomedical NER dataset focused on species name identification and organism recognition in scientific literature. Named after Carl Linnaeus, who established modern taxonomic nomenclature, this corpus contains annotations for species mentions that are normalized to NCBI Taxonomy identifiers. The dataset is crucial for biodiversity informatics, ecological research, and biological literature mining, where accurate organism identification is essential. It supports the development of text mining systems for taxonomic studies, species distribution research, and comparative genomics applications. The corpus addresses the challenge of recognizing both scientific names and common names of organisms across diverse biological texts.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.95`
- F1 Improvement vs Base Model: `410.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Species-Medium-209M | 0.1565 | 0.9751 | 0.8185 | 523.0% |
| 🥈 2 | OpenMed-ZeroShot-NER-Species-XLarge-770M | 0.2801 | 0.9548 | 0.6747 | 240.9% |
| 🥉 3 | OpenMed-ZeroShot-NER-Species-Large-459M | 0.1864 | 0.9520 | 0.7655 | 410.6% |
| 4 | OpenMed-ZeroShot-NER-Species-Base-220M | 0.1829 | 0.9386 | 0.7557 | 413.2% |
| 5 | OpenMed-ZeroShot-NER-Species-Multi-209M | 0.1461 | 0.8323 | 0.6862 | 469.6% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter of `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case:

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: LINNAEUS
- Description: Species Entity Recognition - Species and organism names

Training Details
- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
8
0

OpenMed-ZeroShot-NER-Chemical-XLarge-770M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Purpose-built for chemical entity recognition in biomedical literature. It robustly identifies small molecules, drugs, reagents, and chemical synonyms across abstracts and full-text articles. Ideal for drug discovery workflows, compound indexing, entity linking to ChEBI/DrugBank, and pharmacology literature curation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications.

Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model:

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.92`
- F1 Improvement vs Base Model: `38.5%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Chemical-Large-459M | 0.6766 | 0.9369 | 0.2603 | 38.5% |
| 🥈 2 | OpenMed-ZeroShot-NER-Chemical-Medium-209M | 0.6113 | 0.9343 | 0.3229 | 52.8% |
| 🥉 3 | OpenMed-ZeroShot-NER-Chemical-XLarge-770M | 0.6063 | 0.9247 | 0.3184 | 52.5% |
| 4 | OpenMed-ZeroShot-NER-Chemical-Base-220M | 0.5269 | 0.9047 | 0.3778 | 71.7% |
| 5 | OpenMed-ZeroShot-NER-Chemical-Multi-209M | 0.5490 | 0.8745 | 0.3255 | 59.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter of `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case:

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details
- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
8
0

OpenMed-ZeroShot-NER-DNA-Multi-209M

Specialized model for Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

Targets molecular biology entities: proteins, DNA/RNA, cell lines, and cell types in biomedical research content. Great for pathway curation, molecular interaction mining, and omics-aware information extraction.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications.

Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated JNLPBA dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model:
- `DNA`
- `RNA`
- `cellline`
- `celltype`
- `protein`

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The JNLPBA corpus focuses on biomedical named entity recognition for protein, DNA, RNA, cell line, and cell type entities. The JNLPBA (Joint Workshop on Natural Language Processing in Biomedicine and its Applications) corpus is a widely-used biomedical NER dataset derived from the GENIA corpus for the 2004 bio-entity recognition task. It contains annotations for five entity types: protein, DNA, RNA, cell line, and cell type, making it essential for molecular biology and genomics research applications. The corpus consists of MEDLINE abstracts annotated with biomedical entities relevant to gene and protein recognition tasks. It has been extensively used as a benchmark for evaluating biomedical NER systems and continues to be a standard evaluation dataset for developing machine learning models in computational biology and bioinformatics.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.78`
- F1 Improvement vs Base Model: `16.4%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-DNA-Large-459M | 0.7006 | 0.8220 | 0.1214 | 17.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-DNA-Medium-209M | 0.6928 | 0.8208 | 0.1280 | 18.5% |
| 🥉 3 | OpenMed-ZeroShot-NER-DNA-XLarge-770M | 0.5271 | 0.8106 | 0.2835 | 53.8% |
| 4 | OpenMed-ZeroShot-NER-DNA-Base-220M | 0.4896 | 0.7907 | 0.3011 | 61.5% |
| 5 | OpenMed-ZeroShot-NER-DNA-Multi-209M | 0.6660 | 0.7750 | 0.1090 | 16.4% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter of `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case:

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: JNLPBA
- Description: Biomedical Entity Recognition - Proteins, DNA, RNA, cell lines, and cell types

Training Details
- Base Model: gliner-multi-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on a held-out test set

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper; proper citation helps support and acknowledge my work. Thank you!
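For this model, the label list passed at prediction time would typically be the five JNLPBA types. A small helper to group predicted spans by label (the helper name and the sample predictions below are illustrative, shaped like zero-shot NER output) might look like:

```python
from collections import defaultdict

# The five JNLPBA entity types, as listed on this card.
JNLPBA_LABELS = ["DNA", "RNA", "cellline", "celltype", "protein"]

def group_by_label(entities):
    """Group predicted spans by their entity label."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[ent["label"]].append(ent["text"])
    return dict(grouped)

# Hypothetical predictions for a molecular-biology sentence:
sample = [
    {"text": "IL-2 gene", "label": "DNA", "score": 0.88},
    {"text": "NF-kappa B", "label": "protein", "score": 0.93},
    {"text": "T lymphocytes", "label": "celltype", "score": 0.81},
]
print(group_by_label(sample))
```

Grouping by label is a common post-processing step before pathway curation or interaction mining, since downstream tools usually consume one entity type at a time.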

license:apache-2.0
7
1

OpenMed-PII-Spanish-BigMed-Large-278M-v1

license:apache-2.0
7
0

OpenMed-PII-Spanish-BioClinicalBERT-Base-110M-v1

license:apache-2.0
7
0

OpenMed-PII-Spanish-LiteClinical-Small-66M-v1

license:apache-2.0
7
0

OpenMed-PII-Spanish-ClinicalE5-Large-335M-v1

license:apache-2.0
7
0

OpenMed-PII-Spanish-mClinicalE5-Large-560M-v1

license:apache-2.0
7
0

OpenMed-PII-Spanish-ClinicalE5-Small-33M-v1

license:apache-2.0
7
0

OpenMed-PII-Spanish-EuroMed-Large-210M-v1

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-Genomic-XLarge-770M

Specialized model for Gene Entity Recognition - Gene-related entities

Targets gene and genetics entities, handling symbol/name variants commonly found in genomics literature. Useful for genetic association studies, variant curation, and genomics informatics.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications.

Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model:

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications. It is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature. This corpus contains comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining. It is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.90`
- F1 Improvement vs Base Model: `82.3%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genomic-Large-459M | 0.5361 | 0.9775 | 0.4414 | 82.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genomic-Medium-209M | 0.5376 | 0.9674 | 0.4298 | 79.9% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genomic-XLarge-770M | 0.6875 | 0.9003 | 0.2128 | 30.9% |
| 4 | OpenMed-ZeroShot-NER-Genomic-Small-166M | 0.4694 | 0.8082 | 0.3388 | 72.2% |
| 5 | OpenMed-ZeroShot-NER-Genomic-Multi-209M | 0.4000 | 0.7333 | 0.3333 | 83.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter of `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case:

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: GELLUS
- Description: Gene Entity Recognition; gene-related entities

Training Details
- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

This model is particularly useful for:
- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-Pathology-Base-220M

Specialized model for Disease Entity Recognition: disease entities from the NCBI dataset.

High-precision disease NER tuned for research literature, capturing disease mentions suitable for normalization to MeSH/OMIM. Useful for clinical NLP, cohort discovery, and knowledge graph construction; pairs well with concept-normalization modules.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The NCBI Disease corpus is a gold-standard resource for disease name recognition and concept normalization, containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, it provides both mention-level and concept-level annotations and is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications. It serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

- Finetuned F1 vs. Base Model (on a test set excluded from training): `0.86`
- F1 Improvement vs. Base Model: `33.8%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Pathology-Large-459M | 0.6183 | 0.8983 | 0.2800 | 45.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Pathology-Medium-209M | 0.6039 | 0.8940 | 0.2901 | 48.0% |
| 🥉 3 | OpenMed-ZeroShot-NER-Pathology-XLarge-770M | 0.6806 | 0.8872 | 0.2066 | 30.4% |
| 4 | OpenMed-ZeroShot-NER-Pathology-Base-220M | 0.6393 | 0.8556 | 0.2163 | 33.8% |
| 5 | OpenMed-ZeroShot-NER-Pathology-Multi-209M | 0.5601 | 0.7726 | 0.2125 | 37.9% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test set is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (custom or rare types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`; for example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but may also increase false positives. Adjust as needed for your application.
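As a sanity check on the table above, the ΔF1 % column is the relative gain of the finetuned F1 over the base F1; a minimal sketch of the arithmetic:

```python
# Derivation of the ΔF1 % column: relative improvement over the base model.

def delta_f1_percent(base_f1, finetuned_f1):
    """Relative F1 gain over the base model, in percent."""
    return 100.0 * (finetuned_f1 - base_f1) / base_f1

# Pathology-Base-220M row: base 0.6393 -> finetuned 0.8556
print(round(delta_f1_percent(0.6393, 0.8556), 1))  # 33.8, as reported
```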
- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition; disease entities from the NCBI dataset

Training Details
- Base Model: gliner-x-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-BloodCancer-Tiny-60M

Specialized model for Clinical Entity Recognition: clinical entities related to Chronic Lymphocytic Leukemia.

Domain-tuned for Chronic Lymphocytic Leukemia (CLL) terminology, capturing disease descriptors, biomarkers, and therapies. Supports hematology research, treatment response analysis, and clinical evidence tracking.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. It contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research, and is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature. The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

- Finetuned F1 vs. Base Model (on a test set excluded from training): `0.68`
- F1 Improvement vs. Base Model: `23.9%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-BloodCancer-Medium-209M | 0.5068 | 0.9130 | 0.4062 | 80.2% |
| 🥈 2 | OpenMed-ZeroShot-NER-BloodCancer-XLarge-770M | 0.7291 | 0.8750 | 0.1459 | 20.0% |
| 🥉 3 | OpenMed-ZeroShot-NER-BloodCancer-Large-459M | 0.6009 | 0.7755 | 0.1746 | 29.0% |
| 4 | OpenMed-ZeroShot-NER-BloodCancer-Small-166M | 0.5505 | 0.6818 | 0.1314 | 23.9% |
| 5 | OpenMed-ZeroShot-NER-BloodCancer-Tiny-60M | 0.5361 | 0.6780 | 0.1419 | 26.5% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test set is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (custom or rare types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`; for example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but may also increase false positives. Adjust as needed for your application.
- Dataset: CLL
- Description: Clinical Entity Recognition; clinical entities related to Chronic Lymphocytic Leukemia

Training Details
- Base Model: gliner-x-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-Genome-Medium-209M

Specialized model for Gene/Protein Entity Recognition: gene and protein mentions.

Accurate gene/protein mention recognition, including synonyms and symbol variants from biomedical literature. Enables gene-centric curation, variant/association mining, and network construction.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC2GM dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, addressing the challenging task of identifying gene names with complex nomenclature and ambiguous boundaries. It has been instrumental in advancing automated gene recognition systems used in functional genomics research, gene expression analysis, and molecular biology text mining, and remains a widely used benchmark for training and evaluating biomedical NER models.

- Finetuned F1 vs. Base Model (on a test set excluded from training): `0.86`
- F1 Improvement vs. Base Model: `45.1%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genome-Large-459M | 0.5538 | 0.8616 | 0.3078 | 55.6% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genome-Medium-209M | 0.5893 | 0.8553 | 0.2660 | 45.1% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genome-XLarge-770M | 0.5572 | 0.8367 | 0.2795 | 50.2% |
| 4 | OpenMed-ZeroShot-NER-Genome-Base-220M | 0.5322 | 0.7986 | 0.2664 | 50.1% |
| 5 | OpenMed-ZeroShot-NER-Genome-Multi-209M | 0.5919 | 0.7494 | 0.1576 | 26.6% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test set is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (custom or rare types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`; for example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but may also increase false positives. Adjust as needed for your application.
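To make the threshold trade-off described in the tip concrete, here is a self-contained sketch over mock prediction scores (no model required); the span texts and scores are invented for illustration:

```python
# Illustration of the threshold trade-off: lowering the cutoff admits
# more candidate spans, including noisier ones. Scores are mock values.

mock_predictions = [
    {"text": "BRCA2",    "label": "gene",    "score": 0.92},
    {"text": "p53",      "label": "gene",    "score": 0.55},
    {"text": "lymphoma", "label": "disease", "score": 0.34},
    {"text": "the",      "label": "gene",    "score": 0.08},  # likely noise
]

def apply_threshold(predictions, threshold):
    """Keep spans at or above the confidence cutoff."""
    return [p for p in predictions if p["score"] >= threshold]

# A default-like cutoff keeps only confident spans; a lower one surfaces
# rarer or custom entity types at the cost of more false positives.
print(len(apply_threshold(mock_predictions, 0.5)))  # 2
print(len(apply_threshold(mock_predictions, 0.3)))  # 3
```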
- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition; gene and protein mentions

Training Details
- Base Model: gliner-medium-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-Genomic-Base-220M

Specialized model for Gene Entity Recognition: gene-related entities.

Targets gene and genetics entities, handling symbol/name variants commonly found in genomics literature. Useful for genetic association studies, variant curation, and genomics informatics.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The Gellus corpus is a biomedical NER dataset for gene recognition and genetics entity extraction in molecular biology literature, with annotations for gene names, genetic variants, and genomics-related entities used in genetic association studies and genomics text mining.

- Finetuned F1 vs. Base Model (on a test set excluded from training): `0.72`
- F1 Improvement vs. Base Model: `48.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genomic-Large-459M | 0.5361 | 0.9775 | 0.4414 | 82.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genomic-Medium-209M | 0.5376 | 0.9674 | 0.4298 | 79.9% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genomic-XLarge-770M | 0.6875 | 0.9003 | 0.2128 | 30.9% |
| 4 | OpenMed-ZeroShot-NER-Genomic-Small-166M | 0.4694 | 0.8082 | 0.3388 | 72.2% |
| 5 | OpenMed-ZeroShot-NER-Genomic-Multi-209M | 0.4000 | 0.7333 | 0.3333 | 83.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test set is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (custom or rare types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`; for example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but may also increase false positives. Adjust as needed for your application.
- Dataset: GELLUS
- Description: Gene Entity Recognition; gene-related entities

Training Details
- Base Model: gliner-x-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-Oncology-Large-459M

license:apache-2.0
7
0

OpenMed-ZeroShot-NER-Genomic-Medium-209M

Specialized model for Gene Entity Recognition: gene-related entities.

Targets gene and genetics entities, handling symbol/name variants commonly found in genomics literature. Useful for genetic association studies, variant curation, and genomics informatics.
🎯 Key Features
- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated GELLUS dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The Gellus corpus is a biomedical NER dataset for gene recognition and genetics entity extraction in molecular biology literature, with annotations for gene names, genetic variants, and genomics-related entities used in genetic association studies and genomics text mining.

- Finetuned F1 vs. Base Model (on a test set excluded from training): `0.97`
- F1 Improvement vs. Base Model: `79.9%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genomic-Large-459M | 0.5361 | 0.9775 | 0.4414 | 82.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genomic-Medium-209M | 0.5376 | 0.9674 | 0.4298 | 79.9% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genomic-XLarge-770M | 0.6875 | 0.9003 | 0.2128 | 30.9% |
| 4 | OpenMed-ZeroShot-NER-Genomic-Small-166M | 0.4694 | 0.8082 | 0.3388 | 72.2% |
| 5 | OpenMed-ZeroShot-NER-Genomic-Multi-209M | 0.4000 | 0.7333 | 0.3333 | 83.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test set is excluded from training.

Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (custom or rare types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`; for example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but may also increase false positives. Adjust as needed for your application.
- Dataset: GELLUS
- Description: Gene Entity Recognition; gene-related entities

Training Details
- Base Model: gliner-medium-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

license:apache-2.0
7
0

OpenMed-PII-Spanish-mLiteClinical-Base-135M-v1

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-BloodCancer-Small-166M

Specialized model for Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

Domain-tuned for Chronic Lymphocytic Leukemia (CLL) terminology, capturing disease descriptors, biomarkers, and therapies. Supports hematology research, treatment response analysis, and clinical evidence tracking.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated CLL dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The CLL corpus is specialized for chronic lymphocytic leukemia entity recognition in hematology and cancer research. The CLL (Chronic Lymphocytic Leukemia) corpus is a domain-specific biomedical NER dataset focused on entities related to chronic lymphocytic leukemia, a type of blood cancer. This specialized corpus contains annotations for CLL-specific terminology, biomarkers, treatment entities, and clinical concepts relevant to hematology and oncology research. The dataset is designed to support the development of clinical NLP systems for leukemia research, hematological disorder analysis, and cancer informatics applications. It is particularly valuable for identifying disease-specific entities, therapeutic interventions, and prognostic factors mentioned in CLL research literature. The corpus serves as a benchmark for evaluating NER models in specialized medical domains and clinical research.

- Finetuned F1 vs.
Base Model (on test dataset excluded from training): `0.68`
- F1 Improvement vs Base Model: `23.9%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-BloodCancer-Medium-209M | 0.5068 | 0.9130 | 0.4062 | 80.2% |
| 🥈 2 | OpenMed-ZeroShot-NER-BloodCancer-XLarge-770M | 0.7291 | 0.8750 | 0.1459 | 20.0% |
| 🥉 3 | OpenMed-ZeroShot-NER-BloodCancer-Large-459M | 0.6009 | 0.7755 | 0.1746 | 29.0% |
| 4 | OpenMed-ZeroShot-NER-BloodCancer-Small-166M | 0.5505 | 0.6818 | 0.1314 | 23.9% |
| 5 | OpenMed-ZeroShot-NER-BloodCancer-Tiny-60M | 0.5361 | 0.6780 | 0.1419 | 26.5% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
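The threshold trade-off can be sketched in plain Python. The spans, labels, and scores below are invented for illustration (GLiNER-style predictions are dicts carrying a confidence `score`); this is not real model output:

```python
# Invented CLL-style predictions with confidence scores
# (illustrative only; not real model output).
predictions = [
    {"text": "ibrutinib", "label": "therapy", "score": 0.91},
    {"text": "del(17p)", "label": "biomarker", "score": 0.44},
    {"text": "Rai stage III", "label": "disease stage", "score": 0.27},
]

def keep_above(preds, threshold):
    """Keep only spans scored at or above the cutoff, as thresholding does."""
    return [p for p in preds if p["score"] >= threshold]

# A strict cutoff keeps only the high-confidence span; lowering it admits
# the rarer custom entity types, at the cost of more false positives.
strict = keep_above(predictions, 0.5)      # ibrutinib only
permissive = keep_above(predictions, 0.3)  # ibrutinib + del(17p)
```

In practice, sweep the threshold on a small labeled sample and pick the value that balances recall on your custom types against the false-positive rate you can tolerate.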
- Dataset: CLL
- Description: Clinical Entity Recognition - Clinical entities related to Chronic Lymphocytic Leukemia

Training Details

- Base Model: gliner_small-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-Pathology-Large-459M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset

High-precision disease NER tuned for research literature, capturing disease mentions suitable for normalization to MeSH/OMIM. Useful for clinical NLP, cohort discovery, and knowledge graph construction, and pairs well with concept-normalization modules.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated NCBIDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The NCBI Disease corpus is a comprehensive resource for disease name recognition and concept normalization. It is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, this corpus provides both mention-level and concept-level annotations for disease entity recognition and normalization. The dataset is extensively used for developing clinical NLP systems, medical diagnosis support tools, and biomedical text mining applications. It serves as a critical benchmark for evaluating disease name recognition systems in healthcare informatics and medical literature analysis.

- Finetuned F1 vs.
Base Model (on test dataset excluded from training): `0.90`
- F1 Improvement vs Base Model: `45.3%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Pathology-Large-459M | 0.6183 | 0.8983 | 0.2800 | 45.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Pathology-Medium-209M | 0.6039 | 0.8940 | 0.2901 | 48.0% |
| 🥉 3 | OpenMed-ZeroShot-NER-Pathology-XLarge-770M | 0.6806 | 0.8872 | 0.2066 | 30.4% |
| 4 | OpenMed-ZeroShot-NER-Pathology-Base-220M | 0.6393 | 0.8556 | 0.2163 | 33.8% |
| 5 | OpenMed-ZeroShot-NER-Pathology-Multi-209M | 0.5601 | 0.7726 | 0.2125 | 37.9% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: NCBIDISEASE
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details

- Base Model: gliner_large-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-Chemical-Large-459M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Purpose-built for chemical entity recognition in biomedical literature. It robustly identifies small molecules, drugs, reagents, and chemical synonyms across abstracts and full-text articles. Ideal for drug discovery workflows, compound indexing, entity linking to ChEBI/DrugBank, and pharmacology literature curation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

BC4CHEMD is a biomedical NER corpus for chemical entity recognition from the BioCreative IV challenge. The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, this corpus contains abstracts from PubMed with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is specifically designed to advance automated chemical name recognition systems for drug discovery, pharmacology, and chemical biology applications. It serves as a benchmark for evaluating named entity recognition models in identifying chemical compounds, drugs, and other chemical substances mentioned in scientific literature.

- Finetuned F1 vs.
Base Model (on test dataset excluded from training): `0.94`
- F1 Improvement vs Base Model: `38.5%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Chemical-Large-459M | 0.6766 | 0.9369 | 0.2603 | 38.5% |
| 🥈 2 | OpenMed-ZeroShot-NER-Chemical-Medium-209M | 0.6113 | 0.9343 | 0.3229 | 52.8% |
| 🥉 3 | OpenMed-ZeroShot-NER-Chemical-XLarge-770M | 0.6063 | 0.9247 | 0.3184 | 52.5% |
| 4 | OpenMed-ZeroShot-NER-Chemical-Base-220M | 0.5269 | 0.9047 | 0.3778 | 71.7% |
| 5 | OpenMed-ZeroShot-NER-Chemical-Multi-209M | 0.5490 | 0.8745 | 0.3255 | 59.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
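The ranking rule ("sorted by finetuned F1") can be reproduced from the raw scores. A small sketch using the Chemical rows from the table above, listed in arbitrary order:

```python
# (model, base F1, finetuned F1) triples copied from the table above.
rows = [
    ("OpenMed-ZeroShot-NER-Chemical-Base-220M", 0.5269, 0.9047),
    ("OpenMed-ZeroShot-NER-Chemical-Large-459M", 0.6766, 0.9369),
    ("OpenMed-ZeroShot-NER-Chemical-Multi-209M", 0.5490, 0.8745),
    ("OpenMed-ZeroShot-NER-Chemical-Medium-209M", 0.6113, 0.9343),
    ("OpenMed-ZeroShot-NER-Chemical-XLarge-770M", 0.6063, 0.9247),
]

# Rank by finetuned F1, descending. Note this differs from ranking by
# ΔF1 %, where Base-220M (71.7%) would come first.
ranked = sorted(rows, key=lambda r: r[2], reverse=True)

for rank, (name, base, finetuned) in enumerate(ranked, start=1):
    print(rank, name, f"{100 * (finetuned - base) / base:.1f}%")
```

This recovers the table's order: Large, Medium, XLarge, Base, Multi.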
- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details

- Base Model: gliner_large-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-Disease-Small-166M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

Specialized for disease and condition recognition from biomedical texts, covering clinical disorders and pathological states. Supports patient phenotyping, disease indexing, literature triage, and clinical evidence aggregation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

- Finetuned F1 vs.
Base Model (on test dataset excluded from training): `0.77`
- F1 Improvement vs Base Model: `32.7%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Disease-Large-459M | 0.5890 | 0.9029 | 0.3138 | 53.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Disease-Medium-209M | 0.5721 | 0.8848 | 0.3127 | 54.7% |
| 🥉 3 | OpenMed-ZeroShot-NER-Disease-XLarge-770M | 0.6969 | 0.8593 | 0.1624 | 23.3% |
| 4 | OpenMed-ZeroShot-NER-Disease-Base-220M | 0.5952 | 0.8293 | 0.2341 | 39.3% |
| 5 | OpenMed-ZeroShot-NER-Disease-Multi-209M | 0.5323 | 0.7969 | 0.2645 | 49.7% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details

- Base Model: gliner_small-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-Disease-XLarge-770M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

Specialized for disease and condition recognition from biomedical texts, covering clinical disorders and pathological states. Supports patient phenotyping, disease indexing, literature triage, and clinical evidence aggregation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Can recognize any entity type without specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

BC5CDR-Disease targets disease entity recognition from the BioCreative V Chemical-Disease Relation extraction corpus. The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. The corpus includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature. It serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

- Finetuned F1 vs.
Base Model (on test dataset excluded from training): `0.86`
- F1 Improvement vs Base Model: `23.3%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Disease-Large-459M | 0.5890 | 0.9029 | 0.3138 | 53.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Disease-Medium-209M | 0.5721 | 0.8848 | 0.3127 | 54.7% |
| 🥉 3 | OpenMed-ZeroShot-NER-Disease-XLarge-770M | 0.6969 | 0.8593 | 0.1624 | 23.3% |
| 4 | OpenMed-ZeroShot-NER-Disease-Base-220M | 0.5952 | 0.8293 | 0.2341 | 39.3% |
| 5 | OpenMed-ZeroShot-NER-Disease-Multi-209M | 0.5323 | 0.7969 | 0.2645 | 49.7% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that were not part of the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details

- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments.

If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-Genome-Tiny-60M

license:apache-2.0
6
0

OpenMed-ZeroShot-NER-Oncology-Medium-209M

license:apache-2.0
6
0

OpenMed-PII-ClinicalE5-Small-33M-v1-mlx

license:apache-2.0
5
0

OpenMed-PII-Spanish-ModernMed-Large-395M-v1

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Genome-XLarge-770M

Specialized model for Gene/Protein Entity Recognition - Gene and protein mentions

Accurate gene/protein mention recognition, including synonyms and symbol variants from biomedical literature. Enables gene-centric curation, variant/association mining, and network construction.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, this model empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities—such as diseases, chemicals, genes, species, and clinical findings—directly from unstructured text, without the need for task-specific retraining. Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: recognizes any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated BC2GM dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to the entity types it was trained on; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, serving as a critical benchmark for genomics and molecular biology NER systems. The dataset addresses the challenging task of identifying gene names, which often have complex nomenclature and ambiguous boundaries, and it has been instrumental in advancing automated gene recognition used in functional genomics research, gene expression analysis, and molecular biology text mining. The corpus continues to be widely used for training and evaluating biomedical NER models.

- Finetuned F1 vs. base model (on a test set excluded from training): `0.84`
- F1 improvement vs. base model: `50.2%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genome-Large-459M | 0.5538 | 0.8616 | 0.3078 | 55.6% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genome-Medium-209M | 0.5893 | 0.8553 | 0.2660 | 45.1% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genome-XLarge-770M | 0.5572 | 0.8367 | 0.2795 | 50.2% |
| 4 | OpenMed-ZeroShot-NER-Genome-Base-220M | 0.5322 | 0.7986 | 0.2664 | 50.1% |
| 5 | OpenMed-ZeroShot-NER-Genome-Multi-209M | 0.5919 | 0.7494 | 0.1576 | 26.6% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: To extract entity types that are not present in the original training set (custom or rare types), you may get better results by lowering the `threshold` parameter of `model.predict_entities`; for example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.

- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - gene and protein mentions

Training Details

- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: cross-validation on a held-out test set

This model is particularly useful for:

- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research
- Custom Entity Recognition: zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: biomedical text
- Output: named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details. I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!
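As a sketch of the zero-shot usage described above, the snippet below loads a GLiNER-based checkpoint and extracts entities for caller-chosen labels. The Hugging Face repo id is an assumption inferred from this card's name, and the `gliner` package (`pip install gliner`) is assumed to provide `GLiNER.from_pretrained` and `predict_entities`; the small threshold-filter helper is a hypothetical convenience, not part of the library.

```python
def filter_by_threshold(entities, threshold):
    """Keep only predictions whose confidence score meets the threshold.

    Hypothetical helper: GLiNER-style predictions are dicts with a "score" key.
    """
    return [e for e in entities if e["score"] >= threshold]


def extract_entities(text, labels, threshold=0.3,
                     repo_id="OpenMed/OpenMed-ZeroShot-NER-Genome-XLarge-770M"):
    """Zero-shot extraction sketch; repo_id is an assumed Hugging Face id."""
    from gliner import GLiNER  # lazy import; requires `pip install gliner`
    model = GLiNER.from_pretrained(repo_id)
    # Labels are free-form strings: custom types absent from training work too.
    return model.predict_entities(text, labels, threshold=threshold)


# Example call (downloads the checkpoint on first use):
#   ents = extract_entities(
#       "BRCA1 and TP53 are frequently mutated in breast cancer.",
#       labels=["gene", "disease"],
#   )
#   for e in ents:
#       print(e["text"], e["label"], round(e["score"], 2))
```

In recent `gliner` releases each prediction is a dict carrying the matched span text, its label, character offsets, and a confidence score, so the filter above composes naturally with a low `threshold` at inference time.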

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Genomic-Large-459M

Specialized model for Gene Entity Recognition - Gene-related entities

Targets gene and genetics entities, handling symbol/name variants commonly found in genomics literature. Useful for genetic association studies, variant curation, and genomics informatics.
🎯 Key Features

- Zero-Shot Capability: recognizes any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated GELLUS dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

The Gellus corpus is a biomedical NER dataset designed for gene recognition and genetics entity extraction in molecular biology literature. It contains comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. It serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics.

- Finetuned F1 vs. base model (on a test set excluded from training): `0.98`
- F1 improvement vs. base model: `82.3%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genomic-Large-459M | 0.5361 | 0.9775 | 0.4414 | 82.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genomic-Medium-209M | 0.5376 | 0.9674 | 0.4298 | 79.9% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genomic-XLarge-770M | 0.6875 | 0.9003 | 0.2128 | 30.9% |
| 4 | OpenMed-ZeroShot-NER-Genomic-Small-166M | 0.4694 | 0.8082 | 0.3388 | 72.2% |
| 5 | OpenMed-ZeroShot-NER-Genomic-Multi-209M | 0.4000 | 0.7333 | 0.3333 | 83.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

- Dataset: GELLUS
- Description: Gene Entity Recognition - gene-related entities

Training Details

- Base Model: gliner-large-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: cross-validation on a held-out test set

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Chemical-Tiny-60M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Purpose-built for chemical entity recognition in biomedical literature. It robustly identifies small molecules, drugs, reagents, and chemical synonyms across abstracts and full-text articles. Ideal for drug discovery workflows, compound indexing, entity linking to ChEBI/DrugBank, and pharmacology literature curation.
🎯 Key Features

- Zero-Shot Capability: recognizes any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated BC4CHEMD dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset designed for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, it contains PubMed abstracts with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is designed to advance automated chemical name recognition for drug discovery, pharmacology, and chemical biology applications, and serves as a benchmark for evaluating NER models that identify chemical compounds, drugs, and other chemical substances in scientific literature.

- Finetuned F1 vs. base model (on a test set excluded from training): `0.68`
- F1 improvement vs. base model: `39.2%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Chemical-Large-459M | 0.6766 | 0.9369 | 0.2603 | 38.5% |
| 🥈 2 | OpenMed-ZeroShot-NER-Chemical-Medium-209M | 0.6113 | 0.9343 | 0.3229 | 52.8% |
| 🥉 3 | OpenMed-ZeroShot-NER-Chemical-XLarge-770M | 0.6063 | 0.9247 | 0.3184 | 52.5% |
| 4 | OpenMed-ZeroShot-NER-Chemical-Base-220M | 0.5269 | 0.9047 | 0.3778 | 71.7% |
| 5 | OpenMed-ZeroShot-NER-Chemical-Multi-209M | 0.5490 | 0.8745 | 0.3255 | 59.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - identifies chemical compounds and substances in biomedical literature

Training Details

- Base Model: gliner-x-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: cross-validation on a held-out test set
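The threshold trade-off these cards describe (permissive thresholds recover more entities but admit more noise) can be illustrated with mock predictions. The spans and scores below are invented for illustration; real output comes from a `predict_entities` call.

```python
# Mock predictions shaped like GLiNER-style output; all values are invented.
preds = [
    {"text": "doxorubicin",      "label": "chemical", "score": 0.92},
    {"text": "benzylpenicillin", "label": "chemical", "score": 0.55},
    {"text": "tumor",            "label": "chemical", "score": 0.28},  # likely false positive
]

def at_threshold(preds, threshold):
    """Return the surface forms that survive a given confidence cutoff."""
    return [p["text"] for p in preds if p["score"] >= threshold]

strict = at_threshold(preds, 0.5)      # favors precision: drops low-confidence spans
permissive = at_threshold(preds, 0.2)  # favors recall: keeps rare/custom types, plus noise
print(strict)      # ['doxorubicin', 'benzylpenicillin']
print(permissive)  # ['doxorubicin', 'benzylpenicillin', 'tumor']
```

In practice the cutoff is passed as the `threshold` argument at prediction time rather than applied afterwards, but the effect on the result set is the same.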

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Anatomy-Medium-209M

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Pharma-Base-220M

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Pharma-Large-459M

Specialized model for Chemical Entity Recognition - Chemical entities from the BC5CDR dataset

Focused on chemical mentions in the BC5CDR domain, capturing pharmaceutical compounds and therapeutic agents in context with diseases. Enables pharmacovigilance, adverse event mining, and chemical–disease relation pipelines when paired with downstream relation extraction.
🎯 Key Features

- Zero-Shot Capability: recognizes any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated BC5CDR-Chem dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

The BC5CDR-Chem corpus is part of the BioCreative V Chemical-Disease Relation (CDR) extraction challenge, specifically targeting chemical entity recognition in biomedical texts. It contains 1,500 PubMed abstracts with 4,409 annotated chemical entities, designed to support automated drug discovery and pharmacovigilance applications. The corpus emphasizes chemical compounds, drugs, and therapeutic substances relevant to understanding chemical-disease relationships, and serves as a critical resource for developing NER systems that feed downstream tasks like adverse drug reaction detection and drug repurposing research.

- Finetuned F1 vs. base model (on a test set excluded from training): `0.95`
- F1 improvement vs. base model: `26.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Pharma-Large-459M | 0.7537 | 0.9542 | 0.2005 | 26.6% |
| 🥈 2 | OpenMed-ZeroShot-NER-Pharma-XLarge-770M | 0.7299 | 0.9463 | 0.2164 | 29.7% |
| 🥉 3 | OpenMed-ZeroShot-NER-Pharma-Medium-209M | 0.6358 | 0.9457 | 0.3100 | 48.8% |
| 4 | OpenMed-ZeroShot-NER-Pharma-Base-220M | 0.6554 | 0.9197 | 0.2643 | 40.3% |
| 5 | OpenMed-ZeroShot-NER-Pharma-Multi-209M | 0.6548 | 0.8931 | 0.2383 | 36.4% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

- Dataset: BC5CDR-Chem
- Description: Chemical Entity Recognition - chemical entities from the BC5CDR dataset

Training Details

- Base Model: gliner-large-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: cross-validation on a held-out test set

license:apache-2.0
5
0

OpenMed-ZeroShot-NER-Organism-XLarge-770M

Specialized model for Species Entity Recognition - Species names from the Species-800 dataset

Optimized for species identification in scientific text, covering a wide range of taxa and naming variants. Useful for ecology studies, organism tagging, and biocuration.
🎯 Key Features

- Zero-Shot Capability: recognizes any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated SPECIES800 dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

The Species800 corpus is a manually annotated dataset designed for species recognition and taxonomic classification in biomedical literature. It contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. The dataset includes both scientific and common names of species, making it valuable for developing NER systems that can handle the complexity of biological nomenclature. It serves as a benchmark for species identification models used in ecological studies, conservation biology, and systematic biology, and is particularly useful for text mining in biodiversity databases and biological literature analysis.

- Finetuned F1 vs. base model (on a test set excluded from training): `0.83`
- F1 improvement vs. base model: `35.1%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Organism-Large-459M | 0.6329 | 0.8471 | 0.2142 | 33.8% |
| 🥈 2 | OpenMed-ZeroShot-NER-Organism-Medium-209M | 0.6140 | 0.8257 | 0.2117 | 34.5% |
| 🥉 3 | OpenMed-ZeroShot-NER-Organism-XLarge-770M | 0.6111 | 0.8256 | 0.2145 | 35.1% |
| 4 | OpenMed-ZeroShot-NER-Organism-Base-220M | 0.5853 | 0.7717 | 0.1864 | 31.8% |
| 5 | OpenMed-ZeroShot-NER-Organism-Small-166M | 0.5931 | 0.7092 | 0.1161 | 19.6% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training.

- Dataset: SPECIES800
- Description: Species Entity Recognition - species names from the Species-800 dataset

Training Details

- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: cross-validation on a held-out test set
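The ΔF1 % column in these benchmark tables is the relative gain of the fine-tuned model over its base. A one-line sketch of that arithmetic, checked against the Organism-Large-459M row:

```python
def relative_gain(base_f1, finetuned_f1):
    """Relative F1 improvement over the base model, as a percentage."""
    return (finetuned_f1 - base_f1) / base_f1 * 100

# Organism-Large-459M row: base 0.6329, finetuned 0.8471.
gain = relative_gain(0.6329, 0.8471)
print(round(gain, 1))  # 33.8, matching the table's ΔF1 % for that row
```

The absolute ΔF1 column is simply `finetuned_f1 - base_f1`; the percentage normalizes that delta by the base score, which is why small models with weak baselines can show larger ΔF1 % than higher-ranked models.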

license:apache-2.0
4
4

OpenMed-PII-BigMed-Large-278M-v1-mlx

license:apache-2.0
4
0

OpenMed-PII-BiomedELECTRA-Base-110M-v1-mlx

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Genomic-Tiny-60M

Specialized model for Gene Entity Recognition - Gene-related entities

Targets gene and genetics entities, handling symbol/name variants commonly found in genomics literature. Useful for genetic association studies, variant curation, and genomics informatics.
🎯 Key Features

- Zero-Shot Capability: recognize any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated GELLUS dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to the entity types it was trained on; you can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: as a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The GELLUS corpus targets gene recognition and genetics entities for genomics and molecular biology applications. It is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature, with comprehensive annotations for gene names, genetic variants, and genomics-related entities essential for genetic research. The dataset supports automated systems for gene mention identification, genetic association studies, and genomics text mining, and is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. It serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.66`
- F1 Improvement vs. Base Model: `72.2%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genomic-Large-459M | 0.5361 | 0.9775 | 0.4414 | 82.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genomic-Medium-209M | 0.5376 | 0.9674 | 0.4298 | 79.9% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genomic-XLarge-770M | 0.6875 | 0.9003 | 0.2128 | 30.9% |
| 4 | OpenMed-ZeroShot-NER-Genomic-Small-166M | 0.4694 | 0.8082 | 0.3388 | 72.2% |
| 5 | OpenMed-ZeroShot-NER-Genomic-Multi-209M | 0.4000 | 0.7333 | 0.3333 | 83.3% |

Rankings are sorted by finetuned F1 and show ΔF1% over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: if you want to extract entity types that were not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`, e.g. `threshold=0.3` or lower depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
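The ΔF1 columns in the table above are derived directly from the base and finetuned scores. A minimal sketch verifying two rows (model names and scores copied from the table; the helper is illustrative, not part of the released tooling):

```python
# Recompute ΔF1 = finetuned - base and ΔF1% = 100 * ΔF1 / base
# for two rows of the Genomic benchmark table.
rows = {
    "Genomic-Large-459M": (0.5361, 0.9775),
    "Genomic-Small-166M": (0.4694, 0.8082),
}

results = {}
for name, (base, finetuned) in rows.items():
    delta = finetuned - base
    results[name] = (round(delta, 4), round(100 * delta / base, 1))
    print(f"{name}: ΔF1={delta:.4f}, ΔF1%={100 * delta / base:.1f}%")
```

Running this reproduces the table's 0.4414 / 82.3% for the Large model and 0.3388 / 72.2% for the Small model.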
- Dataset: GELLUS
- Description: Gene Entity Recognition - Gene-related entities

Training Details
- Base Model: gliner-x-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

This model is particularly useful for:
- Clinical Text Mining: extracting entities from medical records
- Biomedical Research: processing scientific literature
- Drug Discovery: identifying chemical compounds and drugs
- Healthcare Analytics: analyzing patient data and outcomes
- Academic Research: supporting biomedical NLP research
- Custom Entity Recognition: zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: biomedical text
- Output: named-entity predictions

Licensed under the Apache License 2.0. See LICENSE for details. I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments. If you use this model in your research or applications, please cite the accompanying paper; proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Genome-Multi-209M

Specialized model for Gene/Protein Entity Recognition - Gene and protein mentions

[](https://opensource.org/licenses/Apache-2.0) [](https://huggingface.co/OpenMed)

Accurate gene/protein mention recognition, including synonyms and symbol variants from biomedical literature. Enables gene-centric curation, variant/association mining, and network construction.
🎯 Key Features

- Zero-Shot Capability: recognize any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated BC2GM dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

💡 Zero-Shot Flexibility: as a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training; simply provide the entity labels at inference time.

The BC2GM (BioCreative II Gene Mention) corpus is a foundational dataset for gene and protein name recognition in biomedical literature, created for the BioCreative II challenge. It contains thousands of sentences from MEDLINE abstracts with manually annotated gene and protein mentions, and remains a critical benchmark for genomics and molecular-biology NER systems: gene names often have complex nomenclature and ambiguous boundaries, making this a challenging task. The corpus has been instrumental in advancing automated gene recognition for functional genomics, gene expression analysis, and molecular biology text mining.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.75`
- F1 Improvement vs. Base Model: `26.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genome-Large-459M | 0.5538 | 0.8616 | 0.3078 | 55.6% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genome-Medium-209M | 0.5893 | 0.8553 | 0.2660 | 45.1% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genome-XLarge-770M | 0.5572 | 0.8367 | 0.2795 | 50.2% |
| 4 | OpenMed-ZeroShot-NER-Genome-Base-220M | 0.5322 | 0.7986 | 0.2664 | 50.1% |
| 5 | OpenMed-ZeroShot-NER-Genome-Multi-209M | 0.5919 | 0.7494 | 0.1576 | 26.6% |

Rankings are sorted by finetuned F1 and show ΔF1% over the base model. The test dataset is excluded from training.

💡 Tip: for custom or rare entity types, lowering the `threshold` parameter in `model.predict_entities` (e.g. `threshold=0.3`) makes the model more permissive and can improve recall, at the cost of more false positives.
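The threshold tip above amounts to a simple score cutoff over candidate spans. A minimal sketch with mocked scores (the spans and values are made up for illustration; real scores come from the model):

```python
# Mock candidate spans with confidence scores, showing how the threshold
# trades recall against precision: lower thresholds admit more spans.
candidates = [
    {"text": "BRCA1", "label": "gene", "score": 0.91},
    {"text": "p53 pathway", "label": "pathway", "score": 0.42},
    {"text": "the patient", "label": "gene", "score": 0.12},
]

def filter_by_threshold(spans, threshold):
    """Keep only spans whose confidence meets the threshold."""
    return [s for s in spans if s["score"] >= threshold]

strict = filter_by_threshold(candidates, 0.5)    # 1 span survives
permissive = filter_by_threshold(candidates, 0.3)  # 2 spans survive
print(len(strict), len(permissive))
```

Note that the very-low-score span ("the patient") survives neither cutoff; dropping the threshold further would admit it too, which is the false-positive risk the tip warns about.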
- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: gliner_multi-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: biomedical text
- Output: named-entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Pathology-Multi-209M

Specialized model for Disease Entity Recognition - Disease entities from the NCBI dataset

[](https://opensource.org/licenses/Apache-2.0) [](https://huggingface.co/OpenMed)

High-precision disease NER tuned for research literature, capturing disease mentions suitable for normalization to MeSH/OMIM. Useful for clinical NLP, cohort discovery, and knowledge-graph construction, and pairs well with concept-normalization modules.
🎯 Key Features

- Zero-Shot Capability: recognize any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated NCBI Disease dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

💡 Zero-Shot Flexibility: as a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training; simply provide the entity labels at inference time.

The NCBI Disease corpus is a gold-standard dataset containing 793 PubMed abstracts with 6,892 disease mentions mapped to 790 unique disease concepts from Medical Subject Headings (MeSH) and Online Mendelian Inheritance in Man (OMIM). Developed by the National Center for Biotechnology Information, it provides both mention-level and concept-level annotations for disease entity recognition and normalization, and is extensively used for clinical NLP systems, medical diagnosis support tools, and biomedical text mining. It serves as a critical benchmark for evaluating disease name recognition in healthcare informatics and medical literature analysis.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.77`
- F1 Improvement vs. Base Model: `37.9%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Pathology-Large-459M | 0.6183 | 0.8983 | 0.2800 | 45.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Pathology-Medium-209M | 0.6039 | 0.8940 | 0.2901 | 48.0% |
| 🥉 3 | OpenMed-ZeroShot-NER-Pathology-XLarge-770M | 0.6806 | 0.8872 | 0.2066 | 30.4% |
| 4 | OpenMed-ZeroShot-NER-Pathology-Base-220M | 0.6393 | 0.8556 | 0.2163 | 33.8% |
| 5 | OpenMed-ZeroShot-NER-Pathology-Multi-209M | 0.5601 | 0.7726 | 0.2125 | 37.9% |

Rankings are sorted by finetuned F1 and show ΔF1% over the base model. The test dataset is excluded from training.

💡 Tip: for custom or rare entity types, lowering the `threshold` parameter in `model.predict_entities` (e.g. `threshold=0.3`) makes the model more permissive and can improve recall, at the cost of more false positives.
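The MeSH/OMIM normalization use case typically starts by deduplicating the raw disease mentions before concept lookup. A hedged sketch with mocked predictions (dict shape and values invented for illustration, loosely following the text/label/score form common in GLiNER-style outputs):

```python
# Collapse duplicate disease mentions (case-insensitive) so each unique
# surface form is sent to the concept-normalization module only once.
predictions = [
    {"text": "breast cancer", "label": "disease", "score": 0.95},
    {"text": "Breast Cancer", "label": "disease", "score": 0.90},
    {"text": "ataxia-telangiectasia", "label": "disease", "score": 0.88},
]

def unique_mentions(preds, label="disease"):
    """Return unique mention strings of the given label, first spelling wins."""
    seen, out = set(), []
    for p in preds:
        key = p["text"].lower()
        if p["label"] == label and key not in seen:
            seen.add(key)
            out.append(p["text"])
    return out

print(unique_mentions(predictions))
```

Here the two spellings of "breast cancer" collapse to one entry, leaving two candidates for MeSH/OMIM lookup instead of three.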
- Dataset: NCBI Disease
- Description: Disease Entity Recognition - Disease entities from the NCBI dataset

Training Details
- Base Model: gliner_multi-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: biomedical text
- Output: named-entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Protein-XLarge-770M

Specialized model for Biomedical Entity Recognition - Various biomedical entities

[](https://opensource.org/licenses/Apache-2.0) [](https://huggingface.co/OpenMed)

Focuses on protein entities (families, complexes, variants) and related molecular-biology terms. Applicable to protein–protein interaction mining, pathway modeling, and systems biology.
🎯 Key Features

- Zero-Shot Capability: recognize any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated FSU dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types (you can also add custom entity types without retraining the model):

- `protein`
- `protein_complex`
- `protein_enum`
- `protein_familyorgroup`
- `protein_variant`

💡 Zero-Shot Flexibility: as a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training; simply provide the entity labels at inference time.

The FSU (Florida State University) corpus is a biomedical NER dataset designed for protein interaction recognition and molecular-biology entity extraction. It contains annotations for proteins, protein complexes, protein families, protein variants, and molecular interaction entities relevant to systems biology and biochemistry research. The dataset supports text mining systems for protein-protein interaction extraction, molecular pathway analysis, and systems biology applications, and is particularly valuable for identifying protein entities involved in cellular processes, signal transduction pathways, and molecular mechanisms. It serves as a benchmark for NER systems used in proteomics research, drug discovery, and molecular biology informatics.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.88`
- F1 Improvement vs. Base Model: `55.3%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Protein-Large-459M | 0.5612 | 0.9200 | 0.3589 | 63.9% |
| 🥈 2 | OpenMed-ZeroShot-NER-Protein-Medium-209M | 0.5631 | 0.8995 | 0.3364 | 59.7% |
| 🥉 3 | OpenMed-ZeroShot-NER-Protein-XLarge-770M | 0.5659 | 0.8786 | 0.3127 | 55.3% |
| 4 | OpenMed-ZeroShot-NER-Protein-Base-220M | 0.5230 | 0.8454 | 0.3224 | 61.6% |
| 5 | OpenMed-ZeroShot-NER-Protein-Multi-209M | 0.5441 | 0.7810 | 0.2369 | 43.5% |

Rankings are sorted by finetuned F1 and show ΔF1% over the base model. The test dataset is excluded from training.

💡 Tip: for custom or rare entity types, lowering the `threshold` parameter in `model.predict_entities` (e.g. `threshold=0.3`) makes the model more permissive and can improve recall, at the cost of more false positives.
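Downstream interaction-mining code usually needs the mention's surface form, which can be recovered from character offsets into the source text. A minimal sketch, assuming predictions carry `start`/`end` character offsets as GLiNER-style outputs typically do (the sentence and offsets below are mocked for illustration):

```python
# Recover each predicted mention by slicing the source text with its
# character offsets, then pair it with its predicted label.
text = "The MDM2-p53 complex regulates apoptosis."
predictions = [  # mocked prediction for illustration
    {"start": 4, "end": 12, "label": "protein_complex", "score": 0.9},
]

for p in predictions:
    mention = text[p["start"]:p["end"]]
    print(mention, "->", p["label"])
```

Slicing with offsets rather than re-searching for the string avoids ambiguity when the same mention appears more than once in a document.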
- Dataset: FSU
- Description: Biomedical Entity Recognition - Various biomedical entities

Training Details
- Base Model: gliner-x-large
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: biomedical text
- Output: named-entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Genome-Large-459M

Specialized model for Gene/Protein Entity Recognition - Gene and protein mentions

[](https://opensource.org/licenses/Apache-2.0) [](https://huggingface.co/OpenMed)

Accurate gene/protein mention recognition, including synonyms and symbol variants from biomedical literature. Enables gene-centric curation, variant/association mining, and network construction.
🎯 Key Features

- Zero-Shot Capability: recognize any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated BC2GM dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

💡 Zero-Shot Flexibility: as a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training; simply provide the entity labels at inference time.

Fine-tuned on the BC2GM (BioCreative II Gene Mention) corpus, a benchmark of MEDLINE sentences with manually annotated gene and protein mentions from the BioCreative II challenge.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.86`
- F1 Improvement vs. Base Model: `55.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Genome-Large-459M | 0.5538 | 0.8616 | 0.3078 | 55.6% |
| 🥈 2 | OpenMed-ZeroShot-NER-Genome-Medium-209M | 0.5893 | 0.8553 | 0.2660 | 45.1% |
| 🥉 3 | OpenMed-ZeroShot-NER-Genome-XLarge-770M | 0.5572 | 0.8367 | 0.2795 | 50.2% |
| 4 | OpenMed-ZeroShot-NER-Genome-Base-220M | 0.5322 | 0.7986 | 0.2664 | 50.1% |
| 5 | OpenMed-ZeroShot-NER-Genome-Multi-209M | 0.5919 | 0.7494 | 0.1576 | 26.6% |

Rankings are sorted by finetuned F1 and show ΔF1% over the base model. The test dataset is excluded from training.

💡 Tip: for custom or rare entity types, lowering the `threshold` parameter in `model.predict_entities` (e.g. `threshold=0.3`) makes the model more permissive and can improve recall, at the cost of more false positives.
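The table shows the usual size/accuracy trade-off across the Genome family. A small illustrative helper over the table's numbers (parameter counts taken from the model names) that picks the strongest finetuned model under a parameter budget:

```python
# Finetuned F1 by model, with parameter counts (millions) from the names.
models = {
    "Genome-Large-459M":  (459, 0.8616),
    "Genome-Medium-209M": (209, 0.8553),
    "Genome-XLarge-770M": (770, 0.8367),
    "Genome-Base-220M":   (220, 0.7986),
    "Genome-Multi-209M":  (209, 0.7494),
}

def best_under(budget_m):
    """Best finetuned F1 among models within the parameter budget."""
    eligible = {k: f1 for k, (size, f1) in models.items() if size <= budget_m}
    return max(eligible, key=eligible.get)

print(best_under(250))  # Medium beats Base and Multi within 250M params
print(best_under(800))  # with no practical cap, Large still tops XLarge
```

Notably, XLarge-770M underperforms both Large and Medium here, so more parameters do not automatically buy accuracy on this benchmark.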
- Dataset: BC2GM
- Description: Gene/Protein Entity Recognition - Gene and protein mentions

Training Details
- Base Model: gliner_large-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning-rate scheduling
- Validation: Cross-validation on a held-out test set

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: dataset-specific entity types
- Input: biomedical text
- Output: named-entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Anatomy-Base-220M

Specialized model for Anatomical Entity Recognition - Anatomical structures and body parts

[](https://opensource.org/licenses/Apache-2.0) [](https://huggingface.co/OpenMed)

Tailored to anatomical structure recognition, including organs, tissues, and substructures in clinical narratives. Supports radiology and surgical note parsing, site-of-disease extraction, and anatomy-aware analytics.
🎯 Key Features

- Zero-Shot Capability: recognize any entity type without task-specific training
- High Precision: optimized for biomedical entity recognition
- Domain-Specific: fine-tuned on the curated ANATOMY dataset
- Production-Ready: validated on clinical benchmarks
- Easy Integration: compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: add custom entity types without retraining

💡 Zero-Shot Flexibility: as a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training; simply provide the entity labels at inference time.

The Anatomy corpus is a specialized biomedical NER dataset for recognizing anatomical entities and medical terminology in clinical and biomedical texts, with annotations for anatomical structures, body parts, organs, and physiological systems. It is essential for clinical NLP systems, medical education tools, and healthcare informatics applications where accurate anatomical entity identification is crucial, and supports automated systems for medical coding, clinical decision support, and anatomical knowledge extraction from medical records and literature. It serves as a resource for training NER models used in medical imaging, surgical planning, and clinical documentation.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.86`
- F1 Improvement vs. Base Model: `207.7%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|-------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Anatomy-Large-459M | 0.2978 | 0.9271 | 0.6293 | 211.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Anatomy-Medium-209M | 0.3172 | 0.9114 | 0.5942 | 187.3% |
| 🥉 3 | OpenMed-ZeroShot-NER-Anatomy-XLarge-770M | 0.3780 | 0.9021 | 0.5241 | 138.7% |
| 4 | OpenMed-ZeroShot-NER-Anatomy-Base-220M | 0.2804 | 0.8627 | 0.5823 | 207.7% |
| 5 | OpenMed-ZeroShot-NER-Anatomy-Multi-209M | 0.3121 | 0.8091 | 0.4969 | 159.2% |

Rankings are sorted by finetuned F1 and show ΔF1% over the base model. The test dataset is excluded from training.

💡 Tip: for custom or rare entity types, lowering the `threshold` parameter in `model.predict_entities` (e.g. `threshold=0.3`) makes the model more permissive and can improve recall, at the cost of more false positives.
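Many clinical pipelines expect token-level BIO tags rather than character spans. A hedged sketch converting span predictions to BIO tags, using naive whitespace tokenization and a mocked anatomy span for illustration:

```python
# Convert character-offset span predictions into token-level BIO tags.
text = "lesion in the left temporal lobe"
spans = [{"start": 14, "end": 32, "label": "anatomy"}]  # "left temporal lobe"

tokens, tags, pos = [], [], 0
for tok in text.split():
    start = text.index(tok, pos)  # locate this token's char offset
    end = start + len(tok)
    pos = end
    tag = "O"
    for s in spans:
        if start >= s["start"] and end <= s["end"]:
            # first token of the span gets B-, the rest get I-
            tag = ("B-" if start == s["start"] else "I-") + s["label"]
    tokens.append(tok)
    tags.append(tag)

print(list(zip(tokens, tags)))
```

Real pipelines would reuse the tokenizer of the downstream model instead of whitespace splitting, but the offset-alignment logic is the same.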
- Dataset: ANATOMY
- Description: Anatomical Entity Recognition - Anatomical structures and body parts

Training Details

- Base Model: gliner-x-base
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Chemical-Small-166M

Specialized model for Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Purpose-built for chemical entity recognition in biomedical literature, this model robustly identifies small molecules, drugs, reagents, and chemical synonyms across abstracts and full-text articles. It is ideal for drug discovery workflows, compound indexing, entity linking to ChEBI/DrugBank, and pharmacology literature curation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, it empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities, such as diseases, chemicals, genes, species, and clinical findings, directly from unstructured text, without the need for task-specific retraining.

Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life-sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Recognizes any entity type without task-specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC4CHEMD dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The BC4CHEMD (BioCreative IV Chemical Entity Mention) corpus is a manually annotated dataset for chemical entity recognition in biomedical literature. Created for the BioCreative IV challenge, it contains PubMed abstracts with chemical entities annotated according to Chemical Entities of Biological Interest (ChEBI) guidelines. The dataset is designed to advance automated chemical name recognition for drug discovery, pharmacology, and chemical biology applications, and it serves as a benchmark for evaluating NER models that identify chemical compounds, drugs, and other chemical substances in scientific literature.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.84`
- F1 Improvement vs. Base Model: `39.2%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Chemical-Large-459M | 0.6766 | 0.9369 | 0.2603 | 38.5% |
| 🥈 2 | OpenMed-ZeroShot-NER-Chemical-Medium-209M | 0.6113 | 0.9343 | 0.3229 | 52.8% |
| 🥉 3 | OpenMed-ZeroShot-NER-Chemical-XLarge-770M | 0.6063 | 0.9247 | 0.3184 | 52.5% |
| 4 | OpenMed-ZeroShot-NER-Chemical-Base-220M | 0.5269 | 0.9047 | 0.3778 | 71.7% |
| 5 | OpenMed-ZeroShot-NER-Chemical-Multi-209M | 0.5490 | 0.8745 | 0.3255 | 59.3% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
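The ΔF1 columns in the benchmark table follow one rule: the absolute gain is finetuned minus base F1, and the percentage is that gain relative to the base score. A quick check against the Chemical-Base-220M row:

```python
def delta_f1(base, finetuned):
    """Absolute and relative F1 improvement, matching the ΔF1 and
    ΔF1 % columns of the benchmark table."""
    delta = finetuned - base
    return delta, 100.0 * delta / base

# Chemical-Base-220M row from the table above.
delta, pct = delta_f1(0.5269, 0.9047)
print(round(delta, 4), round(pct, 1))  # 0.3778 71.7
```

The reported 0.3778 and 71.7% fall out directly, which also shows why models with weak base F1 post the largest relative gains.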
- Dataset: BC4CHEMD
- Description: Chemical Entity Recognition - Identifies chemical compounds and substances in biomedical literature

Training Details

- Base Model: gliner_small-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Genomic-Small-166M

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Organism-Large-459M

license:apache-2.0
4
0

OpenMed-ZeroShot-NER-Disease-Multi-209M

Specialized model for Disease Entity Recognition - Disease entities from the BC5CDR dataset

Specialized for disease and condition recognition in biomedical texts, covering clinical disorders and pathological states, this model supports patient phenotyping, disease indexing, literature triage, and clinical evidence aggregation.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, it empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities, such as diseases, chemicals, genes, species, and clinical findings, directly from unstructured text, without the need for task-specific retraining.

Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life-sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Recognizes any entity type without task-specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated BC5CDRDISEASE dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The BC5CDR-Disease corpus is the disease-focused component of the BioCreative V Chemical-Disease Relation (CDR) task, containing 1,500 PubMed abstracts with 5,818 annotated disease entities. This manually curated dataset is designed to advance automated disease name recognition for medical diagnosis, pathology research, and clinical decision support systems. It includes annotations for various disease types, medical conditions, and pathological states mentioned in biomedical literature, and serves as a benchmark for evaluating NER models in clinical and biomedical applications where accurate disease entity identification is crucial for medical informatics and healthcare analytics.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.80`
- F1 Improvement vs. Base Model: `49.7%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Disease-Large-459M | 0.5890 | 0.9029 | 0.3138 | 53.3% |
| 🥈 2 | OpenMed-ZeroShot-NER-Disease-Medium-209M | 0.5721 | 0.8848 | 0.3127 | 54.7% |
| 🥉 3 | OpenMed-ZeroShot-NER-Disease-XLarge-770M | 0.6969 | 0.8593 | 0.1624 | 23.3% |
| 4 | OpenMed-ZeroShot-NER-Disease-Base-220M | 0.5952 | 0.8293 | 0.2341 | 39.3% |
| 5 | OpenMed-ZeroShot-NER-Disease-Multi-209M | 0.5323 | 0.7969 | 0.2645 | 49.7% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
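Putting the custom-label workflow together, a minimal sketch might look as follows. The model id and label strings are assumptions for illustration; the actual inference call is shown commented out because it requires the `gliner` package and a model download:

```python
# Hedged sketch: preparing custom entity labels for a GLiNER-based model.
def dedupe_labels(labels):
    """GLiNER takes label strings verbatim; trimming and deduplicating
    them keeps the zero-shot label set consistent across calls."""
    seen, cleaned = set(), []
    for label in labels:
        label = label.strip().lower()
        if label and label not in seen:
            seen.add(label)
            cleaned.append(label)
    return cleaned

labels = dedupe_labels(["Disease", " disease", "Symptom", "Medication"])
print(labels)  # ['disease', 'symptom', 'medication']

# Actual inference (requires `pip install gliner` and a model download;
# the repo id below is an assumption based on this model's name):
# from gliner import GLiNER
# model = GLiNER.from_pretrained("OpenMed/OpenMed-ZeroShot-NER-Disease-Multi-209M")
# entities = model.predict_entities(
#     "The patient was diagnosed with type 2 diabetes.",
#     labels,
#     threshold=0.3,  # lower for custom or rare entity types
# )
```

Any label list can be passed this way; none of the labels need to have appeared in the training data.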
- Dataset: BC5CDRDISEASE
- Description: Disease Entity Recognition - Disease entities from the BC5CDR dataset

Training Details

- Base Model: gliner_multi-v2.1
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
3
3

OpenMed-PII-FastClinical-Small-82M-v1-mlx

license:apache-2.0
3
0

OpenMed-PII-LiteClinical-Small-66M-v1-mlx

license:apache-2.0
3
0

OpenMed-PII-Spanish-ModernMed-Base-149M-v1

license:apache-2.0
3
0

OpenMed-ZeroShot-NER-Organism-Tiny-60M

Specialized model for Species Entity Recognition - Species names from the Species-800 dataset

Optimized for species identification in scientific text, covering a wide range of taxa and naming variants, this model is useful for ecology studies, organism tagging, and biocuration.

OpenMed ZeroShot NER is an advanced, domain-adapted Named Entity Recognition (NER) model designed specifically for medical, biomedical, and clinical text mining. Leveraging state-of-the-art zero-shot learning, it empowers researchers, clinicians, and data scientists to extract expert-level biomedical entities, such as diseases, chemicals, genes, species, and clinical findings, directly from unstructured text, without the need for task-specific retraining.

Built on the robust GLiNER architecture and fine-tuned on curated biomedical corpora, OpenMed ZeroShot NER delivers high-precision entity recognition for critical healthcare and life-sciences applications. Its zero-shot capability means you can flexibly define and extract any entity type relevant to your workflow, from standard biomedical categories to custom clinical concepts, supporting rapid adaptation to new research domains and regulatory requirements. Whether you are working on clinical NLP, biomedical research, electronic health record (EHR) de-identification, or large-scale literature mining, OpenMed ZeroShot NER provides a production-ready, open-source solution that combines expert-level accuracy with unmatched flexibility. Join the OpenMed community to accelerate your medical text analytics with cutting-edge, zero-shot NER technology.
🎯 Key Features

- Zero-Shot Capability: Recognizes any entity type without task-specific training
- High Precision: Optimized for biomedical entity recognition
- Domain-Specific: Fine-tuned on the curated SPECIES800 dataset
- Production-Ready: Validated on clinical benchmarks
- Easy Integration: Compatible with the Hugging Face Transformers ecosystem
- Flexible Entity Recognition: Add custom entity types without retraining

This zero-shot model can identify and classify biomedical entities, including but not limited to these entity types. You can also add custom entity types without retraining the model.

💡 Zero-Shot Flexibility: As a GLiNER-based model, you can specify any entity types you want to detect, even if they weren't part of the original training. Simply provide the entity labels when using the model, and it will adapt to recognize them.

The Species-800 corpus is a manually annotated dataset for species recognition and taxonomic classification in biomedical literature. It contains 800 abstracts with comprehensive annotations for organism mentions, supporting biodiversity informatics and biological taxonomy research. Because it includes both scientific and common names of species, it is valuable for developing NER systems that can handle the complexity of biological nomenclature. The corpus serves as a benchmark for species identification models used in ecological studies, conservation biology, and systematic biology, and is particularly useful for text mining in biodiversity databases and biological literature analysis.

- Finetuned F1 vs. Base Model (on test dataset excluded from training): `0.57`
- F1 Improvement vs. Base Model: `19.6%`

| Rank | Model | Base F1 | Finetuned F1 | ΔF1 | ΔF1 % |
|------|-------|--------:|------------:|----:|------:|
| 🥇 1 | OpenMed-ZeroShot-NER-Organism-Large-459M | 0.6329 | 0.8471 | 0.2142 | 33.8% |
| 🥈 2 | OpenMed-ZeroShot-NER-Organism-Medium-209M | 0.6140 | 0.8257 | 0.2117 | 34.5% |
| 🥉 3 | OpenMed-ZeroShot-NER-Organism-XLarge-770M | 0.6111 | 0.8256 | 0.2145 | 35.1% |
| 4 | OpenMed-ZeroShot-NER-Organism-Base-220M | 0.5853 | 0.7717 | 0.1864 | 31.8% |
| 5 | OpenMed-ZeroShot-NER-Organism-Small-166M | 0.5931 | 0.7092 | 0.1161 | 19.6% |

Rankings are sorted by finetuned F1 and show ΔF1 % over the base model. The test dataset is excluded from training. Figure: OpenMed ZeroShot Clinical & Biomedical NER vs. original GLiNER models.

Zero-Shot Usage with Custom Entity Types

💡 Tip: If you want to extract entities that are not present in the original training set (i.e., custom or rare entity types), you may get better results by lowering the `threshold` parameter in `model.predict_entities`. For example, try `threshold=0.3` or even lower, depending on your use case.

> Lowering the threshold makes the model more permissive and can help it recognize new or less common entity types, but it may also increase false positives. Adjust as needed for your application.
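The F1 scores quoted throughout these cards are entity-level. As a rough sketch (not this project's evaluation code), exact-match precision, recall, and F1 over predicted versus gold spans can be computed like this; the toy spans below are invented:

```python
def entity_f1(predicted, gold):
    """Exact-match entity-level F1: a prediction counts as correct only
    if its (start, end, label) triple appears in the gold annotations."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy character spans as (start, end, label); values are invented.
gold = [(0, 4, "species"), (10, 18, "species"), (25, 30, "species")]
pred = [(0, 4, "species"), (10, 18, "species"), (40, 45, "species")]
print(round(entity_f1(pred, gold), 2))  # 0.67
```

Two of three predictions match the gold set exactly, giving precision and recall of 2/3 each and hence F1 of 0.67.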
- Dataset: SPECIES800
- Description: Species Entity Recognition - Species names from the Species-800 dataset

Training Details

- Base Model: gliner-x-small
- Training Framework: Hugging Face Transformers
- Optimization: AdamW optimizer with learning rate scheduling
- Validation: Cross-validation on held-out test set

This model is particularly useful for:

- Clinical Text Mining: Extracting entities from medical records
- Biomedical Research: Processing scientific literature
- Drug Discovery: Identifying chemical compounds and drugs
- Healthcare Analytics: Analyzing patient data and outcomes
- Academic Research: Supporting biomedical NLP research
- Custom Entity Recognition: Zero-shot detection of domain-specific entities

- Task: Zero-Shot Classification (Named Entity Recognition)
- Labels: Dataset-specific entity types
- Input: Biomedical text
- Output: Named entity predictions

Licensed under the Apache License 2.0. See LICENSE for details.

I welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join my mission to advance open-source Healthcare AI, I'd love to hear from you. Follow OpenMed Org on Hugging Face 🤗 and click "Watch" to stay updated on my latest releases and developments. If you use this model in your research or applications, please cite the following paper. Proper citation helps support and acknowledge my work. Thank you!

license:apache-2.0
3
0