matthewleechen

8 models


lt-patent-inventor-linking

multilabel_patent_classifier

sentence_focus_group_colonialbooks

Evaluation: {'eval_loss': 0.5952138900756836, 'eval_accuracy': 0.8833333333333333, 'eval_precision': 0.8783653846153847, 'eval_recall': 0.8833333333333333, 'eval_f1': 0.880468303826385, 'eval_runtime': 0.3208, 'eval_samples_per_second': 374.091, 'eval_steps_per_second': 3.117}
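In this entry the reported recall is exactly equal to the accuracy, while precision differs. Support-weighted averaging (as with scikit-learn's `average="weighted"`) produces this pattern: weighting each class's recall by its share of the true labels reduces to the overall fraction of correct predictions. The card does not say how the metrics were computed. The pure-Python sketch below uses hypothetical toy labels and only illustrates the weighted definitions. It is not the author's evaluation code.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 for single-label classification.

    Assumption: this mirrors average="weighted" averaging. Each class's score
    is weighted by its number of true instances (its support).
    """
    n = len(y_true)
    support = Counter(y_true)       # true instances per class
    predicted = Counter(y_pred)     # predicted instances per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class

    precision = recall = f1 = 0.0
    for label, s in support.items():
        p_c = tp[label] / predicted[label] if predicted[label] else 0.0
        r_c = tp[label] / s
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = s / n
        precision += weight * p_c
        recall += weight * r_c      # sum of tp / n, i.e. accuracy
        f1 += weight * f_c
    return precision, recall, f1


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels, not the model's evaluation data.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]

p, r, f = weighted_prf(y_true, y_pred)
print(f"accuracy={accuracy(y_true, y_pred):.4f} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

With these toy labels, recall and accuracy both come out to 4/6, while precision is 0.75. This matches the pattern in the reported numbers.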

—

colonial_debate_title_classifier

—

sentence_focus_group_jstor_1800-1945

—

time-saving_stated_aim_classifier

This is a roberta-base model trained to classify whether the explicit stated aims extracted from a historical British patent include a time-saving objective. Labels were generated manually and then checked with Gemini 2.0 Flash using the attached prompt.
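The card gives no usage code. The sketch below assumes the model is published on the Hugging Face Hub as `matthewleechen/time-saving_stated_aim_classifier`, an ID inferred from the account and model names on this page rather than confirmed. It also assumes that one patent's stated aims are joined into a single string with `"; "`. The expected input format and the label names are not documented here, so treat `format_aims` and `classify_stated_aims` as hypothetical helpers.

```python
from functools import lru_cache

# Hypothetical Hub ID, inferred from this page (not confirmed by the card).
MODEL_ID = "matthewleechen/time-saving_stated_aim_classifier"


def format_aims(aims):
    """Join one patent's stated aims into one input string (assumed format)."""
    return "; ".join(a.strip() for a in aims if a.strip())


@lru_cache(maxsize=1)
def load_classifier(model_id=MODEL_ID):
    # Lazy import so the helpers above work without transformers installed.
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)


def classify_stated_aims(aims):
    """Return the top prediction, a dict with "label" and "score" keys."""
    # Label names come from the model's own config and are not documented here.
    return load_classifier()(format_aims(aims))[0]
```

For example, `classify_stated_aims(["to shorten the time needed to bind sheaves"])` would download the model on first use and return a single label/score dict. `lru_cache` keeps the pipeline from being rebuilt on every call.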

—

sentence_focus_group_newspapers

Evaluation: {'eval_loss': 0.2520483732223511, 'eval_accuracy': 0.9354838709677419, 'eval_precision': 0.9397849462365592, 'eval_recall': 0.9354838709677419, 'eval_f1': 0.9204671857619577, 'eval_runtime': 0.1265, 'eval_samples_per_second': 490.111, 'eval_steps_per_second': 7.905}
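The runtime fields in the two Evaluation entries above can be used to estimate the size of each evaluation set:

- samples per second × runtime ≈ number of examples
- accuracy × number of examples ≈ number classified correctly

The logged runtime is rounded to four decimals, so the sketch below rounds each product to the nearest integer. These are estimates derived from the reported figures, not numbers stated on the cards.

```python
# Fields copied from the two Evaluation entries on this page.
reported = {
    "sentence_focus_group_colonialbooks": {
        "eval_accuracy": 0.8833333333333333,
        "eval_runtime": 0.3208,
        "eval_samples_per_second": 374.091,
        "eval_steps_per_second": 3.117,
    },
    "sentence_focus_group_newspapers": {
        "eval_accuracy": 0.9354838709677419,
        "eval_runtime": 0.1265,
        "eval_samples_per_second": 490.111,
        "eval_steps_per_second": 7.905,
    },
}


def implied_counts(m):
    """Estimate (examples, correct predictions, eval steps) from logged rates."""
    n = round(m["eval_runtime"] * m["eval_samples_per_second"])
    correct = round(m["eval_accuracy"] * n)
    steps = round(m["eval_runtime"] * m["eval_steps_per_second"])
    return n, correct, steps


for name, m in reported.items():
    n, correct, steps = implied_counts(m)
    print(f"{name}: ~{n} examples, {correct} correct, {steps} eval step(s)")
```

The colonial-books entry comes out at roughly 120 examples (106 correct) and the newspapers entry at roughly 62 examples (58 correct). Each was evaluated in a single step.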

—

patent_text_regions_yolov8

patent_text_regions is a YOLOv8 model fine-tuned on a custom dataset of page-level images drawn from historical patent specifications published by the British Patent Office. It is trained to detect all text regions on the pages of patent specifications as a single class. We initialize from the official release of the small YOLOv8 model (yolov8s.pt) and fine-tune on our custom dataset. The model can be used in the same way as any pre-trained YOLOv8 model by setting the model path to best.pt.

The dataset was created by randomly sampling 420 page images from British patent specifications published between 1850 and 1899. The data was randomly split 80-10-10 (train-val-test). Standard preprocessing was then applied (images were stretched and auto-oriented to 640 x 640 pixels), followed by these data augmentations in Roboflow:

- Crop: 0% minimum zoom, 20% maximum zoom
- Grayscale: applied to 15% of images
- Saturation: between -25% and +25%
- Blur: up to 2.5px
- Noise: up to 0.1% of pixels

The custom dataset consists of 1,092 labelled images in total, which are made available in this repository. We train the model using default hyperparameters, except for the batch size (128) and the number of epochs (300).

If you use our model or custom training/evaluation data in your research, please cite our accompanying paper as follows:
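The card says only that the model is loaded like any YOLOv8 model by pointing at best.pt. The sketch below uses the `ultralytics` package, one common way to do that, and covers both fine-tuning and inference. The dataset config name `data.yaml`, the function names and the 0.25 confidence threshold are illustrative assumptions. The non-default settings (yolov8s.pt initialization, 300 epochs, batch size 128, 640-pixel images) follow the description above.

```python
# Assumptions: `pip install ultralytics`; "data.yaml" is a hypothetical YOLO
# dataset config for the train/val/test splits; "best.pt" is the released weights.


def train_text_region_model(data_yaml="data.yaml"):
    """Fine-tune yolov8s.pt with the non-default settings reported on the card."""
    from ultralytics import YOLO  # lazy import so this file loads without ultralytics

    model = YOLO("yolov8s.pt")  # official small YOLOv8 weights as the starting point
    return model.train(data=data_yaml, epochs=300, batch=128, imgsz=640)


def detect_text_regions(image_path, weights="best.pt", conf=0.25):
    """Return [x1, y1, x2, y2, confidence] for each detected text region."""
    from ultralytics import YOLO

    model = YOLO(weights)
    result = model.predict(source=image_path, imgsz=640, conf=conf)[0]  # one image
    boxes = []
    for xyxy, score in zip(result.boxes.xyxy.tolist(), result.boxes.conf.tolist()):
        boxes.append([*xyxy, score])  # single class, so no class id is kept
    return boxes
```

Because the model has a single class, the sketch drops class ids and returns only box coordinates and confidences.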


—