oliverguhr

14 models

fullstop-punctuation-multilang-large

---
language:
- en
- de
- fr
- it
- multilingual
tags:
- punctuation prediction
- punctuation
datasets: wmt/europarl
license: mit
widget:
- text: "Ho sentito che ti sei laureata il che mi fa molto piacere"
  example_title: "Italian"
- text: "Tous les matins vers quatre heures mon père ouvrait la porte de ma chambre"
  example_title: "French"
- text: "Ist das eine Frage Frau Müller"
  example_title: "German"
- text: "Yet she blushed as if with guilt when Cynthia reading her thoughts said to her one day

license:mit
1,329,538
173

german-sentiment-bert

This model was trained for sentiment classification of German-language texts. To achieve the best results, all model inputs need to be preprocessed with the same procedure that was applied during training. To simplify usage of the model, we provide a Python package that bundles the code needed for preprocessing and inference.

The model uses Google's BERT architecture and was trained on 1.834 million German-language samples. The training data contains texts from various domains such as Twitter, Facebook, and movie, app and hotel reviews. You can find more information about the dataset and the training process in the paper. If you are interested in the code and data used to train this model, please have a look at this repository and our paper.

The table below lists the F1 scores this model achieves on different datasets. Since we trained this model with a newer version of the transformers library, the results are slightly better than those reported in the paper.

| Dataset                       | F1 micro score |
| :---------------------------- | -------------: |
| holidaycheck                  |         0.9568 |
| scare                         |         0.9418 |
| filmstarts                    |         0.9021 |
| germeval                      |         0.7536 |
| PotTS                         |         0.6780 |
| emotions                      |         0.9649 |
| sb10k                         |         0.7376 |
| Leipzig Wikipedia Corpus 2016 |         0.9967 |
| all                           |         0.9639 |

For feedback and questions, contact me via mail or Twitter @oliverguhr. Please cite us if you found this useful:
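The Python package mentioned above can be called roughly as follows. This is a minimal sketch, assuming the package is installed as `germansentiment` and exposes `SentimentModel` with a `predict_sentiment` method; the wrapper name `classify_sentiment` and its `model` argument are hypothetical helpers added for illustration, not part of the package.

```python
from typing import List, Optional, Sequence


def classify_sentiment(texts: Sequence[str], model: Optional[object] = None) -> List[str]:
    """Return one sentiment label per input text.

    `model` is any object with a `predict_sentiment(list_of_texts)` method.
    When omitted, the packaged German sentiment model is loaded.
    """
    if model is None:
        # Imported lazily so this module loads without the package installed.
        # Assumption: `pip install germansentiment`; the first call downloads
        # the model weights.
        from germansentiment import SentimentModel

        model = SentimentModel()
    # The package is expected to apply the training-time preprocessing
    # before running inference, so raw text can be passed in.
    return list(model.predict_sentiment(list(texts)))


# Example call (needs the package and network access):
# classify_sentiment(["Das ist gar nicht mal so gut", "Sie fährt ein grünes Auto"])
```

Passing a `model` explicitly lets you load the model once and reuse it across calls instead of reloading it every time.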

license:mit
127,334
62

fullstop-punctuation-multilingual-base

license:mit
84,782
7

fullstop-punctuation-multilingual-sonar-base

This model predicts the punctuation of English, Italian, French, German and Dutch texts. We developed it to restore the punctuation of transcribed spoken language.

This multilingual model was trained on the Europarl dataset provided by the SEPP-NLG Shared Task; for Dutch, we additionally included the SoNaR dataset. Please note that the Europarl dataset consists of political speeches, so the model might perform differently on texts from other domains.

The model restores the following punctuation markers: "." "," "?" "-" ":"

Sample Code

We provide a simple Python package that allows you to process text of any length.

Output:

> My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau Müller?

> [['My', '0', 0.99998856], ['name', '0', 0.9999708], ['is', '0', 0.99975926], ['Clara', '0', 0.6117834], ['and', '0', 0.9999014], ['I', '0', 0.9999808], ['live', '0', 0.9999666], ['in', '0', 0.99990165], ['Berkeley', ',', 0.9941764], ['California', '.', 0.9952892], ['Ist', '0', 0.9999577], ['das', '0', 0.9999678], ['eine', '0', 0.99998224], ['Frage', ',', 0.9952265], ['Frau', '0', 0.99995995], ['Müller', '?', 0.972517]]

Performance differs between the individual punctuation markers, because hyphens and colons are in many cases optional and can be substituted by either a comma or a full stop. The model achieves the following F1 scores for the different languages:

| Label         | English | German | French | Italian | Dutch |
| ------------- | ------- | ------ | ------ | ------- | ----- |
| 0             | 0.990   | 0.996  | 0.991  | 0.988   | 0.994 |
| .             | 0.924   | 0.951  | 0.921  | 0.917   | 0.959 |
| ?             | 0.825   | 0.829  | 0.800  | 0.736   | 0.817 |
| ,             | 0.798   | 0.937  | 0.811  | 0.778   | 0.813 |
| :             | 0.535   | 0.608  | 0.578  | 0.544   | 0.657 |
| -             | 0.345   | 0.384  | 0.353  | 0.344   | 0.464 |
| macro average | 0.736   | 0.784  | 0.742  | 0.718   | 0.784 |
| micro average | 0.975   | 0.987  | 0.977  | 0.972   | 0.983 |

| Languages                                  | Model                                                   |
| ------------------------------------------ | ------------------------------------------------------- |
| English, Italian, French and German        | oliverguhr/fullstop-punctuation-multilang-large         |
| English, Italian, French, German and Dutch | oliverguhr/fullstop-punctuation-multilingual-sonar-base |
| Dutch                                      | oliverguhr/fullstop-dutch-sonar-punctuation-prediction  |

| Languages | Model |
| --------- | ----- |
| English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, Slovenian | kredor/punctuate-all |
| Catalan   | softcatala/fullstop-catalan-punctuation-prediction |

You can use different models by setting the model parameter:
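A minimal sketch of selecting a model and of how the labelled-word output shown above maps back to punctuated text. It assumes the package is installed as `deepmultilingualpunctuation` and exposes `PunctuationModel(model=...)` with a `restore_punctuation` method; the helper names `restore` and `apply_labels` are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Sequence


def apply_labels(labelled_words: Iterable[Sequence]) -> str:
    """Join [word, label, score] triples into punctuated text.

    The label "0" means no punctuation follows the word; any other
    label is appended directly to the word.
    """
    parts = []
    for word, label, _score in labelled_words:
        parts.append(word if label == "0" else word + label)
    return " ".join(parts)


def restore(text: str,
            model_name: str = "oliverguhr/fullstop-punctuation-multilingual-sonar-base") -> str:
    """Restore punctuation with the model named by `model_name`."""
    # Imported lazily so this module loads without the package installed.
    # Assumption: `pip install deepmultilingualpunctuation`; the first call
    # downloads the model weights.
    from deepmultilingualpunctuation import PunctuationModel

    model = PunctuationModel(model=model_name)
    return model.restore_punctuation(text)


# Example call (needs the package and network access):
# restore("hoe gaat het met je", model_name="oliverguhr/fullstop-dutch-sonar-punctuation-prediction")
```

`apply_labels` needs no model, so it can be used to check how a list of predictions like the sample output turns into the restored sentence.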

license:mit
29,226
2

spelling-correction-english-base

license:mit
3,182
77

spelling-correction-german-base

license:apache-2.0
2,015
15

fullstop-dutch-sonar-punctuation-prediction

license:mit
1,413
6

wav2vec2-large-xlsr-53-german-cv9

license:apache-2.0
138
1

gemma-3-4b-it-german-spelling

69
0

spelling-correction-multilingual-base

license:mit
59
11

wav2vec2-base-german-cv9

license:mit
15
1

fullstop-dutch-punctuation-prediction

license:mit
11
3

wav2vec2-large-xlsr-53-german-cv13

license:apache-2.0
4
1

revosax-granite-embedding-278m-multilingual

license:apache-2.0
0
1