oliverguhr
fullstop-punctuation-multilang-large
---
language:
- en
- de
- fr
- it
- multilingual
tags:
- punctuation prediction
- punctuation
datasets: wmt/europarl
license: mit
widget:
- text: "Ho sentito che ti sei laureata il che mi fa molto piacere"
  example_title: "Italian"
- text: "Tous les matins vers quatre heures mon père ouvrait la porte de ma chambre"
  example_title: "French"
- text: "Ist das eine Frage Frau Müller"
  example_title: "German"
- text: "Yet she blushed as if with guilt when Cynthia reading her thoughts said to her one day
german-sentiment-bert
This model was trained for sentiment classification of German-language texts. To achieve the best results, all model inputs need to be preprocessed with the same procedure that was applied during training. To simplify usage of the model, we provide a Python package that bundles the code needed for preprocessing and inference.

The model uses Google's BERT architecture and was trained on 1.834 million German-language samples. The training data contains texts from various domains, such as Twitter, Facebook, and movie, app and hotel reviews. You can find more information about the dataset and the training process in the paper.

If you are interested in the code and data that were used to train this model, please have a look at this repository and our paper. The table below lists the F1 scores that this model achieves on different datasets. Since we trained this model with a newer version of the transformers library, the results are slightly better than those reported in the paper.

| Dataset                       | F1 micro Score |
| :---------------------------- | -------------: |
| holidaycheck                  |         0.9568 |
| scare                         |         0.9418 |
| filmstarts                    |         0.9021 |
| germeval                      |         0.7536 |
| PotTS                         |         0.6780 |
| emotions                      |         0.9649 |
| sb10k                         |         0.7376 |
| Leipzig Wikipedia Corpus 2016 |         0.9967 |
| all                           |         0.9639 |

For feedback and questions, contact me via mail or on Twitter @oliverguhr. Please cite us if you found this useful:
fullstop-punctuation-multilingual-base
fullstop-punctuation-multilingual-sonar-base
This model predicts the punctuation of English, Italian, French, German and Dutch texts. We developed it to restore the punctuation of transcribed spoken language.

This multilingual model was trained on the Europarl dataset provided by the SEPP-NLG Shared Task; for Dutch, we also included the SoNaR dataset. Please note that the Europarl dataset consists of political speeches, so the model might perform differently on texts from other domains.

The model restores the following punctuation markers: "." "," "?" "-" ":"

Sample Code

We provide a simple Python package that allows you to process text of any length.

output

> My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau Müller?

> [['My', '0', 0.99998856], ['name', '0', 0.9999708], ['is', '0', 0.99975926], ['Clara', '0', 0.6117834], ['and', '0', 0.9999014], ['I', '0', 0.9999808], ['live', '0', 0.9999666], ['in', '0', 0.99990165], ['Berkeley', ',', 0.9941764], ['California', '.', 0.9952892], ['Ist', '0', 0.9999577], ['das', '0', 0.9999678], ['eine', '0', 0.99998224], ['Frage', ',', 0.9952265], ['Frau', '0', 0.99995995], ['Müller', '?', 0.972517]]

Performance differs between the individual punctuation markers because hyphens and colons are, in many cases, optional and can be substituted by either a comma or a full stop. The model achieves the following F1 scores for the different languages:

| Label         | English | German | French | Italian | Dutch |
| ------------- | ------- | ------ | ------ | ------- | ----- |
| 0             | 0.990   | 0.996  | 0.991  | 0.988   | 0.994 |
| .             | 0.924   | 0.951  | 0.921  | 0.917   | 0.959 |
| ?             | 0.825   | 0.829  | 0.800  | 0.736   | 0.817 |
| ,             | 0.798   | 0.937  | 0.811  | 0.778   | 0.813 |
| :             | 0.535   | 0.608  | 0.578  | 0.544   | 0.657 |
| -             | 0.345   | 0.384  | 0.353  | 0.344   | 0.464 |
| macro average | 0.736   | 0.784  | 0.742  | 0.718   | 0.784 |
| micro average | 0.975   | 0.987  | 0.977  | 0.972   | 0.983 |

| Languages                                  | Model                                                   |
| ------------------------------------------ | ------------------------------------------------------- |
| English, Italian, French and German        | oliverguhr/fullstop-punctuation-multilang-large         |
| English, Italian, French, German and Dutch | oliverguhr/fullstop-punctuation-multilingual-sonar-base |
| Dutch                                      | oliverguhr/fullstop-dutch-sonar-punctuation-prediction  |

| Languages                                                                                                 | Model                                              |
| --------------------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, Slovenian | kredor/punctuate-all                               |
| Catalan                                            | softcatala/fullstop-catalan-punctuation-prediction |

You can use different models by setting the model parameter:
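A minimal sketch of choosing a model from the tables above and passing it as the model parameter. It assumes the Python package is installed as `deepmultilingualpunctuation` and exposes `PunctuationModel(model=...)` with a `restore_punctuation` method; neither name appears in this card, so check them against the package documentation. The two-letter language codes and the `pick_model` helper are hypothetical and exist only for illustration.

```python
# Model identifiers copied from the tables above.
# The two-letter language codes are an assumption made for this sketch.
MODELS = {
    "oliverguhr/fullstop-punctuation-multilang-large": {"en", "it", "fr", "de"},
    "oliverguhr/fullstop-punctuation-multilingual-sonar-base": {"en", "it", "fr", "de", "nl"},
    "oliverguhr/fullstop-dutch-sonar-punctuation-prediction": {"nl"},
    "kredor/punctuate-all": {
        "en", "de", "fr", "es", "bg", "it", "pl", "nl", "cs", "pt", "sk", "sl",
    },
    "softcatala/fullstop-catalan-punctuation-prediction": {"ca"},
}


def pick_model(languages):
    """Hypothetical helper: return the listed model that covers every
    requested language while supporting the fewest languages overall."""
    wanted = set(languages)
    candidates = [
        (len(covered), name)
        for name, covered in MODELS.items()
        if wanted <= covered
    ]
    if not candidates:
        raise ValueError(f"no listed model covers {sorted(wanted)}")
    # Smallest coverage wins; ties are broken alphabetically by model name.
    return min(candidates)[1]


def restore(text, languages):
    """Restore punctuation with the model chosen for the given languages.

    Assumption: the package name, class name, and method name below are
    not stated in this card. The import is kept local so that the helper
    above can be used without the package installed.
    """
    from deepmultilingualpunctuation import PunctuationModel

    model = PunctuationModel(model=pick_model(languages))
    return model.restore_punctuation(text)


# Example call (downloads the model weights on first use):
# print(restore("Ist das eine Frage Frau Müller", ["de"]))
```

Preferring the model with the narrowest language coverage is only one possible heuristic; the model identifier can equally be passed to the model parameter directly.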