retrieva-jp

15 models

t5-small-short

This is a T5 v1.1 model, pre-trained on a Japanese corpus. T5 is a Transformer-based encoder-decoder model; v1.1 introduces the following improvements over the original T5:

- GEGLU activation in the feed-forward hidden layer, rather than ReLU (see https://arxiv.org/abs/2002.05202).
- Dropout was turned off during pre-training (a quality win); it should be re-enabled during fine-tuning.
- No parameter sharing between the embedding and classifier layers.
- "xl" and "xxl" replace "3B" and "11B". The model shapes differ slightly: larger d_model, and smaller num_heads and d_ff.

For the Japanese pre-training corpus, Japanese Wikipedia and mC4/ja were used.

- Developed by: Retrieva, Inc.
- Model type: T5 v1.1
- Language(s) (NLP): Japanese
- License: CC-BY-SA 4.0. Although commercial use is permitted, we kindly request that you contact us beforehand.

We used T5X (https://github.com/google-research/t5x) to train this model, and it has been converted to the Hugging Face Transformers format.

Training data:

- The Japanese part of the multilingual C4 (mC4/ja).
- Japanese Wikipedia (20220920).

Preprocessing applied the following filtering:

- Remove documents that do not contain a single hiragana character. This removes English-only documents and documents in Chinese.
- Whitelist-style filtering on the top-level domain of the URL to remove affiliate sites.

Training hyperparameters:

- dropout rate: 0.0
- batch size: 256
- fp32
- input length: 512
- output length: 114
- Otherwise, the T5X defaults (https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t511/small.gin) are followed, including:
  - optimizer: Adafactor
  - base learning rate: 1.0
  - warmup steps: 10,000

Model architecture and objective:

- T5 v1.1 (https://github.com/google-research/text-to-text-transfer-transformer/blob/main/releasedcheckpoints.md#t511)
- Size: Small (~77 million parameters)
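The hiragana-based language filter described above can be sketched in a few lines. This is an illustrative reconstruction, not code from the actual pipeline; the function name and regex are assumptions, though the Unicode range used is the standard Hiragana block.

```python
import re

# Hiragana occupies U+3041-U+309F. A document with no hiragana at all is
# assumed to be non-Japanese (e.g. English-only or Chinese) and is dropped.
HIRAGANA = re.compile(r"[\u3041-\u309f]")

def keep_document(text: str) -> bool:
    """Keep a document only if it contains at least one hiragana character."""
    return HIRAGANA.search(text) is not None
```

Note that this check alone passes katakana- or kanji-only fragments embedded in longer Japanese text, which is why it operates at the document level: natural Japanese prose almost always contains hiragana particles.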

license:cc-by-sa-4.0
3,461
2

t5-large-long

license:cc-by-sa-4.0
1,820
9

t5-base-long

This is a T5 v1.1 model, pre-trained on a Japanese corpus. T5 is a Transformer-based encoder-decoder model; v1.1 introduces the following improvements over the original T5:

- GEGLU activation in the feed-forward hidden layer, rather than ReLU (see https://arxiv.org/abs/2002.05202).
- Dropout was turned off during pre-training (a quality win); it should be re-enabled during fine-tuning.
- No parameter sharing between the embedding and classifier layers.
- "xl" and "xxl" replace "3B" and "11B". The model shapes differ slightly: larger d_model, and smaller num_heads and d_ff.

For the Japanese pre-training corpus, Japanese Wikipedia and mC4/ja were used.

- Developed by: Retrieva, Inc.
- Model type: T5 v1.1
- Language(s) (NLP): Japanese
- License: CC-BY-SA 4.0. Although commercial use is permitted, we kindly request that you contact us beforehand.

We used T5X (https://github.com/google-research/t5x) to train this model, and it has been converted to the Hugging Face Transformers format.

Training data:

- The Japanese part of the multilingual C4 (mC4/ja).
- Japanese Wikipedia (20220920).

Preprocessing applied the following filtering:

- Remove documents that do not contain a single hiragana character. This removes English-only documents and documents in Chinese.
- Whitelist-style filtering on the top-level domain of the URL to remove affiliate sites.

Training hyperparameters:

- dropout rate: 0.0
- batch size: 256
- fp32
- input length: 512
- output length: 114
- Otherwise, the T5X defaults (https://github.com/google-research/t5x/blob/main/t5x/examples/t5/t511/base.gin) are followed, including:
  - optimizer: Adafactor
  - base learning rate: 1.0
  - warmup steps: 10,000

Model architecture and objective:

- T5 v1.1 (https://github.com/google-research/text-to-text-transfer-transformer/blob/main/releasedcheckpoints.md#t511)
- Size: Base (~220 million parameters)
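The span-corruption objective these checkpoints were pre-trained with can be illustrated with a minimal sketch. The `span_corrupt` helper below is a hypothetical name; it follows the sentinel-token convention of the original T5 paper (`<extra_id_0>`, `<extra_id_1>`, ...), not code from this repository, and works on pre-tokenized input for clarity.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (input, target) pair by masking the given spans.

    tokens: list of token strings.
    spans:  sorted, non-overlapping (start, end) index pairs to mask.
    Each span is replaced in the input by a sentinel token; the target
    lists each sentinel followed by the tokens it hid.
    """
    inp, tgt, prev = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp += tokens[prev:start] + [sentinel]   # keep context, mask the span
        tgt += [sentinel] + tokens[start:end]    # target reconstructs the span
        prev = end
    inp += tokens[prev:]
    return " ".join(inp), " ".join(tgt)
```

For example, masking "cute dog" and "the" in "the cute dog walks in the park" yields the input "the <extra_id_0> walks in <extra_id_1> park" and the target "<extra_id_0> cute dog <extra_id_1> the".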

license:cc-by-sa-4.0
1,594
2

t5-xl

license:cc-by-sa-4.0
400
16

t5-base-medium

license:cc-by-sa-4.0
268
1

amber-large

license:apache-2.0
239
7

amber-base

license:apache-2.0
235
3

bert-1.3b

license:apache-2.0
70
15

t5-small-medium

license:cc-by-sa-4.0
56
1

Llama-3-Swallow-8B-Instruct-v0.1-kokoroe

llama
17
0

t5-small-long

license:cc-by-sa-4.0
9
3

t5-large-short

license:cc-by-sa-4.0
3
2

t5-large-medium

license:cc-by-sa-4.0
2
2

t5-base-short

license:cc-by-sa-4.0
1
2

japanese-spoken-language-bert

license:apache-2.0
0
1