bert-large-cased-whole-word-masking-finetuned-squad
22.1K
1
3 languages
license:apache-2.0
by
google-bert
Language Model
OTHER
Fair
22K downloads
Community-tested
Edge AI:
Mobile
Laptop
Server
Unknown
Mobile
Laptop
Server
Quick Summary
BERT large model (cased) whole word masking finetuned on SQuAD Pretrained model on English language using a masked language modeling (MLM) objective.
Code Examples
Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Training proceduretext
[CLS] Sentence A [SEP] Sentence B [SEP]Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \Fine-tuningtext
python -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \
--model_name_or_path bert-large-cased-whole-word-masking \
--dataset_name squad \
--do_train \
--do_eval \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./examples/models/wwm_cased_finetuned_squad/ \
--per_device_eval_batch_size=3 \
--per_device_train_batch_size=3 \BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}BibTeX entry and citation infobibtex
@article{DBLP:journals/corr/abs-1810-04805,
author = {Jacob Devlin and
Ming{-}Wei Chang and
Kenton Lee and
Kristina Toutanova},
title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
Understanding},
journal = {CoRR},
volume = {abs/1810.04805},
year = {2018},
url = {http://arxiv.org/abs/1810.04805},
archivePrefix = {arXiv},
eprint = {1810.04805},
timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}Deploy This Model
Production-ready deployment in minutes
Together.ai
Instant API access to this model
Production-ready inference API. Start free, scale to millions.
Try Free APIReplicate
One-click model deployment
Run models in the cloud with simple API. No DevOps required.
Deploy NowDisclosure: We may earn a commission from these partners. This helps keep LLMYourWay free.