BERT model sizes
2. BERT and Transfer Learning
Bias in BERT
How was BERT trained?
- English Wikipedia » 2.5 billion words,
- BookCorpus » 800 million words.
What tasks was BERT trained on?
- Masked language modeling (MLM) » BERT must predict the masked-out word, e.g. "BERT is conceptually [MASK] and empirically powerful." (see the sketch below)
- Next sentence prediction (NSP) » asks: does the second sentence follow immediately after the first?
BERT is conceptually simple and empirically powerful.
It obtains new state-of-the-art results on 11 natural language processing tasks.
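The MLM objective can be tried directly. Below is a minimal sketch, assuming the Hugging Face `transformers` library (the original material does not name a toolkit): a fill-mask pipeline asks `bert-base-uncased` to predict the word hidden behind `[MASK]` in the example sentence above.

```python
# Minimal MLM sketch, assuming the Hugging Face `transformers` library.
from transformers import pipeline

# Load a fill-mask pipeline backed by the pre-trained BERT base model.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# The example sentence from above, with one word masked out.
# BERT scores candidate tokens for the [MASK] position.
for prediction in unmasker("BERT is conceptually [MASK] and empirically powerful."):
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```

The top-scoring tokens are BERT's guesses for the masked word; no fine-tuning is needed here, because fill-mask is exactly the pre-training objective.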
1. Encoder-Decoder Model
2. Encoder-Only Model
- Understanding of input
- Sentence classification
- Named entity recognition
- Family of BERT models
3. Decoder-Only Model
- Generative tasks
- Examples: the GPT family (a sketch contrasting encoder-only and decoder-only use follows this list)
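To make the distinction concrete, here is a short sketch, assuming the Hugging Face `transformers` library and illustrative checkpoint names: an encoder-only BERT is loaded with a (still untrained) classification head for understanding-style tasks, while a decoder-only GPT-2 is used for generation.

```python
# Sketch contrasting encoder-only and decoder-only models.
# Assumes the Hugging Face `transformers` library; checkpoint names are
# illustrative, not taken from the original material.
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Encoder-only (BERT family): understanding of the input, e.g. sentence
# classification. The 2-label classification head is freshly initialised
# and would still need fine-tuning on labelled data.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
inputs = tokenizer("BERT reads the whole sentence at once.", return_tensors="pt")
logits = classifier(**inputs).logits  # shape (1, 2): one score per label

# Decoder-only (GPT family): generative tasks such as text continuation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformer models are", max_new_tokens=20)[0]["generated_text"])
```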
Tasks BERT Can Do
- Text classification
- Named entity recognition
- Question answering (see the sketch below)
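As a concrete example of one of these tasks, the sketch below runs extractive question answering, again assuming the Hugging Face `transformers` library; the SQuAD-fine-tuned BERT checkpoint name is an assumption, and any comparable checkpoint would work.

```python
# Extractive question answering with a BERT model.
# Assumes the Hugging Face `transformers` library and a SQuAD-fine-tuned
# BERT checkpoint (the model name below is an assumption).
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

context = (
    "BERT was pre-trained on English Wikipedia (about 2.5 billion words) "
    "and BookCorpus (about 800 million words)."
)
result = qa(question="What corpora was BERT pre-trained on?", context=context)
print(result["answer"], f"score={result['score']:.3f}")
```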
Tasks BERT Does Not Do
- Text generation
- Text translation
- Text summarization
3.2 BERT model and tokenization