Natural Language Processing: The BERT Model | Text Classification

Transformers are taking the natural language processing (NLP) world by storm. In this document, I'll teach you about this go-to architecture for NLP and computer vision tasks, a must-have skill in your artificial intelligence toolkit. Taking a hands-on approach, I show you the basics of working with transformers in NLP and in production. I go over BERT model sizes, bias in BERT, and how BERT was trained. I then explore transfer learning, show you how to use the BERT model and its tokenizer, and cover text classification.

1. NLP and Transformers

Transformers used in NLP

Have you seen the terms BERT or GPT-3 in articles online and wondered what they mean? These are examples of large language models, and their underlying architecture is the transformer architecture.

Transformers

  • Proposed by a team of researchers from Google in 2017
  • Introduced in the paper "Attention Is All You Need"
  • A turning point in NLP

What is BERT?

BERT stands for:

  • Bidirectional
  • Encoder
  • Representations from
  • Transformers

Transformer history

[Figure: timeline of transformer models]

BERT model sizes

  • BERT-Base » 12 layers, hidden size 768, 12 attention heads, ~110 million parameters
  • BERT-Large » 24 layers, hidden size 1024, 16 attention heads, ~340 million parameters
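If you want to verify these counts yourself, here is a minimal sketch, assuming the `transformers` and `torch` packages are installed:

```python
from transformers import AutoModel

# Load each original BERT checkpoint and count its parameters.
for checkpoint in ["bert-base-uncased", "bert-large-uncased"]:
    model = AutoModel.from_pretrained(checkpoint)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{checkpoint}: {n_params / 1e6:.0f}M parameters")
```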

2. BERT and Transfer Learning

Bias in BERT

Because BERT learns from large corpora of human-written text, it also absorbs the social biases present in that text, and its predictions can reflect gender and other stereotypes.
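A minimal probe of this, assuming `transformers` is installed; it uses the fill-mask pipeline (based on the masked-word training objective described in the next section) to compare BERT's completions for two templates that differ only in the gendered noun:

```python
from transformers import pipeline

# Compare BERT's top completions for two otherwise identical templates.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for sentence in ("The man worked as a [MASK].",
                 "The woman worked as a [MASK]."):
    predictions = unmasker(sentence)
    print(sentence, "->", [p["token_str"] for p in predictions])
```

The two lists of predicted occupations typically differ along stereotypical lines.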

How was BERT trained?

  1. English Wikipedia » 2.5 billion words
  2. BookCorpus » 800 million words

What tasks was BERT trained on?

  • Masked language modeling (MLM) » BERT must predict a masked-out word. For example, given "BERT is conceptually [MASK] and empirically powerful," it should predict "simple" (see the sketch below).
  • Next sentence prediction (NSP) » given a pair of sentences, BERT must answer: does the second sentence follow immediately after the first?

As the BERT paper (Devlin et al., 2018) puts it: "BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on 11 natural language processing tasks."
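To see MLM in action, here is a minimal sketch, assuming the `transformers` package is installed; "simple" should appear among the top-scoring completions:

```python
from transformers import pipeline

# Masked language modeling: BERT fills in the [MASK] token.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("BERT is conceptually [MASK] and empirically powerful."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")
```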

3. Transformer Architecture and BERT

3.1 Transformer: Architecture overview

[Figure: the transformer model architecture, showing the encoder and decoder stacks]

1. Encoder-Decoder Model

  • Generative tasks
  • Examples:
    • BART
    • T5

2. Encoder-Only Model

  • Understanding of input
    • Sentence classification
    • Named entity recognition
  • Family of BERT models
    • BERT
    • RoBERTa
    • DistilBERT

3. Decoder-Only Model

  • Generative tasks
  • Examples:
    • GPT
    • GPT-2
    • GPT-3
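To make the three-way split concrete, here is a minimal sketch that loads one representative of each family through the `transformers` Auto classes, assuming the library and the named checkpoints are available:

```python
from transformers import (
    AutoModelForSeq2SeqLM,  # encoder-decoder models
    AutoModel,              # base models without a task head (BERT's encoder)
    AutoModelForCausalLM,   # decoder-only language models
)

# Encoder-decoder: T5, suited to generative sequence-to-sequence tasks.
seq2seq_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Encoder-only: BERT, suited to understanding tasks such as
# sentence classification and named entity recognition.
encoder_model = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-only: GPT-2, suited to free-form text generation.
decoder_model = AutoModelForCausalLM.from_pretrained("gpt2")
```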

Tasks BERT Can Do

  • Text classification (a sketch follows this list)
  • Named entity recognition
  • Question answering
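Here is a minimal text classification sketch, assuming `transformers` is installed; the checkpoint is a BERT-family (DistilBERT) model fine-tuned for sentiment analysis:

```python
from transformers import pipeline

# Text classification with a fine-tuned BERT-family checkpoint.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("Transformers are taking the NLP world by storm!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```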

Tasks BERT Does Not Do

  • Text generation
  • Text translation
  • Text summarization

3.2 BERT model and tokenization
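As a starting point for this section, here is a minimal sketch of BERT's WordPiece tokenizer, assuming the `transformers` and `torch` packages are installed; it shows the subword split, the special [CLS]/[SEP] wrapping, and the tensors a BERT model actually consumes:

```python
from transformers import AutoTokenizer

# Load the WordPiece tokenizer that pairs with bert-base-uncased.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "BERT is conceptually simple and empirically powerful."
print(tokenizer.tokenize(text))   # subword tokens; '##' marks continuation pieces

encoded = tokenizer(text, return_tensors="pt")
print(encoded["input_ids"])       # token ids, wrapped in [CLS] ... [SEP]
print(encoded["attention_mask"])  # 1 for every real (non-padding) token
```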