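To pretrain the BERT model implemented in Section 15.8, we need to generate the dataset in a format that supports the two pretraining tasks: masked language modeling and next sentence prediction. On the one hand, the original BERT model was pretrained on the concatenation of two huge corpora, BookCorpus and English Wikipedia (see Section 15.8.5), which makes it hard to run for most readers of this book. On the other hand, an off-the-shelf pretrained BERT model may not fit applications in specific domains such as medicine. As a result, pretraining BERT on a customized dataset is becoming increasingly popular. To make the demonstration of BERT pretraining practical, we use a smaller corpus, WikiText-2 (Merity et al., 2016).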

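Compared with the PTB dataset used for pretraining word2vec in Section 15.3, WikiText-2 (i) retains the original punctuation, making it suitable for next sentence prediction; (ii) retains the original case and numbers; and (iii) is more than twice as large.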

import os
import random
import torch
from d2l import torch as d2l

import os
import random
from mxnet import gluon, np, npx
from d2l import mxnet as d2l

npx.set_np()

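In the WikiText-2 dataset, each line represents a paragraph, and a space is inserted between any punctuation mark and its preceding token. Only paragraphs with at least two sentences are retained. For simplicity, we split sentences using only the period as the delimiter. More complex sentence splitting techniques are left to the exercises at the end of this section.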

#@save
d2l.DATA_HUB['wikitext-2'] = (
  'https://s3.amazonaws.com/research.metamind.io/wikitext/'
  'wikitext-2-v1.zip', '3c914d17d80b1459be871a5039ac23e752a53cbe')

#@save
def _read_wiki(data_dir):
  file_name = os.path.join(data_dir, 'wiki.train.tokens')
  with open(file_name, 'r') as f:
    lines = f.readlines()
  # Uppercase letters are converted to lowercase ones
  paragraphs = [line.strip().lower().split(' . ')
         for line in lines if len(line.split(' . ')) >= 2]
  random.shuffle(paragraphs)
  return paragraphs
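
To see what this filtering and splitting does without downloading the corpus, the sketch below applies the same logic to a few made-up lines. The function name `split_paragraphs`, the toy lines, and the locally seeded shuffle are assumptions for illustration only and are not part of the `d2l` package.

```python
import random

def split_paragraphs(lines, seed=0):
    """Keep lines that yield at least two ' . '-separated pieces, lowercase
    them, and split each into sentence strings (same logic as `_read_wiki`)."""
    paragraphs = [line.strip().lower().split(' . ')
                  for line in lines if len(line.split(' . ')) >= 2]
    # Assumption: a local, seeded generator so the toy run is repeatable
    random.Random(seed).shuffle(paragraphs)
    return paragraphs

# Hypothetical lines in the WikiText-2 style (space before punctuation)
toy_lines = [
    ' = Valkyria Chronicles = \n',
    'The game began development in 2010 . It retained the standard features .\n',
    'A single sentence without a following one\n',
]
toy_paragraphs = split_paragraphs(toy_lines)
print(toy_paragraphs)
```

Only the second line survives the filter. Its last sentence keeps the trailing period, because the delimiter is the three-character string `' . '` rather than the period alone.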

15.9.1. Defining Helper Functions for Pretraining Tasks

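In the following, we begin by implementing helper functions for the two BERT pretraining tasks: next sentence prediction and masked language modeling. These helper functions will be invoked later when transforming the raw text corpus into a dataset in the format needed to pretrain BERT.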

15.9.1.1. Generating the Next Sentence Prediction Task

Following the description in Section 15.8.5.2, the _get_next_sentence function generates a training example for the binary classification task.

#@save
def _get_next_sentence(sentence, next_sentence, paragraphs):
  if random.random() < 0.5:
    is_next = True
  else:
    # `paragraphs` is a list of lists of lists
    next_sentence = random.choice(random.choice(paragraphs))
    is_next = False
  return sentence, next_sentence, is_next
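
As a quick sanity check of the sampling scheme, the sketch below repeats the same logic on a tiny made-up corpus and measures how often the true next sentence is kept. The name `get_next_sentence`, the extra `rng` argument, and the toy corpus are assumptions introduced here so that the run is repeatable.

```python
import random

def get_next_sentence(sentence, next_sentence, paragraphs, rng=random):
    """Keep the true next sentence about half of the time; otherwise draw a
    random sentence from a random paragraph (same logic as above)."""
    if rng.random() < 0.5:
        is_next = True
    else:
        # `paragraphs` is a list of lists of lists
        next_sentence = rng.choice(rng.choice(paragraphs))
        is_next = False
    return sentence, next_sentence, is_next

# Hypothetical tokenized corpus: paragraphs -> sentences -> tokens
toy_corpus = [
    [['this', 'movie', 'is', 'great'], ['i', 'like', 'it']],
    [['hello', 'world'], ['how', 'are', 'you']],
]
rng = random.Random(0)
samples = [get_next_sentence(toy_corpus[0][0], toy_corpus[0][1], toy_corpus, rng)
           for _ in range(1000)]
positive_rate = sum(is_next for _, _, is_next in samples) / len(samples)
print(f'fraction labeled is_next=True: {positive_rate:.2f}')
```

Because the random draw does not exclude the true next sentence, a negative example can occasionally coincide with it. On a corpus the size of WikiText-2 this should be rare.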

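The following function generates training examples for next sentence prediction from the input paragraph by invoking the _get_next_sentence function. Here paragraph is a list of sentences, where each sentence is a list of tokens. The argument max_len specifies the maximum length of a BERT input sequence during pretraining.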

#@save
def _get_nsp_data_from_paragraph(paragraph, paragraphs, vocab, max_len):
  nsp_data_from_paragraph = []
  for i in range(len(paragraph) - 1):
    tokens_a, tokens_b, is_next = _get_next_sentence(
      paragraph[i], paragraph[i + 1], paragraphs)
    # Consider 1 '<cls>' token and 2 '<sep>' tokens
    if len(tokens_a) + len(tokens_b) + 3 > max_len:
      continue
    tokens, segments = d2l.get_tokens_and_segments(tokens_a, tokens_b)
    nsp_data_from_paragraph.append((tokens, segments, is_next))
  return nsp_data_from_paragraph
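
The `+ 3` in the length check accounts for one '<cls>' token and two '<sep>' tokens. The sketch below makes that arithmetic concrete. The local `get_tokens_and_segments` is a stand-in written for this example, assuming the behavior of `d2l.get_tokens_and_segments` described in Section 15.8, and the sentences are made up.

```python
def get_tokens_and_segments(tokens_a, tokens_b=None):
    """Stand-in (assumed behavior): prepend '<cls>', append '<sep>' after
    each segment, and mark the two segments with 0 and 1."""
    tokens = ['<cls>'] + tokens_a + ['<sep>']
    segments = [0] * (len(tokens_a) + 2)
    if tokens_b is not None:
        tokens += tokens_b + ['<sep>']
        segments += [1] * (len(tokens_b) + 1)
    return tokens, segments

tokens_a = ['this', 'movie', 'is', 'great']
tokens_b = ['i', 'like', 'it']
max_len = 10

# 4 + 3 tokens plus 3 special tokens gives 10, which does not exceed max_len
fits = len(tokens_a) + len(tokens_b) + 3 <= max_len
tokens, segments = get_tokens_and_segments(tokens_a, tokens_b)
print(fits, len(tokens))
print(tokens)
print(segments)
```

With `max_len = 9` the same pair would be skipped, since the assembled sequence has 10 tokens.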

15.9.1.2. Generating the Masked Language Modeling Task

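To generate training examples for the masked language modeling task from a BERT input sequence, we define the following _replace_mlm_tokens function. Among its inputs, tokens is a list of tokens representing a BERT input sequence, candidate_pred_positions is a list of token indices of the BERT input sequence excluding those of special tokens (special tokens are not predicted in the masked language modeling task), and num_mlm_preds indicates the number of predictions (recall that 15% of the tokens are chosen at random for prediction). Following the definition of the masked language modeling task in Section 15.8.5.1, at each prediction position the input may be replaced by a special “<mask>” token or a random token, or remain unchanged. In the end, the function returns the input tokens after possible replacement, the token indices where predictions take place, and the labels for these predictions.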

#@save
def _replace_mlm_tokens(tokens, candidate_pred_positions, num_mlm_preds,
            vocab):
  # For the input of a masked language model, make a new copy of tokens and
  # replace some of them by '<mask>' or random tokens
  mlm_input_tokens = [token for token in tokens]
  pred_positions_and_labels = []
  # Shuffle for getting 15% random tokens for prediction in the masked
  # language modeling task
  random.shuffle(candidate_pred_positions)
  for mlm_pred_position in candidate_pred_positions:
    if len(pred_positions_and_labels) >= num_mlm_preds:
      break
    masked_token = None
    # 80% of the time: replace the word with the '<mask>' token
    if random.random() < 0.8:
      masked_token = '<mask>'
    else:
      # 10% of the time: keep the word unchanged
      if random.random() < 0.5:
        masked_token = tokens[mlm_pred_position]
      # 10% of the time: replace the word with a random word
      else:
        masked_token = random.choice(vocab.idx_to_token)
    mlm_input_tokens[mlm_pred_position] = masked_token
    pred_positions_and_labels.append(
      (mlm_pred_position, tokens[mlm_pred_position]))
  return mlm_input_tokens, pred_positions_and_labels
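
The 80%/10%/10% split comes from two nested random draws: the second draw only happens in the remaining 20% of cases and halves it, giving 0.2 × 0.5 = 0.1 for each of the last two outcomes. The simulation below is a minimal sketch that checks this empirically. The function name `choose_replacement`, the outcome labels, and the seed are assumptions made for the example.

```python
import random
from collections import Counter

def choose_replacement(rng):
    """Mirror the nested draws in `_replace_mlm_tokens` and report which of
    the three outcomes was picked."""
    if rng.random() < 0.8:
        return 'mask'       # replaced by the '<mask>' token
    if rng.random() < 0.5:
        return 'unchanged'  # original token kept
    return 'random'         # replaced by a random vocabulary token

rng = random.Random(42)
num_trials = 100_000
counts = Counter(choose_replacement(rng) for _ in range(num_trials))
fractions = {outcome: n / num_trials for outcome, n in counts.items()}
print(fractions)
```

The observed fractions should be close to 0.8, 0.1, and 0.1, up to sampling noise.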

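By invoking the aforementioned _replace_mlm_tokens function, the following function takes a BERT input sequence (tokens) as input and returns the indices of the input tokens (after possible token replacement as described in Section 15.8.5.1), the token indices where predictions take place, and the label indices for these predictions.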

#@save
def _get_mlm_data_from_tokens(tokens, vocab):
  candidate_pred_positions = []
  # `tokens` is a list of strings
  for i, token in enumerate(tokens):
    # Special tokens are not predicted in the masked language modeling
    # task
    if token in ['<cls>', '<sep>']:
      continue
    candidate_pred_positions.append(i)
  # 15% of random tokens are predicted in the masked language modeling task
  num_mlm_preds = max(1, round(len(tokens) * 0.15))
  mlm_input_tokens, pred_positions_and_labels = _replace_mlm_tokens(
    tokens, candidate_pred_positions, num_mlm_preds, vocab)
  pred_positions_and_labels = sorted(pred_positions_and_labels,
                    key=lambda x: x[0])
  pred_positions = [v[0] for v in pred_positions_and_labels]
  mlm_pred_labels = [v[1] for v in pred_positions_and_labels]
  return vocab[mlm_input_tokens], pred_positions, vocab[mlm_pred_labels]
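
The number of predicted positions is 15% of the sequence length, rounded, with a floor of one. Note that the length used here is that of the whole input sequence, special tokens included, even though special tokens are never candidates for prediction. The small sketch below evaluates the formula for a few lengths; the helper name `num_mlm_predictions` is hypothetical.

```python
def num_mlm_predictions(seq_len, mask_rate=0.15):
    """Number of positions to predict: 15% of the sequence length, at least 1."""
    return max(1, round(seq_len * mask_rate))

for seq_len in (3, 20, 64, 128):
    print(seq_len, num_mlm_predictions(seq_len))
```

For a sequence of length 64 this gives round(9.6) = 10 predictions, and very short sequences still get one prediction because of the `max(1, ...)` floor.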

15.9.2. Transforming Text into the Pretraining Dataset

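Now we are almost ready to customize a Dataset class for pretraining BERT. Before that, we still need to define a helper function _pad_bert_inputs to append the special “<pad>” tokens to the inputs. Its argument examples contains the outputs of the helper functions _get_nsp_data_from_paragraph and _get_mlm_data_from_tokens for the two pretraining tasks.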

#@save
def _pad_bert_inputs(examples, max_len, vocab):
  max_num_mlm_preds = round(max_len * 0.15)
  all_token_ids, all_segments, valid_lens, = [], [], []
  all_pred_positions, all_mlm_weights, all_mlm_labels = [], [], []
  nsp_labels = []
  for (token_ids, pred_positions, mlm_pred_label_ids, segments,
     is_next) in examples:
    all_token_ids.append(torch.tensor(token_ids + [vocab['<pad>']] * (
      max_len - len(token_ids)), dtype=torch.long))
    all_segments.append(torch.tensor(segments + [0] * (
      max_len - len(segments)), dtype=torch.long))
    # `valid_lens` excludes count of '<pad>' tokens
    valid_lens.append(torch.tensor(len(token_ids), dtype=torch.float32))
    all_pred_positions.append(torch.tensor(pred_positions + [0] * (
      max_num_mlm_preds - len(pred_positions)), dtype=torch.long))
    # Predictions of padded tokens will be filtered out in the loss via
    # multiplication of 0 weights
    all_mlm_weights.append(
      torch.tensor([1.0] * len(mlm_pred_label_ids) + [0.0] * (
        max_num_mlm_preds - len(pred_positions)),
        dtype=torch.float32))
    all_mlm_labels.append(torch.tensor(mlm_pred_label_ids + [0] * (
      max_num_mlm_preds - len(mlm_pred_label_ids)), dtype=torch.long))
    nsp_labels.append(torch.tensor(is_next, dtype=torch.long))
  return (all_token_ids, all_segments, valid_lens, all_pred_positions,
      all_mlm_weights, all_mlm_labels, nsp_labels)

#@save
def _pad_bert_inputs(examples, max_len, vocab):
  max_num_mlm_preds = round(max_len * 0.15)
  all_token_ids, all_segments, valid_lens, = [], [], []
  all_pred_positions, all_mlm_weights, all_mlm_labels = [], [], []
  nsp_labels = []
  for (token_ids, pred_positions, mlm_pred_label_ids, segments,
     is_next) in examples:
    all_token_ids.append(np.array(token_ids + [vocab['<pad>']] * (
      max_len - len(token_ids)), dtype='int32'))
    all_segments.append(np.array(segments + [0] * (
      max_len - len(segments)), dtype='int32'))
    # `valid_lens` excludes count of '<pad>' tokens
    valid_lens.append(np.array(len(token_ids), dtype='float32'))
    all_pred_positions.append(np.array(pred_positions + [0] * (
      max_num_mlm_preds - len(pred_positions)), dtype='int32'))
    # Predictions of padded tokens will be filtered out in the loss via
    # multiplication of 0 weights
    all_mlm_weights.append(
      np.array([1.0] * len(mlm_pred_label_ids) + [0.0] * (
        max_num_mlm_preds - len(pred_positions)), dtype='float32'))
    all_mlm_labels.append(np.array(mlm_pred_label_ids + [0] * (
      max_num_mlm_preds - len(mlm_pred_label_ids)), dtype='int32'))
    nsp_labels.append(np.array(is_next))
  return (all_token_ids, all_segments, valid_lens, all_pred_positions,
      all_mlm_weights, all_mlm_labels, nsp_labels)
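
To make the padding scheme easier to follow, here is a minimal sketch that pads a single example using plain Python lists instead of tensors. The function `pad_example`, the default `pad_id=0` standing in for `vocab['<pad>']`, and the token ids are all assumptions chosen for illustration.

```python
def pad_example(token_ids, pred_positions, mlm_label_ids, max_len, pad_id=0):
    """Pad one example with plain lists. `pad_id` stands in for
    vocab['<pad>'] (an assumption for this sketch)."""
    max_num_mlm_preds = round(max_len * 0.15)
    num_pad_preds = max_num_mlm_preds - len(pred_positions)
    padded_ids = token_ids + [pad_id] * (max_len - len(token_ids))
    valid_len = len(token_ids)  # excludes the '<pad>' tokens
    padded_positions = pred_positions + [0] * num_pad_preds
    # Weight 1 for real predictions, 0 for padded ones
    mlm_weights = [1.0] * len(mlm_label_ids) + [0.0] * num_pad_preds
    padded_labels = mlm_label_ids + [0] * num_pad_preds
    return padded_ids, valid_len, padded_positions, mlm_weights, padded_labels

# Hypothetical ids for a 6-token input with one predicted position
ids, valid_len, positions, weights, labels = pad_example(
    token_ids=[2, 15, 1, 27, 9, 3], pred_positions=[2],
    mlm_label_ids=[42], max_len=16)
print(len(ids), valid_len)
print(positions, weights, labels)
```

With `max_len = 16`, at most round(2.4) = 2 positions are predicted, so the single real prediction is followed by one padded slot whose weight is 0.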

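Putting together the helper functions that generate training examples for the two pretraining tasks and the helper function that pads inputs, we customize the following _WikiTextDataset class as the WikiText-2 dataset for pretraining BERT. By implementing the __getitem__ function, we can arbitrarily access the pretraining (masked language modeling and next sentence prediction) examples generated from a pair of sentences from the WikiText-2 corpus.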

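The original BERT model uses WordPiece embeddings with a vocabulary size of 30000 (Wu et al., 2016). The tokenization method of WordPiece is a slight modification of the original byte pair encoding algorithm in Section 15.6.2. For simplicity, we use the d2l.tokenize function for tokenization. Infrequent tokens that appear fewer than five times are filtered out.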

#@save
class _WikiTextDataset(torch.utils.data.Dataset):
  def __init__(self, paragraphs, max_len):
    # Input `paragraphs[i]` is a list of sentence strings representing a
    # paragraph; while output `paragraphs[i]` is a list of sentences
    # representing a paragraph, where each sentence is a list of tokens
    paragraphs = [d2l.tokenize(
      paragraph, token='word') for paragraph in paragraphs]
    sentences = [sentence for paragraph in paragraphs
           for sentence in paragraph]
    self.vocab = d2l.Vocab(sentences, min_freq=5, reserved_tokens=[
      '<pad>', '<mask>', '<cls>', '<sep>'])
    # Get data for the next sentence prediction task
    examples = []
    for paragraph in paragraphs:
      examples.extend(_get_nsp_data_from_paragraph(
        paragraph, paragraphs, self.vocab, max_len))
    # Get data for the masked language model task
    examples = [(_get_mlm_data_from_tokens(tokens, self.vocab)
           + (segments, is_next))
           for tokens, segments, is_next in examples]
    # Pad inputs
    (self.all_token_ids, self.all_segments, self.valid_lens,
     self.all_pred_positions, self.all_mlm_weights,
     self.all_mlm_labels, self.nsp_labels) = _pad_bert_inputs(
      examples, max_len, self.vocab)

  def __getitem__(self, idx):
    return (self.all_token_ids[idx], self.all_segments[idx],
        self.valid_lens[idx], self.all_pred_positions[idx],
        self.all_mlm_weights[idx], self.all_mlm_labels[idx],
        self.nsp_labels[idx])

  def __len__(self):
    return len(self.all_token_ids)

#@save
class _WikiTextDataset(gluon.data.Dataset):
  def __init__(self, paragraphs, max_len):
    # Input `paragraphs[i]` is a list of sentence strings representing a
    # paragraph; while output `paragraphs[i]` is a list of sentences
    # representing a paragraph, where each sentence is a list of tokens
    paragraphs = [d2l.tokenize(
      paragraph, token='word') for paragraph in paragraphs]
    sentences = [sentence for paragraph in paragraphs
           for sentence in paragraph]
    self.vocab = d2l.Vocab(sentences, min_freq=5, reserved_tokens=[
      '<pad>', '<mask>', '<cls>', '<sep>'])
    # Get data for the next sentence prediction task
    examples = []
    for paragraph in paragraphs:
      examples.extend(_get_nsp_data_from_paragraph(
        paragraph, paragraphs, self.vocab, max_len))
    # Get data for the masked language model task
    examples = [(_get_mlm_data_from_tokens(tokens, self.vocab)
           + (segments, is_next))
           for tokens, segments, is_next in examples]
    # Pad inputs
    (self.all_token_ids, self.all_segments, self.valid_lens,
     self.all_pred_positions, self.all_mlm_weights,
     self.all_mlm_labels, self.nsp_labels) = _pad_bert_inputs(
      examples, max_len, self.vocab)

  def __getitem__(self, idx):
    return (self.all_token_ids[idx], self.all_segments[idx],
        self.valid_lens[idx], self.all_pred_positions[idx],
        self.all_mlm_weights[idx], self.all_mlm_labels[idx],
        self.nsp_labels[idx])

  def __len__(self):
    return len(self.all_token_ids)
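
The effect of `min_freq=5` and the reserved tokens can be illustrated with a simplified vocabulary builder. This is only a sketch under stated assumptions: `build_vocab` is a hypothetical stand-in and not the implementation of `d2l.Vocab`, whose exact index ordering may differ.

```python
from collections import Counter

def build_vocab(sentences, min_freq, reserved_tokens):
    """Simplified stand-in for a vocabulary (assumption): index '<unk>' and
    the reserved tokens first, then tokens seen at least `min_freq` times."""
    counter = Counter(token for sentence in sentences for token in sentence)
    idx_to_token = ['<unk>'] + list(reserved_tokens)
    idx_to_token += [token for token, freq in counter.most_common()
                     if freq >= min_freq and token not in idx_to_token]
    return {token: idx for idx, token in enumerate(idx_to_token)}

# Hypothetical corpus: one sentence repeated five times plus a rare one
toy_sentences = [['the', 'cat', 'sat']] * 5 + [['a', 'rare', 'word']]
vocab = build_vocab(toy_sentences, min_freq=5,
                    reserved_tokens=['<pad>', '<mask>', '<cls>', '<sep>'])
rare_id = vocab.get('rare', vocab['<unk>'])
print(len(vocab), rare_id)
```

Tokens that appear fewer than five times are absent from the mapping and fall back to the index of '<unk>', while the four reserved tokens are always present regardless of their frequency in the text.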

Using the _read_wiki function and the _WikiTextDataset class, we define the following load_data_wiki function to download the WikiText-2 dataset and generate pretraining examples from it.

#@save
def load_data_wiki(batch_size, max_len):
  """Load the WikiText-2 dataset."""
  num_workers = d2l.get_dataloader_workers()
  data_dir = d2l.download_extract('wikitext-2', 'wikitext-2')
  paragraphs = _read_wiki(data_dir)
  train_set = _WikiTextDataset(paragraphs, max_len)
  train_iter = torch.utils.data.DataLoader(train_set, batch_size,
                    shuffle=True, num_workers=num_workers)
  return train_iter, train_set.vocab

#@save
def load_data_wiki(batch_size, max_len):
  """Load the WikiText-2 dataset."""
  num_workers = d2l.get_dataloader_workers()
  data_dir = d2l.download_extract('wikitext-2', 'wikitext-2')
  paragraphs = _read_wiki(data_dir)
  train_set = _WikiTextDataset(paragraphs, max_len)
  train_iter = gluon.data.DataLoader(train_set, batch_size, shuffle=True,
                    num_workers=num_workers)
  return train_iter, train_set.vocab

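Setting the batch size to 512 and the maximum length of a BERT input sequence to 64, we print out the shapes of a minibatch of BERT pretraining examples. Note that in each BERT input sequence, 10 (64 × 0.15 = 9.6, rounded to the nearest integer) positions are predicted for the masked language modeling task.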

batch_size, max_len = 512, 64
train_iter, vocab = load_data_wiki(batch_size, max_len)

for (tokens_X, segments_X, valid_lens_x, pred_positions_X, mlm_weights_X,
   mlm_Y, nsp_y) in train_iter:
  print(tokens_X.shape, segments_X.shape, valid_lens_x.shape,
     pred_positions_X.shape, mlm_weights_X.shape, mlm_Y.shape,
     nsp_y.shape)
  break

Downloading ../data/wikitext-2-v1.zip from https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip...
torch.Size([512, 64]) torch.Size([512, 64]) torch.Size([512]) torch.Size([512, 10]) torch.Size([512, 10]) torch.Size([512, 10]) torch.Size([512])

batch_size, max_len = 512, 64
train_iter, vocab = load_data_wiki(batch_size, max_len)

for (tokens_X, segments_X, valid_lens_x, pred_positions_X, mlm_weights_X,
   mlm_Y, nsp_y) in train_iter:
  print(tokens_X.shape, segments_X.shape, valid_lens_x.shape,
     pred_positions_X.shape, mlm_weights_X.shape, mlm_Y.shape,
     nsp_y.shape)
  break

(512, 64) (512, 64) (512,) (512, 10) (512, 10) (512, 10) (512,)

Finally, let's take a look at the vocabulary size. Even after filtering out infrequent tokens, it is still more than twice as large as that of the PTB dataset.

len(vocab)

15.9.3. Summary

Compared with the PTB dataset, the WikiText-2 dataset retains the original punctuation, case, and numbers, and is more than twice as large.

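We can arbitrarily access the pretraining (masked language modeling and next sentence prediction) examples generated from a pair of sentences from the WikiText-2 corpus.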

15.9.4. Exercises

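For simplicity, the period is used as the only delimiter for splitting sentences. Try other sentence splitting techniques, such as spaCy and NLTK. Take NLTK as an example. You need to install NLTK first: pip install nltk. In the code, first import nltk. Then, download the Punkt sentence tokenizer: nltk.download('punkt'). To split sentences such as sentences = 'This is great ! Why not ?', invoking nltk.tokenize.sent_tokenize(sentences) will return a list of two sentence strings: ['This is great !', 'Why not ?'].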

What is the vocabulary size if we do not filter out any infrequent tokens?
