ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators

This video explains the new Replaced Token Detection (RTD) pre-training objective introduced in ELECTRA. Instead of masking tokens and predicting them, ELECTRA trains the text encoder as a discriminator that decides, for every input token, whether it is the original token or a plausible replacement sampled from a small generator. ELECTRA is much more compute-efficient because the loss is defined over the entire input sequence and the [MASK] token never appears in the self-supervised task. ELECTRA-Small, trained on a single GPU for 4 days, outperforms GPT, which was trained with roughly 30x more compute. ELECTRA is on par with RoBERTa and XLNet using about 1/4 of their compute, and surpasses those models when given the same amount of compute!
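To make the objective concrete, here is a minimal sketch of replaced token detection in PyTorch. The `TinyGenerator` and `TinyDiscriminator` classes, the GRU encoders, and all sizes are illustrative assumptions standing in for the Transformer encoders used in the paper; only the shape of the loss follows ELECTRA: a small masked-LM generator proposes replacements, and the discriminator is trained with a per-token original-vs-replaced loss over the whole sequence, with the discriminator term up-weighted as in the paper (λ = 50).

```python
# Sketch of ELECTRA's Replaced Token Detection (RTD) objective.
# TinyGenerator/TinyDiscriminator and the GRU encoders are toy stand-ins,
# not the paper's Transformer models.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, HIDDEN, SEQ_LEN, BATCH = 1000, 64, 16, 8
MASK_ID, MASK_PROB = 0, 0.15  # id 0 reserved for [MASK] in this toy setup

class TinyGenerator(nn.Module):
    """Small masked-LM that proposes plausible replacements for masked positions."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)
    def forward(self, ids):
        h, _ = self.encoder(self.embed(ids))
        return self.lm_head(h)                      # [B, T, V] token logits

class TinyDiscriminator(nn.Module):
    """ELECTRA-style encoder that labels every token as original vs. replaced."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.rtd_head = nn.Linear(HIDDEN, 1)
    def forward(self, ids):
        h, _ = self.encoder(self.embed(ids))
        return self.rtd_head(h).squeeze(-1)         # [B, T] binary logits

def rtd_step(gen, disc, ids):
    # 1) Mask ~15% of positions; the generator is trained as a masked LM on them.
    mask = torch.rand_like(ids, dtype=torch.float) < MASK_PROB
    masked_ids = ids.masked_fill(mask, MASK_ID)
    gen_logits = gen(masked_ids)
    mlm_loss = F.cross_entropy(gen_logits[mask], ids[mask])

    # 2) Sample replacements from the generator (no gradient through sampling).
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask, sampled, ids)

    # 3) The discriminator predicts, for EVERY token, whether it was replaced.
    #    Positions the generator happened to guess correctly count as "original".
    is_replaced = (corrupted != ids).float()
    rtd_loss = F.binary_cross_entropy_with_logits(disc(corrupted), is_replaced)

    # Combined objective; the 50x weight on the discriminator term mirrors the paper.
    return mlm_loss + 50.0 * rtd_loss

gen, disc = TinyGenerator(), TinyDiscriminator()
ids = torch.randint(1, VOCAB_SIZE, (BATCH, SEQ_LEN))
loss = rtd_step(gen, disc, ids)
loss.backward()
print(f"combined loss: {loss.item():.3f}")
```

Because the binary RTD loss is computed at every position rather than only at the ~15% of masked positions, the discriminator receives far more training signal per example, which is where the compute efficiency described above comes from.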
Thanks for watching! Please Subscribe!

Paper Links:
ELECTRA:
BERT:
