BERT Large pretrained checkpoint, trained with the LAMB optimizer on 16 NVIDIA DGX-2H nodes.
Model scripts are available in the NGC model scripts registry.
BERT, or Bidirectional Encoder Representations from Transformers, is a method of pre-training language representations that obtains state-of-the-art results on a wide array of Natural Language Processing (NLP) tasks. This model is based on the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NVIDIA's BERT 19.08 is an optimized version of Google's official implementation, leveraging mixed precision arithmetic and Tensor Cores on V100 GPUs for faster training times while maintaining target accuracy.
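The LAMB optimizer named in the title is what makes the large-batch pretraining above feasible: it rescales an Adam-style update per layer by a trust ratio. A minimal single-tensor sketch in NumPy, not the script's actual implementation, with hyperparameter defaults chosen for illustration:

```python
import numpy as np

def lamb_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One illustrative LAMB update for a single parameter tensor.

    w: weights, g: gradient, m/v: running Adam moments, t: 1-based step.
    Returns the updated (w, m, v).
    """
    m = b1 * m + (1 - b1) * g          # first moment (Adam-style)
    v = b2 * v + (1 - b2) * g * g      # second moment
    m_hat = m / (1 - b1 ** t)          # bias corrections
    v_hat = v / (1 - b2 ** t)
    r = m_hat / (np.sqrt(v_hat) + eps) + wd * w  # update direction + decoupled weight decay
    w_norm = np.linalg.norm(w)
    r_norm = np.linalg.norm(r)
    # Layer-wise trust ratio: scale the step by ||w|| / ||r||.
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    return w - lr * trust * r, m, v
```

In a full optimizer this step runs independently for each layer's parameter tensor, which is what lets the effective learning rate adapt per layer at very large batch sizes.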
For researchers aiming to improve upon or tailor the model, we recommend starting with the README, which details the architecture, accuracy and performance results, and the corresponding scripts.
For a quick start, follow the sections on inference in the model scripts quick start guide.
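As a concrete pointer on the mixed precision mentioned above: TensorFlow containers of this generation support automatic mixed precision via an environment variable, with no model code changes. This is a sketch of that mechanism; verify the exact variable against the release notes for the container version you pull.

```shell
# Enable TensorFlow automatic mixed precision (NVIDIA TF containers, TF 1.14 era).
# The graph rewrite casts eligible ops to FP16 and applies automatic loss scaling.
export TF_ENABLE_AUTO_MIXED_PRECISION=1
```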