
Transformer-BIG-en-de-8K for NeMo

Description: This model is similar to the original Transformer-Big model from "Attention Is All You Need", except that it uses an 8K vocabulary.
Publisher: NVIDIA
Latest Version: 1
Modified: April 4, 2023
Size: 385.57 MB

Overview

The Transformer encoder, decoder, and projection module checkpoints available here were trained with the Neural Modules (NeMo) Toolkit. NVIDIA's Apex/AMP O1 optimization level was used for training on V100 GPUs. The included files are listed below; a short checkpoint-inspection sketch follows the list.

  • bpe8k_yttm.model - pretrained tokenizer model
  • TransformerDecoderNM_2-STEP-37000.pt - pretrained decoder module
  • TransformerEncoderNM_1-STEP-37000.pt - pretrained encoder module
  • TransformerLogSoftmaxNM_3-STEP-37000.pt - pretrained projection layer
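
If you want to verify what a downloaded checkpoint contains before running inference, the .pt files can be opened as ordinary PyTorch checkpoints. The following is a minimal, hypothetical inspection sketch, not part of the official NeMo workflow; it assumes each .pt file holds a plain state dict of tensors and that PyTorch is installed.

import torch

# Hypothetical sketch: assumes each .pt file is a plain PyTorch state dict.
for name in ("TransformerEncoderNM_1-STEP-37000.pt",
             "TransformerDecoderNM_2-STEP-37000.pt",
             "TransformerLogSoftmaxNM_3-STEP-37000.pt"):
    state = torch.load(name, map_location="cpu")
    print(f"{name}: {len(state)} entries")
    # Print the first few parameter names and shapes as a sanity check.
    for key, value in list(state.items())[:3]:
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print("   ", key, shape)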

Documentation

Refer to the documentation at https://github.com/NVIDIA/NeMo.

Usage example: put the downloaded checkpoints into a local directory and pass that directory via --checkpoint_dir in the command below.

Run nmt_tutorial.py (from NeMo's NLP examples) in interactive mode:

python nmt_tutorial.py --tokenizer_model bpe8k_yttm.model \
    --eval_datasets test --optimizer novograd --d_model 1024 \
    --d_inner 4096 --num_layers 6 --num_attn_heads 16 \
    --checkpoint_dir <path_to_checkpoints> --interactive
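
The file name bpe8k_yttm.model suggests a YouTokenToMe BPE model, which is what nmt_tutorial.py loads for subword tokenization. Assuming that is the case, the sketch below shows how the 8K-subword vocabulary could be loaded and applied to a sentence on its own; the youtokentome package and the example sentence are assumptions for illustration, not part of this model.

import youtokentome as yttm

# Hypothetical sketch: assumes bpe8k_yttm.model is a YouTokenToMe BPE model,
# as the "yttm" suffix suggests (pip install youtokentome).
bpe = yttm.BPE(model="bpe8k_yttm.model")
print(bpe.vocab_size())  # expected to be roughly 8K

# Encode a source sentence into subword ids, then round-trip it back to text.
ids = bpe.encode(["NeMo makes neural machine translation easy."],
                 output_type=yttm.OutputType.ID)
print(ids)
print(bpe.decode(ids))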