The Transformer encoder, decoder, and projection module checkpoints available here were trained with NVIDIA's Neural Modules toolkit (NeMo). Training used NVIDIA Apex/Amp at the O1 optimization level on V100 GPUs.
Refer to the documentation at https://github.com/NVIDIA/NeMo.
Usage example: place the saved checkpoints into a directory of your choice, then run nmt_tutorial.py (from NeMo's NLP examples) in interactive mode:
python nmt_tutorial.py --tokenizer_model bpe8k_yttm.model \
    --eval_datasets test --optimizer novograd \
    --d_model 1024 --d_inner 4096 \
    --num_layers 6 --num_attn_heads 16 \
    --checkpoint_dir <path_to_checkpoints> --interactive
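The value passed to --checkpoint_dir should be a directory containing the per-module .pt checkpoint files, from which nmt_tutorial.py restores the encoder, decoder, and projection modules. A minimal sketch of preparing that directory, assuming the checkpoints were downloaded to ./downloads (a hypothetical location):

# Gather the downloaded module checkpoints into one directory
# that can be passed to --checkpoint_dir.
mkdir -p checkpoints
cp downloads/*.pt checkpoints/

With this layout, <path_to_checkpoints> in the command above would simply be checkpoints.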