Multidataset-QuartzNet15x5

Description: This is a checkpoint for the QuartzNet 15x5 model trained on multiple datasets in NeMo.
Publisher: NVIDIA
Latest Version: 2
Modified: April 4, 2023
Size: 67.79 MB

Overview

This is a checkpoint for the QuartzNet 15x5 model that was trained in NeMo on six datasets: LibriSpeech, Mozilla Common Voice (validated clips from en_1488h_2019-12-10), WSJ, Fisher, Switchboard, and NSC Singapore English. It was trained with Apex/Amp optimization level O1 for 600 epochs.

The model achieves a WER of 3.79% on LibriSpeech dev-clean, and a WER of 10.05% on dev-other.

Please download the latest version to ensure compatibility with the latest NeMo release. The download contains:

  • QuartzNet15x5-En-Base.nemo - a compressed tarball that contains the encoder and decoder checkpoints, as well as the associated config file.
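
If you download QuartzNet15x5-En-Base.nemo manually, the checkpoint can be restored from the local file. A minimal sketch, assuming a NeMo 1.x-style API where .nemo files are loaded with EncDecCTCModel.restore_from; the local path is illustrative:

import nemo.collections.asr as nemo_asr

# Restore the model from the downloaded .nemo tarball (path is illustrative).
asr_model = nemo_asr.models.EncDecCTCModel.restore_from("QuartzNet15x5-En-Base.nemo")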

Documentation

Source code and the developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html.

Usage example:

python examples/asr/speech2text_infer.py --asr_model=QuartzNet15x5-En --dataset=test.json
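
The --dataset argument expects a NeMo-style manifest, where each line is a JSON object describing one audio sample. A minimal sketch of building such a manifest; the file names, durations, and transcripts below are illustrative:

import json

# Each manifest line describes one utterance: path to the audio file,
# its duration in seconds, and the reference transcript.
samples = [
    {"audio_filepath": "/data/audio/sample_0001.wav", "duration": 4.2, "text": "hello world"},
    {"audio_filepath": "/data/audio/sample_0002.wav", "duration": 6.8, "text": "speech recognition test"},
]
with open("test.json", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")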

You can also load this model directly from your code:

import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-En')
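
With newer NeMo 1.x releases, the equivalent pretrained checkpoint is typically loaded through the EncDecCTCModel class instead. A minimal sketch; the model name QuartzNet15x5Base-En, the transcribe() call, and the audio path are assumptions based on the NeMo 1.x API and may differ for your release:

import nemo.collections.asr as nemo_asr

# Load the pretrained QuartzNet checkpoint by name and transcribe an audio file.
# Class, model name, and transcribe() follow the NeMo 1.x API (assumption); the path is illustrative.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
transcriptions = asr_model.transcribe(["/data/audio/sample_0001.wav"])
print(transcriptions)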