
TTS En TalkNet

Description: Speech Synthesis model trained on female English speech
Publisher: NVIDIA
Latest Version: 1.0.0rc1
Modified: April 4, 2023
Size: 48.75 MB

Model Overview

TalkNet is a non-autoregressive model that generates mel spectrograms from text.
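Non-autoregressive here means the model does not generate spectrogram frames one at a time from earlier frames. TalkNet instead predicts an explicit duration for every input symbol and expands the encoded sequence to frame rate, so all frames can be produced in parallel. The pure-Python sketch below illustrates only that expansion step (often called length regulation); the function name and toy feature values are hypothetical and this is not NeMo's implementation.

```python
from typing import List, Sequence


def expand_by_durations(token_features: Sequence[List[float]],
                        durations: Sequence[int]) -> List[List[float]]:
    """Repeat each token's feature vector durations[i] times.

    Hypothetical illustration of length regulation: per-token encodings
    become per-frame encodings of known total length.
    """
    if len(token_features) != len(durations):
        raise ValueError("need exactly one duration per token")
    frames: List[List[float]] = []
    for feature, duration in zip(token_features, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so each output frame is an independent list.
        frames.extend([list(feature) for _ in range(duration)])
    return frames


if __name__ == "__main__":
    # Three hypothetical tokens with 2-D features and predicted durations.
    tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    durs = [2, 0, 3]
    print(len(expand_by_durations(tokens, durs)))  # 5 frames
```

Because the total number of frames is simply the sum of the predicted durations, the output length is known before any spectrogram frame is generated.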

Model Architecture

For more information about the model architecture, see the TalkNet papers [1, 2].

Training

This model is trained on the LJSpeech dataset, sampled at 22050 Hz, and has been tested on generating female English speech with an American accent.
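The 22050 Hz sample rate fixes how many spectrogram frames correspond to one second of audio. The worked example below assumes an STFT hop length of 256 samples, a common choice for 22050 Hz TTS models but not stated on this card, so check the model's own config before relying on the numbers.

```python
SAMPLE_RATE_HZ = 22050  # from this model card
HOP_LENGTH = 256        # ASSUMPTION: STFT hop size; verify against the model config


def frames_per_second(sample_rate: int, hop_length: int) -> float:
    """Mel frames produced per second of audio."""
    return sample_rate / hop_length


def num_frames(seconds: float, sample_rate: int = SAMPLE_RATE_HZ,
               hop_length: int = HOP_LENGTH) -> int:
    """Frame count for a clip, assuming a center-padded STFT (adds one frame)."""
    num_samples = int(round(seconds * sample_rate))
    return num_samples // hop_length + 1


if __name__ == "__main__":
    print(round(frames_per_second(SAMPLE_RATE_HZ, HOP_LENGTH), 2))  # 86.13
    print(num_frames(1.0))  # 87
```

Under these assumptions, each spectrogram frame covers roughly 11.6 ms of audio.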

Performance

No performance information available at this time.

How to Use this Model

This model can be loaded automatically from NGC with NeMo:

```python
from nemo.collections.tts.models import TalkNetSpectModel, TalkNetPitchModel, TalkNetDursModel
pretrained_model = "tts_en_talknet"
# Spectrogram generator (the main model).
model = TalkNetSpectModel.from_pretrained(pretrained_model)
# Attach the pitch and duration predictors from the same checkpoint for spectrogram generation.
model.add_module('_pitch_model', TalkNetPitchModel.from_pretrained(pretrained_model))
model.add_module('_durs_model', TalkNetDursModel.from_pretrained(pretrained_model))
```

Input

This model accepts batches of text.
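A batch usually holds several sentences at once. Since sentences tokenize to different lengths, token-ID sequences are typically right-padded to a common length and passed together with their true lengths so padding can be masked. The sketch below shows that convention in pure Python; the padding ID and token IDs are hypothetical, and NeMo's own text parsing and collation handle this for the real model.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding index


def pad_batch(sequences: Sequence[Sequence[int]],
              pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[int]]:
    """Right-pad token-ID sequences to the longest one in the batch.

    Returns the padded batch and the original lengths, which a model
    can use to ignore padded positions.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths, default=0)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


if __name__ == "__main__":
    batch, lens = pad_batch([[5, 6, 7], [8]])
    print(batch, lens)  # [[5, 6, 7], [8, 0, 0]] [3, 1]
```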

Output

This model generates mel spectrograms.

Limitations

This checkpoint works well only with vocoders trained on 22050 Hz data; with other vocoders, the generated audio may sound scratchy or choppy.
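One way to avoid this mismatch is to check the vocoder's training sample rate before pairing it with this model. The guard below is a hypothetical helper, not part of NeMo; how you obtain the vocoder's sample rate (for example, from its config) is left as an assumption.

```python
MODEL_SAMPLE_RATE_HZ = 22050  # from this model card


def check_vocoder_sample_rate(vocoder_sample_rate: int,
                              model_sample_rate: int = MODEL_SAMPLE_RATE_HZ) -> None:
    """Raise ValueError if the vocoder was trained at a different sample rate."""
    if vocoder_sample_rate != model_sample_rate:
        raise ValueError(
            f"vocoder expects {vocoder_sample_rate} Hz audio, but this "
            f"spectrogram model was trained on {model_sample_rate} Hz data"
        )


if __name__ == "__main__":
    check_vocoder_sample_rate(22050)  # compatible: returns silently
    try:
        check_vocoder_sample_rate(44100)
    except ValueError as err:
        print(err)
```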

Versions

1.0.0rc1 (current): The original version released with NeMo 1.0.0rc1.

References

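[1] S. Beliaev, Y. Rebryk, B. Ginsburg. TalkNet: Fully-Convolutional Non-Autoregressive Speech Synthesis Model.
[2] S. Beliaev, B. Ginsburg. TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.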

License

License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.