BERT on Google Cloud AI Platform

Description: Fine-tune a pre-trained BERT model with the SQuAD dataset, optimize it for inference with TensorRT, and deploy it with Triton Inference Server on Google Cloud AI Platform using custom containers.
Publisher: NVIDIA
Latest Version: v1.0
Modified: April 4, 2023
Compressed Size: 1.14 MB

Description

This repo contains code and scripts that perform the following tasks on Google Cloud AI Platform, using assets from NGC:

  • Fine-tuning a BERT model on the SQuAD dataset
  • Optimizing the fine-tuned BERT model with TensorRT (sketched below)
  • Deploying the BERT TensorRT engine with Triton Inference Server
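
As a rough illustration of the TensorRT optimization step, the sketch below builds an FP16 engine from an exported BERT ONNX model. The file names are assumptions for illustration only, and the sketch assumes the exported model has static input shapes; dynamic shapes would additionally require an optimization profile. The repo's own scripts drive this step in practice.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path="bert_squad.onnx", engine_path="bert_squad.plan"):
    # Hypothetical file names; the repo's scripts define the actual paths.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the exported ONNX graph into a TensorRT network definition.
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    # Enable FP16 so the engine uses mixed precision where beneficial.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)

    # Serialize the optimized engine; Triton's TensorRT backend can load the plan file.
    serialized_engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

if __name__ == "__main__":
    build_engine()
```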

Each folder, named after its task, contains an individual set of scripts and code along with a README that describes the general steps to run them. The scripts are NOT READY TO USE as-is, because they may contain commands with user-specific information such as a Google Cloud Storage bucket name or a Google Cloud project name.

Readers may refer to the demo recording that accompanies this repo for further details.

Usage Instructions

The Jupyter notebook, ngc_triton_bert_deployment/bert_on_caip.ipynb, is the best place to start, as it contains the commands to run for each task. However, it is strongly recommended to watch the demo recording, which walks through the notebook step by step, to understand the details.

For each task, refer to the individual README included in its directory to learn more about the task and its details.
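
Once the Triton Inference Server container is serving the TensorRT engine, it can be queried with the Triton client library. The sketch below is a minimal illustration: the server URL, model name, and tensor names and shapes are assumptions, and the actual values come from the repo's Triton model configuration and the deployed AI Platform endpoint.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton HTTP endpoint (assumed to be exposed on port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

seq_len = 384  # a common max sequence length for BERT SQuAD fine-tuning
batch = {
    "input_ids": np.zeros((1, seq_len), dtype=np.int32),    # token ids of question + context
    "segment_ids": np.zeros((1, seq_len), dtype=np.int32),  # sentence A/B markers
    "input_mask": np.ones((1, seq_len), dtype=np.int32),    # attention mask
}

inputs = []
for name, data in batch.items():
    tensor = httpclient.InferInput(name, list(data.shape), "INT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

# Output tensor name is an assumption; check the model's config for the real name.
outputs = [httpclient.InferRequestedOutput("logits")]

response = client.infer(model_name="bert_trt", inputs=inputs, outputs=outputs)
logits = response.as_numpy("logits")  # span start/end scores for SQuAD
print(logits.shape)
```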

License

Refer to the NVIDIA End User License Agreements included in the LICENSE file.