NVIDIA Optimized Deep Learning Framework powered by Apache MXNet

NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet is a deep learning framework that allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity.
Publisher: Apache Software Foundation
Latest Tag: 24.02-py3
Modified: March 13, 2024
Compressed Size: 6.46 GB
Multinode Support: Yes
Multi-Arch Support: Yes
24.02-py3 (Latest) Security Scan Results: Linux / amd64, Linux / arm64

What Is NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet?

NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavors of symbolic programming and imperative programming to maximize efficiency and productivity.

At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and multiple machines.

NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet is more than a deep learning project. It is also a collection of blueprints and guidelines for building deep learning systems, and it offers insights into DL systems for hackers.

Running the Container

Before you can run an NGC deep learning framework container, your Docker environment must support NVIDIA GPUs. To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers And Frameworks User Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.

The method implemented in your system depends on the DGX OS version installed (for DGX systems), the specific NGC Cloud Image provided by a Cloud Service Provider, or the software that you have installed in preparation for running NGC containers on TITAN PCs, Quadro PCs, or vGPUs.

Procedure

  1. Select the Tags tab and locate the container image release that you want to run.

  2. In the Pull Tag column, click the icon to copy the docker pull command.

  3. Open a command prompt and paste the pull command. The container image begins to download. Ensure the pull completes successfully before proceeding to the next step.

  4. Run the container image.

    If you have Docker 19.03 or later, a typical command to launch the container is:

    docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/mxnet:xx.xx-py3

    If you have Docker 19.02 or earlier, a typical command to launch the container is:

    nvidia-docker run -it --rm -v local_dir:container_dir nvcr.io/nvidia/mxnet:xx.xx-py3

Where:

  • -it means run in interactive mode
  • --rm deletes the container when it exits
  • -v mounts a host directory or file into the container
  • local_dir is the directory or file from your host system (absolute path) that you want to access from inside your container. For example, the local_dir in the following path is /home/jsmith/data/mnist.
     -v /home/jsmith/data/mnist:/data/mnist
    Inside the container, running ls /data/mnist lists the same files as running ls /home/jsmith/data/mnist from outside the container.
  • container_dir is the target directory when you are inside your container. For example, /data/mnist is the target directory in the example:
     -v /home/jsmith/data/mnist:/data/mnist
  • xx.xx is the container version. For example, 20.01.
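
The launch command described above can also be assembled programmatically, which makes the role of each option explicit. The following is a minimal sketch, not part of the NGC tooling: the helper name build_docker_run and its parameters are hypothetical, and the 24.02 version is taken from the Latest Tag listed on this page.

```python
import shlex


def build_docker_run(local_dir, container_dir, version, use_gpus_flag=True):
    """Assemble the launch command as an argument list (hypothetical helper).

    use_gpus_flag=True  -> Docker 19.03 or later (docker run --gpus all)
    use_gpus_flag=False -> Docker 19.02 or earlier (nvidia-docker run)
    """
    if use_gpus_flag:
        cmd = ["docker", "run", "--gpus", "all"]
    else:
        cmd = ["nvidia-docker", "run"]
    cmd += [
        "-it",   # run in interactive mode
        "--rm",  # delete the container when it exits
        "-v", f"{local_dir}:{container_dir}",  # host path : container path
        f"nvcr.io/nvidia/mxnet:{version}-py3",
    ]
    return cmd


cmd = build_docker_run("/home/jsmith/data/mnist", "/data/mnist", "24.02")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) as is; shlex.join is used here only to print a copy-and-paste form of the command.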

NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet is run simply by importing it as a Python module:

    $ python
    Python 3.5.2 (default, Nov 23 2017, 16:37:01)
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import mxnet as mx
    >>> a = mx.nd.ones((2,3), mx.gpu())
    >>> print((a*2).asnumpy())
    [[ 2. 2. 2.]
     [ 2. 2. 2.]]

You might want to pull in data and model descriptions from locations outside the container for use by MXNet. To accomplish this, the easiest method is to mount one or more host directories as Docker data volumes. You have pulled the latest files and run the container image.
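
The interactive import example in this section places its array on mx.gpu(), which fails if the container was started without GPU access. The sketch below is one way to fall back to the CPU in that case. It assumes the MXNet 1.x API shipped in the container (mx.context.num_gpus); the helper name pick_context is hypothetical, and the import is guarded only so the snippet can also be read or run where MXNet is not installed.

```python
try:
    import mxnet as mx  # available inside the container
except ImportError:
    mx = None  # MXNet not installed in this environment


def pick_context():
    """Return mx.gpu() if a GPU is visible, mx.cpu() otherwise, None without MXNet."""
    if mx is None:
        return None
    return mx.gpu() if mx.context.num_gpus() > 0 else mx.cpu()


ctx = pick_context()
if ctx is not None:
    # Same computation as the interactive session, on the selected device.
    a = mx.nd.ones((2, 3), ctx)
    print((a * 2).asnumpy())
```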

Note: In order to share data between ranks, NCCL may require shared system memory for IPC and pinned (page-locked) system memory resources. The operating system's limits on these resources may need to be increased accordingly. Refer to your system's documentation for details. In particular, Docker containers default to limited shared and pinned memory resources. When using NCCL inside a container, it is recommended that you increase these resources by adding the following options to the docker run --gpus all command line:

--shm-size=1g --ulimit memlock=-1
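
To confirm from inside a running container that the shared-memory and locked-memory settings from the note above took effect, the limits can be read back with the Python standard library. This is a minimal sketch under the assumption that shared memory is mounted at /dev/shm, as is typical on Linux; the helper name report_ipc_limits is hypothetical.

```python
import os
import resource
import shutil


def report_ipc_limits(shm_path="/dev/shm"):
    """Return the shared-memory size and locked-memory limits seen by this process."""
    # Total size of the shared-memory mount, or None if the path does not exist.
    shm_total = shutil.disk_usage(shm_path).total if os.path.isdir(shm_path) else None
    # Soft and hard limits on pinned (page-locked) memory, in bytes.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return {
        "shm_bytes": shm_total,
        "memlock_soft": "unlimited" if soft == resource.RLIM_INFINITY else soft,
        "memlock_hard": "unlimited" if hard == resource.RLIM_INFINITY else hard,
    }


limits = report_ipc_limits()
print(limits)
```

With --shm-size=1g and --ulimit memlock=-1 applied, the shared-memory size would be expected to be about 1 GiB and the memlock limits to read as unlimited.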

See /workspace/README.md inside the container for information on customizing your container image.

Running MXNet Using Base Command Platform

Jobs using the MXNet NGC Container on Base Command Platform clusters can be launched either by using the NGC CLI tool or by using the Base Command Platform Web UI. To use the NGC CLI tool, configure the Base Command Platform user, team, organization, and cluster information using the ngc config command as described here.

An example command to launch the container on a single-GPU instance is:

ngc batch run --name "My-1-GPU-mxnet-job" --instance dgxa100.80g.1.norm --commandline "sleep infinity" --result /results --image "nvidia/mxnet:23.03-py3"

The MXNet container includes JupyterLab, which can be invoked as part of the job command for easy access to the container and for exploring its capabilities. Once the job is created, let it run for 2-3 minutes before opening JupyterLab. An example command to invoke JupyterLab as part of a job run on a single DGX node is:

ngc batch run --name "My-1-node-mxnet-jupyterlab-job" --instance dgxa100.80g.8.norm --commandline "jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/ & sleep infinity" --result /results --image "nvidia/mxnet:23.03-py3" --port 8888
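
Because the JupyterLab command line contains nested quotes, it can be easier to build the ngc batch run invocation as an argument list and let a library handle the quoting. The sketch below uses only the options that appear in the examples above; the helper name build_ngc_batch_run is hypothetical and is not part of the NGC CLI.

```python
import shlex

# JupyterLab options copied from the example above; "& sleep infinity" keeps the job alive.
jupyter_cmd = (
    "jupyter lab --allow-root --ip=* --port=8888 --no-browser "
    "--NotebookApp.token='' --NotebookApp.allow_origin='*' "
    "--notebook-dir=/ & sleep infinity"
)


def build_ngc_batch_run(name, instance, commandline, image, port=None):
    """Assemble an ngc batch run invocation as an argument list (hypothetical helper)."""
    args = [
        "ngc", "batch", "run",
        "--name", name,
        "--instance", instance,
        "--commandline", commandline,
        "--result", "/results",
        "--image", image,
    ]
    if port is not None:
        args += ["--port", str(port)]  # expose the JupyterLab port
    return args


args = build_ngc_batch_run(
    name="My-1-node-mxnet-jupyterlab-job",
    instance="dgxa100.80g.8.norm",
    commandline=jupyter_cmd,
    image="nvidia/mxnet:23.03-py3",
    port=8888,
)
print(shlex.join(args))
```

Passing the list to subprocess.run(args) avoids shell quoting altogether; shlex.join is shown only to produce a form that can be pasted into a terminal.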

Suggested Reading

For the latest Release Notes, see the NVIDIA Optimized Deep Learning Framework, powered by Apache MXNet Release Notes Documentation website.

Link to Open Source Code

For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.

For more information about MXNet, including tutorials, documentation, and examples, see:

Security CVEs

To review known CVEs on the 21.07 image, please refer to the Known Issues section of the Product Release Notes.

License

By pulling and using the container, you accept the terms and conditions of this End User License Agreement.