Clara Deploy Genomics Analysis - De Novo Sequence Assembly

Description: Clara Deploy Genomics Analysis - De Novo Sequence Assembly [Deprecated]
Publisher: NVIDIA
Latest Version: 0.8.1-2108.1
Modified: April 4, 2023
Compressed Size: 5.79 MB

The Clara Deploy SDK is being consolidated into the Clara Holoscan SDK.

More info: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/collections/claradeploy

De Novo Sequence Assembly - Clara Genomics Analysis

This asset requires the Clara Deploy SDK. Follow the instructions on the Clara Ansible page to install the Clara Deploy SDK.

Overview

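This is a reference pipeline that uses Clara Genomics Analysis tools to assemble a genome de novo with the Clara Deploy SDK.

These tools use GPU acceleration to speed up genome assembly. In the pipeline below, the mapper and racon operators each request one GPU, while miniasm requests none.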

Pipeline Definition

api-version: 0.4.0
name: denovo-gpu
parameters:
  DOCKER_IMAGE: claraparabricks/cga_cuda10
  DOCKER_TAG: v0.5.2

  MAPPER_KMER_SIZE: 15
  MAPPER_WINDOW_SIZE: 5
  MAPPER_ADDITIONAL_PARAMS: ''

  RACON_LOOPS: 5
  RACON_THREADS: 5
  RACON_POLISH_BATCH_SIZE: 4
  RACON_ADDITIONAL_PARAMS: ''
  JOB_ID: '.'

operators:
- name: mapper
  description: CUDA mapper
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "cudamapper",
              "-i", "/input",
              "-o", "/mapperOutput/${{JOB_ID}}",
              "-p", "-k ${{MAPPER_KMER_SIZE}} -w ${{MAPPER_WINDOW_SIZE}}"]
  requests:
    gpu: 1
    memory: 16384
  input:
  - path: /input/
  output:
  - path: /mapperOutput

- name: miniasm
  description: Miniasm
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "miniasm",
              "-i", "/input",
              "-l", "/mapperOutput/${{JOB_ID}}/overlaps.paf",
              "-o", "/asmOutput/${{JOB_ID}}"]
  input:
  - path: /input/
  - from: mapper
    path: /mapperOutput
  output:
  - path: /asmOutput
  requests:
    memory: 16384

- name: racon
  description: Polish Assembly using racon
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "racon",
              "-i", "/input",
              "-a", "/asmOutput/${{JOB_ID}}/reads.fa",
              "-o", "/raconOutput/${{JOB_ID}}",
              "-t", "${{RACON_THREADS}}",
              "-l", "${{RACON_LOOPS}}"]
  requests:
    cpu: 4
    gpu: 1
    memory: 16384
  input:
  - path: /input/
  - from: miniasm
    path: /asmOutput
  - from: mapper
    path: /mapperOutput
  output:
  - path: /raconOutput/
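
The definition above refers to its parameters with ${{NAME}} placeholders. As a rough illustration of what the mapper command looks like once the default values are filled in, here is a minimal Python sketch. The expand helper and its regular expression are hypothetical, written only for this example; how Clara Deploy itself performs the substitution is not described here.

```python
import re

# Parameter defaults copied from the pipeline definition above.
PARAMETERS = {
    "MAPPER_KMER_SIZE": 15,
    "MAPPER_WINDOW_SIZE": 5,
    "JOB_ID": ".",
}

# The mapper operator's command, copied from the pipeline definition above.
MAPPER_COMMAND = [
    "pef", "cudamapper",
    "-i", "/input",
    "-o", "/mapperOutput/${{JOB_ID}}",
    "-p", "-k ${{MAPPER_KMER_SIZE}} -w ${{MAPPER_WINDOW_SIZE}}",
]

# Matches ${{NAME}}; tolerating spaces inside the braces is an assumption.
PLACEHOLDER = re.compile(r"\$\{\{\s*(\w+)\s*\}\}")


def expand(command, parameters):
    """Hypothetical helper: replace each ${{NAME}} with str(parameters[NAME])."""
    def lookup(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"undefined pipeline parameter: {name}")
        return str(parameters[name])

    return [PLACEHOLDER.sub(lookup, arg) for arg in command]


resolved = expand(MAPPER_COMMAND, PARAMETERS)
print(resolved)
```

With the defaults, the output path resolves to /mapperOutput/. and the -p argument to "-k 15 -w 5". Setting JOB_ID to another value would place each operator's output in a subfolder of that name.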

Executing the Pipeline

Refer to the Quick Start Guide for instructions on running this reference pipeline with local input files.
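
When the pipeline runs, an operator can only start once the operators named in its from: entries have produced their output. The sketch below derives the resulting order from those entries using Python's standard graphlib module (Python 3.9 or later). It only illustrates the data dependencies in the pipeline definition; it makes no claim about how the Clara Deploy scheduler is implemented.

```python
from graphlib import TopologicalSorter

# Upstream operators for each operator, read from the `from:` entries
# in the pipeline definition. The mapper only reads the input folder.
DEPENDENCIES = {
    "mapper": set(),
    "miniasm": {"mapper"},
    "racon": {"miniasm", "mapper"},
}


def execution_order(dependencies):
    """Return the operators ordered so that each one follows its upstreams."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(DEPENDENCIES)
print(" -> ".join(order))  # prints "mapper -> miniasm -> racon"
```

Because racon depends on both earlier operators and miniasm depends on mapper, the three steps form a single chain with no opportunity for parallel execution.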

Data Input

The input is a folder containing the following files:

  • sample.fasta - Input FASTA sample file for all-to-all mapping
  • jobConfig (optional) - A file of parameter assignments in shell-script style (NAME=value). A sample, sample_job_config.sh, is provided. The following is the content of a jobConfig file with default values:
      KMER_SIZE=15         # length of kmer to use for minimizers
      WINDOW_SIZE=5        # length of window to use for minimizers
      INDEX_SIZE=10000     # length of batch size used for query
    
      RACON_LOOPS=5        # Number of polishing loops
      RACON_THREADS=15     # number of threads
      POLISH_BATCH_SIZE=6  # number of batches for CUDA accelerated polishing
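
To check a jobConfig file before submitting a job, the NAME=value lines shown above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: parse_job_config is a hypothetical helper, every value is assumed to be an integer (as in the sample), and a # is always treated as the start of a comment. The pipeline itself may interpret the file differently, for example by sourcing it as a shell script.

```python
def parse_job_config(text):
    """Hypothetical helper: parse NAME=value lines into a dict of ints.

    Blank lines and anything after a '#' are ignored.
    """
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a NAME=value line: {raw_line!r}")
        config[name.strip()] = int(value.strip())  # assumes integer values
    return config


# The default jobConfig content shown above.
SAMPLE = """\
KMER_SIZE=15         # length of kmer to use for minimizers
WINDOW_SIZE=5        # length of window to use for minimizers
INDEX_SIZE=10000     # length of batch size used for query

RACON_LOOPS=5        # Number of polishing loops
RACON_THREADS=15     # number of threads
POLISH_BATCH_SIZE=6  # number of batches for CUDA accelerated polishing
"""

config = parse_job_config(SAMPLE)
print(config)
```

A non-integer value or a line without an equals sign raises ValueError, which makes typos in the file visible before the job is submitted.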
    

Data Output

The output is the assembled and polished sequence. The racon operator writes it under /raconOutput/${{JOB_ID}}.

License

An End User License Agreement is included with the product. By pulling and using the Clara Deploy asset on NGC, you accept the terms and conditions of this license.

Suggested Reading

Release Notes, the Getting Started Guide, and the SDK itself are available at the NVIDIA Developer forum (https://developer.nvidia.com/clara).

For answers to any questions you may have about this release, visit the NVIDIA Devtalk forum (https://devtalk.nvidia.com/default/board/362/clara-sdk/).