Clara Deploy SDK is being consolidated into Clara Holoscan SDK

More information: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/collections/claradeploy

De Novo Sequence Assembly - Clara Genomics Analysis

This asset requires the Clara Deploy SDK. Follow the instructions on the Clara Ansible page to install the Clara Deploy SDK.

Overview

This is a reference pipeline that uses Clara Genomics Analysis tools to assemble a genome with the Clara Deploy SDK.

These tools use the GPU to accelerate genomic sequence analysis; in this pipeline, the mapper and racon operators each request one GPU.

Pipeline Definition

api-version: 0.4.0
name: denovo-gpu
parameters:
  DOCKER_IMAGE: claraparabricks/cga_cuda10
  DOCKER_TAG: v0.5.2

  MAPPER_KMER_SIZE: 15
  MAPPER_WINDOW_SIZE: 5
  MAPPER_ADDITIONAL_PARAMS: ''

  RACON_LOOPS: 5
  RACON_THREADS: 5
  RACON_POLISH_BATCH_SIZE: 4
  RACON_ADDITIONAL_PARAMS: ''
  JOB_ID: '.'

operators:
- name: mapper
  description: CUDA mapper
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "cudamapper",
              "-i", "/input",
              "-o", "/mapperOutput/${{JOB_ID}}",
              "-p", "-k ${{MAPPER_KMER_SIZE}} -w ${{MAPPER_WINDOW_SIZE}}"]
  requests:
    gpu: 1
    memory: 16384
  input:
  - path: /input/
  output:
  - path: /mapperOutput

- name: miniasm
  description: Miniasm
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "miniasm",
              "-i", "/input",
              "-l", "/mapperOutput/${{JOB_ID}}/overlaps.paf",
              "-o", "/asmOutput/${{JOB_ID}}"]
  input:
  - path: /input/
  - from: mapper
    path: /mapperOutput
  output:
  - path: /asmOutput
  requests:
    memory: 16384

- name: racon
  description: Polish Assembly using racon
  container:
    image: ${{DOCKER_IMAGE}}
    tag: ${{DOCKER_TAG}}
    command: ["pef", "racon",
              "-i", "/input",
              "-a", "/asmOutput/${{JOB_ID}}/reads.fa",
              "-o", "/raconOutput/${{JOB_ID}}",
              "-t", "${{RACON_THREADS}}",
              "-l", "${{RACON_LOOPS}}"]
  requests:
    cpu: 4
    gpu: 1
    memory: 16384
  input:
  - path: /input/
  - from: miniasm
    path: /asmOutput
  - from: mapper
    path: /mapperOutput
  output:
  - path: /raconOutput/
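
The operator commands above refer to pipeline parameters through ${{NAME}} placeholders. As a rough illustration of how those references expand with the default values, here is a minimal Python sketch. The render helper is hypothetical and is not part of the Clara Deploy SDK; the platform performs the real substitution, and its exact rules may differ from what is assumed here.

```python
import re

# Default parameters copied from the pipeline definition above
# (only the ones the mapper command references).
PARAMETERS = {
    "MAPPER_KMER_SIZE": "15",
    "MAPPER_WINDOW_SIZE": "5",
    "JOB_ID": ".",
}

# Matches ${{NAME}} placeholders and captures NAME.
PLACEHOLDER = re.compile(r"\$\{\{(\w+)\}\}")


def render(argument, parameters):
    """Replace each ${{NAME}} in `argument` with its value.

    Hypothetical helper for illustration only; it raises KeyError
    when a placeholder names a parameter that is not defined.
    """
    def lookup(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"undefined pipeline parameter: {name}")
        return parameters[name]

    return PLACEHOLDER.sub(lookup, argument)


# The mapper operator's command, as written in the pipeline definition.
mapper_command = [
    "pef", "cudamapper",
    "-i", "/input",
    "-o", "/mapperOutput/${{JOB_ID}}",
    "-p", "-k ${{MAPPER_KMER_SIZE}} -w ${{MAPPER_WINDOW_SIZE}}",
]

rendered = [render(arg, PARAMETERS) for arg in mapper_command]
print(rendered)
```

Under these assumptions, the default JOB_ID of '.' makes the mapper output path expand to /mapperOutput/. and the "-p" argument expand to "-k 15 -w 5".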

Executing the Pipeline

Refer to the Quick Start Guide for instructions on running this reference pipeline with local input files.

Data Input

The input is a folder containing the following files:

sample.fasta - Input FASTA sample file for all-to-all mapping

jobConfig (optional) - A file containing parameter names and values in shell-script style. A sample (sample_job_config.sh) is provided. The following is the content of a jobConfig file with default values.

  KMER_SIZE=15         # length of kmer to use for minimizers
  WINDOW_SIZE=5        # length of window to use for minimizers
  INDEX_SIZE=10000     # length of batch size used for query

  RACON_LOOPS=5        # number of polishing loops
  RACON_THREADS=15     # number of threads
  POLISH_BATCH_SIZE=6  # number of batches for CUDA accelerated polishing
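
Because jobConfig uses plain KEY=value lines with optional # comments, its contents are easy to inspect before a job is submitted. The following Python sketch parses such a file and falls back to the defaults listed above. The parse_job_config helper is hypothetical and not part of the Clara Deploy SDK; it assumes values are unquoted, contain no # characters, and that the six listed parameters are integers.

```python
# Defaults taken from the jobConfig listing above.
DEFAULTS = {
    "KMER_SIZE": 15,
    "WINDOW_SIZE": 5,
    "INDEX_SIZE": 10000,
    "RACON_LOOPS": 5,
    "RACON_THREADS": 15,
    "POLISH_BATCH_SIZE": 6,
}


def parse_job_config(text):
    """Parse shell-style KEY=value lines into a dict (hypothetical helper).

    Known keys are converted to int; any other key is kept as a raw string.
    Missing keys keep their default values.
    """
    config = dict(DEFAULTS)
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got: {raw_line!r}")
        key, value = key.strip(), value.strip()
        config[key] = int(value) if key in DEFAULTS else value
    return config


# Example jobConfig content that overrides two of the defaults.
example = """
KMER_SIZE=17         # override the minimizer k-mer length
RACON_LOOPS=3
"""

config = parse_job_config(example)
print(config["KMER_SIZE"], config["WINDOW_SIZE"], config["RACON_LOOPS"])
```

Only the overridden keys change; every other parameter keeps the default shown in the listing.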

Data Output

The assembled and polished sequence, written by the final racon operator to its /raconOutput path.

License

An End User License Agreement is included with the product. By pulling and using the Clara Deploy asset on NGC, you accept the terms and conditions of this license.
