How to properly assemble the virus genome using map to reference strategy

0

Entering edit mode

5.2 years ago

xatabadich • 0

Hello everyone,

I am trying to assemble a virus sars-cov2 in consensus using map to reference strategy. The problem is that the reference genome is approx. 30.000 in length and my assembly is only 1500 bp. While map to reference method using bwa mem software I got 93% mapped reads, so I am expecting almost circular genome.

I used the following commands:

1) Indexing file - bwa index reference\ /SARS_cov_2.fasta

2) Allign and convert to bam - bwa mem reference\ /SARS_cov_2.fasta ERR4082713_1.fastq.gz ERR4082713_2.fastq.gz | samtools sort -o aln_sars_cov.bam

3) Keep all mapped sequences - samtools view -c -f 4 aln_sars_cov.bam

4) Retrieve reads from bam file - bam2fastq --aligned --force --strict -o mapped#.fq aln_sars_cov.bam

5) Assemble using SPAdes - spades.py -k 127 -1 mapped_1.fq -2 mapped_2.fq --careful -o output/

Am I using the wrong method to assemble it or is it a problem with pre-processing the reads ? What is the best strategy (pipeline) to get a consensus?

genome Assembly map to reference rna-seq • 1.0k views

ADD COMMENT • link updated 5.1 years ago by Biostar 20 • written 5.2 years ago by xatabadich • 0

2

Entering edit mode

Why would you need to assemble? Just align to reference and extract consensus. See e.g. Generating consensus sequence from bam file