Hello everyone,
I am trying to assemble a virus sars-cov2 in consensus using map to reference strategy. The problem is that the reference genome is approx. 30.000 in length and my assembly is only 1500 bp. While map to reference method using bwa mem software I got 93% mapped reads, so I am expecting almost circular genome.
I used the following commands:
1) Indexing file - bwa index reference\ /SARS_cov_2.fasta
2) Allign and convert to bam - bwa mem reference\ /SARS_cov_2.fasta ERR4082713_1.fastq.gz ERR4082713_2.fastq.gz | samtools sort -o aln_sars_cov.bam
3) Keep all mapped sequences - samtools view -c -f 4 aln_sars_cov.bam
4) Retrieve reads from bam file - bam2fastq --aligned --force --strict -o mapped#.fq aln_sars_cov.bam
5) Assemble using SPAdes - spades.py -k 127 -1 mapped_1.fq -2 mapped_2.fq --careful -o output/
Am I using the wrong method to assemble it or is it a problem with pre-processing the reads ? What is the best strategy (pipeline) to get a consensus?
Why would you need to assemble? Just align to reference and extract consensus. See e.g. Generating consensus sequence from bam file
I will try this method, thanks a lot!