Question

How to run SPAdes assembler with bam files?

1

Entering edit mode

3.7 years ago

marongiu.luigi ▴ 730

Hello,

I would like to assemble the reads that did not align to my reference genome. I understand that I can gather the unmapped reads with

samtools view -f 12 -F 256 aligned-sorted-deduplicated.bam > unmapped.bam

where -f12 = keep read unmapped, mate unmapped and -F256 = skip not primary alignment. I am essentially keeping only the pairs that did not match. Then I need to re-sort the reads with

samtools sort -n unmapped.bam unmapped-sorted.bam

The manual indicates to use fastq files, so I need to extract them with

bamToFastq -i unmapped-sorted.bam -fq R1.fastq -fq2 R2.fastq

and then assemble with

spades.py -1 R1.fastq  -2 R2.fastq  -o someFolder

I would like to ask:

Is this procedure correct?
Can I compress the fastq files directly?
Can I use the unmapped-sorted.bam without making the fastq files and if yes how?

Thank you

spades assembler bam • 2.9k views

ADD COMMENT • link 3.7 years ago by marongiu.luigi ▴ 730

0

Entering edit mode

You should be able to use pipes to string those steps together

samtools view -f 12 -F 256 aligned-sorted-deduplicated.bam | samtools sort -n  - | samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -N -

ADD REPLY • link 3.7 years ago by GenoMax 148k

0

Entering edit mode

Thank you. Actually I am getting an unmapped.bam file of 132 Mb, a sorted version of 34 Mb (shouldn't they be the same weight) and fastq of zero bytes. Is that correct? Can I compress the fastq files directly? Can I use bam files with spades directly?

ADD REPLY • link 3.7 years ago by marongiu.luigi ▴ 730

0

Entering edit mode

Sorted files compress better so the size being smaller is not a particular surprise. Have you checked to see that you are getting reads at each intermediate step?

ADD REPLY • link 3.7 years ago by GenoMax 148k

0

Entering edit mode

Well, I got all shades. One sample gave correct reads so it is done. Another has 30 Mb of stuff in the sorted bam file but the derived fastq files are empty. Another has stuff in the bam file, but when I sort it I get: samtools sort: failed to read header from "unmapped.bam". Yet the command is the same for all files...

ADD REPLY • link 3.7 years ago by marongiu.luigi ▴ 730

0

Entering edit mode

for the latter error, I added -h in the first samtools call...

ADD REPLY • link 3.7 years ago by marongiu.luigi ▴ 730