Entering edit mode
3.7 years ago
marongiu.luigi
▴
730
Hello,
I would like to assemble the reads that did not align to my reference genome. I understand that I can gather the unmapped reads with
samtools view -f 12 -F 256 aligned-sorted-deduplicated.bam > unmapped.bam
where -f12 = keep read unmapped, mate unmapped
and -F256 = skip not primary alignment
. I am essentially keeping only the pairs that did not match. Then I need to re-sort the reads with
samtools sort -n unmapped.bam unmapped-sorted.bam
The manual indicates to use fastq files, so I need to extract them with
bamToFastq -i unmapped-sorted.bam -fq R1.fastq -fq2 R2.fastq
and then assemble with
spades.py -1 R1.fastq -2 R2.fastq -o someFolder
I would like to ask:
- Is this procedure correct?
- Can I compress the fastq files directly?
- Can I use the unmapped-sorted.bam without making the fastq files and if yes how?
Thank you
You should be able to use pipes to string those steps together
Thank you. Actually I am getting an unmapped.bam file of 132 Mb, a sorted version of 34 Mb (shouldn't they be the same weight) and fastq of zero bytes. Is that correct? Can I compress the fastq files directly? Can I use bam files with spades directly?
Sorted files compress better so the size being smaller is not a particular surprise. Have you checked to see that you are getting reads at each intermediate step?
Well, I got all shades. One sample gave correct reads so it is done. Another has 30 Mb of stuff in the sorted bam file but the derived fastq files are empty. Another has stuff in the bam file, but when I sort it I get:
samtools sort: failed to read header from "unmapped.bam"
. Yet the command is the same for all files...for the latter error, I added
-h
in the first samtools call...