ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580717/scrEXT030_hg19_S11_L001.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580718/scrEXT030_hg19_S11_L002.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580719/scrEXT030_hg19_S11_L003.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580720/scrEXT030_hg19_S11_L004.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580721/scrEXT030_hg19_S11_L005.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580722/scrEXT030_hg19_S11_L006.bam
For those BAM files, samtools reports all reads as unmapped. When I checked the BAM files, I found that the flag of every read was 4.
samtools view -f 4 scrEXT030_hg19_S15_L002.bam | cut -f1 > S15_L002_unmapped_reads.txt
Why are all the reads in these BAM files unmapped?
Thank you for your timely reply. How can we extract unmapped reads from the FASTQ files derived from those uBAMs?
Use samtools fastq. Make sure the BAM files are collated or name sorted before running the conversion.
My aim is to extract the unmapped reads without the host sequence. So there is no way to extract the unmapped reads directly from a uBAM? Can I only convert these uBAMs to FASTQ and then use bowtie2 to remove the host sequences? Could you kindly add a line of samtools fastq code to clarify your point?
Correct. If no alignments are included (as seems to be the case), you will need to do the alignments yourself to extract the reads. One way to do that would be to bin the reads using bbsplit.sh from the BBMap suite (see the example BBSplit syntax for generating builds for the reference genomes and for calling different builds; replace with your genomes). samtools fastq is described in its manual page.
They should really name them .ubam. Calling them .bam only leads to confusion.
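For the samtools fastq line requested above, here is a minimal sketch rather than a definitive recipe. It assumes the uBAMs hold paired-end reads, and the function names, output file names, and the host index path are hypothetical placeholders. The first function collates the reads by name and converts them to FASTQ; the second uses bowtie2 (the route mentioned in the question, as an alternative to bbsplit.sh) and keeps only the pairs that did not align concordantly to the host.

```shell
# Convert one unaligned BAM (uBAM) to paired FASTQ files.
# Assumption: paired-end data. For single-end data, write the reads
# with "-0 out.fastq.gz" instead of sending them to /dev/null.
ubam_to_fastq() {
    local bam="$1"
    local prefix="${bam%.bam}"
    # collate groups reads by name so mates are adjacent;
    # -u = uncompressed output, -O = write to stdout
    samtools collate -u -O "$bam" \
        | samtools fastq -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz" \
              -0 /dev/null -s /dev/null -n -
}

# Align to a host index and keep the pairs that did NOT align
# concordantly. bowtie2 replaces % in the path with 1 or 2.
# The SAM output is discarded because only the FASTQ files are needed.
remove_host() {
    local index="$1" r1="$2" r2="$3" out_prefix="$4"
    bowtie2 -x "$index" -1 "$r1" -2 "$r2" \
        --un-conc-gz "${out_prefix}_R%.fastq.gz" -S /dev/null
}

# Example calls (paths are placeholders):
#   ubam_to_fastq scrEXT030_hg19_S11_L001.bam
#   remove_host /path/to/host_index \
#       scrEXT030_hg19_S11_L001_R1.fastq.gz \
#       scrEXT030_hg19_S11_L001_R2.fastq.gz \
#       scrEXT030_hg19_S11_L001_nohost
```

Note that samtools view -f 4 on a uBAM simply returns every read, since nothing has been aligned yet, so the host filtering has to happen after the alignment step.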