Question

Why are all the reads in these BAM files unmapped?

0

3.0 years ago

Wang • 0

ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580717/scrEXT030_hg19_S11_L001.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580718/scrEXT030_hg19_S11_L002.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580719/scrEXT030_hg19_S11_L003.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580720/scrEXT030_hg19_S11_L004.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580721/scrEXT030_hg19_S11_L005.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580722/scrEXT030_hg19_S11_L006.bam

For these BAM files, samtools reports all reads as unmapped. When I checked the BAM files, I found that the flag of every read was 4.

samtools view -f 4 scrEXT030_hg19_S15_L002.bam | cut -f1 > S15_L002_unmapped_reads.txt
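A quick way to confirm that every record really is unmapped is to count reads with and without flag 4 instead of listing their names. This is only a sketch: it assumes samtools is on the PATH, and the function name `count_mapped_unmapped` is made up for illustration.

```shell
# Sketch: count unmapped vs. mapped records in one BAM file.
# Assumes samtools is installed; the function name is illustrative.
count_mapped_unmapped() {
    bam="$1"
    # -c prints a record count; -f 4 keeps reads with the "unmapped" flag bit set
    unmapped=$(samtools view -c -f 4 "$bam")
    # -F 4 excludes reads with the "unmapped" flag bit, leaving mapped reads only
    mapped=$(samtools view -c -F 4 "$bam")
    printf 'unmapped=%s mapped=%s\n' "$unmapped" "$mapped"
}
```

For example, `count_mapped_unmapped scrEXT030_hg19_S11_L001.bam` would print both counts on one line; a mapped count of zero means the file holds no alignments at all.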

Why are all the reads in these BAM files unmapped?

samtools single-cell sequencing • 2.4k views

ADD COMMENT • link updated 17 months ago by Ram 44k • written 3.0 years ago by Wang • 0

score 3 · Answer 1 · 2021-12-10

3

3.0 years ago

Carlo Yague 8.9k

The reads are probably in uBAM (Unmapped BAM) file format. It is a way to store raw reads that is sometimes preferred to fastq as it allows you to attach metadata to the reads.

ADD COMMENT • link 3.0 years ago by Carlo Yague 8.9k

0

Thank you for your timely reply. How can we extract the unmapped reads from those uBAM files, or from their related FASTQ files?

ADD REPLY • link 3.0 years ago by Wang • 0

2

Use samtools fastq. Make sure the BAM files are collated or name-sorted before running the conversion.
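As a rough sketch of that conversion, the following collates a uBAM by read name and streams it into samtools fastq. It assumes paired-end data (not confirmed in this thread) and a samtools version whose `collate` can write to stdout with `-O`; the function name and output file naming are hypothetical.

```shell
# Sketch: convert an unmapped BAM to paired FASTQ files.
# Assumptions: paired-end reads; samtools on PATH; output names are illustrative.
ubam_to_fastq() {
    bam="$1"
    prefix="$2"
    # collate groups reads by name; -u writes uncompressed BAM, -O writes to stdout
    samtools collate -u -O "$bam" \
      | samtools fastq -n \
          -1 "${prefix}_R1.fastq.gz" \
          -2 "${prefix}_R2.fastq.gz" \
          -0 /dev/null -s /dev/null -
    # -n leaves read names as they are (no /1 or /2 suffix);
    # -0 and -s discard reads flagged as neither/both mates and singletons
}
```

It could be called as `ubam_to_fastq scrEXT030_hg19_S11_L001.bam S11_L001`. If the data turn out to be single-end, the `-1`/`-2` options would need to be replaced with `-0` pointing at a single output file.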

ADD REPLY • link 3.0 years ago by GenoMax 147k

0

My aim is to extract the unmapped reads, that is, the reads that are not host sequence. So there is no way to extract the unmapped reads directly from a uBAM, is there? Can I only convert these uBAMs to FASTQ and then use bowtie2 to remove the host sequences? Could you kindly add a line of samtools fastq code to clarify your point?

ADD REPLY • link 3.0 years ago by Wang • 0

1

Correct. If there are no alignments included (as seems to be the case), then you will need to do the alignments yourself to extract the reads. One way to do that would be to bin the reads using bbsplit.sh from the BBMap suite (see the example BBSplit syntax for generating builds for the reference genomes and for calling different builds; replace with your genomes).

samtools fastq is described in its manual page.
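For the bowtie2 route mentioned in the question, a minimal sketch might look like the following. It assumes paired-end FASTQ files already exist and that a bowtie2 index of the host genome has been built; the function name, index path, and output naming are placeholders, not anything taken from this thread.

```shell
# Sketch: keep read pairs that do not align concordantly to the host genome.
# Assumptions: bowtie2 on PATH; a host index built beforehand with bowtie2-build.
remove_host_reads() {
    index="$1"   # basename of the host bowtie2 index (hypothetical path)
    r1="$2"      # mate 1 FASTQ
    r2="$3"      # mate 2 FASTQ
    out="$4"     # prefix for the non-host output files
    # --un-conc-gz writes pairs that fail to align concordantly, gzip-compressed;
    # bowtie2 replaces % with 1 and 2 for the two mate files.
    # -S /dev/null discards the alignments themselves.
    bowtie2 -x "$index" -1 "$r1" -2 "$r2" \
        --un-conc-gz "${out}_nonhost_R%.fastq.gz" \
        -S /dev/null
}
```

Note that `--un-conc-gz` also keeps pairs that align only discordantly, so a stricter filter may be preferable if any host alignment should disqualify a pair.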

ADD REPLY • link 3.0 years ago by GenoMax 147k

0

They should really name them .ubam. Calling them .bam only leads to confusion.

ADD REPLY • link 3.0 years ago by Jeremy Leipzig 22k

score 1 · Answer 2 · 2021-12-10

1

3.0 years ago

ATpoint 85k

These could well be unmapped BAM files, meaning the reads were stored in unaligned BAM format rather than FASTQ. That is not unusual. You will need to check whether any information about the processing of these files is available. The hg19 in the name does suggest some alignment, but file names are usually poor sources of information.
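One place where processing information sometimes survives is the BAM header itself. The sketch below prints the header lines that usually carry it; it assumes samtools is installed, and the function name is made up for illustration.

```shell
# Sketch: show header lines that hint at how a BAM file was produced.
# Assumes samtools on PATH; the function name is illustrative.
show_bam_provenance() {
    bam="$1"
    # -H prints only the header.
    # @PG lines record programs run on the file, @SQ lines list reference
    # sequences, and @RG lines describe read groups.
    samtools view -H "$bam" | grep -E '^@(PG|SQ|RG)'
}
```

A header without any `@SQ` lines would be consistent with an unaligned BAM, whereas `@PG` entries naming an aligner would suggest that an alignment step was run at some point.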

ADD COMMENT • link 3.0 years ago by ATpoint 85k

0

Thank you. Do you know how we can extract unmapped reads from those uBAM files or their related FASTQ files?