BAM file reads mapping to multiple genes
3.0 years ago
Nicole • 0

I am unfamiliar with BAM files and fairly new to the Linux command line. I have what I suspect is a simple problem to solve. I have a dataset in which I believe a large number of reads map to multiple genes, and I am trying to find a way to pull out the reads that map to more than one gene. Thanks for any help!

Edit: This is data from a scRNA-seq experiment run on 10X equipment. The cells are from a rabbit, which 10X does not support for genome alignment, so I built a custom reference genome. The mapping rate to the genome was low (~20%), and I am trying to work out whether the cause is the reference or our sample. 10X support suggested that the multiple mapping may come from overlapping annotations in the reference genome. If so, I don't want to filter those reads out; I want to fix the reference. For now I am trying to confirm whether that is actually the case and, if it is, whether a particular set of genes is responsible.

scrnaseq 10x samtools bam sequencing • 3.3k views
3.0 years ago
tomas4482 ▴ 430

Did you align the reads to your own reference, or did you get the BAM from another source? In the first case, you can set your aligner to ignore multi-mapping reads and report only the pair-matched ones. Most aligners have this option. In the second case, the following should work for aligners that give multi-mapped reads a MAPQ of 0, since it keeps only reads with a MAPQ of at least 1:

samtools view -bq 1 file.bam > unique.bam

See the reference here.

3.0 years ago
Marco Pannone ▴ 810

What sort of data are you dealing with? ChIP-seq? In any case, if you want to remove multimapping reads (for a valid reason), you can use sambamba as follows:

sambamba view -h -f bam -F "[XS] == null" input.bam -o output.bam

The XS tag is written by certain aligners (such as Bowtie2) on reads for which more than one alignment was found, so the filter above keeps only reads without it. Whether multimapping reads should be removed is debated. Personally, I always remove them when dealing with ChIP-seq data.


I updated my original post with more detail. The data is from a scRNA-seq experiment and was processed with 10X cellranger count, which uses the STAR aligner. The reads were removed by cellranger, but I am not sure they should be, as I explain in my edit above.
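
Since the BAM comes from cellranger count, one way to check the overlapping-annotation idea is to tally the gene combinations that individual alignments are assigned to. The sketch below assumes the BAM carries a GX tag holding a semicolon-separated list of gene IDs for each alignment, which is how I understand the Cell Ranger BAM output; please verify that against the documentation for your version. The function name multi_gene_combos and the BAM file name are made up for this example. The function reads SAM text on standard input so it can be fed from samtools view.

```shell
# Hypothetical helper: count alignments whose GX tag lists two or more genes,
# grouped by the exact gene combination.
# Assumption: GX:Z: holds semicolon-separated gene IDs (Cell Ranger convention).
multi_gene_combos() {
  awk -F'\t' '
    /^@/ { next }                        # skip SAM header lines, if present
    {
      # Optional tags start at column 12 in SAM.
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^GX:Z:.*;/) {          # a ";" means more than one gene
          combos[substr($i, 6)]++        # strip the "GX:Z:" prefix
        }
      }
    }
    END { for (c in combos) print combos[c] "\t" c }
  ' | sort -k1,1nr                       # most frequent combinations first
}

# Usage (file name is a placeholder):
# samtools view possorted_genome_bam.bam | multi_gene_combos | head -n 20
```

If a handful of gene pairs dominate the output, that points toward overlapping annotations in the reference rather than a problem with the sample.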

3.0 years ago

A more general option is to filter the BAM file by mapping quality (MAPQ) to exclude poorly aligned reads.

Just one answer on how to do that is here:

Filtering A Sam File For Quality Scores
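
Before picking a MAPQ cutoff, it helps to see which values the aligner actually wrote, because the meaning of MAPQ differs between aligners. STAR, for example, reports 255 for uniquely mapped reads by default and lower values (3, 1, 0) for multimappers, so a cutoff of 1 would not isolate unique reads there. A minimal sketch follows; mapq_histogram is a made-up function name and the BAM file name is a placeholder. It reads SAM text on standard input.

```shell
# Hypothetical helper: print "MAPQ<TAB>count" for every MAPQ value seen,
# sorted by MAPQ in ascending order.
mapq_histogram() {
  awk -F'\t' '
    /^@/ { next }                 # skip SAM header lines, if present
    { counts[$5]++ }              # column 5 of a SAM record is MAPQ
    END { for (q in counts) print q "\t" counts[q] }
  ' | sort -n
}

# Usage (file name is a placeholder):
# samtools view possorted_genome_bam.bam | mapq_histogram
```

With STAR output, something like samtools view -b -q 255 in.bam > unique.bam would then keep only the uniquely mapped reads.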
