Extract uniquely mapped reads from bam file
1
1
Entering edit mode
8.0 years ago
User000 ▴ 710

Hello,

I have a genome with homologous pairs of chromosome. I am using hisat2 to map RNA-seq reads to genome. I want to do variant calling with GATK, but before I want to extract the reads that mapped uniquely from BAM file(not SAM). Could you suggest how to do that? From other threads seems that flag NH:i:1 means mapped once, but how can I grep it from bam file? and is it a right way?

HISEQ1:128:C0HUMACXX:6:2210:15441:67674 163 chr1A   1101543 60  59M =   1104438 626 CAATCTTCTCGTGGTTTATTGGCGAATGGAGTTTCAGAAGATGCTCAAAATGGTTTCCA @BCFFDDEFHDDFGEEEDHEHCBHFD<3D3C*?D:B?:B?B*98D?4?BB=CG)=FFHA AS:i:0  ZS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:59 YS:i:0  YT:Z:CP NH:i:1
HISEQ1:130:C0G55ACXX:1:1203:2254:30173  163 chr1A   1101543 60  101M    =   1101599 155 CAATCTTCTCGTGGTTTATTGGCGAATGGAGTTTCAGAAGATGCTCAAAATGGTTTCCATGATGCTGATCGAACTGTCCGCAGAGGTGAGGGGCCATCTAA   CCCFFFFFHHHHHJGIJJJJJJJJJJJJJGIJIJJIJIJJJJJJJJJJJJJJJIIJJJJJJJJJJJJHHHHHFFFFEEEDDDDDDD@BDDDDDDDDDDDDD   AS:i:0  ZS:i:-15    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:101    YS:i:-2 YT:Z:CP NH:i:1
RNA-Seq • 9.6k views
ADD COMMENT
5
Entering edit mode
8.0 years ago

samtools view -b -q 5 -o output.bam alignments.bam

Having said that, there's no reason to bother doing this. GATK is smart enough to understand that low MAPQ alignments should be ignored.

ADD COMMENT
0
Entering edit mode

thank you, may I pose the question in another way also:;since i have these homology i want to optimize parameters to map reads uniquely to the different subgenomes..so I either optimize parameters of mapping with hisat2 or filter the bam file...any suggestion is much appreciated.

ADD REPLY
0
Entering edit mode

There are many allele-specific alignment and processing packages already made, don't reinvent the wheel.

ADD REPLY
0
Entering edit mode

can you suggest me how to grep "NH:i:1" reads from bam? thanks

ADD REPLY
1
Entering edit mode

That's much slower and will produce the same results. Don't waste your time.

ADD REPLY
0
Entering edit mode

another question is: why with the command line above my new bam file is 4 times bigger than the previous one?

ADD REPLY
1
Entering edit mode

Because I didn't include the -b option, mea culpa.

ADD REPLY
0
Entering edit mode

I see, thank you very much! Why did you use -q 5? also, do you think it is ok to filter unique reads after doing mark dups step?

ADD REPLY
0
Entering edit mode

I don't understand why mapping quality >= 5 means uniquely mapped :(

ADD REPLY
3
Entering edit mode

It's an undocumented "feature" of hisat2/topat (and STAR, in case you ever need to use that) that MAPQ values of 0-3 are used for multimappers and a higher value (255 or 50) is used for non-multimappers.

ADD REPLY

Login before adding your answer.

Traffic: 2768 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6