Tophat : Sam-Flag 115 = Properly-Paired + Read.Reverse + Mate.Reverse ?
0
2
Entering edit mode
11.1 years ago

EDIT: Cross posted on SE: http://seqanswers.com/forums/showthread.php?p=122835

I run tophat2 using the standard options.

$ tophat2 -v                                                                    
TopHat v2.0.10

$ samtools view -H TOPHAT/accepted_hits.bam   | grep PG
@PG     ID:TopHat       VN:2.0.10       CL:/commun/data/packages/tophat-2.0.10.Linux_x86_64/tophat -p 10 -G genes.gtf -o  TOPHAT --rg-id g24 --rg-library 6VGWT3 --rg-sample 6VGWT3 --rg-description 6VGWT3  34 fastqs --rg-platform-unit 1 2 3 4 --rg-center Nantes --rg-platform Illumina  mm10 6VGWT3_ATGTCA_L002_R1_002.fastq.gz,6VGWT3_ATGTCA_L004_R1_002.fastq.gz,6VGWT3_ATGTCA_L003_R1_002.fastq.gz,6VGWT3_ATGTCA_L002_R1_003.fastq.gz,6VGWT3_ATGTCA_L004_R1_003.fastq.gz,6VGWT3_ATGTCA_L003_R1_003.fastq.gz,6VGWT3_ATGTCA_L002_R1_004.fastq.gz,6VGWT3_ATGTCA_L004_R1_004.fastq.gz,6VGWT3_ATGTCA_L003_R1_004.fastq.gz,6VGWT3_ATGTCA_L004_R1_001.fastq.gz,6VGWT3_ATGTCA_L002_R1_001.fastq.gz,6VGWT3_ATGTCA_L003_R1_001.fastq.gz,6VGWT3_ATGTCA_L001_R1_002.fastq.gz,6VGWT3_ATGTCA_L001_R1_003.fastq.gz,6VGWT3_ATGTCA_L001_R1_004.fastq.gz,6VGWT3_ATGTCA_L001_R1_001.fastq.gz 6VGWT3_ATGTCA_L002_R2_002.fastq.gz,6VGWT3_ATGTCA_L004_R2_002.fastq.gz,6VGWT3_ATGTCA_L003_R2_002.fastq.gz,6VGWT3_ATGTCA_L002_R2_003.fastq.gz,6VGWT3_ATGTCA_L004_R2_003.fastq.gz,6VGWT3_ATGTCA_L003_R2_003.fastq.gz,6VGWT3_ATGTCA_L002_R2_004.fastq.gz,6VGWT3_ATGTCA_L004_R2_004.fastq.gz,6VGWT3_ATGTCA_L003_R2_004.fastq.gz,6VGWT3_ATGTCA_L004_R2_001.fastq.gz,6VGWT3_ATGTCA_L002_R2_001.fastq.gz,6VGWT3_ATGTCA_L003_R2_001.fastq.gz,6VGWT3_ATGTCA_L001_R2_002.fastq.gz,6VGWT3_ATGTCA_L001_R2_003.fastq.gz,6VGWT3_ATGTCA_L001_R2_004.fastq.gz,6VGWT3_ATGTCA_L001_R2_001.fastq.gz

I found some sam flags 115 !

115=

  • read paired
  • read mapped in proper pair
  • read reverse strand
  • mate reverse strand
  • first in pair

.

$ samtools view  -f 115 -F 256  dir/accepted_hits.bam | head -n 2
HWI-1KL149:61:D2C11ACXX:4:2204:6848:94129       115     chr1    24611547        1       70M2D31M        chrM    10906   0       GTAGGCGATTAGTGATTTTAAATCTGTTTGGCGTAAGCAGATTGAGCTAGTTATAATTATTCCTCATAGGGAGAAGGATGAAGGGGTATGCTATATATTTT      DDDDDBDDDDDEEEEEEFFFFFFFHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJIJJJJJJJJJJJJJJHHHHHFFFFFCCC      AS:i:-11        XN:i:0  XM:i:0  XO:i:1  XG:i:2  NM:i:2  MD:Z:70^GA31    YT:Z:UU NH:i:4  CC:Z:chrM CP:i:10928       HI:i:2  RG:Z:g24
HWI-1KL149:61:D2C11ACXX:2:2109:12004:4228       115     chr1    24611549        1       101M    chrM    10815   0       TGGCGATTAGTGATTTTAAATCTGTTTGGCGTAAGCAGATTGAGCTAGTTATAATTATTCCTCATAGGGAGAGAAGGATGAAGGGGTATGCTATATATTTT      DDDDDDDDDDDEEEEEEEFFFFFHHHJJJJJJJJJJJJJJJJJJIJJJJIJJJJJJJJIJJJJIGJJJJJJJJJJIJJJJJJJJJJJJHHHHHFFFFFCCC      AS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:0A100      YT:Z:UU NH:i:4  CC:Z:chrM       CP:i:10928HI:i:2   RG:Z:g24

how can a read be "mapped in proper pair" and read reverse strand+ mate reverse strand ? what is the consequence for a tool like htseqcount ? Does it only count the reads in proper pair ?

tophat2 bowtie2 sam • 4.7k views
ADD COMMENT
1
Entering edit mode

Note the HI:i:2 auxiliary flag. When I still used tophat2, I never had it output more than a single hit. I presume that this is the source of the issue. I suspect that if you grep for those reads, you'll find that the mate is for HI:i:1, which would make sense when looking at the RNEXT, PNEXT and TLEN columns and the CC and CP auxiliary fields. This is my only guess as to why this might be the case (I'd post this as an answer if I actually knew!).

Regarding htseq-count, just use the MAPQ threshold and set it above 4 to get rid of multimappers from tophat.

ADD REPLY
0
Entering edit mode

you have strange RNEXT, PNEXT and TLEN columns as well. why is the chromosome name there? shouldn't be another read name or =

ADD REPLY
0
Entering edit mode

yes. That's strange but that's really the raw output for tophat2.

ADD REPLY
0
Entering edit mode

no idea here ? I'll ask SE...

ADD REPLY

Login before adding your answer.

Traffic: 2070 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6