8.6 years ago
Jackie
I used TopHat to map paired-end RNA-seq reads. Some of the records in the resulting .sam file do not make sense. Please see the examples below:
Example line 1:
NB500923:20:H7WCJBGXX:1:12109:13961:14748 385 chr1 11646 3 151M chr22 114359002 0 CTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGAT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEE/AEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAEA/EAA<EEEEEEAEA/<AEE XA:i:0 MD:Z:151 NM:i:0 NH:i:2 CC:Z:chr15 CP:i:102519374 HI:i:0
In the line above, the mate's start position on chr22 is given as 114359002, but chr22 is only 51304566 bp long.
Example line 2:
NB500923:20:H7WCJBGXX:1:23107:14442:20318 323 chr1 11696 0 151M chrUn_GL000249 155257832 0 GTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGTGCAAATTTGCCGGATTTCCTTCGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGT AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEEEAEEEEEEE<EEEEEEEEEEEEEEEAAEEA<EEEEEE XA:i:2 MD:Z:85C21T43 NM:i:2 NH:i:8 CC:Z:= CP:i:11696 HI:i:4
Here the mate's start position on chrUn_GL000249 is given as 155257832, but chrUn_GL000249 is only 38502 bp long.
Can anyone tell me how this could happen, and how to fix the file?
Thanks,
Can anyone help with this?
You posted this 32 minutes ago, and also on SeqAnswers earlier today. Crossposting is not encouraged, and a bit of patience would be great.
Are you using pre-computed indexes? Is there any chance of an error in the bowtie2 index itself (mislabeling or odd concatenation in the genome FASTA)?
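To see how widespread the problem is, it may help to list every record whose mate position lies beyond the end of the mate's reference sequence. Below is a minimal sketch, assuming the SAM text includes its @SQ header lines and the fields are tab-separated. The function names `read_sq_lengths` and `find_bad_mates` are made up for illustration, and the demo records are abbreviated versions of the two lines in the question (hypothetical read names, SEQ and QUAL replaced by `*`), plus one invented well-formed record for contrast.

```python
def read_sq_lengths(lines):
    """Collect reference lengths from @SQ header lines (SN = name, LN = length)."""
    lengths = {}
    for line in lines:
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def find_bad_mates(lines, lengths):
    """Return (qname, mate_ref, pnext, ref_len) for mates placed past the reference end."""
    bad = []
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        qname, rname, rnext, pnext = cols[0], cols[2], cols[6], int(cols[7])
        mate_ref = rname if rnext == "=" else rnext  # "=" means same reference as the read
        if mate_ref == "*":
            continue  # no mate information available
        ref_len = lengths.get(mate_ref)
        if ref_len is not None and pnext > ref_len:
            bad.append((qname, mate_ref, pnext, ref_len))
    return bad


# Demo input: abbreviated header and records (illustrative, not a full file).
demo_sam = "\n".join([
    "@SQ\tSN:chr22\tLN:51304566",
    "@SQ\tSN:chrUn_GL000249\tLN:38502",
    "read1\t385\tchr1\t11646\t3\t151M\tchr22\t114359002\t0\t*\t*",
    "read2\t323\tchr1\t11696\t0\t151M\tchrUn_GL000249\t155257832\t0\t*\t*",
    "read3\t99\tchr22\t1000\t50\t151M\t=\t1200\t351\t*\t*",
])
lines = demo_sam.splitlines()
bad = find_bad_mates(lines, read_sq_lengths(lines))
for qname, ref, pos, ref_len in bad:
    print(f"{qname}: mate at {ref}:{pos} exceeds reference length {ref_len}")
```

On a real file, read the lines into a list first (for example `lines = open("accepted_hits.sam").read().splitlines()`, where the filename is an assumption), since the list is scanned twice. If only a BAM file is available, `samtools view -h` should produce SAM text with the header included. This only reports the suspicious records; it does not tell you whether the index or the aligner is at fault.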