I'm trying to find gene fusions. When looking at the reads aligned by STAR, I find that some reads map only 3 bases onto a gene, and none of their remaining bases match the reference at that position. The CIGAR for such a read looks like 3M<length>N47M, where <length> is a large number (> 400000 bp). I inspected these reads with samtools view. Why does this happen? Here is an example:
Analising fusion: RP11-96H19.1::RP11-446N19.1
READS IN BAM
['SRR064438.11916954', 83, 12, 46387924, 255, '49M1S', '=', 46387805, -168, 'TAAGACCAGACCAAATCAAACCAAACCAAGCAAACCACGGGGAATGGAGA', 'A@@>9BBA@@B@B@B@BAB>B??@@BCCBABBB@BB@B?BBABBCBACAB', 'NH:i:1', 'HI:i:1', 'AS:i:97', 'nM:i:0', 'NM:i:0']
['SRR064439.375905', 163, 12, 46387934, 255, '39M11S', '=', 46652514, 264630, 'CCAAATCAAACCAAACCAAGCAAACCACGGGGAATGGAGATTATTGCCTG', 'ABBBBB@BBBAABBB@@BBBABBB@@BABBBB@ABBBACBBBBBBB=5BB', 'NH:i:1', 'HI:i:1', 'AS:i:84', 'nM:i:0', 'NM:i:0']
***
**['SRR064438.12267353', 99, 12, 46387943, 255, '30M264417N20M', '=', 46652487, 264594, 'ACCAAACCAAGCAAACCACGGGGAATGGAGATTATTGCCTGCTCCTCCAA', 'BBBBBBA@BBB?AAB@?B?A@@A@AAAA<A?@A@A?<<8=*<@=5><8<=', 'NH:i:1', 'HI:i:1', 'AS:i:95', 'nM:i:0', 'NM:i:0']
['SRR064439.1473779', 163, 12, 46387962, 255, '11M82180N39M', '=', 46874958, 487046, 'GGGGAATGGAGGTCATGTGAGCACACAGCATAAAGGCAGCTGCCCACAAG', 'BCBBCCCBBCB@:CBBB>>BBACABCBBC?ACBC<ABBA94?:7>AA9B;', 'NH:i:1', 'HI:i:1', 'AS:i:97', 'nM:i:0', 'NM:i:0']
['SRR064439.2800317', 355, 12, 46387970, 3, '3M486844N47M', '=', 46874828, 486908, 'GAGGACCTGATGATTGATTTAGCATCTTTGGCATCCGGCCACTGCTCTGC', 'B@BAAABB>?A@@AAA?BBAB>A?@@BB???BA@B@::@AAA<<B=@;:;', 'NH:i:2', 'HI:i:2', 'AS:i:97', 'nM:i:0', 'NM:i:0']**
***
['SRR064438.4051038', 99, 12, 46652386, 255, '7S43M', '=', 46652461, 123, 'GGGGACTACAGATTATTGCCTGCTCCTCCAAGCCCTTCACTGTAGAATGG', 'BBBB@=BAAAB@BB@BBB<;?A?B?8@@=@@@?8;B@<;<BB??B;?@A@', 'NH:i:1', 'HI:i:1', 'AS:i:89', 'nM:i:0', 'NM:i:0']
REFERENCE:
TAAGACCAGACCAAATCAAACCAAACCAAGCAAACCACGGGGAATGGA==GGTAGGTGAATAGCGCCAAAGAGAATGATGGCTCACAACACTTCTAAGCA
READS:
TAAGACCAGACCAAATCAAACCAAACCAAGCAAACCACGGGGAATGGA==GA
***
CCAAATCAAACCAAACCAAGCAAACCACGGGGAATGGA==GATTATTGCCTG
ACCAAACCAAGCAAACCACGGGGAATGGA==GATTATTGCCTGCTCCTCCAA
GGGGAATGGA==GGTCATGTGAGCACACAGCATAAAGGCAGCTGCCCACAAG
***
GA==GGACCTGATGATTGATTTAGCATCTTTGGCATCCGGCCACTGCTCTGC
Reads 2, 3 and 4 are the ones that raise this problem. Should I count them as mapping here or not? Is there something I'm missing?
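To make the coordinates explicit, here is a minimal Python sketch of how I understand these CIGAR strings. It assumes, per the SAM specification, that N marks a skipped stretch of reference (normally read as an intron in RNA-seq) and that S, I, H and P do not consume reference bases. The helper names aligned_blocks, shortest_block and is_secondary are my own and are not part of samtools or STAR.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return 1-based inclusive (start, end) reference blocks, split at N."""
    blocks = []
    ref = pos            # next reference position to be consumed
    block_start = None   # start of the block currently being built
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=XD":             # consumes reference, stays in the block
            if block_start is None:
                block_start = ref
            ref += length
        elif op == "N":              # skipped region: close the block, jump
            if block_start is not None:
                blocks.append((block_start, ref - 1))
                block_start = None
            ref += length
        # S, I, H, P do not consume reference bases
    if block_start is not None:
        blocks.append((block_start, ref - 1))
    return blocks


def shortest_block(blocks):
    """Length of the shortest aligned block (the smallest anchor)."""
    return min(end - start + 1 for start, end in blocks)


def is_secondary(flag):
    """True if the SAM flag has the secondary-alignment bit (0x100) set."""
    return bool(flag & 0x100)


if __name__ == "__main__":
    # (flag, POS, CIGAR) copied from the samtools view output above
    reads = [
        (99, 46387943, "30M264417N20M"),
        (163, 46387962, "11M82180N39M"),
        (355, 46387970, "3M486844N47M"),
    ]
    for flag, pos, cigar in reads:
        blocks = aligned_blocks(pos, cigar)
        print(cigar, blocks, shortest_block(blocks), is_secondary(flag))
```

For the three reads with an N in their CIGAR, the first block ends at 46387972 in every case, which is also where the 49M1S read stops matching. The 3M486844N47M read has flag 355, which includes the secondary-alignment bit (0x100).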