Hi,
I generated transcript sequences with the Isoseq3 pipeline from PacBio. Then I aligned these sequences against my FASTA reference with the pbmm2 align tool; it worked well.
My question is: why do some of my full-length transcripts (76 out of 94818) align twice (sometimes more, up to 4 times) to the reference, whereas all the others align only once? It seems odd that a full-length transcript (~1500 nt) with a high-quality sequence can match two different locations on the reference, right?
The answer can probably be found in the BAM records themselves. Here are the two records for one such transcript (I don't show the whole sequence):
transcript/66333 16 Super-Scaffold_100015 1701509 60 1795S67=62N55=76N136=57N221=120N99=132N131=67N301=1X139N128=100N110=1366N211=2358N38=1967N109=1023N291= * 0 0 CGTACGGAAACCAAAAAAACCTATTCGTCGGTGGACGGCAGGTTTTCGGTGTGTAGTCAGAGCTTTAGATCGTTGGCTATTTTTGACGCAATGTTCTTGAACCCGATGGACGTGCCCGATATGCGCTTAAACCGGACGCCGTTCAGCGACAGCCTGGGCAGCTTGCACACCTCGATCTCCCACTGGACCAGCGAGTCGGTGTTCGGGTCGCCGTGCACGCACAGCAACAGGAAGCGCTCGCGCTGTTCGTAATCGCAATTGTTGGCGTCCAGCACCTCTCTTATTTCAGCCATAATCTCGTTTGGGTCTCTTGTAGACGTCGTTTTCATACTCCATGTGAATCTCAACGATCTCGGTTTCATTTGATCGTCATTCACTGTATTACTAACAATATTATTTTTGGGCAGATTTTGATCCATTGGCCGTTTAACGAACTTGGATGATATTTTTGAAAAGAATGATGGCCTTTGCACAGATGGATCATGTGGACTGCCAGTGTTGTTAGTAGGTCC
transcript/66333 2048 Super-Scaffold_100015 1701660 60 1898S35=76N136=57N221=120N100=132N128=67N298=139N135=100N109=1366N210=2358N39=1967N109=1023N256=1I18= * 0 0 TGTGTAATTTTTTTTCGTACGGAAACCAAAAAACCTATTCGTCGGTGGACGGCAGGTTTTCGGTGTGTAGTCAGAGCTTTAGATCGTTGGCTATTTTTGACGCAATGTTCTTGAACCCGATGGACGTGCCCGATATGCGCTTAAACCGGACGCCGTTCAGCGACAGCCTGGGCAGCTTGCACACCTCGATCTCCCACTGGACCAGCGAGTCGGTGTTCGGGTCGCCGTGCACGCACAGCAACAGGAAGCGCTCGCGCTGTTCGTAATCGCAATTGTTGGCGTCCAGCACCTCTCTTATTTCAGCCATAATCTCGTTTGGGTCTCTTGTAGACGTCGTTTTCATACTCCATGTGAATCTCAACGATCTCGGTTTCATTTGATCGTCATTCACTGTATTACTAACAATATTATTTTTGGGCAGATTTTGATCCATTGGCCGTTTAACGAACTTGGATGATATTTTTGAAAAGAATGATGGCCTTTGCACAGATGGATCATGTGGACTGCCAGTGGCTGGTGATGCAGGTGATACAGTGACACCCCTTGCTGGATCCATTA
Moreover, it is the second record (flag 2048, position 1701660) whose sequence corresponds to the transcript generated with Isoseq. The sequence in the first record (flag 16, position 1701509) differs from the transcript generated with Isoseq. Why?
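To make the two records easier to compare, here is a minimal Python sketch that decodes the FLAG field and summarizes the CIGAR string of each one. The bit names follow the SAM specification; the helper names (decode_flag, summarize_cigar) are my own and not part of pbmm2 or any PacBio tool, and the CIGAR strings are copied from the two records above.

```python
import re

# FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def summarize_cigar(cigar):
    """Count soft-clipped bases, aligned query bases and the reference span."""
    soft_clipped = aligned_query = ref_span = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "S":
            soft_clipped += n
        elif op in "MI=X":   # operations that consume aligned query bases
            aligned_query += n
        if op in "MDN=X":    # operations that consume reference bases
            ref_span += n
    return {
        "soft_clipped": soft_clipped,
        "aligned_query": aligned_query,
        "ref_span": ref_span,
    }


# (FLAG, CIGAR) pairs copied from the two transcript/66333 records above.
RECORDS = [
    (16, "1795S67=62N55=76N136=57N221=120N99=132N131=67N301=1X139N128="
         "100N110=1366N211=2358N38=1967N109=1023N291="),
    (2048, "1898S35=76N136=57N221=120N100=132N128=67N298=139N135=100N109="
           "1366N210=2358N39=1967N109=1023N256=1I18="),
]

for flag, cigar in RECORDS:
    print(flag, decode_flag(flag), summarize_cigar(cigar))
```

Per the SAM specification, FLAG 16 marks an alignment to the reverse strand and FLAG 2048 marks a supplementary alignment, so the two lines are two alignment records of the same read rather than two distinct transcripts.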
Best