10.4 years ago
biolab
Hi everyone,
I have a short sequence that has been mapped to the genome with a known number of mismatches and INDELs. Here is an example:
target sequence: ATGGATCGTACGTAACGCTAGCATGACT, query sequence: ATGCGTCCGCGTAACG, number of mismatches: 6, number of INDELs: 3.
Based on the above information, how can I generate the following pattern? In other words, how can I output the positions of the mismatches and INDELs?
ATG-GATCGTACGTAACGCTAGCATGACT
||| | || || |||
ATGCG-TCCGCG-TAACG
Thank you very much.
OK, so when you say 'how to output', do you mean 'how to represent them properly' or 'how to write code that represents them the way you displayed'? I got a bit confused there.
Hi Jordan,
I mean "how to represent them properly?" I know little about CIGAR strings. Any tool that helps align my short sequence to the genome and outputs mismatch and INDEL information would be useful to me. One important thing is that I need to allow fuzzy matching. Thanks!
I see. You can use BLAST and choose the option "Align two or more sequences". You can enter the query and target sequences in FASTA format. For example, in your case:
You need to give your match, mismatch and gap scores. Any tool which does alignment needs them.
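To illustrate how the match, mismatch and gap scores drive the result, here is a minimal sketch of a dynamic-programming aligner in plain Python. It is not what BLAST does internally (BLAST uses a seeded heuristic); it is a simple "fit" alignment that aligns the whole query against the best-matching stretch of the target. The function names `fit_align` and `render` and the default scores (+1, -1, -2) are assumptions chosen for illustration only.

```python
def fit_align(target, query, match=1, mismatch=-1, gap=-2):
    """Align all of `query` to the best-scoring stretch of `target`.

    Returns (start, target_row, query_row, score), where `start` is the
    0-based target position at which the aligned stretch begins.
    """
    n, m = len(query), len(target)
    # score[i][j]: best score for query[:i] against a target stretch ending at j.
    # Row 0 stays at 0, so the alignment may start anywhere in the target.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # query base opposite a gap in the target
                score[i][j - 1] + gap,      # target base opposite a gap in the query
            )
    # The alignment may also end anywhere in the target.
    j = max(range(m + 1), key=lambda k: score[n][k])
    best = score[n][j]
    i = n
    t_row, q_row = [], []
    # Trace back until the whole query has been consumed.
    while i > 0:
        sub = match if j > 0 and query[i - 1] == target[j - 1] else mismatch
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            t_row.append(target[j - 1])
            q_row.append(query[i - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            t_row.append("-")
            q_row.append(query[i - 1])
            i -= 1
        else:
            t_row.append(target[j - 1])
            q_row.append("-")
            j -= 1
    return j, "".join(reversed(t_row)), "".join(reversed(q_row)), best


def render(target_row, query_row):
    """Three-line display: target, '|' under identical columns, query."""
    bars = "".join("|" if t == q else " " for t, q in zip(target_row, query_row))
    return "\n".join([target_row, bars, query_row])


target = "ATGGATCGTACGTAACGCTAGCATGACT"
query = "ATGCGTCCGCGTAACG"
start, t_aln, q_aln, score = fit_align(target, query)
print(f"target start (0-based): {start}, score: {score}")
print(render(t_aln, q_aln))
```

Changing the three scores changes which alignment wins: a harsher gap penalty favours mismatches over INDELs, and a milder one does the opposite. The alignment you get may therefore differ from the one drawn in the question.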
BLAST will then display the type of alignment you showed in your question, though not as a CIGAR string.
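If a CIGAR-style summary and the event positions are needed as well, they can be derived from the two gapped rows of any such display. The sketch below assumes the extended CIGAR operators from the SAM format (`=` match, `X` mismatch, `I` base present only in the query, `D` base present only in the target). The helper name `describe_alignment` is hypothetical, and the rows are copied from the display in the question, with the target row trimmed to the aligned columns.

```python
from itertools import groupby


def describe_alignment(target_row, query_row):
    """Return (cigar, events) for two gapped rows of equal length.

    `events` lists (column, op) for every non-matching column, using
    1-based columns of the alignment display.
    """
    if len(target_row) != len(query_row):
        raise ValueError("aligned rows must have the same length")
    ops = []
    for t, q in zip(target_row, query_row):
        if t == "-" and q == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if t == "-":
            ops.append("I")  # base present only in the query
        elif q == "-":
            ops.append("D")  # base present only in the target
        elif t == q:
            ops.append("=")  # identical bases
        else:
            ops.append("X")  # mismatch
    # Collapse runs of the same operation into <length><op> pairs.
    cigar = "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))
    events = [(col, op) for col, op in enumerate(ops, start=1) if op != "="]
    return cigar, events


# Rows copied from the display in the question.
target_row = "ATG-GATCGTACGTAACGCTAGCATGACT"
query_row = "ATGCG-TCCGCG-TAACG"
cigar, events = describe_alignment(target_row[:len(query_row)], query_row)
print(cigar)
for col, op in events:
    print(col, op)
```

The columns reported here are positions in the alignment display, not genome coordinates. To get target coordinates, add the alignment start and skip columns where the target row has a gap.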