Find mismatch and INDEL position
10.5 years ago
biolab ★ 1.4k

Hi everyone,

I have a short sequence that has been mapped to the genome with a known number of mismatches and INDELs. Here is an example:

Target sequence: ATGGATCGTACGTAACGCTAGCATGACT; query sequence: ATGCGTCCGCGTAACG; number of mismatches: 6; number of INDELs: 3.

Based on the above information, how can I generate the following pattern? In other words, how can I output the positions of the mismatches and INDELs?

ATG-GATCGTACGTAACGCTAGCATGACT
||| | ||     |||||
ATGCG-TCCGCG-TAACG

Thank you very much.

mismatch script INDEL • 3.8k views

OK, so when you say "how to output", do you mean "how to represent them properly" or "how to write code that displays them the way you showed"? I got a bit confused there.

Hi Jordan,

I mean "how to represent them properly?" I know little about CIGAR strings. Any tool that helps align my short sequence to the genome and outputs the mismatch and INDEL information would be useful to me. One important requirement is that the tool must allow fuzzy (inexact) matches. Thanks!

I see. You can use BLAST and choose the option "Align two or more sequences".

You can enter the query and target sequences in FASTA format. For example, in your case:

>query_seq
ATGCGTCCGCGTAACG

>target_seq
ATGGATCGTACGTAACGCTAGCATGACT

You need to give your match, mismatch and gap scores. Any tool which does alignment needs them.

This will display the type of alignment you mentioned in your question. Not a CIGAR string though.
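If you would rather script this than use the BLAST web form, the idea of aligning with match, mismatch and gap scores can be sketched in plain Python. This is only a rough illustration, not what BLAST does internally (BLAST is a heuristic local aligner). The function name `fit_align` and the default scores are assumptions made up for this example. It aligns the whole query against the best-scoring stretch of the target, with a simple linear gap penalty.

```python
def fit_align(query, target, match=1, mismatch=-1, gap=-2):
    """Align all of `query` to the best-scoring part of `target`.

    Returns (score, start, target_aln, query_aln), where `start` is the
    0-based target offset at which the aligned region begins.
    The scores are illustrative defaults, not values from any tool.
    """
    def sub(a, b):
        return match if a == b else mismatch

    n, m = len(query), len(target)
    # score[i][j]: best score aligning query[:i] so that it ends at target[:j].
    # Row 0 stays at 0, so skipping leading target bases costs nothing.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(query[i - 1], target[j - 1]),
                score[i - 1][j] + gap,   # query base against a gap in the target
                score[i][j - 1] + gap,   # target base against a gap in the query
            )

    # Trailing target bases are free too: pick the best cell in the last row.
    i, j = n, max(range(m + 1), key=lambda k: score[n][k])
    best = score[i][j]

    t_aln, q_aln = [], []
    while i > 0:  # trace back until the whole query is placed
        if j > 0 and score[i][j] == score[i - 1][j - 1] + sub(query[i - 1], target[j - 1]):
            t_aln.append(target[j - 1]); q_aln.append(query[i - 1])
            i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            t_aln.append("-"); q_aln.append(query[i - 1])
            i -= 1
        else:
            t_aln.append(target[j - 1]); q_aln.append("-")
            j -= 1
    return best, j, "".join(reversed(t_aln)), "".join(reversed(q_aln))


if __name__ == "__main__":
    s, start, t_aln, q_aln = fit_align("ATGCGTCCGCGTAACG",
                                       "ATGGATCGTACGTAACGCTAGCATGACT")
    print(s, start)
    print(t_aln)
    print(q_aln)
```

The alignment you get back depends on the scores you choose, so different match, mismatch and gap values can place the gaps differently from the hand-drawn example in the question.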

10.5 years ago
Jordan ★ 1.3k

You mean something like a CIGAR string?

3M1I1M1D2M4X1D5M
  • M - Match
  • X - Mismatch
  • I - Insertion
  • D - Deletion
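To connect the CIGAR string back to the picture in the question, here is a small Python sketch that goes both ways: from a gapped alignment to a CIGAR string and a list of mismatch/INDEL positions, and from a CIGAR string back to the three-line display. It follows the letters in the legend above (M for a match, X for a mismatch). Note that the SAM specification uses `=` for a sequence match and lets `M` cover both matches and mismatches, so adapt `column_op` if you need SAM-style output. The function names are made up for this example.

```python
import re
from itertools import groupby


def column_op(t, q):
    """Classify one alignment column with the M/X/I/D letters from the legend."""
    if t == "-":
        return "I"  # base present only in the query
    if q == "-":
        return "D"  # base present only in the target
    return "M" if t == q else "X"


def to_cigar(target_aln, query_aln):
    """Run-length encode the per-column operations of a gapped alignment."""
    ops = [column_op(t, q) for t, q in zip(target_aln, query_aln)]
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def positions(target_aln, query_aln):
    """List (op, target_pos, query_pos) for every non-match column.

    Positions are 1-based counts of bases consumed so far in each sequence.
    """
    t_pos = q_pos = 0
    found = []
    for t, q in zip(target_aln, query_aln):
        t_pos += t != "-"
        q_pos += q != "-"
        op = column_op(t, q)
        if op != "M":
            found.append((op, t_pos, q_pos))
    return found


def render(target, query, cigar):
    """Expand a CIGAR string into (gapped target, match line, gapped query)."""
    t_out, mid, q_out = [], [], []
    ti = qi = 0
    for length, op in re.findall(r"(\d+)([MXID])", cigar):
        for _ in range(int(length)):
            if op == "I":
                t_out.append("-"); q_out.append(query[qi]); qi += 1
            elif op == "D":
                t_out.append(target[ti]); q_out.append("-"); ti += 1
            else:
                t_out.append(target[ti]); q_out.append(query[qi])
                ti += 1; qi += 1
            mid.append("|" if op == "M" else " ")
    t_out.append(target[ti:])  # keep the unaligned tail of the target
    return "".join(t_out), "".join(mid), "".join(q_out)


if __name__ == "__main__":
    target = "ATGGATCGTACGTAACGCTAGCATGACT"
    query = "ATGCGTCCGCGTAACG"
    for line in render(target, query, "3M1I1M1D2M4X1D5M"):
        print(line)
    print(to_cigar("ATG-GATCGTACGTAACG", "ATGCG-TCCGCG-TAACG"))
    print(positions("ATG-GATCGTACGTAACG", "ATGCG-TCCGCG-TAACG"))
```

With the sequences from the question, `render` gives back the gapped target and query lines shown there, and `to_cigar` on those two gapped lines returns `3M1I1M1D2M4X1D5M`.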