Hello, I have a really annoying problem. I need to sort my BLAST results by percentage identity (%ident). Some of my sequences contain an ambiguous base (an IUPAC code). I want these sequences to show up as 100% identical to otherwise identical sequences in which the ambiguity is resolved.
Example:
ATGC and ATGS = 100% ident
ATGG and ATGS = 100% ident

because otherwise(?):
ATGC and ATGS = 75% ident
TTCG and ATGC = 75% ident
I know for sure that this is correct, because my sequencing run contains multiple gene copies that are almost (99.9%) identical. Sometimes I have two copies in one sequencing reaction, and the basecaller calls an S, for example, if one paralogue has a C and the other one a G. This step is beyond my control. I have >100,000 sequences.
Thanks!
I don't think that is possible unless you post-process your data. IUPAC codes are treated as mismatches in nucleotide alignment. See the help page here.
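As a rough idea of what such post-processing could look like, here is a minimal Python sketch that recomputes identity from an aligned query/subject pair and counts two bases as a match whenever their IUPAC code sets overlap. It assumes you have the aligned sequences available as equal-length strings (for example from a tabular BLAST+ output that includes the qseq and sseq columns). The function names `bases_compatible` and `iupac_identity` are made up for illustration, and gaps are counted as mismatches here, which may not be what you want.

```python
# IUPAC nucleotide codes mapped to the set of bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def bases_compatible(a, b):
    """Return True if the two symbols share at least one possible base.

    Unknown symbols (including the gap character '-') never match.
    """
    set_a = IUPAC.get(a.upper())
    set_b = IUPAC.get(b.upper())
    if set_a is None or set_b is None:
        return False
    return bool(set(set_a) & set(set_b))


def iupac_identity(qseq, sseq):
    """Percent identity over an alignment, treating compatible IUPAC codes as matches.

    Assumption: qseq and sseq are the aligned strings (same length, gaps as '-').
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned sequences must have the same length")
    if not qseq:
        return 0.0
    matches = sum(bases_compatible(a, b) for a, b in zip(qseq, sseq))
    return 100.0 * matches / len(qseq)


if __name__ == "__main__":
    print(iupac_identity("ATGC", "ATGS"))
    print(iupac_identity("ATGG", "ATGS"))
```

Note that with this rule an N matches every base, so sequences with many Ns will look more similar than they really are. You may want to restrict the match to two-base codes such as S, or filter such hits separately. The recomputed value only rescores the alignment BLAST already reported; it does not change which hits BLAST finds in the first place.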