2.8 years ago
LDT
Dear all,
I have a reference (subject) sequence that looks like this:
CGAAGCTCTCCTACGNNNNNNNNNNNNNNNNNNNNNNNNNCAGTCCAGCGCC
and I want to BLAST multiple queries against it that look like this:
>M01755:672:000000000-K43MP:1:1101:15627:1757 1:N:0:1
GGTCGAGGTCGGTGTAGCGTCGTAAGCTAATACGAAAATTAAAAATGACAAAATAGTTTGGAACTAGATTTCACTTATCTGTTTGTCGCTGGACTGACTGCACTGTTGTTTTTCATGAGAACGTAGGAGAGCTTCTTGGCCATCGGCCCAA
My intention is to use the queries to reveal the masked region (the run of Ns) in the reference. It seems that BLAST does not handle the Ns well. Any ideas?
You could consider running BLAST with a custom scoring matrix. This requires digging into the BLAST code a bit, but it is doable (I have done it myself in the past, when we used a custom matrix for annotation purposes).
The matrix would be custom in that it includes N and gives it a match score against any other nucleotide.
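To make the idea concrete, here is a minimal Python sketch of such a matrix, used in a simple ungapped sliding comparison. It is not BLAST and does not reflect BLAST's internals; the function names and the match/mismatch values (1 and -2) are assumptions chosen for illustration only.

```python
def make_matrix(match=1, mismatch=-2, alphabet="ACGT"):
    """Build a score table in which N matches any symbol (illustrative values)."""
    symbols = alphabet + "N"
    matrix = {}
    for a in symbols:
        for b in symbols:
            if a == "N" or b == "N":
                matrix[a, b] = match          # N is scored as a match to anything
            else:
                matrix[a, b] = match if a == b else mismatch
    return matrix


def best_ungapped_hit(reference, read, matrix):
    """Slide the reference along the read; return (score, offset) of the best window.

    Returns None when the read is shorter than the reference.
    """
    best = None
    for offset in range(len(read) - len(reference) + 1):
        window = read[offset:offset + len(reference)]
        score = sum(matrix[r, q] for r, q in zip(reference, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


matrix = make_matrix()
print(best_ungapped_hit("ACNNGT", "TTACGAGTCC", matrix))
```

With a table like this, the N positions never penalise the alignment, so the score is driven entirely by the flanking bases. A read sequenced from the opposite strand would have to be reverse-complemented before being compared.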
On the other hand, I have the feeling that there must be more appropriate tools than BLAST for this (some regex matching, perhaps?).
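The regex idea could be sketched roughly as follows in Python: turn the reference into a pattern whose N run becomes a capture group, then search each read and its reverse complement. The function names, the `flank_len` parameter (how many bases on each side of the N run must match exactly), and the assumption that the masked run is 25 Ns long are all mine, not from the original post. Exact flank matching is a simplification, since real reads may carry mismatches.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pattern(reference, flank_len=10):
    """Compile flank + ([ACGT]{n}) + flank from a 'flank, N run, flank' reference."""
    m = re.fullmatch(r"([ACGT]+)(N+)([ACGT]+)", reference)
    if m is None:
        raise ValueError("expected reference of the form flank + N run + flank")
    left, gap, right = m.groups()
    return re.compile(
        re.escape(left[-flank_len:])
        + "([ACGT]{%d})" % len(gap)       # the masked region, same length as the N run
        + re.escape(right[:flank_len])
    )


def unmask(reference, read, flank_len=10):
    """Return the bases that fill the N run, or None if the read does not span it."""
    pattern = build_pattern(reference, flank_len)
    for candidate in (read, reverse_complement(read)):   # try both strands
        hit = pattern.search(candidate)
        if hit:
            return hit.group(1)
    return None


# Assumption: the masked run in the post is 25 Ns long.
reference = "CGAAGCTCTCCTACG" + "N" * 25 + "CAGTCCAGCGCC"
read = (
    "GGTCGAGGTCGGTGTAGCGTCGTAAGCTAATACGAAAATTAAAAATGACAAAATAGTTTGGAACTAGATTTCACTTATC"
    "TGTTTGTCGCTGGACTGACTGCACTGTTGTTTTTCATGAGAACGTAGGAGAGCTTCTTGGCCATCGGCCCAA"
)
print(unmask(reference, read))
```

Using only the bases nearest the N run as anchors tolerates differences at the outer ends of the flanks. Looping `unmask` over all reads and counting the returned strings would give a tally of the sequences found in the masked region.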