4.4 years ago
kristina.mahan
I have a genome assembly that I used to make a BLAST database. I then took the 18S sequence and blasted it against the database. The results showed contigs matching the 18S at 100% identity, with 0 gaps and 0 mismatches. But once I extracted the contig and aligned the 18S sequence to it in Clustal, I see several gaps and mismatches. How can that be?
Did you extract the whole contig or only the region indicated by the BLAST result? In the latter case, what you describe should not happen. In the former it could, as Clustal is a global aligner and will thus try to align the two sequences over their full length, resulting in a gapped (and likely not very accurate) alignment.
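To make "only the region indicated by the blast result" concrete, here is a minimal Python sketch of pulling that region out of a contig. It assumes the subject coordinates are 1-based and inclusive, and that a start greater than the end means the hit is on the reverse strand, as in BLAST tabular output. The function name `extract_hit_region` is made up for illustration.

```python
# Complement table for plain DNA letters (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_hit_region(contig_seq, s_start, s_end):
    """Return the part of contig_seq covered by a hit.

    s_start and s_end are treated as 1-based inclusive coordinates.
    If s_start > s_end the hit is assumed to be on the reverse strand,
    so the region is reverse-complemented to match the query orientation.
    """
    lo, hi = min(s_start, s_end), max(s_start, s_end)
    if lo < 1 or hi > len(contig_seq):
        raise ValueError("hit coordinates fall outside the contig")
    region = contig_seq[lo - 1:hi]
    if s_start > s_end:
        region = region.translate(_COMPLEMENT)[::-1]
    return region
```

Aligning the 18S against just this slice, rather than the whole contig, is a fairer comparison to what BLAST reported.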
Initially I was extracting the whole contig. I just tried extracting 2500 bases from the contig and aligning with Clustal Omega, and I am still not seeing the 100% sequence identity reported in the BLAST results.
I used the contig to make a BLAST database and then blasted the 18S sequence against it, and now the sequence identity is only 92%, which matches the results I was seeing when I aligned the 18S with the whole contig or with the part of the contig that aligned to the 18S.
OK, so that already adds up.
In your original BLAST, was the hit a single match on a single contig? It could also be that BLAST did not align the complete 18S sequence and only reported the region that matches at 100%.
Can you perhaps post an extract of that original BLAST result?
I figured it out. I extracted the contig from a contig.fasta file, but when I made the database of the full assembly I used the assembly.fasta file. When I use the same assembly file both to extract the contig and to make the database, they match 100%. I didn't realize the contigs in these files would be different.
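For anyone hitting the same thing, a quick way to check whether a contig with the same name is really the same sequence in two FASTA files is sketched below in Python. It uses a deliberately simple FASTA reader and takes the first word of the header as the contig ID. The helper names and the `contig_1` records are made up for illustration.

```python
import io


def read_fasta(handle):
    """Read FASTA records from an open text handle into {id: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited word of the header as the ID.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def same_contig(handle_a, handle_b, contig_id):
    """True if contig_id is present in both files with identical sequence."""
    seq_a = read_fasta(handle_a).get(contig_id)
    seq_b = read_fasta(handle_b).get(contig_id)
    return seq_a is not None and seq_b is not None and seq_a.upper() == seq_b.upper()


# Toy in-memory example; with real files, pass open("contig.fasta") and
# open("assembly.fasta") instead of the StringIO objects.
file_a = io.StringIO(">contig_1 len=8\nACGT\nACGT\n")
file_b = io.StringIO(">contig_1\nACGTACGA\n")
print(same_contig(file_a, file_b, "contig_1"))
```

If this prints False for the contig in question, the two files hold different versions of it, and BLAST results from one will not line up with sequence extracted from the other.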
https://ibb.co/r3Ynqk9
https://ibb.co/gmL7SS3 The headers for the output are: # Fields: query id, subject ids, query acc.ver, subject acc.ver, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
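Given the field list above, the earlier question of whether BLAST aligned the complete 18S or only part of it can be checked from q. start and q. end. Below is a rough Python sketch that splits one tab-separated result line into those fields and computes the fraction of the query covered by the hit. The example row and the query length of 1800 are invented purely for illustration and are not taken from the screenshots.

```python
# Column names copied from the "# Fields:" header of the output.
FIELDS = [
    "query id", "subject ids", "query acc.ver", "subject acc.ver",
    "% identity", "alignment length", "mismatches", "gap opens",
    "q. start", "q. end", "s. start", "s. end", "evalue", "bit score",
]


def parse_hit(line):
    """Split one tab-separated result line into a {field name: value} dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def query_coverage(hit, query_length):
    """Fraction of the query spanned by the hit (q. start to q. end, inclusive)."""
    q_start = int(hit["q. start"])
    q_end = int(hit["q. end"])
    return (abs(q_end - q_start) + 1) / query_length


# Hypothetical row: a 600-base hit at 100% identity on an 1800-base query.
row = "18S\tcontig_7\t18S\tcontig_7\t100.000\t600\t0\t0\t1\t600\t5000\t5599\t0.0\t1109"
hit = parse_hit(row)
print(hit["% identity"], round(query_coverage(hit, 1800), 3))
```

A hit can report 100% identity while covering only a fraction of the query, so it is worth looking at coverage alongside identity.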