8.3 years ago
Redmar
I'm using ABySS to assemble paired-end data, and then BLAST to pick out a set of genes I'm interested in. Sometimes a contig contains a stretch of N characters, and I'm not sure how to interpret them.
    gene TTTGC----------------CGGTGC
    midl |||||                ||||||
    cont TTTGCCGGTNNNNNNNNNNNNCGGTGC
BLAST counts this as 16 gaps, because it has to insert 16 gap characters into the gene sequence to span the 4 extra base pairs plus the 12 N's in the contig. How certain is ABySS that there should be 12 N's there, and how is that number determined? Based on the BLAST result, and on similar samples I ran this on, I would say those 12 N-versus-gap columns are wrong. But if ABySS is confident there should be 12 N's, I don't want to discount them.
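To separate the two kinds of gap columns in an alignment like this, one can tally the aligned rows column by column. This is a minimal sketch, assuming the gene (query) and contig (subject) rows have already been pulled out of the BLAST output as equal-length strings; `classify_columns` is a hypothetical helper, and the identity it prints is a simple matches/columns ratio, not a value BLAST itself reports.

```python
from collections import Counter


def classify_columns(query: str, subject: str) -> Counter:
    """Tally alignment columns of two equal-length aligned rows.

    Gap columns are split by what sits opposite the '-':
    an ambiguous N ("gap_vs_N") or a real base ("gap_vs_base").
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    counts = Counter()
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            # Look at the character opposite the gap.
            other = s if q == "-" else q
            counts["gap_vs_N" if other == "N" else "gap_vs_base"] += 1
        elif q == s:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# The alignment from the question, rebuilt so the run lengths are explicit.
gene = "TTTGC" + "-" * 16 + "CGGTGC"
contig = "TTTGCCGGT" + "N" * 12 + "CGGTGC"

counts = classify_columns(gene, contig)
total = sum(counts.values())
informative = total - counts["gap_vs_N"]
print(dict(counts))
print(f"identity over all columns: {counts['match']}/{total}")
print(f"identity ignoring N columns: {counts['match']}/{informative}")
```

For the alignment above this gives 11 matches, 4 gap-versus-base columns and 12 gap-versus-N columns, so the crude identity moves from 11/27 to 11/15 when the N columns are ignored. The 4 gap-versus-base columns (the extra CGGT) remain real indels either way.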
Is it a contig or a scaffold?
It is a contig, from the contigs.fa output file of ABySS.
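To see how often this happens across the whole assembly, one could list every run of N in contigs.fa together with its position and length. A minimal sketch follows; `read_fasta` and `n_runs` are hypothetical helpers, and it uses a small hand-rolled FASTA reader to stay dependency-free (Biopython's `Bio.SeqIO.parse` would serve the same purpose). It only reports where the N runs are; it says nothing about how ABySS chose their lengths.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Minimal FASTA reader: yields (record id, sequence) pairs."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record id.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_runs(seq: str, min_len: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield (1-based start, length) for each run of N/n at least min_len long."""
    for m in re.finditer(r"[Nn]+", seq):
        if m.end() - m.start() >= min_len:
            yield m.start() + 1, m.end() - m.start()


# Usage: python find_n_runs.py contigs.fa  (prints id, start, length; tab-separated)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for name, seq in read_fasta(fh):
            for start, length in n_runs(seq):
                print(f"{name}\t{start}\t{length}")
```

Tallying the reported lengths across contigs would show whether runs like the 12 N's here recur at a fixed size or vary from gap to gap.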