This is probably a basic question, but is it possible to get two different results from the same data set?
For example, we have sequenced a coronavirus and are interested in the S1 segment of the spike protein. When I do a reference-guided whole-genome assembly against the RefSeq genome and then extract just the region I'm interested in, I get a slightly different result than when I assemble against a close reference for only the gene of interest. The difference is small, less than 1%, but it falls in the right places to force us to redesign a PCR assay.
From that, it looks like the whole-genome approach is the better way to go for our data, but I do not understand why the two results should differ. It is still the same underlying data.
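In case it helps, this is roughly how the two S1 results can be compared position by position. It is only a minimal sketch: it assumes the two consensus sequences have already been aligned to the same length, and the function name `compare_aligned` and the toy sequences are made up for illustration, not our real data.

```python
def compare_aligned(seq_a, seq_b):
    """Return (mismatches, percent_difference) for two pre-aligned sequences.

    Assumption: both sequences come from the same alignment, so they have
    equal length and position i in one corresponds to position i in the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Record 1-based position and the base seen in each sequence.
    mismatches = [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]
    percent = 100.0 * len(mismatches) / len(seq_a) if seq_a else 0.0
    return mismatches, percent


# Toy sequences, invented for illustration only (not real S1 data).
whole_genome_s1 = "ATGTTTGTTCTTCTA"
gene_only_s1 = "ATGTTTGTACTTCTA"

mismatches, percent = compare_aligned(whole_genome_s1, gene_only_s1)
for position, base_a, base_b in mismatches:
    print(f"position {position}: whole-genome={base_a} gene-only={base_b}")
print(f"difference: {percent:.2f}%")
```

Listing the exact positions this way makes it easier to see whether the differences sit under the PCR primer or probe sites.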
Is there a "rule of thumb" or "best practice" to prevent this from happening?
I'm sure it's more down to my bioinformatics skills than anything, but I would like to improve, so any advice would be helpful.