Hi, I have reads from a plant species. I want to identify which lineage (subspecies) my reads belong to, based on distinct variants found in representative genomes of these lineages. For example: T at position 3 in lineage 1, C at position 3 in lineage 2. If my reads have C at this position, they probably belong to lineage 2.
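To make the idea concrete, here is a minimal Python sketch of the assignment step. It assumes all positions are already expressed in one shared coordinate system. The table `DIAGNOSTIC` and the function `vote_lineage` are hypothetical names with made-up values, for illustration only.

```python
from collections import Counter

# Hypothetical diagnostic table: position (in one shared coordinate system)
# mapped to the allele carried by each lineage. Values are illustrative only.
DIAGNOSTIC = {
    3: {"lineage1": "T", "lineage2": "C"},
    10: {"lineage1": "A", "lineage2": "G"},
}


def vote_lineage(observed, diagnostic=DIAGNOSTIC):
    """Count, per lineage, how many diagnostic positions match the observed bases.

    `observed` maps position -> base seen in the reads at that position.
    Positions that are not in the diagnostic table are ignored.
    """
    votes = Counter()
    for pos, base in observed.items():
        for lineage, allele in diagnostic.get(pos, {}).items():
            if base.upper() == allele:
                votes[lineage] += 1
    return votes


if __name__ == "__main__":
    # Reads showing C at position 3 and G at position 10.
    print(vote_lineage({3: "C", 10: "G"}).most_common())
```

With few reads, only a handful of diagnostic positions will be covered, so the vote counts should be read as weak evidence rather than a firm call.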
I mapped my reads to a reference and called SNPs. I want to check whether some of these SNPs are found in the other genome. The problem is that positions are not identical between the reference genomes. Any idea how I can deal with that?
I tried aligning my different genomes to identify lineage-specific SNPs, but because the alignment introduces gaps, the resulting positions can't be compared with my other list.
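One possible way around the gap problem, sketched in Python under assumptions: if the two genomes are available as a pair of gapped, equal-length aligned strings, each alignment column can be translated back to the ungapped position in the genome the reads were mapped to. The function names `column_to_position` and `differences_in_ref_coords` are hypothetical, and the sketch only reports substitutions, not indels.

```python
def column_to_position(aligned_seq, gap="-"):
    """Map each alignment column to the 1-based ungapped position in this
    sequence, or None where the sequence has a gap in that column."""
    positions = []
    count = 0
    for ch in aligned_seq:
        if ch == gap:
            positions.append(None)
        else:
            count += 1
            positions.append(count)
    return positions


def differences_in_ref_coords(aligned_ref, aligned_other, gap="-"):
    """Return (ref_position, ref_base, other_base) for every column where both
    sequences have a base and the bases differ. Positions are 1-based and
    refer to the ungapped reference, so they can be compared with SNP calls
    made against that reference."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    ref_positions = column_to_position(aligned_ref, gap)
    differences = []
    for pos, ref_base, other_base in zip(ref_positions, aligned_ref, aligned_other):
        if pos is None or other_base == gap:
            continue  # skip columns that are a gap in either sequence
        if ref_base.upper() != other_base.upper():
            differences.append((pos, ref_base, other_base))
    return differences


if __name__ == "__main__":
    # Toy alignment: the other genome has one extra base relative to the reference.
    print(differences_in_ref_coords("AC-GT", "ACTGA"))
```

The positions returned this way are in the same coordinates as SNPs called against that reference, which is what makes the two lists comparable.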
Thanks
I can't give an answer, but I can help think through a solution. What about looking only at variants in genes (coding DNA) that both species share?
Thank you for the suggestion. I have a very limited set of reads, not necessarily in coding sequences, so I am trying to keep as much information as possible. In the end I simulated reads (error-free and at high coverage) from one reference genome and mapped them to the other reference. That way I get variant positions in the coordinates of a single reference genome (without indels messing things up). I hope that's right!
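For anyone wanting to try the same thing, here is a rough Python sketch of the simulation step. It is not the exact tool I used: it simply tiles overlapping, error-free reads along a sequence, and the names `tile_reads` and `reads_to_fasta`, the read length, and the step size are all assumptions chosen for illustration. Dedicated read simulators exist and are probably a better choice for real data.

```python
def tile_reads(sequence, read_len=100, step=10):
    """Yield (start, read) pairs of error-free reads tiled along `sequence`.

    `start` is 0-based. A final read is added if needed so that the end of
    the sequence is covered. Coverage is roughly read_len / step.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if not sequence:
        return
    last_start = max(len(sequence) - read_len, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the sequence end is covered
    for start in starts:
        yield start, sequence[start:start + read_len]


def reads_to_fasta(name, sequence, read_len=100, step=10):
    """Format the tiled reads as FASTA text, naming each read by its start."""
    lines = []
    for start, read in tile_reads(sequence, read_len, step):
        lines.append(f">{name}_{start}")
        lines.append(read)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Toy sequence; a real genome would be read from a FASTA file instead.
    print(reads_to_fasta("chr1", "ACGTACGTAC", read_len=4, step=3), end="")
```

The resulting FASTA can then be mapped to the other reference with any read mapper, and variants called as for real reads. Sequence that exists in only one of the two genomes will not map, so variants there are still lost.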