Hello everyone,
I'm testing several computational approaches (pangenome graphs) on my data to find new, fast ways to extract variation information from genomic data. I have a reference FASTA file plus a VCF file covering different samples of the same plant species.
The idea is to do comparative genomics between all the genomes and retrieve relevant information according to query criteria.
From reading articles, I have noticed that researchers usually know something about their samples beforehand, such as whether they are related and how similar or different they are. To evaluate what I'm doing, I want to start by getting to know my data.
Sorry, I'm new and a bit lost. Any advice on what kind of information I should look for first?
As a first approximation, the more variants that differ between two samples, the more distant those samples are from one another.
But other than that ... it is a very complicated question, and you'd be well served by getting some training. Evolutionary genomics is a very(!) complicated subject.
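To make the "more variation, more distant" idea concrete, here is a minimal Python sketch that computes a crude pairwise distance between samples from the genotype columns of a VCF. It is only an illustration, not a recommended pipeline: the function names (`alt_allele_count`, `pairwise_distances`) and the three demo records are made up for this example, it assumes GT is the first FORMAT key, it treats every non-reference allele the same way, and it simply skips sites where either sample is missing.

```python
from itertools import combinations


def alt_allele_count(sample_field):
    """Count non-reference alleles in a sample column; None if any allele is missing."""
    # Assumption: GT is the first colon-separated key in the sample column.
    gt = sample_field.split(":")[0].replace("|", "/")
    alleles = gt.split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def pairwise_distances(vcf_lines):
    """Mean absolute difference in non-reference allele count for each sample pair."""
    samples = []
    totals = {}  # (sample_a, sample_b) -> [sum of differences, number of sites]
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            totals = {pair: [0, 0] for pair in combinations(samples, 2)}
            continue
        counts = dict(zip(samples, (alt_allele_count(f) for f in fields[9:])))
        for a, b in totals:
            if counts[a] is None or counts[b] is None:
                continue  # only compare sites called in both samples
            totals[(a, b)][0] += abs(counts[a] - counts[b])
            totals[(a, b)][1] += 1
    return {pair: (d / n if n else None) for pair, (d, n) in totals.items()}


# Hypothetical demo records; a real file would be read with open("your.vcf").
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t1/1",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t./.\t1/1",
]

distances = pairwise_distances(demo)
for pair, dist in sorted(distances.items()):
    print(pair, dist)
```

Larger values mean the two samples differ at more sites (or by more allele copies per site). For a real dataset you would probably want an established tool rather than this loop, but the resulting matrix is the kind of "how related are my samples" summary that can then be clustered or plotted.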
Please be more specific. What kind of information are you looking for? Can you give some examples from previous publications?