Hi all,
I have a quick question about getting started on my master's project. I am trying to create a reference panel of whole-genome sequences (WGS) for Equus caballus (horse) as part of my research project, and I am struggling to understand the different filter options on the Ensembl/EBI and NCBI SRA websites.
I am in the data-collection phase and wanted to know how to go about it. The plan is to:
1) find out how many whole-genome sequences are currently available for the species, 2) determine the frequency of each breed among those sequences, and 3) download all the WGS data to create the reference panel.
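To make steps 1 and 2 concrete, here is a minimal Python sketch, not a definitive recipe. It assumes the ENA Portal API search endpoint with the `read_run` result type and NCBI taxonomy ID 9796 for Equus caballus. The helper names (`build_search_url`, `parse_tsv`, `count_breeds`) are made up for illustration, and the `breed` column is an assumption: breed is usually stored as a free-text sample attribute, so it may need to be pulled from the sample records and joined in first.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"
HORSE_TAXID = 9796  # NCBI taxonomy ID for Equus caballus


def build_search_url(taxid=HORSE_TAXID):
    """Build an ENA Portal API URL listing genomic WGS runs for one taxon."""
    params = {
        "result": "read_run",
        "query": f'tax_eq({taxid}) AND library_strategy="WGS" '
                 'AND library_source="GENOMIC"',
        "fields": "run_accession,sample_accession,study_accession,"
                  "instrument_platform,library_layout,base_count,fastq_ftp",
        "format": "tsv",
        "limit": 0,  # assumption: 0 asks the API for all matching records
    }
    return ENA_SEARCH + "?" + urlencode(params)


def parse_tsv(text):
    """Turn a tab-separated report into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fetch_runs(taxid=HORSE_TAXID):
    """Download and parse the run table (needs network access)."""
    with urlopen(build_search_url(taxid)) as response:
        return parse_tsv(response.read().decode("utf-8"))


def count_breeds(rows, column="breed"):
    """Tally breeds; 'breed' is a hypothetical column you may need to add."""
    counts = Counter()
    for row in rows:
        breed = (row.get(column) or "").strip().lower() or "unknown"
        counts[breed] += 1
    return counts
```

Calling `len(fetch_runs())` would give the number of runs (step 1), and `count_breeds(rows)` the per-breed tally (step 2). One individual can have several runs, so counting distinct `sample_accession` values is probably closer to the number of genomes than counting runs.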
What filters should I be applying, or which databases should I be looking at? I am currently going around in circles.
[Update] I think I have made some sense of the ENA database and have some results, but how should I now filter these results to build a high-quality reference panel?
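One possible way to refine an ENA run table is to keep only paired-end Illumina runs and then require a minimum estimated sequencing depth per sample. The sketch below assumes each row is a dict using the ENA `read_run` field names `sample_accession`, `instrument_platform`, `library_layout` and `base_count`. The genome size of about 2.5 Gb and the 10x threshold are assumptions to adjust, and the function names are hypothetical.

```python
from collections import defaultdict

# Approximate horse genome size in base pairs; an assumption, adjust to
# the reference assembly you actually use.
GENOME_SIZE_BP = 2.5e9


def estimated_depth_per_sample(rows):
    """Sum base_count over all runs of a sample and divide by genome size."""
    total_bases = defaultdict(int)
    for row in rows:
        bases = row.get("base_count") or 0  # treat missing values as zero
        total_bases[row["sample_accession"]] += int(bases)
    return {s: b / GENOME_SIZE_BP for s, b in total_bases.items()}


def select_samples(rows, min_depth=10.0, platform="ILLUMINA", layout="PAIRED"):
    """Return sorted sample accessions passing platform, layout and depth filters."""
    usable = [
        r for r in rows
        if r.get("instrument_platform") == platform
        and r.get("library_layout") == layout
    ]
    depth = estimated_depth_per_sample(usable)
    return sorted(s for s, d in depth.items() if d >= min_depth)
```

Depth estimated from `base_count` is only a rough proxy for quality. Mapping rate, duplication and contamination can only be checked after downloading and aligning the reads, so this is best treated as a first-pass filter.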
My project has not started yet and I am doing background research, so any advice would be great, including longer-term advice :D
Thanks and kind regards!
Looks like there are 5 assemblies available at NCBI: https://www.ncbi.nlm.nih.gov/genome/browse#!/eukaryotes/145/