Hi Biostars, I know a similar question has been asked here, but mine is a little different. I'm confused about which reference genome to pick for alignment (assembly level: scaffolds). My goal is to map reads from some specific individuals (partridges); I have paired-end Illumina raw reads.
The thing is that I downloaded the reference genome from the link that corresponds to the one described in the paper. Then I also downloaded this one from NCBI Genome.
I used the grep and wc commands and got contradictory results:
wc -l JADBKV01.1.fa
7143241 JADBKV01.1.fa
wc -l GCA_019345075.1_ASM1934507v1_genomic.fna
12859261 GCA_019345075.1_ASM1934507v1_genomic.fna
grep -c "^>" GCA_019345075.1_ASM1934507v1_genomic.fna
10598
The one used in the paper:
grep -c "^>" JADBKV01.1.fa
26
Then I opened each file in a text editor. The first (JADBKV01.1.fa) has 26 scaffolds, and the second (GCA_019345075.1) has 10598 scaffolds, with a mix of lowercase and uppercase bases.
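Note that `wc -l` counts lines, not sequences or bases, so its result depends on how each file is line-wrapped and is not a fair comparison between the two files (`grep -c "^>"` does count sequences). A more informative check is to compute, per file, the number of sequences, total length, N50, and the fraction of lowercase bases; lowercase usually marks soft-masked repeats and does not change the sequence itself. Below is a minimal Python sketch, assuming uncompressed FASTA files; the script name `fasta_stats.py` and the functions `fasta_stats` and `n50` are hypothetical helpers, not part of any existing tool.

```python
import string
import sys
from typing import Dict, Iterable, List

# Translation table that deletes lowercase letters; used to count them quickly.
_DROP_LOWER = str.maketrans("", "", string.ascii_lowercase)


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def fasta_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Stream FASTA lines and summarise sequence count, size, N50 and masking."""
    lengths: List[int] = []
    lowercase = 0
    n_bases = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a header starts a new sequence
            continue
        if not line or not lengths:
            continue  # skip blank lines and anything before the first header
        lengths[-1] += len(line)
        lowercase += len(line) - len(line.translate(_DROP_LOWER))
        n_bases += line.count("N") + line.count("n")
    total = sum(lengths)
    return {
        "sequences": len(lengths),
        "total_bp": total,
        "n50": n50(lengths),
        "longest": max(lengths, default=0),
        "lowercase_fraction": lowercase / total if total else 0.0,
        "n_fraction": n_bases / total if total else 0.0,
    }


if __name__ == "__main__":
    # Usage: python fasta_stats.py JADBKV01.1.fa GCA_019345075.1_ASM1934507v1_genomic.fna
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, fasta_stats(handle))
```

If the two files have a similar total length but very different sequence counts and N50, they likely hold a comparable amount of sequence split into scaffolds differently; a large difference in total length would point to something else.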
I would go with the one with 26 scaffolds because that is what they used in the paper, but I'm not sure it is the right one.
Any help would be appreciated. Thanks
Could you please confirm: if no GenBank/RefSeq assembly is available for this particular species, "Alectoris rufa", could I use JADBKV01.1.fa even though it is not available in GenBank? I appreciate any help you can provide.
https://www.ncbi.nlm.nih.gov/nuccore/JADBKV000000000.1 is the WGS project associated with this genome. It appears to have been submitted in Oct 2020, whereas the genome sequence page https://www.ncbi.nlm.nih.gov/data-hub/genome/GCA_019345075.1/ indicates a date of July 2021.
Why there is such a big difference in the number of scaffolds is puzzling. It is possible that NCBI created the genome assembly from the submitted raw data, whereas the JAD* file is what the authors made.
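One quick way to see whether the two files share the same underlying sequence is to hash each sequence after uppercasing it (so soft-masking is ignored) and count how many hashes the files have in common. Because the hash is updated line by line, different line-wrapping widths do not affect it. This is a minimal Python sketch; `sequence_digests` and `shared_sequences` are hypothetical helper names. It only detects exact matches, so scaffolds that were split, merged, reverse-complemented, or gap-filled between the two versions will not be counted as shared, and identical duplicate sequences within a file collapse into one.

```python
import hashlib
import sys
from typing import Iterable, Set, Tuple


def sequence_digests(lines: Iterable[str]) -> Set[str]:
    """MD5 digest of each sequence, uppercased so soft-masking is ignored."""
    digests: Set[str] = set()
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                digests.add(current.hexdigest())
            current = hashlib.md5()  # a header starts a new sequence
        elif line and current is not None:
            # Feeding lines one at a time gives the same digest as hashing the
            # whole sequence, so line width does not matter.
            current.update(line.upper().encode("ascii"))
    if current is not None:
        digests.add(current.hexdigest())
    return digests


def shared_sequences(a: Iterable[str], b: Iterable[str]) -> Tuple[int, int, int]:
    """Return (distinct in a, distinct in b, identical in both)."""
    da, db = sequence_digests(a), sequence_digests(b)
    return len(da), len(db), len(da & db)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_fasta.py JADBKV01.1.fa GCA_019345075.1_ASM1934507v1_genomic.fna
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        print(shared_sequences(fa, fb))
```

If few or none of the 26 scaffolds appear verbatim in the other file, that is worth mentioning when contacting the authors.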
You may want to email the authors and ask for clarification about the large discrepancy in the number of scaffolds.