How to choose the right reference genome (Assembly level: Scaffolds)
2.1 years ago
ben@f

Hi Biostars, I know a similar question has been asked here, but mine is a little different. I'm confused about which reference genome to pick for read alignment (assembly level: scaffolds). My goal is to map reads from some specific individuals (partridges); I have paired-end Illumina raw reads.

I downloaded the reference genome from this link, which corresponds to the one described in the paper. I also downloaded this one from NCBI Genome.

I used the grep and wc commands and got contradictory results:

wc -l JADBKV01.1.fa
7143241 JADBKV01.1.fa

wc -l GCA_019345075.1_ASM1934507v1_genomic.fna
12859261 GCA_019345075.1_ASM1934507v1_genomic.fna

grep -c "^>" GCA_019345075.1_ASM1934507v1_genomic.fna
10598

And for the one used in the paper:

grep -c "^>" JADBKV01.1.fa
26

Then I looked at each file in a text editor. JADBKV01.1.fa has 26 scaffolds, and the NCBI file has 10598 scaffolds with a mix of lowercase and uppercase bases.
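Note that wc -l counts lines, not scaffolds: FASTA sequence lines are wrapped at a fixed width that can differ between files, so line counts from two assemblies are not directly comparable, while grep -c "^>" counts records. A minimal Python sketch (the script name fasta_summary.py is hypothetical) that reports, for each file, the record count, total length, longest record, N50, and the fraction of lowercase bases, which in NCBI assemblies typically mark soft-masked repeats. seqkit stats, or the index written by samtools faidx, gives similar length information.

```python
import string
import sys

# Translation table that deletes every ASCII lowercase letter; used to count
# soft-masked (lowercase) bases without a Python-level loop over characters.
_DROP_LOWER = str.maketrans("", "", string.ascii_lowercase)


def record_lengths(lines):
    """Yield (record_name, length, lowercase_count) for each FASTA record.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    name, length, lower = None, 0, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length, lower
            # Keep only the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length, lower = 0, 0
        else:
            length += len(line)
            lower += len(line) - len(line.translate(_DROP_LOWER))
    if name is not None:
        yield name, length, lower


def n50(lengths):
    """Length L such that records of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lines):
    """Collect simple assembly statistics from FASTA lines."""
    records = list(record_lengths(lines))
    lengths = [length for _, length, _ in records]
    total = sum(lengths)
    lower = sum(low for _, _, low in records)
    return {
        "records": len(records),
        "total_bp": total,
        "longest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
        "soft_masked_fraction": lower / total if total else 0.0,
    }


if __name__ == "__main__":
    # Usage: python fasta_summary.py JADBKV01.1.fa GCA_019345075.1_ASM1934507v1_genomic.fna
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(handle))
```

If the total lengths of the two files differ a lot, they are probably not the same assembly in different packaging, whatever their scaffold counts.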

I would go with the one with 26 scaffolds because that is what the authors used in the paper, but I'm not sure it is the right one.

Any help would be appreciated. Thanks

2.1 years ago
GenoMax

It is safer to use an assembly that is available from NCBI GenBank/RefSeq, since that is going to be the most complete representation of that particular genome, unless you know that JADBKV01.1.fa is a newer version that has not yet made it to GenBank.

Can you please confirm: if no GenBank/RefSeq assembly is available for this particular species, "Alectoris rufa", could I use JADBKV01.1.fa even though it is not in GenBank? I appreciate any help you can provide.

https://www.ncbi.nlm.nih.gov/nuccore/JADBKV000000000.1 is the WGS project associated with this genome. It appears to have been submitted in Oct 2020, whereas the genome assembly page https://www.ncbi.nlm.nih.gov/data-hub/genome/GCA_019345075.1/ indicates a date of July 2021.

Why there is such a big difference in the number of scaffolds is puzzling. It is possible that NCBI built the genome assembly from the submitted raw data, whereas the JAD* file is what the authors made.
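To check whether the two files share any scaffolds at all, one option is to hash every sequence and compare the hashes. The sketch below (the script name shared_seqs.py is hypothetical) upper-cases each sequence so soft-masking is ignored, keeps the smaller MD5 digest of the sequence and its reverse complement so orientation does not matter, and lists the records of the first file whose sequence occurs exactly in the second. It only detects identical sequences: if the 26-record file was produced by joining or re-scaffolding the GenBank sequences, it will report little or no overlap, and an alignment-based comparison (for example minimap2 with an asm preset) would be needed instead.

```python
import hashlib
import sys

# IUPAC nucleotide complements (S, W and N are their own complements).
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN")


def iter_records(lines):
    """Yield (record_name, sequence) for each FASTA record in `lines`."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def canonical_digest(seq):
    """MD5 of the sequence or its reverse complement, whichever is smaller.

    Upper-casing first means soft-masked (lowercase) bases hash the same.
    """
    seq = seq.upper()
    rc = seq.translate(_COMPLEMENT)[::-1]
    return min(hashlib.md5(s.encode("ascii")).hexdigest() for s in (seq, rc))


def shared_sequences(lines_a, lines_b):
    """Names of records in A whose sequence occurs exactly (either strand) in B."""
    digests_b = {canonical_digest(seq) for _, seq in iter_records(lines_b)}
    return [name for name, seq in iter_records(lines_a)
            if canonical_digest(seq) in digests_b]


if __name__ == "__main__":
    # Usage: python shared_seqs.py JADBKV01.1.fa GCA_019345075.1_ASM1934507v1_genomic.fna
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            shared = shared_sequences(fa, fb)
        print(len(shared), "records in", sys.argv[1],
              "have an identical sequence in", sys.argv[2])
```

Each record is held in memory as one string while it is hashed, so chromosome-sized scaffolds need a few hundred MB of RAM at most.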

You may want to email the authors and ask them to clarify the large discrepancy in the number of scaffolds.
