How to select the right reference genome
2.2 years ago
rj.rezwan ▴ 10

Hi, I have Illumina paired-end (PE) sequencing data from several accessions of a plant species, and I want to map the reads to an available genome and then call variants. Two files are available for the reference genome: one contains the scaffold data and is about 1.3 GB (https://www.ncbi.nlm.nih.gov/data-hub/taxonomy/176265/), while the other contains the genome assembly and is about 339 MB (http://www.pitayagenomic.com/download.php). Which one should I use as the reference genome for mapping the reads?

genome reads mapping diploid • 1.3k views
2.2 years ago
shelkmike ★ 1.4k

339 MB is not a genome length; it's the size of a FASTA file compressed with gzip. I suppose that after you decompress it, the genome size you'll see will be approximately 1.3 Gbp.
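One way to check this without decompressing the file to disk is to sum the sequence lengths directly. This is a minimal sketch; the function name `fasta_total_length` is made up for illustration, and it assumes a standard FASTA file that is either plain text or gzip-compressed with a `.gz` extension.

```python
import gzip


def fasta_total_length(path):
    """Return the total number of bases in a FASTA file (plain or .gz)."""
    # Assumption: gzip-compressed files are recognised by their .gz suffix.
    opener = gzip.open if str(path).endswith(".gz") else open
    total = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # Header lines start with ">"; everything else is sequence.
            if not line.startswith(">"):
                total += len(line.strip())
    return total
```

Calling `fasta_total_length` on each downloaded file gives lengths in base pairs that can be compared directly, unlike the on-disk file sizes.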

These two genome assemblies were made by different laboratories and published in the same year in the same journal: https://www.nature.com/articles/s41438-021-00501-6 and https://www.nature.com/articles/s41438-021-00612-0. The assemblies have similar scaffold N50 values, but the one from http://www.pitayagenomic.com/download.php has a 19 times larger contig N50, so I suppose it's more accurate, since scaffolding of short contigs is error-prone.
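For readers who want to see how scaffold N50 and contig N50 differ, here is a small sketch. The helper names are hypothetical, and it assumes contigs are obtained by splitting each scaffold at any run of `N` characters; assembly-statistics tools often require a minimum gap length instead, so their numbers may differ.

```python
import re


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def contig_lengths(scaffold):
    """Split a scaffold at runs of N (an assumed gap definition) and return piece lengths."""
    return [len(piece) for piece in re.split(r"[Nn]+", scaffold) if piece]


# Toy sequences, not real data.
scaffolds = ["ACGTNNNNACGTAC", "ACG"]
scaffold_n50 = n50([len(s) for s in scaffolds])
contig_n50 = n50([n for s in scaffolds for n in contig_lengths(s)])
```

In the toy input, the scaffold N50 is 14 while the contig N50 is only 4: joining short contigs across gaps raises scaffold N50 without adding any sequence.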


Thank you so much for your great suggestion. Really helpful.

2.2 years ago

Note: this question should be titled "How do I select the proper reference genome?", so I have changed the title as a moderator.

The answer is that you have to assess the completeness and quality of each genome and then decide which one is more suitable for your needs. Read the publications that discuss the differences and tradeoffs.

In the next step, create a modular pipeline so you can rerun your analysis with minimal fuss with both genomes.

Now you can evaluate and characterize the anticipated differences and the observed ones.
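The modular pipeline suggested above could look roughly like the sketch below, where the reference is a parameter and output names are derived from it. The function name `build_commands`, the file names, and the output layout are all hypothetical, and the tool choice (bwa, samtools, bcftools) is an assumption rather than something the thread prescribes. It also assumes an uncompressed reference FASTA. The function only builds the shell commands; it does not run them.

```python
import shlex
from pathlib import Path


def build_commands(reference, reads_1, reads_2, out_dir, threads=4):
    """Build mapping and variant-calling commands for one reference genome."""
    # Name outputs after the reference so runs on different genomes don't collide.
    prefix = Path(out_dir) / Path(reference).name.split(".")[0]
    bam = shlex.quote(f"{prefix}.sorted.bam")
    vcf = shlex.quote(f"{prefix}.vcf.gz")
    ref, r1, r2 = (shlex.quote(str(p)) for p in (reference, reads_1, reads_2))
    return [
        f"bwa index {ref}",
        f"bwa mem -t {threads} {ref} {r1} {r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


# Same reads, two candidate references (placeholder file names).
for reference in ["genomeA.fa", "genomeB.fa"]:
    for command in build_commands(reference, "reads_1.fq.gz", "reads_2.fq.gz", "results"):
        print(command)
```

Because only the `reference` argument changes between runs, the resulting BAM and VCF files can be compared directly, for example by mapping rate or number of variants called.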


Thanks a bunch for your advice; it's helpful.
