Hi, I have Illumina PE sequencing data from several accessions of a particular plant species, and I want to map the reads to the available genome and then call variants. Two files are available for the reference genome: one contains the scaffold data (1.3 GB) (https://www.ncbi.nlm.nih.gov/data-hub/taxonomy/176265/), while the other contains the genome assembly (~339 MB) (http://www.pitayagenomic.com/download.php). Which one should I use as the reference genome for mapping the reads?
339 MB is not a genome length; it's the size of a gzip-compressed FASTA file. I suspect that after you decompress it, you'll see a genome size of approximately 1.3 Gbp.
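To check this yourself, count the bases rather than looking at the file size. A minimal Python sketch, assuming the download is a plain or gzipped FASTA (the file name in the usage line is hypothetical):

```python
import gzip
import sys


def fasta_stats(path):
    """Return (number of sequences, total bases) for a plain or gzipped FASTA."""
    # Choose the opener from the extension; "rt" gives text lines either way.
    opener = gzip.open if path.endswith(".gz") else open
    n_seqs = 0
    total_bp = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                n_seqs += 1
            else:
                # Count sequence characters only, not the trailing newline.
                total_bp += len(line.strip())
    return n_seqs, total_bp


if __name__ == "__main__":
    # Hypothetical usage: python fasta_stats.py genome.fa.gz
    seqs, bp = fasta_stats(sys.argv[1])
    print(f"{seqs} sequences, {bp:,} bp ({bp / 1e9:.2f} Gbp)")
```

Running it on both downloads tells you whether they describe genomes of similar length, independent of how each file is compressed.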
Note: this question should be titled "How do I select the proper reference genome?" (so, as a moderator, I have changed the title).
The answer to that is: assess the completeness and quality of each genome, then decide which one is more suitable for your needs. Read up on publications that discuss the differences and tradeoffs.
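Gene-space completeness is usually measured with a dedicated tool such as BUSCO, but contiguity is easy to compare yourself. The sketch below reports sequence count, total length and N50 for each FASTA given on the command line. It is one quick proxy for assembly quality, not a full assessment, and the file names in the usage line are hypothetical:

```python
import gzip
import sys


def contig_lengths(path):
    """Yield the length of each sequence in a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    length = None
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # A new header closes the previous sequence.
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length


def n50(lengths):
    """Largest length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical usage: python n50.py assembly_A.fa.gz assembly_B.fa.gz
    for path in sys.argv[1:]:
        lengths = list(contig_lengths(path))
        print(f"{path}: {len(lengths)} sequences, "
              f"{sum(lengths):,} bp total, N50 = {n50(lengths):,} bp")
```

A higher N50 with a similar total length generally means a more contiguous assembly, which matters for mapping because reads near scaffold ends are harder to place.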
Next, create a modular pipeline so you can rerun your analysis against both genomes with minimal fuss.
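One way to keep the reference swappable is to make it a single parameter of the whole workflow. A minimal Python sketch, assuming BWA-MEM for mapping, samtools for sorting/indexing and bcftools for calling, with uncompressed FASTA references. The tool choice, file names and directory layout are all assumptions for illustration, and it only prints the commands unless `dry_run=False`:

```python
import shlex
import subprocess
from pathlib import Path


def index_commands(reference):
    """Commands to index one reference (run once per genome, not per sample)."""
    ref = shlex.quote(str(reference))
    return [f"bwa index {ref}", f"samtools faidx {ref}"]


def sample_commands(reference, read1, read2, sample, outdir, threads=4):
    """Commands to map and call one sample against one reference.

    Outputs go to <outdir>/<reference stem>/ so runs against different
    genomes never overwrite each other.
    """
    q = shlex.quote
    ref = Path(reference)
    out = Path(outdir) / ref.stem
    bam = out / f"{sample}.sorted.bam"
    vcf = out / f"{sample}.vcf.gz"
    return [
        f"mkdir -p {q(str(out))}",
        # Map paired-end reads and coordinate-sort the alignments.
        f"bwa mem -t {threads} {q(str(ref))} {q(read1)} {q(read2)} "
        f"| samtools sort -o {q(str(bam))} -",
        f"samtools index {q(str(bam))}",
        # Pile up and call variant sites only (-v) with the multiallelic caller (-m).
        f"bcftools mpileup -f {q(str(ref))} {q(str(bam))} "
        f"| bcftools call -mv -Oz -o {q(str(vcf))}",
    ]


def run(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(cmd)
        else:
            # pipefail makes a failure on either side of a pipe stop the run.
            subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)


if __name__ == "__main__":
    # Hypothetical references and samples; substitute your own paths.
    references = ["refs/genome_A.fa", "refs/genome_B.fa"]
    samples = {"acc1": ("acc1_R1.fq.gz", "acc1_R2.fq.gz")}
    for reference in references:
        run(index_commands(reference))
        for name, (r1, r2) in samples.items():
            run(sample_commands(reference, r1, r2, name, "results"))
```

In practice a workflow manager (Snakemake, Nextflow) does the same job with proper dependency tracking; the point here is only that the reference path is the one thing that changes between runs.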
Then you can evaluate and characterize both the differences you anticipated and the ones you actually observe.
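Because variant coordinates from two different assemblies are not directly comparable, a simple first comparison is a coordinate-independent metric such as the fraction of mapped reads. A minimal sketch, assuming `samtools` is on the PATH and that its `flagstat` output contains the usual "in total" and "mapped (" lines; the BAM paths in the usage line are hypothetical:

```python
import re
import subprocess

# "N + M in total (...)" and "N + M mapped (...)"; the "primary mapped"
# line in newer samtools versions does not match the second pattern.
TOTAL_RE = re.compile(r"^(\d+) \+ (\d+) in total")
MAPPED_RE = re.compile(r"^(\d+) \+ (\d+) mapped \(")


def parse_flagstat(text):
    """Return (QC-passed total, QC-passed mapped) from samtools flagstat text."""
    total = mapped = None
    for line in text.splitlines():
        m_total = TOTAL_RE.match(line)
        m_mapped = MAPPED_RE.match(line)
        if total is None and m_total:
            total = int(m_total.group(1))
        elif mapped is None and m_mapped:
            mapped = int(m_mapped.group(1))
    if total is None or mapped is None:
        raise ValueError("unrecognised flagstat output")
    return total, mapped


def mapping_rate(bam):
    """Run samtools flagstat on one BAM and return the mapped fraction."""
    result = subprocess.run(["samtools", "flagstat", bam],
                            capture_output=True, text=True, check=True)
    total, mapped = parse_flagstat(result.stdout)
    return mapped / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical BAMs for the same sample mapped to each candidate genome.
    bams = {"genome_A": "results/genome_A/acc1.sorted.bam",
            "genome_B": "results/genome_B/acc1.sorted.bam"}
    for label, bam in bams.items():
        print(f"{label}: {100 * mapping_rate(bam):.2f}% mapped")
```

Mapping rate is only one signal; properly paired rate, coverage evenness and the number of variant calls per accession are worth comparing the same way.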
Thank you so much for your great suggestion. Really helpful!