Hi everyone,
I'm having some trouble indexing the reference genome (GRCh38) with 'bowtie2-build':
bowtie2-build ReferenceGenome GRCh38_index --large-index
The index files I've built by running this command are much smaller than the GRCh38 index files that can be downloaded from the Bowtie2 website. How is that possible? What is wrong?
Thanks in advance
Why are you not using the pre-built bowtie2 index? Is there a specific reason?
bowtie2 large index option
About the genome size and index size: what do you mean by "much smaller"? There should be 6 files for the index. Do you mean that all of the files are smaller?
I would suggest re-checking the reference genome file to see whether it is truncated. Another thing you can do is run the alignment with both indexes and compare the results from your index with those from the pre-built bowtie2 index.
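One way to do the truncation check, as a minimal sketch: `gzip -t` tests the integrity of a compressed file, and counting headers and bases shows whether the FASTA holds roughly what you expect. The function name `fasta_summary` and the file name in the usage comment are hypothetical, not something from this thread.

```shell
# Hypothetical helper: check a FASTA file (plain or gzipped) and report
# how many sequences and bases it contains.
fasta_summary() {
    fasta="$1"
    reader="cat"
    case "$fasta" in
        *.gz)
            # gzip -t verifies the compressed stream without writing output
            gzip -t "$fasta" || { echo "corrupt or truncated gzip: $fasta" >&2; return 1; }
            reader="gzip -dc"
            ;;
    esac
    # Header lines start with '>'; every other line contributes bases
    $reader "$fasta" | awk '
        /^>/ { n++; next }
        { bases += length($0) }
        END { printf "%d sequences, %.0f bases\n", n, bases }'
}

# Example usage (file name is an assumption):
#   fasta_summary reference.fa.gz
```

For a human genome build, a base count far from about 3 billion on the primary chromosomes, or an unexpectedly large number of sequences, would point to a problem with the input file rather than with bowtie2-build.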
I have to use an index built with bowtie2-build for research reasons.
The 6 files that I've built are all smaller than those of the pre-built bowtie2 index.
The sizes of my files are, respectively: 60.8 MB, 78.6 MB, 18.1 kB, 39.3 MB, 60.8 MB, 78.6 MB. The sizes of the pre-built bowtie2 index files are, respectively: 982.5 MB, 733.7 MB, 10.9 kB, 733.7 MB, 982.5 MB, 733.7 MB.
The reference genome file seems to be OK (it is not truncated).
When I perform the alignment with the pre-built index, the overall alignment rate is very high (98%), while with the index I've built it is very low (35%). What could the problem be?
Thank you for your reply.
I think there is a problem with your reference genome. Can you compare the chromosome sizes of the two references (the bowtie2 index that you built and the pre-built index obtained from the website)? You can obtain this information from the header of the alignment file (Link).
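For the header comparison, each `@SQ` line in a SAM header carries the sequence name (`SN`) and its length (`LN`). A rough sketch of pulling those out so the two runs can be diffed; the function name `sq_lengths` and the SAM file names in the usage comment are hypothetical.

```shell
# Print "name<TAB>length" for every @SQ line of a SAM header read from stdin.
sq_lengths() {
    awk -F '\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)   # sequence name
            if ($i ~ /^LN:/) len  = substr($i, 4)   # sequence length
        }
        print name "\t" len
    }'
}

# Example usage (file names are assumptions):
#   grep '^@SQ' my_index_run.sam  | sq_lengths | sort > mine.sizes
#   grep '^@SQ' prebuilt_run.sam  | sq_lengths | sort > prebuilt.sizes
#   diff mine.sizes prebuilt.sizes
```

If the two lists differ in the number of sequences or in their lengths, the indexes were built from different inputs, which would fit the gap in alignment rate.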
Where did you download the reference genome from, and what is the file size?
I downloaded 'Homo_sapiens.GRCh38.dna.alt.fa.gz' from Ensembl and the file size is 56.8 GB. Where should I download the reference genome from?
When I try to run bowtie2-inspect on the index I've built, I get the following error: Could not locate a Bowtie index corresponding to basename "GrCh38_index".
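One thing that may be worth checking here: the build command at the top of the thread used the basename `GRCh38_index`, while the error message shows `GrCh38_index`, and file names are case-sensitive on most Linux systems. Indexes built with `--large-index` also use the `.bt2l` extension rather than `.bt2`. A small sketch for listing which index files actually exist for a given basename; the function name `list_bt2_index` is hypothetical.

```shell
# Hypothetical helper: list the Bowtie 2 index files present for a basename,
# covering both small (.bt2) and large (.bt2l) indexes.
list_bt2_index() {
    base="$1"
    found=0
    for f in "$base".*.bt2 "$base".*.bt2l; do
        # An unmatched glob stays literal, so keep only paths that exist
        if [ -e "$f" ]; then
            echo "$f"
            found=1
        fi
    done
    [ "$found" -eq 1 ] || { echo "no index files for basename: $base" >&2; return 1; }
}

# Example usage, with the basename from the build command in this thread:
#   list_bt2_index GRCh38_index
```

If this lists the six files for `GRCh38_index`, passing that exact basename to bowtie2-inspect would be the next thing to try.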