We are trying to index the UMD 3.1.1 bovine genome using the HISAT2 software.
The problem is that the hisat2-build script needs more than 200 GB of memory, and we have been unable to get enough hardware resources on the Argentine scientific computing network. We assume more than 200 GB because of this note in the HISAT2 manual:
If you use --snp, --ss, and/or --exon, hisat2-build will need about
200GB RAM for the human genome size as index building involves a graph
construction.
Do you know of any facility, preferably free of charge, where we could run the indexer? I have already written a script that automates all the steps.
You are right, the output is wrong because I did not provide the input chromosome list in the comma-delimited format that HISAT2 requires. I had done this before, but somehow uploaded old files to GitHub.
I have now re-uploaded the files in (hopefully) the correct format. I don't want to abuse your kindness, so let me know if you can still spare a few minutes.
Apparently HISAT2 cannot parse spaces after the commas in the input chromosome list? I removed the spaces and re-uploaded GCF_AC.txt and GCF_ACNW.txt. The output should be eight files named like these:
97M Mar 6 09:43 indices/AC_000159.1.fa,.1.ht2
38M Mar 6 09:43 indices/AC_000159.1.fa,.2.ht2
35K Mar 6 09:39 indices/AC_000159.1.fa,.3.ht2
38M Mar 6 09:39 indices/AC_000159.1.fa,.4.ht2
80M Mar 6 09:44 indices/AC_000159.1.fa,.5.ht2
39M Mar 6 09:44 indices/AC_000159.1.fa,.6.ht2
351K Mar 6 09:39 indices/AC_000159.1.fa,.7.ht2
72K Mar 6 09:39 indices/AC_000159.1.fa,.8.ht2
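For anyone hitting the same whitespace problem, here is a minimal Python sketch of how the chromosome list could be normalised before it is handed to hisat2-build. It only assembles and prints the command; it does not run it. The index base name `indices/bovine_umd311`, the thread count, and every FASTA name other than AC_000159.1.fa are placeholders I made up for illustration, and the helper functions are hypothetical, not part of HISAT2. The expected file names assume a regular (small) index; as far as I know, large indexes use the `.ht2l` extension instead.

```python
import shlex


def read_fasta_list(text):
    """Split a chromosome list on commas or newlines and drop stray whitespace."""
    items = text.replace("\n", ",").split(",")
    return [item.strip() for item in items if item.strip()]


def build_command(fasta_files, index_base, threads=1):
    """Assemble a hisat2-build call with a comma-separated list (no spaces)."""
    reference_in = ",".join(fasta_files)
    return ["hisat2-build", "-p", str(threads), reference_in, index_base]


def expected_index_files(index_base):
    """Names of the eight files a small index is expected to consist of."""
    return [f"{index_base}.{i}.ht2" for i in range(1, 9)]


if __name__ == "__main__":
    # Placeholder input with the problematic spaces after the commas.
    raw = "AC_000158.1.fa, AC_000159.1.fa, AC_000160.1.fa"
    fasta_files = read_fasta_list(raw)
    command = build_command(fasta_files, "indices/bovine_umd311", threads=4)
    print(shlex.join(command))
    for name in expected_index_files("indices/bovine_umd311"):
        print(name)
```

Passing the command as an argument list (for example to `subprocess.run`) rather than as one shell string would also avoid the shell splitting the list at the spaces, which is probably how a file name ending in a comma ended up as the index base.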
You could send a request to Daehwan Kim (one of the HISAT2 authors) to see if his lab can build one for you.
I wrote an email to hisat2.genomics@gmail.com some days ago but unfortunately have not received an answer yet. Maybe writing to him directly would help?
I would think so. He has his own lab now (linked above).