Hi everyone,
I am currently running a dual RNA-seq (Dual-Seq) experiment. I have received the data and now need to map the reads to the genomes. To do so, I am trying to build a combined genome containing both organisms present in the data and index it with STAR.
My current approach is to concatenate the FASTA files as well as the GTF files of the two organisms.
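As a rough illustration of the concatenation idea, here is a minimal sketch using toy files. The file names (host.fa, pathogen.fa and so on), the dualseq_demo directory and the host_/pathogen_ prefixes are all made up for the example. The prefixing step is an extra precaution I am assuming is useful: both organisms could contain a sequence with the same name (e.g. chr1), and the names in column 1 of the GTF have to match the FASTA headers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins for the two organisms (hypothetical names and contents).
mkdir -p dualseq_demo
d=dualseq_demo
printf '>chr1\nACGTACGT\n' > "$d/host.fa"
printf '>chr1\nTTGGCCAA\n' > "$d/pathogen.fa"
printf 'chr1\tsrc\texon\t1\t8\t.\t+\t.\tgene_id "h1"; transcript_id "h1.t1";\n' > "$d/host.gtf"
printf 'chr1\tsrc\texon\t1\t8\t.\t-\t.\tgene_id "p1"; transcript_id "p1.t1";\n' > "$d/pathogen.gtf"

# Prefix the FASTA headers so identical sequence names cannot collide.
sed 's/^>/>host_/' "$d/host.fa" > "$d/host.prefixed.fa"
sed 's/^>/>pathogen_/' "$d/pathogen.fa" > "$d/pathogen.prefixed.fa"

# Apply the same prefix to column 1 of each GTF; comment lines pass through unchanged.
awk 'BEGIN{FS=OFS="\t"} /^#/ {print; next} {$1="host_" $1; print}' \
    "$d/host.gtf" > "$d/host.prefixed.gtf"
awk 'BEGIN{FS=OFS="\t"} /^#/ {print; next} {$1="pathogen_" $1; print}' \
    "$d/pathogen.gtf" > "$d/pathogen.prefixed.gtf"

# Concatenate into one genome and one annotation.
cat "$d/host.prefixed.fa" "$d/pathogen.prefixed.fa" > "$d/combined.fa"
cat "$d/host.prefixed.gtf" "$d/pathogen.prefixed.gtf" > "$d/combined.gtf"

# Sanity check: list GTF sequence names that have no matching FASTA header.
grep '^>' "$d/combined.fa" | sed 's/^>//; s/[[:space:]].*//' | sort -u > "$d/fasta_names.txt"
grep -v '^#' "$d/combined.gtf" | cut -f1 | sort -u > "$d/gtf_names.txt"
comm -13 "$d/fasta_names.txt" "$d/gtf_names.txt" > "$d/missing_names.txt"
echo "GTF sequence names missing from FASTA: $(wc -l < "$d/missing_names.txt" | tr -d ' ')"
```

With real data, the printf lines would be replaced by the downloaded files; the final check is only there to catch naming mismatches before the index is built.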
With the GTF files I am already running into the following problem: for one of my organisms the official database provides only a GFF3 annotation file, while for the other it provides both GFF3 and GTF. So far I have downloaded the GFF3 files for both, converted them to GTF, and then concatenated them. I did the same for the FASTA files and ran STAR:
STAR --runMode genomeGenerate --runThreadN 15 --genomeDir /path/to/genome/file --genomeFastaFiles path/to/fasta --sjdbOverhang 100 --sjdbGTFfile path/to/gtf --limitGenomeGenerateRAM 300000000000
I am running this on a server where each job gets a fixed time allocation. However, generating the index takes so long that the job keeps failing when it hits the time limit.
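One thing that may be worth checking here, as a guess rather than a confirmed diagnosis: the STAR manual suggests scaling --genomeSAindexNbases down to min(14, log2(GenomeLength)/2 - 1) for small genomes, and --genomeChrBinNbits to min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) for genomes with many sequences. The sketch below computes both values from a FASTA file. The toy file, the dualseq_demo directory and the output file name are made up for the example; the read length of 101 is inferred from --sjdbOverhang 100, and rounding down to an integer is my own assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy combined genome (hypothetical): two sequences, 20 bases in total.
mkdir -p dualseq_demo
printf '>host_chr1\nACGTACGTACGT\n>pathogen_chr1\nTTGGCCAA\n' > dualseq_demo/toy.fa

# Count the sequences and total bases, then apply the two formulas
# from the STAR manual, rounding down (an assumption).
awk -v read_len=101 '
  /^>/ { n_refs++; next }            # header line: one more reference sequence
  { genome_len += length($0) }       # sequence line: add its bases
  END {
    sa = int(log(genome_len) / log(2) / 2 - 1)
    if (sa > 14) sa = 14
    per_ref = genome_len / n_refs
    if (per_ref < read_len) per_ref = read_len
    bin = int(log(per_ref) / log(2))
    if (bin > 18) bin = 18
    printf "genome_length=%d references=%d\n", genome_len, n_refs
    printf "--genomeSAindexNbases %d --genomeChrBinNbits %d\n", sa, bin
  }' dualseq_demo/toy.fa > dualseq_demo/star_hints.txt

cat dualseq_demo/star_hints.txt
```

For a combined genome of two reasonably large organisms both values will probably come out at their caps of 14 and 18, in which case these parameters are unlikely to be the cause of the long run time.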
I would be grateful for any advice and hope some of you have experience with a similar issue. I am happy to elaborate or provide more information to get this resolved!
Thank you in advance for any input on the matter - potentially providing a simpler solution to my problem!
I guess the double space after STAR is a typo? Also, --genomeDir should point to a directory, not a file, like so: