I made a FASTA file combining the genome annotations from two organisms (S. cerevisiae and S. pombe). I want to use this file as a reference to align my RNA-seq reads. How do I generate HISAT2 indexes for this file?
I read the hisat2 manual and looked at a few blogs online but nothing seems to work.
Where do I generate indexes? Is it possible to do it on a computer cluster like ada?
@genomax: We used S. pombe as a spike-in control while making library preps for S. cerevisiae samples. I would need a composite index file to align my reads. As I wrote before, I have read the manual multiple times and find the instructions quite vague and nonspecific. That being said, I'm still a novice trying to learn things as I go along.
@neha: Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.
Using S. pombe as a spike-in is a rather odd choice, since those two yeasts are relatively similar. You are likely to have the problem of many reads multi-mapping (mapping to both genomes). For RNA-seq data such reads are not counted by default.
To build the genome.fa file you could concatenate the chromosome sequences of both yeasts. Make sure the FASTA headers contain something to distinguish S. pombe from S. cerevisiae: both can't have chr1 in the header, so make them chr1_pombe and chr1_cere (you get the idea).
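The renaming and concatenation step could look roughly like the sketch below. The file names (cere.fa, pombe.fa), the toy sequences, and the temporary working directory are placeholders made up for illustration; substitute your real genome FASTA files.

```shell
# Toy inputs so the sketch runs on its own (hypothetical file names and sequences).
workdir=$(mktemp -d)
printf '>chrI\nACGTACGT\n>chrII\nGGCCGGCC\n' > "$workdir/cere.fa"
printf '>I\nTTAATTAA\n' > "$workdir/pombe.fa"

# Append a species suffix to the sequence ID (the first word of each header line).
sed -E 's/^>([^[:space:]]+)/>\1_cere/' "$workdir/cere.fa" > "$workdir/cere.renamed.fa"
sed -E 's/^>([^[:space:]]+)/>\1_pombe/' "$workdir/pombe.fa" > "$workdir/pombe.renamed.fa"

# Concatenate both renamed genomes into a single reference.
cat "$workdir/cere.renamed.fa" "$workdir/pombe.renamed.fa" > "$workdir/genome.fa"

# List the headers, then print any sequence ID that occurs more than once
# (the second command should print nothing).
grep '^>' "$workdir/genome.fa"
grep '^>' "$workdir/genome.fa" | cut -d' ' -f1 | sort | uniq -d
```

If you also plan to count reads against a GTF/GFF annotation, the chromosome names in that annotation would need the same suffixes so they match the renamed FASTA headers.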
Could you explain in greater detail how you used the S. pombe spike-in control? Do you know the exact transcript composition and abundances of the S. pombe spike-in? Did you add this spike-in to all your S. cerevisiae samples?
Building a HISAT2 index should be as simple as hisat2-build genome.fa index_name. You can find detailed options (if you need them) on the manual page.
That said, why are you building a composite index of two species? Does your sample have both genomes in it? Are you looking to separate the reads for the two?
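As a rough sketch of the command above: the index name cere_pombe, the thread count, and the reads file name are assumptions for illustration, and the guards only make the snippet safe to run where HISAT2 or the input files are missing. Nothing here is specific to a desktop, so the same commands can go in a job script on a cluster where HISAT2 is installed.

```shell
genome=genome.fa          # combined FASTA (assumed file name)
index=cere_pombe          # index basename, chosen for illustration
reads=reads.fastq.gz      # hypothetical single-end reads file

if command -v hisat2-build >/dev/null 2>&1 && [ -f "$genome" ]; then
    # Build the index; for a small genome this writes cere_pombe.1.ht2 ... cere_pombe.8.ht2
    # in the current directory.
    hisat2-build -p 4 "$genome" "$index"
else
    echo "hisat2-build or $genome not found; skipping index build"
fi

if command -v hisat2 >/dev/null 2>&1 && [ -f "$index.1.ht2" ] && [ -f "$reads" ]; then
    # Align against the index by passing its basename (no .ht2 extension) to -x.
    hisat2 -p 4 -x "$index" -U "$reads" -S aligned.sam
else
    echo "hisat2, the index, or $reads not found; skipping alignment"
fi
```

For paired-end data, -1 and -2 would replace -U.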
Once genome.fa is built, you can create the genome index with hisat2-build, naming the index cere_pombe. Then you would use the cere_pombe name in your alignments.