My task is to perform RNAseq analysis on the UCSC RN4 genome by first mapping the reads I have to the transcriptome and then mapping the leftover reads to the genome.
I have downloaded the genome from iGenomes (http://support.illumina.com/sequencing/sequencing_software/igenome.html) because it seems to have all the files in the format required for TopHat/Bowtie.
My first problem is adding the ERCC sequence file (http://www3.appliedbiosystems.com/cms/groups/mcb_support/documents/generaldocuments/cms_095047.txt). I really don't know which files these sequences must be added to. I know they must be added to both the reference transcriptome and the genome, but I don't know where in the iGenomes download to find the appropriate files.
Advice would be much appreciated.
Thanks.
Thanks for your reply, Genomax. Here is what I have done:
Using the ERCC files with the iGenomes download:
Is this sufficient? How do I "rebuild all of the index files" as you suggest? Is this still needed?
Regards.
Rebuilding aligner indexes is needed since the pre-existing indexes do not have information about the additional sequences you added to the genome file.
Please do all these steps (move the appended genome and GTF file) in a separate directory. This way there would be less chance of using files from the original iGenomes download by mistake.
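As a rough illustration of the "append in a separate directory" step, here is a minimal Python sketch. The helper name append_files and every file name in the usage note are hypothetical, not taken from the iGenomes download or from the ERCC files. It simply concatenates the inputs into a new file in a working directory and leaves the originals untouched.

```python
from pathlib import Path


def append_files(sources, destination, chunk_size=1 << 20):
    """Concatenate files into `destination` without modifying the originals.

    A newline is added after any source that does not end with one, so the
    first record of the next file (e.g. an ERCC FASTA header) always starts
    on its own line.
    """
    destination = Path(destination)
    # Create the separate working directory if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)
    with destination.open("wb") as out:
        for source in sources:
            last_byte = b"\n"
            with Path(source).open("rb") as handle:
                # Stream in chunks so a multi-gigabyte genome is never
                # loaded into memory all at once.
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
    return destination
```

With assumed file names, the calls would look like append_files(["genome.fa", "ERCC92.fa"], "rn4_ercc/genome_ercc.fa") for the genome and append_files(["genes.gtf", "ERCC92.gtf"], "rn4_ercc/genes_ercc.gtf") for the annotation. The sequence names in the ERCC GTF need to match the headers in the ERCC FASTA.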
Are you planning to use TopHat (you should consider using HISAT2 instead)? If yes then you would use the bowtie2-build program from bowtie v.2.x (you will need to download it, if you don't have it). Details are in the TopHat manual.
Excellent...I have that....thank you so much. I just started the build programme. I'm guessing this is going to take a few hours. This is part of a University assignment so I have to use TopHat for now.
Thanks again
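For anyone following the thread later, the index rebuild discussed above might be launched along these lines. This is only a sketch: the paths are hypothetical, and the only thing assumed about bowtie2-build is its basic usage, a reference FASTA followed by the output index prefix. By default the helper just prints the command; passing dry_run=False actually runs it and requires bowtie2-build to be on the PATH.

```python
import shlex
import subprocess


def bowtie2_build_command(fasta, index_prefix):
    """Return the argument list for: bowtie2-build <reference_in> <bt2_base>."""
    return ["bowtie2-build", str(fasta), str(index_prefix)]


def run(command, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(command))
    if not dry_run:
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(command, check=True)


# Hypothetical paths inside the separate working directory.
command = bowtie2_build_command("rn4_ercc/genome_ercc.fa", "rn4_ercc/genome_ercc")
run(command)  # dry run: prints the command without executing it
```

Giving the index the same prefix as the FASTA file (genome_ercc here) keeps the sequence and its index side by side in one directory, which makes it harder to pick up the original iGenomes index by mistake.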