Hi all,
I have paired-end (PE) Illumina reads from human breast cancer samples, and I want to map them against the human reference genome GRCh37 (downloaded from iGenomes). I run the mapping with TopHat2 without building the Bowtie2 indexes myself, because the index is already included in the downloaded genome. The downloaded genome is ~20 GB, and the data I want to map is ~5 GB (paired-end, so two files go into the TopHat2 command). I have read some posts that recommend building the indexes yourself, even when an index is included in the downloaded reference genome. So my question is: what is the difference between using the pre-made indexes and building my own?
Thanks in advance.
I think you are referring to building a "transcriptome"-specific index (one that uses only the "known" genes that you provide via an annotation file). See the section "Supplying your own transcript annotation data" on the TopHat manual page.
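To make that concrete, here is a rough sketch of the two TopHat2 invocations involved: one that builds the transcriptome index once from an annotation file, and one that reuses it for the paired-end alignment. It only assembles and prints the command lines; it does not run them. The `-G`, `--transcriptome-index`, `-o` and `-p` options are from the TopHat manual, but the helper name `tophat2_commands` and all the file paths below are placeholders I made up for illustration, so adjust them to wherever your iGenomes download and reads actually live.

```python
import shlex


def tophat2_commands(genome_index, gtf, transcriptome_index,
                     reads_1, reads_2, out_dir, threads=4):
    """Return (build_cmd, align_cmd) as argument lists for TopHat2.

    Hypothetical helper: it only builds the command lines, nothing is executed.
    """
    # Step 1: build the transcriptome index once from the annotation (no reads given).
    build_cmd = [
        "tophat2",
        "-G", gtf,
        f"--transcriptome-index={transcriptome_index}",
        genome_index,
    ]
    # Step 2: align the paired-end reads, reusing the transcriptome index from step 1.
    align_cmd = [
        "tophat2",
        "-o", out_dir,
        "-p", str(threads),
        f"--transcriptome-index={transcriptome_index}",
        genome_index,
        reads_1,
        reads_2,
    ]
    return build_cmd, align_cmd


# All paths here are assumed placeholders, not taken from the question.
build_cmd, align_cmd = tophat2_commands(
    genome_index="GRCh37/Sequence/Bowtie2Index/genome",  # pre-made Bowtie2 index prefix
    gtf="GRCh37/Annotation/Genes/genes.gtf",
    transcriptome_index="transcriptome_data/known",
    reads_1="sample_1.fastq",
    reads_2="sample_2.fastq",
    out_dir="tophat_out",
)
print(" ".join(shlex.quote(arg) for arg in build_cmd))
print(" ".join(shlex.quote(arg) for arg in align_cmd))
```

Note that both commands point at the same pre-made genome index prefix; only the transcriptome index is something you build yourself here.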