Dear Group,
I have some additional contigs along with the chromosomes in my reference genome file. These contigs are parts of the chromosomes, and their genomic locations are known. They are kept in the reference file by an algorithm (I won't go into the details). My reference file looks something like this: >Chr1 >Chr2 >Chr3 >Chr1:1000 >Chr1:2000 (where >Chr1:1000 and >Chr1:2000 are the additional contigs, and 1000 and 2000 are the locations of these contigs on Chr1).
In my GTF file I have information only for Chr1, Chr2 and Chr3. My question is how TopHat will treat these contigs. When mapping reads onto these contigs, will TopHat pick up the information from the GTF file and treat each contig as Chr1 starting from bp 1, or will the contigs be treated as separate chromosomes, with the mapping done as if no gene model information were available? Please help.
Thanks, Ritu
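For anyone who wants to check a setup like the one described above, here is a minimal Python sketch that lists which sequence names in a FASTA file have no records in the GTF. It compares names by exact string match against the first GTF column (the seqname field). The function names and the sample data are hypothetical, and the sketch says nothing about how TopHat itself resolves names.

```python
def fasta_names(lines):
    """Return sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def gtf_seqnames(lines):
    """Return the set of seqnames (column 1) found in GTF records."""
    names = set()
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t")[0])
    return names


def unannotated(fasta_lines, gtf_lines):
    """Return FASTA sequence names that never appear as a GTF seqname."""
    annotated = gtf_seqnames(gtf_lines)
    return [name for name in fasta_names(fasta_lines) if name not in annotated]


# Hypothetical sample data mirroring the layout described in the post.
fasta = [">Chr1", "ACGT", ">Chr2", "ACGT", ">Chr3", "ACGT",
         ">Chr1:1000", "ACGT", ">Chr1:2000", "ACGT"]
gtf = [
    'Chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'Chr2\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
    'Chr3\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g3"; transcript_id "t3";',
]

missing = unannotated(fasta, gtf)
print(missing)
```

With real files, the same functions can be fed open file handles instead of lists, since both iterate line by line.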
Can you make it clearer?
Hi ashutoshmits, In my file, >Chr1:1000 is a contig that is 200 bp long, so it spans 1000-1200 on Chr1. In the GTF file I have information only for Chr1, Chr2 and Chr3. My question is how TopHat will treat this contig. When mapping reads onto this contig, will TopHat pick up the information from the GTF file and treat it as Chr1 starting from bp 1? Does that make my question clear?
Thanks for the help!
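The coordinate arithmetic in the example (a 200 bp contig named Chr1:1000 spanning 1000-1200 on Chr1) can be sketched in Python as follows. This assumes the naming convention "chromosome:start" from the post and treats positions within the contig as 0-based offsets from that start; both helper names are hypothetical, and the sketch only illustrates the translation, not anything TopHat does.

```python
def contig_span(name, length):
    """Return (chromosome, start, end) for a contig named 'chromosome:start'.

    Assumes the number after the last ':' is the contig's start on the
    chromosome, and follows the post's convention of end = start + length.
    """
    chrom, start = name.rsplit(":", 1)
    start = int(start)
    return chrom, start, start + length


def to_chromosome_position(name, offset):
    """Translate a 0-based offset within the contig to a chromosome position."""
    chrom, start = name.rsplit(":", 1)
    return chrom, int(start) + offset


span = contig_span("Chr1:1000", 200)
print(span)

pos = to_chromosome_position("Chr1:1000", 50)
print(pos)
```

A translation like this would be needed to compare alignments on the contig against Chr1 annotation, since the contig's own coordinates start from its first base.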
I agree with dpryan79: if "...these contigs are parts of the chromosomes and the genomic locations of these additional contigs are known...", why do you have them in the first place?