I am very new to RNA-Seq. I am trying to align my samples with STAR, and I am generating the genome index myself because I want to add the spike-in sequences to the GTF and FASTA files.
There are two GTF files, one with CHR in the name and one without. Which one should I use, and how do they differ? I have not been able to figure this out just by opening the files.
Thank you,
If you are not sure either, could you please let me know which one you use for your analysis?
The one with CHR seems to have more lines, and the extra lines seem to be scaffold genes. Am I missing something?
updated 7.7 years ago by Emily (24k) • written 8.1 years ago by rf (60)
They're the same; one just doesn't have the prefix. I often use the one without the 'chr' prefix, and when references need to be mixed with ones downloaded from the UCSC Genome Browser, I remove the prefix manually.
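For anyone who wants to script the manual prefix removal mentioned here, a minimal Python sketch could look like the one below. It assumes a UCSC-style GTF whose first column uses names such as `chr1`, and it assumes you want `chrM` renamed to `MT`, which is the name Ensembl uses for the human mitochondrial genome. The function names `strip_chr_prefix` and `convert_gtf` are made up for illustration, and scaffold names, which differ between UCSC and Ensembl in more than the prefix, are not handled.

```python
def strip_chr_prefix(line: str) -> str:
    """Return a GTF line with the 'chr' prefix removed from the seqname column."""
    # Header/comment lines start with '#'; pass them (and blank lines) through.
    if line.startswith("#") or not line.strip():
        return line
    # Only the first tab-separated column (the sequence name) is touched.
    seqname, sep, rest = line.partition("\t")
    if seqname == "chrM":
        seqname = "MT"  # assumption: target naming is Ensembl-style
    elif seqname.startswith("chr"):
        seqname = seqname[3:]
    return seqname + sep + rest


def convert_gtf(in_path: str, out_path: str) -> None:
    """Stream a GTF file, rewriting only the first column of each record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_chr_prefix(line))
```

Whichever naming you settle on, the sequence names in the GTF have to match those in the FASTA used to build the STAR index, including any spike-in entries you add.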
The one without 'chr' in its name also contains annotations for genes on unplaced or unlocalized contigs, while the one with 'chr' contains annotations only for the assembled chromosomes. In both files, the chromosome names themselves have no 'chr' prefix; see this example:
In my opinion, either one is fine for a normal DE analysis. However, if you do not want to lose information about any annotated gene, use the one without 'chr'. (And sorry for the late response.)
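If you would rather check the difference between the two files yourself, a rough Python sketch like the following lists which sequence names (GTF column 1) appear in only one of them. The helper names are hypothetical, and the only assumption about the files is that they are tab-separated GTFs, optionally gzip-compressed.

```python
import gzip
from collections import Counter


def open_maybe_gzip(path):
    """Open plain or gzip-compressed text, judged by the file extension."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_seqnames(lines):
    """Count annotation records per sequence name (GTF column 1)."""
    counts = Counter()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        counts[line.split("\t", 1)[0]] += 1
    return counts


def compare_gtfs(path_a, path_b):
    """Return (names only in A, names only in B) as two sorted lists."""
    with open_maybe_gzip(path_a) as fa:
        a = count_seqnames(fa)
    with open_maybe_gzip(path_b) as fb:
        b = count_seqnames(fb)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

Calling `compare_gtfs("annotation.gtf.gz", "annotation.chr.gtf.gz")` (placeholder file names) returns the names unique to each file, which should make it clear whether one file carries scaffold or contig records that the other lacks.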
"They're the same, just one doesn't have the prefix. I often use the one without the 'chr' prefix, and when references needed to be mixed with those downloaded from the UCSC genome browser, I remove the prefix manually."
I don't think this is accurate. OP was referring to the presence or absence of chr in the file name, not the contig names. Please see A: Difference between GTF file with CHR and without CHR. ENSEMBL
Thank you very much.