I am looking for novel genes in two different experimental groups (differential expression analysis).
When do I include the following information in my command?
Cufflinks: reference annotation (GTF)
Cuffmerge: reference annotation (GTF) and reference sequence (FASTA)
Cuffdiff: reference annotation (GTF)
Thanks for the help.
Is it necessary to include the gene annotation in HISAT2 (I am not using TopHat) in addition to the indexed reference genome? What are the advantages of doing this?
In tophat, it helped me avoid generating multiple XLOC IDs for a single gene ID, so I use a GTF with tophat. That said, this happens rarely, and only a handful of genes get multiple XLOC IDs. I don't see any disadvantage. I don't know about HISAT2, but if HISAT2 allows a reference GTF, I would use it.
Okay, thank you. What is the benefit of including the reference GTF in Cuffdiff as well?
A GTF file is essential for cuffdiff, as it acts as the template: the transcripts listed in the GTF file are quantified using the aligned reads from the BAM files.
Can it be the merged GTF created by cuffmerge, or does the reference have to be included as well?
Use either merged.gtf or the reference GTF, not both.
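To tie the answers together, here is a rough sketch of where the reference GTF and FASTA would go at each step when the goal is novel genes. All file and sample names (genes.gtf, genome.fa, ctrl1.bam and so on) are placeholders I made up, and the script only prints the commands unless DRY_RUN=0 is set, so treat it as an outline under those assumptions rather than a tested pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: every file and sample name here is a placeholder.
REF_GTF=genes.gtf        # reference annotation
REF_FA=genome.fa         # reference sequence
DRY_RUN=${DRY_RUN:-1}    # 1 = print the commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

pipeline() {
  local s
  # 1. Cufflinks, once per sample. -g uses the reference as a guide but still
  #    assembles novel transcripts; -G would quantify known transcripts only.
  for s in ctrl1 ctrl2 treat1 treat2; do
    run cufflinks -g "$REF_GTF" -o "cl_$s" "$s.bam"
  done

  # 2. Cuffmerge: reference GTF (-g) and reference FASTA (-s) go in here.
  #    assemblies.txt lists one transcripts.gtf path per line.
  if [ "$DRY_RUN" != 1 ]; then
    printf 'cl_%s/transcripts.gtf\n' ctrl1 ctrl2 treat1 treat2 > assemblies.txt
  fi
  run cuffmerge -g "$REF_GTF" -s "$REF_FA" -o merged_asm assemblies.txt

  # 3. Cuffdiff: pass merged.gtf on its own, not together with the reference.
  run cuffdiff -o diff_out -L ctrl,treat merged_asm/merged.gtf \
    ctrl1.bam,ctrl2.bam treat1.bam,treat2.bam
}

pipeline
```

Because cuffmerge was given the reference with -g, merged.gtf should already carry the known transcripts alongside the novel ones, which is why cuffdiff gets only that one file. If you only care about annotated genes, you could pass the reference GTF to cuffdiff instead and skip the assembly steps.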