Due to low read coverage, a repetitive sequence, or an assembly gap, a single gene was split into two genes that were annotated separately in the genome. As a result, the error carried through to the transcriptome.
The transcriptome shows these two different "genes" have wildly different expression profiles. What could explain this – considering that we're now sure they are one gene?
I have experimentally confirmed that they are one gene by sequencing cDNA. I now need to modify the .fasta files of the genome and transcriptome to remove the two split "genes" and add the combined, correct gene. Is this first part as easy as deleting one and pasting the correct sequence over the other in a text editor...?
Any special considerations to make sure the corrected transcript works properly as a Cuffdiff reference?
Thanks
If you run TopHat and Cufflinks without any guide GTF file, do you see the reads assembling into one gene?
The reason behind the differential expression of these two regions could be alternative promoter usage or alternative polyadenylation, depending on the orientation of these regions on the gene.
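On the FASTA question: a text editor will work, but a small script is less likely to leave a stray header or a broken line behind. Below is a minimal sketch for the transcriptome FASTA, assuming each split "gene" is its own record and that the record ID is the first word of the header. The function names, the toy IDs (`geneA`, `geneB`, `geneAB`) and the toy sequences are placeholders I made up for illustration, not anything from your files. It does not touch the genome FASTA, where the two models are regions inside a scaffold rather than separate records. Also keep in mind that Cuffdiff reads its gene models from a GTF, so the two old entries would need to be merged there as well.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_split_records(records, old_ids, new_header, new_seq):
    """Drop records whose ID is in old_ids; put the merged record
    at the position of the first one that was dropped."""
    old_ids = set(old_ids)
    out, inserted = [], False
    for header, seq in records:
        fields = header.split()
        rec_id = fields[0] if fields else ""  # assumption: ID = first word
        if rec_id in old_ids:
            if not inserted:
                out.append((new_header, new_seq))
                inserted = True
            continue
        out.append((header, seq))
    if not inserted:
        raise ValueError("none of the old IDs were found")
    return out


def write_fasta(records, width=60):
    """Serialise records back to FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy input; replace with the contents of the real transcriptome file.
    toy = ">geneA part1\nATGGCC\n>geneB part2\nTTTTAA\n>geneC\nGGGG\n"
    fixed = merge_split_records(
        read_fasta(toy), ["geneA", "geneB"], "geneAB merged", "ATGGCCTTTTAA"
    )
    print(write_fasta(fixed), end="")
```

Writing the result to a new file and keeping the original untouched makes it easy to diff the two and confirm that only the intended records changed.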