9.0 years ago
aleka
I use the Cufflinks pipeline for RNA-seq analysis. Some genes are present in my GTF, but after I run cuffmerge they are missing from the output.
Is that normal, and if so, why?
Yes, the merging of genes seems to happen pretty frequently. If you have nearby genes they can sometimes get merged together (possibly correctly, though!).
Is there a specific reason why they get merged? Is it just because they are close to each other?
What I have is 3 different genes that share the same merged gene ID but have different transcripts.
Since cuffmerge merged the gene IDs, I expected it to have merged the transcripts as well, which would have explained why the genes were merged. But in my case the merged transcript IDs are different.
Any idea why?
It's likely that the assembled transcripts from the different genes overlap a bit, which would result in the genes being merged. I tend to see this more often when the 3' end gets extended beyond what's annotated, the genes are tail to tail, and an unstranded library is being used, though a head-to-tail gene configuration can show the same thing.
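To make the point above concrete, here is a toy sketch of how a chain of overlapping intervals collapses into a single locus. This is only an illustration of the idea, not cuffmerge's actual implementation; the function name `cluster_by_overlap` and all coordinates are made up.

```python
def cluster_by_overlap(transcripts):
    """Group (name, start, end) tuples into loci of chained overlaps."""
    loci = []
    current, current_end = [], None
    for name, start, end in sorted(transcripts, key=lambda t: t[1]):
        if current and start <= current_end:
            # Overlaps the locus being built: absorb it and extend the locus.
            current.append(name)
            current_end = max(current_end, end)
        else:
            # No overlap: close the previous locus and start a new one.
            if current:
                loci.append(current)
            current, current_end = [name], end
    if current:
        loci.append(current)
    return loci


# Hypothetical coordinates: as annotated, the two genes are 100 bp apart.
annotated = [("geneA", 100, 500), ("geneB", 600, 900)]
# Hypothetical assembly: geneA's 3' end is extended into geneB.
assembled = [("geneA", 100, 650), ("geneB", 600, 900)]

print(cluster_by_overlap(annotated))  # two separate loci
print(cluster_by_overlap(assembled))  # one merged locus
```

With the annotated coordinates the two genes stay separate; once the assembled 3' end of the first reaches into the second, both land in one locus even though each keeps its own transcript.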
Hi, thanks for your reply, it was very useful. I checked the isoforms file, and the three transcripts that share the same assembled gene ID have exactly the same position on the chromosome but different lengths, which I suppose means that they overlap somehow. Is there any way to see whether the 3' end gets extended beyond what is annotated?
You can usually see that visually with IGV.
Hi, I was wondering whether the genes are merged because their CDS coordinates overlap, or because the reads overlap and so the assembled transcripts overlap. In my case the gene coordinates overlap and some reads also overlap between the genes, so I am not sure what the merge is based on (coordinates or reads).
The CDS doesn't need to overlap for the genes to be merged.
So it is mainly based on the overlap of the reads between the genes. Is that right?
Correct
Great, thanks.
Hi. I was wondering if there is any way to find the coordinates of the transcripts that overlap. In the merged.gtf file I can only see the coordinates of the genes. I didn't find them anywhere in my files, but maybe I am missing something.
No clue, you might need to manually write something to do that.
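As a starting point for writing something by hand, here is a minimal Python sketch. It assumes the GTF contains `exon` records carrying `gene_id` and `transcript_id` attributes (which cuffmerge's merged.gtf normally does, but check your own file). The function names and the example IDs and coordinates are hypothetical. It compares whole transcript spans (first exon start to last exon end), so an overlap that falls only within an intron is still reported.

```python
import re
from collections import defaultdict
from itertools import combinations

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_spans(gtf_lines):
    """Return {(gene_id, transcript_id): (chrom, start, end, strand)} from exon records."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        key = (attrs.get("gene_id"), attrs.get("transcript_id"))
        if None in key:
            continue  # skip records without both IDs
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        if key in spans:
            # Widen the span to cover this exon as well.
            _, s, e, _ = spans[key]
            spans[key] = (chrom, min(s, start), max(e, end), strand)
        else:
            spans[key] = (chrom, start, end, strand)
    return spans


def overlapping_pairs(spans):
    """List (gene_id, tx1, tx2, chrom, overlap_start, overlap_end) within each gene."""
    by_gene = defaultdict(list)
    for (gene_id, tx_id), (chrom, start, end, _strand) in spans.items():
        by_gene[gene_id].append((tx_id, chrom, start, end))
    results = []
    for gene_id, txs in by_gene.items():
        for (t1, c1, s1, e1), (t2, c2, s2, e2) in combinations(sorted(txs), 2):
            lo, hi = max(s1, s2), min(e1, e2)
            if c1 == c2 and lo <= hi:
                results.append((gene_id, t1, t2, c1, lo, hi))
    return results


# Made-up records in the style of a merged.gtf, for illustration only.
example = [
    'chr1\tCufflinks\texon\t100\t300\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1";',
    'chr1\tCufflinks\texon\t400\t650\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2";',
    'chr1\tCufflinks\texon\t600\t900\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000002"; exon_number "1";',
    'chr1\tCufflinks\texon\t2000\t2500\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000003"; exon_number "1";',
]

for row in overlapping_pairs(transcript_spans(example)):
    print(*row, sep="\t")
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `with open("merged.gtf") as fh: spans = transcript_spans(fh)`. The reported overlap coordinates can then be checked visually in IGV.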