Genes in the GTF are missing from merged.gtf after cuffmerge
9.0 years ago
aleka ▴ 110

I use the Cufflinks pipeline for RNA-Seq analysis. Some genes that are present in the GTF are missing from merged.gtf after I run cuffmerge.

Is that normal?

Why?

next-gen rna-seq alignment • 2.8k views
9.0 years ago
aleka ▴ 110

I found it. Some genes are listed in the merged.gtf file under their gene ID and others under their gene name.

However, I saw that different genes sometimes correspond to the same merged_id in the merged.gtf file. Is that normal?

Yes, the merging of genes seems to happen pretty frequently. If you have nearby genes they can sometimes get merged together (possibly correctly, though!).
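To see which annotated genes ended up under the same merged ID, something like the following rough sketch might help. It assumes that merged.gtf is tab-separated with nine columns and that its attribute column carries `gene_id` and `gene_name` in the usual `key "value";` form. `gene_name` is normally only present when a reference annotation was given to cuffmerge. The function name `gene_names_per_merged_id` is made up for this example.

```python
import re
from collections import defaultdict

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_names_per_merged_id(lines):
    """Map each merged gene_id to its gene names, keeping only IDs with 2+ names."""
    names = defaultdict(set)
    for line in lines:
        # Skip comments and blank lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene_id = attrs.get("gene_id")
        gene_name = attrs.get("gene_name")
        # Assumption: records without gene_name are novel and are ignored here
        if gene_id and gene_name:
            names[gene_id].add(gene_name)
    return {gid: sorted(found) for gid, found in names.items() if len(found) > 1}
```

You can pass it an open file handle, for example `gene_names_per_merged_id(open("merged.gtf"))`, and it returns a dictionary from merged gene ID to the sorted list of gene names found under it.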

Is there a specific reason that they merge? Is it just because they are close to each other?

I have three different genes that share the same assembled merged gene ID but have different transcripts.

I was expecting that since cuffmerge merges the gene IDs, it would merge the transcripts as well, and this could have been the reason that it merges the genes together. But in my case I get different assembled merged transcript IDs.

Any idea why?

It's likely that the assembled transcripts from the different genes overlap a bit, which would result in the genes being merged. I tend to see this more often when the 3' end is extended beyond what's annotated and the genes are tail to tail, particularly when an unstranded library is being used. A head-to-tail gene configuration can show the same thing, though.

Hi. Thanks for your reply, it was very useful. I checked the isoforms file, and the three transcripts that have the same assembled gene ID have exactly the same position on the chromosome but differ in length, which I suppose means that they overlap somehow. Is there any way to see whether the 3' end is extended beyond what is annotated?

You can usually see that visually with IGV.
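If you want a rough number to go with the IGV view, a sketch along these lines could compare each assembled transcript with its closest annotated transcript. It assumes that both files are tab-separated GTFs with `exon` records, that merged.gtf carries a `nearest_ref` attribute naming the closest reference transcript (which is normally the case only when a reference annotation was passed to cuffmerge), and that the strand of the reference transcript is the one to trust. The function names are invented for this example.

```python
import re

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def spans_by_transcript(lines):
    """Return {transcript_id: (strand, start, end, attrs)} built from exon records."""
    spans = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        if tid in spans:
            # Widen the span with every further exon of the same transcript
            strand, s, e, first_attrs = spans[tid]
            spans[tid] = (strand, min(s, start), max(e, end), first_attrs)
        else:
            spans[tid] = (cols[6], start, end, attrs)
    return spans


def three_prime_extensions(reference_lines, merged_lines):
    """Return {merged_transcript_id: (reference_transcript_id, extension_in_bp)}.

    A positive extension means the assembled transcript reaches past the
    annotated 3' end; a negative one means it stops short of it.
    """
    ref = spans_by_transcript(reference_lines)
    merged = spans_by_transcript(merged_lines)
    result = {}
    for tid, (_, m_start, m_end, attrs) in merged.items():
        ref_id = attrs.get("nearest_ref")  # assumed attribute, see note above
        if ref_id not in ref:
            continue
        r_strand, r_start, r_end, _ = ref[ref_id]
        if r_strand == "+":
            result[tid] = (ref_id, m_end - r_end)      # 3' end is the right edge
        elif r_strand == "-":
            result[tid] = (ref_id, r_start - m_start)  # 3' end is the left edge
    return result
```

Transcripts with no `nearest_ref`, or whose reference transcript has no strand, are simply skipped, so the output only covers cases where a comparison makes sense.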

Hi, I was wondering whether you know if the genes are merged because the CDS coordinates of the genes overlap, or because the reads overlap and so make the assembled transcripts overlap. In my case the gene coordinates overlap and some reads also overlap between the genes, so I am not sure what the merge is based on (coordinates or reads).

The CDS doesn't need to overlap for the genes to be merged.

So it is mainly based on the overlap of the reads between the genes. Is that right?

Correct

Great, thanks.

Hi. I was wondering if there is any way to find the coordinates of the transcripts that overlap. In the merged.gtf file there are only the coordinates of the genes. I didn't find them anywhere in my files, but maybe I am missing something.

No clue, you might need to manually write something to do that.
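As a starting point for writing something by hand, here is a minimal sketch. It assumes that merged.gtf contains tab-separated `exon` records carrying `gene_id` and `transcript_id` attributes, so that a transcript's coordinates can be taken as the smallest exon start and the largest exon end. It then reports pairs of transcripts under the same merged gene ID whose spans overlap. Note that it compares whole transcript spans, so two transcripts count as overlapping even if they only overlap within an intron. The function names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def transcript_spans(lines):
    """Return {transcript_id: (gene_id, chrom, start, end)} from exon records."""
    spans = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid, gid = attrs.get("transcript_id"), attrs.get("gene_id")
        if not tid or not gid:
            continue
        start, end = int(cols[3]), int(cols[4])
        if tid in spans:
            # Extend the span to cover this exon as well
            _, chrom, s, e = spans[tid]
            spans[tid] = (gid, chrom, min(s, start), max(e, end))
        else:
            spans[tid] = (gid, cols[0], start, end)
    return spans


def overlapping_pairs(spans):
    """List (gene_id, tid_a, tid_b, overlap_start, overlap_end) within each gene."""
    by_gene = defaultdict(list)
    for tid, (gid, chrom, start, end) in spans.items():
        by_gene[gid].append((tid, chrom, start, end))
    pairs = []
    for gid, transcripts in by_gene.items():
        for (t1, c1, s1, e1), (t2, c2, s2, e2) in combinations(sorted(transcripts), 2):
            # Closed intervals overlap when each starts before the other ends
            if c1 == c2 and s1 <= e2 and s2 <= e1:
                pairs.append((gid, t1, t2, max(s1, s2), min(e1, e2)))
    return pairs
```

Calling `overlapping_pairs(transcript_spans(open("merged.gtf")))` gives one tuple per overlapping pair, with the shared interval as the last two values. Comparing every pair within a gene is quadratic, which should be fine for the handful of transcripts a merged gene usually has.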
