Question

DEXSeq with cufflinks gtf file

0

Entering edit mode

10.1 years ago

Assa Yeroslaviz ★ 1.9k

Hi,

I would like to run DEXSeq with the results we are getting from cufflinks. I would like to use for that the gtf file created after running cuffmerge.

The problem I have is, that in this gtf file there are some exons with no strand information.

cut -f7 cufflinksAnalysis/cuffmerge/merged.gtf | sort | uniq -c
 126224 -
   3105 .
 124468 +

Interestingly, these genes have only one exon. None of them have more than that.

awk '{if($7=="."){print $0}}' cufflinksAnalysis/cuffmerge/merged.gtf | grep -v 'exon_number "1"' | wc -l
0
awk '{if($7=="+"){print $0}}' cufflinksAnalysis/cuffmerge/merged.gtf| grep -v 'exon_number "1"' | wc -l
104447

As these genes have only one exon, there can't be any differentially exon usage or alternative splicing events.

I know I can modify the dexseq_prepare_annotation.py script from the DEXSeq package, so that it would allow to work with non-stranded information, but I am not sure if these is a good idea, as we are also looking into some genes which have a certain overlap on the genomic sequence and without the strand information we will loose this genes/exons.

awk '{if($7!="."){print $0}}' cufflinksAnalysis/cuffmerge/merged.gtf > cufflinks_Dmel_DEXSeq_strandInfoIncluded.gff

My question is - can I just extract the exons without the strand information (the ones with ".") from the merged.gtf file and than run it with the DEXSeq package/scripts, or will it alter the results in any way?

Thanks
Assa

cuffmerge DEXSeq cufflinks gtf • 3.3k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Assa Yeroslaviz ★ 1.9k

1

Entering edit mode

10.1 years ago

Sam ★ 4.8k

If I remember correctly, the DEXSeq flow are as follow:

Provide a GTF file and then break it down to exons units
Count the number of reads per exome (DEXSeq should be using HTSeq as the counting algorithm)
Run DEXSeq

If you extract the exons without the strain information and then merge them back together, some of the reads which will originally classified as ambiguous case might be counted twice. So that might have a slight impact to the results. However, that shouldn't really be a big problem as long as you are only trying to compare between each exons in difference conditions instead of comparing the expression of different exons. Maybe someone else have better idea as to how to deal with the data.

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by Sam ★ 4.8k

0

Entering edit mode

The problem is mostly that some of the DEXseq methods (e.g. for visualization) need a strand to work. Since single-exon genes can't show differential exon usage (by definition), it makes sense to just do what komal.rathi suggested and just exclude such genes.

As an aside, these tend to just be repeat regions anyway (at least in my experience), so for me they've always just constituted noise.

ADD REPLY • link 10.1 years ago by Devon Ryan 104k

0

Entering edit mode

And in another perspective, the multiple testing problem is already huge when working with per exone situation so it might not worth the effort to retain that many regions anyway?

ADD REPLY • link 10.1 years ago by Sam ★ 4.8k

0

Entering edit mode

thanks for all the help

ADD REPLY • link 10.1 years ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

I don't know if it's in the released version yet or not, but since DEXSeq is using DESeq2 internally now, there should be a switch at some point to doing independent filtering before producing the results (I though Alejandro Reyes mentioned that somewhere, though I'd need to look to be sure). This would remove the issue of "too many regions". Granted, on could just do this one's self...

ADD REPLY • link 10.1 years ago by Devon Ryan 104k

score 5 · Accepted Answer · 2014-11-07

5

Entering edit mode

10.1 years ago

komal.rathi ★ 4.1k

You can safely remove "any" single-exonic genes because you won't get differential exon usage for them anyway.

ADD COMMENT • link 10.1 years ago by komal.rathi ★ 4.1k

0

Entering edit mode

I've moved this to an answer, because it nicely puts things back into perspective :)

ADD REPLY • link 10.1 years ago by Devon Ryan 104k