Hi,
I would like to run DEXSeq with the results we are getting from cufflinks. I would like to use for that the gtf file created after running cuffmerge.
The problem I have is, that in this gtf file there are some exons with no strand information.
cut -f7 cufflinksAnalysis/cuffmerge/merged.gtf | sort | uniq -c
126224 -
3105 .
124468 +
Interestingly, these genes have only one exon. None of them have more than that.
awk '{if($7=="."){print $0}}' cufflinksAnalysis/cuffmerge/merged.gtf | grep -v 'exon_number "1"' | wc -l
0
awk '{if($7=="+"){print $0}}' cufflinksAnalysis/cuffmerge/merged.gtf| grep -v 'exon_number "1"' | wc -l
104447
As these genes have only one exon, there can't be any differentially exon usage or alternative splicing events.
I know I can modify the dexseq_prepare_annotation.py
script from the DEXSeq package, so that it would allow to work with non-stranded information, but I am not sure if these is a good idea, as we are also looking into some genes which have a certain overlap on the genomic sequence and without the strand information we will loose this genes/exons.
awk '{if($7!="."){print $0}}' cufflinksAnalysis/cuffmerge/merged.gtf > cufflinks_Dmel_DEXSeq_strandInfoIncluded.gff
My question is - can I just extract the exons without the strand information (the ones with ".") from the merged.gtf file and than run it with the DEXSeq package/scripts, or will it alter the results in any way?
Thanks
Assa
I've moved this to an answer, because it nicely puts things back into perspective :)