Creation of ExonCountSet in DEXSeq
11.4 years ago
alittleboy ▴ 220

Hi,

I am using DEXSeq for differential exon usage tests. In the vignette on producing an ExonCountSet object (http://bioconductor.org/packages/devel/data/experiment/vignettes/pasilla/inst/doc/create_objects.pdf), the author uses dexseq_prepare_annotation.py to convert the GTF file to a GFF file. In the GFF output (the GTF was downloaded from Ensembl), I see that the gene_id values start with "ENSG".
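For context, the GTF-to-GFF step "flattens" the annotation: overlapping exons from a gene's transcripts are cut into disjoint exonic parts, numbered 001, 002, ..., and those gene:part IDs are what the count files later have to match. Below is a minimal sketch of that splitting for a single gene on one strand. `flatten_gene` is a hypothetical helper, and it ignores details of the real script, such as how overlapping genes are handled.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive (start, end), as in GTF


def flatten_gene(transcripts: Dict[str, List[Exon]]) -> List[Tuple[int, int, Set[str]]]:
    """Cut the union of one gene's exons into disjoint exonic parts.

    Every base inside a returned part is covered by exactly the same set of
    transcripts, so a new part starts wherever any exon starts or ends.
    """
    # Coverage can only change at an exon start or just after an exon end.
    cuts = sorted({pos for exons in transcripts.values()
                   for start, end in exons for pos in (start, end + 1)})
    parts = []
    for left, right in zip(cuts, cuts[1:]):
        last = right - 1  # candidate part is [left, last]
        covering = {tid for tid, exons in transcripts.items()
                    if any(start <= left and last <= end for start, end in exons)}
        if covering:  # stretches covered by no exon are introns: skip them
            parts.append((left, last, covering))
    return parts


if __name__ == "__main__":
    # Hypothetical gene with two transcripts.
    gene = {"tx1": [(100, 200), (300, 400)], "tx2": [(150, 250)]}
    for number, (start, end, txs) in enumerate(flatten_gene(gene), start=1):
        print(f"exonic_part {number:03d}: {start}-{end} {sorted(txs)}")
```

For this toy gene the sketch yields four parts: 100-149 (tx1), 150-200 (tx1, tx2), 201-250 (tx2) and 300-400 (tx1). The intron 251-299 is skipped.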

I know that the next step is to run dexseq_count.py on the GFF and SAM files to generate counts. However, since we already have count data files (which we prefer to use), we are hoping to use our own counts (i.e., files like treated2fb.txt in the vignette example) for the analysis. The issue is that our count files contain Entrez Gene IDs, NOT Ensembl IDs, and the mapping between the two is not bijective (i.e., not one-to-one). Therefore, when I run the read.HTSeqCounts() function in R, the error message "Count files do not correspond to the flattened annotation file" appears.
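To see where an Entrez-to-Ensembl mapping breaks one-to-one-ness, you can group the ID pairs in both directions. Below is a minimal sketch, assuming you already have the mapping as a list of (Entrez ID, Ensembl ID) pairs from whatever annotation source you use. `mapping_conflicts` and the placeholder IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def mapping_conflicts(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Find where an (entrez_id, ensembl_id) mapping fails to be one-to-one.

    Returns (one_to_many, many_to_one): Entrez IDs linked to several Ensembl
    IDs, and Ensembl IDs linked to several Entrez IDs. If both are empty, the
    mapping is bijective on the IDs that appear in `pairs`.
    """
    entrez_to_ens: Dict[str, Set[str]] = defaultdict(set)
    ens_to_entrez: Dict[str, Set[str]] = defaultdict(set)
    for entrez, ens in pairs:
        entrez_to_ens[entrez].add(ens)
        ens_to_entrez[ens].add(entrez)
    one_to_many = {k: v for k, v in entrez_to_ens.items() if len(v) > 1}
    many_to_one = {k: v for k, v in ens_to_entrez.items() if len(v) > 1}
    return one_to_many, many_to_one


if __name__ == "__main__":
    # Placeholder IDs, not a real annotation mapping.
    pairs = [("E1", "ENS_A"), ("E1", "ENS_B"), ("E2", "ENS_C"),
             ("E3", "ENS_C"), ("E4", "ENS_D")]
    print(mapping_conflicts(pairs))
```

In this toy input only E4/ENS_D maps cleanly. Note that even a clean one-to-one gene mapping would not by itself make the exon-level rows line up if the counts were produced against a different exon annotation.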

Questions:

(1) Is an Ensembl GTF the only input that dexseq_prepare_annotation.py accepts? The resulting GFF file seems to contain only Ensembl gene IDs.

(2) In my case, with non-Ensembl gene IDs, how can I instruct or modify the code to generate an ExonCountSet object?
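Regarding question (1): the gene IDs in the flattened GFF appear to be copied from the GTF's `gene_id` attribute, which would explain why an Ensembl GTF yields "ENSG" IDs. One quick way to see which IDs a given GTF would contribute is to collect that attribute from its exon rows. Below is a minimal sketch under that assumption. `gtf_gene_ids` is a hypothetical helper, not part of DEXSeq.

```python
import re
from typing import Iterable, Set

# GTF column 9 holds attributes such as: gene_id "X"; transcript_id "Y";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_gene_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the gene_id values of all 'exon' rows in a GTF."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are used for exon bins
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


if __name__ == "__main__":
    import sys
    # Usage: python gtf_gene_ids.py annotation.gtf
    with open(sys.argv[1]) as handle:
        ids = gtf_gene_ids(handle)
    print(len(ids), "gene IDs, e.g.", sorted(ids)[:5])
```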

Thank you!

11.4 years ago
venks ▴ 740

Just to cross-check, see whether you are following the same steps:

python dexseq_prepare_annotation.py hg19.gtf hg19.gff

You might want to try the UCSC reference genome. Also check that you are using the right reference build, e.g., hg19.

Then try

~/some_location/samtools view ~/some_location/file.bam | python /some_location_where_dexseq_py_is/dexseq_count.py --paired=no -s no -a 10 /location/reference37.gff - "countfile.txt"
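The counting step assigns each aligned read to the exonic parts it overlaps. The sketch below only approximates that logic for single-end, unstranded reads (matching `--paired=no -s no`), with `min_aqual` standing in for `-a 10`. The bucket names `_lowaqual`, `_empty` and `_ambiguous` are modeled on the special counters dexseq_count.py writes, but the real script's rules for spliced, paired-end, stranded and partially intronic reads may differ. `count_reads` and its read representation are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Part = Tuple[str, str, int, int]          # (gene_id, part_number, start, end), 1-based inclusive
Read = Tuple[int, List[Tuple[int, int]]]  # (mapping quality, aligned blocks)


def count_reads(reads: List[Read], parts: List[Part], min_aqual: int = 10) -> Counter:
    """Count reads per exonic part, loosely mimicking dexseq_count.py.

    All reads and parts are assumed to lie on the same chromosome, and strand
    is ignored (unstranded library).
    """
    counts: Counter = Counter()
    for mapq, blocks in reads:
        if mapq < min_aqual:
            counts["_lowaqual"] += 1
            continue
        # Every exonic part touched by any aligned block of the read.
        hit = {
            (gene, num)
            for gene, num, p_start, p_end in parts
            for b_start, b_end in blocks
            if b_start <= p_end and p_start <= b_end  # closed intervals overlap
        }
        genes = {gene for gene, _ in hit}
        if not hit:
            counts["_empty"] += 1
        elif len(genes) > 1:
            counts["_ambiguous"] += 1  # read touches more than one gene
        else:
            for gene, num in hit:      # one count per overlapped part
                counts[f"{gene}:{num}"] += 1
    return counts
```

A read spanning two parts of the same gene adds one to each part here, while a read touching two genes goes to `_ambiguous` only.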

This will give you a count table.

As far as I know, you are getting the error when creating the ECS because you may have picked the wrong reference genome.
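To check a suspected annotation mismatch directly, one can list which feature IDs disagree between a count file and the flattened GFF. Below is a minimal sketch, assuming `exonic_part` rows carry `gene_id` and `exonic_part_number` attributes, count rows start with a `GENE:PART` ID (optionally quoted), and summary counters start with `_`. It is not what read.HTSeqCounts does internally. The helper names are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

GENE_RE = re.compile(r'gene_id "([^"]+)"')
PART_RE = re.compile(r'exonic_part_number "([^"]+)"')


def gff_part_ids(gff_lines: Iterable[str]) -> Set[str]:
    """GENE:PART IDs of all exonic_part rows in a flattened GFF."""
    ids: Set[str] = set()
    for line in gff_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exonic_part":
            continue  # skip aggregate_gene rows, comments, etc.
        gene, part = GENE_RE.search(fields[8]), PART_RE.search(fields[8])
        if gene and part:
            ids.add(f"{gene.group(1)}:{part.group(1)}")
    return ids


def count_file_ids(count_lines: Iterable[str]) -> Set[str]:
    """Feature IDs in a count file, ignoring summary counters like _empty."""
    ids: Set[str] = set()
    for line in count_lines:
        if not line.strip():
            continue
        feature = line.split("\t")[0].strip('"')
        if not feature.startswith("_"):
            ids.add(feature)
    return ids


def compare_ids(gff_lines: Iterable[str], count_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (only_in_counts, only_in_gff)."""
    in_gff, in_counts = gff_part_ids(gff_lines), count_file_ids(count_lines)
    return in_counts - in_gff, in_gff - in_counts
```

Counts keyed by Entrez IDs would show up entirely in the first set, which is consistent with the error message in the question.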

Good luck.


Thanks a lot! I think we prefer to use Ensembl rather than UCSC for the annotation file. Simon pointed out one solution here: http://seqanswers.com/forums/showthread.php?p=108191#post108191


Perfect! I have never used the newExonCountSet function. I hope it generated the ECS without any problems.
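The newExonCountSet route mentioned here, as I understand the older DEXSeq API, builds an ExonCountSet from a count matrix plus per-row gene and exon identifiers instead of checking counts against a flattened GFF. Below is a Python sketch of preparing those pieces from per-sample counts keyed `GENE:PART`. The function name `build_count_table` and the key layout are assumptions, and the resulting columns would still need to be passed to newExonCountSet in R. Check its documentation for the exact arguments.

```python
from typing import Dict, List, Tuple


def build_count_table(
    samples: Dict[str, Dict[str, int]],
) -> Tuple[List[str], List[str], List[str], List[List[int]]]:
    """Merge per-sample {"GENE:PART": count} dicts into one table.

    Returns (row_ids, gene_ids, exon_ids, matrix). Rows are sorted by feature
    ID, columns follow the order of `samples`. All samples must share the same
    features; a mismatch usually means they were counted on different annotations.
    """
    if not samples:
        raise ValueError("no samples given")
    feature_sets = {name: set(counts) for name, counts in samples.items()}
    reference = next(iter(feature_sets.values()))
    for name, features in feature_sets.items():
        if features != reference:
            raise ValueError(f"sample {name!r} has a different feature set")
    row_ids = sorted(reference)
    gene_ids, exon_ids = [], []
    for fid in row_ids:
        gene, _, part = fid.rpartition(":")  # split on the last colon only
        gene_ids.append(gene)
        exon_ids.append(part)
    matrix = [[samples[name][fid] for name in samples] for fid in row_ids]
    return row_ids, gene_ids, exon_ids, matrix
```

Raising on differing feature sets, rather than filling gaps with zeros, is deliberate: silently padding would hide exactly the kind of annotation mismatch discussed in this thread.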
