dexseq_count.py gives zero counts for any exon
1
0
Entering edit mode
8.8 years ago

Hi, I run dexseq_count.py on my SAM file using a GFF file generated (from ensembl Homo_sapiens.GRCh38.83.gtf) with dexseq_prepare_annotation.py

The output file contains zero counts for any exon. I also tried using a modified gff file where instead of 1,2,3,... for chromosomes I had chr1,chr2,chr3,... Can you please help me to fix this?

I used:

DEXSeq 1.6.3
HTSeq 0.6.1p1
python 2.7.6

head gff file:

1   dexseq_prepare_annotation.py    aggregate_gene  11869   14409   .   +   .   gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 11869   12009   .   +   .   transcripts "ENST00000456328"; exonic_part_number "001"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 12010   12057   .   +   .   transcripts "ENST00000456328+ENST00000450305"; exonic_part_number "002"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 12058   12178   .   +   .   transcripts "ENST00000456328"; exonic_part_number "003"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 12179   12227   .   +   .   transcripts "ENST00000456328+ENST00000450305"; exonic_part_number "004"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 12613   12697   .   +   .   transcripts "ENST00000456328+ENST00000450305"; exonic_part_number "005"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 12698   12721   .   +   .   transcripts "ENST00000456328"; exonic_part_number "006"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 12975   13052   .   +   .   transcripts "ENST00000450305"; exonic_part_number "007"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 13221   13374   .   +   .   transcripts "ENST00000456328+ENST00000450305"; exonic_part_number "008"; gene_id "ENSG00000223972"
1   dexseq_prepare_annotation.py    exonic_part 13375   13452   .   +   .   transcripts "ENST00000456328"; exonic_part_number "009"; gene_id "ENSG00000223972"

head sam file:

@HD VN:1.0  SO:coordinate
@SQ SN:chr1 LN:248956422
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309
@SQ SN:chr13    LN:114364328
@SQ SN:chr14    LN:107043718
@SQ SN:chr15    LN:101991189
@SQ SN:chr16    LN:90338345
@SQ SN:chr17    LN:83257441
@SQ SN:chr18    LN:80373285
@SQ SN:chr19    LN:58617616
@SQ SN:chr2 LN:242193529
@SQ SN:chr20    LN:64444167
@SQ SN:chr21    LN:46709983
@SQ SN:chr22    LN:50818468
@SQ SN:chr3 LN:198295559
@SQ SN:chr4 LN:190214555
@SQ SN:chr5 LN:181538259
@SQ SN:chr6 LN:170805979
@SQ SN:chr7 LN:159345973
@SQ SN:chr8 LN:145138636
@SQ SN:chr9 LN:138394717
@SQ SN:chrMT    LN:16569
@SQ SN:chrX LN:156040895
@SQ SN:chrY LN:57227415
@PG ID:TopHat   VN:2.0.11   CL:/apps/leuven/thinking/2014a/software/TopHat/2.0.11-intel-2014a/bin/tophat --b2-sensitive --no-novel-juncs --output-dir /staging/leuven/stg_00003/users/Laura/RPL5_OtherGenes_CooperativeMutations_Isabel18062015/Mapping/2355_007_merged_vs_hg20 --transcriptome-index=/staging/leuven/stg_00003/resources/human/hg20/transcriptome_Ensembl_data/hg20_Ensembl_transcriptome /staging/leuven/stg_00003/resources/human/hg20/hg20 /staging/leuven/stg_00003/users/Laura/RPL5_OtherGenes_CooperativeMutations_Isabel18062015/CleanSequences/2355_007_merged_decontam.fastq
NS500200:118:HFFF3BGXX:3:11605:13657:15043  0   chr1    16184   50  76M *   0   0   ACACCCTGCAGAGCTGGACCCCTGAGCTAGCCATGCTCTGACAGTCTCAGTTGCACACACGAGCCAGCAGAGGCGT    AA6AAEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEE<EAE/EEEEEEEEEEEEEEEEE    AS:i:-10    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:0C72G2 YT:Z:UU NH:i:1
NS500200:118:HFFF3BGXX:4:22511:16316:6491   0   chr1    17464   50  76M *   0   0   GGTCCCAGGCCTCCCGAGCCGAGCCACCCGTCACCCCCTGGCTCCTGGCCTATGTGCTGTACCTGTGTCTGATGAC    AAAAAEEEEEEEEEEEEEEEEEEEEAEEEEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE6E    AS:i:-4 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:74C1   YT:Z:UU NH:i:1
NS500200:118:HFFF3BGXX:1:22104:5581:5386    0   chr1    19195   50  76M *   0   0   TGTAGCTCCCCTACCTCCAAGAGCCCAGCCCTTGCCCACAGGGCCACACTCCACGTGCAGAGCAGCCTCAGCACTC    AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEAEEE    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UU NH:i:1
NS500200:118:HFFF3BGXX:1:21211:14990:10490  0   chr1    19721   50  76M *   0   0   CCACAATTCAGAAAGAAAAAAGAAGAGCACCATCTCCTTCCAGTGAGGAAGCGGGACCACCACCCAGCGTGTGCTC    AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEAEEEEEE<EEEEAEEAEE    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UU NH:i:1

tail output

ENSG00000283122+ENSG00000118495:055 0
ENSG00000283122+ENSG00000118495:056 0
ENSG00000283122+ENSG00000118495:057 0
ENSG00000283123:001 0
ENSG00000283125:001 0
_ambiguous  2613
_ambiguous_readpair_position    0
_empty  15124232
_lowaqual   0
_notaligned 0
dexseq dexseq_count.py exon • 3.4k views
ADD COMMENT
0
Entering edit mode

Edit: Changed the format. Can you post the command ?

ADD REPLY
0
Entering edit mode
8.8 years ago
michael.ante ★ 3.9k

Hi Laura,

The gff has no "chr" included in the chromosomes' name, whilst the mapping has it included. Just use sed -i 's/^/chr/g' my.gff in order to fix this file.

Cheers,

Michael

ADD COMMENT
0
Entering edit mode

Apparently the user already tried with "chr", as mentioned in the question.

ADD REPLY
0
Entering edit mode

I'm sorry, I haven't read that.

If it's not that easy, let me ask some questions: Is the Hg20 annotation you are aligning your reads to synchronous with the one of GRCH38.83? Is the library a stranded one? (If you are not sure, you may use RSeQC: infer_experiment.py) Have you checked your alignment visually with the IGV/IGB browser?

ADD REPLY
0
Entering edit mode

Thanks the problem was indeed the reference annotation file used. I thought it was the same I did the mapping with but it was indeed not the case. Also it was a strandedness problem: adding -s no it works fine

ADD REPLY

Login before adding your answer.

Traffic: 2017 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6