Hi All!
I am new to DEXSeq. When looking at the counts' files, I see a lot of reads are _empty assignment. From this thread, I learned these are the reads DEXSeq doesn't consider in the actual counts and are ones that fall in the intronic region. Since I am bound to use "-r no" option (because the gene I am interested in a gene that gets aggregated with 5 neighbor genes which are undesirable for my analysis), I see read assigned to _empty .
ENSG00000288588.1:002 0
ENSG00000288588.1:003 0
ENSG00000288588.1:004 0
ENSG00000288588.1:005 0
ENSG00000288588.1:006 0
_ambiguous 26889
_ambiguous_readpair_position 0
_empty 16171092
_lowaqual 0
_notaligned 0
Using '-r no' option
ENSG00000288588.1:002 0
ENSG00000288588.1:003 0
ENSG00000288588.1:004 0
ENSG00000288588.1:005 0
ENSG00000288588.1:006 0
_ambiguous 37555
_ambiguous_readpair_position 0
_empty 20575375
_lowaqual 0
_notaligned 0
My data contains ~63 million reads that are uniquely mappable and rest ~22 million are multi-mappable reads giving ~85 million mappable reads. Loosing 16-20 million reads means losing 1/4th of the data.
Relevant Codes
python ./DEXSeq/python_scripts/dexseq_prepare_annotation.py -r no /rnaseq/gencode.v33.annotation.gtf gencode.v33.gff 2> gff.stderr
python ./DEXSeq/python_scripts/dexseq_count.py -p yes -r pos -s reverse -a 10 gencode.v33.gff $p ${name}.Readcounts.txt 2> ${name}.stderr
Alignments were performed using RSEM, GRCh38, v33 primary assembly, STAR aligner.
I used the following command