Using DEXSeq I would like to identify the differential exon usage in my samples. I did my alignment on GRCh38 with fasta and .gtf file from Ensembl (gencode v29). The Alignment was done by STAR (not done by me). I use dexseq_count.py to quantify the reads on every exon in my .bam files, however i have 0 counts for each read. I tried to run the script with various options for paired end data and strandedness however i receive the same outcome. I believe that the alignment/mapping was done correct since it was previously used in another analysis with rMATS.
output after running python3.8 dexseq_count.py -p yes -r pos -s no -f bam gencodev29_DEXSeq.gff RNA1.Aligned.sortedByCoord.out.bam RNA1.out.txt
:
""ENSG00000000003.14"":"001" 0
""ENSG00000000003.14"":"002" 0
""ENSG00000000003.14"":"003" 0
""ENSG00000000003.14"":"004" 0
""ENSG00000000003.14"":"005" 0
""ENSG00000000003.14"":"006" 0
""ENSG00000000003.14"":"007" 0
""ENSG00000000003.14"":"008" 0
""ENSG00000000003.14"":"009" 0
""ENSG00000000003.14"":"010" 0
not sure if it has something to do with sample file but ill post a example of the bam file output.
samtools view RNA1.Aligned.sortedByCoord.out.bam | head -n 5 :
HWI-1KL111:59:C1VWTACXX:4:2115:16063:46921 419 1 11634 1 74M = 11692 132 ACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCT @@BFADFDDFFBDGHHGBAAFGHBEG><A<FHHIGGGIB?<FH@HAFDFGC>BFFHIIEGGIC4@;@C=CGBC@ NH:i:4 HI:i:3 AS:i:146 nM:i:0
HWI-1KL111:59:C1VWTACXX:4:1307:11846:59737 419 1 11641 1 74M = 11890 322 GTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCT CBBFFFFFHGHHHIJJJJJJJJJIJIJJIJJIHIFIJJJJJJJHIAHIJJIGIIIIFHIJJJJJJJJJIJHHFF NH:i:4 HI:i:3 AS:i:145 nM:i:0
HWI-1KL111:59:C1VWTACXX:4:2108:3156:75335 355 1 11675 1 73M = 11873 272 AAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATG @<?DBDDD?;B+<<EFI<?EHCC?AEHHB<@3;EFEFBDFCFGICFFD9*/88C<C;AA?=?ADBDDDDCCAA NH:i:4 HI:i:3 AS:i:145 nM:i:0
HWI-1KL111:59:C1VWTACXX:4:2310:6416:3302 419 1 11681 1 74M = 11792 185 TGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGC C@CFFDFFHHHHHJJIEHJJJJJJIJIIIJJJJIIJJJIIJIH?FGIIJJFHIJJJJJJIHHHHGGFFFFFFFE NH:i:4 HI:i:2 AS:i:146 nM:i:0
HWI-1KL111:59:C1VWTACXX:4:2115:16063:46921 339 1 11692 1 74M = 11634 -132 ATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGG CCC@CCCCC??BB@AHECDAIIHHBDGIGGB<CEEGEHDD>IIIHF<;A2EHEEIIHGCBC?BHFHDDDDD@@@ NH:i:4 HI:i:3 AS:i:146 nM:i:0
Does anyone know what could cause this error? All tips are welcome!
Not very helpful, but I'd recommend running Salmon for gene counts with RNA-seq data rather than Star. STAR is a splice aware mapper but doesn't give you counts. As a bonus, there are a ton of resources out there to bring Salmon results into R for DEXseq analysis.
You can use Salmon together with DEXSeq for differential transcript analysis, but not differential exon analysis, which was the original intended use of DEXSeq. If you want to do differential exon analysis, you need to align the reads to the genome using a splice aware mapper, like STAR or HiSat, and count reads aligning to each exon with something like featureCounts or HTSeq (which is what dexseq_count uses) like the OP is doing.
Thanks for the input. Unfortunately i can't change much about the mapper being STAR. Looking at previous post/questions etc I should be possible to get the exon counts using DEXSeq even-though i used STAR as my mapper.
STAR can be configured to give gene counts. RSEM can be run on STAR output if it was configured to output a transcriptome bam. But it looks like OP doesn't have either.