Hi! I am trying to extract the reads from an RNA seq file after having trimmed it with Trimmomatic v38 and mapping it with bowtie2 with 95% overall allignment rate with this command in RSeQC:
$ infer_experiment.py -i trialoutput.sorted.bam -r Dictyannots.bed
Reading reference gene model Dictyannots.bed ... Done
Loading SAM/BAM file ... Finished
Total 0 usable reads were sampled
Unknown Data type
The genome I use for reference is found here dicty reference genome and the annotation files I tried look like this:
1) dicty_annots.bed
DDB0232428 1890 3287 DDB_G0267178 0 + 1890 3287 0,0,0 1 1397 0
DDB0232428 3848 4855 DDB_G0267180 0 + 3848 4855 0,0,0 1 1007 0
DDB0232428 5505 7769 DDB_G0267182 0 + 5505 7769 0,0,0 1 2264 0
2) Dictyannots.bed
1 1889 3287 gene:DDB_G0267178 . + ensembl CDS 0 transcript_id "transcript:EAL73826"; gene_id "gene:DDB_G0267178";
1 1889 3287 gene:DDB_G0267178 . + ensembl exon . transcript_id "transcript:EAL73826"; gene_id "gene:DDB_G0267178";
1 3847 4855 gene:DDB_G0267180 . + ensembl CDS 0 transcript_id "transcript:EAL73827"; gene_id "gene:DDB_G0267180";
The second annotation file was converted from gff3 to bed by cufflinks.
Do you have any suggestions why this might happen?
I have also tried to load the sorted bam file with either of the annotations in gtf format in subread's package featurecounts with this command:
featureCounts -a Dictyannot.gtf -t exon -g gene_id -o counts.txt trialoutput.sorted.bam
========== _____ _ _ ____ _____ ______ _____
===== / ____| | | | _ \| __ \| ____| /\ | __ \
===== | (___ | | | | |_) | |__) | |__ / \ | | | |
==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
v1.6.0
//========================== featureCounts setting ===========================\\
|| ||
|| Input files : 1 BAM file ||
|| S trialoutput.sorted.bam ||
|| ||
|| Output file : counts.txt ||
|| Summary : counts.txt.summary ||
|| Annotation : Dictyannot.gtf (GTF) ||
|| Dir for temp files : ./ ||
|| ||
|| Threads : 1 ||
|| Level : meta-feature level ||
|| Paired-end : no ||
|| Strand specific : no ||
|| Multimapping reads : not counted ||
|| Multi-overlapping reads : not counted ||
|| Min overlapping bases : 1 ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
//================================= Running ==================================\\
|| ||
|| Load annotation file Dictyannot.gtf ... ||
|| Features : 30575 ||
|| Meta-features : 13731 ||
|| Chromosomes/contigs : 37 ||
|| ||
|| Process BAM file trialoutput.sorted.bam... ||
|| Single-end reads are included. ||
|| Assign reads to features... ||
|| Total reads : 2484495 ||
|| Successfully assigned reads : 0 (0.0%) ||
|| Running time : 0.04 minutes ||
|| ||
|| Read assignment finished. ||
|| ||
|| Summary of counting results can be found in file "counts.txt.summary" ||
|| ||
\\===================== http://subread.sourceforge.net/ ======================//
Looks like your BAM and GTF files individually does not have any issue. Sometimes this happens when the chromosome names are not matching between BAM and GTF files (Example, "chr12" or "12").
This is the header from the sam file that was converted to bam:
I think the names of the chromosome are the same ("1"). Thank for your reply! Do you have any other suggestions?
Sorry other than that I could not see any problem. It is is hard to figure out when you do not get any errors. can you post few lines from your GTF file ?
Hi! I have managed to find the problem, it was actually the chromosome names. However,now I face another problem as I am trying to map the sequence to the open reading frame reference file which looks like this:
When I load the reference ORF in IGV instead of chromosomes I can see gene names (as DDB0216524|DDB_G0267364). Do you have any idea how to get around this problem?
PS. In this case DDB0216524 is not the chromosome name.
Thank you for your help and time!
No, the chromosome names in that bam are not "1". Look up what each column of a sam file is. And that's not the header of the sam file either.