Hi there,
I'm currently analyzing some RNA-seq data, and featureCounts is behaving weirdly.
The problem is that I expect a specific gene (pie-1) to be expressed in my sample, however featureCounts tells me that I have no reads. Though when I input my data into IGV the program shows at least 5 paired reads on my gene.
This is my featureCounts code:
S996_1 <-featureCounts(files="foo.filtered.sorted.bam",
annot.ext="Caenorhabditis_elegans.WBcel235.96.gtf",
isGTFAnnotationFile=TRUE, GTF.featureType="exon",GTF.attrType="exon_id",
isPairedEnd=TRUE, nthreads=12, useMetaFeatures=F,
genome="Caenorhabditis_elegans.WBcel235.dna.toplevel.fa",
juncCounts = T, allowMultiOverlap=T, reportReads= "SAM", reportReadsPath=NULL)
I would expect to see at least some reads for this gene, as I expressed it in-vivo as part of my experiments. For reference I aligned my data with hisat2, filtered it for paired-end reads and sorted it by chromosome position with samtools.
Can somebody please help?
Thanks!
Can you show
head
Of your bam file and alsogrep Y49E10.6 Caenorhabditis_elegans.WBcel235.96.gtf
output ?I grepped the gene from the gtf file and it returned loads of exons, but as an example:
I cross-referenced this WBGene00013035 with my data and I got 6 reads in my sample. I then realised that these correspond to the 6 paired-end reads shown in IGV, and furthermore that Y49E10.16 is the wrong gene. Duh! Pie-1 is actually Y49E10.14 and there are no reads shown in IGV, which matches my featureCounts output.
Thanks for bearing with me through my brain fart
thanks for following up and posting the solution, everyone gets bitten by these type of misreads at point or other
Glad that it solved
the image only shows overlap with a gene, your feature counts code requires overlap with exon
I tried previously with GTF.featuerType="gene" and GTF.attrType="gene" but it didn't make any difference
dont cite code that is not "exactly" what you enter. No one can troubleshoot that.
For example you have
foo.filtered.sorted.bam
that is not what is in the IGV.Perhaps the GTF file is different. Make sure to visualize the exact file you are using as well.
In general there are two reasons why read counts do not show up:
Thanks Istvan,
The GTF file I'm using is from the same release as my reference genome, which is what I used to create my index when aligning with hisat2.
Do you think that there are parameters in featureCounts that may be throwing out these reads due to read overlap, etc?
Run featurecounts from the command line, it is quite straightforward and will let you better see what is going on.