Reads shown in IGV not quantified by featureCounts
0
0
Entering edit mode
4.6 years ago
a.palmer ▴ 20

Hi there,

I'm currently analyzing some RNA-seq data, and featureCounts is behaving weirdly.
The problem is that I expect a specific gene (pie-1) to be expressed in my sample, however featureCounts tells me that I have no reads. Though when I input my data into IGV the program shows at least 5 paired reads on my gene.

IGV result https://imgur.com/a/qO8mjgN

This is my featureCounts code:

S996_1 <-featureCounts(files="foo.filtered.sorted.bam",
                   annot.ext="Caenorhabditis_elegans.WBcel235.96.gtf",
                   isGTFAnnotationFile=TRUE, GTF.featureType="exon",GTF.attrType="exon_id",
                   isPairedEnd=TRUE, nthreads=12, useMetaFeatures=F,
                   genome="Caenorhabditis_elegans.WBcel235.dna.toplevel.fa",
                   juncCounts = T, allowMultiOverlap=T, reportReads= "SAM", reportReadsPath=NULL)

featureCounts output
https://imgur.com/a/ESNHlak

I would expect to see at least some reads for this gene, as I expressed it in-vivo as part of my experiments. For reference I aligned my data with hisat2, filtered it for paired-end reads and sorted it by chromosome position with samtools.

Can somebody please help?

Thanks!

RNA-Seq IGV featureCounts • 1.9k views
ADD COMMENT
1
Entering edit mode

Can you show head Of your bam file and also grep Y49E10.6 Caenorhabditis_elegans.WBcel235.96.gtf output ?

ADD REPLY
1
Entering edit mode

I grepped the gene from the gtf file and it returned loads of exons, but as an example:

III     WormBase        CDS     12433605        12433758        .       +       1       gene_id "WBGene00013035"; transcript_id "Y49E10.16a"; exon_number "7"; gene_name "Y49E10.16"; gene_source "WormBase"; gene_biotype "protein_coding"; transcript_name "Y49E10.16a"; transcript_source "WormBase"; transcript_biotype "protein_coding"; protein_id "Y49E10.16a";

I cross-referenced this WBGene00013035 with my data and I got 6 reads in my sample. I then realised that these correspond to the 6 paired-end reads shown in IGV, and furthermore that Y49E10.16 is the wrong gene. Duh! Pie-1 is actually Y49E10.14 and there are no reads shown in IGV, which matches my featureCounts output.

Thanks for bearing with me through my brain fart

ADD REPLY
0
Entering edit mode

thanks for following up and posting the solution, everyone gets bitten by these type of misreads at point or other

ADD REPLY
0
Entering edit mode

Glad that it solved

ADD REPLY
0
Entering edit mode

the image only shows overlap with a gene, your feature counts code requires overlap with exon

ADD REPLY
0
Entering edit mode

I tried previously with GTF.featuerType="gene" and GTF.attrType="gene" but it didn't make any difference

ADD REPLY
1
Entering edit mode

dont cite code that is not "exactly" what you enter. No one can troubleshoot that.

For example you have foo.filtered.sorted.bam that is not what is in the IGV.

Perhaps the GTF file is different. Make sure to visualize the exact file you are using as well.

In general there are two reasons why read counts do not show up:

  1. the intervals are not where you expect them to be
  2. the reads have properties that makes them non countable (maybe multiply mapped, overlaps with multiple features etc)
ADD REPLY
0
Entering edit mode

Thanks Istvan,
The GTF file I'm using is from the same release as my reference genome, which is what I used to create my index when aligning with hisat2.
Do you think that there are parameters in featureCounts that may be throwing out these reads due to read overlap, etc?

ADD REPLY
0
Entering edit mode

Run featurecounts from the command line, it is quite straightforward and will let you better see what is going on.

ADD REPLY

Login before adding your answer.

Traffic: 2589 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6