FeatureCounts, zero counts from Hisat2 alignment
0
0
Entering edit mode
4.3 years ago
af1065 • 0

I am seeking a way to correct the formats for inputs to FeatureCounts.

I have sourced data from a resource for sweet potato genome annotations, and aligned my RNA-seq data to the available reference genome. However, it seems FeatureCounts cannot report on the SAM files I input.

The reference genome FASTA file does not have chromosomes clearly in the headers. I believe this to be the reason for FeatureCounts not working, however, I want to be sure before I go scripting in different directions to correct the files I have.

To be clear, I am providing the genome annotation file along with each of my aligned samples.

Example reference genome sequence header:

>itf11g12620.t1

The above gene identifier shows that it is on chromosome 11.

Example GFF:

##gff-version 3

Chr11 gt4sp_v3 mRNA 1467 2621 . + . ID=itf01g00010.t1;Name=itf01g00010.t1;Parent=itf01g00010

I am not getting errors from FeatureCounts, but it is possible that Chr11 should be in the FASTA header. Also, I do convert the GFF3 file to GTF so that FeatureCounts has a GTF as input.

RNA-Seq • 1.6k views
ADD COMMENT
0
Entering edit mode

The reference genome FASTA file does not have chromosomes clearly in the headers. I believe this to be the reason for FeatureCounts not working,

That is correct. Chromosome identifiers i.e. Chr11 need to match in your reference and the annotation.

ADD REPLY
0
Entering edit mode

I have corrected my headers, yet still I am not getting features to be counted.

sequence header: >itb13g17640.t1|chr13

GTF: chr13 gt4sp_v3 CDS 26079446 26082424 . - 0 transcript_id "itb13g19070.t1"; gene_id "itb13g19070";

Do you have another suggestion for recourse?

ADD REPLY
0
Entering edit mode

This entire string >itb13g17640.t1|chr13 needs to match first column in your GTF. If possible get rid of >itb13g17640.t1 and leave chr13 behind.

ADD REPLY
0
Entering edit mode

Thank you! Seems counter-intuitive that the sequences do not need gene ids.

ADD REPLY
1
Entering edit mode

Just to make sure, is your reference sequence really the genome ? Or the transcriptome/cDNA ? Having headers like >itf11g12620.t1 makes it looks like it is a transcriptome sequence, for which renaming the header as chr13 does not make sense indeed.

In addition, if you aligned on the transcriptome, you do not need to use featureCounts because reads are already assigned to transcripts. Instead, you can simply use samtools idxstats to obtain the number of reads by sequence (after perhaps filtering against unwanted – multimappers, chimeric, etc... – reads).

ADD REPLY
0
Entering edit mode

Gene ID's are provided by ID=itf01g00010.t1 in last field in your GTF. Be sure to use -g ID with featureCounts.

ADD REPLY

Login before adding your answer.

Traffic: 2335 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6