Hi, I'm working with sunflower (Helianthus annuus) RNA-seq data. I mapped my reads to the XRQ genome and then tried to count them with featureCounts:
gff=/path/to/file/HanXRQr1.0-20151230-EGN-r1.2.gff3
genome=/path/to/file/HanXRQr1.0-20151230.fa
bamdir=/path/to/bams/
featureCounts -T 40 -g ID -t gene -a $gff -p --primary --byReadGroup -J -G $genome -o all.pe.gene $bamdir/*.bam > all.pe.gene.log 2>&1 ;
The software runs without problems and produces the result files, but the penultimate line of the log file contains this warning:
WARNING contig 'HanXRQCP' is not found in the provided genome file!
- The HanXRQCP chromosome (the chloroplast chromosome) is in the genome fasta file and in the gff file.
- The results file shows the chloroplast genes normally, with counts and all, so the program seems to be reading HanXRQCP correctly from the fasta, the bams, and the gff.
The only thing I noticed is that the order of the chromosomes differs between the .fasta and the .gff:
Contig order in genome.fasta:
- Han 1 - 17 chromosomes
- Han 1512 fragments
- MT
- CP
Contig order in .gff:
- CP
- Han 1512 fragments
- Han 1 - 17 chromosomes
- MT
Any clue what is going on? Should I trust the results?
Thanks in advance.
The WARNING message indicates that featureCounts didn't find the HanXRQCP sequence in the fasta file, even though it is mentioned in the BAM file.
Can you try this:
Maybe the format is a little tricky, so featureCounts misinterpreted the sequence name.
Is the chromosome/contig name identical in both files? Does it contain spaces, like
HanXRQCP 123456bp further information
? Are the names from the alignment altered?
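One way to check these points is to list the sequence names in each file and compare them. Below is a minimal sketch, not a definitive recipe. The function names are made up for illustration, the FASTA name is assumed to be the header text up to the first whitespace, and the BAM helper assumes samtools is installed.

```shell
# Sequence names in a FASTA file: header text after '>' up to the first whitespace.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Full FASTA header lines that carry extra text after the name
# (for example "HanXRQCP 123456bp further information").
fasta_headers_with_spaces() {
    awk '/^>/ && NF > 1' "$1"
}

# Sequence IDs from column 1 of a GFF3 file, skipping comment/directive lines.
gff_names() {
    awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# Reference names from the @SQ lines of a BAM header (assumes samtools is on PATH).
bam_names() {
    samtools view -H "$1" |
        awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' |
        sort -u
}

# Names used in the annotation that do not appear as FASTA sequence names.
# Arguments: FASTA file, GFF3 file.
missing_from_fasta() {
    fasta_list=$(mktemp)
    fasta_names "$1" > "$fasta_list"
    # -x whole-line match, -F fixed strings, -v keep only names NOT in the FASTA list
    gff_names "$2" | grep -v -x -F -f "$fasta_list"
    rm -f "$fasta_list"
}

# Example calls, using the variables from the question:
#   fasta_headers_with_spaces "$genome"
#   missing_from_fasta "$genome" "$gff"
#   bam_names "$bamdir/some_sample.bam"    # hypothetical file name
```

If `missing_from_fasta` prints nothing and `fasta_headers_with_spaces` shows that the HanXRQCP header has extra text after the name, the header format would be the first thing to look at.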