Newbie here. I am trying to use featureCounts to assign reads to features (exons). My reference is E. coli. I downloaded the GTF file from here: https://www.ncbi.nlm.nih.gov/genome/167?genome_assembly_id=161521 My % assigned is very low after running featureCounts, and I also noticed that the GTF file lists only ~300 exons. Is this correct for E. coli? I can't find any resources online to help me with this. Here is the featureCounts command I run:
There is a lot of inconsistency when it comes to biological data. Different groups may choose to annotate their organisms differently.
In this case, only some non-coding RNA regions are annotated as exons; for protein-coding genes, only the coding sequences (CDS) are annotated.
In short, use CDS instead of exon when counting with featureCounts.
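As a rough sketch of what that change could look like: featureCounts selects the feature type with -t and the grouping attribute with -g. The file names below (annotation.gtf, aligned.bam, counts.txt) are placeholders, not the files from the question, and gene_id is assumed to be the attribute that groups features in your GTF. Paired-end data would need additional options that are not shown here.

```shell
# Placeholder names; replace them with your own files.
ANNOTATION=annotation.gtf   # hypothetical GTF downloaded from NCBI
BAM=aligned.bam             # hypothetical alignment file
FEATURE_TYPE=CDS            # count reads over CDS features instead of exon
GROUP_BY=gene_id            # assumed grouping attribute in the GTF

# Run only if the tool and the inputs are actually present.
if command -v featureCounts >/dev/null 2>&1 && [ -f "$ANNOTATION" ] && [ -f "$BAM" ]; then
    featureCounts -t "$FEATURE_TYPE" -g "$GROUP_BY" -a "$ANNOTATION" -o counts.txt "$BAM"
else
    echo "featureCounts or input files not found; command shown for illustration only" >&2
fi
```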
Here is how to verify the statements. Using the bio package (see: https://www.bioinfo.help/) one can quickly visualize the following:
# Fetch the data from NCBI.
bio fetch NC_000913
# Get all features in GFF format.
bio convert NC_000913 --gff > all.gff
# Get exons only in GFF format.
bio convert NC_000913 --gff --type exon > exon.gff
# Get the reference file in FASTA format.
bio convert NC_000913 --fasta > ref.fa
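To see the imbalance between feature types directly, one option is to tally column 3 of the GFF file. This is a minimal sketch that assumes all.gff was produced by the commands above; the function name count_feature_types is made up for this example.

```shell
# Print how many features of each type (GFF/GTF column 3) a file contains,
# most frequent first. Comment lines starting with '#' are skipped.
count_feature_types() {
    awk -F'\t' '!/^#/ && NF >= 3 { n[$3]++ } END { for (t in n) print n[t], t }' "$1" | sort -rn
}

# Usage, if all.gff exists in the current directory.
if [ -f all.gff ]; then
    count_feature_types all.gff
fi
```

If the answer above holds, the CDS count should be in the thousands while the exon count stays in the low hundreds.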
Wow this helps a lot thanks. I guess one more question I have is that I am doing a differential expression analysis and I have read that you should not specify "CDS" when doing DE. Is this true?
Technically true, but for prokaryotes in general, and in this particular case where exons are not even annotated and where gene and CDS share the same coordinates, it makes no difference.
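One way to check the claim about shared coordinates is to compare gene and CDS intervals in the GFF file. This is only a sketch, assuming all.gff from the bio convert step; the function name cds_matching_gene is made up. Note that some GTF files exclude the stop codon from the CDS, in which case exact matches may be off by three bases.

```shell
# Report how many distinct CDS intervals (sequence, start, end, strand)
# have a gene feature with exactly the same coordinates.
cds_matching_gene() {
    awk -F'\t' '
        /^#/ { next }
        $3 == "gene" { g[$1 ":" $4 "-" $5 ":" $7] = 1 }
        $3 == "CDS"  { c[$1 ":" $4 "-" $5 ":" $7] = 1 }
        END {
            for (k in c) { total++; if (k in g) same++ }
            printf "%d of %d CDS intervals match a gene interval\n", same, total
        }' "$1"
}

# Usage, if all.gff exists in the current directory.
if [ -f all.gff ]; then
    cds_matching_gene all.gff
fi
```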
You could also use gene instead of CDS or exon.
Okay great, thanks very much for your help!