I am trying to go from raw reads to counts and then to TPM/TMM values of gene expression for a prokaryotic organism (via mapping the RNA sequencing reads to the reference genome). I have read that an annotation file (gtf or gff3) is needed, and encountered this issue firsthand when STAR threw an error saying that my gtf file doesn't have any exon lines.
My question is: how should I go from mapped reads to counts and then to TPM or TMM, given that I can't find a gtf file with exon lines? I am open to using any reputable aligner; I've heard of Bowtie2 and STAR, for example. I should mention that the gff3 file does have exon lines, but I can't tell whether STAR will accept that file.
The exact error that STAR throws is this:
Fatal INPUT FILE error, no exon lines in the GTF file: /Users/fastq/gtf_file.gtf
Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column.
Make sure the GTF file is unzipped.
If exons are marked with a different word, use --sjdbGTFfeatureExon .
You don't have to use STAR per se, since you don't need a splice-aware aligner. You could align with any aligner and then give featureCounts an annotation in SAF (Simplified Annotation Format) to do the read counting.
Thanks. This may be a silly question, but does featureCounts require exon lines in the annotation?
Also, I initially chose STAR because it is reported to be much faster than many other aligners; however, I wonder whether that benefit carries over to prokaryotic genomes.
See the link included in my comment above for an explanation of the SAF format. The simple answer is no. You can make up a SAF file yourself by listing, for each gene, a gene name, the chromosome (there would be only one in your case unless you have plasmids), the start and stop coordinates, and the strand.
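If it helps, here is a minimal sketch of building a SAF table from the gene lines of a GFF3 file. It assumes the standard nine tab-separated GFF3 columns and that each gene line carries an `ID=` attribute. The function names and the example records (`geneA`, `geneB`, `chr1`) are made up for illustration and are not from your annotation.

```python
import csv
import io


def gff3_to_saf(gff3_lines, feature_type="gene", id_key="ID"):
    """Return SAF rows (GeneID, Chr, Start, End, Strand) from GFF3 lines.

    Assumption: features of interest are tagged `feature_type` in column 3
    and carry an `id_key=` entry in the attributes column.
    """
    rows = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and directives such as ##gff-version.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Anything without exactly nine columns is not a feature line.
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        seqid, _, _, start, end, _, strand, _, attrs = fields
        attr_map = dict(
            item.split("=", 1) for item in attrs.split(";") if "=" in item
        )
        gene_id = attr_map.get(id_key)
        if gene_id is None:
            continue
        rows.append((gene_id, seqid, int(start), int(end), strand))
    return rows


def write_saf(rows, handle):
    """Write rows as a tab-delimited SAF table with the usual header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", "Chr", "Start", "End", "Strand"])
    writer.writerows(rows)


# Hypothetical GFF3 records, for illustration only.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t190\t255\t.\t+\t.\tID=geneA;Name=a",
    "chr1\tsrc\tCDS\t190\t255\t.\t+\t0\tID=cdsA;Parent=geneA",
    "chr1\tsrc\tgene\t337\t2799\t.\t-\t.\tID=geneB;Name=b",
]

saf_rows = gff3_to_saf(example)
buffer = io.StringIO()
write_saf(saf_rows, buffer)
saf_text = buffer.getvalue()
```

The resulting file can then be passed to featureCounts with `-F SAF -a <your file>`. Note that this sketch copies the strand column through unchanged, so genes whose strand is given as `.` in the GFF3 would need to be handled separately.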
There are plenty of other aligners that are fast. bwa mem and bbmap.sh would fit the bill.
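For the last step of the original question (counts to TPM), here is a rough sketch of the usual TPM calculation: divide each gene's count by its length, then rescale so the values sum to one million within the sample. The function name and the example numbers are invented for illustration. In practice the counts and lengths would come from the featureCounts output table, which reports a Length column alongside the counts.

```python
def counts_to_tpm(counts, lengths):
    """Convert per-gene read counts to TPM for a single sample.

    counts  -- dict of gene ID -> read count
    lengths -- dict of gene ID -> gene length in bases
    """
    # Length-normalised rate: reads per base of gene.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        # No reads at all: every gene gets zero rather than dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Rescale so the rates sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Hypothetical counts and lengths, for illustration only.
example_counts = {"geneA": 10, "geneB": 20, "geneC": 0}
example_lengths = {"geneA": 1000, "geneB": 1000, "geneC": 500}
example_tpm = counts_to_tpm(example_counts, example_lengths)
```

TMM is a different kind of normalisation: it computes scaling factors between samples and is usually done with edgeR in R rather than per sample like this.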