Question

[Prokaryote] How to go from reads to counts given that there is no exon column in gtf file?

4.1 years ago

akibio • 0

I am trying to go from raw reads to counts and then to TPM/TMM values of gene expression for a prokaryotic organism (via mapping the RNA sequencing reads to the reference genome). I have read that an annotation file (gtf or gff3) is needed, and encountered this issue firsthand when STAR threw an error saying that my gtf file doesn't have any exon lines.

My question is, how should I go about this process of mapping reads to counts and then to TPM or TMM, given that I can't find a gtf file with exon lines? I am open to using any of the reputable alignment packages e.g. I've heard of Bowtie2 and STAR. I should mention that the gff3 file does have exon lines, but I can't understand if STAR will be happy to use this file.

The exact error that STAR throws is this:

Fatal INPUT FILE error, no exon lines in the GTF file: /Users/fastq/gtf_file.gtf
Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column.
          Make sure the GTF file is unzipped.
          If exons are marked with a different word, use --sjdbGTFfeatureExon .

RNA-Seq alignment • 1.4k views

ADD COMMENT • link updated 4.1 years ago by Juke34 8.9k • written 4.1 years ago by akibio • 0

You don't have to use STAR per se, since you are not looking for a splice-aware aligner. You could align with any aligner and then give featureCounts an annotation in SAF (simplified annotation format) to do the read counting.

ADD REPLY • link 4.1 years ago by GenoMax 147k

Thanks. This may be a silly question, but does featureCounts require exon entries in the annotation?

Also, I initially chose STAR because I had read that it is much faster than most other aligners; however, I wonder whether that benefit carries over to prokaryotic organisms.

ADD REPLY • link 4.1 years ago by akibio • 0

See the link included in my comment above for an explanation of SAF format. The simple answer is no. You can write a SAF file yourself: it is a tab-delimited table with one row per gene giving the gene ID, the chromosome (there would be just one in your case unless you have plasmids), the start, the end, and the strand.

There are plenty of other fast aligners; bwa mem or bbmap.sh would fit the bill.
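As a rough illustration of building a SAF file by hand, here is a minimal Python sketch that pulls gene rows out of GFF3 lines and writes the five SAF columns (GeneID, Chr, Start, End, Strand). The function names `gff3_to_saf` and `write_saf`, the choice of the `ID` attribute as the gene identifier, and the two example records are assumptions made for illustration; adjust them to whatever your annotation actually contains.

```python
import csv
import io


def gff3_to_saf(gff3_lines, feature_type="gene", id_key="ID"):
    """Yield SAF rows (GeneID, Chr, Start, End, Strand) from GFF3 lines.

    feature_type and id_key are assumptions: some annotations are better
    keyed on CDS features or on a locus_tag attribute instead.
    """
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # keep only well-formed rows of the requested feature
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        gene_id = attrs.get(id_key)
        if gene_id is None:
            continue  # no usable identifier, so the row cannot be counted
        yield (gene_id, cols[0], cols[3], cols[4], cols[6])


def write_saf(rows, handle):
    """Write SAF rows, preceded by the header line, as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["GeneID", "Chr", "Start", "End", "Strand"])
    writer.writerows(rows)


# Made-up example records, only to show the shape of the input and output.
example = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t190\t255\t.\t+\t.\tID=gene-thrL;Name=thrL",
    "chr1\tRefSeq\tCDS\t190\t255\t.\t+\t0\tID=cds-1;Parent=gene-thrL",
    "chr1\tRefSeq\tgene\t337\t2799\t.\t-\t.\tID=gene-thrA;Name=thrA",
]

buf = io.StringIO()
write_saf(gff3_to_saf(example), buf)
saf_text = buf.getvalue()
print(saf_text)
```

Once the output is saved to a file (say genes.saf, a hypothetical name), featureCounts can be pointed at it with -F SAF alongside the usual -a and -o options. The chromosome names in the SAF file need to match the sequence names in the BAM file.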

ADD REPLY • link 4.1 years ago by GenoMax 147k

score 0 · Answer 1 · 2020-10-15

4.1 years ago

Juke34 8.9k

You can use agat_convert_sp_gxf2gxf.pl from AGAT; it will re-create the exon features.
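To make concrete what "re-creating the exon features" means for an unspliced prokaryotic annotation, here is a minimal Python sketch that copies every CDS line of a GTF to a matching exon line. This is not how AGAT works internally and is no substitute for it; the function name `add_exon_lines`, the choice of CDS as the source feature, and the example records are assumptions for illustration. It also assumes the CDS lines already carry the gene_id and transcript_id attributes that downstream tools expect on exon lines.

```python
def add_exon_lines(gtf_lines, source_feature="CDS"):
    """Return the GTF lines with an exon line added after each source feature.

    Assumption: genes are unspliced, so one exon spanning the same
    coordinates as the CDS is an acceptable stand-in.
    """
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        out.append(line)  # always keep the original line
        if line.startswith("#"):
            continue  # comments and directives are passed through untouched
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == source_feature:
            exon = list(cols)
            exon[2] = "exon"  # column 3 is the feature type
            exon[7] = "."     # the frame column does not apply to exon lines
            out.append("\t".join(exon))
    return out


# Made-up example records, only to show the transformation.
example = [
    "#!genome-build example",
    'chr1\tRefSeq\tgene\t190\t255\t.\t+\t.\tgene_id "thrL";',
    'chr1\tRefSeq\tCDS\t190\t252\t.\t+\t0\tgene_id "thrL"; transcript_id "thrL-1";',
]

result = add_exon_lines(example)
for row in result:
    print(row)
```

The STAR error message quoted in the question points to another route: telling STAR which feature name to treat as an exon via --sjdbGTFfeatureExon, which avoids editing the file at all.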

ADD COMMENT • link 4.1 years ago by Juke34 8.9k