Question

Manually modified gtf cannot be loaded into featureCounts STAR

0

Entering edit mode

7 months ago

francesco.limone • 0

Hi everyone,

I have manually created a mixed species GTF (code below) and using it to align genone with STAR.

Alignment goes well and I get BAM files. I then try to get counts with "featureCounts" with subread but it cannot read my gtf file.

featureCounts -p -t gene -g gene_id -F gtf -s 2 -a HumanMouse.gtf -o *.bam $files

There is a problem with my GTF format but it looks decently built to me. third column says "gene" and 9 columns says "gene_id" and it still has an "exon" column. It is tab delimited file (attached creenshot of head). What am I doing wrong?

enter image description here

I have removed headers from the gtf but it still won't work. It does work with the original gtf files with or withouth header. I am assuming it is the way I am adding M or H to the gtf that is messing it up? But I need that to be able to recognize genomes. It clearly worked with STAR to align and generate BAM files so I am confused on why it wouldn't work...

This is the error I get

Load annotation file HumanMouse.gtf ... ||

ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..

The porgram has to terminate.

gtf subread featurecounts star • 1.1k views

ADD COMMENT • link updated 7 months ago by GenoMax 152k • written 7 months ago by francesco.limone • 0

0

Entering edit mode

I have manually created a mixed species GTF

Can you clarify if you just added one gene (e.g. one human gene added to mouse) or something else?

ADD REPLY • link 7 months ago by GenoMax 152k

0

Entering edit mode

I modified the first line to add an M or an H to distinguish in between the and it works fine with STAR (below). For some reason featurecounts doesn't recognize the files that contsain the "1M" or "1H" as GTF file even when I specify it with -F and they have a similar structure to the original file.

# modify files, mark human with H in fasta and gtf, and mouse with M. It will create new files called "human.fa" and "mouse.fa"
awk '{if ($0 ~ /^>/) {print $1 "H"} else {print $0}}' Homo_sapiens.GRCh38.dna.primary_assembly.fa > human.fa
awk '{if ($0 ~ /^>/) {print $1 "M"} else {print $0}}' Mus_musculus.GRCm39.dna.primary_assembly.fa > mouse.fa

#modify files, mark human with H in fasta and gtf, and mouse with M. it will create two new files called "human.gtf" and "mouse.gtf"
awk '{sub("$", "H", $1)}; 1' Homo_sapiens.GRCh38.113.gtf > human.gtf
awk '{sub("$", "M", $1)}; 1' Mus_musculus.GRCm39.113.gtf > mouse.gtf

#remove header, header won't matter for STAR indexing and alignment but subread and feature counts can't read the concatenated gtf with headers
grep -v "#" human.gtf > human1.gtf
grep -v "#" mouse.gtf > mouse1.gtf

# catenate gtf's in one:
cat human1.gtf mouse1.gtf > HumanMouse.gtf

ADD REPLY • link updated 7 months ago by GenoMax 152k • written 7 months ago by francesco.limone • 0

0

Entering edit mode

Are the chromosome identifiers matching in your BAM and this GTF file? More often than not issues with featureCounts boil down to that issue.

Can you copy/paste a part of your GTF (instead of the screenshot above)? Also show an alignment or two from your BAM.

ADD REPLY • link 7 months ago by GenoMax 152k

0

Entering edit mode

Answer below. I am getting the feeling that by modifying the gtf files I might have accidentally modified the delimiter...? Maybe it is not a tab anymore? How can I change the first line in my gtf to have 1H or 1M and still maintain tab as delimiter?

ADD REPLY • link 7 months ago by francesco.limone • 0