Manually modified gtf cannot be loaded into featureCounts STAR
0
0
Entering edit mode
2 days ago

Hi everyone,

I have manually created a mixed species GTF (code below) and using it to align genone with STAR.

Alignment goes well and I get BAM files. I then try to get counts with "featureCounts" with subread but it cannot read my gtf file.

featureCounts -p -t gene -g gene_id -F gtf -s 2 -a HumanMouse.gtf -o *.bam $files

There is a problem with my GTF format but it looks decently built to me. third column says "gene" and 9 columns says "gene_id" and it still has an "exon" column. It is tab delimited file (attached creenshot of head). What am I doing wrong?

enter image description here

I have removed headers from the gtf but it still won't work. It does work with the original gtf files with or withouth header. I am assuming it is the way I am adding M or H to the gtf that is messing it up? But I need that to be able to recognize genomes. It clearly worked with STAR to align and generate BAM files so I am confused on why it wouldn't work...

This is the error I get

Load annotation file HumanMouse.gtf ... ||

ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..

The porgram has to terminate.

gtf subread featurecounts star • 278 views
ADD COMMENT
0
Entering edit mode

I have manually created a mixed species GTF

Can you clarify if you just added one gene (e.g. one human gene added to mouse) or something else?

ADD REPLY
0
Entering edit mode

I modified the first line to add an M or an H to distinguish in between the and it works fine with STAR (below). For some reason featurecounts doesn't recognize the files that contsain the "1M" or "1H" as GTF file even when I specify it with -F and they have a similar structure to the original file.

# modify files, mark human with H in fasta and gtf, and mouse with M. It will create new files called "human.fa" and "mouse.fa"
awk '{if ($0 ~ /^>/) {print $1 "H"} else {print $0}}' Homo_sapiens.GRCh38.dna.primary_assembly.fa > human.fa
awk '{if ($0 ~ /^>/) {print $1 "M"} else {print $0}}' Mus_musculus.GRCm39.dna.primary_assembly.fa > mouse.fa

#modify files, mark human with H in fasta and gtf, and mouse with M. it will create two new files called "human.gtf" and "mouse.gtf"
awk '{sub("$", "H", $1)}; 1' Homo_sapiens.GRCh38.113.gtf > human.gtf
awk '{sub("$", "M", $1)}; 1' Mus_musculus.GRCm39.113.gtf > mouse.gtf

#remove header, header won't matter for STAR indexing and alignment but subread and feature counts can't read the concatenated gtf with headers
grep -v "#" human.gtf > human1.gtf
grep -v "#" mouse.gtf > mouse1.gtf

# catenate gtf's in one:
cat human1.gtf mouse1.gtf > HumanMouse.gtf
ADD REPLY
0
Entering edit mode

Are the chromosome identifiers matching in your BAM and this GTF file? More often than not issues with featureCounts boil down to that issue.

Can you copy/paste a part of your GTF (instead of the screenshot above)? Also show an alignment or two from your BAM.

ADD REPLY
0
Entering edit mode

Answer below. I am getting the feeling that by modifying the gtf files I might have accidentally modified the delimiter...? Maybe it is not a tab anymore? How can I change the first line in my gtf to have 1H or 1M and still maintain tab as delimiter?

ADD REPLY

Login before adding your answer.

Traffic: 1703 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6