Issue with StringTie to get gene_count matrix for DESeq
7.0 years ago
pixie@bioinfo ★ 1.5k

Hello,

I have posted this issue in a number of forums before but could not find a solution. I am interested in a gene-level analysis only (not in novel genes or isoforms).

I ran StringTie three times, with the following commands:

stringtie -p 4 -G transcripts_exon_for_analysis.gtf  -o test_out.gtf accepted_hits.bam

stringtie --merge -p 4 -G transcripts_exon_for_analysis.gtf -o rice_merged.gtf mergelist.txt

stringtie -e -B -p 4 -G rice_merged.gtf -o ballgown/root_rep1/root_rep1.gtf root1_rep1.bam

After this I used prepDE.py to obtain the gene_count matrix, which is the input for DESeq. Most of the IDs in the matrix are StringTie IDs, and the majority of the genes in my annotation file are not picked up. Is there a way to map the IDs back to my annotation file?

Also, can I use the featureCounts R package to get a gene count matrix in case StringTie doesn't work? I am not interested in novel genes or isoforms.

rna-seq stringtie • 2.7k views
7.0 years ago
Tm ★ 1.1k

The GTF file you get from StringTie contains information for both known and novel transcripts. If the first column holds a StringTie ID instead of a transcript ID, the entry represents a novel transcript or isoform, which is not of interest to you. So after getting the gene_count matrix from prepDE.py, you can remove those entries.
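As a rough illustration of removing those entries, here is a minimal Python sketch. It assumes the matrix is a CSV whose first column is the gene ID, and that StringTie-assigned IDs start with "MSTRG." or "STRG."; check both assumptions against your own file. The function name and the demo rows are made up for the example.

```python
import csv
import io

def drop_novel_genes(rows, novel_prefixes=("MSTRG.", "STRG.")):
    """Return the header plus every row whose gene ID lacks a StringTie prefix.

    novel_prefixes is an assumption about how StringTie labels the loci
    it assembles itself; adjust it if your IDs look different.
    """
    header, *body = rows
    kept = [row for row in body if row and not row[0].startswith(novel_prefixes)]
    return [header] + kept

# Made-up demo data standing in for a gene count matrix.
demo = io.StringIO(
    "gene_id,root_rep1,root_rep2\n"
    "MSTRG.1,10,12\n"
    "LOC_Os01g01010,5,7\n"
)
filtered = drop_novel_genes(list(csv.reader(demo)))
```

For a real file, read the rows with `csv.reader(open(path, newline=""))` and write `filtered` back out with `csv.writer`.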

However, a StringTie GTF entry with a transcript ID (representing a known transcript) also gives you the gene symbol, which you can use to match it to the genes in the genome annotation file with a simple script or shell command.
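A "simple script" for this could look roughly like the Python sketch below. It reads the transcript lines of a GTF and builds a lookup from each gene_id to a reference gene identifier. The attribute names `ref_gene_id` and `gene_name` are assumptions about what your StringTie GTF carries; inspect the ninth column of your file and adjust `ref_keys` accordingly. The function name and the demo lines are hypothetical.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gene_id_map(gtf_lines, ref_keys=("ref_gene_id", "gene_name")):
    """Map gene_id -> first reference attribute found on a transcript line.

    ref_keys lists assumed attribute names, tried in order.
    Genes with none of these attributes are left out of the map.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript records are inspected
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        for key in ref_keys:
            if key in attrs:
                # keep the first reference ID seen for this gene
                mapping.setdefault(gene_id, attrs[key])
                break
    return mapping

# Made-up demo lines: one locus with a reference gene, one without.
demo = [
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "T1"; ref_gene_id "GeneA";\n',
    'chr1\tStringTie\ttranscript\t2000\t2900\t1000\t+\t.\t'
    'gene_id "MSTRG.2"; transcript_id "MSTRG.2.1";\n',
]
mapping = gene_id_map(demo)
```

With a real file, pass an open file handle (for example `open("rice_merged.gtf")`) in place of `demo`, then use the resulting dictionary to rename rows in the gene count matrix.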
