'gene_name' is missing in StringTie output file 't_data.ctab'
6.0 years ago

Hello, I followed the New Tuxedo protocol, in which StringTie is used for the quantification step. Running:

stringtie -e -B -p 8 -G merged_gtf -o SRRXXX.gtf SRRXXX.bam

gives the following output files:

e2t.ctab  e_data.ctab  i2t.ctab  i_data.ctab  SRRXXX.gtf  t_data.ctab

The columns of t_data.ctab are used to build the count data for DESeq2. Following the tximport manual, I tried to import t_data.ctab with:

   tx2gene <- tmp[, c("t_name", "gene_name")]

but the 'gene_name' column of my t_data.ctab contains only '.', which cannot be used to build the count data, so I can't proceed with differential gene expression analysis. My question is: can I use the 'gene_id' column from t_data.ctab instead of 'gene_name'? Or should I switch to a different quantification tool altogether, and if so, which tool would be a better choice than StringTie?

RNA-Seq StringTie New Tuxedo protocol DESeq2 • 2.5k views
6.0 years ago

To extract read count information from the StringTie output, you can use the script provided by the StringTie authors (prepDE.py, described in the StringTie manual).
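
Assuming the script meant here is prepDE.py from the StringTie manual, its -i option takes a text file with one sample ID and the path to that sample's GTF per line. Below is a minimal sketch for building such a list. The quant/<sample>/<sample>.gtf layout and the function name make_sample_list are placeholders for illustration, not something taken from this thread.

```shell
# Write a sample list with one "<sample_id> <path/to/sample.gtf>" pair per line.
# Assumes a hypothetical layout: <quant_dir>/<sample_id>/<sample_id>.gtf
make_sample_list() {
  quant_dir=$1
  out_file=$2
  : > "$out_file"                               # start from an empty file
  for gtf in "$quant_dir"/*/*.gtf; do
    [ -e "$gtf" ] || continue                   # skip if the glob matched nothing
    sample=$(basename "$(dirname "$gtf")")      # sample ID = parent directory name
    printf '%s %s\n' "$sample" "$gtf" >> "$out_file"
  done
}

# Example call (paths are placeholders):
#   make_sample_list quant sample_lst.txt
#   prepDE.py -i sample_lst.txt
# According to the StringTie manual, prepDE.py then writes
# gene_count_matrix.csv and transcript_count_matrix.csv.
```

The gene-level matrix can be loaded into DESeq2 as a plain count matrix, which sidesteps the gene_name column entirely.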

4.5 years ago
Johan Zicola ▴ 70

The problem may come from the fact that you are using non-human data. I work with Arabidopsis and also noticed the empty gene_name column in the t_data.ctab file from StringTie. I thought it was a bug, but the Ballgown documentation describes the field as "gene_name: HUGO gene name for the transcript, if known", and HUGO nomenclature is restricted to human genes. Hopefully this saves someone else some time.
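
If the empty gene_name column is the only blocker, one workaround is to build the transcript-to-gene table from gene_id wherever gene_name is '.'. tximport only needs a consistent transcript-to-gene mapping, so this should work in principle, though it is untested on the data from this thread. Here is a rough awk sketch, assuming t_data.ctab is tab-separated with a header row containing t_name, gene_id and gene_name. The function name make_tx2gene and the output column name gene are made up for illustration.

```shell
# Print a two-column transcript-to-gene table from a t_data.ctab file,
# using gene_id wherever gene_name is "." or empty.
make_tx2gene() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      # Look columns up by header name rather than by fixed position.
      for (i = 1; i <= NF; i++) col[$i] = i
      print "t_name", "gene"
      next
    }
    {
      gene = $(col["gene_name"])
      if (gene == "." || gene == "") gene = $(col["gene_id"])
      print $(col["t_name"]), gene
    }
  ' "$1"
}

# Example call (file names are placeholders):
#   make_tx2gene t_data.ctab > tx2gene.tsv
```

The resulting table can be read into R and passed as the tx2gene argument in place of the t_name/gene_name columns.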
