Question

Outputting gexpr() with gene_name (or gene symbol) instead of MSTRG.x gene_id

6.3 years ago

ever_wudi ▴ 10

Hi, I am trying to use Ballgown to output a gene-by-sample expression matrix. What I did was geneexp = gexpr(bg), then write.csv(geneexp, "output.csv", row.names = TRUE). However, the matrix I get always uses MSTRG.x gene IDs as row identifiers. How can I output the matrix with gene_name (gene symbol) as the identifier instead? The MSTRG.x IDs are of no use to me.

Thanks! Di
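A minimal base-R sketch of one way to relabel the gexpr() rows. It assumes that the transcript table from texpr(bg, 'all') contains both a gene_id and a gene_name column, which can then serve as a lookup. The small gene_mat and tx_tab objects are hypothetical stand-ins for real Ballgown output, so the snippet runs on its own. The "." placeholder for genes without a reference name is also an assumption. Check what your own table contains.

```r
# Hypothetical stand-ins so the sketch runs without a ballgown object.
# With real data these would come from:
#   gene_mat <- gexpr(bg)          # rows named by gene_id (e.g. MSTRG.x)
#   tx_tab   <- texpr(bg, 'all')   # assumed to include gene_id and gene_name
gene_mat <- matrix(
  c(7.3, 9.8,
    0.9, 1.4,
    2.0, 2.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MSTRG.1", "MSTRG.2", "MSTRG.3"),
                  c("FPKM.sample1", "FPKM.sample2"))
)
tx_tab <- data.frame(
  gene_id   = c("MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3"),
  gene_name = c("Btf3l4", "Btf3l4", "Actb", "."),   # "Actb" is hypothetical
  stringsAsFactors = FALSE
)

# Build a gene_id -> gene_name lookup (one row per transcript, so dedupe first).
id2name <- unique(tx_tab[, c("gene_id", "gene_name")])
# If one gene_id carries several names, keep the first one seen.
id2name <- id2name[!duplicated(id2name$gene_id), ]
lookup  <- setNames(id2name$gene_name, id2name$gene_id)

# Name each row of the gene matrix. Fall back to the MSTRG id when there is
# no usable name (missing, empty, or the assumed "." placeholder).
gene_name <- unname(lookup[rownames(gene_mat)])
no_name   <- is.na(gene_name) | gene_name %in% c("", ".")
gene_name[no_name] <- rownames(gene_mat)[no_name]

# Keep both identifiers next to the FPKM columns.
out <- data.frame(gene_id = rownames(gene_mat), gene_name = gene_name,
                  gene_mat, row.names = NULL, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Written to a temp dir here; replace with your own path, e.g. "output.csv".
out_file <- file.path(tempdir(), "output.csv")
write.csv(out, out_file, row.names = FALSE)
```

Keeping gene_id as its own column means no rows are lost if two MSTRG genes share a symbol. Whether to collapse such duplicates is a separate decision.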

RNA-Seq Ballgown Stringtie gene_name

updated 3.5 years ago by HaroonPakistan • written 6.3 years ago by ever_wudi ▴ 10

Hi, did you solve this issue?

3.5 years ago by HaroonPakistan

score 0 · Answer 1 · 2018-08-10

I figured out one way to do it. I used whole_tx_table = texpr(my.humandata, 'all') to extract everything into whole_tx_table, then ran final_fpkm_table = whole_tx_table[c("gene_name","sample 1","sample 2", ..)] to slice out only the gene_name and FPKM columns, and wrote final_fpkm_table to a .csv file.

However, the gene names in final_fpkm_table.csv are not unique. Because texpr() returns one row per transcript, a gene with several transcripts appears several times. For example, there are many rows for 'Btf3l4', as shown below. What should I do with these values? Should I take the sum, average, or max of the duplicate rows to get a gene_name-by-sample matrix with unique names? Also, can edgeR, FPKM_count.py, or RSEM be used to generate such a matrix?

Thanks for any advice.

gene_name  Sample 1    Sample 2    Sample 3    Sample 4
Btf3l4     7.267802    7.386622    9.815619    9.739746
Btf3l4     0.941536    1.256349    1.365669    1.3953
Btf3l4     0.897259    0.718018    0.025479    0.168297
Btf3l4     0.823937    0.744246    1.132339    1.020087
Btf3l4     0.42134     0.351375    0.236908    0.517893
Btf3l4     1.219011    1.331794    2.030579    1.207322
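A minimal base-R sketch of collapsing duplicate gene_name rows into one row per name. It uses three of the Btf3l4 rows and two samples from the table above, plus a hypothetical "Actb" row to show that unique names pass through unchanged. rowsum() adds the duplicates together, and aggregate() lets you swap in mean or max instead. Summing transcript-level values into a gene-level value is a common convention. However, which summary suits your analysis is exactly what this post is asking, so treat the choice of FUN as an assumption.

```r
# Three Btf3l4 rows (two samples) from the table above, plus a hypothetical
# "Actb" row with made-up values.
fpkm <- data.frame(
  gene_name = c("Btf3l4", "Btf3l4", "Btf3l4", "Actb"),
  sample1   = c(7.267802, 0.941536, 0.897259, 50.0),
  sample2   = c(7.386622, 1.256349, 0.718018, 60.0),
  stringsAsFactors = FALSE
)
expr_cols <- c("sample1", "sample2")

# Sum: one row per unique gene_name (row names sorted alphabetically).
summed <- rowsum(as.matrix(fpkm[, expr_cols]), group = fpkm$gene_name)

# Mean or max: aggregate() accepts any summary function via FUN.
by_name  <- list(gene_name = fpkm$gene_name)
averaged <- aggregate(fpkm[, expr_cols], by = by_name, FUN = mean)
maxed    <- aggregate(fpkm[, expr_cols], by = by_name, FUN = max)

# Row names of `summed` are the gene names, so keep them in the CSV.
out_file <- file.path(tempdir(), "gene_name_fpkm_summed.csv")
write.csv(summed, out_file, row.names = TRUE)
```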