R ballgown - Obtain number of transcripts per gene per sample?

2.8 years ago

Sam • 0

It seems like this should be feasible, however, I'm not well versed in R and have only begun dabbling with ballgown.

Currently, my only thought is to pull data out of ballgown and create intermediate files containing two columns:

- a column of genes
- a column of associated transcript IDs

   seqnames  id
NC_007175.2  1
NC_007175.2  2
NC_007175.2  3
NC_007175.2  4
NC_007175.2  5
NC_035780.1  6

and then do something more "basic" in Bash, such as awk '{print $1}' transcript-to-genes.txt | sort | uniq -c, to count the transcripts per gene.

I'd prefer a full R solution in order to keep things tidy (i.e. no intermediate files and no switching between languages).

Any suggestions/help would be much appreciated.

EDITED: Made Bash code accurate.

ballgown R • 621 views

Hi, I suppose you used StringTie to assemble the transcripts, correct? If so, you can find the transcripts directly in the .gtf file created for each sample, with the gene associated with each transcript recorded on the same line. It's already a "column of genes and a column of transcripts" file.
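
To stay entirely in R, something along these lines might work. This is only a minimal base R sketch, not a ballgown function: it assumes each per-sample GTF has "transcript" feature rows and that gene_id and transcript_id appear as key "value" pairs in the ninth (attribute) field. The GTF lines, sample names, and the helper names get_attr and count_transcripts_per_gene are made up for illustration.

```r
# Hypothetical StringTie-style GTF lines for one sample (illustrative data only).
gtf_lines <- c(
  "# header comment line",
  "chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id \"G1\"; transcript_id \"G1.1\";",
  "chr1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id \"G1\"; transcript_id \"G1.1\"; exon_number \"1\";",
  "chr1\tStringTie\ttranscript\t150\t800\t1000\t+\t.\tgene_id \"G1\"; transcript_id \"G1.2\";",
  "chr2\tStringTie\ttranscript\t50\t400\t1000\t-\t.\tgene_id \"G2\"; transcript_id \"G2.1\";"
)

# Extract the value of one attribute key from the GTF attribute field.
# The key must sit at the start of the field or right after "; ",
# so "gene_id" does not accidentally match inside "ref_gene_id".
get_attr <- function(attrs, key) {
  pattern <- paste0('(^|; )', key, ' "([^"]*)"')
  m <- regmatches(attrs, regexec(pattern, attrs))
  vapply(m, function(x) if (length(x) == 3) x[3] else NA_character_,
         character(1))
}

# Count distinct transcripts per gene from the lines of one GTF file.
count_transcripts_per_gene <- function(lines) {
  gtf <- read.delim(text = lines, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  tx <- gtf[gtf$V3 == "transcript", ]  # keep transcript rows, drop exon rows
  pairs <- unique(data.frame(
    gene_id = get_attr(tx$V9, "gene_id"),
    transcript_id = get_attr(tx$V9, "transcript_id"),
    stringsAsFactors = FALSE
  ))
  as.data.frame(table(gene_id = pairs$gene_id),
                responseName = "n_transcripts", stringsAsFactors = FALSE)
}

# One entry per sample; the second hypothetical sample lacks transcript G1.2.
samples <- list(sample01 = gtf_lines, sample02 = gtf_lines[-4])
counts <- lapply(samples, count_transcripts_per_gene)
```

With real data, each list entry would come from readLines() on that sample's .gtf file, and counts would hold one gene-by-count table per sample, with no intermediate files needed.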

2.8 years ago by Vitor1 ▴ 120
