2.8 years ago
Sam
It seems like this should be feasible; however, I'm not well versed in R and have only begun dabbling with ballgown.
Currently, my only thought is to pull data out of ballgown and create intermediate files containing two columns:
- a column of genes
- a column of associated transcript IDs
seqnames id
NC_007175.2 1
NC_007175.2 2
NC_007175.2 3
NC_007175.2 4
NC_007175.2 5
NC_035780.1 6
and then do something more "basic" in Bash, like awk '{print $1}' transcript-to-genes.txt | sort | uniq -c.
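For reference, a minimal sketch of that pipeline run on the six sample rows above. It assumes the intermediate file keeps the header row shown (hence the `NR > 1` skip, which is an addition to the one-liner) and is whitespace-delimited; the function name `count_first_column` and the temporary file are hypothetical, for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count how often each value in column 1 occurs, skipping a one-line header.
# Prints "<value><TAB><count>", sorted by value.
count_first_column() {
  awk 'NR > 1 {print $1}' "$1" | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# The six sample rows from the post, header included, in a temporary file.
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
seqnames id
NC_007175.2 1
NC_007175.2 2
NC_007175.2 3
NC_007175.2 4
NC_007175.2 5
NC_035780.1 6
EOF

count_first_column "$sample"
```

Without the header skip, the header's first field (seqnames) would be counted as if it were a data value.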
I'd prefer a full R solution in order to keep things tidy (i.e., no intermediate files and no switching between languages).
Any suggestions/help would be much appreciated.
EDITED: Made Bash code accurate.
Hi, I suppose you used StringTie to assemble the transcripts, correct? If so, you can find the transcripts directly in the .gtf file created for each sample, with the associated gene listed alongside each transcript (the gene_id and transcript_id attributes). It's already a "column of genes and a column of transcripts" file.
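A rough sketch of that idea in Bash/awk, not an R solution: it pulls the gene_id / transcript_id pair from each transcript row of a GTF and then counts transcripts per gene. It assumes the usual nine tab-separated GTF columns, with the feature type in column 3 and gene_id appearing before any other attribute whose name ends in gene_id. The toy file, the STRG.* IDs, the coordinates, and the function names are all made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print "<gene_id><TAB><transcript_id>" for every transcript row of a GTF.
gene_transcript_pairs() {
  awk -F'\t' '
    $3 == "transcript" && match($9, /gene_id "[^"]+"/) {
      # Skip the 9 characters of: gene_id "   and drop the closing quote.
      gene = substr($9, RSTART + 9, RLENGTH - 10)
      if (match($9, /transcript_id "[^"]+"/)) {
        # Skip the 15 characters of: transcript_id "
        tx = substr($9, RSTART + 15, RLENGTH - 16)
        print gene "\t" tx
      }
    }' "$1"
}

# Read gene/transcript pairs on stdin, print "<gene_id><TAB><count>".
count_per_gene() {
  cut -f1 | sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Toy stand-in for a per-sample GTF (hypothetical IDs and coordinates).
gtf="$(mktemp)"
trap 'rm -f "$gtf"' EXIT
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  NC_007175.2 StringTie transcript 1 1623 1000 + . 'gene_id "STRG.1"; transcript_id "STRG.1.1";' \
  NC_007175.2 StringTie exon 1 1623 1000 + . 'gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1";' \
  NC_007175.2 StringTie transcript 1 1200 1000 + . 'gene_id "STRG.1"; transcript_id "STRG.1.2";' \
  NC_035780.1 StringTie transcript 13578 14594 1000 + . 'gene_id "STRG.2"; transcript_id "STRG.2.1";' \
  > "$gtf"

gene_transcript_pairs "$gtf" | count_per_gene
```

On the toy file this reports two transcripts for STRG.1 and one for STRG.2; the exon row is ignored because only rows whose third column is transcript are kept.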