Hi,
I am new to R and RNA-Seq. I use StringTie to assemble my RNA-Seq data and Ballgown to find differentially expressed genes (DEGs). I have been trying to add data from a related table to a data.frame. Below are the steps I followed:
library(ballgown)
library(genefilter)  # provides rowVars(), used in the filtering step below
pheno_data = read.csv("Den.csv")
Below is the content of my Den.csv:
"ids","pgroup","type"
"RS5206790","Res","IF"
"RS5206791","Res","NIF"
"RS5206792","Res","IF"
"RS5206795","Res","NIF"
"RS5206798","Res","IF"
"RS5206799","Res","NIF"
"RS5206828","Sus","IF"
"RS5206829","Sus","NIF"
"RS5206830","Sus","IF"
"RS5206831","Sus","IF"
"RS5206832","Sus","NIF"
"RS5206833","Sus","NIF"
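A hedged aside on this table: in plain R, a character column used as a model covariate is coded with its alphabetically first level (here Res) as the reference, so the estimated effect is "Sus relative to Res". The sketch below reads the same table (inlined with text= so it runs on its own), fixes the level order explicitly, checks that the design is balanced, and shows which coefficient R's default treatment coding produces. The explicit c("Res", "Sus") order is an assumption, and I am assuming stattest() builds its design in a similar way; it is also worth confirming that the ids match the sample folder names, since Ballgown uses pData to line up samples.

```r
# Same content as Den.csv, inlined so this sketch runs without the file.
den_csv <- '"ids","pgroup","type"
"RS5206790","Res","IF"
"RS5206791","Res","NIF"
"RS5206792","Res","IF"
"RS5206795","Res","NIF"
"RS5206798","Res","IF"
"RS5206799","Res","NIF"
"RS5206828","Sus","IF"
"RS5206829","Sus","NIF"
"RS5206830","Sus","IF"
"RS5206831","Sus","IF"
"RS5206832","Sus","NIF"
"RS5206833","Sus","NIF"'

pheno_data <- read.csv(text = den_csv, stringsAsFactors = FALSE)

# Set the level order explicitly (assumption: Res is the reference group).
pheno_data$pgroup <- factor(pheno_data$pgroup, levels = c("Res", "Sus"))
pheno_data$type   <- factor(pheno_data$type)

# Design check: how many IF / NIF samples fall in each pgroup.
design_counts <- table(pheno_data$pgroup, pheno_data$type)
print(design_counts)

# Treatment coding: the pgroup coefficient is named after the non-reference
# level, i.e. it estimates Sus relative to Res.
design_cols <- colnames(model.matrix(~ pgroup + type, data = pheno_data))
print(design_cols)
```

With this table each pgroup contains three IF and three NIF samples, so the type adjustment is balanced across groups.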
Then I load the StringTie output into Ballgown together with this phenotype table:
bg_tb <- ballgown(dataDir = "../ATB/Ballgown", samplePattern = "RS", pData = pheno_data)
Then I filter out low-abundance transcripts (keeping only those whose expression variance across samples is greater than 1):
bg_tb_filt= subset(bg_tb, "rowVars(texpr(bg_tb)) > 1", genomesubset=TRUE)
Next, I test for differentially expressed transcripts between the two pgroups, adjusting for type:
results_transcripts = stattest(bg_tb_filt, feature = "transcript", covariate = "pgroup", adjustvars = c("type"), getFC = TRUE, meas = "FPKM")
Then I add the gene names and IDs:
results_transcripts = data.frame(geneNames=ballgown::geneNames(bg_tb_filt), geneIDs=ballgown::geneIDs(bg_tb_filt), results_transcripts)
Here is a sample of the results:
geneNames geneIDs feature id fc pval qval
MTND1P23 MSTRG.30 transcript 89 1.2628495 0.185639798 0.59743102
MTND2P28 MSTRG.31 transcript 90 1.3679515 0.038550274 0.34349762
MTCO1P12 MSTRG.32 transcript 91 1.2878662 0.102384645 0.50014745
MTCO2P12 MSTRG.34 transcript 93 0.8824411 0.544662788 0.83330385
AL6698317 MSTRG.35 transcript 116 1.1581505 0.268141448 0.67138119
The issue is that I need to add the "pgroup" data (from Den.csv) to the results above. Is this possible? I want to plot the genes according to pgroup, and I have been working on this for many days without success.
I would appreciate your help and advice. Thanks.
Can you provide a little more context? I.e., what do the columns of Den.csv represent (samples? genes?) It might also help to see the structure of the intermediate objects after each step (e.g. str(results_transcripts), str(bg_tb), etc.). Currently it is not clear to me how the genes are related to the entries of Den.csv.
Hi Friederike,
The first column (ids) of Den.csv represents the RNA-Seq samples; there are 12 samples altogether. The genes/transcripts assembled by StringTie come from these samples.
The structures are as follows.
The output of str(bg_tb_filt) is too large to post here, so I trimmed it and took the screenshot below:
https://ibb.co/9VwYz2Q
The Den.csv data can be seen under @indexes$pData.
Thanks.
I'm still a bit lost. This line seems to extract the fold change (fc) for comparing the two pgroups you have (Res, Sus):
results_transcripts = stattest(bg_tb_filt, feature = "transcript", covariate = "pgroup", adjustvars = c("type"), getFC = TRUE, meas = "FPKM")
So why would you want to add the per-sample information to this, given that each fold change is the result of comparing multiple samples (Res vs. Sus)? Can you perhaps sketch out the ideal table you'd want in the end?
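To illustrate the point above: since fc summarizes a comparison, what can be attached to each transcript row is the mean expression in each pgroup, computed from the expression matrix and the phenotype table. The sketch below uses a hypothetical helper, group_means(), which is not a Ballgown function. It assumes the matrix columns are named like "FPKM.<sample id>" (how I understand texpr(bg, meas = "FPKM") labels them) and that the matrix rows are in the same order as results_transcripts.

```r
# Hypothetical helper (not part of Ballgown): mean expression per group.
# expr:  numeric matrix, transcripts in rows, samples in columns
# pheno: data.frame with one row per sample (sample ids + group labels)
group_means <- function(expr, pheno, id_col = "ids", group_col = "pgroup",
                        prefix = "FPKM.") {
  # Assumption: columns are named "<prefix><sample id>"; strip the prefix.
  sample_ids <- sub(prefix, "", colnames(expr), fixed = TRUE)
  # Look up each column's group by sample id, so pheno row order is irrelevant.
  groups <- as.character(pheno[[group_col]][match(sample_ids, pheno[[id_col]])])
  if (anyNA(groups)) stop("some expression columns have no matching sample id")
  lvls <- sort(unique(groups))
  # One vector of row means per group; drop = FALSE keeps single-sample groups.
  means <- lapply(lvls, function(g) rowMeans(expr[, groups == g, drop = FALSE]))
  out <- as.data.frame(do.call(cbind, means))
  names(out) <- paste0("mean_", lvls)
  out
}

# Usage with the objects from this thread (requires the ballgown package):
# fpkm <- texpr(bg_tb_filt, meas = "FPKM")
# results_transcripts <- cbind(results_transcripts,
#                              group_means(fpkm, pData(bg_tb_filt)))
```

This keeps the existing results table and simply appends mean_Res and mean_Sus columns, which also makes it easy to see which group a given fc points towards.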
This is because, with my current results, I do not know how to compare the differentially expressed transcripts between the two groups. For instance, in results_transcripts above, MTCO2P12 has an fc below 1 (0.88), so it appears to be expressed at a lower level. How can I tell whether it belongs to Res or Sus?
The table I have in mind would include an additional column indicating whether each transcript belongs to Res or Sus, for example (Table 1):
pgroup geneNames geneIDs feature id fc pval qval
Res MTND1P23 MSTRG.30 transcript 89 1.2628495 0.185639798 0.59743102
Res MTND2P28 MSTRG.31 transcript 90 1.3679515 0.038550274 0.34349762
Res MTCO1P12 MSTRG.32 transcript 91 1.2878662 0.102384645 0.50014745
Sus MTCO2P12 MSTRG.34 transcript 93 0.8824411 0.544662788 0.83330385
Res AL6698317 MSTRG.35 transcript 116 1.1581505 0.268141448 0.67138119
However, it would be great if there were another way to get these results without generating a new table like Table 1.
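A possible way to get something close to Table 1 without a separate table is to add a "direction" column derived from fc to the existing results. This is a sketch under a loudly stated assumption: I am assuming fc compares the second pgroup level against the first (Sus / Res with alphabetical levels), so fc > 1 would mean higher in Sus. Please verify this on your own data (e.g. against the per-group means for one transcript) before relying on the labels. label_direction() is a hypothetical helper, and the rows below are the five shown in the question.

```r
# The five example rows from the question (subset of columns).
res <- data.frame(
  geneNames = c("MTND1P23", "MTND2P28", "MTCO1P12", "MTCO2P12", "AL6698317"),
  fc   = c(1.2628495, 1.3679515, 1.2878662, 0.8824411, 1.1581505),
  pval = c(0.185639798, 0.038550274, 0.102384645, 0.544662788, 0.268141448),
  qval = c(0.59743102, 0.34349762, 0.50014745, 0.83330385, 0.67138119)
)

# Hypothetical helper: say which group has the higher expression.
# ASSUMPTION: fc = numerator / denominator (Sus / Res here). Verify first.
label_direction <- function(fc, numerator = "Sus", denominator = "Res") {
  ifelse(fc > 1, paste("higher in", numerator),
         ifelse(fc < 1, paste("higher in", denominator), "no change"))
}

res$direction <- label_direction(res$fc)
res$log2fc    <- log2(res$fc)  # symmetric around 0, easier to read than fc
print(res[, c("geneNames", "fc", "log2fc", "direction", "qval")])
```

On the full results this is one extra column, e.g. results_transcripts$direction <- label_direction(results_transcripts$fc). Note that direction says nothing about significance: in these five rows only MTND2P28 has pval < 0.05, and every qval is above 0.05.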
I also need to draw a volcano plot for each pgroup (Res and Sus) individually, in order to compare the significance of differential expression against the fc in each group.
Thanks.
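A hedged base-R sketch of a volcano plot. A volcano plot normally shows one comparison (here Sus vs. Res) rather than one plot per group: each point is a transcript, the x-axis is log2(fc) and the y-axis is -log10(pval), and the side of zero a point falls on shows which group it is higher in (under the same, unverified assumption that fc = Sus / Res). The helper name volcano_data() and the cut-offs (|log2 fc| >= 1, qval < 0.05) are illustrative choices, not Ballgown defaults.

```r
# Hypothetical helper: add volcano-plot coordinates and a category to a
# stattest()-style results table (needs columns fc, pval, qval).
volcano_data <- function(res, lfc_cut = 1, q_cut = 0.05,
                         numerator = "Sus", denominator = "Res") {
  res$log2fc    <- log2(res$fc)
  res$neglog10p <- -log10(res$pval)
  sig <- res$qval < q_cut & abs(res$log2fc) >= lfc_cut
  res$category <- ifelse(!sig, "not significant",
                         ifelse(res$log2fc > 0, paste("up in", numerator),
                                paste("up in", denominator)))
  res
}

# Example with the five rows shown in the question; with the real data use
# vd <- volcano_data(results_transcripts) instead.
res <- data.frame(
  fc   = c(1.2628495, 1.3679515, 1.2878662, 0.8824411, 1.1581505),
  pval = c(0.185639798, 0.038550274, 0.102384645, 0.544662788, 0.268141448),
  qval = c(0.59743102, 0.34349762, 0.50014745, 0.83330385, 0.67138119)
)
vd <- volcano_data(res)

# Colour each point by category; dashed lines mark the fold-change cut-off.
cols <- c("not significant" = "grey60", "up in Sus" = "firebrick",
          "up in Res" = "steelblue")
plot(vd$log2fc, vd$neglog10p, pch = 19, col = cols[vd$category],
     xlab = "log2 fold change (Sus / Res, assumed)", ylab = "-log10(p-value)",
     main = "Volcano plot: Sus vs Res")
abline(v = c(-1, 1), lty = 2)
legend("topleft", legend = names(cols), col = cols, pch = 19, bty = "n")
```

Colouring by category gives the Res/Sus split within a single plot, which avoids having to define a separate "significance per group" that the test does not produce.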