Entering edit mode
10.1 years ago
Dear friends,
I am basically a botanist and new to NGS. And I am sorry if any similar question was discussed before.
Comparing Resistant and susceptible plant genotypes infected with Fusarium (Fungi)
exp_study = DGEList(counts=rnaseqMatrix, group=conditions)
exp_study = calcNormFactors(exp_study)
et = exactTest(exp_study, dispersion=0.1)
tTags = as.data.frame(topTags(et,n=NULL))
Images in the link: https://www.dropbox.com/sh/bab4nocxe307vsw/AABWinfxpswy77eSEoxtuHJza?dl=0
I wonder if there is any problem in my input file containing the counts. I find that there are many 0 in the counts.
Is it a problem? How the logFC is calculated in pairwise comparison, if any one of the count is 0?
I believe the logFC value is calculated on the fitted model coefficient, rather than on the raw data
Stated differently, the logFC is the fit coefficient, possible with a prior applied.
Thanks Devon for prompt reply
The outsourced sequencing company provided me 2 types of files after the Abundance estimation using RSEM.
(i have 8 pools/different combinations). So what they have done is they find the abundance estimation of each pools separately which gives different headers such as gene_id,len,effect_length,expected counts,etc)
Then they merged the column
expected_counts
from all the pools.So they gave me
isoform.results
andgenes.results
.I proceeded with the
isoform.results
I would like to know whether the input I am using for logFC calculation is a good one?
How the fit model coefficient method is calculated and please suggest me the input to it.
Thanks and I'm afraid if my explanation is clear.
Good night!
Ah, the RSEM results (isoform.results) aren't used by edgeR at all. While it may be the case that you have 0 counts in the data used in edgeR, it may still not yield the expected infinite logFC (I'd have to look up whether edgeR constrains the optimization, shrinks the fold-changes a la DESeq2, or does something else entirely) even with all samples in 1 group having 0 counts.
Dear Dr. Devon
Thank you for the reply. Oh :(
So I'm in wrong way. Thanks a lot for the guidance.
Kindly let me know, how I can proceed with my datasets available.
Thanks again!
Sayuj
It'd help if you described the biological experiment (how it was structured, how many samples you have, what species you're using, etc.) and what has been done so far in terms of data processing (e.g. reads were aligned with tophat and the resulting BAM file was processed with RSEM ...).
Dear Dr.Devon
I am working on the interaction b/w Orchid and Fusarium
We have 2 types of Fusarium (Pathogenic (causes root rot of plant) and Non pathogenic)
Plant type: Resistant and susceptible
Pools used (combinations)
Pool 1: Susceptible plant vs Pathogenic fungus (we inoculated healthy plant with fungus): Disease is induced
Pool 2; Resistant plant vs Pathogenic fungus (we inoculated healthy plant with fungus): No disease
.........
Pool 8: Control Resistant plant (no fungus): No inoculation
Now work sheet provided by sequencing company
Reads from each sample were independently mapped on the reference transcriptome using bowtie (RSEM driven).
Now, I performed the EdgeR analysis with the output file provided by RSEM output as mentioned in the previous message.(I started working the first 2 pools in extreme interest) I am interested to find the defense genes in plants involved in resistance mechanism.
Hope this is clear!
Thanks a lot again;
Sayuj
Ah, that clears things up. What I would recommend is taking the estimated counts from RSEM and using them with limma (first processing them with the voom() function). This is a pretty reasonable method and is similar to what you would use for TCGA data (where all you can download is the output from RSEM). Have a look through the limma tutorial, it's a very powerful (though sometimes difficult to use) package. You can start with the same first two datasets and, once you have that figured out, then move to working on all of the datasets at once and using contrasts for specific questions.
Ok sir, Thanks for the points.
I will let you know, once I handle the data with limma.
Sayuj
Thanks Russ
I would also like to clarify my input file used for the logFC calculation, as mentioned below for Dr. Devon Ryan
Please help me in this regard. I'm totally confused b/w raw counts and expected counts!
Thanks a lot
Sayuj