10.5 years ago
Biogeek
Hey,
I'm still getting the hang of the whole RNA-seq and gene annotation process. After reading a publication, I have a couple of questions about things I have been thinking about lately:
- When filtering FPKM values in a spreadsheet, what value determines whether a transcript is present/expressed, silenced, or up-regulated? I generally assumed that a value greater than 0 means the transcript is expressed, but I have read otherwise. Can anyone shed light on this? Although RPKM is not the same as FPKM, one paper assumed that a transcript with RPKM ≥ 3 was over-expressed. Can anyone explain?
- I also have 3 different conditions: control, low and medium. For my comparisons I take the top 1000 up-regulated and down-regulated transcripts and compare Medium/Control vs Low/Control (taking FPKM values and working out fold changes and log2 values), using Blast2GO. I have also thought about simply comparing the top 1000 expressed transcripts in control vs medium vs low based on FPKM values, without working out the top up-/down-regulated transcripts via fold changes. Ultimately, what would be the best way to make these comparisons, and does my method make sense? What logical approaches have you taken when analysing an experiment with 3 treatments?
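A minimal Python sketch of the fold-change ranking described above, assuming the FPKM table has already been read into a dict of `{transcript: {condition: FPKM}}`. The pseudocount of 1 (to avoid dividing by zero or taking log2 of zero) and the function names are assumptions for illustration. This only ranks transcripts by the size of the change; unlike edgeR's own tests, it ignores replicate variance and says nothing about significance.

```python
import math


def log2_fold_changes(fpkm, condition, reference, pseudocount=1.0):
    """log2((condition + pseudocount) / (reference + pseudocount)) per transcript."""
    return {
        tid: math.log2((vals[condition] + pseudocount) / (vals[reference] + pseudocount))
        for tid, vals in fpkm.items()
    }


def top_up_down(lfc, n):
    """Top-n positive log2FC (largest first) and top-n negative log2FC (most negative first)."""
    up = sorted((t for t in lfc if lfc[t] > 0), key=lfc.get, reverse=True)[:n]
    down = sorted((t for t in lfc if lfc[t] < 0), key=lfc.get)[:n]
    return up, down


# Toy FPKM values for three hypothetical transcripts.
fpkm = {
    "TR1": {"control": 10.0, "low": 20.0, "medium": 40.0},
    "TR2": {"control": 5.0, "low": 5.0, "medium": 1.0},
    "TR3": {"control": 0.0, "low": 3.0, "medium": 7.0},
}

med_vs_ctrl = log2_fold_changes(fpkm, "medium", "control")
low_vs_ctrl = log2_fold_changes(fpkm, "low", "control")

up_med, down_med = top_up_down(med_vs_ctrl, 1000)
up_low, down_low = top_up_down(low_vs_ctrl, 1000)

# Transcripts up-regulated in both Medium/Control and Low/Control.
shared_up = set(up_med) & set(up_low)
print(up_med, down_med)   # ['TR3', 'TR1'] ['TR2']
print(sorted(shared_up))  # ['TR1', 'TR3']
```

Note how the pseudocount drives the result for TR3 (control FPKM 0): its log2FC of 3 comes entirely from the choice of pseudocount, which is one reason low-expression transcripts tend to dominate "top N" lists built this way.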
So far I have created Venn diagrams showing transcripts present in each condition, shared between conditions, and present in all conditions, as well as heat maps using edgeR, and I am currently using Blast2GO for GO and annotation comparisons. I just need to be 100% confident in my methods.
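The Venn regions can be computed with plain set operations. This sketch assumes "present" means FPKM > 0, as in the question; the threshold is left as a parameter because any such cutoff is essentially arbitrary. The transcript IDs and values are hypothetical.

```python
def presence_sets(fpkm, threshold=0.0):
    """Map each condition to the set of transcripts with FPKM above threshold."""
    conditions = sorted({c for vals in fpkm.values() for c in vals})
    return {c: {t for t, vals in fpkm.items() if vals[c] > threshold} for c in conditions}


def venn_regions(sets):
    """Transcripts present in all conditions, and those unique to each condition."""
    in_all = set.intersection(*sets.values())
    unique = {
        c: sets[c] - set().union(*(s for other, s in sets.items() if other != c))
        for c in sets
    }
    return in_all, unique


fpkm = {
    "TR1": {"control": 10.0, "low": 20.0, "medium": 40.0},
    "TR2": {"control": 5.0, "low": 0.0, "medium": 0.0},
    "TR3": {"control": 0.0, "low": 3.0, "medium": 7.0},
    "TR4": {"control": 0.0, "low": 0.0, "medium": 2.0},
}

sets = presence_sets(fpkm, threshold=0.0)
in_all, unique = venn_regions(sets)
shared_low_medium = sets["low"] & sets["medium"]  # pairwise overlap
```

Changing `threshold` moves transcripts between regions, so the diagram is only as meaningful as the cutoff behind it.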
Apologies for the very noobish questions, but I guess I have to learn somewhere :D
Kind regards.
In principle, you cannot use FPKM to define cutoffs for non-transcribed vs. transcribed genes; see: "Does FPKM scale incorrectly in case of unequal mapping rates?" and "What FPKM means in one sample". It is not clear to me that there is any sensible way of determining such a cutoff from a single sample alone, other than arbitrarily.
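To see why a fixed cutoff such as FPKM ≥ 3 does not mean the same thing from sample to sample, it helps to write out the definition: FPKM = fragments × 10^9 / (feature length in bp × total mapped fragments). The Python sketch below inverts that formula to show how many fragments a feature needs to reach a given cutoff. The lengths and library sizes are made-up illustrative values, and some tools use an effective (fragment-length-adjusted) length rather than the raw length.

```python
def fpkm(fragments, length_bp, total_mapped):
    """FPKM = fragments * 1e9 / (length in bp * total mapped fragments)."""
    return fragments * 1e9 / (length_bp * total_mapped)


def fragments_needed(fpkm_cutoff, length_bp, total_mapped):
    """Invert the FPKM formula: fragments required to reach fpkm_cutoff."""
    return fpkm_cutoff * length_bp * total_mapped / 1e9


# A short transcript in a small library vs. a long one in a deeper library.
short_small = fragments_needed(3, length_bp=500, total_mapped=10_000_000)
long_deep = fragments_needed(3, length_bp=5_000, total_mapped=50_000_000)
print(short_small, long_deep)  # 15.0 750.0
```

The same "expressed" label can therefore rest on 15 fragments or on 750. And because the denominator is the total mapped to all features, a large change in a few highly expressed genes shifts every other feature's FPKM as well.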
So in essence, am I safe going ahead with FPKM values? I used Trinity followed by RSEM and edgeR; the data were normalised before edgeR was applied. I took for granted that by loading the .FPKM file into Excel I could use custom filters to sort the data into what I needed to interpret. Is this wrong, and is calculating fold change and then log2 in Excel wrong too? Any help greatly appreciated. My idea is to see the change in gene representation across the different conditions in terms of GO terms.
Can I also ask: we get both a gene-level and a transcript-level FPKM file. Am I better off using the gene FPKM file for analysis rather than the transcript one?
Again, thank you.
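I can't recommend using edgeR with normalized estimated counts. edgeR models raw counts with a negative binomial distribution and handles library-size normalization itself, so pre-normalized counts kind of violate its statistical model. Perhaps you get vaguely correct results, perhaps not; it's tough to know.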
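If the goal is simply to drop features that are barely detected, a common convention when working from raw counts is a counts-per-million (CPM) filter: keep a gene only if CPM ≥ 1 in at least n samples, with n often chosen as the size of the smallest replicate group. edgeR provides its own `cpm()` for this; the Python sketch below only illustrates the arithmetic on raw counts, ignores edgeR's normalization factors, and its thresholds are assumptions rather than a biological expressed/silent boundary.

```python
def cpm(counts_by_sample):
    """{sample: {gene: raw count}} -> {sample: {gene: counts per million}}."""
    result = {}
    for sample, counts in counts_by_sample.items():
        lib_size = sum(counts.values())
        result[sample] = {g: c * 1e6 / lib_size for g, c in counts.items()}
    return result


def keep_detected(counts_by_sample, min_cpm=1.0, min_samples=2):
    """Genes with CPM >= min_cpm in at least min_samples samples."""
    values = cpm(counts_by_sample)
    genes = next(iter(counts_by_sample.values()))
    return sorted(
        g for g in genes
        if sum(values[s][g] >= min_cpm for s in values) >= min_samples
    )


# Hypothetical raw counts for three samples, each with 1,000,000 fragments.
raw = {
    "s1": {"g1": 500, "g2": 0, "g3": 999_500},
    "s2": {"g1": 300, "g2": 2, "g3": 999_698},
    "s3": {"g1": 0, "g2": 0, "g3": 1_000_000},
}
print(keep_detected(raw))  # ['g1', 'g3']
```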
If you're interested in looking at GO enrichment, just use the gene-level metrics. While one could theoretically hope to find different GO annotations per transcript, in practice this never happens.
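A minimal sketch of what moving to gene level involves, assuming a transcript-to-gene mapping is available (for example, the gene/isoform grouping Trinity assigns). Summing isoform expected counts per gene should match how gene-level counts relate to isoform counts in RSEM's gene vs. isoform output. The IDs, counts and GO terms below are hypothetical. Collapsing annotations the same way shows why per-transcript GO rarely adds anything: isoforms of one gene usually carry the same terms.

```python
from collections import defaultdict


def to_gene_level(tx_values, tx_to_gene):
    """Sum per-transcript values (e.g. expected counts) into per-gene totals."""
    genes = defaultdict(float)
    for tx, value in tx_values.items():
        genes[tx_to_gene[tx]] += value
    return dict(genes)


def gene_go_terms(tx_go, tx_to_gene):
    """Union of GO terms over all transcripts of each gene."""
    genes = defaultdict(set)
    for tx, terms in tx_go.items():
        genes[tx_to_gene[tx]].update(terms)
    return dict(genes)


# Hypothetical transcript IDs, counts and GO terms.
tx_to_gene = {"geneA.t1": "geneA", "geneA.t2": "geneA", "geneB.t1": "geneB"}
counts = {"geneA.t1": 120.5, "geneA.t2": 30.5, "geneB.t1": 8.0}
go = {
    "geneA.t1": ["GO:0006412"],
    "geneA.t2": ["GO:0006412"],
    "geneB.t1": ["GO:0005524"],
}

print(to_gene_level(counts, tx_to_gene))  # {'geneA': 151.0, 'geneB': 8.0}
print(gene_go_terms(go, tx_to_gene))
```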