Hello, I am analysing RNA-seq data from three strains of Aspergillus niger, with three replicates for each condition (treatment and control). I have the count file for each strain as a tab-delimited file, and the sequence of each transcript is stored in a text file with its transcript ID. I want to use R to find the differentially expressed genes in each strain, and also to combine the same transcripts from each strain into one file so that I can compare their expression with a heatmap. Further, I would like to do a gene ontology analysis of the differentially expressed genes for each strain, as the organism is a non-model organism. Could someone help me with this task? Thank you for your help.
Thank you very much for your help. I just don't have the raw data, and gene ontology annotation for Aspergillus niger is lacking. So is it possible to use Trinity for that purpose?
Hi
Can you show us a bunch of transcript IDs from each strain?
Edit: It is important to know whether a reference strain of Aspergillus niger was used for transcript quantification.
Thank you for your answer. The reference strain was N402, and its transcripts and counts are as follows:
For the second strain, KCN5:
For the third strain, KJC3:
And for each strain, there is a separate file with the gene IDs and their sequences.
If the reference strain was N402, then KCN5 and KJC3 transcript IDs should look like those in Table 1 (N402).
This is only possible if the N402, KCN5 and KJC3 count tables share the same transcript IDs.
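To make the point about shared transcript IDs concrete, here is a minimal sketch of such a merge, written in plain Python rather than R. It assumes each count table is tab-delimited with the transcript ID in the first column and integer counts in the remaining columns; the function names, the IDs (tx1, tx2, tx3) and the column names are made up for illustration and are not taken from the actual files.

```python
import csv
import io


def read_counts(handle):
    """Read a tab-delimited count table.

    Assumes the first column holds the transcript ID and every other
    column holds integer counts for one sample.
    Returns (sample_names, {transcript_id: [counts]}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    counts = {row[0]: [int(x) for x in row[1:]] for row in reader if row}
    return header[1:], counts


def merge_on_shared_ids(tables):
    """Combine per-strain tables, keeping only IDs present in every strain.

    tables: {strain_name: (sample_names, counts)} as returned by read_counts.
    """
    shared = set.intersection(*(set(counts) for _, counts in tables.values()))
    header = ["transcript_id"] + [
        f"{strain}_{sample}"
        for strain, (samples, _) in tables.items()
        for sample in samples
    ]
    rows = [
        [tid] + [v for _, counts in tables.values() for v in counts[tid]]
        for tid in sorted(shared)
    ]
    return header, rows


# Made-up example tables; real files would be opened with open(path, newline="").
n402 = "id\tcontrol_1\ttreatment_1\ntx1\t10\t20\ntx2\t5\t0\n"
kcn5 = "id\tcontrol_1\ttreatment_1\ntx1\t7\t9\ntx3\t1\t2\n"
tables = {
    "N402": read_counts(io.StringIO(n402)),
    "KCN5": read_counts(io.StringIO(kcn5)),
}
header, rows = merge_on_shared_ids(tables)
print(header)
print(rows)
```

If the strains do not share IDs, the intersection is empty and the merged table has no rows, which is exactly the problem described above.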
This is not very helpful, since you have transcript-level counts.
Thank you for your reply. How can the same transcripts be combined based on their sequences? Is there a tool or script for it?
Any homology-based approach would do the job, but it is a subpar solution in my opinion. I would contact the guy who did the mapping for transcript quantification and ask him to keep the transcript IDs of the reference genome.
Thank you. The person who generated this data just isn't around anymore. Is there a tutorial for a similar analysis?
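As a rough illustration of matching by sequence, the sketch below handles only the simplest case: transcripts whose sequences are exactly identical between the reference strain and another strain. It is not a real homology search (that would need an alignment tool such as BLAST), and it will miss any transcript that differs by even one base. It assumes the sequence files are in FASTA format, which the thread does not confirm; the function names and the example IDs are hypothetical.

```python
import io


def read_fasta(handle):
    """Parse FASTA text into {transcript_id: sequence}.

    The ID is taken as the first word after '>'; sequences are upper-cased
    and multi-line records are joined.
    """
    parts, current = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            parts[current] = []
        elif current is not None:
            parts[current].append(line.upper())
    return {tid: "".join(chunks) for tid, chunks in parts.items()}


def map_ids_by_sequence(reference, other):
    """Return {other_id: reference_id} for sequences identical in both sets.

    A sequence that occurs under more than one reference ID is skipped,
    because the match would be ambiguous.
    """
    by_seq = {}
    for ref_id, seq in reference.items():
        by_seq.setdefault(seq, []).append(ref_id)
    mapping = {}
    for other_id, seq in other.items():
        hits = by_seq.get(seq, [])
        if len(hits) == 1:
            mapping[other_id] = hits[0]
    return mapping


# Made-up sequences standing in for the N402 and KCN5 transcript files.
reference = read_fasta(io.StringIO(">refA some description\nACGT\nacgt\n>refB\nTTTT\n"))
other = read_fasta(io.StringIO(">x1\nACGTACGT\n>x2\nGGGG\n"))
print(map_ids_by_sequence(reference, other))
```

The resulting mapping could then be used to rename the IDs in the KCN5 and KJC3 count tables to the N402 IDs before merging, with the unmatched transcripts left for a proper alignment-based comparison.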
I don't get this. It can't possibly be easier to ask on the internet than to get in touch with a person who worked in the same lab. Did they not leave any notes or contact address?
Yes, that is the problem: there are no notes, and the person who generated the data didn't leave any contact details or any note on how the counts were generated. Can you please let me know which table format can be used in R? For example, should I make one table with the three strains and their counts? For gene ontology, should it follow the same process? Thank you, and sorry for the trouble.
You said that you have counts - that should work with these scripts. The only other thing you need is a feature file (GFF or GTF) and you are good to go for DGE analysis. You will need something for gene ontology - Trinity is a de novo RNAseq assembler.
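On the table format: count-based DGE packages such as DESeq2 generally take a count matrix (one row per transcript, one column per sample) plus a small sample sheet that says which condition each column belongs to. Below is a minimal sketch of building that sample sheet for one strain, in plain Python. It assumes the count columns are named like control_1 or treatment_2, which is a made-up convention and not something confirmed by the actual files; the function names are hypothetical as well.

```python
import csv
import io


def sample_sheet(sample_names):
    """Derive a condition label for each count column from its name.

    Assumes (hypothetically) names of the form '<condition>_<replicate>',
    where the condition is 'control' or 'treatment'.
    """
    rows = []
    for name in sample_names:
        condition = name.rsplit("_", 1)[0]
        if condition not in ("control", "treatment"):
            raise ValueError(f"cannot infer condition from column name: {name}")
        rows.append((name, condition))
    return rows


def write_tsv(header, rows):
    """Serialise rows as tab-delimited text, which R can load with read.delim."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Three replicates per condition, as described for each strain.
columns = [
    "control_1", "control_2", "control_3",
    "treatment_1", "treatment_2", "treatment_3",
]
print(write_tsv(["sample", "condition"], sample_sheet(columns)))
```

The sample names in the sheet must match the column names of the count matrix, in the same order, so keeping one count matrix and one sample sheet per strain is the simplest layout for running the DGE analysis strain by strain.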
Thank you for the reply. Apart from the counts and the gene sequences, I don't have a feature file. For gene ontology, is it possible to use the reference strain N402?