Hello scientists,
I ran RSEM to calculate gene- and isoform-level expression.
Code to prepare the reference:
rsem-prepare-reference --gtf mm9.gtf --transcript-to-gene-map knownIsoforms.txt --bowtie2 mm9.fa musmus
I downloaded the FASTA files from: http://hgdownload.soe.ucsc.edu/goldenPath/mm9/chromosomes/
knownIsoforms.txt from: http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/
GTF file from the UCSC Table Browser.
Code to calculate expression:
rsem-calculate-expression --paired-end --bowtie2 forward_1.fastq reverse_2.fastq ref/musmus cellnumber1
Results:
This is the output I got in the cellnumber1.genes.results file:
gene_id transcript_id(s) length effective_length expected_count TPM FPKM
1 uc007aet.1,uc007aeu.1 3621.00 3338.70 0.00 0.00 0.00
10 uc011whv.1 26.00 0.00 0.00 0.00 0.00
100 uc007amd.1,uc007ame.1 4355.00 4072.70 1.80 0.32 0.17
1000 uc007dac.1 1403.00 1120.70 0.00 0.00 0.00
10000 uc008ajp.1,uc012ajs.1 1415.50 1133.20 0.00 0.00 0.00
10001 uc008ajq.1 2046.00 1763.70 0.00 0.00 0.00
10002 uc008ajr.1,uc008ajs.1,uc008ajt.1,uc008aju.1,uc012ajt.1 6290.60 6008.64 0.00 0.00 0.00
I don't see any gene names in the gene_id column; it shows only numbers, and I don't know why. Is this output correct? How do I get the gene information? (In some tutorials the output looks different from this.)
Thanks in advance! Please help!
But you do have transcript IDs, right? For example, uc007aet.1 and uc008ajq.1. Those are knownGene identifiers, corresponding to the knownGene transcriptome you downloaded.
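A likely explanation for the numeric gene_id values, sketched below under assumptions: RSEM takes the gene ID from the first column of the file passed to --transcript-to-gene-map, and knownIsoforms.txt is assumed here to hold two tab-separated columns, a numeric UCSC cluster ID followed by a knownGene transcript ID. The example rows are reconstructed from the genes.results output above, not copied from the real file, and the function name is hypothetical.

```python
from collections import defaultdict


def group_transcripts_by_cluster(lines):
    """Group transcript IDs by the ID found in the first column of each line.

    Assumes two tab-separated columns: cluster ID, then transcript ID.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        cluster_id, transcript_id = line.split("\t")[:2]
        clusters[cluster_id].append(transcript_id)
    return dict(clusters)


# Illustrative rows (assumed layout), matching gene_id 1 and 10 in the output above.
example_rows = [
    "1\tuc007aet.1",
    "1\tuc007aeu.1",
    "10\tuc011whv.1",
]

clusters = group_transcripts_by_cluster(example_rows)
for cluster_id, transcripts in clusters.items():
    print(cluster_id, ",".join(transcripts))
```

If that assumption holds, the numbers are cluster identifiers rather than an error, and each one still identifies a single gene-level group of transcripts.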
Yes, WouterDeCoster, but I want to do differential expression analysis, so I need gene names. I could map the transcript IDs to gene names (using some tool or the UCSC Table Browser), but a single line can contain multiple transcript IDs separated by commas, and I don't know how to handle that.
P.S. I have 70 sequences. If this does not work, I will have to redo everything with an Ensembl reference. Please help me!
Thanks for your response.
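One possible way to handle the comma-separated transcript IDs without rebuilding the reference, as a minimal sketch: split the transcript_id(s) field, look each ID up in a transcript-to-symbol table, and join the unique symbols. The sketch assumes a tab-separated cross-reference file with the transcript ID in the first column and the gene symbol in the fifth (the layout I believe UCSC's kgXref.txt uses, which should be checked against the actual file). The function names are hypothetical, and "GeneA" is a placeholder, not the real symbol for these transcripts.

```python
def load_symbol_map(xref_lines, id_col=0, symbol_col=4):
    """Build {transcript_id: gene_symbol} from tab-separated rows.

    The column positions are assumptions; adjust them to the real file.
    """
    symbols = {}
    for line in xref_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > max(id_col, symbol_col) and fields[symbol_col]:
            symbols[fields[id_col]] = fields[symbol_col]
    return symbols


def annotate_genes(results_lines, symbols):
    """Yield (gene_id, gene_names, expected_count) for each genes.results row."""
    lines = iter(results_lines)
    header = next(lines).rstrip("\n").split("\t")
    id_idx = header.index("gene_id")
    tx_idx = header.index("transcript_id(s)")
    count_idx = header.index("expected_count")
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_idx, tx_idx, count_idx):
            continue  # skip blank or malformed rows
        transcripts = fields[tx_idx].split(",")
        # Collapse the transcripts of one gene to its unique symbols.
        names = sorted({symbols[t] for t in transcripts if t in symbols})
        yield fields[id_idx], ";".join(names) or "NA", fields[count_idx]


# Placeholder cross-reference rows; "GeneA" is NOT a real annotation.
xref_rows = [
    "uc007amd.1\t\t\t\tGeneA",
    "uc007ame.1\t\t\t\tGeneA",
]
# Two rows taken from the genes.results output above, tab-separated.
results_rows = [
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM",
    "100\tuc007amd.1,uc007ame.1\t4355.00\t4072.70\t1.80\t0.32\t0.17",
    "10001\tuc008ajq.1\t2046.00\t1763.70\t0.00\t0.00\t0.00",
]

symbols = load_symbol_map(xref_rows)
annotated = list(annotate_genes(results_rows, symbols))
for gene_id, names, count in annotated:
    print(gene_id, names, count, sep="\t")
```

With real files, the two lists would be replaced by open file handles. The numeric gene_id can stay as the row identifier during differential expression, with the symbols attached to the result table afterwards, so the 70 runs would not need to be repeated just to get names.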