2.4 years ago
smurph50
My goal is to get a counts matrix with gene symbols from bulk RNA-seq data using kallisto.
I ran kallisto and got an abundance.tsv whose target_id column contains Ensembl transcript IDs (ENST00000361624.2, ENST00000355349.4).
When I convert to gene symbols, I have to drop the isoform portions (ENST00000361624.2 --> ENST00000361624). This results in multiple rows with the same gene symbol.
Can I map directly to the gene symbol instead of the Ensembl transcript ID?
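See tximport to correctly summarize transcript abundances to the gene level. Also, the number after the period in the Ensembl transcript ID is the transcript version, not an isoform: it keeps track of cases where the model of the transcript has changed in the reference annotation. The repeated gene symbols are expected, because a gene usually has several transcripts, and each one gets its own row until you summarize to the gene level.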
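To make the idea concrete, here is a minimal Python sketch of what "summarizing to gene level" means: strip the version suffix, look up the gene for each transcript, and sum est_counts per gene. This is not tximport and not a replacement for it (tximport can also account for transcript length, for example); it only illustrates the basic aggregation. The column names target_id and est_counts are the ones kallisto writes to abundance.tsv. The function names, the tx2gene dictionary, the gene labels GENE_A and GENE_B, the third transcript ID and all count values are hypothetical and made up for the example.

```python
import csv
import io
from collections import defaultdict


def strip_version(transcript_id):
    """Drop the version suffix: ENST00000361624.2 -> ENST00000361624."""
    return transcript_id.split(".", 1)[0]


def gene_level_counts(abundance_tsv, tx2gene):
    """Sum kallisto est_counts over all transcripts of each gene.

    abundance_tsv: an open text handle on a kallisto abundance.tsv
    tx2gene: dict mapping unversioned transcript IDs to gene symbols
             (assumed to come from your annotation, e.g. a GTF or BioMart)
    """
    totals = defaultdict(float)
    for row in csv.DictReader(abundance_tsv, delimiter="\t"):
        gene = tx2gene.get(strip_version(row["target_id"]))
        if gene is None:
            continue  # transcript missing from the mapping; skipped here
        totals[gene] += float(row["est_counts"])
    return dict(totals)


# Toy input with invented numbers, in the abundance.tsv layout.
demo_tsv = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "ENST00000361624.2\t1500\t1300\t10.0\t1.0\n"
    "ENST00000000001.1\t900\t700\t5.5\t0.5\n"
    "ENST00000355349.4\t2000\t1800\t7.0\t0.7\n"
)

# Hypothetical mapping: two transcripts belong to the same placeholder gene.
demo_tx2gene = {
    "ENST00000361624": "GENE_A",
    "ENST00000000001": "GENE_A",
    "ENST00000355349": "GENE_B",
}

demo_result = gene_level_counts(demo_tsv, demo_tx2gene)
print(demo_result)
```

In the toy input, the two GENE_A transcripts collapse into a single row, which is the behaviour you want instead of duplicated gene symbols. For a real analysis, tximport (or a comparable tool) remains the safer route.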