4.4 years ago
mathavanbioinfo
Hello Biostars, I am currently running a WGCNA analysis on RNA-Seq data. The input gene count matrix contains both Ensembl gene IDs and MSTRG IDs. I now want to create an annotation file. I have the StringTie merged file; can I use it as the annotation file?
If so, that file contains many repeated gene symbols and Ensembl IDs. My question is: how can I extract the Ensembl gene IDs and the corresponding gene names from the StringTie file into a separate CSV file?
Desired annotation file format:
S.No, Ensembl id, Gene name
Which StringTie files do you have? Please elaborate. Also, obviously, a prerequisite for using WGCNA is to have already completed the WGCNA tutorials.
Thank you very much for your response. I followed this protocol: https://www.nature.com/articles/nprot.2016.095. At one step I had to run StringTie merge to combine all the samples with the GRCh38 reference annotation file downloaded from the Ensembl FTP.
$ stringtie --merge -p 8 -G chrX_data/genes/chrX.gtf -o stringtie_merged.gtf chrX_data/mergelist.txt
Please refer to this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown
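For what it's worth, here is a minimal Python sketch of one way to pull the ID-to-name mapping out of stringtie_merged.gtf. It assumes the transcript rows in the merged GTF carry `gene_id` (the MSTRG ID), and, where a transcript matched the reference, `ref_gene_id` (the Ensembl ID) and `gene_name`; please check a few lines of your own file first. The function names and the output file name `gene_annotation.csv` are just illustrative, not part of StringTie or WGCNA.

```python
import csv
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gene_annotations(gtf_lines):
    """Collect unique (gene_id, ref_gene_id, gene_name) tuples from transcript rows.

    Attributes that are absent (e.g. novel MSTRG genes without a reference
    match) are returned as empty strings.
    """
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # exon rows repeat the same attributes, so skip them
        attrs = dict(ATTR_RE.findall(fields[8]))
        seen.add((
            attrs.get("gene_id", ""),
            attrs.get("ref_gene_id", ""),
            attrs.get("gene_name", ""),
        ))
    return sorted(seen)


def write_annotation_csv(rows, out_path):
    """Write the rows to a CSV file with a running serial number."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["S.No", "StringTie gene id", "Ensembl id", "Gene name"])
        for serial, row in enumerate(rows, start=1):
            writer.writerow([serial, *row])


def gtf_to_csv(gtf_path, csv_path):
    """Convenience wrapper: read a merged GTF and write the annotation CSV."""
    with open(gtf_path) as gtf:
        rows = parse_gene_annotations(gtf)
    write_annotation_csv(rows, csv_path)
```

You would call it as `gtf_to_csv("stringtie_merged.gtf", "gene_annotation.csv")`. Collecting the tuples in a set removes the repetitions, since every transcript and exon of a gene repeats the same attributes. Note that one MSTRG gene can still appear on more than one row if its transcripts matched different reference genes, so you may need to decide how to handle those cases before using the table with WGCNA.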