I am working on my NGS data for gene set enrichment analysis now. I wonder how can I convert my .fastq CleanData received from sequencing company into .gct file for GSEA software?
I know some software like GenePattern can convert some file format such as .arff and .pcl into .gct file, but it seems no such tool is available for .fastq file!
As WouterDeCoster said, you are really skipping too many steps.
If all you care about is GSEA, you can rank your genes using some criteria (such as from DEG analysis) and prepare a rank file. You can then feed the ranked gene list to GSEA.
Thanks! It's really helpful. I tried this script and it works but the row names and column names in the output file contains quote for each gene/cell. How to remove the quotes as GSEA gene set doesn't use quote.
You are skipping an awful lot of steps of your analysis.
Your fastq data first needs to be aligned to the genome, reads need to be counted and you need to perform differential expression analysis. The results of the last step can be used in GSEA.
Thank you so much! I have worked it out, what you said is really true. It was my first time working on NGS data. I finally found out the sequencing company actually processed the data somewhat, such as alignment and reads counting.
As WouterDeCoster said, you are really skipping too many steps.
If all you care about is GSEA, you can rank your genes using some criteria (such as from DEG analysis) and prepare a rank file. You can then feed the ranked gene list to GSEA.