Hi,
I am trying to set up pipelines for differential gene expression analysis and for identifying novel transcripts.
Here is my procedure:
I downloaded data (GSE121980) from NCBI SRA: one control and one treatment .fasta file, plus a reference genome (S. scrofa 10.2).
After quality assessment with FastQC, I aligned these files (control and treatment .fasta) to the S.scrofa.fa reference using HISAT2. As a result I have .sam and .bam files.
I tried Cufflinks and featureCounts (http://subread.sourceforge.net/).
I am not sure how to proceed next to identify differentially expressed genes. I am aware of DESeq and edgeR, but I am not sure what input files these pipelines require. I got count.txt from featureCounts (one for control and one for treatment).
Please advise how I should proceed with the differential gene expression analysis.
Please let me know of any ready-made script or steps to run DESeq2 on this data. I am not sure what type of file DESeq2 requires; I have an output file (count.txt) from featureCounts.
Here are some lines of the counts.txt file (the featureCounts output). Does this look like correct input for DESeq?
I am more familiar with edgeR; its input is a table with gene IDs and counts, like:
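A minimal sketch of that kind of table, plus a quick awk check that it is well-formed before loading it in R. The file name counts_matrix.txt, the gene IDs and the counts are hypothetical placeholders. The check assumes a tab-delimited file with one header line, the gene ID in column 1, and one raw integer count column per sample (edgeR and DESeq2 both work on raw counts, not normalized values).

```shell
#!/usr/bin/env bash
# Sanity-check a count matrix before loading it into edgeR or DESeq2.
# Assumed layout: tab-delimited, one header line, gene ID in column 1 and one
# raw integer count column per sample. File name and values are hypothetical.
set -euo pipefail

check_counts() {
  awk -F'\t' '
    NR == 1 { ncol = NF; next }                       # remember header width
    NF != ncol { print "line " NR ": expected " ncol " columns, found " NF; bad = 1; next }
    {
      for (i = 2; i <= NF; i++)                       # every count must be a non-negative integer
        if ($i !~ /^[0-9]+$/) { print "line " NR ": non-integer count " $i; bad = 1 }
    }
    END {
      if (bad) exit 1
      print "OK: " (NR - 1) " genes x " (ncol - 1) " samples"
    }
  ' "$1"
}

# Hypothetical example matrix: gene IDs plus one column per sample.
printf 'Geneid\tcontrol\ttreatment\nGENE_A\t10\t25\nGENE_B\t0\t3\n' > counts_matrix.txt
check_counts counts_matrix.txt
```

A non-zero exit status means at least one row needs fixing (for example, a stray header or comment line left in the middle of the file).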
Then in R:
You can use featureCounts output for DESeq2; however, you need to produce a counts matrix from your featureCounts output files and then use the DESeqDataSetFromMatrix() function from DESeq2 to import it. There is information in the vignette, in the Count Matrix Input section.
Can you please let me know how to produce a counts matrix from the featureCounts output files? I don't have much experience with R programming.
Thank you!
Producing the counts matrix can be done outside R. Unfortunately, we cannot really take you through everything step by step. First, identify the counts column in each featureCounts file, and then explore shell (Bash) commands that will let you merge these counts columns into a single file. Take a look at the cut and paste commands, which will help you here.
If you are really a beginner in R, then you should look for tutorials on basic R skills. If that is not enough, then I encourage you to seek help locally.
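For the first step (finding the counts column), here is a small sketch that numbers the header fields of a featureCounts table. It assumes the usual featureCounts layout: a "#" comment line, then Geneid, Chr, Start, End, Strand, Length and one column per BAM file. control_counts.txt is a hypothetical file built inline so the example runs on its own.

```shell
#!/usr/bin/env bash
# Number the header fields of a featureCounts table so you can see which
# column holds the counts before using cut. The example file is hypothetical.
set -euo pipefail

show_columns() {
  # Print "index<TAB>name" for the first non-comment line, then stop reading.
  awk -F'\t' '!/^#/ { for (i = 1; i <= NF; i++) print i "\t" $i; exit }' "$1"
}

{
  printf '# Program:featureCounts (hypothetical comment line)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tcontrol.bam\n'
  printf 'GENE_A\t1\t100\t900\t+\t801\t10\n'
} > control_counts.txt

show_columns control_counts.txt
```

Using awk with exit (rather than grep | head) reads only up to the header, so it stays fast on a full-size table.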
What I mean is: how do I prepare a matrix file from my count.txt file, like the one you showed?
As I understand it, do I need to take the numbers in the Chr column of my featureCounts output files and merge them into a single file to prepare an input file for edgeR?
Is that correct?
No, the Chr column holds the chromosome ID. The featureCounts table has the counts in the last column(s); in your example:
cut -f1,7 table > counts
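A slightly more defensive variant of that command, sketched on a hypothetical file named table. featureCounts writes a "#" comment line (program and command) above the header, and cut without -s passes lines that contain no tab through unchanged, so that line can end up in counts. Filtering it out with grep first avoids this. Column 7 holds the counts only when the file was produced from a single BAM file.

```shell
#!/usr/bin/env bash
# Extract gene ID + count (columns 1 and 7) from a single-sample featureCounts
# table, dropping the "#" comment line first. "table" is a hypothetical file.
set -euo pipefail

{
  printf '# Program:featureCounts (hypothetical comment line)\n'
  printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\tcontrol.bam\n'
  printf 'GENE_A\t1\t100\t900\t+\t801\t10\n'
  printf 'GENE_B\t1\t1200\t2000\t-\t801\t0\n'
} > table

# Remove comment lines, then keep the Geneid column and the counts column.
grep -v '^#' table | cut -f1,7 > counts
cat counts
```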
You can generate the full matrix using cut and paste, like:
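One way to build the full matrix for one control and one treatment file, as a sketch under stated assumptions: both files come from featureCounts run with the same annotation (so genes are listed in the same order), and each file has its counts in column 7. The file names control.txt and treatment.txt and the sample labels are hypothetical; the script refuses to merge if the gene IDs do not line up.

```shell
#!/usr/bin/env bash
# Build a gene x sample count matrix from two single-sample featureCounts
# tables with cut and paste. File names and values are hypothetical.
set -euo pipefail

make_table() {  # $1 = output file, $2 = sample label, $3/$4 = counts for GENE_A/GENE_B
  {
    printf '# Program:featureCounts (hypothetical comment line)\n'
    printf 'Geneid\tChr\tStart\tEnd\tStrand\tLength\t%s\n' "$2"
    printf 'GENE_A\t1\t100\t900\t+\t801\t%s\n' "$3"
    printf 'GENE_B\t1\t1200\t2000\t-\t801\t%s\n' "$4"
  } > "$1"
}
make_table control.txt control 10 0
make_table treatment.txt treatment 25 3

# paste joins lines by position, so the gene order must be identical.
if ! cmp -s <(grep -v '^#' control.txt | cut -f1) <(grep -v '^#' treatment.txt | cut -f1); then
  echo "gene IDs differ between files" >&2
  exit 1
fi

# Gene IDs plus control counts, then only the counts column from treatment.
grep -v '^#' control.txt   | cut -f1,7 > control.cols
grep -v '^#' treatment.txt | cut -f7   > treatment.cols

paste control.cols treatment.cols > counts_matrix.txt
cat counts_matrix.txt
```

If you run featureCounts once with both BAM files as input, its output already has one count column per BAM, and grep -v '^#' counts.txt | cut -f1,7- gives the same kind of matrix in one step.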
Thank you very much for your help. It works well.
Thanks, JC!