2.5 years ago
fifty_fifty
I have 10 FASTQ files from RNA sequencing and need to do differential expression analysis between two groups of samples. Each group has 5 patients. What would be the best pipeline for this? Do I align each FASTQ file to the genome and then somehow combine the 5 resulting files per group? I suppose it should be straightforward.
Basic question before you begin - do you have 10 files or 10 pairs of files? Make sure you have sequencing information for all your samples.
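If it helps, here is a rough Python sketch for checking whether a folder listing looks single-end or paired-end. It assumes a hypothetical naming convention (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` for pairs, `<sample>.fastq.gz` for single files); your sequencing facility may name files differently, so treat the pattern as a placeholder.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) naming: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
# for paired-end data, <sample>.fastq.gz for single-end data.
# Adjust this pattern to match your own file names.
FASTQ_PATTERN = re.compile(
    r"^(?P<sample>.+?)(?:_R?(?P<mate>[12]))?\.(?:fastq|fq)(?:\.gz)?$"
)


def group_fastq_files(filenames):
    """Map each sample name to the sorted list of FASTQ files that belong to it."""
    samples = defaultdict(list)
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognised FASTQ name: {name}")
        samples[match.group("sample")].append(name)
    return {sample: sorted(files) for sample, files in samples.items()}


def describe_layout(samples):
    """Return 'single', 'paired', or 'mixed' based on files per sample."""
    sizes = {len(files) for files in samples.values()}
    if sizes == {1}:
        return "single"
    if sizes == {2}:
        return "paired"
    return "mixed"
```

A result of "mixed" usually means a mate file is missing or a name does not follow the assumed pattern. Note that a single-end file that happens to end in `_1` or `_2` would be misread as a mate under this pattern.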
Look into a simple pseudo-alignment + DESeq2 pipeline (Salmon or kallisto, for example) or a STAR/RSEM + DESeq2 pipeline to go from raw sequence to counts to DE analysis.
Always do some exploratory analysis (such as PCA) to check that biological replicates from the same sample group cluster well together before you combine them in a single analysis. When you perform DE analysis downstream (with DESeq2, for example), always provide a count matrix that includes every single biological replicate. Based on the experiment design table (often referred to as colData), DESeq2 can carry out the DE analysis between the two sample groups and apply the appropriate statistics.
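To make the layout concrete, here is a small Python sketch of what "one column per replicate plus a design table" looks like. It is only an illustration: DESeq2 itself is an R package, the sample names, gene names, and counts are made-up toy values, and the sample-to-sample correlation on log2(count + 1) is a quick stand-in for a sanity check, not a PCA.

```python
import math


def build_count_matrix(per_sample_counts):
    """Combine per-sample gene counts into one matrix, one column per replicate."""
    samples = list(per_sample_counts)
    genes = sorted(set().union(*(c.keys() for c in per_sample_counts.values())))
    # Genes missing from a sample are treated as zero counts (an assumption).
    matrix = {g: [per_sample_counts[s].get(g, 0) for s in samples] for g in genes}
    return samples, matrix


def check_design(samples, coldata):
    """Require design rows to match the matrix columns, in the same order."""
    if [row["sample"] for row in coldata] != samples:
        raise ValueError("coldata rows must match count matrix columns, in order")
    return {row["sample"]: row["condition"] for row in coldata}


def log_profile(matrix, column):
    """log2(count + 1) values of one sample (column) across all genes."""
    return [math.log2(row[column] + 1) for row in matrix.values()]


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Toy, made-up counts: two replicates per group instead of five, for brevity.
toy_counts = {
    "ctrl_1": {"geneA": 10, "geneB": 0, "geneC": 5},
    "ctrl_2": {"geneA": 12, "geneB": 1, "geneC": 4},
    "case_1": {"geneA": 3, "geneB": 20, "geneC": 5},
    "case_2": {"geneA": 2, "geneB": 25},
}
toy_coldata = [
    {"sample": "ctrl_1", "condition": "control"},
    {"sample": "ctrl_2", "condition": "control"},
    {"sample": "case_1", "condition": "case"},
    {"sample": "case_2", "condition": "case"},
]

samples, matrix = build_count_matrix(toy_counts)
conditions = check_design(samples, toy_coldata)
```

The order check mirrors what DESeq2 expects: the columns of the count matrix and the rows of colData should describe the same samples in the same order. Replicates stay as separate columns; nothing is summed or averaged per group.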