Normally I do this the novice way: I put all the commands in a script. If I have 5 files, I write 5 commands, one per file, and make a directory for each sample by hand. That is tedious, and it becomes difficult when I have to handle a large number of files.
RD1_R1.fastq
RD1_R2.fastq
RD2_R1.fastq
RD2_R2.fastq
RD3_R1.fastq
RD3_R2.fastq
RD4_R1.fastq
RD4_R2.fastq
RD5_R1.fastq
RD5_R2.fastq
RD6_R1.fastq
RD6_R2.fastq
RD7_R1.fastq
RD7_R2.fastq
RD8_R1.fastq
RD8_R2.fastq
RD9_R1.fastq
RD9_R2.fastq
RD10_R1.fastq
RD10_R2.fastq
So I have a list of these paired-end files. I would like to make a directory for each sample, put both paired-end files in it, run fastqc, and then run tophat on the pair, with the results written to the same folder. For example, RD1_R1 and RD1_R2 are a pair: they would go in the RD1 folder, where I would run fastqc and then tophat on them. I would like to do this for all the paired-end files.
I know the basics of shell scripting, such as making a folder for each file and single-file operations, but this is a bit more complex: putting two files together and operating on them as a pair.
Any help or suggestions would be highly appreciated.
IMHO, most NGS workflows can be run in parallel.
1) To create 10 directories for the 20 files (10 paired-end samples), the following is the code:
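A minimal sketch of this step with GNU parallel, assuming GNU parallel is installed and the samples are numbered RD1 to RD10 as in the question. The function name make_dirs_parallel is made up for illustration.

```shell
# Hypothetical helper; assumes GNU parallel is on PATH.
make_dirs_parallel() {
    # seq prints 1..10, one number per line; parallel substitutes each
    # value for {}, so this runs "mkdir -p RD1" ... "mkdir -p RD10".
    seq 1 10 | parallel mkdir -p RD{}
}
```

Call make_dirs_parallel from the directory that holds the fastq files. mkdir -p does not complain if a directory already exists, so the call is safe to repeat.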
You can achieve the same by:
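One possibility that needs no extra tools is a plain shell loop. This is only a sketch, assuming the RD1 to RD10 sample names from the question:

```shell
# Plain-shell alternative: only seq and mkdir are needed.
# RD1..RD10 are the sample names from the question.
for i in $(seq 1 10); do
    mkdir -p "RD$i"
done
```

In bash the loop can be shortened to mkdir -p RD{1..10} using brace expansion.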
2) Now move each pair of files into its respective directory.
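A rough sketch of the move step, assuming every sample follows the <sample>_R1.fastq / <sample>_R2.fastq naming from the question. The function name move_pairs is hypothetical. The sample name is derived by stripping the suffix rather than by globbing on the prefix, because a pattern like RD1* would also match the RD10 files.

```shell
# Hypothetical helper: run it in the directory that holds the fastq files.
move_pairs() {
    for r1 in *_R1.fastq; do
        [ -e "$r1" ] || continue            # glob matched nothing
        sample=${r1%_R1.fastq}              # RD1_R1.fastq -> RD1
        r2=${sample}_R2.fastq
        if [ ! -e "$r2" ]; then
            # Leave unpaired files where they are and report them.
            echo "skipping $sample: $r2 not found" >&2
            continue
        fi
        mkdir -p "$sample"
        mv "$r1" "$r2" "$sample"/
    done
}
```

After move_pairs, RD1_R1.fastq and RD1_R2.fastq sit in RD1/, RD10_R1.fastq and RD10_R2.fastq in RD10/, and so on. The directory is created on the fly, so this also works if step 1 was skipped.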
3) Likewise, you can parallelize fastqc as well.
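A sketch of the fastqc step, assuming each pair already sits in its own RD* directory and that fastqc is on PATH. The function name run_fastqc is made up. Here the parallelism comes from fastqc's own -t option, which sets how many files it processes at once, so both mates of a pair are handled together; the samples themselves are processed one after another.

```shell
# Hypothetical helper: run it in the directory that contains RD1/, RD2/, ...
run_fastqc() {
    for dir in RD*/; do
        sample=${dir%/}                      # RD1/ -> RD1
        r1=$sample/${sample}_R1.fastq
        r2=$sample/${sample}_R2.fastq
        # Skip directories that do not hold a complete pair.
        [ -e "$r1" ] && [ -e "$r2" ] || continue
        # -t 2: process both mates at once; -o: write reports into the sample folder.
        fastqc -t 2 -o "$sample" "$r1" "$r2"
    done
}
```

The reports end up next to the reads in each sample folder, which is the layout asked for in the question.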
Thanks, let me try this. These are routine things, yet I have to rack my brain because I don't know how to put the pieces together.
Just take the parallel command and start by checking the output of:
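One thing worth checking first is the list of sample names that the later commands will receive. A small sketch, assuming the <sample>_R1.fastq naming from the question (list_samples is a made-up name):

```shell
# Hypothetical helper: print one sample name per line, derived from the R1 files.
list_samples() {
    for r1 in *_R1.fastq; do
        [ -e "$r1" ] || continue     # glob matched nothing
        echo "${r1%_R1.fastq}"       # strip the suffix: RD1_R1.fastq -> RD1
    done
}
```

If the printed names are what you expect (RD1, RD2, ... one per line), the same list can be fed to a loop or to parallel.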
Step by step, try to figure out how each component works. With this understanding, it will get easier for you to modify the command for different scenarios later on. A very helpful resource is explainshell, where you can paste shell commands and have each part explained to you.
Will try your solution.
If you are not sure what parallel is doing, append --dry-run to the parallel command. This displays what it would do without executing anything. Once you are sure, remove --dry-run.
For example:
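A sketch of such a dry run, assuming GNU parallel is installed and that each sample's reads sit in a folder named after the sample. The function name preview_fastqc and the directory layout are assumptions for illustration.

```shell
# Hypothetical helper: pass sample names as arguments, e.g. preview_fastqc RD1 RD2
preview_fastqc() {
    # --dry-run prints each command instead of running it;
    # -k keeps the output in the same order as the input.
    # {} is replaced by the sample name in every position.
    parallel --dry-run -k fastqc -o {} {}/{}_R1.fastq {}/{}_R2.fastq ::: "$@"
}
```

Running preview_fastqc RD1 RD2 should print one fastqc command line per sample. When the printed commands look right, dropping --dry-run makes parallel actually run them.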
PS: By the way, you mentioned tophat in your OP. I guess hisat2 is in and tophat is passé (as per the tophat devs), unless you have a specific requirement for it. You may also refer to https://digibio.blogspot.in/2015/08/rna-seq-data-analysis-tophat-htseq-and.html for a bash solution for RNA-seq analysis. It involves bash, loops, and a tophat-cufflinks-cummerbund analysis in addition to a tophat-htseq-deseq2 workflow. It was written almost 2 years ago.