Hi all,
I know there have been multiple questions about demultiplexed FASTQ files from Illumina paired-end reads, but as far as I've seen, none deals specifically with the following issue. I have nearly 300 samples whose forward and reverse reads need to be merged (mated). I then need to create a barcode label based on the filename (= sample) of the PEAR output files, to use with USEARCH (and/or QIIME), or a names/groups file to use with mothur.
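For the mothur route, a groups file is a two-column, tab-separated table with the sequence name in the first column and its group (here, the sample) in the second. Below is a minimal sketch that builds one from PEAR's assembled output. It assumes files named <sample>.assembled.fastq with 4-line records, and it takes the read name as the header up to the first whitespace, without the leading '@'. The function name make_groups_file and the output name all.groups are hypothetical. Check that these names match the ones mothur sees in the FASTA file you give it.

```shell
#!/usr/bin/env bash
# Sketch: build a mothur groups file (read name <TAB> sample) from PEAR output.
# Assumes <sample>.assembled.fastq files with 4-line FASTQ records.
# make_groups_file is a hypothetical name.

make_groups_file() {
  local indir="${1:-.}" out="${2:-all.groups}"
  local fq sample
  : > "$out"                               # start with an empty groups file
  for fq in "$indir"/*.assembled.fastq; do
    [ -e "$fq" ] || continue               # glob matched nothing
    sample=$(basename "$fq" .assembled.fastq)
    # Line 1 of each record is the header; $1 is its first whitespace-separated
    # token, and substr(..., 2) drops the leading '@'.
    awk -v s="$sample" 'NR % 4 == 1 { print substr($1, 2) "\t" s }' "$fq" >> "$out"
  done
}
```

For example, make_groups_file merged all.groups, run after PEAR has written its output to a directory called merged (an assumed name).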
I use PEAR to merge my reads (I might try some alternatives later). Is there a way (a 'for loop', I presume) to take the 'sample+lib' part of the filename (e.g. NG-7611_SAMPLE1_lib53965_2904_1_1.fastq) of every file in a directory and use it as input for PEAR (or another tool)?
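One way to get those pieces out of a file name is the shell's own parameter expansion plus cut. This is a minimal sketch that assumes every file follows the pattern of the single example name: fields separated by underscores, the sample in the 2nd field, the lib in the 3rd, and _1.fastq / _2.fastq marking forward and reverse reads. The function names pair_prefix, sample_lib and sample_name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for splitting a file name such as
#   NG-7611_SAMPLE1_lib53965_2904_1_1.fastq
# Assumptions (from the single example name): fields are separated by "_",
# SAMPLE is the 2nd field, lib is the 3rd, and the name ends in
# _1.fastq (forward reads) or _2.fastq (reverse reads).

pair_prefix() {
  # Shared part of a read pair: strip the trailing _1.fastq or _2.fastq
  local f
  f=$(basename "$1")
  printf '%s\n' "${f%_[12].fastq}"
}

sample_lib() {
  # 'sample+lib' part, e.g. SAMPLE1_lib53965
  basename "$1" | cut -d_ -f2,3
}

sample_name() {
  # Sample only, e.g. SAMPLE1 (useful as the PEAR output name)
  basename "$1" | cut -d_ -f2
}
```

For the example file, pair_prefix prints NG-7611_SAMPLE1_lib53965_2904_1, sample_lib prints SAMPLE1_lib53965 and sample_name prints SAMPLE1.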
A basic PEAR command looks like this:
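pear -f NG-7611_SAMPLE1_lib53965_2904_1_1.fastq -r NG-7611_SAMPLE1_lib53965_2904_1_2.fastq -o SAMPLE1
(-o takes a base name, not a file name: PEAR then writes SAMPLE1.assembled.fastq, SAMPLE1.unassembled.forward.fastq, SAMPLE1.unassembled.reverse.fastq and SAMPLE1.discarded.fastq.)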
I guess I should set one variable that holds the 'sample+lib#' part of the input file names in each iteration of the loop, and a second one that holds only the sample name for the output, and plug both into the pear command.
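The two-variable idea above could look like this as a loop. It is a minimal sketch, not a tested pipeline. It assumes the naming pattern of the example files (forward reads ending in _1.fastq, reverse reads in _2.fastq, the sample in the 2nd underscore-separated field) and that pear is on the PATH. merge_all_pairs and the default output directory merged are hypothetical names.

```shell
#!/usr/bin/env bash
# Minimal sketch: merge every read pair in a directory with PEAR.
# Assumes forward files end in _1.fastq, reverse files in _2.fastq, and the
# sample name is the 2nd "_"-separated field. merge_all_pairs is hypothetical.

merge_all_pairs() {
  local dir="${1:-.}" outdir="${2:-merged}"
  local fwd prefix rev sample
  mkdir -p "$outdir"
  for fwd in "$dir"/*_1.fastq; do
    [ -e "$fwd" ] || continue              # glob matched nothing
    prefix="${fwd%_1.fastq}"               # directory + shared 'sample+lib...' part
    rev="${prefix}_2.fastq"
    if [ ! -f "$rev" ]; then
      echo "No reverse file for $fwd, skipping" >&2
      continue
    fi
    sample=$(basename "$fwd" | cut -d_ -f2)
    # -o is a base name: PEAR appends .assembled.fastq, .unassembled.forward.fastq, etc.
    pear -f "$fwd" -r "$rev" -o "$outdir/$sample"
  done
}
```

After loading it (e.g. with source), merge_all_pairs /path/to/reads merged should produce merged/SAMPLE1.assembled.fastq and so on. If the same sample appears in several libraries or lanes, the output names collide and later runs overwrite earlier ones; using cut -d_ -f2,3 (sample+lib) for the output name avoids that.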
I guess I could then use the same script as a template for adding the barcode label too:
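sed -E "1~4 s/^@(.*)/@\1;barcodelabel=SAMPLE1;/" < "$in/SAMPLE1.assembled.fastq" > "$P/SAMPLE1.fastq"
(-E makes the parentheses a capture group; the GNU sed address 1~4 limits the substitution to the header line of each 4-line record, because quality lines can also begin with '@'.)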
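The same labelling can be applied to all PEAR outputs in one go. Below is a sketch using awk, which edits only the first line of each 4-line record (quality lines can also start with '@', so matching on a leading '@' alone is unsafe). It assumes files named <sample>.assembled.fastq and takes the sample name from the file name. label_assembled and the default output directory labelled are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: add ";barcodelabel=<sample>;" to each read header of PEAR's
# <sample>.assembled.fastq files (USEARCH-style label).
# Assumes 4-line FASTQ records, as PEAR writes them; only line 1 of each
# record is edited. label_assembled is a hypothetical name.

label_assembled() {
  local indir="${1:-.}" outdir="${2:-labelled}"
  local fq sample
  mkdir -p "$outdir"
  for fq in "$indir"/*.assembled.fastq; do
    [ -e "$fq" ] || continue               # glob matched nothing
    sample=$(basename "$fq" .assembled.fastq)
    # Append the label to header lines; print every other line unchanged.
    awk -v s="$sample" 'NR % 4 == 1 { print $0 ";barcodelabel=" s ";"; next } { print }' \
      "$fq" > "$outdir/$sample.fastq"
  done
}
```

For example, label_assembled merged labelled turns merged/SAMPLE1.assembled.fastq into labelled/SAMPLE1.fastq, with headers of the form @<read>;barcodelabel=SAMPLE1;.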
I've just started learning Python and shell scripting, including sed and grep, to automate this kind of task (among others), but it might take a couple of weeks before I can do this myself. Could anyone help me out or point me in the right direction?
Thanks!
Thanks. It needs some preparation, but it does the job very well!