Question

Automating Paired End mating

10.7 years ago

Furor ▴ 40

Hi all,

I know there have been multiple questions about demultiplexed FASTQ files from Illumina paired-end runs, but as far as I've seen, none deals specifically with the following issue: I have nearly 300 samples whose forward and reverse reads need to be mated (merged). I then need to create a barcode label based on the filename (= sample) of the PEAR output files, for use with USEARCH (and/or QIIME), or a names/groups file for use with mothur.

I use PEAR to mate my reads (I might check some alternatives later). Is there a way (a for loop, I presume) to take the 'sample+lib' part of the filename (e.g. SAMPLE1_lib53965 in NG-7611_SAMPLE1_lib53965_2904_1_1.fastq) of every file in a directory and use it as input for PEAR (or another tool)?
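One way to pull those parts out of a filename is to split it on underscores. Below is a minimal sketch; `parse_name` is a hypothetical helper name, and it assumes every file follows the NG-7611_SAMPLE_LIB_..._1.fastq layout and that no field itself contains an underscore. The names `run`, `lane` and `mate` for the trailing fields are only guesses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split PROJECT_SAMPLE_LIB_RUN_LANE_MATE.fastq on "_".
# Assumes no field contains "_"; run/lane/mate are guessed field names.
parse_name() {
    local f project sample lib run lane mate
    f=$(basename "$1" .fastq)        # drop directory and .fastq suffix
    IFS=_ read -r project sample lib run lane mate <<< "$f"
    # print "sample+lib" (to match input files) and "sample" (to name output)
    printf '%s_%s\t%s\n' "$sample" "$lib" "$sample"
}
```

For example, `parse_name NG-7611_SAMPLE1_lib53965_2904_1_1.fastq` prints `SAMPLE1_lib53965` and `SAMPLE1`, separated by a tab.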

A basic PEAR command looks like this (-o sets the basename of the output files, e.g. SAMPLE1.assembled.fastq):

pear -f NG-7611_SAMPLE1_lib53965_2904_1_1.fastq -r NG-7611_SAMPLE1_lib53965_2904_1_2.fastq -o SAMPLE1

I guess I should set one variable that holds the 'sample+lib#' part of the input filenames in each iteration, and a second one holding only the sample name for the output, and use both in the pear command.
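A minimal sketch of that loop in bash, assuming forward files end in `_1.fastq` and their mates in `_2.fastq`, that the sample name is the second underscore-separated field, and that each sample has only one file pair (otherwise the outputs would overwrite each other). `mate_all` is a hypothetical helper name; only the `-f`, `-r` and `-o` options come from the PEAR command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run PEAR on every forward/reverse pair in a directory.
# Assumes *_1.fastq / *_2.fastq naming and sample = 2nd "_" field.
mate_all() {
    local dir=${1:-.} fwd rev sample
    for fwd in "$dir"/*_1.fastq; do
        [ -e "$fwd" ] || continue                 # glob matched nothing
        rev=${fwd%_1.fastq}_2.fastq               # ..._1_1.fastq -> ..._1_2.fastq
        if [ ! -e "$rev" ]; then
            echo "no mate found for $fwd, skipping" >&2
            continue
        fi
        sample=$(basename "$fwd" | cut -d_ -f2)   # NG-7611_SAMPLE1_... -> SAMPLE1
        # -o is an output basename: PEAR writes $sample.assembled.fastq etc.
        pear -f "$fwd" -r "$rev" -o "$dir/$sample"
    done
}
```

Usage would be something like `mate_all /path/to/fastq_dir`; files without a mate are reported on stderr and skipped rather than aborting the whole run.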

I guess I could then use this script as a template for adding the barcodelabel too:

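# GNU sed: -E makes (.*) a group; 1~4 edits only header lines (quality lines can start with @)
sed -E "1~4 s/^@(.*)/@\1;barcodelabel=SAMPLE_1;/" < "$in/SAMPLE_1.assembled.fastq" > "$P/SAMPLE_1.fastq"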
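Extending the same idea to all samples: a sketch (hypothetical helper `label_all`) that appends `;barcodelabel=SAMPLE;` to each read header, taking the sample name from the file name minus `.assembled.fastq`. It uses awk rather than sed so that only the first line of each 4-line record is changed, since a quality line may also begin with `@`; this assumes the FASTQ records are not line-wrapped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: IN_DIR/SAMPLE.assembled.fastq -> OUT_DIR/SAMPLE.fastq,
# with ";barcodelabel=SAMPLE;" appended to every header line.
# Assumes plain 4-line FASTQ records (header, sequence, +, quality).
label_all() {
    local in=$1 out=$2 f sample
    mkdir -p "$out"
    for f in "$in"/*.assembled.fastq; do
        [ -e "$f" ] || continue
        sample=$(basename "$f" .assembled.fastq)
        awk -v s="$sample" '
            NR % 4 == 1 { print $0 ";barcodelabel=" s ";"; next }  # header line
            { print }                                              # seq, +, quality
        ' "$f" > "$out/$sample.fastq"
    done
}
```

With the directories from the sed line, this would be called as `label_all "$in" "$P"`.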

I've just started learning Python and shell scripting (including sed and grep, among other things) to do this kind of automation, but I expect it will take a couple of weeks before I can do this myself. Could anyone help me out or set me on my way?

Thanks!

Illumina Paired-End automation Assembly

updated 3.4 years ago by Ram 45k • written 10.7 years ago by Furor ▴ 40

Ram · Answer 1 · 2014-12-26

10.7 years ago

marina.v.yurieva ▴ 580

Off the top of my head, the easiest (though not the most elegant) way in bash would be:

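for a in sample1 sample2 ... etc
do
  for b in lib1 lib2 lib3 ... etc
  do
    fwd="NG-7611_${a}_${b}_1_1.fastq"
    rev="NG-7611_${a}_${b}_1_2.fastq"
    [ -e "$fwd" ] || continue            # skip sample/lib combinations with no files
    pear -f "$fwd" -r "$rev" -o "$a"     # -o is a basename: writes $a.assembled.fastq etc.
  done
done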
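The two lists in this loop have to be filled in by hand. A sketch that collects the distinct values from the forward-read filenames instead (hypothetical helper `list_field`, assuming underscores only act as separators). Note that for the example name NG-7611_SAMPLE1_lib53965_2904_1_1.fastq, the pattern `NG-7611_${a}_${b}_1_1.fastq` only matches if `b` also carries the `2904` field, which is why the library list below uses fields 3-4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct values of "_"-separated field(s)
# FIELD (a cut field spec such as 2 or 3-4) across *_1.fastq files in DIR.
list_field() {
    local field=$1 dir=${2:-.} f
    for f in "$dir"/*_1.fastq; do
        [ -e "$f" ] || continue
        basename "$f" | cut -d_ -f"$field"
    done | sort -u
}
```

For example, `samples=$(list_field 2)` and `libs=$(list_field 3-4)`, then `for a in $samples` and `for b in $libs` in the loop above.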

updated 3.4 years ago by Ram 45k • written 10.7 years ago by marina.v.yurieva ▴ 580

Thanks! It needs some preparation, but it does the job very well.

updated 3.4 years ago by Ram 45k • written 10.5 years ago by Furor ▴ 40