Pipe output from find to input of awk
7.3 years ago

I have 20 fastq files (paired-end reads) and want to add a unique number to the end of the sequence identifier in each fastq file.

So I want this from genome 1:

simulated.2618103/1

To look like this:

simulated.2618103/1.1

I have an awk command that will do the above:

awk '{ if (NR%4==1) gsub("$",".1",$1); print }' in.fq > renamed_in.fq

I want a way to find all the genome 1-10 files and execute the awk command so that each fastq file gets the unique identifier.

So genome 1 should have .1 at the end of its sequence identifier, genome 2 should have .2 at the end of its sequence identifier, etc.

I have tried this:

find . -name "sub_NC_001539*" -exec awk ' { if (NR%4==1) gsub("$", ".1", $1); print } '

The problem isn't the awk command. I just don't know how to get find to pass each file to awk correctly while keeping the output as paired-end reads.

Thanks


Just a modification of Pierre's answer, since you also need a unique ID within each fastq file:

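var=1
find . -type f -name "sub_NC_001539*" | while IFS= read -r F
do
# append ".<id>" to the identifier line (line 1 of every 4-line record)
awk -v id="${var}" ' { if (NR%4==1) gsub("$", "."id, $1); print } ' "${F}" > "$(dirname "${F}")/new_$(basename "${F}")"
# the next file gets the next number
((var+=1))
done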
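
For anyone new to this, here is a commented sketch of the same loop that can be run on throwaway data. The temporary directory, the two toy files, and their read names are made up for the demo. Two things differ from the snippet above: the file list goes through sort, because find does not promise any particular order and the numbering would otherwise not be reproducible, and the suffix is appended with a plain assignment to $1, which has the same effect as the gsub call. Note that the counter goes up once per file, so the two files of a pair get different numbers unless you adapt it.

```shell
# Toy data in a scratch directory (hypothetical names, for illustration only).
demo_loop=$(mktemp -d)
printf '@simulated.1/1\nACGT\n+\nIIII\n' > "$demo_loop/sub_NC_001539_a.fq"
printf '@simulated.2/1\nTTGA\n+\nIIII\n' > "$demo_loop/sub_NC_001539_b.fq"

var=1   # number appended to the identifiers of the first file

# find prints one matching path per line; sort fixes the order;
# "while read" then handles one path per iteration, stored in F.
find "$demo_loop" -type f -name 'sub_NC_001539*' | sort | while IFS= read -r F
do
    # -v id="$var" copies the shell counter into the awk variable id.
    # NR % 4 == 1 is true on lines 1, 5, 9, ...: the identifier line
    # of each 4-line fastq record. On those lines ".<id>" is appended
    # to the first field; every line is then printed.
    awk -v id="$var" 'NR % 4 == 1 { $1 = $1 "." id } { print }' "$F" \
        > "$(dirname "$F")/new_$(basename "$F")"
    var=$((var + 1))   # next file gets the next number
done

# Show the first identifier of each renamed file.
for f in "$demo_loop"/new_sub_NC_001539_*.fq; do
    head -n 1 "$f"
done
```

The output files are written next to the inputs with a new_ prefix, so the originals stay untouched and the new files are not picked up by the sub_NC_001539* pattern.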

Thank you. Do you mind explaining the code? I am new to coding and don't quite understand it.

7.3 years ago

Use a loop:

find . -type f -name "sub_NC_001539*" | while IFS= read -r F
do
  # write the renamed reads next to the input, with a new_ prefix
  awk ' { if (NR%4==1) gsub("$", ".1", $1); print } ' "${F}" > "$(dirname "${F}")/new_$(basename "${F}")"
done
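
If you would rather stay with -exec, as in the attempt in the question, a rough equivalent is sketched below. That attempt was missing the {} placeholder and the terminator (\; or +), and a > redirection cannot be used directly inside -exec, so the sketch hands the files to a small sh -c script instead. The scratch directory and the toy file are made up for the demo, and the awk program appends the suffix by assigning to $1 rather than calling gsub, which gives the same result.

```shell
# Toy data in a scratch directory (hypothetical names, for illustration only).
demo_exec=$(mktemp -d)
printf '@simulated.2618103/1\nACGT\n+\nIIII\n' > "$demo_exec/sub_NC_001539_x.fq"

# awk program: append ".1" to the first field of each identifier line.
prog='NR % 4 == 1 { $1 = $1 ".1" } { print }'

# find passes the matching paths to the inline script as "$@".
# The first argument after the script name is the awk program; it is
# taken off with shift so that only file paths remain in the loop.
find "$demo_exec" -type f -name 'sub_NC_001539*' -exec sh -c '
    prog=$1
    shift
    for f in "$@"; do
        awk "$prog" "$f" > "$(dirname "$f")/new_$(basename "$f")"
    done
' sh "$prog" {} +

head -n 1 "$demo_exec/new_sub_NC_001539_x.fq"
```

This avoids the pipe and the read loop, at the cost of slightly fiddlier quoting.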

Thanks, this code worked
