Altering fastq sequence identifier
7.3 years ago

I am trying to determine false positive/negative rates for various alignments and want to add a unique identifier to the reads in each FASTQ file.

I have ten genomes from which I have simulated sequencing reads (so 20 fq files). The current sequence identifiers look like this:

@simulated.2618103/1

I want to change them so that they look like this:

@simulated.2618103/1.1

Each of the ten genomes should get its own identifier, from 1 to 10. I have tried reading about how to do this with awk, but I don't seem to understand the program.

Thanks

7.3 years ago

It's a bit tricky with FASTQ, because you need to alter only the first line of every record (each record spans 4 lines: header, sequence, "+" separator, and quality).

So, what you can do is:

awk '{ if (NR%4==1) gsub("$",".1",$1); print }' in.fq > renamed_in.fq

Change the replacement string in gsub() to suit your needs, e.g. ".2" for reads from the second genome.
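To give each of the ten genomes its own number without editing the gsub() string every time, one option is to pass the number in with awk's `-v` option and loop over the files. This is only a sketch: it assumes the files are named `genome1_1.fq`, `genome1_2.fq`, ..., `genome10_2.fq` and writes `renamed_<file>` outputs; those names and the `tag_fastq` helper are hypothetical, so adjust them to your own files. It also creates a few tiny demo files in a scratch directory so it can be run as-is.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: append ".<id>" to the first field of every header line.
# Assumes plain 4-line FASTQ records, so headers are lines 1, 5, 9, ... (NR % 4 == 1).
tag_fastq() {
    awk -v id="$1" 'NR % 4 == 1 { $1 = $1 "." id } { print }' "$2" > "$3"
}

# Demo data in a scratch directory (hypothetical names genome<N>_<mate>.fq).
workdir=$(mktemp -d)
cd "$workdir"
printf '@simulated.2618103/1\nACGT\n+\nIIII\n' > genome1_1.fq
printf '@simulated.2618103/2\nTTGA\n+\nIIII\n' > genome1_2.fq
printf '@simulated.77/1\nGGCC\n+\nIIII\n'      > genome2_1.fq

# Genome N gets suffix ".N"; both mates of a genome share the same id.
for n in 1 2 3 4 5 6 7 8 9 10; do
    for mate in 1 2; do
        f="genome${n}_${mate}.fq"
        [ -e "$f" ] || continue   # skip files that do not exist
        tag_fastq "$n" "$f" "renamed_$f"
    done
done

head -n 1 renamed_genome1_1.fq   # @simulated.2618103/1.1
head -n 1 renamed_genome2_1.fq   # @simulated.77/1.2
```

Using `-v` keeps the shell variable out of the awk program text, so no quoting tricks are needed. Note that assigning to `$1` (as the original gsub() on `$1` also does) makes awk rebuild the header with single spaces between fields, which is harmless for headers like these but would normalise tabs or runs of spaces in a header's description.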


THANK YOU!

Do you mind explaining the parts of your awk script? I am really struggling to learn this. Do you know of any good learning material?


Any basic awk tutorial will cover the syntax and the built-in variables used here, such as NR (the number of the current input line) and $1 (the first field of the line).
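For reference, here is the same one-liner with each part commented. It is just an annotated restatement of the command above, run on a made-up two-record input. The second record's quality line starts with '@' (a valid quality character), which illustrates why the script picks header lines by position (NR) rather than by matching lines that begin with '@'.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Two made-up 4-line FASTQ records; the last line (a quality line) begins with '@'.
printf '@simulated.1/1\nACGT\n+\nIIII\n@simulated.2/1\nGGTT\n+\n@@II\n' > in.fq

awk '
{
    # NR is the number of the current input line, starting at 1.
    # In 4-line FASTQ records the headers are lines 1, 5, 9, ..., i.e. NR % 4 == 1.
    # $1 is the first whitespace-separated field of the line.
    # gsub(regex, replacement, target) replaces every match of regex in target;
    # the regex "$" matches the empty end of the string, so ".1" is appended.
    if (NR % 4 == 1) {
        gsub("$", ".1", $1)
    }
    # With no argument, print outputs the whole current line ($0), changed or not.
    print
}' in.fq > renamed_in.fq

cat renamed_in.fq
```

Running it should print both records with `.1` appended only to lines 1 and 5, leaving the `@@II` quality line untouched.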
