Modifying barcode sequence in fq files
8.6 years ago
AP ▴ 100

Hello,

I have several .fq files containing 5 bp inline barcodes at the beginning of each read, for example (the barcodes are marked between * characters):

@gi|110640213|ref|NC_008253.1|_418_952_1:0:0_1:0:0_0/1
*CCAGG*CAGTGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCATCTGGTAGCGATGAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_31_476_0:0:0_0:0:0_1/1
*CAGAT*GGTTGGTGATTTTGGCGGGGGCAGAGAGGACGGTGGCCACCTGCCCCTGCCTGGCATTGCTTTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_210_743_2:0:0_1:1:0_2/1
*CATTA*CCACCACCATCACCATTACCACAGGAAACGGTGCGGGCTGACGCGTACAGGAAACACCGAAAAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222

I would like to replace these barcodes so that every read starts with the same sequence (here, AAAAA):

@gi|110640213|ref|NC_008253.1|_418_952_1:0:0_1:0:0_0/1
*AAAAA*CAGTGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCATCTGGTAGCGATGAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_31_476_0:0:0_0:0:0_1/1
*AAAAA*GGTTGGTGATTTTGGCGGGGGCAGAGAGGACGGTGGCCACCTGCCCCTGCCTGGCATTGCTTTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@gi|110640213|ref|NC_008253.1|_210_743_2:0:0_1:1:0_2/1
*AAAAA*CCACCACCATCACCATTACCACAGGAAACGGTGCGGGCTGACGCGTACAGGAAACACCGAAAAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222

I want to make sure that only the barcode at the beginning of each read is modified, and nothing further along the read. The same barcode sequence might also occur within a read, and I don't want to modify it there.

Do you know any easy way to do this? Thanks!

fastq barcode • 2.1k views
8.6 years ago
Gabriel R. ★ 2.9k

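I assume that the * are not part of the sequence and are just there to highlight the barcodes :-) Then use awk on the sequence line of each 4-line record (NR%4==2). Note that substr() is 1-indexed, so substr($0,6) drops the 5 barcode bases and keeps the rest of the read; the read length stays the same, so the quality line does not need to change:

zcat [in fastq file] | awk '{if(NR%4==2){print "AAAAA" substr($0,6)}else{print $0}}' | gzip > [output fastq].gz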
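
If you need this for several files or for a different replacement barcode, the same idea can be wrapped in a small shell function. This is only a sketch: the name `replace_barcode` is made up for illustration, and it assumes strict 4-line FASTQ records (no wrapped sequence lines) and a replacement that is exactly as long as the original barcode, so that the quality line still matches the read length.

```shell
# Hypothetical helper (name chosen for illustration only).
# Reads uncompressed FASTQ on stdin and writes FASTQ to stdout.
# $1 = replacement barcode; its length decides how many leading bases are dropped.
# Assumes strict 4-line records and a replacement as long as the original barcode.
replace_barcode() {
    awk -v bc="$1" '
        # Line 2 of every 4-line record is the sequence: swap its prefix only.
        NR % 4 == 2 { print bc substr($0, length(bc) + 1); next }
        # Header, "+" separator and quality lines pass through unchanged.
        { print }
    '
}
```

It would be used like the one-liner above, for example `zcat [in fastq file] | replace_barcode AAAAA | gzip > [output fastq].gz`. Because only the prefix of the sequence line is rewritten, a copy of the barcode further along the read is left alone.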
Works like a charm! Thanks Gabriel. I was trying things with awk but I was not successful. This solves my issue. Also yes, the * are not part of the sequence :-)

you are most welcome, mark the question as answered if you please :-)

8.5 years ago
AP ▴ 100

From Gabriel R:

zcat [in fastq file] | awk '{if(NR%4==2){print "AAAAA" substr($0,6)}else{print $0}}' | gzip > [output fastq].gz
