Modifying FASTQ header
2.2 years ago
SaltedPork ▴ 170

My FASTQ header:

@M03972:198:000000000-CJVFC:1:1101:13861:2151 1:N:0:TAGCGCTC+GCGTAAGA

Desired FASTQ header:

@M03972:198:000000000-CJVFC:1:1101:13861:2151/1

I wish to perform two steps: first, remove the extra Illumina details after the ' 1'; second, convert ' 1' to '/1'.

  1. Are there any programs available that can do this? I have looked at Seqkit and FASTX-Toolkit, but neither does what I am after.

  2. I have considered using sed commands with the regex :N:0:\w+\+\w+. Is there a better command?

sed fastq awk • 922 views
2.2 years ago
GenoMax 147k

Using the BBMap suite:

$ more test.fq
@M03972:198:000000000-CJVFC:1:1101:13861:2151 1:N:0:TAGCGCTC+GCGTAAGA
ACGATCGAGC
+
IIIIIIIIII

$ reformat.sh -Xmx2g in=test.fq out=stdout.fq trd=t | reformat.sh -Xmx2g in=stdin.fq out=final.fq addslash=t int=f

This will produce:

@M03972:198:000000000-CJVFC:1:1101:13861:2151 /1
ACGATCGAGC
+
IIIIIIIIII

Replace test.fq with your own file. Note that the resulting header has a space before the /1.
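If the header must match the desired form exactly (no space before /1), a follow-up sed pass is one way to close the gap. This is a minimal sketch, not part of the BBMap answer: the file names bbmap_out.fq and no_space.fq are hypothetical, and the input is recreated here from the output shown above. Sequence and quality lines cannot contain a space, so the pattern only ever matches header lines.

```shell
# Hypothetical file holding the BBMap output shown above.
printf '%s\n' \
  '@M03972:198:000000000-CJVFC:1:1101:13861:2151 /1' \
  'ACGATCGAGC' \
  '+' \
  'IIIIIIIIII' > bbmap_out.fq

# Replace a trailing " /1" or " /2" with "/1" or "/2".
# Only header lines can match, because the pattern starts with a space.
sed -E 's# /([12])$#/\1#' bbmap_out.fq > no_space.fq

head -n 1 no_space.fq
```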

2.2 years ago

A seqkit answer:

seqkit replace -p "(^\S+)\s([0-9]+).+" -r '$1/$2' file.fastq
2.1 years ago
Zhitian Wu ▴ 60

I don't think there is a right or wrong with your :N:0:\w+\+\w+. If it works, just use it.

  • You might want to use [ATCG]+ instead of \w+.
  • You can add $ at the end for anchoring.
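Putting these suggestions together, the whole conversion can be sketched as a single sed substitution. This is only an illustration under some assumptions: the file names test.fq and fixed.fq are placeholders, the character class includes N in case an index read contains uncalled bases, and the second index is treated as optional so single-index headers also match. Because the pattern begins with a space, which cannot occur in sequence or quality lines, only header lines are modified.

```shell
# Example record taken from the question; replace test.fq with your own file.
printf '%s\n' \
  '@M03972:198:000000000-CJVFC:1:1101:13861:2151 1:N:0:TAGCGCTC+GCGTAAGA' \
  'ACGATCGAGC' \
  '+' \
  'IIIIIIIIII' > test.fq

# " 1:N:0:INDEX1+INDEX2" at the end of the line becomes "/1".
#   ([12])           read number, captured for the replacement
#   [YN]             filter flag
#   [0-9]+           control number
#   [ACGTN]+         first index
#   (\+[ACGTN]+)?    optional second index (assumption: dual or single index)
#   $                anchor at end of line
sed -E 's# ([12]):[YN]:[0-9]+:[ACGTN]+(\+[ACGTN]+)?$#/\1#' test.fq > fixed.fq

head -n 1 fixed.fq
```

The -E flag (extended regular expressions) is accepted by both GNU and BSD sed, so the same command should work on Linux and macOS.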