Barcode Fastq Header - Adding Characters
5.0 years ago
zach

I'm new to Linux and would appreciate any help with this.

My forward FASTQ file has header lines like this for each read:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

But the headers in my barcode FASTQ file only contain:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870

This prevents me from demultiplexing with QIIME 2. I assume the solution is to add '1:N:0' to every header line of the barcode FASTQ. How can I do this on the Linux command line?

Thank you!


If you just need 1:N:0, you could use reformat.sh from the BBMap suite:

addcolon=f              Append ' 1:' and ' 2:' to read names, if not already present.

@zach, try this on a small FASTQ file first:

$ sed '1~4 s/$/ 1:N:0:CGTCGTATGAAT/g' test.fq

To add just 1:N:0, try:

$ sed '1~4 s/$/ 1:N:0/g' test.fq
5.0 years ago
 awk '{print $0 (NR%4==1?" 1:N:0:CGTCGTATGAAT":"")}' in.fq > out.fq

Hi all, thank you so much for the commands and the quick help. I used awk, and it produced barcode header lines like this:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0

However, when demultiplexing, I received another error about mismatched sequence descriptions: N:0, N:0:CGTCGTATGAAT, and N:0:CGTCGTATGAAT

I think this is caused by the index sequence in the forward and reverse header lines, e.g.:

Forward: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

Reverse: @DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 2:N:0:CGTCGTATGAAT

To make the descriptions match across all three files, what command could I use to replace "1:N:0:CGTCGTATGAAT" (forward) and "2:N:0:CGTCGTATGAAT" (reverse) with just "1:N:0" (as in the barcode file)?

I appreciate your time and effort in helping me and making this forum amazing.
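For the replacement asked about above, here is a minimal sketch assuming uncompressed FASTQ with exactly four lines per record and Illumina-style descriptions of the form read:filtered:control:index. It edits only header lines (line 1 of every record) and drops the fourth, index field. `strip_index` is a hypothetical helper name, not an existing tool. It keeps the read number (1 or 2) instead of forcing "1:N:0" on the reverse reads. That rests on an assumption: QIIME 2 may only compare the part of the description after the read number. This is a guess from the error message above.

```shell
# strip_index (hypothetical helper): drop the index field from FASTQ
# header descriptions, e.g. " 1:N:0:CGTCGTATGAAT" -> " 1:N:0".
# Assumes uncompressed FASTQ with exactly 4 lines per record.
strip_index() {
    awk 'NR % 4 == 1 && NF == 2 {
             # $2 is the description: read:filtered:control:index
             n = split($2, d, ":")
             if (n >= 4) $0 = $1 " " d[1] ":" d[2] ":" d[3]
         }
         { print }' "$1"
}
```

Usage: `strip_index R1.fastq > R1_noindex.fastq`, and the same for R2. If you really want `1:N:0` on the reverse reads as well, replace `d[1]` with `"1"` in the awk program.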


Since you have the index data in a separate file, set the FASTQ header in that file to 1:N:0:CGTCGTATGAAT. This works as long as you have just one index.


The index sequence in the header is different for each of the other samples, e.g.:

@DGZN8DQ1:549:H7C23BCXX:2:1101:1087:1870 1:N:0:CGTCGTATGAAT

vs

@DGZN8DQ1:549:H7C23BCXX:2:1101:1126:1870 1:N:0:TTTGCATCAGGG

vs

@DGZN8DQ1:549:H7C23BCXX:2:1101:1189:1870 1:N:0:CCGTCTATGTTT

and so on, corresponding to the barcodes. I'm guessing I have two options, and it would be good to be able to try the other if one doesn't work:

a) make all (barcodes, forward, reverse) header descriptions "1:N:0"

b) add the index sequence (already present in the forward and reverse headers) to the barcode headers

If I follow option b, which you suggested, how would I write a command for this in Linux? Thanks!
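A minimal sketch of option (b), assuming uncompressed FASTQ with four lines per record. It also assumes that the index sequence shown in the R1/R2 headers is exactly the sequence on line 2 of the matching I1 record, so compare a few reads before relying on it. The sketch rewrites each I1 header as `<read ID> 1:N:0:<I1 sequence>`. `add_index_from_seq` is a hypothetical helper name.

```shell
# add_index_from_seq (hypothetical helper): rebuild each I1 header as
# "<read ID> 1:N:0:<sequence on line 2 of the same record>".
# Assumes uncompressed FASTQ with 4 lines per record, and that the
# header index in R1/R2 equals the I1 read sequence.
add_index_from_seq() {
    awk 'NR % 4 == 1 { id = $1; next }          # hold the read ID until the sequence is seen
         NR % 4 == 2 { print id " 1:N:0:" $0 }  # emit the rebuilt header
         { print }                              # then print lines 2 to 4 unchanged
        ' "$1"
}
```

Usage: `add_index_from_seq I1.fastq > I1_fixed.fastq`. Any existing description on the I1 header is discarded because only `$1` (the read ID) is kept.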


Do you have separate files for each sample, or are all these indexes in one set of files (R1, R2, and I1)?


All the samples are in one R1 file, and the same goes for R2 and I1.
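With all samples in a single R1, R2, and I1 file, demultiplexing with a separate barcode file generally relies on the three files listing the same reads in the same order. A minimal sketch for checking that, assuming uncompressed FASTQ with four lines per record, reads R2 and I1 line by line alongside R1. It then counts the records whose read IDs differ. The read ID is the header text before the first space. `check_sync` is a hypothetical helper name, and the sketch does not detect extra records at the end of R2 or I1.

```shell
# check_sync (hypothetical helper): count records whose read IDs differ
# between R1, R2 and I1. Usage: check_sync R1.fastq R2.fastq I1.fastq
# Assumes uncompressed FASTQ with 4 lines per record.
check_sync() {
    awk -v r2="$2" -v i1="$3" '
        {
            # read R2 and I1 in lockstep with R1; use "" once a file runs out
            if ((getline l2 < r2) <= 0) l2 = ""
            if ((getline l3 < i1) <= 0) l3 = ""
            if (FNR % 4 == 1) {
                split(l2, b, " "); split(l3, c, " ")
                if ($1 != b[1] || $1 != c[1]) bad++
            }
        }
        END { print bad + 0 }' "$1"
}
```

A result of 0 means every R1 header has a matching read ID at the same position in R2 and I1.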


Did deML not work, then? The approach in the linked post "A: Demultiplexing Illumina data" should be the best solution.


Thanks. I have looked at the post you linked and have tried using deML. I ran into several errors and am currently discussing them with the developer.


deML does not seem to work with my data for some reason, so it seems I should still try to add the index sequence to each header in the barcode file.
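Another way to do this is to copy the description straight from each R1 header into the matching I1 header. The two are then identical by construction, with no need to assume that the I1 sequence equals the index in the header. A minimal sketch follows. It assumes uncompressed FASTQ with four lines per record and that R1 and I1 list the same reads in the same order, and it stops with an error if the read IDs differ. `copy_desc_from_r1` is a hypothetical helper name.

```shell
# copy_desc_from_r1 (hypothetical helper): give each I1 header the same
# description as the matching R1 header, e.g. "1:N:0:CGTCGTATGAAT".
# Usage: copy_desc_from_r1 R1.fastq I1.fastq > I1_fixed.fastq
# Assumes uncompressed FASTQ, 4 lines per record, same read order.
copy_desc_from_r1() {
    awk -v r1="$1" '
        {
            # read R1 in lockstep, one R1 line per I1 line
            if ((getline r1line < r1) <= 0) {
                print "R1 ended before I1" > "/dev/stderr"; exit 1
            }
            if (FNR % 4 == 1) {
                n = split(r1line, f, " ")
                if (f[1] != $1) {
                    print "read IDs differ at I1 line " FNR > "/dev/stderr"; exit 1
                }
                if (n >= 2) $0 = $1 " " f[2]
            }
            print
        }' "$2"
}
```

Because the R1 description is copied as-is, the I1 headers get the read number 1, which matches what the forward file already uses.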
