Error with demuxbyname.sh
5.9 years ago
mb2subi ▴ 10

Hi everyone,

I am trying to demultiplex paired-end fastq.gz reads with demuxbyname.sh, but I get empty output. Here are the command and its log:

demuxbyname.sh prefixmode=t in=data/raw/ctrl4_MLH1_SNP_R1.fastq.gz in2=data/raw/ctrl4_MLH1_SNP_R2.fastq.gz out=data/raw/ctrl4_MLH1_SNP_%_R1.fastq.gz out2=data/raw/ctrl4_MLH1_SNP_%_R2.fastq.gz outu=data/raw/ctrl4_MLH1_SNP_un_R1.fastq.gz outu2=data/raw/ctrl4_MLH1_SNP_un_R2.fastq.gz names=data/raw/barcode.txt

Error:

Set INTERLEAVED to false
Input is being processed as paired
Time:               17.335 seconds.
Reads Processed:    11907482    686.92k reads/sec
Bases Processed:    893061150   51.52m bases/sec
Reads Out:    0
Bases Out:    0

And the head of the raw files and barcode:

R1

@NS500645:134:HCG2VBGX5:1:11101:18816:1127 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEAAEEEEEAEEEEAEEEEEAEAEEE/EEEAEE/EEEEE</EE<A//66/A/<EAE/6//<E<AA/<<//6
@NS500645:134:HCG2VBGX5:1:11101:11383:1143 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEAEAEAE/EEEEEEEEAAEAA<EAEEEEE</EEEE/E
@NS500645:134:HCG2VBGX5:1:11101:14548:1152 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEEEEEEEEEEAEEE6EEEEEEEEEE/EEEE6EEEEEEEEE/EEEEEE<EAE<EEAEE/EEEEEAEE/E<E

R2

@NS500645:134:HCG2VBGX5:1:11101:14548:1152 2:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
GTACCTAGTTAATTCCTATTTATCCTTCATATTTCAAAAAATATTTCTTCAAAGAACCTTCTCTAATGATCTCTA
+
AAAAAEE6EEAEEEE<EEEEEEEEEEEEEEAEEEEEEEEEEE/EEEEEEEAEEEEEEEEEEEEE<AEAEEEEEEE
@NS500645:134:HCG2VBGX5:1:11101:5816:1154 2:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
ATGATGGGCTGCTTAATTTCAAAATCTTTAAAGTTTCAGTTTGGTTTCACAATGCCTCCAAATTCTTCCATGCAC
+
AA/AAAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEE//EAEE<E/EEEAEEAEEEEEEEEEE<A6E<
@NS500645:134:HCG2VBGX5:1:11101:9776:1160 2:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
AGGATCTTGCCTTGTCTTTCCACCTCCCCAGTGATGATCTCTAACGCGCAAGCGCATATCCTTCTAGGTAGCGGG
+
A/AAAEAEEEEEEEEEAEEEEEEEEEEEEE/EEAEEAEEEEEAEAEAEEAA/EEEAEAEA<EEEEAAEAE6/EAE

Barcode:

I tried both:

CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTT 
CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTC

and

TCAGTGCCTCGTGCTCACGTTCTTCCTTT 
TCAGTGCCTCGTGCTCACGTTCTTCCTTC
demuxbyname demultiplex • 4.1k views
@NS500645:134:HCG2VBGX5:1:11101:18816:1127 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEAAEEEEEAEEEEAEEEEEAEAEEE/EEEAEE/EEEEE</EE<A//66/A/<EAE/6//<E<AA/<<//6

This does not look like a correct fastq record. The header line should only have

@NS500645:134:HCG2VBGX5:1:11101:18816:1127 1:N:0:CTTGTA

Where is this additional stuff in the header coming from?

TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT

It's UMI-4C data. The point is that I want to demultiplex by a SNP in the additional 'stuff'. Is it possible to demultiplex using the sequence itself, omitting the header?


demuxbyname.sh is a simple tool that demultiplexes data based on the standard Illumina indexes present in the header. It is not going to be of help in this case.

It looks like you will need to use umi4cPackage for data analysis. It can be found here.


I want to analyse separately the reads that carry one allele at a given position and the reads that carry the other allele. For this reason, I want to first demultiplex by allele and then use umi4cPackage for the analysis.


Then this will likely require some custom work on your part. You could try removing the space (or replacing it with a +) between the Illumina index and the extra sequence, so that they form one long string, and then use demuxbyname.sh with barcodes like these:

CTTGTA+TCAGTGCCTCGTGCTCACGTTCTTCCTTT 
CTTGTA+TCAGTGCCTCGTGCTCACGTTCTTCCTTC

You could also check whether deML (https://github.com/grenaud/deML) can work with the space in the header. If not, it would be another option once you remove the space between the Illumina index and the other sequence.
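The header change suggested above could be sketched as follows. This is only a minimal example, assuming plain 4-line FASTQ records and headers with exactly three space-separated fields, as in the reads shown in the question; the function name join_header_fields is made up for illustration.

```shell
# Join the Illumina index and the trailing sequence in FASTQ headers with '+'.
# Reads FASTQ on stdin and writes FASTQ on stdout.
# Assumes 4-line records, so header lines are lines 1, 5, 9, ...
# Only headers with exactly three whitespace-separated fields are changed.
join_header_fields() {
    awk 'NR % 4 == 1 && NF == 3 { $0 = $1 " " $2 "+" $3 } { print }'
}

# Usage (output file name is hypothetical; run it on R1 and R2 alike):
# gzip -dc data/raw/ctrl4_MLH1_SNP_R1.fastq.gz | join_header_fields | gzip > R1.plus.fastq.gz
```

Sequence, separator and quality lines pass through unchanged, so the read pairs stay in sync as long as both files are processed the same way.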


I wrote a small bash script to try the + option. Tomorrow I'll try demuxbyname with the modification.

5.9 years ago

Hello,

Instead of using prefixmode=t, use substringmode=t.

fin swimmer


This works but will result in files with spaces in names. A minor irritant that can be taken care of later.

"The point is that I want to demultiplex by a SNP in the additional 'stuff'"

mb2subi : Do you know how many variants there are or do you expect the SNP to be anywhere in the stuff? Depending on the answer deML may be better.


Yes, I was thinking about including a note in my post about always using standard formats. But somehow I forgot it...


Perhaps the data gets generated in this format? Not sure if this is something specific for UMI-4C data.


I know the variants, T and C


Then use demuxbyname.sh with the parameter suggested by @fin, and include both versions in your barcode file.
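A names file with both alleles could be prepared roughly like this. It is a sketch only: the helper name write_barcodes is hypothetical, the two barcodes are the sequence-only pair from the question, and the commented command reuses nothing but the flags and paths already shown in this thread.

```shell
# Write a names file with one barcode per line, one line per allele (T and C).
# The output path is the first argument.
write_barcodes() {
    printf '%s\n' \
        'TCAGTGCCTCGTGCTCACGTTCTTCCTTT' \
        'TCAGTGCCTCGTGCTCACGTTCTTCCTTC' > "$1"
}

# Usage, with the paths from the question:
# write_barcodes data/raw/barcode.txt
# demuxbyname.sh substringmode=t \
#     in=data/raw/ctrl4_MLH1_SNP_R1.fastq.gz in2=data/raw/ctrl4_MLH1_SNP_R2.fastq.gz \
#     out=data/raw/ctrl4_MLH1_SNP_%_R1.fastq.gz out2=data/raw/ctrl4_MLH1_SNP_%_R2.fastq.gz \
#     outu=data/raw/ctrl4_MLH1_SNP_un_R1.fastq.gz outu2=data/raw/ctrl4_MLH1_SNP_un_R2.fastq.gz \
#     names=data/raw/barcode.txt
```

Since the % in the output name is replaced by the barcode, barcodes without a space should also keep spaces out of the output file names.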


Hi fin swimmer,

I tried it but I obtained the same output:

java -ea -Xmx1200m -cp /software/debian-8/bio/bbmap/current/ jgi.DemuxByName substringmode=t in=data/raw/ctrl4_MLH1_SNP_R1.fastq.gz in2=data/raw/ctrl4_MLH1_SNP_R2.fastq.gz out=data/raw/ctrl4_MLH1_SNP_%_R1.fastq.gz out2=data/raw/ctrl4_MLH1_SNP_%_R2.fastq.gz outu=data/raw/ctrl4_MLH1_SNP_un_R1.fastq.gz outu2=data/raw/ctrl4_MLH1_SNP_un_R2.fastq.gz names=data/raw/barcode.txt
Executing jgi.DemuxByName [substringmode=t, in=data/raw/ctrl4_MLH1_SNP_R1.fastq.gz, in2=data/raw/ctrl4_MLH1_SNP_R2.fastq.gz, out=data/raw/ctrl4_MLH1_SNP_%_R1.fastq.gz, out2=data/raw/ctrl4_MLH1_SNP_%_R2.fastq.gz, outu=data/raw/ctrl4_MLH1_SNP_un_R1.fastq.gz, outu2=data/raw/ctrl4_MLH1_SNP_un_R2.fastq.gz, names=data/raw/barcode.txt]

Set INTERLEAVED to false
Input is being processed as paired
Time:               24.465 seconds.
Reads Processed:    11907482    486.71k reads/sec
Bases Processed:    893061150   36.50m bases/sec
Reads Out:    0
Bases Out:    0
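
When a run reports Reads Out: 0, one quick check is whether the barcodes occur in the headers at all. The sketch below counts the headers that contain a fixed string; it assumes 4-line FASTQ records, and the function name count_headers_with is made up for illustration.

```shell
# Count FASTQ headers (lines 1, 5, 9, ...) that contain a fixed string.
# Reads FASTQ on stdin; the string to look for is the first argument.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function from failing in scripts that use "set -e".
count_headers_with() {
    awk 'NR % 4 == 1' | grep -c -F -e "$1" || true
}

# Usage on the first 100000 reads of R1:
# gzip -dc data/raw/ctrl4_MLH1_SNP_R1.fastq.gz | head -n 400000 \
#     | count_headers_with 'TCAGTGCCTCGTGCTCACGTTCTTCCTTT'
```

A count of zero for every line of the names file would point at the barcode file (for example stray whitespace or a different separator) rather than at the demultiplexing mode.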
5.9 years ago
mb2subi ▴ 10

Thanks to everyone, it finally works:

Set INTERLEAVED to false
Input is being processed as paired
Time:               78.217 seconds.
Reads Processed:    11907482    152.24k reads/sec
Bases Processed:    893061150   11.42m bases/sec
Reads Out:    18230680
Bases Out:    1367301000

I changed the header from:

CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT

To

CTTGTA+TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT

And then I used the parameter:

substringmode=t

Original header should have worked as well (I had tested it with your example reads above).

You should mark @fin's answer above as accepted, since it gave you the critical information you needed.
