Hi everyone,
I am trying to demultiplex paired-end fastq.gz reads with demuxbyname.sh, but the output is empty. Here are the command and the log:
demuxbyname.sh prefixmode=t in=data/raw/ctrl4_MLH1_SNP_R1.fastq.gz in2=data/raw/ctrl4_MLH1_SNP_R2.fastq.gz out=data/raw/ctrl4_MLH1_SNP_%_R1.fastq.gz out2=data/raw/ctrl4_MLH1_SNP_%_R2.fastq.gz outu=data/raw/ctrl4_MLH1_SNP_un_R1.fastq.gz outu2=data/raw/ctrl4_MLH1_SNP_un_R2.fastq.gz names=data/raw/barcode.txt
Error:
Set INTERLEAVED to false
Input is being processed as paired
Time: 17.335 seconds.
Reads Processed: 11907482 686.92k reads/sec
Bases Processed: 893061150 51.52m bases/sec
Reads Out: 0
Bases Out: 0
Here is the head of the raw files and the barcode file:
R1
@NS500645:134:HCG2VBGX5:1:11101:18816:1127 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEAAEEEEEAEEEEAEEEEEAEAEEE/EEEAEE/EEEEE</EE<A//66/A/<EAE/6//<E<AA/<<//6
@NS500645:134:HCG2VBGX5:1:11101:11383:1143 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEAEAEAE/EEEEEEEEAAEAA<EAEEEEE</EEEE/E
@NS500645:134:HCG2VBGX5:1:11101:14548:1152 1:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
+
AAAAAEEEEEEEEEEAEEE6EEEEEEEEEE/EEEE6EEEEEEEEE/EEEEEE<EAE<EEAEE/EEEEEAEE/E<E
R2
@NS500645:134:HCG2VBGX5:1:11101:14548:1152 2:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
GTACCTAGTTAATTCCTATTTATCCTTCATATTTCAAAAAATATTTCTTCAAAGAACCTTCTCTAATGATCTCTA
+
AAAAAEE6EEAEEEE<EEEEEEEEEEEEEEAEEEEEEEEEEE/EEEEEEEAEEEEEEEEEEEEE<AEAEEEEEEE
@NS500645:134:HCG2VBGX5:1:11101:5816:1154 2:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTCAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
ATGATGGGCTGCTTAATTTCAAAATCTTTAAAGTTTCAGTTTGGTTTCACAATGCCTCCAAATTCTTCCATGCAC
+
AA/AAAEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEE//EAEE<E/EEEAEEAEEEEEEEEEE<A6E<
@NS500645:134:HCG2VBGX5:1:11101:9776:1160 2:N:0:CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTTAGCTGTAGCTTACGCCATCCAGCCCCACCCTTCAGCGGCAGCTATT
AGGATCTTGCCTTGTCTTTCCACCTCCCCAGTGATGATCTCTAACGCGCAAGCGCATATCCTTCTAGGTAGCGGG
+
A/AAAEAEEEEEEEEEAEEEEEEEEEEEEE/EEAEEAEEEEEAEAEAEEAA/EEEAEAEA<EEEEAAEAE6/EAE
Barcode:
I tried both:
CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTT
CTTGTA TCAGTGCCTCGTGCTCACGTTCTTCCTTC
and
TCAGTGCCTCGTGCTCACGTTCTTCCTTT
TCAGTGCCTCGTGCTCACGTTCTTCCTTC
This does not look like a correct fastq record. The header line should only have
Where is this additional stuff in the header coming from?
It's UMI-4C data. The point is that I want to demultiplex by a SNP in the additional 'stuff'. Is it possible to demultiplex using the sequence itself, omitting the header?
demuxbyname.sh is a simple tool that demultiplexes data based on standard Illumina indexes present in the header. It is not going to be of help in this case. It looks like you will need to use umi4cPackage for data analysis. It can be found here.
I want to analyse separately the reads that carry one allele at a given position and the reads that carry the other allele. For this reason I want to first demultiplex by allele and then use umi4cPackage for the analysis.
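For what it's worth, splitting by the sequence itself (rather than the header) could be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the allele is identified by the 29-base prefix of R1 shown in the barcode list above, R1 and R2 contain the same reads in the same order, and each record is exactly four lines. The function names and the labels "T", "C" and "un" are made up for illustration; nothing here is part of BBTools or umi4cPackage.

```python
import gzip
from itertools import islice


def open_fastq(path, mode="rt"):
    """Open a FASTQ file as text, using gzip when the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a text handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            return  # end of file (or a truncated last record)
        yield tuple(record)


def assign_allele(sequence, prefixes):
    """Return the label of the first prefix the read starts with, else None."""
    for label, prefix in prefixes.items():
        if sequence.startswith(prefix):
            return label
    return None


def demux_pairs(r1, r2, prefixes, outputs):
    """Stream read pairs into per-allele outputs based on the start of R1.

    outputs maps each label (plus "un" for unmatched) to a pair of
    writable text handles (R1 output, R2 output). Returns pair counts.
    """
    counts = {label: 0 for label in outputs}
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        label = assign_allele(rec1[1], prefixes) or "un"
        out1, out2 = outputs[label]
        out1.write("\n".join(rec1) + "\n")
        out2.write("\n".join(rec2) + "\n")
        counts[label] += 1
    return counts
```

With real data, the input handles would come from open_fastq(path) and the output handles from open_fastq(path, "wt"), with prefixes set to something like {"T": "TCAGTGCCTCGTGCTCACGTTCTTCCTTT", "C": "TCAGTGCCTCGTGCTCACGTTCTTCCTTC"}. Exact prefix matching does not tolerate sequencing errors in the first 29 bases, so such reads would end up in "un".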
Then this will likely require some custom work on your part. You could try to remove the space (or add a +) between the Illumina index and the additional stuff to make it one long string, and then use demuxbyname.sh. You could also check whether deML (https://github.com/grenaud/deML) is able to work with the space; if not, it would be another option after you remove the space in the read header between the Illumina index and the other sequence.
I wrote a small bash script to try the + option. Tomorrow I'll try demuxbyname.sh with the modified headers.
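In case it helps anyone following along, the header change discussed above (joining the Illumina index and the trailing sequence with a +) could look roughly like this in Python. It is a sketch, not the script mentioned in the thread: it assumes headers shaped like the ones posted (read name, space, 1:N:0:INDEX, space, extra sequence) and four-line records, and the function names are invented for illustration. Whether demuxbyname.sh then matches the rewritten names still depends on the names file and the mode used, which this does not check.

```python
import re

# Read name, a space, the Illumina comment field (e.g. "1:N:0:CTTGTA"),
# another space, then the extra sequence appended to the header.
HEADER_PATTERN = re.compile(r"^(@\S+ [12]:[YN]:\d+:[ACGTN+]+) ([ACGTN]+)$")


def join_index_and_tag(header, joiner="+"):
    """Replace the space before the trailing sequence with `joiner`.

    Headers that do not match the expected shape are returned unchanged.
    """
    match = HEADER_PATTERN.match(header)
    if match is None:
        return header
    return match.group(1) + joiner + match.group(2)


def rewrite_headers(lines, joiner="+"):
    """Yield FASTQ lines with every header (first of each 4 lines) rewritten."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield join_index_and_tag(line.rstrip("\n"), joiner) + "\n"
        else:
            yield line
```

Passing joiner="" removes the space instead of inserting a +. Sequence and quality lines are passed through untouched, so a quality line that happens to start with @ is not affected.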