Demultiplexing of V4 reads
6.4 years ago
agata88 ▴ 870

Hi all!

I have reads in R1 and R2 files, plus an I1 file containing the 12 nt indexes (also FASTQ). In order to divide the reads into separate files, I first performed a merging step.

I used the join_paired_ends.py script with the following command:

join_paired_ends.py -f <R1> -r <R2> -o demultiplexed/ -m fastq-join -b <I1> -p 15

It ended up with almost 94% of reads joined. Next, I ran the split_libraries_fastq.py script (roughly the command sketched below the table). When I looked at the histogram, the number of reads per sequence length was spread over a wide range:

Length  Count
249.0   14545440
...
489.0   1467
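
(For completeness, the split_libraries_fastq.py call was along these lines; I am writing the joined-file names and the mapping file from memory, so treat them as placeholders:)

split_libraries_fastq.py -i demultiplexed/fastqjoin.join.fastq -b demultiplexed/fastqjoin.join_barcodes.fastq -m mapping.txt --barcode_type 12 -o slout/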

The amplification was performed for the V4 region, which is around 291 nt long, and the sequencing was 2x250 bp. So my question is: how come I have 1467 reads of almost 500 bp? Is this contamination?

Should I discard all reads longer than 300 bp before further analysis? What do you think?

Thanks in advance! Best, Agata

miseq demultiplex 16S
6.4 years ago
h.mon 35k

Your 489 bp reads amount to roughly 0.01% of the count of the 249 bp reads alone; if you compare them against the whole interval from, say, 249 to 333 bp, that percentage will be even smaller. This is pretty minimal, and some contamination / artifacts are usual for next-generation sequencing.
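
If those few long joins bother you, you can simply drop everything above your expected amplicon length before downstream analysis. A minimal sketch, assuming seqkit is installed and that split_libraries_fastq.py wrote the demultiplexed sequences to slout/seqs.fna (-M caps the sequence length, if I remember the flag correctly):

seqkit seq -M 300 slout/seqs.fna > slout/seqs_max300.fna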

You could also pass more stringent parameters to join_paired_ends.py. You used fastq-join (the tool's default method), and fastq-join's default minimum overlap is only 6 bases, if I am not mistaken.
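
A 489 bp product from 2x250 bp reads implies only about 11 bases of overlap (250 + 250 - 489), whereas a genuine ~291 bp V4 amplicon overlaps by roughly 200 bases, so requiring a much larger minimum overlap should reject the spurious joins without touching the real ones. Something along these lines, assuming join_paired_ends.py exposes fastq-join's minimum overlap as -j / --min_overlap and the maximum percent difference within the overlap as -p / --perc_max_diff:

join_paired_ends.py -f <R1> -r <R2> -b <I1> -m fastq-join -j 50 -p 8 -o demultiplexed_strict/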
