6.5 years ago
pennakiza
Hi all,
I was wondering whether splitting a long read into multiple shorter ones would have any consequences for the quality of the read, and I would like to hear your opinions on that.
As a very simple example, say I have a 10-base sequence:
@read1
ATGTGGATCA
and I split it into two 5-base ones:
@read1_1
ATGTG
@read1_2
GATCA
Thanks
Peny
Why would you want to do that? You don't say what your goal is, so it is difficult to evaluate the consequences of splitting the reads. The per-base "quality" of the reads remains the same as before, as long as you also split the FASTQ quality strings at the same positions.
However:
If you intend to use these reads for mapping or assembly, the results of those analyses would be poorer: shorter reads are less specific and produce more multi-mapping reads, and they also yield more fragmented assemblies. Downstream analyses based on those mappings or assemblies would also be negatively affected.
I have very long reads that I would like to pass through a fusion gene detection tool, which will not accept the reads as they are, mainly because of the aligner it uses. However, I am thinking of splitting them into fairly large chunks (1,000 bases, as opposed to the 10,000-base sequences I have now).
Then your example of 5 and 10 nucleotide reads is not representative of the real question.
That was just an example; my real-life reads are much longer, around 10,000 nt each, and I was planning to chunk them into pieces of 1,000 nt each.