Splitting long reads into shorter chunks

6.5 years ago

pennakiza ▴ 60

Hi all,

I was wondering whether splitting a long read into several shorter ones would have any consequences for the quality of the read, and I would like to hear your opinions on that.

Just a very simple example, say I have a 10-base sequence:

@read1
ATGTGGATCA

and I split it into two 5-base ones:

@read1_1
ATGTG
@read1_2
GATCA

Thanks
Peny

long-read • 1.6k views

ADD COMMENT • link updated 8 months ago by Ram 44k • written 6.5 years ago by pennakiza ▴ 60

Why would you want to do that? You don't state your goal, so it is difficult to evaluate the consequences of splitting the reads. The "quality" of the reads remains the same as before, provided you also split the FASTQ quality strings at the same positions.

However:

If you intend to use these reads for mapping or assembly, the results of those analyses will be poorer: shorter reads map less specifically and produce more multi-mapping reads, and they also yield more fragmented assemblies. Downstream analyses based on those mappings or assemblies will be negatively affected as well.

ADD REPLY • link 6.5 years ago by h.mon 35k

I have very long reads that I would like to pass through a fusion gene detection tool, which will not accept the reads as they are, mainly because of the aligner that it uses. However, I am thinking of splitting them into quite large chunks (1000 bases, as opposed to the 10000-base sequences that I have now).

ADD REPLY • link 6.5 years ago by pennakiza ▴ 60

Then your example of 5 and 10 nucleotide reads is not representative of the real question.

ADD REPLY • link 6.5 years ago by WouterDeCoster 47k

It was just an example; obviously my real-life reads are huge, around 10000 nt each. However, I was planning to chunk them into pieces of 1000 nt each.
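
For illustration only, here is a minimal Python sketch of that kind of chunking, assuming each read is available as a name, a sequence string and a quality string of equal length. The function names `split_record` and `format_fastq`, the `_1`, `_2`, ... naming of the pieces, and the quality string in the usage note are assumptions made for this example, not part of any particular tool. The bases and their quality characters are sliced at the same positions, so the per-base qualities are unchanged; the last chunk may be shorter than `chunk_size`.

```python
def split_record(name, seq, qual, chunk_size=1000):
    """Split one FASTQ record into consecutive chunks of at most chunk_size bases.

    Returns a list of (name, sequence, quality) tuples. Chunk names follow the
    hypothetical pattern name_1, name_2, ... used in the question.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")

    chunks = []
    for i, start in enumerate(range(0, len(seq), chunk_size), start=1):
        end = start + chunk_size
        # Slice bases and qualities with the same coordinates so they stay paired.
        chunks.append((f"{name}_{i}", seq[start:end], qual[start:end]))
    return chunks


def format_fastq(records):
    """Render (name, sequence, quality) tuples as four-line FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)
```

With the toy read from the question and a made-up quality string, `split_record("read1", "ATGTGGATCA", "IIIIIHHHHH", chunk_size=5)` gives `read1_1` (ATGTG) and `read1_2` (GATCA), each carrying its own five quality characters. With the default `chunk_size=1000`, a 10000 nt read becomes ten pieces. Note that this sketch does not address the mapping and assembly concerns raised above; it only keeps bases and qualities in step.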

ADD REPLY • link 6.5 years ago by pennakiza ▴ 60