Question

How does the calculated inner distance between reads change after trimming (in relation to insert size)?

10.2 years ago

jomaco

If we had an insert length of 426bp and an adapter size of 65bp on each side, then the central region between two 100bp reads would be 96bp long (426bp - 130bp - 100bp - 100bp).
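The arithmetic above can be written as a small helper. This is a minimal sketch: the function name `inner_distance` is hypothetical, and it assumes, as the question does, that the 426bp insert length includes one 65bp adapter on each side of the genomic fragment.

```python
def inner_distance(insert_len: int, adapter_len: int, read1_len: int, read2_len: int) -> int:
    """Gap (bp) between the two mates on the fragment.

    Assumes insert_len counts one adapter of adapter_len on each side,
    so the genomic fragment is insert_len - 2 * adapter_len.
    """
    fragment_len = insert_len - 2 * adapter_len
    # A negative result would mean the two mates overlap.
    return fragment_len - read1_len - read2_len


print(inner_distance(426, 65, 100, 100))  # 96
```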

So, provided the mean read length in the library is in fact 100bp, the average distance between the reads for that library would be 96bp.

As Istvan noted (How does the insert size parameter change after trimming (MATS tool)), if you were to trim off a fixed number of bases, for example resulting in an average read length of 98bp, then presumably this distance would increase to 100bp.
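Under the same assumptions, trimming a fixed number of bases from the 3' end of each mate leaves the fragment unchanged and widens the gap by the total number of bases removed. A minimal sketch with a hypothetical helper name, assuming both mates are trimmed equally and only at their 3' ends (the ends that face the gap):

```python
def gap_after_trim(fragment_len: int, read_len: int, trim_per_read: int) -> int:
    """Inner distance after removing trim_per_read bases from the 3' end of both mates."""
    trimmed_len = read_len - trim_per_read
    return fragment_len - 2 * trimmed_len


fragment = 426 - 2 * 65  # 296bp genomic fragment, using the question's numbers
before = gap_after_trim(fragment, 100, 0)
after = gap_after_trim(fragment, 100, 2)
print(before, after, after - before)  # 96 100 4
```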

If this increase in distance is correct, how is it any different if we assume the removed bases were actually sequencing errors?

Each read is reduced from 100bp to 98bp, so you would expect the pair to map to a reference genome with 4bp more between them. Should I therefore find the average post-trimming read length of the library and use it to calculate the inner distance, or am I missing something here?
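If you go this route, the mean post-trimming read length can be computed directly from the trimmed FASTQ files. A minimal sketch, assuming plain 4-line FASTQ records with unwrapped sequences; `fake_fastq` is a hypothetical helper that builds made-up reads in memory so the example runs on its own (in practice you would pass an open file handle instead).

```python
import io
from typing import Iterable


def mean_read_length(fastq_lines: Iterable[str]) -> float:
    """Mean sequence length over 4-line FASTQ records (no wrapped sequences)."""
    # Line 2 of every 4-line record (index 1, 5, 9, ...) holds the sequence.
    lengths = [len(line.rstrip("\r\n")) for i, line in enumerate(fastq_lines) if i % 4 == 1]
    if not lengths:
        raise ValueError("no FASTQ records found")
    return sum(lengths) / len(lengths)


def fake_fastq(lengths: list[int]) -> io.StringIO:
    # Hypothetical trimmed reads; only their lengths matter here.
    text = "".join(f"@read{n}\n{'A' * k}\n+\n{'I' * k}\n" for n, k in enumerate(lengths))
    return io.StringIO(text)


mean_r1 = mean_read_length(fake_fastq([98, 96]))   # 97.0
mean_r2 = mean_read_length(fake_fastq([100, 98]))  # 99.0
fragment_len = 426 - 2 * 65  # 296bp, using the question's numbers
print(fragment_len - mean_r1 - mean_r2)  # 100.0
```

Because the gap is a linear function of the two read lengths, the mean gap equals the mean fragment length minus the mean length of each mate, so mean trimmed lengths are enough for the mean inner distance (though not for its spread).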

Thanks

Tags: trimming, insert-length, paired-end

Last updated 2.9 years ago by Ram

I believe you have a good point, and you should re-calculate the insert size for any software that is sensitive to it. However, this issue only really matters for poor-quality or long reads, where you trim something like 20-50bp. For example, with MiSeq 300bp paired-end reads one rarely gets good quality over the last 50bp, so there it matters a lot.

Reply by mikhail.shugay, 10.2 years ago

In other words, if the "-r" option in TopHat (for example) were set to 92bp instead of the original 96bp, this would not make a large enough difference to worry about? (Whereas a 100bp difference clearly would.)

Reply by jomaco, 10.2 years ago

For Illumina short-insert reads, tools rarely use the inner distance because it is not a well-defined number. The external (outer) distance makes much more sense: experimentally, it is the length of the DNA fragment subjected to sequencing. This length is not affected by read lengths, low-quality bases at the tail, or 3'-end adapters, which occur much more frequently than 5'-end adapters.

Reply by lh3, 10.2 years ago

That makes sense, but in TopHat the "-r" parameter "is the expected (mean) inner distance between mate pairs. For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. The default is 50bp." Should I be concerned about the removal of these few bp?

Reply by jomaco, 10.2 years ago
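The TopHat manual's rule quoted above amounts to subtracting both read lengths from the selected fragment size, so the value for -r shifts by exactly the number of bases trimmed from the two reads combined. A small sketch with a hypothetical helper name, plugging in mean read lengths (for instance, mean post-trimming lengths):

```python
def mate_inner_dist(fragment_len: float, mean_len_1: float, mean_len_2: float) -> int:
    """Expected inner distance following the manual's example: fragment minus both ends."""
    return round(fragment_len - mean_len_1 - mean_len_2)


print(mate_inner_dist(300, 50, 50))  # 200, the manual's example
print(mate_inner_dist(300, 48, 48))  # 204 after trimming 2bp from each 50bp end
```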

Ram · Answer 1 · 2014-10-16

IMHO, the inner distance between reads is a weird and confusing measure that is best avoided; it is not all that informative.

It is a measure that is very sensitive to 3' trimming, whereas clipping the 3' ends of a read does not alter the insert size.

A simple example: the same library could be sequenced with different read lengths; the insert size would be the same in each run, whereas the inner distance would depend on the actual read lengths of each pair.
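To make this concrete, one can place a single fragment on reference coordinates and "sequence" it with several read lengths. A sketch under stated assumptions: the `FRPair` class and the coordinates are hypothetical, both mates have the same length, and the reads contain no adapter sequence.

```python
from dataclasses import dataclass


@dataclass
class FRPair:
    """A forward/reverse mate pair on reference coordinates (0-based, end-exclusive)."""
    fragment_start: int  # 5' end of the forward mate
    fragment_end: int    # 5' end of the reverse mate (exclusive)
    read_len: int        # both mates sequenced to the same length

    def outer_distance(self) -> int:
        # Insert (fragment) size: depends only on where the fragment starts and ends.
        return self.fragment_end - self.fragment_start

    def inner_distance(self) -> int:
        # Gap between the 3' ends of the two mates; shrinks as reads get longer.
        return self.outer_distance() - 2 * self.read_len


# One hypothetical 300bp fragment "sequenced" in runs with different read lengths.
results = {}
for read_len in (50, 75, 100, 150):
    pair = FRPair(fragment_start=1000, fragment_end=1300, read_len=read_len)
    results[read_len] = (pair.outer_distance(), pair.inner_distance())
    print(read_len, *results[read_len])
# 50 300 200
# 75 300 150
# 100 300 100
# 150 300 0
```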

Also, to correct my answer in How does the insert size parameter change after trimming (MATS tool): trimming the 3' ends would not change the apparent insert size either; only trimming the 5' ends would.
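The 3' versus 5' distinction can be sketched on reference coordinates, assuming an FR pair where the forward mate's 5' end is its leftmost aligned base and the reverse mate's 5' end is its rightmost aligned base. The apparent insert is taken as the span between those two ends, which is roughly how the SAM TLEN field is derived from mapped positions. All names and coordinates here are hypothetical.

```python
from typing import NamedTuple


class Mate(NamedTuple):
    start: int  # 0-based leftmost aligned base
    end: int    # 0-based, exclusive, rightmost aligned base


def apparent_insert(fwd: Mate, rev: Mate) -> int:
    # From the forward mate's 5' end (its start) to the reverse mate's 5' end (its end).
    return rev.end - fwd.start


def inner_gap(fwd: Mate, rev: Mate) -> int:
    return rev.start - fwd.end


def trim_3prime(fwd: Mate, rev: Mate, n: int) -> tuple[Mate, Mate]:
    # 3' ends face the gap: forward mate loses its right end, reverse mate its left end.
    return Mate(fwd.start, fwd.end - n), Mate(rev.start + n, rev.end)


def trim_5prime(fwd: Mate, rev: Mate, n: int) -> tuple[Mate, Mate]:
    # 5' ends are the outer ends of the pair.
    return Mate(fwd.start + n, fwd.end), Mate(rev.start, rev.end - n)


fwd, rev = Mate(1000, 1100), Mate(1196, 1296)  # hypothetical 296bp fragment, 100bp mates
print(apparent_insert(fwd, rev), inner_gap(fwd, rev))  # 296 96
f3, r3 = trim_3prime(fwd, rev, 2)
print(apparent_insert(f3, r3), inner_gap(f3, r3))      # 296 100
f5, r5 = trim_5prime(fwd, rev, 2)
print(apparent_insert(f5, r5), inner_gap(f5, r5))      # 292 96
```

Trimming 3' ends changes only the gap, while trimming 5' ends shrinks the apparent insert and leaves the gap alone.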