Question

fragment size estimation during vg mpmap and rpvg

8 months ago

Juhyun • 0

Hi,

I have a question about fragment size estimation during vg mpmap and rpvg using short-read RNA-seq data. Does the fragment size refer to the cDNA size in the library, rather than the size of a single read in a paired-end setup?

In my data, each read is 150 bp, and I estimate the fragment size to be around 300–500 bp. However, vg mpmap estimated the fragment length to be approximately 200 bp. I wonder whether this discrepancy could be related to the errors I reported in the rpvg GitHub issue.

Would it be better to set the mean and standard deviation of the fragment size manually when running vg mpmap and rpvg?

vg • 582 views

updated 6 months ago by Jordan M Eizenga ▴ 740 • written 8 months ago by Juhyun • 0

score 1 · Answer 1 · 2024-11-18

The measurement that vg mpmap uses internally for the fragment length is the minimum distance through the graph, which is often somewhat shorter than the true length of the fragment. This tends not to be a serious problem since the bounds that vg mpmap uses on fragment size are fairly wide. Moreover, it's beneficial to use fragment length parameters that match the properties of the fragment length measurements that are used internally, even though they are somewhat biased downward.
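To make the downward bias concrete, here is a toy sketch in Python. It is not vg's actual implementation. The three-node graph, the node names, and the `min_distance` helper are all made up for illustration. It assumes the fragment really spans an optional segment `B` (for example, an exon that some transcripts skip), while the graph also has a direct edge from `A` to `C`.

```python
import heapq

def min_distance(edges, node_len, source, target):
    """Fewest bases walked from the start of `source` to the start of `target`."""
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in edges.get(node, []):
            # Leaving a node costs that node's full sequence length.
            cand = dist + node_len[node]
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(heap, (cand, nxt))
    return None

# Hypothetical graph: node lengths in bp, B is an optional segment.
node_len = {"A": 100, "B": 60, "C": 100}
edges = {"A": ["B", "C"], "B": ["C"]}

# Read 1 starts at the beginning of A, read 2 ends at the end of C.
shortest = min_distance(edges, node_len, "A", "C") + node_len["C"]
true_len = node_len["A"] + node_len["B"] + node_len["C"]

print("minimum-distance fragment length:", shortest)
print("true fragment length via B:", true_len)
```

In this toy case the minimum-distance measurement skips `B`, so it comes out 60 bp shorter than the length of the path the fragment actually took.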

By default, with paired-end reads, rpvg re-estimates the fragment length based on read pairs with unambiguous transcript assignments. This estimation tends to be much closer to the true fragment length, and it doesn't depend on the fragment length parameters estimated by vg mpmap.
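As a rough illustration of that idea (not rpvg's actual code), the sketch below keeps only read pairs that are compatible with exactly one transcript and computes the mean and standard deviation of their fragment lengths. The records, transcript names, and lengths are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical records: (read pair id, {transcript: fragment length on it})
pairs = [
    ("p1", {"tx1": 310}),
    ("p2", {"tx1": 295, "tx2": 180}),  # ambiguous assignment, skipped
    ("p3", {"tx3": 350}),
    ("p4", {"tx2": 330}),
]

# Keep the fragment length only when a single transcript is compatible.
unambiguous = [
    next(iter(candidates.values()))
    for _, candidates in pairs
    if len(candidates) == 1
]

frag_mean = mean(unambiguous)
frag_sd = pstdev(unambiguous)  # population standard deviation

print(f"n={len(unambiguous)} mean={frag_mean:.1f} sd={frag_sd:.1f}")
```

Because each retained length is measured along a single known transcript, it isn't affected by alternative shorter paths in the graph, which is why an estimate of this kind can sit closer to the true fragment length.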

Upshot: I don't think there's much to be gained by providing your own fragment length in vg mpmap or rpvg, unless you have single-end reads.