Why sample only part of the genome with RepeatModeler?
2.5 years ago
caleigh • 0

Hi everyone,

I am running RepeatModeler2 to create a de novo TE library for a PacBio bird genome. My goal is to curate the library and produce a high-quality transposable element annotation of the genome, and also to use the repeat library to mask the genome before gene annotation.

According to the RepeatModeler2 paper, the default is to sample a total of 363 Mbp from the genome, which is less than 20% of many vertebrate genomes (about 12% of a ~3 Gbp genome), though closer to 30% of a typical ~1.2 Gbp bird genome. However, there is an option to change the sample size, including sampling the entire genome.
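For what it's worth, the 363 Mbp total can be reproduced as a simple geometric series if you assume (my reading, not confirmed here; check `RepeatModeler -help` and the paper for your version) that the RECON rounds start with a 3 Mbp sample and triple it each round over five rounds, with no overlap between rounds. The sketch below, with hypothetical helper names, sums that schedule and shows what fraction of a genome it covers at a couple of illustrative genome sizes.

```python
# Hypothetical illustration, ASSUMING the RECON rounds sample a geometric
# series that starts at 3 Mbp and triples each round, for five rounds,
# and that the per-round samples do not overlap.
def recon_sample_schedule(start_bp=3_000_000, factor=3, rounds=5):
    """Return the per-round sample sizes (bp) of a geometric schedule."""
    return [start_bp * factor**i for i in range(rounds)]

def fraction_sampled(total_sample_bp, genome_bp):
    """Fraction of the genome covered by the sample, capped at 1.0."""
    return min(total_sample_bp / genome_bp, 1.0)

schedule = recon_sample_schedule()
total = sum(schedule)
print([s // 1_000_000 for s in schedule], total // 1_000_000)
# -> [3, 9, 27, 81, 243] 363

# Genome sizes below are rough, illustrative values only.
for label, size_bp in [("bird (~1.2 Gbp)", 1_200_000_000),
                       ("mammal (~3 Gbp)", 3_000_000_000)]:
    print(f"{label}: {fraction_sampled(total, size_bp):.0%}")
# -> bird (~1.2 Gbp): 30%
# -> mammal (~3 Gbp): 12%
```

Because the last round dominates the total, raising the maximum sample size is what mostly drives both coverage and runtime.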

I want to make sure I understand the tradeoff here. Is the rationale that you are likely to find the majority of high-copy-number repeats within a small sample of the genome, so that a larger sample gives diminishing returns? If I have the server time, would it be ideal to run RepeatModeler on the whole genome? Or is there some downside to sampling the whole genome that I am unaware of?
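One way to see the diminishing-returns intuition is a toy binomial model: treat each copy of a repeat family as landing in the sample independently with probability equal to the sampled fraction, and ask how likely it is that at least `k` copies end up in the sample. Both the independence assumption and the threshold `k = 3` are hypothetical simplifications, not RepeatModeler's actual detection criterion.

```python
from math import comb

def prob_at_least_k(n_copies, frac, k):
    """P(at least k of n_copies fall in the sample), assuming each copy is
    sampled independently with probability `frac` (toy binomial model)."""
    return 1.0 - sum(
        comb(n_copies, i) * frac**i * (1 - frac) ** (n_copies - i)
        for i in range(k)
    )

K = 3  # hypothetical minimum number of copies needed to build a consensus
for n in (5, 20, 1000):
    # compare ~30% sampling against sampling the whole genome
    print(n, f"{prob_at_least_k(n, 0.3, K):.3f}", f"{prob_at_least_k(n, 1.0, K):.3f}")
```

Under these made-up assumptions, a 5-copy family clears the threshold only about 16% of the time at 30% sampling, a 20-copy family about 96% of the time, and a 1000-copy family essentially always, so a larger sample mainly helps with low-copy families.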

If you have experience with this and a moment to respond, I would appreciate it greatly!

repeat repetitive transposable • 723 views

Not sure, but I also think the rationale is that you will probably find representatives of most repeat families in those 363 Mb, so it is not necessary to use the whole genome, and using only 363 Mb saves time. That said, results may be somewhat more accurate if you use the whole genome.


Yeah, that's what I was thinking! I don't often see people report their sample size in papers, so I may try a few different sizes and compare the results.
