Question

Finding Contig Repeat Counts By Mapping Contigs To The Reference Genome


12.0 years ago

misaghb ▴ 20

Hi guys, I have a set of contigs for genome G (assembled de novo with Velvet), and I also have the complete reference sequence of G. I want to know the actual repeat count (an integer) of each contig, which I plan to get by mapping the contigs to the reference genome and counting exact matches.

Which tools are easiest to use? For now I only need a two-column result: one column with the contig names and the other with an integer giving the repeat count of that contig in the reference genome. Anything else is a bonus. Could you please tell me which tool is better, or how I can easily produce this result from MUMmer or BLAST output?

Thanks.

contigs mapping alignment reference repeats cnv • 3.7k views



This question is unclear to me. What is your research question? What exactly are you trying to do? What kind of data do you have: genomic, transcriptomic, etc.? Are you just trying to determine read depth at a given locus? What does "repeat count of each contig in the reality" mean? Are you identifying repeat regions within contigs? Is there a strain difference in genome "G"? Why not map sequence reads onto the reference instead of contigs?

Please edit your question above. Thanks.

12.0 years ago by Josh Herr 5.8k


Thanks for replying, Josh. As I said, the data are genomic sequences. You can completely forget about sequence reads, read mapping, and read depth. By repeat count I mean the number of times contig Ci is observed in the reference genome G (a 100% match, or a match above some threshold, e.g. 98%).

12.0 years ago by misaghb ▴ 20
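
One possible way to get the two-column table described in this thread from BLAST output is sketched below. It is a minimal sketch, not a tested pipeline, and it rests on assumptions that are not in the thread: that the contigs were searched against the reference with BLAST+ using a custom tabular format (`-outfmt "6 qseqid sseqid pident qstart qend qlen"`), and that a hit counts as one copy when its identity is at least 98% and it covers at least 98% of the contig. The function name `count_contig_copies`, both thresholds, and the sample rows are hypothetical and for illustration only.

```python
import csv
from collections import Counter


def count_contig_copies(lines, min_identity=98.0, min_coverage=0.98):
    """Count near-full-length hits per contig in BLAST tabular output.

    Assumed column order (an assumption, set via -outfmt):
        qseqid  sseqid  pident  qstart  qend  qlen
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        qseqid, _sseqid, pident, qstart, qend, qlen = row[:6]
        # Fraction of the contig spanned by this alignment.
        coverage = (int(qend) - int(qstart) + 1) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_coverage:
            counts[qseqid] += 1
    return counts


# Made-up example rows, tab-separated, in the assumed column order.
sample = "\n".join([
    "contig1\tref\t100.0\t1\t500\t500",   # full-length, exact
    "contig1\tref\t99.0\t1\t500\t500",    # full-length, above threshold
    "contig1\tref\t99.5\t1\t200\t500",    # partial hit, ignored
    "contig2\tref\t97.0\t1\t300\t300",    # below identity threshold
    "contig3\tref\t100.0\t1\t300\t300",   # full-length, exact
])

counts = count_contig_copies(sample.splitlines())

# Two-column result: contig name, repeat count.
for name, n in sorted(counts.items()):
    print(f"{name}\t{n}")
```

Contigs with no qualifying hit do not appear in the result, so they would need to be added with a count of 0 from the contig FASTA file if a complete table is wanted. Setting `min_identity=100.0` and `min_coverage=1.0` restricts the count to exact full-length matches.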