Hi guys, I have a set of contigs from genome G (assembled de novo with Velvet), and I also have the complete sequence of the reference genome G. I want to know the repeat count (an integer number) of each contig in the reality, by mapping the contigs to the reference genome and counting the exact matches.
Which tools are easiest to use? At the moment I'm only interested in a two-column result: one column with the contig names and the other with an integer, the repeat count of that contig in the reference genome. Everything else is just a bonus. Could you please let me know which tool is better, or how I can easily produce this result from MUMmer or BLAST output?
Thanks.
This question is unclear to me. What is your research question? What exactly are you trying to do? What kind of data do you have: genomic, transcriptomic, etc.? Are you just trying to determine read depth at a given locus? What does "repeat count of each contig in the reality" mean -- are you identifying repeat regions within contigs? Is there a strain difference in genome "G" -- why not map sequence reads onto the reference instead of contigs?
Please edit your question above. Thanks.
Thanks for replying, Josh. As I said, the data are genomic sequences. You can completely forget about sequence reads, read mapping, and read depth. By "repeat count" I mean the number of times contig Ci is observed in the reference genome G (100% match, or above some threshold, e.g. 98%).
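To make that definition concrete, here is a rough Python sketch of how the two-column table could be built from BLAST tabular output. It assumes the contigs were the query and the reference was the subject, and that the hits are in the default 12-column `-outfmt 6` layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function names, the `contig_lengths` dictionary, and the example rows are made up for illustration. A hit is counted only if a single alignment covers the whole contig at or above the identity threshold, which is just one possible reading of "observed in the reference".

```python
def count_contig_hits(blast_rows, contig_lengths, min_identity=98.0, min_coverage=1.0):
    """Count qualifying reference hits per contig.

    blast_rows: iterable of rows (lists of strings) in default outfmt 6 column order.
    contig_lengths: hypothetical dict mapping contig name -> contig length in bases.
    """
    # Start every known contig at zero so contigs without hits still get reported.
    counts = {name: 0 for name in contig_lengths}
    for row in blast_rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, pident = row[0], float(row[2])
        qstart, qend = int(row[6]), int(row[7])
        if qseqid not in counts:
            continue  # hit for a contig we have no length for
        # Fraction of the contig covered by this single alignment.
        coverage = (abs(qend - qstart) + 1) / contig_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            counts[qseqid] += 1
    return counts


def format_counts(counts):
    """Render the counts as tab-separated 'contig<TAB>count' lines."""
    return "\n".join(f"{name}\t{n}" for name, n in sorted(counts.items()))


# Made-up example: contig C1 (100 bp) and contig C2 (50 bp) against reference G.
example_lengths = {"C1": 100, "C2": 50}
example_rows = [
    ["C1", "G", "100.000", "100", "0", "0", "1", "100", "5001", "5100", "1e-50", "185"],
    ["C1", "G", "99.000", "100", "1", "0", "1", "100", "9100", "9001", "1e-48", "180"],
    ["C1", "G", "100.000", "40", "0", "0", "1", "40", "300", "339", "1e-15", "75"],
    ["C2", "G", "97.000", "50", "1", "0", "1", "50", "700", "749", "1e-18", "80"],
]
example_counts = count_contig_hits(example_rows, example_lengths)
print(format_counts(example_counts))
```

For a real run, the rows could come from `csv.reader(handle, delimiter="\t")` over the BLAST output file, and the contig lengths from the contig FASTA. One caveat: BLAST can split a long contig into several separate alignments, so requiring a single full-length alignment may undercount long contigs; lowering `min_coverage` or merging alignments first would be a different design choice.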