My first question is what are the uses/applications of reference genome mapping?
Second, assuming that I have sequenced an environmental sample for metagenomic analysis, since I have no idea of its taxonomic composition, how would I decide which genome I should use as a reference for mapping my reads?
Hi Monzoor. Here are a few tips for writing better questions. 1) Describe what you are doing. For people to help you, they must have a good understanding of what you are trying to accomplish. 2) Describe what you have tried so far, so that you don't get answers that tell you to do stuff you have already done. 3) Ask a specific question, including a question mark! This seems obvious, but I had to add 3 question marks to your questions and remove one from a sentence that was clearly not a question. 4) In order to get informative answers, you must write informative and well formated questions. :)
Hi Monzoor,
answers from my experience, hope they help:
1)In my understanding single reference genome mapping in used, when you have sequence reads that can be linked to a single (or very few reference genomes). Applications for reference genome mapping are among others:
re-sequencing of genomic DNA
genome sequencing of individuals for variant detection (SNPs, copy number variations)
sequencing of new species closely related to reference to guide assembly
RNA-seq
ChIP-seq
2) Metagenomics was not in that list. You named the reason for this already, you cannot know what you will see in that complex mixture of DNA. As you cannot decide a priori, the best bet is to BLAST against all available sequences. For example, use the full NT Blast database. Often, metagenomics is interested in microbial communities, so you could restrict the search to prokaryote sequences, while neglecting any eukaryote contribution or contamination.
Reference genome mapping can be useful even in metagenomics when you for example find a predominant taxonomica assignment in the first step. Then you can try map reads or assemble them using the predominant taxon/organism. We tried this for example in an analysis of the metagenome of a biogas plant and there other publications doing something similar.
Citing the abstract as an example:
A significant portion of contigs was
allocated to the genome sequence of
the archaeal methanogen Methanoculleus
marisnigri JR1. Mapping of single
reads to the M. marisnigri JR1 genome
revealed that approximately 64% of the
reference genome including
methanogenesis gene regions are deeply
covered.
Hi Monzoor. Here are a few tips for writing better questions. 1) Describe what you are doing. For people to help you, they must have a good understanding of what you are trying to accomplish. 2) Describe what you have tried so far, so that you don't get answers that tell you to do stuff you have already done. 3) Ask a specific question, including a question mark! This seems obvious, but I had to add 3 question marks to your questions and remove one from a sentence that was clearly not a question. 4) In order to get informative answers, you must write informative and well formated questions. :)
And you can edit, as soon as you have the time, your question (i.e. this question) to take Eric's suggestions into account.
I agree. Will be careful next time.