Hi All,
Just a general query regarding best practice:
What is the BEST way to align WGS data to an individual gene or fragment?
Example: I have several libraries of high coverage shotgun sequence from my favourite organism. I have previously aligned this data to the latest high quality draft genome. I can use the annotations and locations to easily extract sequences, variants or whatever from the genome. If a gene is not annotated I can also easily locate it within the draft genome using homology searches and then extract whatever information I am interested in.
But what if the gene I am interested in is present within the organism and previously sequenced, but missing from the draft genome? What is the best way to use my WGS sequences to inspect this gene?
I can think of a couple of strategies, but neither is fully satisfying.
Map my libraries directly to the single gene? I end up with crazy high coverage and some weird calls, and I do not entirely trust this method.
Or do I add the gene as a mock chromosome to the reference sequence and re-align to this new genome, removing high coverage issues and hopefully only recruiting the correct reads to the gene of interest? I would still miss reads that overlap the ends of the mock chromosome.
Any thoughts?
Ciaran
Adding the gene as a mock chromosome and re-aligning is the right approach. Aligners try very hard to place every read, and if you map all of the reads to only your segment of interest, you're probably going to get lots of reads incorrectly mapped there that would otherwise map well to other portions of the genome.
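For what it's worth, building the combined reference can be sketched in a few lines of Python. This is only a minimal illustration, not a tested pipeline: the function names `fasta_names` and `add_mock_chromosome` and all file names are made up for the example, and it assumes plain, uncompressed FASTA files. It appends the gene as an extra record and refuses to proceed if the gene's name clashes with an existing contig, since duplicate names would confuse the aligner and downstream tools.

```python
def fasta_names(path):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.add(fields[0] if fields else "")
    return names


def add_mock_chromosome(reference_fasta, gene_fasta, combined_fasta):
    """Write the reference followed by the gene record(s) to combined_fasta.

    Returns the sorted names of the records that were added.
    Hypothetical helper for illustration only.
    """
    gene_names = fasta_names(gene_fasta)
    clash = fasta_names(reference_fasta) & gene_names
    if clash:
        raise ValueError("name already in reference: " + ", ".join(sorted(clash)))

    with open(combined_fasta, "w") as out:
        for source in (reference_fasta, gene_fasta):
            last = "\n"
            with open(source) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Make sure the next header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
    return sorted(gene_names)
```

You would call it as `add_mock_chromosome("draft.fa", "my_gene.fa", "draft_plus_gene.fa")`, then index and align against the combined file with whatever aligner you used before (with BWA, for example, `bwa index draft_plus_gene.fa` followed by `bwa mem draft_plus_gene.fa reads_1.fq reads_2.fq`). Reads that belong elsewhere in the genome then have a better place to go than your gene.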