I need to align a large number of reads, obtained from a custom exon capture array, to a reference. Sequence capture is a technique that lets you choose which sequences to retain for sequencing.
One possibility is to map the reads to the whole genome and then subselect the regions that correspond to the exons in question, but that seems like a lot of unnecessary work.
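To make the sub-selection step concrete, here is a minimal sketch in plain Python. It assumes the alignments have already been reduced to (chromosome, start, end) tuples with 0-based, half-open coordinates; the function names `build_index` and `overlaps_exon` are hypothetical and not taken from any aligner or toolkit.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(exons):
    """Merge overlapping exons per chromosome into sorted, disjoint intervals.

    exons: iterable of (chrom, start, end), 0-based half-open (an assumption).
    Returns {chrom: (starts, ends)} with parallel sorted lists.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching the previous interval: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_exon(index, chrom, start, end):
    """Return True if the alignment [start, end) overlaps any target exon."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged interval that begins before the alignment ends.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start
```

Because the intervals are merged first, each lookup is a single binary search, so the filtering itself is cheap compared with the whole-genome mapping that precedes it.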
I could also build a custom reference for this, in which each exon gets its own sequence ID (there will be about 10K of them). In that case I am concerned that most short-read aligners may be optimized to treat the reference genome as a few long sequences rather than many short ones.
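As a rough sketch of what building such a custom reference might look like, assuming the genome is already in memory as a dict of chromosome name to sequence string and the exons are (chromosome, start, end) tuples with 0-based, half-open coordinates. The names `exon_records` and `to_fasta`, the `chrom:start-end` ID scheme, and the optional `flank` padding are all hypothetical choices for illustration.

```python
def exon_records(genome, exons, flank=0):
    """Yield (sequence_id, sequence) for each exon.

    genome: dict of chromosome name -> sequence string (an assumption).
    exons:  iterable of (chrom, start, end), 0-based half-open.
    flank:  hypothetical padding added on each side, clamped to the
            chromosome bounds; the ID always uses the unpadded coordinates.
    """
    for chrom, start, end in exons:
        chrom_seq = genome[chrom]
        padded_start = max(0, start - flank)
        padded_end = min(len(chrom_seq), end + flank)
        yield f"{chrom}:{start}-{end}", chrom_seq[padded_start:padded_end]


def to_fasta(records, width=60):
    """Format (id, sequence) pairs as FASTA text, wrapped at `width` columns."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n" if lines else ""


# Tiny usage example with made-up data.
genome = {"chr1": "ACGTACGTAC"}
exons = [("chr1", 2, 8)]
fasta_text = to_fasta(exon_records(genome, exons), width=4)
```

The resulting text would be written to a file and indexed with whichever aligner is used; whether that aligner copes well with roughly 10K short reference sequences is exactly the open question here.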
Other suggestions and tips are welcome. Thanks.
This is a good point. Thanks for the link.