Hi,
I am only interested in aligning DNA-seq reads to certain genes. Would it be a good idea to split my reference genome based on the coordinates of my genes of interest (as given in the GTF/GFF file) and then use BWA to align my reads to the resulting 'smaller reference genomes'?
If yes, is there a threshold for the number of bases upstream and downstream of the gene coordinates that should be included? And what caveats does splitting the reference genome this way have that I should pay attention to?
My motivation for using this method is to reduce alignment time, as I am only interested in, say, 20-30 genes and not all genes.
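To make the idea concrete, here is a minimal sketch of how the padded gene intervals could be pulled out of a GTF file. The function name `padded_gene_intervals`, the gene name `ABC`, and the flank of 1000 bases are all placeholders of mine, not recommended values; it also assumes the GTF has `gene` feature lines carrying a `gene_name` attribute, which not every annotation file does.

```python
def padded_gene_intervals(gtf_lines, wanted_genes, flank):
    """Return {gene_name: (chrom, start, end)} padded by `flank` bases per side.

    Coordinates are 1-based and inclusive, as in GTF. The padded end is not
    clipped to the chromosome length, which is not known from the GTF alone.
    """
    intervals = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only whole-gene records are used here
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        name = None
        for attr in fields[8].split(";"):
            attr = attr.strip()
            if attr.startswith("gene_name "):
                name = attr.split(" ", 1)[1].strip('"')
        if name in wanted_genes:
            # Clip the padded start so it never drops below position 1.
            intervals[name] = (chrom, max(1, start - flank), end + flank)
    return intervals


# Hypothetical one-line annotation, for illustration only.
example = ['chr1\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n']
print(padded_gene_intervals(example, {"ABC"}, flank=1000))
```

The resulting intervals could be used either to cut out the 'smaller reference genomes' or to restrict the analysis to those regions after a whole-genome alignment.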
What you can do is remove the reads that fall outside your genes of interest after bwa and before sorting.
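As a rough sketch of that filtering step, the Python below keeps only SAM records that overlap a list of regions. The function names and the example region are hypothetical, and it assumes plain-text SAM lines such as the aligner's uncompressed output. In practice `samtools view -L` with a BED file does a similar job.

```python
def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) that a read covers on the reference."""
    length, number = 0, ""
    for ch in cigar:
        if ch.isdigit():
            number += ch
        else:
            # Only these CIGAR operations consume reference bases.
            if ch in "MDN=X":
                length += int(number)
            number = ""
    return pos, pos + max(length, 1) - 1


def keep_record(sam_line, regions):
    """Keep header lines and mapped reads that overlap any (chrom, start, end) region."""
    if sam_line.startswith("@"):
        return True
    fields = sam_line.split("\t")
    chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
    if chrom == "*" or pos == 0:
        return False  # unmapped read
    start, end = reference_span(pos, cigar)
    return any(c == chrom and start <= r_end and end >= r_start
               for c, r_start, r_end in regions)


def filter_sam(lines, regions):
    """Lazily yield only the SAM lines worth passing on to the sorting step."""
    return (line for line in lines if keep_record(line, regions))


# Hypothetical region and records, for illustration only.
regions = [("chr1", 120, 200)]
records = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr2\t100\t60\t50M\t*\t0\t0\tACGT\tIIII\n",
]
for line in filter_sam(records, regions):
    print(line, end="")
```

In a pipeline, `filter_sam` would read from `sys.stdin` (the bwa output) and write to `sys.stdout` (the input of the sort). Note that this sketch judges each read on its own, so for paired-end data it can keep one mate and drop the other.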
Makes sense, Pierre. Thank you for letting me know the caveat.