Splitting reference genome for alignment
7.1 years ago
prasundutta87 ▴ 670

Hi,

I am only interested in aligning DNAseq reads to certain genes. If I split my reference genome based on the coordinates of my gene of interest (as present in the GTF/GFF file) and then use BWA for aligning my reads to the resulting 'smaller reference genomes', will it be a good idea?

If yes, is there a threshold for the number of bases upstream and downstream of the gene coordinates that should be included? And what caveats of splitting the reference genome should I pay attention to?

My motivation for using this method is to reduce alignment time, as I am only interested in, say, 20-30 genes and not all genes.

alignment genome gene • 1.8k views
7.1 years ago

for aligning my reads to the resulting 'smaller reference genomes', will it be a good idea?

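No, you'll get false positives. It's the same problem as in "Exome Sequencing: Masking The Non-Genic Sequences?" (you're 'masking' a whole chromosome). Reads that really come from elsewhere in the genome (paralogs, pseudogenes, repeats) no longer have their true locus to map to, so the aligner places them on the closest match in your reduced reference. Citing Heng Li: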

"This will lead to wrongly mapped sequences, spurious SNPs/indels calls and all sorts of problems. I cannot think of a single use case when masking [before mapping] may lead to better outcomes."


I am only interested in aligning DNAseq reads to certain genes

What you can do is align to the whole genome and discard the reads outside your regions after bwa and before sorting:

bwa (...) | samtools view -L my.bed (...) | samtools sort (...)
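
The pipeline above needs a BED file of the gene regions. As a rough sketch (not part of the original answer), the gene coordinates could be pulled from the GTF with awk. The function name `gtf_to_bed`, the file names, and the assumption that the GTF has `gene` records carrying a `gene_name "..."` attribute are all illustrative; adjust them to your annotation.

```shell
#!/bin/sh
# gtf_to_bed: hypothetical helper, prints BED intervals for selected genes.
#   $1 = GTF annotation (assumed to contain "gene" records with gene_name)
#   $2 = text file with one gene name per line
# GTF coordinates are 1-based inclusive; BED is 0-based half-open,
# so the start is shifted by one and the end is kept as-is.
gtf_to_bed() {
    awk -F'\t' '
        NR == FNR { wanted[$1] = 1; next }   # first file: gene names to keep
        /^#/ || $3 != "gene" { next }        # skip headers and non-gene records
        {
            if (match($9, /gene_name "[^"]+"/)) {
                # strip the leading gene_name " and the trailing quote
                name = substr($9, RSTART + 11, RLENGTH - 12)
                if (name in wanted)
                    printf "%s\t%d\t%d\t%s\n", $1, $4 - 1, $5, name
            }
        }
    ' "$2" "$1"
}

# Example use (file names are placeholders):
#   gtf_to_bed annotation.gtf genes.txt > my.bed
#   bwa mem ref.fa reads.fq | samtools view -b -L my.bed - | samtools sort -o targets.bam -
```

Alignment still runs against the full reference, so this does not shorten the bwa step itself; it only keeps the downstream files small while avoiding the mis-mapping problem described above.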

Makes sense, Pierre. Thank you for letting me know the caveat.
