I am processing Illumina reads from many lanes. We are mainly interested in studying SNPs, recombination, etc. in the chromosomes 2L, 2R, 3L, 3R, 4 and X. I have a basic question regarding mapping reads to the Drosophila genome: do I need to include the Het, U and Uextra chromosomes when mapping, or should I exclude them and map to the rest of the genome? How does this choice affect the results? I would like your thoughts for or against.
Thanks @casey and @brentp: I am trying both on a test set and will see how it is reflected in the results.
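For anyone wanting to run the same comparison, here is a minimal sketch of how the reduced reference could be built: it keeps only the FASTA records for the major arms and drops everything else (Het, U, Uextra). The function name `filter_fasta` and the set `MAJOR_ARMS` are made up for illustration, and the sketch assumes the record names in your FASTA are exactly `2L`, `2R`, and so on; some releases use a prefix such as `chr2L`, in which case adjust the set.

```python
# Hypothetical helper: keep only the FASTA records whose name is in `keep`.
# Assumption: the record name is the first whitespace-delimited token after '>'.
MAJOR_ARMS = {"2L", "2R", "3L", "3R", "4", "X"}


def filter_fasta(lines, keep):
    """Yield the header and sequence lines of records named in `keep`."""
    emit = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Switch output on or off at every new record header
            emit = bool(fields) and fields[0] in keep
        if emit:
            yield line
```

Called on an open file handle and written to a new file, this gives the "major arms only" reference; the untouched original file serves as the "everything" reference, and both can then be indexed and mapped against separately.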
I am trying to map using "bwa". Is there a way to get behaviour similar to "bowtie -m 1"?
I don't use bwa much, but it looks like the samse and sampe commands have a -n parameter which does something close to that.
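As far as I can tell, -n only limits how many alternative hits are listed in the XA tag rather than discarding multi-mapping reads, so a post-filter may be closer to what you want. Below is a rough sketch that keeps only alignments whose X0 tag (the number of best hits that bwa aln/samse/sampe writes) equals 1. The function name `unique_hits` and its `require_no_suboptimal` switch are hypothetical, and this only approximates "bowtie -m 1", since the two aligners count alignments differently.

```python
# Hypothetical post-filter for SAM output from bwa aln + samse/sampe.
# Assumption: the X0:i (best hits) and X1:i (suboptimal hits) tags are present.
def unique_hits(sam_lines, require_no_suboptimal=False):
    """Yield header lines plus alignments that have exactly one best hit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass the SAM header through unchanged
            continue
        # Optional tags start at the 12th tab-separated column
        tags = line.rstrip("\n").split("\t")[11:]
        if "X0:i:1" not in tags:
            continue  # multi-mapped, or unmapped with no X0 tag at all
        if require_no_suboptimal and "X1:i:0" not in tags:
            continue  # stricter mode: also reject reads with suboptimal hits
        yield line
```

With the default settings a read is kept if it has a single best placement, even when suboptimal placements exist; setting `require_no_suboptimal=True` is the more conservative choice if repeats are a concern.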
thanks - @brentp
One of the things to watch out for when analyzing D. mel U sequences is that they contain non-fly bacterial DNA from sequencing plasmids. See the following post for thoughts on this latent problem.
Uextra is especially problematic. The release notes read:
"we have not excluded scaffolds which may be redundant with euchromatic or other heterochromatic regions. Nor can we exclude the possibility of contaminations from other organisms.
We are making this data available as a resource for analysis of regions which cannot be assembled well, such as satellites or simple repeats.
Since some of this data is low quality, researchers are encouraged to contact either BDGP or DHGP for further details on this resource."