I recently did a histone ChIP-seq experiment. I study Drosophila and used Arabidopsis chromatin as a spike-in. After trimming the raw reads (150 bp paired-end) and aligning them with bwa, I found that for some of my input samples 60% of the reads aligned to Drosophila and 50% aligned to Arabidopsis, so at least 10% of the reads align to both. Do you guys know a way of extracting the reads that aligned uniquely to each genome, so I can get rid of the reads that overlap?
You could make a combined genome: prefix or suffix every chromosome name with the species and put both genomes into a single FASTA file. When you align to that combined reference, reads that map to both species become multi-mapped reads. These get a low mapping quality (MAPQ) and can be removed with a MAPQ filter.
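A minimal Python sketch of both steps, assuming hypothetical species prefixes "dm_" and "at_" and a MAPQ cutoff of 30 (pick your own threshold). The first function renames FASTA headers so each species file can be concatenated into one reference. The second works on plain SAM text (for example the output of samtools view on the combined alignment) and keeps only primary, mapped alignments above the cutoff, grouped by species. bwa gives MAPQ 0 to reads with equally good hits in more than one place, so those are dropped.

```python
from typing import Dict, Iterable, Iterator, List, Sequence


def prefix_fasta_headers(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield FASTA lines with `prefix` added to the start of every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line  # sequence lines pass through unchanged


def split_sam_by_species(
    sam_lines: Iterable[str],
    prefixes: Sequence[str] = ("dm_", "at_"),  # hypothetical species prefixes
    min_mapq: int = 30,  # assumed cutoff, not a bwa default
) -> Dict[str, List[str]]:
    """Group read names of confident alignments by the species prefix of their reference."""
    kept: Dict[str, List[str]] = {p: [] for p in prefixes}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, mapq = fields[0], int(fields[1]), fields[2], int(fields[4])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        if mapq < min_mapq:
            continue  # low MAPQ: likely aligns about equally well to both genomes
        for p in prefixes:
            if rname.startswith(p):
                kept[p].append(qname)
                break
    return kept
```

For paired-end data each mate is a separate SAM record, so a read name can appear twice in a list.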
Another method is to extract the read names from the BAM file of each species' alignment, then use tools such as the sort and uniq shell commands to identify read names that appear in only one of the BAM files.
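The read-name comparison can be sketched in Python with sets instead of sort and uniq. This assumes each BAM has already been converted to SAM text (for example with samtools view); the function names are illustrative, not from any library.

```python
from typing import Iterable, List, Set, Tuple


def mapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect names of mapped reads from SAM text, skipping headers and unmapped records."""
    names: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        qname, flag, _rest = line.split("\t", 2)
        if int(flag) & 0x4:
            continue  # unmapped read
        names.add(qname)
    return names


def reads_unique_to_each(names_a: Iterable[str], names_b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (names only in A, names only in B), each sorted."""
    set_a, set_b = set(names_a), set(names_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)
```

The resulting name lists can then be used to filter each BAM down to reads that mapped to only one species.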
I'm curious why you want to do this, though. You don't want to analyze regions of the genome that are conserved between the species in your downstream analysis?
No, the plant chromatin is used for signal normalization; it is supposed to account for technical variation in the IP between samples. I add roughly the same amount of plant chromatin to each of my "real" Drosophila samples. I am currently doing the genome combination and filtering, so we'll see what happens.
Thanks!!
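As an illustration of how the spike-in reads can feed into normalization, here is a sketch of one common scheme (an assumption, not necessarily the poster's pipeline): scale each sample by a constant divided by its number of uniquely aligned spike-in reads, so samples whose IP recovered more spike-in are scaled down. Some pipelines also correct for the spike-in ratio in the matching input sample; that step is not shown.

```python
from typing import Dict


def spike_in_scale_factors(spike_counts: Dict[str, int], constant: float = 1e6) -> Dict[str, float]:
    """Map each sample to constant / (uniquely aligned spike-in reads)."""
    factors: Dict[str, float] = {}
    for sample, n in spike_counts.items():
        if n <= 0:
            raise ValueError(f"{sample}: spike-in read count must be positive")
        factors[sample] = constant / n
    return factors
```

Each sample's Drosophila coverage would then be multiplied by its factor before comparing samples.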