9.6 years ago
kanwarjag
What is the easiest way to exclude/mask satellite sequences in a BAM file of ChIP-seq data?
Thanks
Kanwar
You can use "bedtools maskfasta". You can get the regions from the UCSC table browser.
I use subtractBed. This lets me keep my reads in BED format, and I can insert the step anywhere in my pipeline after mapping. maskfasta gives FASTA output.
Assuming hg19, grab repeat data, if needed:
$ mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg19 -e 'select genoName, genoStart, genoEnd, repName, swScore, strand from rmsk' | tail -n +2 > repeats.bed
$ head -3 repeats.bed
chr1 10000 10468 (CCCTAA)n 1504 +
chr1 10468 11447 TAR1 3612 -
chr1 11503 11675 L1MC 437 -
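Since the question is specifically about satellites, the repeat table can be narrowed before the set operation. This is a minimal sketch, assuming the query above is extended to also select the rmsk `repClass` column as a seventh field. The rows, class labels and file names below are toy stand-ins for illustration, not real query output:

```shell
# Toy rmsk-style rows (tab-separated); the 7th column plays the role of repClass.
# These rows and class labels are illustrative, not fetched from UCSC.
printf 'chr1\t10000\t10468\t(CCCTAA)n\t1504\t+\tSimple_repeat\n'  > repeats_with_class.bed
printf 'chr1\t10468\t11447\tTAR1\t3612\t-\tSatellite\n'          >> repeats_with_class.bed
printf 'chr1\t11503\t11675\tL1MC\t437\t-\tLINE\n'                >> repeats_with_class.bed

# Keep only the satellite rows and drop the helper column again,
# so the output has the same six columns as repeats.bed.
awk 'BEGIN { FS = OFS = "\t" } $7 == "Satellite" { print $1, $2, $3, $4, $5, $6 }' \
    repeats_with_class.bed > satellites.bed

cat satellites.bed
```

The resulting `satellites.bed` could then stand in for `repeats.bed` in the steps that follow.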
Then perform set operations with BEDOPS:
$ bedops -n 1 <(bam2bed < reads.bam) repeats.bed > reads_that_do_not_overlap_repeats.bed
Then convert the result to the desired end format.
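To make the `-n 1` step concrete: it keeps each read that does not overlap any repeat by at least one base. The sketch below imitates that behaviour with plain awk on toy intervals (made-up coordinates and file names), purely as an illustration. It scans every repeat for every read, so it is not a substitute for BEDOPS on real data:

```shell
# Toy inputs (tab-separated, 0-based half-open BED coordinates).
printf 'chr1\t100\t200\tsat1\n' > toy_repeats.bed
printf 'chr1\t50\t90\tread1\n'    > toy_reads.bed   # ends before the repeat
printf 'chr1\t150\t186\tread2\n' >> toy_reads.bed   # inside the repeat
printf 'chr1\t199\t235\tread3\n' >> toy_reads.bed   # overlaps the repeat by 1 base
printf 'chr1\t200\t236\tread4\n' >> toy_reads.bed   # starts exactly where the repeat ends
printf 'chr2\t150\t186\tread5\n' >> toy_reads.bed   # same coordinates, other chromosome

# First file: remember the repeats. Second file: print reads that hit none of them.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { chrom[NR] = $1; rstart[NR] = $2 + 0; rend[NR] = $3 + 0; n = NR; next }
     {
         hit = 0
         for (i = 1; i <= n; i++)
             if ($1 == chrom[i] && ($2 + 0) < rend[i] && ($3 + 0) > rstart[i]) { hit = 1; break }
         if (!hit) print
     }' toy_repeats.bed toy_reads.bed > toy_kept.bed

cat toy_kept.bed
```

Only read1, read4 and read5 survive. read3 is dropped because it overlaps the repeat by a single base, which is what the `1` in `-n 1` refers to.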