Question

Generating Mappability Files

3

Entering edit mode

13.7 years ago

Ying W ★ 4.3k

I have some chip-seq data and I want to do some peak calling. To do peak calling with ZINBA, peakseq, MOSAiCS I need to have mappability data.

This data is given for hg19 ( http://www.bios.unc.edu/~nrashid/map50_hg19.tgz from this page http://code.google.com/p/zinba/wiki/UsingZINBA) however, the chip-seq data I use thus far have been aligned to 1000 genome's version of hg19 ( ftp://ftp.sanger.ac.uk/pub/1000genomes/tk2/main_project_reference/) and I do not wish to realign all the data I have (previously I have been using MACS, I want to see results from other peak callers)

I have tried to run ZINBA using hg19 mappability files but I get a segfault pretty early on. Thus, I am trying to generate these mappability files: I get the feeling that these files are unique to whatever you used as reference (so hg19 with chr1-22+x,y mappability files would not be interchangeable with chr1-22+x,y,M)

I was wondering if someone could shed some insight into what these mappability files are and if they are interchangeable (because if they are then I would only have to generate M+supercontigs for 1000 genome's version of hg19 and use chr1-22 from the available hg19).

Additionally, I was wondering if this was the right program to generate these files: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/

Thanks.

chip-seq • 8.4k views

ADD COMMENT • link updated 8.4 years ago by inesdesantiago ▴ 270 • written 13.7 years ago by Ying W ★ 4.3k

3

Entering edit mode

I would contact the ZINBA developers, the project seems recent so it you will be likely able to reach those that can help you.

ADD REPLY • link 13.7 years ago by Istvan Albert 102k

3

Entering edit mode

The ZINBA webpage indicates that "these files were generated using code from Peakseq , developed by the Gerstein Lab", so the answer to your last question is yes, you should be able to use this code and generate your own mappability files. Alternatively, as suggested by Istvan, ask directly to the developers...

ADD REPLY • link 13.7 years ago by Nico ▴ 190

0

Entering edit mode

@nico - i could not find any links from the peakseq website that directly links to the program code I pasted above so I was unsure. thoes files are stored in a different directory than the main peakseq files

ADD REPLY • link 13.7 years ago by Ying W ★ 4.3k

0

Entering edit mode

Can anyone please answer this " if someone could shed some insight into what these mappability files are and if they are interchangeable" and what are the advantages of using these files.

ADD REPLY • link 13.4 years ago by Vikas Bansal ★ 2.4k

0

Entering edit mode

Ah! This question is little bit old, should I post a new one?

ADD REPLY • link 13.4 years ago by Vikas Bansal ★ 2.4k

0

Entering edit mode

to get started, its probably best to read the peakseq paper and website: https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/peakseq-for-chip-seq

ADD REPLY • link updated 5.8 years ago by Ram 45k • written 13.4 years ago by Ying W ★ 4.3k

Ram · Answer 1 · 2011-11-10

3

Entering edit mode

13.7 years ago

Aaron H ▴ 170

I've generated mappability files for ZINBA with the code you linked to. You just need to follow the directions in the README. Split the human reference fasta into separate chromosomes with BioPython:

from Bio import SeqIO
seqs = SeqIO.parse(open(fasta_file), format='fasta')
for seq in seqs:
   name = seq.id+'.fasta'
   SeqIO.write(seq, open(name, 'w'), format='fasta')

Then make the PeakSeq code and run it for your needed K:

$ python compile.py 50

ADD COMMENT • link updated 5.8 years ago by Ram 45k • written 13.7 years ago by Aaron H ▴ 170

0

Entering edit mode

where is the README?

ADD REPLY • link 13.7 years ago by Ying W ★ 4.3k

0

Entering edit mode

Do you mean the README in the Map code? I think it should be name = seq.id+'.fa'

ADD REPLY • link 13.7 years ago by Ying W ★ 4.3k

0

Entering edit mode

yes and yes. It takes quite a while to generate the mappability files.

ADD REPLY • link 13.7 years ago by Aaron H ▴ 170

0

Entering edit mode

Is ZINBA still maintained? It looks like the latest update was done two years ago...

ADD REPLY • link 9.1 years ago by boczniak767 ▴ 880

score 2 · Answer 2 · 2017-02-23

This is how I have done it:

#download code and make 
mkdir Mappability_Map
curl http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/Mappability_Map.tar.gz
tar -zxvf Mappability_Map.tar.gz -C Mappability_Map
make 

#download reference genome
curl http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.chroms.tar.gz
mkdir hg38.analysisSet.chroms
tar -zxvf hg38.analysisSet.chroms.tar.gz -C hg38.analysisSet.chroms

#Move chromosomes 1-22,X,M to the same directory of the python scripts
cd  hg38.analysisSet.chroms
chr=($(seq 1 1 22) X Y M)
for i in ${chr[*]}; do mv "chr"$i".fa" ../Mappability_Map ; done 

#chmod everything to make sure it runs
cd ../Mappability_Map
chmod 777 compile.py
chmod 777 *.c
export PATH=/<mydirectory>/Mappability_Map/:$PATH

#generete mappability files for 36mer length (replace by your desired number)
python compile.py 36

score 0 · Answer 3 · 2016-06-30

0

Entering edit mode

9.1 years ago

boczniak767 ▴ 880

What is "K", after the python compile.py?

ADD COMMENT • link 9.1 years ago by boczniak767 ▴ 880

0

Entering edit mode

This is a 4+ year old thread. It is possible that you are not going to get an answer for this question.

As for ZINBA, it is probably no longer maintained (students graduate and move on etc). You may want to switch to MACS2, which is what most people probably use for peak calling.