I have tried to run ZINBA using the hg19 mappability files, but I get a segfault pretty early on, so I am trying to generate these files myself. I get the feeling that they are specific to whatever was used as the reference (so hg19 mappability files built from chr1-22, X, Y would not be interchangeable with ones built from chr1-22, X, Y, M).
I was wondering if someone could shed some light on what these mappability files are and whether they are interchangeable (because if they are, I would only have to generate M and the supercontigs for the 1000 Genomes version of hg19 and reuse chr1-22 from the available hg19 files).
The ZINBA webpage indicates that "these files were generated using code from Peakseq, developed by the Gerstein Lab", so the answer to your last question is yes, you should be able to use this code to generate your own mappability files. Alternatively, as suggested by Istvan, ask the developers directly...
@nico - I could not find any links on the PeakSeq website that lead directly to the program code I pasted above, so I was unsure. Those files are stored in a different directory than the main PeakSeq files.
Can anyone please answer this: "if someone could shed some insight into what these mappability files are and if they are interchangeable", and also explain the advantages of using these files?
I've generated mappability files for ZINBA with the code you linked to. You just need to follow the directions in the README. Split the human reference fasta into separate chromosomes with BioPython:
from Bio import SeqIO

# fasta_file: path to the multi-FASTA reference (set this yourself)
for seq in SeqIO.parse(fasta_file, "fasta"):
    # write each chromosome/contig to its own file, named after the record id
    SeqIO.write(seq, seq.id + ".fasta", "fasta")
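If Biopython is not installed, the same split can be sketched with the standard library alone. This is only an illustrative alternative, not something from the PeakSeq README; the function name `split_fasta` and the `.fasta` suffix are assumptions chosen to mirror the snippet above.

```python
import os


def split_fasta(fasta_path, out_dir="."):
    """Write each record of a multi-FASTA file to <out_dir>/<record id>.fasta.

    Returns the list of files written, in input order.
    """
    written = []
    out = None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A header starts a new record: close the previous output file
                if out is not None:
                    out.close()
                # The record id is the first whitespace-delimited token after '>'
                record_id = line[1:].split()[0]
                path = os.path.join(out_dir, record_id + ".fasta")
                out = open(path, "w")
                written.append(path)
            if out is not None:
                # Copy header and sequence lines unchanged into the current file
                out.write(line)
    if out is not None:
        out.close()
    return written
```

Calling `split_fasta("reference.fa")` (a placeholder path) would write one file per chromosome or contig into the current directory.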
Then build the PeakSeq mappability code and run it for the k-mer length (K) you need:
# download the code and build it
mkdir Mappability_Map
curl -O http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/Mappability_Map.tar.gz
tar -zxvf Mappability_Map.tar.gz -C Mappability_Map
# assumes the Makefile sits at the top level of the extracted archive
make -C Mappability_Map
# download the reference genome
curl -O http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/analysisSet/hg38.analysisSet.chroms.tar.gz
mkdir hg38.analysisSet.chroms
tar -zxvf hg38.analysisSet.chroms.tar.gz -C hg38.analysisSet.chroms
# move chromosomes 1-22, X, Y, M to the same directory as the python script
cd hg38.analysisSet.chroms
chr=($(seq 1 1 22) X Y M)
for i in "${chr[@]}"; do mv "chr${i}.fa" ../Mappability_Map; done
# chmod everything to make sure it runs
cd ../Mappability_Map
chmod 777 compile.py
chmod 777 *.c
export PATH=/<mydirectory>/Mappability_Map/:$PATH
# generate mappability files for 36-mer length (replace with your desired number)
python compile.py 36
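On the interchangeability question: as far as I understand it, a mappability map records, for a given read length k, which positions start a k-mer that occurs only once across the whole reference, so the result for one chromosome depends on every other sequence in the set. The toy sketch below illustrates that idea only. It is not PeakSeq's algorithm or file format, it ignores reverse complements and mismatches, and the function name and sequences are made up.

```python
from collections import Counter


def unique_kmer_starts(genome, k):
    """Map each sequence name to the start positions whose k-mer
    occurs exactly once across all sequences in `genome`."""
    counts = Counter()
    for seq in genome.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {
        name: [i for i in range(len(seq) - k + 1) if counts[seq[i:i + k]] == 1]
        for name, seq in genome.items()
    }


# Toy reference with a single chromosome (made-up sequence)
ref_a = {"chr1": "ACGTACGGA"}
# Same chromosome plus an extra contig that shares the 4-mer "ACGG"
ref_b = {"chr1": "ACGTACGGA", "chrM": "GGACGGTT"}

print(unique_kmer_starts(ref_a, 4)["chr1"])  # [0, 1, 2, 3, 4, 5]
print(unique_kmer_starts(ref_b, 4)["chr1"])  # [0, 1, 2, 3, 5]
```

Position 4 drops out of the chr1 list once chrM is added, even though chr1 itself did not change. So chr1-22 files built against a reference without M or the supercontigs could differ from ones built with them, and regenerating the whole set against the exact reference you aligned to is probably the safer route.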
This is a 4+ year old thread. It is possible that you are not going to get an answer for this question.
As for ZINBA, it is probably no longer maintained (students graduate and move on etc). You may want to switch to MACS2, which is what most people probably use for peak calling.
I would contact the ZINBA developers. The project seems recent, so you will likely be able to reach people who can help you.
Ah! This question is a little bit old, should I post a new one?
To get started, it's probably best to read the PeakSeq paper and website: https://sites.google.com/a/brown.edu/bioinformatics-in-biomed/peakseq-for-chip-seq