Wondering if anyone has been able to successfully run the TF footprinting package outlined in the following paper?
http://www.cell.com/cell-reports/fulltext/S2211-1247(17)30609-5
EDIT: I called victory too soon... It took me two full days to figure out why this package would not run on our data. It was the first time I've ever had to really use the debugging functions in RStudio (thanks, Hadley!). After extensive tinkering and innumerable tracebacks, I found that the problem was that one of the input files generated by macs_output2csv.R had a duplicated column. This fixed the package on my end, but I have a lab mate who is trying to use it and is currently running into a different issue.
Bottom line: make absolutely sure that your input matches the example files exactly, down to the smallest details. Pay specific attention to the .meme files that you want to use as input. They must have all the fields in the example files (i.e. alength=, w=, nsites=, and E=), down to the space between the equal sign and the value (this will change depending on your MEME version, so be aware).
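For reference, this is what a minimal motif entry in MEME format looks like with all four fields present (the motif name and numbers here are purely illustrative; note the space after each equal sign):

MEME version 4

ALPHABET= ACGT

strands: + -

Background letter frequencies
A 0.25 C 0.25 G 0.25 T 0.25

MOTIF EXAMPLE_MOTIF
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
0.25 0.25 0.25 0.25
0.25 0.25 0.25 0.25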
I managed to get it working here (v0.9.6). You need to run the steps in bagfoot_prep_example.R using your own BAM files and then use the output in bagfoot_run_example.R, changing the necessary file paths, of course.
The main issue I found is that parallel processing in the run() function used in the run script is broken for unknown reasons, so you have to specify mc.cores=1. Other than that, the necessary scripts seem to work as expected, without having to dig too much into the code.
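A rough sketch of the session, for what it's worth (the only detail confirmed above is the mc.cores=1 workaround; since parallel::mclapply() defaults to getOption("mc.cores"), setting that option to 1 is a safe extra guard in case a call deep inside the package misses the explicit argument):

# force serial execution for any mclapply()-based code in the package
options(mc.cores = 1)
# run the example scripts after editing them to use your own BAM files and paths
source("bagfoot_prep_example.R")
source("bagfoot_run_example.R")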
On a separate note, if you only want to do TF binding predictions on a single sample, you might want to use some other software. My understanding is that BaGFoot is meant to provide a framework for comparing footprints between two (or more?) datasets. Gordon Hager's lab already has a footprinting method called DNase2TF, which was written by the same person (Songjoon Baek). Part of my thesis work is comparing TF binding prediction methods, and so far CENTIPEDE is very solid compared to what's out there - that would be my recommendation. We used it for a couple of recent papers (see Varshney et al, 2017 PNAS; Tomonori & Albanus et al, 2018 Sci Reports).
Hope this helps.
Cheers
It looks as though the author posted meme files on SourceForge, but you can also do something like the following.
I used a MotifDb object that contains the HOMER motifs; it's available in the motifbreakR package. Then I exported the motifs in MEME format using MotifDb:
library(motifbreakR)
data(homer)
meme.text <- MotifDb::export(homer, "homer.meme", format = "meme")
Then for bagfoot:
motifDBhomer_bagfoot <- MotifDB(motiflistfile = "homer.meme")
# use motifDB = motifDBhomer_bagfoot as the argument in the GFoot function call
Good luck!!!
Hi guys,
thank you very much for the meme motif solution, Ricardo! I am very close to running bagfoot... there is still one step I cannot get through: how do I merge the 2 hexamer bias frequency tables (that I already produced)? Do I have to calculate a mean of "correctionFactor"? Add up some numbers? I am a complete newbie to ATAC-seq. Thanks a lot in advance,
Mathilde
[UPDATE] In case someone is interested, I think I found the answer: I just added up the observed cuts from the 2 files (2nd column) and recalculated the different ratios (found in the bagfoot.R code).
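For anyone who wants to stay in R, here is a minimal sketch of that merge. I am assuming whitespace-delimited tables whose columns are hexamer, observed cuts, genomic position count, cut ratio, genomic position ratio, and correctionFactor, with both files listing the hexamers in the same order and the special "other" row kept out of the totals; the file names are placeholders:

t1 <- read.table("Hexamer_sample1_withMap.txt", header = TRUE, stringsAsFactors = FALSE)
t2 <- read.table("Hexamer_sample2_withMap.txt", header = TRUE, stringsAsFactors = FALSE)
stopifnot(identical(t1[[1]], t2[[1]]))                    # same hexamers, same order
merged <- t1
merged[[2]] <- t1[[2]] + t2[[2]]                          # add up the observed cuts
keep <- merged[[1]] != "other"                            # exclude "other" from the totals
cutSum <- sum(merged[[2]][keep])
genPosSum <- sum(merged[[3]][keep])
merged[[4]] <- ifelse(keep, merged[[2]] / cutSum, NA)     # cut ratio
merged[[5]] <- ifelse(keep, merged[[3]] / genPosSum, NA)  # genomic position ratio
merged[[6]] <- ifelse(keep, merged[[5]] / merged[[4]], 1) # correctionFactor
write.table(merged, "Hexamer_merged_withMap.txt", quote = FALSE, row.names = FALSE)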
Hi Mathilde,
Could you please provide more details on how to concatenate the two hexamer tables? I'm still confused. Also, did you find any difference when generating the hexamer bias frequency table with or without mappability? Thanks a lot in advance!
Zihan
Hi Zihan,
I generated the hexamer tables using the mappability only. The output of bagfoot is nice and makes sense with the ATAC-seq data I used. Regarding the merging of the tables, basically for each hexamer you have to add up the observed cuts (2nd column) and then recalculate the ratios. Be careful not to include the hexamer "other" when you calculate the total of observed cuts and the total genomic position count. Here is a piece of my code to merge the hexamer tables (in Java); let me know if you have questions:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;

class MergeHexamerTables {
    // Merge two hexamer bias tables: sum the observed cuts per hexamer and
    // recalculate the ratios and correction factor, excluding "other" from the
    // totals. The maps are keyed by hexamer (use a LinkedHashMap to preserve the
    // file's row order); both input tables share the same genomic position counts.
    static void merge(Map<String, Long> hexamerToObCuts1,
                      Map<String, Long> hexamerToObCuts2,
                      Map<String, Long> hexamerToGenomicPosCount,
                      String header, String outFile) throws IOException {
        String ls = System.lineSeparator();
        long cutSum = 0;
        long genPosSum = 0;
        for (String hexamer : hexamerToGenomicPosCount.keySet()) {
            if (!hexamer.contains("other")) {
                cutSum += hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
                genPosSum += hexamerToGenomicPosCount.get(hexamer);
            }
        }
        BufferedWriter out = new BufferedWriter(new FileWriter(outFile));
        out.write(header + ls);
        for (String hexamer : hexamerToGenomicPosCount.keySet()) {
            long sum = hexamerToObCuts1.get(hexamer) + hexamerToObCuts2.get(hexamer);
            long genPos = hexamerToGenomicPosCount.get(hexamer);
            out.write(hexamer + " " + sum + " " + genPos);
            if (!hexamer.contains("other")) {
                double obCutRatio = (double) sum / cutSum;
                double gpRatio = (double) genPos / genPosSum;
                double corrFactor = gpRatio / obCutRatio;
                out.write(" " + String.format("%.12f", obCutRatio)
                        + " " + String.format("%.12f", gpRatio)
                        + " " + String.format("%.12f", corrFactor) + ls);
            } else {
                out.write(" NA NA 1" + ls); // "other" gets correctionFactor 1
            }
        }
        out.close();
    }
}
best, Mathilde
Hi Mathilde,
Thanks a lot for the quick reply and the code. One more question: I'm not sure where to download the mappability files (the .out files mentioned in bagfoot_prep_example.R). The only website I found is this: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/M.musculus/, but it comes from a different package called PeakSeq. Should I generate the .out files myself? If so, how do I do that? Thank you so much!
Zihan.
Hi Zihan,
yes, you have to make them; it took me one entire week on my server... I did hg19. Here are some tips for making the MAPPABILITY files:
- Download all the .fa files, one per chromosome (hg19 genome): http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/
- Download the mappability scripts: http://archive.gersteinlab.org/proj/PeakSeq/Mappability_Map/Code/
- Build the three executables chr2hash, oligoFindPLFFile, and mergeOligoCounts using the Makefile
- In the directory containing the .fa files, run: python compile.py 35 (35 for 35-mers) ** VERY LONG: 1 WEEK! **
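A little R sketch for the download step, assuming the standard UCSC chrN.fa.gz file naming (the mappability scripts then run on the uncompressed .fa files):

base <- "http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/"
for (chr in paste0("chr", c(1:22, "X", "Y"))) {
  gz <- paste0(chr, ".fa.gz")
  download.file(paste0(base, gz), gz)
  system(paste("gunzip", gz))  # compile.py expects plain .fa files
}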
I have one question for you: now I would like to use a more recent motif database. I got the meme files, but I don't know how to get/generate the related meme_index file. Would you know by chance, or did you only use the meme files provided by BaGFoot? Thanks! Mathilde
Hi Mathilde,
Thanks for the tips, very useful. For the meme files, I downloaded them from JASPAR: http://jaspar.genereg.net/search?q=Mus+musculus&collection=CORE&tax_group=vertebrates&tax_id=all&type=all&class=all&family=all&version=all - this is the example for mouse; you can do the same for the human genome. At the bottom left of that page you can see a CSV button; I downloaded this csv file as my index file. I guess you just need to change the row names of this csv file to match the meme_index files that the author provided. Hope this helps.
Zihan
Thanks for the reply, Mathilde.
I exported a meme file from MotifDb (JASPAR2018) and downloaded the index file from the JASPAR website, but the motif number and order are a little different, and some of the motifs failed (I actually did get some FIMO results) when running runFIMOandMakeMotifList.R. So I want to ask: did you re-rank your index file to match your meme file? Thank you!
First of all, many thanks to Mathilde, Zihan and the other contributors to this really helpful discussion, which I essentially used as a manual for using BaGFoot.
I have a follow-up question. Did anyone figure out how to calculate p- and q-values for each TF as the authors did in Fig 3D (https://ars.els-cdn.com/content/image/1-s2.0-S2211124717306095-gr3.jpg) and other figures? I see text on that in the supplementary material, but the complex statistics they used are hard for me to understand. I wonder if a piece of code for calculating p- and q-values and plotting them on the bagplot already exists, and whether anyone is willing to share it!
I would highly appreciate any help. Thank you.
Best wishes, Ravi
Hi Ravi,
I have successfully used this package before, and I may be able to give you some suggestions.
The default result of bagfoot comes without p-values/q-values; however, the code for the statistical analysis exists in the original bagfoot code. Just use gen_bagplot_chisq() instead of gen_bagplot().
I spent about a week (off and on) and gave up. A colleague of mine has apparently managed to get usable output after tweaking the R code. I don't like saying bad things about people's efforts, but the documentation was atrocious.
EDITED (March 2019):
Possible alternative: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6099720/
Not sure how well it compares, but I found http://pythonhosted.org/pyDNase/ to be very straightforward to run.
Hi Ricardo,
Did you publish the comparison of TF footprinting methods, particularly related to CENTIPEDE's performance? Or do you happen to know of any other published work on this?
Thank you! Duygu
I've managed to get part of the way using my own data, but I'm stuck on generating the hexamer table for the hotspot regions covered in the demo code.
Moreover, I don't see anything in the package that would generate this file: "example_data/Hexamer_fed_fasted_merged_withMap.csv". Perhaps this just means concatenating the hexamer files, but that would not produce a csv file.
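If it does come down to merging the tables yourself (see Mathilde's code above), writing the result back out as a csv is simple enough in R; the file names here are guesses based on the example name:

tab <- read.table("Hexamer_merged_withMap.txt", header = TRUE)
write.csv(tab, "example_data/Hexamer_fed_fasted_merged_withMap.csv", row.names = FALSE)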
It's a nice paper, and so close to being a useful tool...
As far as other tools go, I've found HINT to be useful, and it does bias correction for ATAC-seq data. The package authors are very responsive as well.
Hi rbronste,
I managed to get it to work recently after a painful 5 days of debugging. I noticed this post is 15 months old, so if you still need some help, please email me at hughiez047@gmail.com and I will provide you with my detailed records :-) Hope this helps, best wishes!