Could you please explain where I made a mistake in analyzing ChIP-seq data?
I took a fastq file from modENCODE (set 2639) and mapped it to the dm3 genome with bowtie2. Then I took the .sam file from its output and called peaks with MACS2.
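For reference, the two steps above can be written out roughly as follows. This is only a sketch of the pipeline as described, not the modENCODE protocol: the index prefix `dm3`, the file names, the thread count, and the 0.05 q-value cutoff are all placeholder assumptions, and the control argument is optional because the post does not say whether an input sample was used.

```python
import shlex


def bowtie2_cmd(index_prefix, fastq, sam_out, threads=4):
    """Build a bowtie2 command for single-end reads with default alignment settings."""
    return ["bowtie2", "-p", str(threads), "-x", index_prefix,
            "-U", fastq, "-S", sam_out]


def macs2_cmd(treatment, name, control=None, qvalue=0.05):
    """Build a MACS2 narrow-peak call on a SAM file, using the Drosophila genome size preset."""
    cmd = ["macs2", "callpeak", "-t", treatment, "-f", "SAM",
           "-g", "dm", "-n", name, "-q", str(qvalue)]
    if control is not None:
        # A control (input) sample changes the background model and thus the peak set.
        cmd += ["-c", control]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace with the real ones.
    commands = [
        bowtie2_cmd("dm3", "chip.fastq", "chip.sam"),
        bowtie2_cmd("dm3", "input.fastq", "input.sam"),
        macs2_cmd("chip.sam", "tf_peaks", control="input.sam"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
```

The script only prints the commands. To actually run them, each list can be passed to `subprocess.run(cmd, check=True)`, provided bowtie2 and MACS2 are installed and the dm3 index has been built.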
The final .bed file (with narrow peaks) is completely different from the gff3 file from modENCODE: the number of peaks differs by about a factor of 2.
So, if I only have the raw data from modENCODE, should I follow their guidelines to analyze it? I know my question seems stupid, sorry for that, I'm a newbie in bioinformatics.
It's not stupid. You can use any analysis method, as long as you know what you are doing. But getting exactly the same result with different methods is difficult. Mapping might still be fine (even with a different mapping algorithm), but peak callers can produce huge variability, depending on their parameters and on whether controls are used.
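A raw peak count is a coarse comparison, since one caller may split a region that another reports as a single peak. A more informative check is the fraction of peaks in one set that overlap any peak in the other. Below is a minimal sketch of that idea, assuming both files have already been parsed into `(chrom, start, end)` tuples in 0-based half-open coordinates (BED style; GFF3 is 1-based inclusive, so its start needs to be reduced by 1 first). The function names and the toy coordinates are made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def merge_by_chrom(peaks):
    """Return {chrom: (starts, ends)} with overlapping or touching peaks merged and sorted."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap at least one reference peak."""
    ref = merge_by_chrom(reference)
    hits = 0
    for chrom, start, end in query:
        if chrom not in ref:
            continue
        starts, ends = ref[chrom]
        # Index of the last merged reference interval that starts before the query ends.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            hits += 1
    return hits / len(query) if query else 0.0


if __name__ == "__main__":
    # Toy coordinates, not real data.
    mine = [("chr2L", 100, 200), ("chr2L", 500, 600), ("chr3R", 50, 80)]
    theirs = [("chr2L", 150, 250), ("chr3R", 10, 40)]
    print("mine found in theirs:", fraction_overlapping(mine, theirs))
    print("theirs found in mine:", fraction_overlapping(theirs, mine))
```

Running it in both directions matters: if most peaks of the smaller set are recovered in the larger one, the two analyses probably agree on the strong sites and differ mainly in how many weak peaks pass the threshold.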
OK, I understand. To be clear: my goal is to determine which genes are regulated by the TF, so I'm trying to determine the TF's binding sites. But if the peak calling results are completely different, I suppose the final results of the analysis will differ too. Am I right?
And could you please advise how to decide which parameters to use in MACS2 (mfold, q-value, p-value, etc.) and in bowtie2 (alignment options, input options, etc.)? Maybe a manual or tutorial videos exist. So far I have only tried the default parameters.