Question

How To Prepare Input Files For The Genetrack Peak Caller

10.8 years ago

kandoigaurav

I would like to use GeneTrack for calling nucleosome positions and was wondering how I can prepare its input files, starting from SRA sequence reads.

sra sam bam

updated 10.8 years ago by Istvan Albert • written 10.8 years ago by kandoigaurav

score 1 · Answer 1 · 2014-01-24

10.8 years ago

Istvan Albert

The following steps are necessary:

1. Align the reads to a reference genome and produce BAM alignment files.
2. Convert the BAM file to BED format, for example with bedtools bamtobed or another method.
3. Sort the BED file by coordinate: sort -k1,1 -k2,2g -o out.bed in.bed

You can then load the resulting BED file into the GeneTrack command-line tool.
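
The coordinate sort in the last step can be mirrored in Python, for example to check a file before handing it to GeneTrack. This is a minimal sketch, not part of GeneTrack or bedtools: `sort_bed_lines` is a hypothetical helper that assumes tab-separated BED lines such as those written by bedtools bamtobed, and it orders them the way `LC_ALL=C sort -k1,1 -k2,2g` would, chromosome name as plain text, then start coordinate as a number.

```python
def sort_bed_lines(lines):
    """Sort tab-separated BED lines by chromosome name (as text), then by start
    coordinate (as a number), similar to `LC_ALL=C sort -k1,1 -k2,2g`.

    Hypothetical helper for illustration. Blank lines are skipped. Ties on
    (chrom, start) keep their input order, whereas GNU sort would fall back
    to comparing whole lines.
    """
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # Text comparison of the chromosome means "chr10" sorts before "chr2",
    # just like the plain -k1,1 key in the sort command.
    records.sort(key=lambda fields: (fields[0], float(fields[1])))
    return ["\t".join(fields) for fields in records]


if __name__ == "__main__":
    example = [
        "chr2\t500\t536\tread3\t60\t+",
        "chr1\t1200\t1236\tread2\t60\t-",
        "chr1\t300\t336\tread1\t60\t+",
    ]
    for line in sort_bed_lines(example):
        print(line)
```

Note that the start column is compared numerically, so 300 comes before 1200 on chr1; a plain text sort on that column would get this wrong.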

10.8 years ago by Istvan Albert

Thank you Dr. Albert! This should be of immense help.

10.8 years ago by kandoigaurav

I corrected the sort command as shown here: http://cassjohnston.wordpress.com/2011/05/10/unix-sort-bed-file/

10.8 years ago by Istvan Albert

I was hoping to use an approach similar to the one described in the paper 'A compiled and systematic reference map of nucleosome positions across the Saccharomyces cerevisiae genome' to construct a compiled consensus map. To this end, I have generated nucleosome maps for a few Drosophila datasets using GeneTrack.

However, I don't understand the methodology used to generate a reference map from these predicted maps. I see that GeneTrack is used to define a new consensus position, but I can't work out how I should format my GeneTrack input file for this.

10.8 years ago by kandoigaurav

There are two different, unrelated steps in the process:

1. Define positions based on the signal. This produces a number of intervals over the genome; for this one uses a peak caller.
2. Refine the peaks obtained in step 1: keep some of them based on certain conditions, label them by their position relative to other features (1st, 2nd, 3rd, etc.), account for the presence or absence of other potentially overlapping features, and so on. This second step is a data analysis problem and has little to do with a peak caller. It is basically an interval intersection problem with many facets.
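
The labelling part of the second step ("1st, 2nd, 3rd") is the kind of interval logic one ends up writing by hand. Below is a minimal Python sketch under stated assumptions: peaks are single positions (for example nucleosome dyads), transcription start sites (TSSs) come with a strand, the 1st nucleosome is taken to be the nearest peak strictly downstream of the TSS, and only peaks within `max_distance` bp are considered. `label_downstream_peaks` and its parameters are hypothetical; published nucleosome maps may define the first nucleosome differently, for example by allowing a small upstream tolerance.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def label_downstream_peaks(peaks, tss_list, max_rank=3, max_distance=1000):
    """Label peaks by their order downstream of each TSS (hypothetical helper).

    peaks:    iterable of (chrom, position)
    tss_list: iterable of (name, chrom, tss_position, strand), strand "+" or "-"
    Returns (name, rank, chrom, peak_position) tuples; rank 1 is the peak
    nearest the TSS in the direction of transcription.
    """
    # Index peak positions per chromosome, sorted for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in peaks:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    labels = []
    for name, chrom, tss, strand in tss_list:
        positions = by_chrom.get(chrom, [])
        if strand == "+":
            # Peaks in (tss, tss + max_distance], nearest first.
            lo = bisect_right(positions, tss)
            hi = bisect_right(positions, tss + max_distance)
            downstream = positions[lo:hi]
        else:
            # Minus strand: downstream means lower coordinates,
            # so take [tss - max_distance, tss) and reverse it.
            lo = bisect_left(positions, tss - max_distance)
            hi = bisect_left(positions, tss)
            downstream = positions[lo:hi][::-1]
        for rank, pos in enumerate(downstream[:max_rank], start=1):
            labels.append((name, rank, chrom, pos))
    return labels
```

Conditions such as "keep only TSSs whose 1st nucleosome lies within 200 bp" can then be applied as simple filters over the returned tuples.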

There are very few tools that automate the 2nd step, so one needs to implement one's own methodology. The reason is that calling a peak is a reasonably objective task, whereas filtering and naming peaks by various conditions is a lot more subjective, and it is difficult to write code that is robust while also being flexible and correct.
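
As one concrete example of a hand-written rule for the 2nd step, here is a minimal Python sketch of the "absence of overlapping features" condition: drop every peak that overlaps an excluded interval. The function names are hypothetical, and the choice of half-open, BED-style coordinates is an assumption. On real files, bedtools intersect -v -a peaks.bed -b excluded.bed performs the same kind of filtering from the command line.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted,
    disjoint list. Assumes non-empty intervals (start < end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def drop_overlapping(peaks, excluded):
    """Keep (chrom, start, end) peaks that share no base with any excluded
    (chrom, start, end) interval. Coordinates are half-open: [start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in excluded:
        by_chrom[chrom].append((start, end))
    merged = {chrom: merge_intervals(ivs) for chrom, ivs in by_chrom.items()}
    # Merged intervals are disjoint and sorted, so their ends are sorted too.
    ends = {chrom: [end for _, end in ivs] for chrom, ivs in merged.items()}

    kept = []
    for chrom, start, end in peaks:
        ivs = merged.get(chrom, [])
        # Index of the first excluded interval that ends after this peak starts.
        i = bisect_right(ends.get(chrom, []), start)
        # That interval overlaps the peak only if it also starts before the peak ends.
        if i < len(ivs) and ivs[i][0] < end:
            continue
        kept.append((chrom, start, end))
    return kept
```

Swapping the `continue` for a label instead of a drop turns the same check into the "presence or absence" annotation mentioned above.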

Adding to the problem, it is probably impossible to publish a tool that only does this latter step, although I would agree that it is more important than step 1. Alas, the way science works is sometimes counterintuitive.

10.8 years ago by Istvan Albert