Hi all,
I'm very new to bioinformatics but have many years of coding experience. I am studying unreferenced parts of the genome, i.e. satellite repeats that aren't included in reference genomes. I want to align some raw ChIP-seq data to a specific set of sequences. Basically, I want to make my own reference genome based on a small set of sequences and use that for ChIP-seq analysis.
I want to create a small custom genome, rather than add to an existing genome, so that I can save computational time.
Can anyone give me some pointers on where to get started? Am I thinking about this the right way?
Any tips/info/thoughts would be greatly appreciated!
Agreeing with genomax here. Aligning to a subset of regions is always problematic because reads from off-target effects, unspecific pulldown, and random DNA that somehow found its way into the library can originate from regions not included in the custom reference. The aligner will still try to find the best match within the given reference, and this leads to false-positive alignments. Better to do as suggested: add your custom sequences to a reference genome (just append them to the genome FASTA file as separate sequences), build a new index, and align against that.
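Since you have a coding background, here is a rough sketch of the "append them to the genome FASTA file" step in Python. The file names and both function names are made up for illustration, and the name-clash check is my own addition (aligners and downstream tools generally expect unique sequence names), so treat it as a starting point rather than a finished tool.

```python
def fasta_names(path):
    """Return the sequence names (first word of each '>' header line)."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def append_custom_sequences(reference_fa, custom_fa, combined_fa):
    """Write reference + custom sequences into one new FASTA file.

    Raises ValueError if a custom sequence name already exists in the
    reference or is repeated within the custom file.
    Returns the list of custom sequence names that were appended.
    """
    ref_names = set(fasta_names(reference_fa))
    custom_names = fasta_names(custom_fa)

    clashes = sorted(set(n for n in custom_names if n in ref_names))
    if clashes:
        raise ValueError("names already in reference: " + ", ".join(clashes))
    if len(set(custom_names)) != len(custom_names):
        raise ValueError("duplicate names inside the custom FASTA")

    with open(combined_fa, "w") as out:
        for src in (reference_fa, custom_fa):
            last = "\n"
            with open(src) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            # Make sure the next file's first header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
    return custom_names


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    added = append_custom_sequences("genome.fa", "satellites.fa", "genome_plus_satellites.fa")
    print("appended:", added)
```

After that you would build a fresh index on the combined file with whatever aligner you use (for example `bowtie2-build genome_plus_satellites.fa genome_plus_satellites` or `bwa index genome_plus_satellites.fa`) and align the ChIP-seq reads against it. You can then restrict downstream analysis to the names returned by `append_custom_sequences`.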