I have created a position weight matrix (PWM) based on transcription factor binding sites (TFBS) in FANTOM 4.
In my R code, I trained the PWM on TFBS from chr1 and chr2. Now I want to use this PWM to scan chr3 to chr22 to assess its accuracy.
What is the best way to retrieve a "stitched" string of chr3 to chr22 (or individual chromosome strings, if a single string is too large)?
I tried using the DAS server, but it doesn't work without giving coordinates (http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr2).
Doing my own homework, I see that both Bioconductor and the SeqinR package for R can do this, but I can't figure out the right workflow/code to retrieve this information.
For what it's worth, I do have hg18 downloaded as separate .fa files. I am fairly certain that SeqinR/Bioconductor has a function to read these FASTA files. Is this the best way to do this?
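For reference, here is a minimal base-R sketch of reading the per-chromosome .fa files one at a time. It assumes one record per file and a directory layout of chr3.fa ... chr22.fa; the helper name read_fasta_string, the variable fasta_dir and the toy file are made up for illustration. Packaged readers such as Biostrings::readDNAStringSet or seqinr::read.fasta would likely be the more robust choice for real chromosomes.

```r
# Read one FASTA file and return its sequence as a single upper-case string.
# Assumes one record per file, as in per-chromosome downloads.
read_fasta_string <- function(path) {
  lines <- readLines(path)
  seq_lines <- lines[!startsWith(lines, ">")]  # drop the ">chrN" header line
  toupper(paste(seq_lines, collapse = ""))     # soft-masked bases are lower case
}

# Toy file standing in for a real chromosome, so the sketch runs anywhere.
demo_dir <- tempfile("hg18_demo_")
dir.create(demo_dir)
writeLines(c(">chr3", "acgtNNAC", "GTTA"), file.path(demo_dir, "chr3.fa"))

# Hypothetical layout: <fasta_dir>/chr3.fa ... <fasta_dir>/chr22.fa
fasta_dir <- demo_dir
for (chrom in paste0("chr", 3:22)) {
  path <- file.path(fasta_dir, paste0(chrom, ".fa"))
  if (!file.exists(path)) next   # skip chromosomes that are not present
  chrom_seq <- read_fasta_string(path)
  cat(chrom, nchar(chrom_seq), "bases\n")
  # scan chrom_seq with the PWM here, then let the next chromosome overwrite it
}
```

Processing one chromosome per iteration keeps only a single sequence in memory, and it avoids spurious PWM hits that would span the junction between two chromosomes in a stitched string.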
What do you mean by a 'stitched' string of chr3-chr22? It sounds like you want the genomic sequences of the chromosomes pasted together, but I think you actually want something else.