The genomic position of a DNA sequence fragment on a chromosome is given, and its orientation is specified as the reverse strand.
For example: GGCACGCATCAGCAGCTGCTGGCACAGAAA; genomic position: 87133609; orientation: (-). I have a text file containing the chromosome sequence in the forward direction, and I want to verify the position of the fragment. Do I have to search for the mirror repeat of the sequence? Am I right?
Thanks in advance for the help
First of all, thanks for your reply. I am totally new to this field and am working on a project, part of which is to look at some DNA fragments that align with human hg19; in this example the fragment aligns with chr7 (as you pointed out). My boss has given me various text files, one of which is "chr7.fa", and it contains the genomic sequence in the forward direction (I assume). I have also been given some fragments in an Excel file, with the orientation recorded as either positive or negative, along with the corresponding genomic positions. If I do a search, I can find all the fragments recorded with a positive sign (which I guess refers to the forward direction) in the given genomic sequence. However, I cannot find any fragment that is marked with a negative sign in the Excel file, since I assume it must be on the reverse strand. So, if the Excel file marks a fragment as being in the negative orientation, I have to find the reverse complement of the fragment to locate it in the genomic sequence. To answer your question: I have not been told which dataset the files come from. Also, I don't understand what one-based coordinates are. So my basic question is: if the orientation is recorded as positive for a fragment, I can search for the fragment directly in the chromosome file (for example chr7.fa), but if the recorded orientation is negative, I have to look for the reverse complement of the fragment? Thank you.
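A minimal Java sketch of the check described above: search the fragment as-is when the orientation is positive, and search its reverse complement when it is negative. The class and method names (`StrandSearch`, `reverseComplement`, `find`) are made up for illustration. The sketch assumes the chromosome string is already upper-case and returns a one-based start, meaning the first base of the chromosome is position 1 rather than 0. Whether the position in the Excel file is one-based, and whether it refers to the leftmost base on the forward strand for negative fragments, is not known from the thread, so an off-by-one or off-by-length difference is possible.

```java
final class StrandSearch {

    // Complement of a single base; anything other than A/C/G/T becomes N.
    static char complement(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:  return 'N';
        }
    }

    // Reverse complement: read the fragment backwards and complement each base.
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    // Returns the one-based start of the first exact match on the forward
    // strand, or -1 if there is none. The chromosome is assumed to be
    // upper-case already; orientation is '+' or '-'.
    static int find(String chromosome, String fragment, char orientation) {
        String query = (orientation == '-')
                ? reverseComplement(fragment)
                : fragment.toUpperCase();
        int zeroBased = chromosome.indexOf(query);
        return zeroBased < 0 ? -1 : zeroBased + 1;
    }

    public static void main(String[] args) {
        // The example fragment from the question, reported on the (-) strand.
        String fragment = "GGCACGCATCAGCAGCTGCTGGCACAGAAA";
        System.out.println(reverseComplement(fragment));
    }
}
```

The value returned by `find` can then be compared with the position recorded in the Excel file. Note that `indexOf` only reports the first exact match, so a fragment that occurs more than once, or that contains a mismatch, needs a proper aligner.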
Seems like it. The easiest way is of course to use a mapping program, which searches both strands automatically without you having to do anything. Convert the Excel file to FASTA and use BLAT. Or try Galaxy.
Thanks a lot for the reply. I am writing a program in Java that takes a fragment as input and aligns it with the genomic sequence. However, if I use BLAT I will have to submit each fragment individually, and there are a lot of them, for example 20,000. I don't know whether there is a procedure by which I can submit, say, 1,000 fragments recorded in a text or Excel file to BLAT, where each fragment may align with a different chromosome, and get a separate output for each from BLAT. If that works, I can submit 20 batches, and I guess that will still be quicker than writing a program. I am really thankful for all your help.
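If the Java route is kept, the chromosome file first has to be read into memory. A rough sketch, assuming chr7.fa is a plain FASTA file (one header line starting with ">" followed by sequence lines); the class name `ChromosomeLoader` is hypothetical. Sequence lines are upper-cased because hg19 files from UCSC write repeat regions in lower case, which would otherwise break an exact string search.

```java
import java.io.BufferedReader;

final class ChromosomeLoader {

    // Concatenates all sequence lines of a single-record FASTA stream into
    // one upper-case string, skipping header lines that start with '>'.
    static String readSequence(BufferedReader reader) {
        StringBuilder seq = new StringBuilder();
        reader.lines()
              .filter(line -> !line.startsWith(">"))
              .forEach(line -> seq.append(line.trim().toUpperCase()));
        return seq.toString();
    }
}
```

For a real file the reader would be something like `new BufferedReader(new FileReader("chr7.fa"))`. A whole human chromosome is well over a hundred million characters, so the JVM will probably need a larger heap (the `-Xmx` option) for this to work.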
Then use Galaxy: reformat to FASTA, then align to hg19 with Bowtie. Or do it on the command line.
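The "reformat to FASTA" step can be sketched in Java as below. This assumes the Excel sheet has been exported as tab-separated text with three columns in the order sequence, position, orientation; that layout is a guess, since the thread does not describe the actual file. The class name `FastaWriter` and the header format `>fragN|position|orientation` are made up for illustration; they just keep the recorded position and strand next to each sequence so they can be compared with the aligner output later.

```java
import java.util.List;

final class FastaWriter {

    // Turns tab-separated rows (sequence, position, orientation) into FASTA
    // text: one header line and one sequence line per fragment.
    // Empty rows and rows with fewer than three columns are skipped.
    static String toFasta(List<String> rows) {
        StringBuilder out = new StringBuilder();
        int n = 0;
        for (String row : rows) {
            String[] cols = row.split("\t");
            if (cols.length < 3) {
                continue;
            }
            n++;
            out.append(">frag").append(n)
               .append('|').append(cols[1].trim())
               .append('|').append(cols[2].trim())
               .append('\n');
            out.append(cols[0].trim().toUpperCase()).append('\n');
        }
        return out.toString();
    }
}
```

All 20,000 fragments can go into a single FASTA file this way, so nothing has to be submitted one by one.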
Sorry for the trouble, but can you please elaborate on your comment? If I could get help or a tutorial on how to do this, it would be of great help. Thanks for the reply.