Hi Biostars,
I recently followed this question in order to obtain sequences for exon-capture probes sold in Agilent kits. The column headers plus one line of the file look like this:
TargetID ProbeID Sequence Replication Strand Coordinates
mRNA|AL390972 A_36_B233385 <a string of 120 ACGTs> NA + chr1:100111836-100111955
I want to know which way the probes are aimed: will a given probe amplify regions towards lower genomic coordinates or higher? Since I couldn't find any sort of readme file, I need help figuring out where the 3' and 5' ends are and how that relates to the coordinates. For a simple example, suppose I were to convert this 'drawing' into a file of the sort that Agilent publishes.
coords: 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 67 70
sequence: AAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGGGGAAAAGG
probe: 5-AGGGGAAAA-3 ==>
probe on reverse strand: <== 3-TTTCCCCTT-5
probe on forward strand: 5-AAAAGGGGA-3 ==>
direction of replication: ==> or <==
Should (a few columns of) the file look like this? This way, everything goes 5' to 3', and I've reverse-complemented the probe on the minus strand.
sequence coords strand
AGGGGAAAA chr22:12-20 +
AAAGGGGAA chr22:33-41 -
AAAAGGGGA chr22:49-57 +
Thank you for your help.
Generally the sequence itself is that of the probe, which will always be written 5'->3' (the convention for all nucleotide sequences). So just check whether the strand given matches that of the probe itself or that of the target it detects. If you want to be absolutely sure, just ask Agilent.
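If you have the reference sequence to hand, you can run that check yourself. Below is a minimal Python sketch, not anything Agilent documents: the helper names (`parse_coords`, `probe_orientation`) are made up for illustration, and it assumes the coordinates are 1-based and inclusive. The example line from the file is at least consistent with that assumption, since chr1:100111836-100111955 spans exactly 120 bases, the same as the probe length. The function reports whether a probe sequence is identical to the forward-strand reference at its coordinates ("+") or to the reverse complement of it ("-").

```python
import re

# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_coords(coords):
    """Split 'chr1:100-200' into (chrom, start, end).

    Assumption: start and end are 1-based and inclusive.
    """
    match = re.fullmatch(r"(\w+):(\d+)-(\d+)", coords)
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {coords!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def probe_orientation(reference, coords, probe_seq):
    """Compare a probe with the forward-strand reference at its coordinates.

    Returns '+' if the probe is identical to the forward strand,
    '-' if it is the reverse complement of it, and None if neither.
    `reference` is the forward-strand sequence of the whole chromosome.
    """
    _, start, end = parse_coords(coords)
    target = reference[start - 1:end].upper()  # 1-based inclusive -> slice
    probe = probe_seq.upper()
    if probe == target:
        return "+"
    if probe == reverse_complement(target):
        return "-"
    return None


# The 70-base toy sequence from the question.
reference = "AAAAGGGG" * 8 + "AAAAGG"

print(probe_orientation(reference, "chr22:12-20", "AGGGGAAAA"))  # +
print(probe_orientation(reference, "chr22:49-57", "AAAAGGGGA"))  # +
# A probe that would anneal to the forward strand at 34-42, written 5'->3':
print(probe_orientation(reference, "chr22:34-42", "TTCCCCTTT"))  # -
```

Running this over a handful of real probes against the matching genome build should show whether the Sequence column always equals the forward strand or follows the Strand column. One thing it turns up in the toy example: AAAGGGGAA sits at positions 34-42 of the drawn sequence rather than 33-41, so that row's coordinates look off by one.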