I am a software engineer with limited knowledge of biology. A scientist has asked me to run in silico PCR on assay sequences stored in a database. Currently this is done manually with UCSC In-Silico PCR: http://genome.ucsc.edu/cgi-bin/hgPcr?command=start
I am working with Ruby and found a gem that seems to do the job: https://github.com/wwood/bioruby-ipcress
What confuses me is that the program requires a "template sequence, specified as a FASTA file". Should I be using the SNP-masked files from http://hgdownload.cse.ucsc.edu/goldenPath/hg19/snp138Mask/ or something similar? How would I know which file to use?
These are the files in the directory:
chr1.subst.fa.gz 01-Aug-2013 12:03 75M
chr1_gl000191_random.subst.fa.gz 01-Aug-2013 12:05 33K
chr1_gl000192_random.subst.fa.gz 01-Aug-2013 12:05 178K
chr2.subst.fa.gz 01-Aug-2013 12:06 80M
chr3.subst.fa.gz 01-Aug-2013 12:07 66M
chr4.subst.fa.gz 01-Aug-2013 12:07 63M
chr4_ctg9_hap1.subst.fa.gz 01-Aug-2013 12:07 199K
chr4_gl000193_random.subst.fa.gz 01-Aug-2013 12:07 61K
chr4_gl000194_random.subst.fa.gz 01-Aug-2013 12:07 65K
chr5.subst.fa.gz 01-Aug-2013 12:07 60M
chr6.subst.fa.gz 01-Aug-2013 12:07 56M
chr6_apd_hap1.subst.fa.gz 01-Aug-2013 12:07 819K
chr6_cox_hap2.subst.fa.gz 01-Aug-2013 12:07 1.6M
chr6_dbb_hap3.subst.fa.gz 01-Aug-2013 12:07 1.4M
chr6_mann_hap4.fa 01-Aug-2013 12:07 4.6M
chr6_mcf_hap5.fa 01-Aug-2013 12:07 4.7M
chr6_qbl_hap6.fa 01-Aug-2013 12:08 4.5M
chr6_ssto_hap7.fa 01-Aug-2013 12:08 4.8M
chr7.fa 01-Aug-2013 12:08 155M
chr7_gl000195_random.fa 01-Aug-2013 12:08 182K
chr8.fa 01-Aug-2013 12:08 142M
chr8_gl000197_random.fa 01-Aug-2013 12:08 37K
chr9.fa 01-Aug-2013 12:08 137M
chr9_gl000198_random.fa 01-Aug-2013 12:08 90K
chr9_gl000199_random.fa 01-Aug-2013 12:08 169K
chr9_gl000200_random.fa 01-Aug-2013 12:08 186K
chr9_gl000201_random.fa 01-Aug-2013 12:08 36K
chr10.subst.fa.gz 01-Aug-2013 12:03 44M
chr11.subst.fa.gz 01-Aug-2013 12:03 44M
chr11_gl000202_random.subst.fa.gz 01-Aug-2013 12:03 13K
chr12.subst.fa.gz 01-Aug-2013 12:04 44M
chr13.subst.fa.gz 01-Aug-2013 12:04 32M
chr14.subst.fa.gz 01-Aug-2013 12:04 30M
chr15.subst.fa.gz 01-Aug-2013 12:04 27M
chr16.subst.fa.gz 01-Aug-2013 12:04 27M
chr17.subst.fa.gz 01-Aug-2013 12:04 26M
chr17_ctg5_hap1.subst.fa.gz 01-Aug-2013 12:04 516K
chr17_gl000203_random.subst.fa.gz 01-Aug-2013 12:04 13K
chr17_gl000204_random.subst.fa.gz 01-Aug-2013 12:04 26K
chr17_gl000205_random.subst.fa.gz 01-Aug-2013 12:04 58K
chr17_gl000206_random.subst.fa.gz 01-Aug-2013 12:05 13K
chr18.subst.fa.gz 01-Aug-2013 12:05 25M
chr18_gl000207_random.subst.fa.gz 01-Aug-2013 12:05 1.5K
chr19.subst.fa.gz 01-Aug-2013 12:05 18M
chr19_gl000208_random.subst.fa.gz 01-Aug-2013 12:05 24K
chr19_gl000209_random.subst.fa.gz 01-Aug-2013 12:05 47K
chr20.subst.fa.gz 01-Aug-2013 12:06 20M
chr21.subst.fa.gz 01-Aug-2013 12:06 12M
chr21_gl000210_random.subst.fa.gz 01-Aug-2013 12:06 9.0K
chr22.subst.fa.gz 01-Aug-2013 12:07 12M
chrM.fa 01-Aug-2013 12:09 17K
chrUn_gl000211.fa 01-Aug-2013 12:08 166K
chrUn_gl000212.fa 01-Aug-2013 12:08 186K
chrUn_gl000213.fa 01-Aug-2013 12:08 164K
chrUn_gl000214.fa 01-Aug-2013 12:08 137K
chrUn_gl000215.fa 01-Aug-2013 12:08 172K
chrUn_gl000216.fa 01-Aug-2013 12:08 172K
chrUn_gl000217.fa 01-Aug-2013 12:08 171K
chrUn_gl000218.fa 01-Aug-2013 12:08 161K
chrUn_gl000219.fa 01-Aug-2013 12:08 179K
chrUn_gl000220.fa 01-Aug-2013 12:08 161K
chrUn_gl000221.fa 01-Aug-2013 12:08 155K
chrUn_gl000222.fa 01-Aug-2013 12:08 186K
chrUn_gl000223.fa 01-Aug-2013 12:08 180K
chrUn_gl000224.fa 01-Aug-2013 12:08 179K
chrUn_gl000225.fa 01-Aug-2013 12:08 210K
chrUn_gl000226.fa 01-Aug-2013 12:08 15K
chrUn_gl000227.fa 01-Aug-2013 12:08 128K
chrUn_gl000228.fa 01-Aug-2013 12:08 129K
chrUn_gl000229.fa 01-Aug-2013 12:08 20K
chrUn_gl000230.fa 01-Aug-2013 12:08 44K
chrUn_gl000231.fa 01-Aug-2013 12:08 27K
chrUn_gl000232.fa 01-Aug-2013 12:08 41K
chrUn_gl000233.fa 01-Aug-2013 12:08 46K
chrUn_gl000234.fa 01-Aug-2013 12:08 40K
chrUn_gl000235.fa 01-Aug-2013 12:08 34K
chrUn_gl000236.fa 01-Aug-2013 12:08 42K
chrUn_gl000237.fa 01-Aug-2013 12:08 46K
chrUn_gl000238.fa 01-Aug-2013 12:08 40K
chrUn_gl000239.fa 01-Aug-2013 12:08 34K
chrUn_gl000240.fa 01-Aug-2013 12:08 42K
chrUn_gl000241.fa 01-Aug-2013 12:08 42K
chrUn_gl000243.fa 01-Aug-2013 12:08 43K
chrUn_gl000244.fa 01-Aug-2013 12:08 40K
chrUn_gl000245.fa 01-Aug-2013 12:08 37K
chrUn_gl000246.fa 01-Aug-2013 12:08 38K
chrUn_gl000247.fa 01-Aug-2013 12:09 36K
chrUn_gl000248.fa 01-Aug-2013 12:09 40K
chrX.fa 01-Aug-2013 12:09 151M
chrY.fa 01-Aug-2013 12:09 58M
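Most of these files are gzip-compressed (.fa.gz), while some are plain .fa. I have not confirmed whether ipcress can read gzip input directly, so assuming it needs a path to an uncompressed FASTA file, here is a minimal sketch of how each template could be prepared before being passed to Bio::Ipcress.run. `uncompressed_fasta` is a hypothetical helper of mine, not part of the gem; it only uses Ruby's standard zlib library.

```ruby
require 'zlib'

# Hypothetical helper (not part of bio-ipcress): return the path of an
# uncompressed FASTA file. ".gz" inputs are decompressed next to the
# original (chr19.subst.fa.gz -> chr19.subst.fa); plain ".fa" paths are
# returned unchanged.
def uncompressed_fasta(path)
  return path unless path.end_with?('.gz')

  out_path = path.delete_suffix('.gz')
  # Reuse a copy produced by an earlier run.
  return out_path if File.exist?(out_path)

  tmp_path = "#{out_path}.tmp"
  Zlib::GzipReader.open(path) do |gz|
    File.open(tmp_path, 'wb') do |out|
      # Copy in 1 MiB chunks so a whole chromosome is never held in memory.
      while (chunk = gz.read(1 << 20))
        out.write(chunk)
      end
    end
  end
  # Rename only after a complete copy, so an interrupted run never
  # leaves a truncated .fa file that would be reused next time.
  File.rename(tmp_path, out_path)
  out_path
end
```

For example, `uncompressed_fasta('chr19.subst.fa.gz')` would return `'chr19.subst.fa'`, which could then be given as the template argument.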
If I am completely off base, or if there is an easier way, please let me know. The library's Ruby interface would be easy to integrate into our existing code, so I would like to use it (assuming it is the right tool).
Thank you
The example usage from the gem:
require 'bio-ipcress'

# Define the primer pair (forward, reverse)
primer_set = Bio::Ipcress::PrimerSet.new(
  'GGTCACTGCTA', 'GGCTACCTTGTTACGACTTAAC'
)

# Run ipcress on a template sequence, specified as a FASTA file
results = Bio::Ipcress.run(
  primer_set,
  'Methanocella_conradii_16s.fa', # this file is in the test/data/Ipcress directory
  {:min_distance => 2, :max_distance => 10000}
)
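To check my understanding of what the call above computes, here is a deliberately simplified, exact-match sketch of in silico PCR. It is not the ipcress algorithm (ipcress also tolerates primer mismatches), and all names are my own. I am assuming `min_len`/`max_len` play roughly the role of `:min_distance`/`:max_distance`; I have not confirmed their exact semantics in the gem. Any non-ACGT character in the template simply prevents an exact match at that position.

```ruby
# Simplified exact-match illustration of in silico PCR (NOT ipcress itself).

# Complement of each base; a primer containing any other character
# (e.g. an IUPAC ambiguity code) raises KeyError.
COMPLEMENT = { 'A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A', 'N' => 'N' }.freeze

# A predicted product: 0-based start on the + strand, size in bp, sequence.
Product = Struct.new(:start, :size_bp, :sequence)

def reverse_complement(seq)
  seq.upcase.reverse.each_char.map { |base| COMPLEMENT.fetch(base) }.join
end

# All (possibly overlapping) 0-based positions of needle in haystack.
def positions(haystack, needle)
  found = []
  i = haystack.index(needle)
  while i
    found << i
    i = haystack.index(needle, i + 1)
  end
  found
end

# The forward primer must match the + strand as written, and the reverse
# primer's reverse complement must match further downstream. Products whose
# size (forward start to reverse end) is outside min_len..max_len are dropped.
def exact_in_silico_pcr(template, forward, reverse, min_len:, max_len:)
  seq = template.upcase # treat lowercase (soft-masked) bases like uppercase
  fwd = forward.upcase
  rev_rc = reverse_complement(reverse)
  rev_hits = positions(seq, rev_rc)

  positions(seq, fwd).flat_map do |f|
    rev_hits.map do |r|
      size = r + rev_rc.length - f
      next if r < f + fwd.length || !size.between?(min_len, max_len)

      Product.new(f, size, seq[f, size])
    end.compact
  end
end
```

This only scans one orientation; calling it a second time with the two primers swapped would find products that lie in the opposite orientation on the + strand.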
I can see there is a single file per chromosome, so I can use those, e.g. chr19.subst.fa.gz for chr19. But what are the other files for, e.g. chr19_gl000208_random.subst.fa.gz and chr19_gl000209_random.subst.fa.gz?
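As far as I can tell from UCSC's hg19 naming conventions, chrN_glXXXXXX_random files are "unlocalized" contigs known to belong to chromosome N but without a known position, chrUn_glXXXXXX files are "unplaced" contigs whose chromosome is unknown, and *_hapN files are alternate haplotypes of a region (such as the chr6 MHC haplotypes). A minimal sketch that labels the listing this way, so the primary chromosomes could be separated from the rest; `sequence_kind` is a hypothetical helper of mine, and whether to search the extra sequences is a question for the scientist.

```ruby
# Hypothetical helper: label an hg19 FASTA file name with the kind of
# sequence it holds, following UCSC's hg19 naming patterns.
def sequence_kind(filename)
  # Strip ".fa", ".fa.gz", ".subst.fa" or ".subst.fa.gz" to get the sequence name.
  name = File.basename(filename).sub(/(\.subst)?\.fa(\.gz)?\z/, '')
  case name
  when /\Achr(\d+|X|Y|M)\z/
    :primary       # a whole chromosome (chrM is the mitochondrial genome)
  when /\Achr(\d+|X|Y)_gl\d+_random\z/
    :unlocalized   # contig assigned to chrN, exact position unknown
  when /\AchrUn_gl\d+\z/
    :unplaced      # contig whose chromosome is unknown
  when /\Achr\w+_hap\d+\z/
    :alt_haplotype # alternate version of a region
  else
    :other
  end
end

listing = %w[
  chr19.subst.fa.gz
  chr19_gl000208_random.subst.fa.gz
  chr19_gl000209_random.subst.fa.gz
  chr6_cox_hap2.subst.fa.gz
  chrUn_gl000211.fa
]
listing.group_by { |f| sequence_kind(f) }.each do |kind, files|
  puts "#{kind}: #{files.join(', ')}"
end
```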