Question

How to identify a 14-20 nt sequence specific to chromosome 1 with the maximum number of recognition sites on chr 1

0

8.4 years ago

Dan • 0

How can I identify a 14-20 nt sequence that is specific to chromosome 1 and has the maximum number of recognition sites on chromosome 1?

A Linux command would be preferable, but any software suggestions and help are much appreciated.

Best,

Dan

genome sequence blast • 1.3k views

0

Which genome? I doubt one can find a sequence that meets all three criteria at once: the specified size, specificity (chr 1 only), and the maximum number of recognition sites.
Can you provide any additional information to clarify the request?

ADD REPLY • link 8.4 years ago by GenoMax 148k

0

Hi,

Thank you very much for your prompt reply. Chr 1 was just an example. I am more interested in the X and Y chromosomes, for FISH. The FISH probe I am currently using came from someone who does not want to reveal its sequence to me; it is not specific enough and gives low intensity.
My intention is to find one or more 14-20 nt sequences with many recognition sites on the X/Y chromosome, to get i) a specific signal and ii) higher intensity from the larger number of recognition sites (repeats on the X/Y chromosome).

ADD REPLY • link 8.4 years ago by Dan • 0

0

That is useful information. Someone else may need to help you further. Perhaps this site, and this software may help while you wait to hear from experts.

ADD REPLY • link 8.4 years ago by GenoMax 148k

1

Some guidelines for FISH probe design (from http://www.exiqon.com/custom-fish). These may be universally applicable.

Detection probes are typically 20-25 nucleotides in length. However, shorter or longer probes can also be used.
Avoid stretches of 3 or more Gs or Cs.
Avoid stretches of more than 4 LNA™ bases, except when very short (9-10 nt) oligonucleotides are designed.
Avoid LNA™ self-complementarity. LNA™ hybridizes very tightly to other LNA™ residues.
Keep the GC-content between 30-60 %.
A Tm of approximately 75 °C is recommended.
No LNA™ bases should be placed in palindromes (G-C base pairs are more critical than A-T base pairs).
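
Two of these rules (GC content and G/C runs) are easy to check automatically when screening candidate k-mers. The following is a minimal Python sketch, not anything provided by Exiqon: the function names are made up for illustration, and it assumes "stretches of 3 or more Gs or Cs" means three or more identical G or C bases in a row. The LNA placement and Tm rules are not checked.

```python
import re


def gc_fraction(probe):
    """Fraction of G and C bases in the probe (0.0 for an empty string)."""
    probe = probe.upper()
    if not probe:
        return 0.0
    return (probe.count("G") + probe.count("C")) / len(probe)


def passes_basic_rules(probe, min_gc=0.30, max_gc=0.60, max_run=2):
    """Return True if the probe satisfies the GC-content and G/C-run rules.

    Assumption: a "stretch" is a run of identical bases (GGG or CCC),
    so runs longer than max_run are rejected.
    """
    probe = probe.upper()
    if not probe:
        return False
    if not min_gc <= gc_fraction(probe) <= max_gc:
        return False
    # Reject runs of more than max_run consecutive Gs or consecutive Cs.
    run = re.compile(r"G{%d,}|C{%d,}" % (max_run + 1, max_run + 1))
    return run.search(probe) is None
```

Candidate k-mers could be passed through `passes_basic_rules` before ranking them by how often they occur on the target chromosome.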

ADD REPLY • link 8.4 years ago by GenoMax 148k

0

Thank you very much for your kind help.

Best

ADD REPLY • link 8.4 years ago by Dan • 0

score 1 · Answer 1 · 2016-07-13

1. Break the chromosomes into k-mers of the desired length (e.g. with jellyfish).
2. Combine the non-target k-mers into one file.
3. Sort the target and non-target k-mer files.
4. Use comm to find the k-mers unique to the target chromosome.
5. From the unique k-mers, select the one with the highest count.

If you run out of memory, skip step 2 and instead compare against the non-target chromosomes one at a time, always using the remaining unique k-mers as the other input file for comm.
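
To make the logic of these steps concrete, here is a small in-memory Python sketch of the same idea. It is not a replacement for jellyfish and comm on a real genome, where the k-mer tables are far too large for a plain dictionary. The function names and the toy sequences are invented for illustration, and only the forward strand is counted, so reverse-complement matches are ignored.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every k-mer in seq, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def best_unique_kmer(chromosomes, target, k):
    """Return (kmer, count) for the most frequent k-mer found only on target.

    chromosomes maps a name to its sequence. Returns None when no k-mer
    is unique to the target.
    """
    unique = count_kmers(chromosomes[target], k)
    # Low-memory variant: compare against one non-target chromosome at a
    # time and drop every k-mer that also occurs there.
    for name, seq in chromosomes.items():
        if name == target:
            continue
        for kmer in count_kmers(seq, k):
            unique.pop(kmer, None)
    if not unique:
        return None
    # Highest count wins; ties are broken alphabetically for determinism.
    return max(unique.items(), key=lambda kv: (kv[1], kv[0]))


# Toy data, purely illustrative.
toy = {"chrX": "ACGTACGTACGTTTTT", "chr2": "TTTTGGGGCCCC"}
print(best_unique_kmer(toy, "chrX", 4))
```

In the toy data, TTTT occurs on both sequences and is discarded, so the result is the most frequent k-mer that remains on chrX.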

score 1 · Answer 2 · 2016-07-13

8.4 years ago

Pierre Lindenbaum 164k

Brute force, not fully tested: scan all the k-mers in a FASTA stream.

g++ -O3 -Wall biostars201509.cpp 
./a.out < human_g1k_v37.fasta
(...)
CTGGACATAACA
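
The source of biostars201509.cpp is not reproduced in this thread, so the following is not that program. It is only a rough Python sketch of what "scanning all the k-mers in a FASTA stream" can look like: the function names are hypothetical, only the forward strand is read, and any non-ACGT base (such as N) is assumed to break the current window.

```python
from collections import Counter, defaultdict


def stream_kmers(handle, k):
    """Yield (record_name, kmer) for each k-mer in a FASTA text stream.

    Only the last k bases are held in memory, so k-mers that span line
    breaks are handled without loading a whole chromosome.
    """
    name, window = None, ""
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name, window = (parts[0] if parts else ""), ""
            continue
        for base in line.upper():
            if base not in "ACGT":
                window = ""  # an N or other ambiguity code resets the window
                continue
            window = (window + base)[-k:]
            if len(window) == k:
                yield name, window


def count_by_record(handle, k):
    """Tally k-mer counts separately for every FASTA record."""
    counts = defaultdict(Counter)
    for name, kmer in stream_kmers(handle, k):
        counts[name][kmer] += 1
    return counts
```

Passing `sys.stdin` as `handle` would mimic the `./a.out < genome.fasta` style of invocation, although a pure-Python loop like this will be far slower than a compiled scanner on a whole genome.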

score 0 · Answer 3 · 2016-07-14

8.4 years ago

Dan • 0

Thank you so much Pierre Lindenbaum. You are awesome! Best regards,

Dan