Hi,
I have 200 sequences in FASTA format (1000 bp each) and a list of 10 patterns (20 bp each). I am trying to replace 20 bp of each sequence with one pattern from the list (the idea is to insert a pattern into each sequence, and the output sequences should still be 1000 bp). Both the pattern to insert and its position should be chosen randomly. Can you help me, please?
Thank you in advance
UPDATED with new information:
Hello all, thank you very much for your replies. I have a system and I want to test its ability to detect 20 patterns. So I want 200 DNA sequences of the same size (1000 bp), each containing one pattern from the set at a random position. I have already prepared 200 random sequences; my goal now is to insert the patterns into them.
Here is an example. I have 200 sequences:
seq1 ATCGATCGAT........GCTGCATGCAT (1000 bp)
seq2 ATCGGTCGTAT........GCTGGATCGCAT (1000 bp)
:
:
:
seq199 ATCGCATGAT........GCTGCATGCATGT (1000 bp)
seq200 ATCGATCGAT........GCTGCATGCATACG (1000 bp)
and a set of 20 patterns:
ATGTC
ATCGT
ATGCT
...
I want to insert one randomly chosen pattern into each sequence at a random position. For example:
seq1 ATCGATCGAT.....ATGTC...GCTGCATGCAT (1000 bp)
seq2 ATCGGTCGTAT....ATCGT....GCTGGATCGCAT (1000 bp)
:
:
:
seq199 ATCGCATGAT.....ATGCT...GCTGCATGCATGT (1000 bp)
seq200 ATCGATCGAT....ATGTC....GCTGCATGCATACG (1000 bp)
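A minimal Python sketch of this replace-in-place variant, assuming the sequences are already loaded as plain strings. The function name `overwrite_with_pattern` and the `rng` parameter are hypothetical, chosen only for illustration:

```python
import random


def overwrite_with_pattern(seq, patterns, rng=random):
    """Overwrite a random window of `seq` with a randomly chosen pattern.

    The output has the same length as the input, because the pattern
    replaces exactly len(pattern) bases instead of being added to them.
    `rng` is an assumption: pass random.Random(seed) for reproducible runs.
    """
    pattern = rng.choice(patterns)
    # Valid start positions keep the whole pattern inside the sequence.
    start = rng.randrange(len(seq) - len(pattern) + 1)
    new_seq = seq[:start] + pattern + seq[start + len(pattern):]
    return new_seq, pattern, start
```

Returning the chosen pattern and its position along with the sequence keeps a record of the ground truth, which is useful when checking what the detection system reports.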
Alternatively, we could prepare 200 sequences of 995 bp and insert the patterns at random positions (without substituting anything), ending up with 200 sequences of 1000 bp (the patterns all have the same size, 5 bp, so 995 + 5 = 1000 bp).
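This insertion variant could be sketched in Python roughly as follows. It assumes uniformly random A/C/G/T backgrounds and uses only the three patterns shown above as placeholders; `insert_pattern`, `make_test_set`, and the header layout are hypothetical names and choices, not part of any existing tool:

```python
import random


def insert_pattern(seq, patterns, rng):
    """Insert a randomly chosen pattern at a random position (no bases removed)."""
    pattern = rng.choice(patterns)
    # There are len(seq) + 1 insertion points, including both ends.
    pos = rng.randrange(len(seq) + 1)
    return seq[:pos] + pattern + seq[pos:], pattern, pos


def make_test_set(n_seqs=200, background_len=995,
                  patterns=("ATGTC", "ATCGT", "ATGCT"), seed=None):
    """Build (header, sequence) pairs: 995 bp background + 5 bp pattern = 1000 bp."""
    rng = random.Random(seed)
    records = []
    for i in range(1, n_seqs + 1):
        # Assumption: the background is uniform random DNA.
        background = "".join(rng.choice("ACGT") for _ in range(background_len))
        seq, pattern, pos = insert_pattern(background, patterns, rng)
        # Store the pattern and its 0-based position in the header as ground truth.
        records.append((f"seq{i} pattern={pattern} pos={pos}", seq))
    return records
```

Printing each record as `>header` followed by the sequence on the next line gives FASTA output. One caveat: with 5 bp patterns there are only 4^5 = 1024 possible 5-mers, so a random 1000 bp background will often contain a pattern by chance somewhere else as well.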
Thank you.
Your question contains insufficient information to get an answer. Including an example is helpful too!
Writing a script is the best and easiest way.
Substitution operation is not needed.
I posted a solution below using seqkit and shell; it's dirty, but it works.