Hello all,
I have a large number of proteins, each made up of 1000 to 2000 amino acids. From each of them I want to extract only a small region, for example residues 150-300. How can I do this from the terminal? I also need to extract a different region from each protein; what do I have to do?
Well, do you know which sequences you need to process and which regions you want to splice out? If so:
NOTE: code not tested!!
the idea is to locate a sequence:
$1 =~ /my_seq_id/
and then extract a region starting at position 0 of length 100 (print splice @a, 0, 100;).
Please note that the above line of code can be extended to extract the same positions from any sequence whose header matches the same regex.
Thank you sir, but if I need to extract a different region from each protein, what do I have to do?
Repeat the procedure:
Extract 122-222 from sequence xxcx:
Extract 22-522 from sequence yyy:
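As a rough illustration of repeating the procedure with one region per sequence, here is a minimal Python sketch (not the code from the original answer). It assumes the coordinates are 1-based and inclusive, so 122-222 covers 101 residues. The function name `extract_region`, the `regions` table, and the toy sequences are made up for the example.

```python
def extract_region(seq, start, end):
    """Return residues start..end of seq (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError(f"bad region {start}-{end}")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Toy sequences for illustration only; real ones would come from a FASTA file.
sequences = {
    "xxcx": "ACDEFGHIKLMNPQRSTVWY" * 30,  # 600 residues
    "yyy": "MKV" * 200,                   # 600 residues
}

# One (start, end) pair per sequence ID.
regions = {"xxcx": (122, 222), "yyy": (22, 522)}

for seq_id, (start, end) in regions.items():
    fragment = extract_region(sequences[seq_id], start, end)
    print(f">{seq_id}_{start}-{end}")
    print(fragment)
```

Note that if `end` is larger than the sequence length, the slice simply stops at the last residue instead of raising an error, so it may be worth checking lengths first.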
Write a script in a language such as Python or Perl and run it from the terminal; that will do the job!
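One possible shape for such a script, sketched in Python under a few assumptions: the proteins are in a FASTA file, the sequence ID is the first word of each header, and the wanted regions are listed in a plain text file with one `seq_id start end` line per protein (1-based, inclusive). That regions file format and all function names are assumptions made for this example.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_regions(handle):
    """Parse 'seq_id start end' lines into {seq_id: (start, end)}."""
    regions = {}
    for line in handle:
        fields = line.split()
        if len(fields) == 3:
            regions[fields[0]] = (int(fields[1]), int(fields[2]))
    return regions


def extract(fasta_handle, regions, out=sys.stdout):
    """Write the requested region of every listed sequence as FASTA."""
    for header, seq in read_fasta(fasta_handle):
        words = header.split()
        seq_id = words[0] if words else ""
        if seq_id in regions:
            start, end = regions[seq_id]
            # 1-based inclusive coordinates -> 0-based end-exclusive slice.
            out.write(f">{seq_id}_{start}-{end}\n{seq[start - 1:end]}\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fasta, open(sys.argv[2]) as region_file:
            extract(fasta, read_regions(region_file))
    else:
        print("usage: python extract_regions.py proteins.fasta regions.txt",
              file=sys.stderr)
```

Saved as, say, `extract_regions.py` (a hypothetical file name), it could be run as `python extract_regions.py proteins.fasta regions.txt > fragments.fasta`. Sequences not listed in the regions file are skipped.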