PERL: how to print specific length of residues before and after a motif?
8.4 years ago
riaasuddin • 0

Hello, I would like to print a specific number of amino acids before and after a motif. For example, in the sequence below the motif is QGWGP. I need to print the 83 residues before it and the 254 residues after it.

HTRVTGGSAAHATNTFTSFFSQGAKQGVQLVNTNGSWHVNRTALNCNASLETGWVAGLFYYHKFNSSGCP ERMASCRPLADFDQGWGPISYANGSGPEHRPYCWHYPPKPCGIVPAQTVCGPVYCFTPSPVVVGTTDKFG VPTYNWGENETDVPVLNNTRPPLGNWFGCTWMNSSGYTKVCGAPPCVIGGVGNNTLHCPTDCFRKHPEAT YSRCGSGPWITPRCLVDYPYRLWHYPCTINYTLFKVRMYVGGVEHRLEAACNWTRGERCDPDDRDRSELS PLLLSTTQWQVLPCSFTTLPALTTGLIHLHQNIVDVQYLYGMGSSIVSWAIKWEYVILLFLLLADARICS CLWMMLLI

Thanks.

Reaz

RNA-Seq sequencing • 2.3k views

Do you have a file containing the motifs or something like that?


Yes, I do have a file. How can I share it?


I see you want to use Perl, in which case I can't help you. As a follow-up question to make your problem a bit clearer: is it always the same motif, and are the sequences in FASTA format?

In Python (which I can help with) you could do something like seqstring.index('QGWGP') (pseudocode) to get the position of your motif in the sequence fragment. Knowing that position and the total length, it is then straightforward to calculate the lengths before and after.

Perl very likely has one or more ways to do the same thing.
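
To make the Python idea concrete, here is a minimal sketch rather than a definitive implementation. The helper name `flanks` is hypothetical, and it assumes only the first occurrence of the motif matters. It uses `str.find` to locate the motif and slicing to pull out the residues on either side, returning `None` when the motif is absent or a flank is shorter than requested.

```python
def flanks(seq, motif, n_before, n_after):
    """Return (before, after) residues around the first motif hit, or None."""
    start = seq.find(motif)          # -1 when the motif is absent
    if start == -1:
        return None
    end = start + len(motif)         # index just past the motif
    # Assumption: require full-length flanks; drop this check to allow
    # truncated flanks near the ends of the sequence.
    if start < n_before or len(seq) - end < n_after:
        return None
    return seq[start - n_before:start], seq[end:end + n_after]


seq = "SSGCPERMASCRPLADFDAAAAAQGWGPBBBBBISYANGSGPEHRPY"
print(flanks(seq, "QGWGP", 5, 5))    # ('AAAAA', 'BBBBB')
```

For the case in the question, the call would be `flanks(sequence, "QGWGP", 83, 254)` on each sequence with line breaks removed.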


Thanks, I am working on the other suggestions.

8.4 years ago
Prasad ★ 1.6k

If you have the motif positions, create a BED file with intervals running from motif_start-83 to motif_stop+254. Then use fastaFromBed.
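
As a rough sketch of how such BED intervals could be computed when the motif position differs between sequences (the function name `bed_line` is hypothetical, and the sketch is in Python rather than Perl), the code below finds the first motif hit and writes a 0-based, half-open interval covering the motif plus its flanks. It assumes the interval should be clamped to the sequence ends, and that the name used in the BED line matches the FASTA header so that fastaFromBed can look it up.

```python
def bed_line(name, seq, motif, n_before=83, n_after=254):
    """Return a BED line spanning the motif plus its flanks, or None if absent."""
    start = seq.find(motif)                   # 0-based motif start, -1 if absent
    if start == -1:
        return None
    stop = start + len(motif)                 # exclusive end, as BED expects
    bed_start = max(0, start - n_before)      # clamp at the sequence start
    bed_end = min(len(seq), stop + n_after)   # clamp at the sequence end
    return f"{name}\t{bed_start}\t{bed_end}"


# Small illustrative call: 2 residues before and 3 after the motif
print(bed_line("seq1", "AAAAAQGWGPBBBBB", "QGWGP", 2, 3))
```

Note that the resulting interval includes the motif itself, so the extracted sequence is the left flank, the motif, and the right flank joined together.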


Thanks, but the motif position is not the same in every sequence. Is there a solution in Perl?


What is your input file? Are you looking for one motif in a multi-FASTA file, or checking multiple motifs in a multi-FASTA file?


I have a multi-FASTA file.

8.4 years ago
Naren ▴ 1000

You can use this code:

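use strict;
use warnings;

# Capture the 5 characters on either side of the motif (case-insensitive match)
while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ /(.{5})(QGWGP)(.{5})/i) {
        print "Before:$1\nAfter: $3\n";
    }
}
__DATA__
SSGCPERMASCRPLADFDAAAAAQGWGPBBBBBISYANGSGPEHRPY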

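You can change the motif and the two {n} quantifiers, which set the number of characters captured before and after it (for example {83} and {254}). Note that the pattern only matches when at least that many characters flank the motif.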


He needs the number of characters before and after the motif, if I understand correctly.


No, he needs the residues themselves: the 83 residues before and the 254 residues after the motif.


Oh yeah, excuse me, you're right.


Yes, Nari is right. I need the residues, not their number. Nari, I am going to try your suggestion and will let you know afterwards. Thanks.


Hello Nari, I tried it and it is working fine. Now I am wondering how I can apply it to a multi-FASTA file. I appreciate your help.


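Just extend Nari's code to handle a multi-FASTA file:

my $fasta_file = shift @ARGV;       # multi-FASTA path, taken from the command line
open(my $fh, '<', $fasta_file) or die "Cannot open $fasta_file: $!";
$/ = ">";                           # read one FASTA record at a time
while (my $record = <$fh>) {
    chomp $record;                  # strip the trailing ">"
    # Split the record into header (first line) and sequence (the rest)
    next unless (my ($q_header, $q_sequence) = $record =~ /(.*?)\n(.*)/s);
    $q_sequence =~ s/[\d\s>]//g;    # remove digits, whitespace and any stray ">"
    if ($q_sequence =~ /(.{5})(QGWGP)(.{5})/i) {
        print "$q_header\tBefore:$1\tAfter: $3\n";
    }
}
close $fh;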