I am trying to find a region in a sequence. Let's say I have this sequence:
>sp|P14306|CPYI_YEAST Carboxypeptidase Y inhibitor OS=Saccharomyces cerevisiae (strain ATCC 204508 / S288c) OX=559292 GN=TFS1 PE=1 SV=2
MNQAIDFAQASIDSYKKHGILEDVIHDTSFQPSGILAVEYSSSAPVAMGNTLPTEKARSK
PQFQFTFNKQMQKSVPQANAYVPQDDDLFTLVMTDPDAPSKTDHKWSEFCHLVECDLKLL
NEATHETSGATEFFASEFNTKGSNTLIEYMGPAPPKGSGPHRYVFLLYKQPKGVDSSKFS
KIKDRPNWGYGTPATGVGKWAKENNLQLVASNFFYAETK
I want to find the amino acid positions of NTLIEYMGPAPPKGSG and make this region lower case, or remove the amino acids at positions 144 to 159 from the sequence.
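A minimal Python sketch of both operations, assuming the motif occurs exactly once and only the first exact match matters. `str.find` returns a 0-based index, which is converted to 1-based inclusive positions (the convention used in the question); the sequence is the CPYI_YEAST example above.

```python
# Locate a motif in a protein sequence, report 1-based positions,
# then either lowercase that region or cut it out.
seq = (
    "MNQAIDFAQASIDSYKKHGILEDVIHDTSFQPSGILAVEYSSSAPVAMGNTLPTEKARSK"
    "PQFQFTFNKQMQKSVPQANAYVPQDDDLFTLVMTDPDAPSKTDHKWSEFCHLVECDLKLL"
    "NEATHETSGATEFFASEFNTKGSNTLIEYMGPAPPKGSGPHRYVFLLYKQPKGVDSSKFS"
    "KIKDRPNWGYGTPATGVGKWAKENNLQLVASNFFYAETK"
)
motif = "NTLIEYMGPAPPKGSG"

idx = seq.find(motif)  # 0-based index of the first match, -1 if absent
if idx == -1:
    raise ValueError("motif not found")
start, end = idx + 1, idx + len(motif)  # 1-based, inclusive
print(f"{motif} spans {start}-{end}")  # NTLIEYMGPAPPKGSG spans 144-159

# Option 1: lowercase the matched region, keep everything else unchanged
masked = seq[:idx] + seq[idx:idx + len(motif)].lower() + seq[idx + len(motif):]

# Option 2: delete residues start..end (1-based, inclusive)
removed = seq[:start - 1] + seq[end:]
```

If the motif could appear more than once, a loop over `str.find(motif, pos)` or `re.finditer` would be needed instead of a single lookup.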
I made a better example. Let's call this one df1.txt:
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
I have a second text file like the one below; let's call it df2.txt:
>sp|O15304|SIVA_HUMAN (87-93) Best selected with ratio of 10 , 12 and 14
IGPDGR
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU (135-168) Not selected with ratio of (12,14)
NKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWM
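The df2 headers carry the key (the accession between the first two `|` characters) and the range in parentheses. A sketch of pulling those apart with a regular expression, assuming every df2 header follows the `>db|ACCESSION|ENTRY_NAME (START-END) note` layout seen in the example; `parse_df2_header` and `HEADER_RE` are hypothetical names.

```python
import re

# Assumed layout, inferred from the two df2 example headers:
# >db|ACCESSION|ENTRY_NAME (START-END) free-text note
HEADER_RE = re.compile(r"^>\w+\|([^|]+)\|\S+\s+(\((\d+)-(\d+)\).*)$")

def parse_df2_header(line):
    """Return (accession, start, end, annotation), or None if the line does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    acc, annotation, start, end = m.groups()
    # annotation is everything from "(START-END)" onward, ready to append to a df1 header
    return acc, int(start), int(end), annotation.rstrip()

print(parse_df2_header(">sp|O15304|SIVA_HUMAN (87-93) Best selected with ratio of 10 , 12 and 14"))
# ('O15304', 87, 93, '(87-93) Best selected with ratio of 10 , 12 and 14')
```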
I want to merge both into one file, using a few keys to match the records. The output I am trying to produce looks like this:
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1 (135-168) Not selected with ratio of (12,14)
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLnkelflkinivggliflvvsaysff
fscicnderkalsysasltilffvldmvgklsdklewmKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2 (87-93) Best selected with ratio of 10 , 12 and 14
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLigpdgrLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
So, first we look at df1.txt and take a name like |A0A1B1L9R9|, then find the record in df2.txt with the same name. Then we find the region given in parentheses in the df2 header, like (87-93), and make it lower case. The text after the name in the df2 header is also appended to the df1 header.
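One way to sketch the whole merge in Python, assuming the accession is always the second `|`-separated field and that output sequences are wrapped at 60 residues like the input. `read_fasta`, `accession` and `merge` are hypothetical helper names. In the example data the parenthesized ranges do not line up exactly with the df2 fragments (IGPDGR sits at 88-93 in SIVA_HUMAN, and the BACTU fragment spans 156-218), so this sketch locates the df2 fragment by exact substring search rather than trusting the coordinates.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence); the header keeps its '>'."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def accession(header):
    """Second '|'-separated field, e.g. 'O15304' in '>sp|O15304|SIVA_HUMAN ...'."""
    return header.split("|")[1]

def merge(df1_text, df2_text, width=60):
    # Index df2 by accession: annotation after the entry name, plus the fragment
    regions = {}
    for header, frag in read_fasta(df2_text):
        parts = header.split(None, 1)
        annotation = parts[1] if len(parts) > 1 else ""
        regions[accession(header)] = (annotation, frag)

    out = []
    for header, seq in read_fasta(df1_text):
        acc = accession(header)
        if acc in regions:
            annotation, frag = regions[acc]
            i = seq.find(frag)  # exact match of the df2 fragment in the df1 sequence
            if i != -1:
                seq = seq[:i] + frag.lower() + seq[i + len(frag):]
            header = f"{header} {annotation}"
        out.append(header)
        # Re-wrap the sequence at `width` residues per line
        out.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return "\n".join(out)
```

Usage would be along the lines of `print(merge(open("df1.txt").read(), open("df2.txt").read()))`. If the coordinates in df2 are known to be reliable, the `seq.find(frag)` lookup could be swapped for a slice such as `seq[start - 1:end]` (1-based, inclusive) using the range parsed from the df2 header.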
@Alex Reynolds I really appreciate your help. I am just trying to figure out how to solve this, and it has taken so much of my time. I really did not mean to offend you in any way. You rock. I appreciate every single word of your help. I wish I could communicate with you by email. Do you think that is possible?
I followed your post from the start. First, you didn't describe your problem properly, and with each solution Alex Reynolds provided, you added more requirements. This is terrible, because it wastes both your time and his. Be thoughtful when asking questions: lay out the problem in detail from the start, and provide an example of the input data and the intended output. A great resource is Tutorial: How To Ask Good Questions On Technical And Scientific Forums.
If you do appreciate his help, upvote his answers and comments when they are useful; it is the best way to show appreciation. His answer and comments were useful: had you asked a good question from the beginning, I bet he would have solved your problem right away.
You should use shenwei's answer. I've never been a huge fan of frameworks to handle text files, but his kit is good stuff.
@Alex Reynolds OK, thanks.
You're very welcome!