4.3 years ago
Peter
Hello,
I have two files:
sequences.fasta
intervals.txt
I used the "seqinR" package to obtain the protein sequences of interest (~3500 protein sequences) and saved them to "sequences.fasta":
> sp | O94827 | PKHG5_HUMAN
MDDQSPAEKKGLRCQNPACMDKGRAAKVCHHADCQQLHRRGPLNLCEACDSKFHSTMHYDGHVRFDLPPQG
SVLARNVSTRSCPPRTSPAVDLEEEEEESSVDGKGDRKSTGLKLSKKKARRRHTDDPSKECFTLKFDLNVDIETEIVPAMKKKSLGEVLLP
VFERKGIALGKVDIYLDQSNTPLSLTFEAYRFGGHYLRVKAPAKPGDEGKVEQGMKDSKSLSLPILRPAGTGPPAL
ERVDAQSRRESLDILAPGRRRKNMSEFLGEASIPGQEPPTPSSCSLPSGSSGSTNTGDSWKNRAASRFSGFFSS
GPSTSAFGREVDKMEQLEGKLHTYSLFGLPRLPRGLRF
> sp | P10515 | ODP2_HUMAN
MWRVCARRAQNVAPWAGLEARWTALQEVPGTPRVTSRSGPAPARRNSVTTGYGGVRALCGWTPSSGATPRNRLLLQLL
GSPGRRYYSLPPHQKVPLPSLSPTMQAGTIARWEKKEGDKINEGDLIAEVETDKATVGFESLEECYMAKILVAEGTRDVPIGA
IICITVGKPEDIEAFKNYTLDSSAAPTPQAAPAPTPAATASPPTPSAQ
My "intervals.txt" file contains the protein ID and the start and end positions of my peptide of interest:
ID start end
O94827 1 69
P10515 2 120
Is there a way to write a third .txt file containing each protein ID and the part of its sequence that corresponds to that interval? I would appreciate any help!
Thanks in advance!
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
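One possible way to do the extraction asked about above, sketched in plain Python rather than with seqinR. It rests on several assumptions not stated in the post: the headers are UniProt-style (`db|accession|entry_name`, with or without spaces around the bars), the intervals are 1-based and inclusive, and "intervals.txt" is whitespace-separated with a header row. The function names and the output file name `peptides.txt` are hypothetical.

```python
def parse_fasta(lines):
    """Return {accession: sequence} from an iterable of FASTA lines."""
    records = {}
    acc = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = [f.strip() for f in line[1:].split("|")]
            if len(fields) >= 3:
                # Assumed UniProt-style header: db|accession|entry_name
                acc = fields[1]
            else:
                # Fallback: first word of the header, if there is one
                words = line[1:].split()
                acc = words[0] if words else ""
            records[acc] = []
        elif acc is not None:
            # Sequence lines of one record may be wrapped; collect them all
            records[acc].append(line)
    return {k: "".join(v) for k, v in records.items()}


def extract_intervals(records, interval_lines):
    """Return [(accession, peptide)] for each 'ID start end' line."""
    peptides = []
    for line in interval_lines:
        parts = line.split()
        # Skip the header row, blank lines, and malformed rows
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        acc, start, end = parts[0], int(parts[1]), int(parts[2])
        if acc in records:
            # Assumed 1-based inclusive coordinates -> 0-based Python slice
            peptides.append((acc, records[acc][start - 1:end]))
    return peptides


def write_peptides(fasta_path, intervals_path, out_path):
    """Read both input files and write tab-separated 'ID<TAB>peptide' lines."""
    with open(fasta_path) as fasta, open(intervals_path) as intervals:
        peptides = extract_intervals(parse_fasta(fasta), intervals)
    with open(out_path, "w") as out:
        for acc, pep in peptides:
            out.write(f"{acc}\t{pep}\n")
```

With the two files from the question in the working directory, `write_peptides("sequences.fasta", "intervals.txt", "peptides.txt")` would write one line per interval. IDs that are missing from the FASTA file are silently skipped, and an `end` beyond the sequence length simply truncates at the last residue; add explicit checks if either case should be reported instead.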