Hi all,
I have a protein FASTA file (protein.txt) like this:
>a
mnspq
>b
rstuvw
>c
mnqa
Note that proteins a, b and c have lengths 5, 6 and 4 respectively (total length = 15).
Now I have extracted some ranges (the coordinates are based on the total, concatenated length) and saved them in file1.txt as:
2-3
4-10
11-14
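To sanity-check these coordinates, one can concatenate the sequences and slice out the file1 ranges. This is a minimal Python sketch that assumes the ranges are 1-based and inclusive (which matches the example) and hard-codes the example sequences instead of reading protein.txt; the helper name extract is just for illustration.

```python
# Sequences from the example protein.txt, in file order.
proteins = [("a", "mnspq"), ("b", "rstuvw"), ("c", "mnqa")]

# Ranges from file1.txt, assumed to be 1-based and inclusive.
ranges = ["2-3", "4-10", "11-14"]

# Total length 15: "mnspq" + "rstuvw" + "mnqa"
concatenated = "".join(seq for _, seq in proteins)

def extract(seq, rng):
    """Return the substring for a 1-based, inclusive 'start-end' range."""
    start, end = map(int, rng.split("-"))
    return seq[start - 1:end]

pieces = [extract(concatenated, r) for r in ranges]
for r, piece in zip(ranges, pieces):
    print(r, piece)
```

For the example this prints ns, pqrstuv and wmnq, and the 4-10 piece visibly runs across the a/b boundary.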
The span of each protein within the total length, in the same order as in the protein file, is saved in another file (file2.txt) as:
a 1-5
b 6-11
c 12-15
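file2 is just the running (cumulative) sum of the sequence lengths, so it can be generated directly from the FASTA file. Here is a rough Python sketch, assuming the file starts with a header line and that the first word of each header is a unique name. The names read_fasta and spans are made up for this example. On the real file you would pass open("protein.txt") instead of the inline string.

```python
def read_fasta(lines):
    """Parse FASTA lines into an ordered list of (name, sequence) pairs."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the protein name.
            records.append([line[1:].split()[0], ""])
        else:
            # Sequence lines may be wrapped, so append to the current record.
            records[-1][1] += line
    return [(name, seq) for name, seq in records]

def spans(records):
    """Yield (name, start, end) of each sequence in 1-based concatenated coordinates."""
    offset = 0
    for name, seq in records:
        yield name, offset + 1, offset + len(seq)
        offset += len(seq)

fasta = """>a
mnspq
>b
rstuvw
>c
mnqa
""".splitlines()

table = list(spans(read_fasta(fasta)))
for name, start, end in table:
    print(f"{name} {start}-{end}")
```

This prints the same three lines as file2.txt (a 1-5, b 6-11, c 12-15).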
Now, using the file1 values, I want to adjust the file2 values and calculate the individual ranges for each protein sequence. For the above input, the output should be:
a 2-3,4-5
b 1-5,6
c 1-3
In other words, if I first concatenate all my sequences and determine some ranges on the concatenated sequence, how can I find the corresponding ranges within each individual protein sequence?
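One way to script this is as an interval intersection. For every protein span from file2 and every range from file1, take the overlap (the larger of the two starts and the smaller of the two ends). Any non-empty overlap is then shifted by the protein's start so that its first residue becomes position 1. Below is a minimal Python sketch under these assumptions: both files use 1-based inclusive coordinates, and the file2 spans are contiguous and in order. The names parse_range, to_local and fmt are hypothetical. By this arithmetic the range 11-14 becomes b 6 (position 11) plus c 1-3 (positions 12-14).

```python
def parse_range(text):
    """Parse a 1-based, inclusive 'start-end' string into a pair of ints."""
    start, end = text.split("-")
    return int(start), int(end)

def to_local(global_ranges, protein_spans):
    """Map ranges on the concatenated sequence to per-protein local ranges.

    global_ranges: list of (start, end) from file1, 1-based inclusive.
    protein_spans: list of (name, start, end) from file2, 1-based inclusive.
    Returns {name: [(local_start, local_end), ...]}, keeping input order.
    """
    result = {}
    for name, p_start, p_end in protein_spans:
        for g_start, g_end in global_ranges:
            # Overlap of the two intervals, still in global coordinates.
            lo = max(g_start, p_start)
            hi = min(g_end, p_end)
            if lo <= hi:
                # Shift so that the protein's first residue is position 1.
                offset = p_start - 1
                result.setdefault(name, []).append((lo - offset, hi - offset))
    return result

def fmt(r):
    """Format a range as 'start-end', or a single number if start == end."""
    start, end = r
    return str(start) if start == end else f"{start}-{end}"

file1 = ["2-3", "4-10", "11-14"]
file2 = ["a 1-5", "b 6-11", "c 12-15"]

global_ranges = [parse_range(line) for line in file1]
protein_spans = []
for line in file2:
    name, rng = line.split()
    protein_spans.append((name, *parse_range(rng)))

local = to_local(global_ranges, protein_spans)
for name, ranges in local.items():
    print(name, ",".join(fmt(r) for r in ranges))
```

This prints a 2-3,4-5, then b 1-5,6, then c 1-3. The nested loop is O(proteins × ranges), which is fine for small files. For very large inputs you could sort both lists and sweep through them once instead.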
Thanks for your consideration.
Well, just write a script.
As long as you have unique headers in your multi-FASTA file,
samtools faidx <file> <region>
should do the extraction part. See this: Extract User Defined Region From An Fasta File. @Matt Shirley also has a Python-based solution, pyfaidx.
I am not exactly certain what you are trying to do in the subsequent steps.
Edit: Re-reading your original post, I am not sure this is what you need, but I will leave it here for now in case it helps.
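Once the per-protein ranges are known, each one can be written as a region in the name:start-end form that samtools faidx accepts (1-based, inclusive), for example b:1-5. The sketch below only imitates that lookup in pure Python on an in-memory dict, so it runs without samtools or pyfaidx. It is not either tool's actual implementation, and the names parse_region and fetch are hypothetical.

```python
import re

def parse_region(region):
    """Split a 'name:start-end' region (1-based, inclusive) into its parts."""
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"bad region: {region!r}")
    name, start, end = match.groups()
    return name, int(start), int(end)

def fetch(records, region):
    """Return the subsequence for a region, similar in spirit to samtools faidx."""
    name, start, end = parse_region(region)
    return records[name][start - 1:end]

# Example sequences keyed by their (unique) header names.
records = {"a": "mnspq", "b": "rstuvw", "c": "mnqa"}
print(fetch(records, "b:1-5"))
```

Here b:1-5 gives rstuv, which is the same residues as positions 6-10 of the concatenated sequence.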