FASTA from PDB (downloading only PDB + FASTA files for fully resolved structures)
8.3 years ago
Behoston • 0

When I download a FASTA file from the PDB, does it contain the full sequence or only the sequence of the resolved regions? Would it be better to download the FASTA from UniProt instead (I want to check whether the structure covers the full protein)?

In general, I'm trying to download fully (100%) resolved structures from the PDB: protein-only entries with a resolution of x Å or better, chains at least y residues long, and a sequence identity cutoff of z. Right now I'm querying the PDB with this XML:

    <orgPdbCompositeQuery version="1.0">
        <queryRefinement>
            <queryRefinementLevel>0</queryRefinementLevel>
            <orgPdbQuery>
                <version>head</version>
                <queryType>org.pdb.query.simple.ResolutionQuery</queryType>
                <description>Resolution is x or less</description>
                <refine.ls_d_res_high.comparator>between</refine.ls_d_res_high.comparator>
                <!-- resolution is fractional (e.g. 2.5), so it needs a float placeholder -->
                <refine.ls_d_res_high.max>%.2f</refine.ls_d_res_high.max>
            </orgPdbQuery>
        </queryRefinement>
        <queryRefinement>
            <queryRefinementLevel>1</queryRefinementLevel>
            <conjunctionType>and</conjunctionType>
            <orgPdbQuery>
                <version>head</version>
                <queryType>org.pdb.query.simple.SequenceLengthQuery</queryType>
                <description>Sequence Length is y or more</description>
                <v_sequence.chainLength.min>%d</v_sequence.chainLength.min>
            </orgPdbQuery>
        </queryRefinement>
        <queryRefinement>
            <queryRefinementLevel>2</queryRefinementLevel>
            <conjunctionType>and</conjunctionType>
            <orgPdbQuery>
                <version>head</version>
                <queryType>org.pdb.query.simple.ChainTypeQuery</queryType>
                <description>Chain Type: there is a Protein chain but not any DNA or RNA or Hybrid</description>
                <containsProtein>Y</containsProtein>
                <containsDna>N</containsDna>
                <containsRna>N</containsRna>
                <containsHybrid>N</containsHybrid>
            </orgPdbQuery>
        </queryRefinement>
        <queryRefinement>
            <queryRefinementLevel>3</queryRefinementLevel>
            <conjunctionType>and</conjunctionType>
            <orgPdbQuery>
                <version>head</version>
                <queryType>org.pdb.query.simple.HomologueEntityReductionQuery</queryType>
                <description>Representative Structures at z Sequence Identity</description>
                <identityCutoff>%d</identityCutoff>
            </orgPdbQuery>
        </queryRefinement>
    </orgPdbCompositeQuery>
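The `%d` placeholders suggest the XML is filled in with Python's `%` operator. Be aware that `%d` truncates a fractional resolution such as 2.5 to 2. Below is a minimal sketch, assuming Python and the standard-library `xml.etree.ElementTree`, that builds the same composite query from parameters instead, so the result is always well-formed. `build_query` and `_refinement` are hypothetical helper names, and the tag names are copied from the query above. The sketch only builds the XML and does not submit it anywhere, because RCSB has since retired the legacy XML search service in favour of a JSON-based Search API.

```python
import xml.etree.ElementTree as ET


def _refinement(parent, level, query_type, description, fields):
    """Append one <queryRefinement> block containing an <orgPdbQuery>."""
    ref = ET.SubElement(parent, "queryRefinement")
    ET.SubElement(ref, "queryRefinementLevel").text = str(level)
    if level > 0:
        # Every refinement after the first is AND-ed with the previous ones.
        ET.SubElement(ref, "conjunctionType").text = "and"
    query = ET.SubElement(ref, "orgPdbQuery")
    ET.SubElement(query, "version").text = "head"
    ET.SubElement(query, "queryType").text = query_type
    ET.SubElement(query, "description").text = description
    for tag, value in fields:
        ET.SubElement(query, tag).text = value


def build_query(max_resolution, min_length, identity_cutoff):
    """Return the composite query from the post as an XML string."""
    root = ET.Element("orgPdbCompositeQuery", version="1.0")
    _refinement(root, 0, "org.pdb.query.simple.ResolutionQuery",
                f"Resolution is {max_resolution} or less",
                [("refine.ls_d_res_high.comparator", "between"),
                 # Keep the fractional part; "%d" would turn 2.5 into 2.
                 ("refine.ls_d_res_high.max", f"{max_resolution:.2f}")])
    _refinement(root, 1, "org.pdb.query.simple.SequenceLengthQuery",
                f"Sequence length is {min_length} or more",
                [("v_sequence.chainLength.min", str(min_length))])
    _refinement(root, 2, "org.pdb.query.simple.ChainTypeQuery",
                "Chain Type: there is a Protein chain but not any DNA or RNA or Hybrid",
                [("containsProtein", "Y"), ("containsDna", "N"),
                 ("containsRna", "N"), ("containsHybrid", "N")])
    _refinement(root, 3, "org.pdb.query.simple.HomologueEntityReductionQuery",
                f"Representative Structures at {identity_cutoff}% Sequence Identity",
                [("identityCutoff", str(identity_cutoff))])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_query(2.5, 100, 30))
```

Building the tree with ElementTree means escaping and tag nesting are handled for you. The remaining job is picking sensible parameter values.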

Then I download the PDB and FASTA files and compare the sequence lengths. This probably works fine (my Python script logs some proteins whose lengths differ between the FASTA and PDB files), but then I found chain A of 5BU8. The PDB lists two unique chains with the same UniProt ID but different lengths: the FASTA file from the PDB has 199 residues for chain A and 233 for chain B.
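As far as I know, the FASTA files served by the PDB are generated from the SEQRES records. These hold the full sequence of the construct used in the experiment, including residues that were never modelled. The ATOM records contain only the modelled residues. The comparison above therefore amounts to SEQRES length versus the number of residues that have coordinates. Below is a minimal sketch of that check in Python, with these assumptions:

- It reads legacy PDB-format files with single-character chain IDs.
- The function names are hypothetical.
- Only ATOM records of the first model are counted, so modified residues stored as HETATM (e.g. MSE) are not counted.

```python
import sys
from collections import defaultdict


def seqres_lengths(lines):
    """Chain ID -> residue count declared in SEQRES (what the PDB FASTA shows)."""
    lengths = {}
    for line in lines:
        if line.startswith("SEQRES"):
            # Column 12 is the chain ID; columns 14-17 repeat the chain's
            # total residue count on every SEQRES line.
            lengths[line[11]] = int(line[13:17])
    return lengths


def observed_residues(lines):
    """Chain ID -> number of distinct residues that have ATOM coordinates."""
    seen = defaultdict(set)
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only the first model of multi-model (e.g. NMR) files
        if line.startswith("ATOM  "):
            chain = line[21]
            # residue number (columns 23-26) plus insertion code (column 27)
            seen[chain].add((line[22:26].strip(), line[26:27].strip()))
    return {chain: len(ids) for chain, ids in seen.items()}


def compare(lines):
    """List (chain, SEQRES length, observed length) for chains that differ."""
    declared = seqres_lengths(lines)
    observed = observed_residues(lines)
    return [(chain, declared[chain], observed.get(chain, 0))
            for chain in sorted(declared)
            if declared[chain] != observed.get(chain, 0)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:  # path to a .pdb file
        for chain, n_seqres, n_atom in compare(handle.read().splitlines()):
            print(f"chain {chain}: SEQRES {n_seqres}, with coordinates {n_atom}")
```

A chain reported by `compare` has SEQRES residues without coordinates. Equal lengths only show that every SEQRES residue is modelled. The SEQRES construct itself can still be shorter than the UniProt entry.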

I really don't know what I should do now...

fasta pdb uniprot • 4.1k views

If you open your PDB file in any text editor, you will find a "REMARK 465" section listing the missing residues of each chain. That will help you understand why those regions are not visible in the 3D structure. For more information, read that article.
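The REMARK 465 check can also be automated. Below is a minimal Python sketch. It assumes the fixed-column layout of REMARK 465 lines in legacy PDB files: residue name in columns 16-18, chain ID in column 20 and residue number in columns 22-26. `missing_residues` is a hypothetical helper name, and insertion codes and model numbers are ignored.

```python
from collections import defaultdict


def missing_residues(lines):
    """Chain ID -> list of (residue name, residue number) from REMARK 465."""
    missing = defaultdict(list)
    for line in lines:
        if not line.startswith("REMARK 465"):
            continue
        # Fixed columns: residue name 16-18, chain 20, sequence number 22-26.
        resname, chain, seq = line[15:18], line[19:20], line[21:26].strip()
        try:
            number = int(seq)
        except ValueError:
            continue  # header and free-text lines have no integer in that field
        missing[chain].append((resname, number))
    return dict(missing)


if __name__ == "__main__":
    example = ["REMARK 465   M RES C SSSEQI",
               "REMARK 465     MET A     1"]
    print(missing_residues(example))
```

An empty result only means that every SEQRES residue has coordinates. Residues that were never part of the crystallized construct are not listed in REMARK 465 at all.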

OK, but for 5BU8 chain A there is no REMARK 465 section.

Take a look at the Protein Feature View to better understand how UniProt and PDB data are related.

http://www.rcsb.org/pdb/protein/Q9AYZ3?addPDB=5BU8

Both chains are missing the first 56 residues in the ATOM section. Besides this, the SEQRES records have different lengths for the two chains.

You could also take a look at the 'Wild Type Protein' search to identify PDB entries that cover a certain % of a UniProt sequence.
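A similar coverage check can be done locally. This is a rough Python sketch. It assumes you already have the UniProt sequence and a chain sequence as plain one-letter strings: either the SEQRES/FASTA sequence (construct coverage) or the sequence of residues with coordinates (resolved coverage). It uses the standard-library `difflib.SequenceMatcher` to find matching stretches. This is not a real sequence alignment, because substitution scores are ignored. Matching blocks shorter than a hypothetical `min_block` threshold are discarded, since single-letter coincidences otherwise distort the termini. For real work, residue-level PDB-to-UniProt mappings such as SIFTS are the proper source.

```python
from difflib import SequenceMatcher


def coverage_report(uniprot_seq, chain_seq, min_block=5):
    """Rough UniProt coverage of a chain sequence.

    Returns (fraction of UniProt residues matched,
             UniProt residues missing at the N-terminus,
             UniProt residues missing at the C-terminus).
    """
    matcher = SequenceMatcher(None, uniprot_seq, chain_seq, autojunk=False)
    # Drop the zero-length sentinel block and short, likely coincidental matches.
    blocks = [b for b in matcher.get_matching_blocks() if b.size >= min_block]
    if not blocks:
        return 0.0, len(uniprot_seq), len(uniprot_seq)
    covered = sum(b.size for b in blocks)
    n_term_missing = blocks[0].a
    c_term_missing = len(uniprot_seq) - (blocks[-1].a + blocks[-1].size)
    return covered / len(uniprot_seq), n_term_missing, c_term_missing


if __name__ == "__main__":
    uniprot = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up example sequence
    chain = "GSHM" + uniprot[5:20]      # tag + truncated construct
    print(coverage_report(uniprot, chain))
```

For the made-up example it reports about 68% coverage, with 5 residues missing at the N-terminus and 2 at the C-terminus. The tag residues are ignored because they do not match the UniProt sequence.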