Retrieving individual structures (MMDB)
One of the biggest questions new users face when they begin using Cn3D is where to get data in the format the program reads. Cn3D deliberately does not read PDB-format files directly; instead, it uses data from NCBI's MMDB database. Briefly, MMDB takes data from the Protein Data Bank, parses each PDB file to perform extensive validation and error correction, and stores the result in a more machine-readable format. See MMDB's homepage to learn more about the motivation for this project, and for documentation on the options and links in MMDB's structure summary pages.
There are a variety of ways to retrieve structures from MMDB, both directly and through links in NCBI's Entrez service. This page describes some of the typical methods a molecular biologist might use to find protein structures with different types of queries. As an example, we will use the human PTEN protein (Lee et al., 1999).
From an Entrez literature search
Structure data is integrated into Entrez, so literature searches can lead to MMDB structure summaries. The simplest case is to use Entrez to search structure annotations by keyword, author, and so on. For example, go to Entrez, select "Structure" in the "Search" pull-down menu, type "PTEN" in the input box, and click "Go" to run the search. In this case the result is trivial: at the time of writing, 1D5R is the only structure returned, with a link to its MMDB summary page.
Structure crosslinks also appear in literature searches. Select "PubMed" in the "Search" pull-down menu and use "PTEN structure" as the query. Further down the resulting list of articles is the crystal structure paper, "Crystal structure of the PTEN tumor suppressor" (Lee et al., 1999). Note the "Structure" crosslink on the right; following it leads again to the MMDB summary page for 1D5R.
From this summary page, select the "Launch Viewer" option, then click the "View/Save Structure" button to download the data and launch Cn3D with the structure (assuming Cn3D is installed and properly configured as a browser helper application). You can also use the "Save File" option to save the downloaded data to disk and then load it into Cn3D manually with the File:Open dialog. Two windows should appear: the main Cn3D structure window, which displays the protein, and a sequence window, which shows the amino acid sequence of the protein chain. They should look very much like this:
[Figure: Cn3D structure window and sequence window displaying 1D5R]
If this were a multi-chain protein, the sequence window would show several sequences at once.
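The same keyword search can also be issued programmatically through NCBI's E-utilities. The sketch below only builds the ESearch request URL for the Structure database; the base URL and the `db` and `term` parameters belong to the public E-utilities service, while the helper name `structure_search_url` is hypothetical. Fetching the URL returns XML listing the matching Structure (MMDB) UIDs rather than PDB codes.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint (not part of the original Cn3D instructions).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def structure_search_url(term: str) -> str:
    """Build an ESearch URL that queries Entrez Structure (MMDB) for `term`."""
    params = {"db": "structure", "term": term}
    # urlencode escapes spaces as '+', so multi-word queries are safe.
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

print(structure_search_url("PTEN"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, runs the same search as the "Structure" pull-down menu.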
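The "Structure" crosslink from a PubMed article corresponds to an Entrez link between the PubMed and Structure databases, which E-utilities exposes through ELink. This sketch assumes you already have the article's PubMed ID (PMID); it is not given on this page, so the value "12345" below is a placeholder, not the PMID of Lee et al. (1999). The helper name `pubmed_to_structure_url` is hypothetical.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint (not part of the original Cn3D instructions).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_to_structure_url(pmid: str) -> str:
    """Build an ELink URL asking for Structure records linked to one PubMed article."""
    params = {"dbfrom": "pubmed", "db": "structure", "id": pmid}
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"

# "12345" is a hypothetical placeholder; substitute the real PMID of the article.
print(pubmed_to_structure_url("12345"))
```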
From an Entrez sequence neighbor
Suppose we did not already know about the crystal structure and were studying diseases linked to PTEN mutations, for example Cowden disease (Liaw et al., 1997). From the PubMed abstract of that article, click the "Protein" crosslink above and to the right of the abstract, then follow the "O00633" link to the GenPept summary of PTEN. This page contains a great deal of information on PTEN literature and known mutations.
There is no direct link to a structure from this page, because the protein used for the crystal structure is not quite the same as the natural protein described in this GenPept report. However, you can look for a sequence of known structure among this protein's precomputed GenBank sequence neighbors. From the GenPept summary, click "BLink" in the upper right. Click the "3D Structures" button at the top of the resulting page; the only related structure listed is indeed 1D5R, chain A. To view the alignment in Cn3D, click the small blue dot in that result line.
From a BLAST search
We can look for a structure based directly on the PTEN sequence by running a BLAST search against the PDB. The advantage here is that one can use any sequence as the query, even a new or proprietary sequence that has not been deposited in GenBank. Importantly, one can also examine the BLAST alignment and scores to judge how similar the query and structure sequences are.
There are many ways to do this search, but for this example we will use NCBI's BLAST service. First note in the GenPept summary above that the accession code for PTEN's amino acid sequence is "O00633" (the first character is a capital letter O; the next two are zeros). Go to the BLAST query page and follow the link to "Standard protein-protein BLAST [blastp]". Then select "pdb" in the "Database" menu (to search against known structures), type "O00633" in the input box, and click "BLAST!" to start the search. The query is placed in the BLAST queue; after a waiting period that depends on server load, the results become available when you click the "Format!" button. Just below the graphical summary of hits is the list of sequences found. The one at the top, with the best score and E-value (at least at the time this document was written), is the now-familiar PDB entry 1D5R. Following the "pdb|1D5R|A" link leads to the GenPept summary for this structure, and from there the "Structure" link on the right leads to the MMDB summary for 1D5R.
This is a trivial example, since the structure found is of the same protein whose sequence was used as the BLAST query. In general, however, this is a powerful method for finding structures of proteins related closely enough to the query that structural properties can be inferred by homology. See the Alignment chapter of this document to learn how to display an alignment of the query sequence with the structure's protein sequence in Cn3D's sequence window.
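The same submission can be made through NCBI's BLAST URL API, and a simple format check catches the common mistake of typing zeros for the leading letter O. In this sketch the regular expression is the published UniProt accession pattern (it checks format only, not whether the accession is current), the parameters `CMD`, `PROGRAM`, `DATABASE` and `QUERY` are those of the BLAST URL API, and the helper name `blastp_vs_pdb_url` is hypothetical.

```python
import re
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

# UniProt accession format. It requires a letter in the first position,
# so a mistyped "000633" (zero instead of capital O) is rejected.
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def blastp_vs_pdb_url(accession: str) -> str:
    """Build a BLAST URL API request that submits `accession` to blastp against pdb."""
    if not UNIPROT_ACC.fullmatch(accession):
        raise ValueError(f"not a UniProt-style accession: {accession!r}")
    params = {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": "pdb", "QUERY": accession}
    return f"{BLAST_URL}?{urlencode(params)}"

print(blastp_vs_pdb_url("O00633"))
```

The response to a `CMD=Put` request contains a request ID (RID); the results are fetched later with a `CMD=Get` request carrying that RID, which plays the role of the wait-then-"Format!" step in the web interface.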
From a known PDB identifier
If the PDB identifier of a structure is already known, the most straightforward approach is to query MMDB directly: enter the four-character PDB code in the MMDB input box and click "Go".
The journal article describing the crystallization and structure determination of PTEN (Lee et al., 1999) states that the authors deposited the structure data in the PDB under the identifier "1D5R". Entering this code in the MMDB query box leads directly to the MMDB summary for 1D5R.
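Before pasting a code into the query box, it can help to check its shape. Classic PDB identifiers are four characters: a digit from 1 to 9 followed by three letters or digits. The sketch below validates and upper-cases such a code, then builds an E-utilities ESearch URL for the Structure database; the helper names `normalize_pdb_id` and `mmdb_search_url` are hypothetical, and the check covers only the classic four-character form.

```python
import re
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint (not part of the original Cn3D instructions).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Classic PDB identifier: a digit 1-9 followed by three letters or digits.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

def normalize_pdb_id(code: str) -> str:
    """Strip whitespace, check the four-character format, and return the code in upper case."""
    code = code.strip()
    if not PDB_ID.fullmatch(code):
        raise ValueError(f"not a four-character PDB identifier: {code!r}")
    return code.upper()

def mmdb_search_url(code: str) -> str:
    """Build an ESearch URL that looks up a PDB code in Entrez Structure (MMDB)."""
    params = {"db": "structure", "term": normalize_pdb_id(code)}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"

print(mmdb_search_url("1d5r"))
```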