MSA3D Help (Protein Explorer)

Guide to Using MSA3D
in Protein Explorer -- by Eric Martz, August 2000

Contents Snapshot

Tutorial for First-Time Users: Enolase.
Pasting in an alignment and correcting mismatches with "sliding".
Design characteristics of the MSA3D tool.
Preparing an alignment in Biology Workbench.

Tutorial for First-Time Users: Enolase
Thanks to Garry Duncan, Nebraska Wesleyan College, for providing the enolase alignment.

Please start by reading carefully the overview on the "MSA3D Procedure" page.
Now click on the link "MSA3D ALIGNMENT FORM" and carefully read the "MSA3D Form" page. You can skip the "Advanced Options" section at the bottom. If portions of this page are unclear, they will become clear as we proceed.
Now, in the "MSA3D Form" window, please find the Ready-Made Examples and click the link "Enolase". Accept the offer to fetch 4enl.pdb via the Internet, or else you will need to load a local copy. Notice that clicking "Enolase" caused the relevant alignments to be pasted into both boxes.
Press the button "Color Alignment & Molecule". A new window will appear containing the "MSA3D Alignment Listing". Read carefully the explanation at the top, and see also the summary counts and percentages at the bottom of this window.
After you have scrutinized the Listing, click on the molecule to bring it to the foreground. Notice that the alignment colors have been applied to the molecule.
Click on the links Identical, Similar, Different to spacefill the residues in these categories. The catalytic site is marked by a bound sulfate ion, and deeper, a brown zinc ion. Notice that the entire area around the active site is "Identical" in an evolutionary span from Archaebacteria through man!
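To make the three categories concrete, here is a minimal Python sketch of how alignment columns might be sorted into Identical, Similar, and Different, and how the summary counts could be tallied. It is not MSA3D's actual code. The similarity groups in `SIMILAR_GROUPS` and the rule that any gap makes a column "Different" are assumptions made only for illustration, since this guide does not say how MSA3D defines "Similar" or treats gaps.

```python
# Hypothetical similarity groups, chosen only for illustration; MSA3D's
# actual definition of "Similar" is not given in this guide.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def classify_column(residues):
    """Classify one alignment column as Identical, Similar, or Different."""
    distinct = set(residues)
    if "-" in distinct:
        return "Different"  # assumption: any gap makes the column Different
    if len(distinct) == 1:
        return "Identical"
    if any(distinct <= group for group in SIMILAR_GROUPS):
        return "Similar"
    return "Different"


def summarize(sequences):
    """Count column categories across equal-length aligned sequences."""
    counts = {"Identical": 0, "Similar": 0, "Different": 0}
    for column in zip(*sequences):
        counts[classify_column(column)] += 1
    return counts


if __name__ == "__main__":
    print(summarize(["MKDL", "MREV", "MK-I"]))
```

Dividing each count by the number of columns would give percentages of the kind shown at the bottom of the Alignment Listing.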
Pasting in an alignment and correcting mismatches with "sliding".
Thanks to Gabe McCool, University of Massachusetts Amherst, for acquainting me with these molecules.

Instructions for making an alignment in Biology Workbench are given in a later section. Before you do that, the prepared alignment in this section will give you some useful experience in using the MSA3D feature. This is an alignment between chain B of tubulin and the bacterial cell division protein FtsZ. These two proteins have less than 20% sequence similarity, but a high level of structural similarity. (For more information on these proteins, please see Exploring structure and function of FtsZ, a prokaryotic cell division protein and tubulin-homologue by Gabe J. McCool.) The alignment below was done by ClustalW in Biology Workbench, using default settings. Given the low level of sequence homology, the alignment may not be very meaningful, but it is useful to illustrate some features of MSA3D.

>1TUB_B; Tubulin from Sus scrofa, electron diffraction
MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEAAGNKYVPRAILVDLEP
GTMDSVRSGPFGQIFRPDNFVFGQSGAGNNWAKGHYTEGAELVDSVLDVVRKESESCDCLQGFQLTHSLG
GGTGSGMGTLLISKIREEYPDRIMNTFSVVPSPKVSDTVVEPYNATLSVHQLVENTDETYCIDNEALYDI
CFRTLKLTTPTYGDLNHLVSATMSGVTTCLRFPGQLNADLRKLAVNMVPFPRLHFFMPGFAPLTSRGSQQ
YRALTVPELTQQMFDAKNMMAACDPRHGRYLTVAAVFRGRMSMKEVDEQMLNVQNKNSSYFVEWIPNNVK
TAVCDIPPRGLKMSATFIGNSTAIQELFKRISEQFTAMFRRKAFLHWYTGEGMDEMEFTEAESNMNDLVS
EYQQYQD

>1FSZ; Methanococcus jannaschii
-------------SPEDKELLEYLQQTKAKITVVGCGGAGNNTI--TRLKMEG--------IEGAKTVAINT
DAQQLIRTKADKKILIGKKLTRG-LGAG-----GNPKIGEEAAKESAEEIKAAIQDSDMVF---ITCGLG
GGTGTGS-APVVAEISKKIG---ALTVAVVTLPFVMEGKVRMKNAMEGLERLKQHTDTLVVIPNEKLFEI
VPN--MPLKLAFKVADEVLINAVKGLVELITKDGLINVDFADVKAVMN---NGGLAMIGIG--ESDSEKR
AKEAVSMALNSPLLDVD-----IDGATGALIHVMGPED--LTLEEAREVVATVSSR--------------
----------LDPNATIIWG--------ATIDENLENTVRVLLVITGVQSR----IEFTDTGLKRKKL--
-------

Load 1fsz.pdb.
In the "MSA3D Form" window, press the "Clear Form" button and OK the confirmation. Block and copy the alignment above, and paste it into the top "Alignment Box" on the MSA3D Form window. (Don't worry about the spaces at the beginning of each line -- spaces will be ignored.)
Block the 1FSZ portion of the above alignment and paste it into the lower "3D Sequence" box.
Uncheck "Apply colors to molecule". This is optional but will save some time until we get the mismatches fixed.
Click on "Color Alignment & Molecule". (The molecule won't be colored, however, since we unchecked that option.) Notice that nearly all residues are red, signifying mismatches with the aligned 3D sequence.
In the "Alignment Listing" window, touch the N-terminal Ser with the mouse and notice (in the status bar) that it is residue 23 in the sequence of 1fsz.pdb. This causes 22 dots to be prefixed, representing the missing 22 residues (presumably disordered and unresolved in the crystal). These types of gaps are typically closed up in aligned sequences. Notice that the leading sequences labeled 1FSZ and 1fsz.pdb agree, but are offset by 22 residues. To make them match, we need to slide the PDB file sequence 22 residues to the left. To instruct MSA3D to do this, enter "-22" (a negative number slides to the left) in the slot labeled "slide the PDB file sequence to the right positions". Now press the "Color Alignment & Molecule" button again. There are now 0 mismatches (check the summary line at the bottom of the Listing window).
Here is a more complicated example. Bring the main Protein Explorer window to the foreground, click on the link "MSA3D Procedure". Enter 1tub (tubulin) into the slot near "Load" at item 4 on the "MSA3D Procedure" page.
Bring the "MSA3D Form" window to the foreground, and replace the contents of the bottom box "3D Sequence" with the aligned sequence 1TUB_B. (Leave the contents of the top box unchanged.)
Delete the "-22" in the slot.
Uncheck "Apply colors to molecule".
Press the "Color Alignment & Molecule" button. Examining the listing will reveal that about 90% of the 1tub.pdb sequence is mismatched, and there is no obvious offset that would correct this. The problem is that we did not specify which chain is in the alignment, so chain A was used by default. There is not much sequence similarity between the two chains in tubulin. Enter "b" in the "Apply colors to chain(s)" slot. Press the "Color Alignment & Molecule" button again.
The first 44 residues are matched, but a 2-residue gap causes a mismatch thereafter. Touching the first dot in the gap reports in the status line that it is position 45. Therefore we must slide the PDB file sequence 2 positions to the left starting at position 45. In the "slide the PDB file sequence to the right" slot, enter "-2@45". Press the "Color Alignment & Molecule" button again.
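Mismatches are now avoided up to an 8-residue gap beginning with a dot at position 361. This gap needs a second slide, 8 positions to the left starting at position 361, and multiple slides are separated by semicolons. In the "slide the PDB file sequence to the right" slot, enter "-2@45;-8@361". Press the "Color Alignment & Molecule" button again. Zero mismatches -- hooray!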
Now check "Apply colors to molecule", and press the "Color Alignment & Molecule" button again. Pull the main Protein Explorer window to the foreground so you can see the molecule. The Ready/Busy indicator below the molecule will be busy while the colors are applied, and again while the Identical/Similar/Different buttons are generated.
The main purpose of the above exercise was to make clear what mismatches mean, and how to correct them when sliding is needed.
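The slide entries used above ("-22", "-2@45", "-2@45;-8@361") can be pictured with a small Python sketch. This is not MSA3D's own code; it is one possible reading of the notation. The function names `parse_slides` and `apply_slides` are hypothetical, and the sketch assumes that positions are 1-based positions in the unslid PDB file sequence (dots included), that an entry without "@" starts at position 1, and that sliding left simply drops characters.

```python
def parse_slides(spec):
    """Parse a slide entry such as "-22" or "-2@45;-8@361".

    Returns a list of (amount, start) pairs. A negative amount means
    "slide left". Assumption: an entry without "@" starts at position 1.
    """
    slides = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        amount, _, start = part.partition("@")
        slides.append((int(amount), int(start) if start else 1))
    return slides


def apply_slides(pdb_seq, spec):
    """Apply slides to a PDB sequence string in which dots mark gaps.

    Assumption: positions are 1-based and refer to the unslid sequence,
    so slides are applied from right to left to keep them valid.
    """
    seq = pdb_seq
    for amount, start in sorted(parse_slides(spec), key=lambda s: s[1], reverse=True):
        i = start - 1
        if amount < 0:
            # Slide left: drop abs(amount) characters beginning at the start position.
            seq = seq[:i] + seq[i - amount:]
        else:
            # Slide right: open up a run of dots at the start position.
            seq = seq[:i] + "." * amount + seq[i:]
    return seq


if __name__ == "__main__":
    print(apply_slides("." * 22 + "SPEDKELLEY", "-22"))        # leading gap closed
    print(apply_slides("ABCD..EFGH........IJ", "-2@5;-8@11"))  # two internal gaps closed
```

In the toy second call, the two dot runs begin at positions 5 and 11, so the entry "-2@5;-8@11" closes both, in the same way that "-2@45;-8@361" closes the two gaps in chain B of 1tub.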
Design characteristics of the MSA3D tool.
MSA3D refuses to proceed unless all the aligned sequences have the same length. Were an unaligned sequence of the loaded PDB file to be pasted into the lower box, most likely the length would differ from the alignment, and hence this would be caught.
The sequence of the PDB file can be longer or shorter than the alignment, and vice versa.
The residue counts (and percentages, and total residues) in the summary at the bottom of the alignment listing include only the portion of PDB file residues that fit underneath the alignment. If sliding to the left causes residues in the PDB file to be skipped, they will neither be listed nor included in the summary counts.
If the PDB file sequence is longer than the alignment, residues beyond the end of the alignment will be listed in the "No Info" color, colored "No Info" in the 3D structure, and excluded from the summary counts at the bottom of the listing window.
In the alignment listing, residues in the PDB file will be numbered beginning with 1. Thus, although the first three residues in 1avq are numbered -1, 0, 1, they will be numbered 1, 2, 3 in the alignment listing.
Gaps, including a leading gap, in the PDB file sequence will be represented by dots (periods) and will require a sliding correction to avoid mismatches.
When a residue in the PDB file sequence is not identical to the residue in that position in the aligned "3D Sequence", both residues will be colored "mismatch" in the listing, and the mismatch color will also be applied to the 3D structure.
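The equal-length requirement described above is easy to check before pasting an alignment into the form. The following Python sketch reads FASTA-style text (a ">" header line followed by sequence lines), ignores spaces as the form does, and reports an error if the aligned sequences differ in length. The function names are hypothetical, and this is not MSA3D's own check.

```python
def parse_fasta_alignment(text):
    """Return {name: sequence} from FASTA-style text, ignoring whitespace."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Remove any spaces inside the sequence line before appending.
            records[name] += "".join(line.split())
    return records


def check_equal_lengths(records):
    """Raise ValueError unless every aligned sequence has the same length."""
    lengths = {name: len(seq) for name, seq in records.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"aligned sequences differ in length: {lengths}")
    return lengths


if __name__ == "__main__":
    example = """
    >seqA; toy example
      MKD-L
      VV
    >seqB; toy example
      MK-EL
      VV
    """
    print(check_equal_lengths(parse_fasta_alignment(example)))
```

An unaligned sequence pasted by mistake would usually lack the gap characters, come out shorter than the others, and trigger the error.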

Preparing an alignment in Biology Workbench (BW).

BW is a very flexible and powerful system. The method described below is only one of many ways it could be used to prepare a multiple protein sequence alignment. It is offered primarily to get you started, since BW is not very user-friendly. Despite its user-unfriendliness, BW's feature of saving your sessions makes it worth the trouble. Once you get the hang of it, you can try out other methods.

Be warned: I have almost no experience preparing alignments! I am likely omitting some information which is crucial to making good alignments. If you know of a tutorial with better or more complete advice, please tell me about it.

Go to the Biology Workbench (BW).
If you have not used BW before, click "Setup a free account". It takes only a few minutes. The advantage is that your sessions will be saved, so you can easily resume one.
After you enter BW, click the [Session Tools] button.
Select "Start New Session", and press the [Run] button.
Enter a session description, such as the name of the molecule of interest. Press the [Start New Session] button.
Press the [Protein Tools] button.
Select "Ndjinn - Multiple Database Search", and press the [Run] button.
Check "SWISSPROT". (If you can't find it in the long list of databases, use Netscape's Edit, Find in Page to look for "swiss".) Also check "PDBFINDER".
Enter the name of the molecule of interest in the slot at the top, and press the [Search] button.
A list of sequences is displayed. Each is prefixed with "PDBFINDER" (meaning it has a published 3D structure) or "SWISSPROT" (in which case often no 3D structure has been published). You need to select a subset of the hits. This is the most time-consuming part of this process. In order to get an overview of the process, for your first alignment, pick a few sequences without spending too much time and effort on the selection process. Use the [Show Records] button to get more information about the checked sequences.
After you have checked the desired sequences, and unchecked others, press the [Import Sequences] button.
Now you have a list of sequences with checkboxes. Select "Select All Sequences" and press [Run].
Uncheck any sequences you don't want to include in the alignment.
Make sure you check at least one sequence for which a 3D structure is available. Any PDBFINDER sequence has a 3D structure.
Scroll down in the list of operations at the top until you find CLUSTALW (near the middle of the list). Select it and [Run]. On the next screen titled CLUSTALW, press [Submit].
Examine the alignment carefully. An alignment that has very few identities, or very few differences, may not be informative. If you wish to exclude one or more sequences, press the [Return] button and rerun the alignment.
Once you are satisfied with the alignment, press [Import Alignment].
You should now see a list of all alignments you have made (initially just one), each with a checkbox. Notice that you are now in the Alignment Tools, no longer in Protein Tools.
Now we need to get the alignment in FASTA format. Check the checkbox for the desired alignment. Select "Edit Aligned Sequences", press [Run].
At the bottom of the page, change the format to "Fasta".
Block and copy the alignment. Paste it directly into Protein Explorer's MSA3D form. Optionally, also paste into a word processor and save it as a file for later use.
Select one sequence which matches a 3D structure PDB file. Copy that sequence into the "3D Sequence" box on the MSA3D Form. Load the corresponding PDB file. Assuming you have done the tutorial above, you will now know how to proceed.
Occasionally, CLUSTALW will fail to align a sequence correctly with the other sequences, so inspect your alignment carefully in the MSA3D Alignment Listing. (If this happens, I don't know how to fix it. Suggestions are welcome.)