Guide to Using MSA3D
in Protein Explorer -- by Eric Martz, August 2000
Tutorial for First-Time Users: Enolase
Thanks to Garry Duncan, Nebraska Wesleyan College, for providing
the enolase alignment.
- Please start by reading carefully the overview on the "MSA3D
Procedure" page.
- Now click on the link "MSA3D ALIGNMENT FORM"
and carefully read the "MSA3D Form" page.
You can skip the "Advanced Options" section at the bottom.
If portions of this page
are unclear, they will become clear as we proceed.
- Now, in the "MSA3D Form" window,
please find the Ready-Made Examples and click the link "Enolase".
Accept the offer to fetch 4enl.pdb via Internet, or else you
will need to load a local copy.
Notice that clicking "Enolase" caused the relevant alignments
to be pasted into both boxes.
- Press the button "Color Alignment & Molecule". A new window
will appear containing the "MSA3D Alignment Listing". Read carefully
the explanation at the top, and see also the summary counts and
percentages at the bottom of this window.
- After you have scrutinized the Listing, click on the molecule
to bring it to the foreground. Notice that the alignment colors have
been applied to the molecule.
- Click on the links Identical, Similar, Different to spacefill
the residues in these categories.
The catalytic site is marked by a bound sulfate ion, and deeper, a
brown zinc ion.
Notice that the entire area around
the active site is "Identical" in an evolutionary span from Archaebacteria
through man!
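The Identical / Similar / Different categories, and the summary counts
and percentages at the bottom of the Listing, can be pictured as a
column-by-column classification of the alignment. The Python sketch below
is only an illustration. The similarity groups, the treatment of gaps, and
the function names are assumptions made for this example; MSA3D's own rules
may differ.

```python
from collections import Counter

# Hypothetical similarity groups, chosen only for illustration.
# MSA3D's own definition of "Similar" may differ.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"),
                  set("DE"), set("ST"), set("NQ")]


def classify_column(residues):
    """Classify one alignment column (one residue per aligned sequence)."""
    distinct = set(residues)
    if len(distinct) == 1 and "-" not in distinct:
        return "Identical"
    if any(distinct <= group for group in SIMILAR_GROUPS):
        return "Similar"
    # Assumption: any column containing a gap counts as "Different".
    return "Different"


def summarize(sequences):
    """Return {category: (count, percent)} over all alignment columns.

    Assumes all sequences are aligned and therefore of equal length.
    """
    counts = Counter(classify_column(column) for column in zip(*sequences))
    total = sum(counts.values())
    return {category: (count, round(100.0 * count / total, 1))
            for category, count in counts.items()}


# Toy three-sequence alignment (made-up data, not the enolase alignment).
toy = ["MKDL", "MRDV", "MKEA"]
print(summarize(toy))
# {'Identical': (1, 25.0), 'Similar': (2, 50.0), 'Different': (1, 25.0)}
```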
Pasting in an alignment and correcting mismatches with "sliding".
Thanks to Gabe McCool, University of Massachusetts Amherst,
for acquainting me with these molecules.
- Instructions are given for making an alignment in Biology
Workbench in a later section below. Before you do that, the prepared alignment
in this section will give you some
useful experience in using the MSA3D feature. This is an alignment between
chain B of tubulin and the bacterial cell division protein FtsZ.
These two proteins have less than 20% sequence similarity, but a high
level of structural similarity.
(For more information on these proteins, please see
"Exploring structure and function of FtsZ, a prokaryotic cell
division protein and tubulin-homologue"
by Gabe J. McCool.) The alignment below was done by ClustalW in
Biology Workbench, using default settings. Given the low level of
sequence homology, the alignment may not be very meaningful, but it
is useful to illustrate some features of MSA3D.
>1TUB_B; Tubulin from Sus scrofa, electron diffraction
MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEAAGNKYVPRAILVDLEP
GTMDSVRSGPFGQIFRPDNFVFGQSGAGNNWAKGHYTEGAELVDSVLDVVRKESESCDCLQGFQLTHSLG
GGTGSGMGTLLISKIREEYPDRIMNTFSVVPSPKVSDTVVEPYNATLSVHQLVENTDETYCIDNEALYDI
CFRTLKLTTPTYGDLNHLVSATMSGVTTCLRFPGQLNADLRKLAVNMVPFPRLHFFMPGFAPLTSRGSQQ
YRALTVPELTQQMFDAKNMMAACDPRHGRYLTVAAVFRGRMSMKEVDEQMLNVQNKNSSYFVEWIPNNVK
TAVCDIPPRGLKMSATFIGNSTAIQELFKRISEQFTAMFRRKAFLHWYTGEGMDEMEFTEAESNMNDLVS
EYQQYQD
>1FSZ; Methanococcus jannaschii
-------------SPEDKELLEYLQQTKAKITVVGCGGAGNNTI--TRLKMEG--------IEGAKTVAINT
DAQQLIRTKADKKILIGKKLTRG-LGAG-----GNPKIGEEAAKESAEEIKAAIQDSDMVF---ITCGLG
GGTGTGS-APVVAEISKKIG---ALTVAVVTLPFVMEGKVRMKNAMEGLERLKQHTDTLVVIPNEKLFEI
VPN--MPLKLAFKVADEVLINAVKGLVELITKDGLINVDFADVKAVMN---NGGLAMIGIG--ESDSEKR
AKEAVSMALNSPLLDVD-----IDGATGALIHVMGPED--LTLEEAREVVATVSSR--------------
----------LDPNATIIWG--------ATIDENLENTVRVLLVITGVQSR----IEFTDTGLKRKKL--
-------
- Load 1fsz.pdb.
- In the "MSA3D Form" window, press the "Clear Form" button
and OK the confirmation. Block and paste the alignment above into
the top "Alignment Box" on the MSA3D Form window. (Don't worry
about the spaces at the beginning of each line -- spaces will be
ignored.)
- Block the 1FSZ portion of the above alignment and
paste it into the lower "3D Sequence" box.
- Uncheck "Apply colors to molecule". This is optional but will save
some time until we get the mismatches fixed.
- Click on "Color Alignment & Molecule". (The molecule won't be colored,
however, since we unchecked that option.) Notice that nearly all residues
are red, signifying mismatches with the aligned 3D sequence.
Had we loaded
the wrong PDB file altogether, this would be the result, and this coloring
would prevent you from inadvertently thinking the alignment colors
could be meaningfully applied to the 3D structure.
Notice that the number of residues not mismatched is 4 + 1 + 10 = 15.
Given that there are 20 amino acids, we can expect about 5% matches
at random, in the absence of any sequence similarity. Note that 15 residues is
close to 5% of the 312 residues shown.
- In the "Alignment Listing" window,
touch the N-terminal Ser
with the mouse and notice (in the status bar) that it is residue 23 in the
sequence of 1fsz.pdb. This causes 22 dots to be prefixed, representing
the missing 22 residues (presumably disordered and unresolved in the crystal).
These types of gaps are typically closed up in aligned sequences.
Notice that the leading sequences labeled 1FSZ and 1fsz.pdb agree, but
are offset by 22 residues. To make them match, we need to slide
the PDB file sequence 22 residues to the left. To instruct MSA3D to do this, enter
"-22" in the slot labeled "slide the PDB file sequence to the right"
(a negative number slides it to the left). Now press the "Color Alignment & Molecule" button again.
There are now 0 mismatches (check summary line at the bottom of the Listing
window).
- Here is a more complicated example.
Bring the main Protein Explorer window to the foreground,
and click on the link "MSA3D Procedure".
Enter 1tub (tubulin) into the slot near "Load"
at item 4 on the "MSA3D Procedure" page.
- Bring the "MSA3D Form" window to the foreground, and replace
the contents of the bottom box "3D Sequence" with the aligned
sequence 1TUB_B. (Leave the contents of the top box unchanged.)
- Delete the "-22" in the "slide the PDB file sequence to the right" slot.
- Uncheck "Apply colors to molecule".
- Press the "Color Alignment & Molecule" button. Examining the listing
will reveal that about 90% of the 1tub.pdb sequence is mismatched, and there
is no obvious offset that would correct this. The problem is that we did
not specify which chain is in the alignment, so chain A was used by default.
There is not much sequence similarity between the two chains in tubulin.
Enter "b" in the "Apply colors to chain(s)" slot.
Press the "Color Alignment & Molecule" button again.
- The first 44 residues are matched, but a 2-residue gap causes
a mismatch thereafter. Touching the first dot in the gap reports
in the status line that it is position 45. Therefore we must slide
the PDB file sequence 2 positions to the left starting at position 45.
In the "slide the PDB file sequence to the right" slot, enter
"-2@45". Press the "Color Alignment & Molecule" button again.
- Mismatches are now avoided up to an 8-residue gap beginning with a dot
at position 361.
In the "slide the PDB file sequence to the right" slot, enter
"-2@45;-8@361". Press the "Color Alignment & Molecule" button again.
Zero mismatches -- hooray!
- Now check "Apply colors to molecule", and
press the "Color Alignment & Molecule" button again.
Pull the main Protein Explorer window to the foreground so you can see
the molecule. The Ready/Busy indicator below the molecule will be busy
while the colors are applied, and again while the Identical/Similar/Different
buttons are generated.
- The main purpose of the above exercise was to make clear what
mismatches mean, and how to correct them when sliding is needed.
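The sliding corrections used above ("-22", then "-2@45;-8@361") can be
modeled in a few lines of Python. This is a simplified sketch of the idea,
not MSA3D's actual code. It assumes that a position after "@" is a 1-based
position in the PDB file sequence as listed (dots included), that a slide
without "@" applies from position 1, that slides accumulate, and that dots
are never compared. The function names are made up for this example.

```python
def parse_slides(spec):
    """Parse a slide entry such as "-22" or "-2@45;-8@361".

    Returns a list of (shift, start_position) pairs; a slide without
    "@" is taken to start at position 1 (assumption).
    """
    slides = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        shift, _, start = part.partition("@")
        slides.append((int(shift), int(start) if start else 1))
    return slides


def count_mismatches(aligned_3d, pdb_seq, spec=""):
    """Count PDB residues that disagree with the aligned "3D Sequence".

    aligned_3d: the 3D sequence as it appears in the alignment ("-" gaps).
    pdb_seq:    the PDB file sequence, with "." for unresolved residues.
    """
    target = aligned_3d.replace("-", "")  # alignment gaps are closed up
    slides = parse_slides(spec)
    mismatches = 0
    for pos, residue in enumerate(pdb_seq, start=1):
        if residue == ".":
            continue  # nothing to compare for a missing residue
        offset = sum(shift for shift, start in slides if pos >= start)
        index = pos + offset
        if index < 1 or index > len(target):
            continue  # slid off the start, or beyond the alignment's end
        if target[index - 1] != residue:
            mismatches += 1
    return mismatches


# Toy data (made up): three leading residues are missing from the PDB file.
print(count_mismatches("--SPED-KEL", "...SPEDKEL"))        # 3
print(count_mismatches("--SPED-KEL", "...SPEDKEL", "-3"))  # 0

# Toy data (made up): a two-residue gap starting at PDB position 4.
print(count_mismatches("MREIVHIQ", "MRE..IVHIQ"))          # 3
print(count_mismatches("MREIVHIQ", "MRE..IVHIQ", "-2@4"))  # 0
```

In this model, residues that land before position 1 after sliding are
simply skipped, which mirrors the behavior described in the design notes
below.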
Design characteristics of the MSA3D tool.
- MSA3D refuses to proceed unless all the aligned sequences have
the same length. Were an unaligned sequence of the loaded PDB file
to be pasted into
the lower box, most likely the length would differ from the alignment, and
hence this would be caught.
- The sequence of the PDB file can be longer or shorter than the
alignment.
- The residue counts (and percentages, and total residues) in the
summary at the bottom of the alignment listing include only the portion
of PDB file residues that fit underneath the alignment.
If sliding to the left causes residues in the PDB file to be skipped,
they will neither be listed nor included in the summary counts.
- If the PDB file sequence
is longer than the alignment, residues beyond the end of the alignment
will be listed in the "No Info" color, colored "No Info" in the 3D
structure, and excluded from
the summary counts at the bottom of the listing window.
- In the alignment listing, residues in the PDB file will be numbered
beginning with 1. Thus, although the first three residues in 1avq are
numbered -1, 0, 1, they will be numbered 1, 2, 3 in the alignment listing.
- Gaps, including a leading gap, in the PDB file sequence
will be represented by dots (periods) and will
require a sliding correction
to avoid mismatches.
- When a residue in the PDB file sequence is not identical to the
residue in that position in the aligned "3D Sequence", both residues
will be colored "mismatch" in the listing, and the mismatch color will
also be applied to the 3D structure.
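Two of the points above, the equal-length check and the way the PDB file
sequence is numbered and padded with dots, can be sketched in Python as
follows. The function names are hypothetical. The numbering rule (start
at residue 1, or at the first residue if its number is lower than 1) is an
inference that fits both the 1fsz and 1avq examples in this guide; it is
not taken from MSA3D's code. Insertion codes are ignored.

```python
def check_equal_lengths(sequences):
    """Raise ValueError unless all aligned sequences have the same length.

    sequences: dict mapping a sequence name to its aligned sequence.
    """
    lengths = {name: len(seq) for name, seq in sequences.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"aligned sequences differ in length: {lengths}")


def listing_sequence(residues):
    """Build the PDB file sequence as listed, with "." for missing residues.

    residues: list of (PDB residue number, one-letter code) pairs in
    ascending order. Listing position 1 corresponds to residue number 1,
    or to the first residue if it is numbered below 1 (assumption).
    """
    first = min(1, residues[0][0])
    last = residues[-1][0]
    letters = ["."] * (last - first + 1)
    for number, letter in residues:
        letters[number - first] = letter
    return "".join(letters)


# First resolved residue is number 23, as in 1fsz: 22 leading dots.
print(listing_sequence([(23, "S"), (24, "P"), (25, "E")]))

# Residues numbered -1, 0, 1 (as in 1avq) are listed at positions 1, 2, 3.
# The letters here are placeholders, not the real 1avq residues.
print(listing_sequence([(-1, "G"), (0, "S"), (1, "M")]))  # GSM

# An internal gap (residues 3 and 4 missing) becomes two dots.
print(listing_sequence([(1, "M"), (2, "R"), (5, "E")]))   # MR..E
```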
Preparing an alignment in Biology Workbench (BW).
BW is a very flexible and powerful system. The method described
below is only one of many ways it could be used to prepare a multiple
protein sequence alignment. It is offered primarily to get you started
since BW is not very user-friendly. Despite its user-unfriendliness,
BW's feature of saving your sessions makes it worth the trouble.
Once you get the hang of it, you can try out other methods.
Be warned: I have almost no experience preparing alignments!
I am likely omitting some information which is crucial to making
good alignments.
If you know of a tutorial with better or more complete advice, please
tell me about it.
- Go to the Biology Workbench (BW).
- If you have not used BW before, click "Setup a free account".
It takes only a few
minutes. The advantage is that your sessions will be saved, so you can easily
resume one.
- After you enter BW, click the [Session Tools] button.
- Select "Start New Session", and press the [Run] button.
- Enter a session description, such as the name of the molecule
of interest. Press the [Start New Session] button.
- Press the [Protein Tools] button.
- Select "Ndjinn - Multiple Database Search", and press the [Run]
button.
- Check "SWISSPROT". (If you can't find it in the long list
of databases, use Netscape's Edit, Find in Page to look for "swiss".)
Also check "PDBFINDER".
- Enter the name of the molecule of interest in the slot at the top,
and press the [Search] button.
- A list of sequences is displayed. Each is prefixed with "PDBFINDER"
(meaning it has a published 3D structure) or "SWISSPROT" (in which case
often no 3D structure has been published). You need to select a subset
of the hits. This is the most time-consuming part of this process.
In order to get an overview of the process, for your first alignment,
pick a few sequences without spending too much time and effort on the
selection process. Use the [Show Records] button to get more information
about the checked sequences.
- After you have checked the desired sequences, and unchecked others,
press the [Import Sequences] button.
- Now you have a list of sequences with checkboxes. Select
"Select All Sequences" and press [Run].
- Uncheck any sequences you don't want to include in the alignment.
- Make sure you check at least one sequence for which a
3D structure is available. Any PDBFINDER sequence has a 3D structure.
- Scroll down in the list of operations at the top until you find
CLUSTALW (near the middle of the list). Select it and [Run]. On the
next screen titled CLUSTALW, press [Submit].
- Examine the alignment carefully. An alignment that has very few
identities, or very few differences, may not be informative. If you wish
to exclude one or more sequences, press the [Return] button and rerun
the alignment.
- Once you are satisfied with the alignment, press [Import Alignment].
- You should now see a list of all alignments you have made (initially
just one), each with a checkbox. Notice that you are now in the Alignment
Tools, no longer in Protein Tools.
- Now we need to get the alignment in FASTA format. Check the
checkbox for the desired alignment. Select "Edit
Aligned Sequences", press [Run].
- At the bottom of the page, change the format to "Fasta".
- Block and copy the alignment. Paste it directly into
Protein Explorer's MSA3D form. Optionally, also paste into a word processor
and save it as a file for later use.
- Select one sequence which matches a 3D structure PDB file. Copy
that sequence into the "3D Sequence" box on the MSA3D Form. Load the
corresponding PDB file. Assuming you have done the tutorial above,
you will now know how to proceed.
- Occasionally, CLUSTALW will fail to align a sequence correctly
with other sequences. Inspect your alignment carefully in the
MSA3D Alignment Listing. (If this happens, I don't know how to fix it.
Suggestions are welcome.)
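If you saved the FASTA alignment to a file, the sequence for the
"3D Sequence" box can also be pulled out with a short script rather than by
blocking and copying. The Python sketch below is one way to do it, assuming
the usual FASTA layout (a ">" header line followed by one or more sequence
lines). The function names and the shortened example records are made up
for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped lines.

    Blank lines and spaces inside sequence lines are ignored.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.replace(" ", ""))
    return {key: "".join(parts) for key, parts in records.items()}


def pick_record(records, label):
    """Return (header, sequence) for the single header starting with label."""
    hits = [name for name in records
            if name.upper().startswith(label.upper())]
    if len(hits) != 1:
        raise ValueError(
            f"expected exactly one record starting with {label!r}, "
            f"found {len(hits)}")
    return hits[0], records[hits[0]]


# Shortened, made-up records in the same layout as the alignment above.
example = """>1TUB_B; shortened placeholder
MREIV--HIQ
AGQ
>1FSZ; shortened placeholder
--SPEDKELL
EYL
"""

header, sequence = pick_record(parse_fasta(example), "1fsz")
print(">" + header)
print(sequence)  # --SPEDKELLEYL
```

The printed header and sequence can then be pasted into the "3D Sequence"
box on the MSA3D Form.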