Structure determination of proteins in solution by nuclear magnetic resonance spectroscopy
KV Chary, HS Atreya
Department of Chemical Sciences, Tata Institute of Fundamental Research, Colaba, Mumbai - 400005, India
Source of Support: None, Conflict of Interest: None. PMID: 12082342
Keywords: Human; Imaging, Three-Dimensional; Magnetic Resonance Spectroscopy; Protein Conformation; Proteins, chemistry; Support, Non-U.S. Gov't
Each cell in the human body contains a myriad of biological macromolecules and organelles, which are required for various cellular functions and metabolism. Proteins are a key component of such a system and play a crucial role in the proper functioning of the cell. There are an estimated 100,000 different proteins in the human body. The activity of a protein molecule inside the cell is indirectly governed by the overall fold of its polypeptide chains or, in other words, by its three-dimensional (3D) structure in space. Thus, knowledge of the 3D structure of a given protein is essential for a complete understanding of its function inside the cell. Many human diseases, such as Alzheimer's disease, Parkinson's disease, prion diseases, cystic fibrosis and cancers, are attributed to the malfunctioning of proteins. Further, knowledge of the 3D structure of a protein involved in a disease can eventually be used to design drugs that target it. This sphere of activity is popularly called quantitative structure-activity relationship (QSAR) analysis.
As of today, only two experimental techniques are available to determine the 3D structure of proteins at atomic resolution: X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. In this article, we review the various steps involved in unraveling the 3D structures of proteins by NMR. As an example, we consider the 3D structure of a calcium binding protein from Entamoeba histolytica (also called EhCaBP), which we recently determined using NMR.
The phenomenon of NMR arises in certain elements or their isotopes whose nuclei possess what is called a non-zero nuclear spin. Such nuclei are termed NMR-active nuclei. The nucleus of such an atom can be imagined as a ball spinning about an axis (see Scheme I). The nuclear spin is measured in units of the reduced Planck's constant (h/2π). Nuclei such as 1H (the hydrogen nucleus, more commonly known as the proton), 13C (an isotope of carbon) and 15N (an isotope of nitrogen) naturally possess a nuclear spin of 1/2. Such spin-1/2 nuclei, when placed in a magnetic field, get distributed (or quantized) into two distinct energy levels, with some nuclei of the sample in the lower energy level (called the ground state or α-state) and the remainder in the higher energy level (called the excited state or β-state) (Scheme I). This distribution of nuclei is governed by the Boltzmann distribution, which results in slightly more nuclei in the ground state than in the excited state.
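The size of this population difference can be estimated directly from the Boltzmann distribution. The following is a minimal sketch, assuming that the energy gap between the two levels equals Planck's constant times the resonance frequency of the nucleus, and that the sample is at thermal equilibrium. The function names are illustrative and do not come from any NMR software package.

```python
import math

PLANCK_H = 6.62607015e-34    # Planck constant in J*s
BOLTZMANN_K = 1.380649e-23   # Boltzmann constant in J/K


def population_ratio(frequency_hz, temperature_k):
    """Return N(excited) / N(ground) for a spin-1/2 nucleus.

    Assumes the energy gap is h * frequency and thermal equilibrium.
    """
    delta_e = PLANCK_H * frequency_hz
    return math.exp(-delta_e / (BOLTZMANN_K * temperature_k))


def fractional_excess(frequency_hz, temperature_k):
    """Return (N_ground - N_excited) / (N_ground + N_excited)."""
    ratio = population_ratio(frequency_hz, temperature_k)
    return (1.0 - ratio) / (1.0 + ratio)


if __name__ == "__main__":
    # Protons at a 600 MHz resonance frequency and 298 K (illustrative values).
    print(population_ratio(600e6, 298.0))
    print(fractional_excess(600e6, 298.0))
```

For protons at 600 MHz and room temperature the excess in the ground state comes out at only a few tens of nuclei per million, which is one way to see why NMR is an inherently insensitive technique and why sample concentration and field strength matter.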
Nuclei in the lower energy level can be taken to the higher energy level (a process known as excitation) by applying an energy (ΔE) that corresponds to the difference in energy between the ground state and the excited state (see Scheme I). However, nuclei of the same kind in a given molecule cannot all be excited with the same energy, ΔE. This is primarily because the splitting of the energy levels (ΔE) of nuclei of a given type (1H, 13C or 15N) depends on their surrounding chemical environment and is mostly different in different parts of the molecule. Thus, in a molecule such as ethanol (CH3CH2OH), protons (1H nuclei) belonging to the CH3 group have a different energy splitting from those of the CH2 group, which in turn differ from the proton of the OH group, owing to their different chemical environments. This difference in splitting among different nuclei, and hence the difference in the energy required to excite them, forms the crux of high resolution NMR spectroscopy. A plot of the intensity of energy absorption versus frequency forms the NMR spectrum. The frequency of absorption of an individual group of nuclei is expressed as its chemical shift. An NMR spectrum of ethanol is shown in [Figure:1A], where the three sets of lines arising from the three different types of protons are indicated. Chemical shifts (or frequencies) of resonance lines in an NMR spectrum are always reported with respect to the signal of a reference compound, which is taken as zero. The most common reference is tetramethylsilane (TMS). This is shown in [Figure:1A], where TMS has been assigned 0 ppm (parts per million, the unit of chemical shift). The extent of splitting of the energy levels (ΔE in Scheme I) determines the sensitivity of an NMR spectrum, and this splitting, in turn, depends on the strength of the external magnetic field (denoted B0 in Scheme I). B0 is commonly given in terms of the frequency required to excite a proton at that magnetic field strength. Modern day spectrometers have field strengths ranging from 90 MHz to 1000 MHz.
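The relations between field strength, proton frequency and chemical shift are simple enough to write down explicitly. The sketch below assumes a proton gyromagnetic ratio (divided by 2π) of about 42.577 MHz per tesla, a standard approximate value that is not quoted in this article, and uses the usual definition of the chemical shift as the frequency offset from the reference divided by the spectrometer frequency. All function names are illustrative.

```python
# Approximate 1H gyromagnetic ratio divided by 2*pi (assumed value, MHz per tesla).
GAMMA_1H_MHZ_PER_TESLA = 42.577


def proton_frequency_mhz(b0_tesla):
    """Proton resonance frequency (MHz) at a magnetic field B0 (tesla)."""
    return GAMMA_1H_MHZ_PER_TESLA * b0_tesla


def field_tesla(proton_mhz):
    """Magnetic field (tesla) of a spectrometer quoted by its proton frequency."""
    return proton_mhz / GAMMA_1H_MHZ_PER_TESLA


def chemical_shift_ppm(offset_from_reference_hz, spectrometer_mhz):
    """Chemical shift in ppm: offset from the reference (Hz) / spectrometer frequency (MHz)."""
    return offset_from_reference_hz / spectrometer_mhz


def offset_hz(shift_ppm, spectrometer_mhz):
    """Frequency offset (Hz) from the reference for a given chemical shift."""
    return shift_ppm * spectrometer_mhz


if __name__ == "__main__":
    # A peak at an illustrative 1.2 ppm, viewed on two different spectrometers.
    for mhz in (90.0, 600.0):
        print(mhz, field_tesla(mhz), offset_hz(1.2, mhz))
```

Because the shift in ppm is independent of the field, the same peak sits at the same ppm value on every spectrometer, while its separation in Hz from neighbouring peaks grows with the field. This is why higher fields give better resolved spectra.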
Although the NMR spectrum of ethanol [Figure:1A] appears very simple, with three distinct sets of peaks, the NMR spectrum of a protein molecule is extremely complex, with many hundreds of peaks. Since each type of proton in a molecule gives rise to a peak (or resonance line) in the NMR spectrum, a protein consisting of about 100 amino acids will display, on average, 600-700 resonance lines. As an illustrative example of such complexity, the one-dimensional (1D) NMR spectrum of a calcium binding protein from Entamoeba histolytica is shown in [Figure:1B] (this protein is made up of 134 amino acid residues and has ~900 observable proton resonances). The fact that analysis of such a complex spectrum is nearly impossible led to the concept of multidimensional NMR experiments. Consider a one-dimensional (1D) spectrum ([Figure:1A] and [Figure:1B]) that consists of overlapping peaks, as depicted in Scheme II. If we spread the peaks over a two-dimensional (2D) plane, some peaks will get resolved, depending on what property we choose to separate them in the plane. However, some overlaps can still remain in a 2D spectrum, and these can be further resolved by resorting to a third dimension (3D).
Thus, multidimensional NMR spectra of a protein are easier to analyze owing to their better resolution. Normally, the dimensions in a multidimensional NMR spectrum (2D or 3D) consist of frequencies of either protons alone (homonuclear) or protons together with other nuclei such as 13C/15N (heteronuclear). For a given protein, various multidimensional spectra are recorded, which are then used in a concerted manner to obtain its 3D structure, as discussed below.
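The gain in resolution from an extra dimension can be illustrated with a toy calculation. The sketch below uses a small, made-up list of amide peaks, each described by a 1H and a 15N chemical shift, and counts how many pairs of peaks are indistinguishable when only the 1H shift is considered and when both shifts are considered. The peak values and the tolerances are hypothetical and serve only to show the principle.

```python
from itertools import combinations

# Hypothetical amide peaks as (1H shift in ppm, 15N shift in ppm). Made-up values.
PEAKS = [
    (8.21, 118.4),
    (8.22, 124.9),
    (8.22, 118.5),
    (7.95, 110.2),
    (7.96, 121.7),
    (8.60, 115.0),
]


def overlapping_pairs_1d(peaks, tol_h=0.02):
    """Pairs of peaks whose 1H shifts differ by no more than tol_h (ppm)."""
    return [
        (a, b)
        for a, b in combinations(peaks, 2)
        if abs(a[0] - b[0]) <= tol_h
    ]


def overlapping_pairs_2d(peaks, tol_h=0.02, tol_n=0.2):
    """Pairs that overlap in both the 1H and the 15N dimension."""
    return [
        (a, b)
        for a, b in combinations(peaks, 2)
        if abs(a[0] - b[0]) <= tol_h and abs(a[1] - b[1]) <= tol_n
    ]


if __name__ == "__main__":
    print(len(overlapping_pairs_1d(PEAKS)), "overlapping pairs in 1D")
    print(len(overlapping_pairs_2d(PEAKS)), "overlapping pairs in 2D")
```

With these made-up peaks, most pairs that coincide along the 1H axis are separated along the 15N axis. The pair that remains unresolved in 2D is the kind of residual overlap that a third dimension is meant to remove.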
The 3D structure determination of proteins by NMR proceeds in the following steps (see Scheme III), each of which is described in detail below.
The protein sample preparation step consists of dissolving the required amount of the protein under investigation in a small quantity (~600 microlitres) of water (H2O) or deuterium oxide (D2O) to obtain the desired concentration. The concentration required to obtain a good spectrum depends on factors such as the stability of the protein against aggregation, the sensitivity of the NMR spectrometer and the type of NMR experiments to be recorded on the sample. While a protein concentration of 1 to 2 mM suffices in most cases, the lower limit on concentration is largely determined by the spectrometer sensitivity. With highly sensitive modern day spectrometers equipped with what are known as cryo-probes, one can obtain a good NMR spectrum with a protein concentration as low as 50 µM. For a protein with a molecular weight (Mr) of 10 kDa (roughly 90 amino acids; 1 kDa ≈ the mass of 1000 protons), 10 mg dissolved in 500 µl of solvent corresponds to 2 mM, and 250 µg of protein dissolved in the same amount of solvent corresponds to 50 µM.
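The arithmetic behind these sample quantities is a plain unit conversion, sketched below. The function names are illustrative, and the molecular weight in kDa is treated as numerically equal to kg/mol.

```python
def concentration_mm(mass_mg, mol_weight_kda, volume_ul):
    """Molar concentration in mM for a given mass, molecular weight and volume."""
    moles = (mass_mg * 1e-3) / (mol_weight_kda * 1e3)   # grams / (grams per mole)
    litres = volume_ul * 1e-6
    return moles / litres * 1e3                          # mol/L converted to mM


def mass_needed_mg(target_mm, mol_weight_kda, volume_ul):
    """Mass of protein in mg needed to reach target_mm in the given volume."""
    moles = (target_mm * 1e-3) * (volume_ul * 1e-6)
    return moles * (mol_weight_kda * 1e3) * 1e3          # grams converted to mg


if __name__ == "__main__":
    # The two cases quoted in the text for a 10 kDa protein in 500 microlitres.
    print(concentration_mm(10.0, 10.0, 500.0))    # 10 mg
    print(concentration_mm(0.25, 10.0, 500.0))    # 250 micrograms
```

The same functions can be used in reverse when planning a sample, for example to check how much of a larger protein is needed to reach 1 mM in the available volume.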
In the case of large molecular weight proteins (Mr > 20 kDa), there is severe overlap of peaks in the NMR spectrum (as seen in [Figure:1B]). In such cases, one can utilize the good resolution and sensitivity offered by 13C and 15N nuclei in combination with multidimensional NMR experiments.
However, as proteins are naturally deficient in 13C (only 1 13C atom in every 100 C atoms) and 15N nuclei (only 1 15N atom in every 300 N atoms), it is necessary to enrich them with 13C and 15N isotopes. This is achieved by over-expressing the protein in a suitable host, such as E. coli. The host is grown in a medium containing 15NH4Cl and 13C-glucose (both available commercially) as the sole sources of nitrogen and carbon, respectively. This methodology is referred to as isotope labeling. A detailed description of this subject is beyond the scope of this review. However, it suffices to mention that, as of today, one can isotopically label any given protein with 13C and/or 15N. At times, it may also be necessary to partially or uniformly deuterate (2H) the protein, particularly for large molecular weight proteins (Mr > 20 kDa); the protocols for this are well established.
The next step consists of recording a series of different NMR experiments on the protein sample. These experiments, which range from a simple 1D spectrum to complex 2D, 3D and 4D experiments, fall into two categories (see [Figure - 2] for an illustrative example of a 2D and a 3D NMR spectrum). The first category of experiments is aimed at assigning all the NMR-active nuclei in the protein, which constitutes Step 3, discussed below. Once such sequence specific resonance assignments (hereafter referred to as ssr_assignments) are done, the second category of experiments (e.g. 2D NOESY, 3D NOESY-HSQC etc.) is used to obtain different types of structural constraints. Such constraints are used as inputs in molecular modelling to compute the final 3D structure of the protein. Depending on the protein concentration, the type of experiment used and the spectrometer sensitivity, a 2D experiment can generally be recorded in a few hours, while a 3D experiment may take 12–48 hours to complete.
Since different types of nuclei in a given molecule give rise to different NMR signals [Figure:1A], it is necessary to identify and assign all the signals in the NMR spectrum to their respective nuclei. This implies that in a protein, each NMR signal has to be assigned to its respective nucleus in each of the amino acid residues. This process is called ssr_assignment. Ssr_assignments, if carried out manually, constitute a tedious and time-consuming task. However, in recent years, many methodologies have been proposed to carry out ssr_assignments in an automated fashion. Such methodologies, in general, use NMR data from different experiments as input and directly output the assignments.
A stretch of amino acid residues in the protein primary sequence can adopt a specific local geometry, such as an α-helix, β-sheet, β-turn or Ω loop; these are popularly called secondary structural elements. The overall disposition of the secondary structural elements in 3D space constitutes a complete picture of the tertiary structure of the protein. This implies that residues located far apart in the primary sequence can come close together in space, within a short distance (2-5 Å; 1 Å = 10^-10 meters). This is depicted in Scheme IV(A), where two protons are shown to come close in space owing to the tertiary structural fold of the protein. On the other hand, the presence of secondary structural elements in a protein provides constraints on the local conformation of amino acid residues. Such a conformation is identified using torsion angles, each of which is defined about a covalent bond as shown in Scheme IV(B). Thus, determination of torsion angle values in amino acid residues can indicate whether they are part of an α-helix or a β-sheet.
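A torsion angle about a bond is fully determined by the positions of four consecutive atoms, with the bond of interest joining the second and third. The following is a minimal sketch of the standard vector formula, written in plain Python with coordinates given as (x, y, z) tuples. The function name and the test coordinates are illustrative, and the sign convention is the one produced by this particular formula.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle_deg(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane through p1, p2, p3
    b2_length = math.sqrt(_dot(b2, b2))
    y = b2_length * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four points with the outer atoms on the same side of the central bond (cis).
    print(torsion_angle_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

For a protein backbone, passing the coordinates of C(i-1), N(i), Cα(i) and C(i) would give the φ angle of residue i, and N(i), Cα(i), C(i) and N(i+1) would give ψ.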
The identification of such short range (2-2.5 Å), medium range (2.5-4.0 Å) and long range (4.0-6.0 Å) distance contacts between nuclei belonging to different amino acid residues, together with torsion angle values for the local conformation, helps in identifying a unique overall geometry of the protein. Experimentally, protons that are close in space in the protein transfer part of their energy to each other, and the magnitude of this transfer can be used as a measure of their closeness. Torsion angle values, on the other hand, can be estimated by measuring the coupling between two nuclei that are separated by three covalent bonds (e.g. HN-Hα as in Scheme IV(B)).
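Both kinds of constraint can be turned into numbers with simple relations. The sketch below makes two assumptions that are not stated in this article. First, the energy transfer between two protons (the nuclear Overhauser effect measured in NOESY experiments) is taken to fall off as the inverse sixth power of their distance, so an unknown distance can be calibrated against a pair of protons at a known distance. Second, the three-bond HN-Hα coupling is related to the backbone torsion angle φ by a Karplus-type curve; the coefficients used here (6.4, -1.4 and 1.9 Hz) are one commonly quoted parametrization and other sets exist. The function names and the example intensities are hypothetical.

```python
import math


def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a proton-proton distance from a NOE peak intensity.

    Assumes intensity is proportional to distance**-6 (isolated spin pair),
    calibrated against a reference pair at a known distance.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def karplus_hn_ha(phi_deg, a=6.4, b=-1.4, c=1.9):
    """Three-bond HN-Halpha coupling (Hz) for a backbone torsion angle phi.

    Uses J = a*cos^2(theta) + b*cos(theta) + c with theta = phi - 60 degrees.
    The default coefficients are an assumed, commonly quoted parameter set.
    """
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


if __name__ == "__main__":
    # A peak 64 times weaker than a reference pair at 2.0 A (made-up numbers).
    print(noe_distance(1.0, 64.0, 2.0))
    # Typical helical (about -57 degrees) and extended (about -120 degrees) phi values.
    print(karplus_hn_ha(-57.0), karplus_hn_ha(-120.0))
```

In practice, NOE intensities are usually converted into upper distance limits rather than exact distances, because spin diffusion and internal motion make the sixth-power relation only approximate. With the coefficients assumed here, helical φ values give small couplings and extended φ values give large ones, which is the basis for using the coupling to distinguish the two.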
The different distance and torsion angle constraints generated in the previous step can now be used to define a unique geometry of the protein, starting from a random configuration of amino acid residues in space. This is done using molecular modeling programs, which use these constraints as input to obtain an energy minimized 3D structure. Energy minimized structures are preferred, as molecules tend to be in their minimum energy state in their native form.
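The idea of scoring a trial geometry against experimental constraints can be illustrated with a very small example. The sketch below computes a penalty that is zero when every distance constraint is satisfied and grows with the square of each violation of an upper distance limit. It is a simplified illustration of the general principle only and is not the target function of any particular modeling program; the atom labels, coordinates and limits are hypothetical.

```python
import math


def restraint_penalty(coords, restraints):
    """Sum of squared violations of upper-limit distance restraints.

    coords:     dict mapping an atom label to an (x, y, z) tuple in angstroms
    restraints: iterable of (label_i, label_j, upper_limit) tuples
    """
    total = 0.0
    for label_i, label_j, upper_limit in restraints:
        d = math.dist(coords[label_i], coords[label_j])
        excess = max(0.0, d - upper_limit)   # satisfied restraints contribute nothing
        total += excess ** 2
    return total


if __name__ == "__main__":
    # Two hypothetical protons and one upper-limit restraint between them.
    trial = {"HA_12": (0.0, 0.0, 0.0), "HN_87": (6.0, 0.0, 0.0)}
    limits = [("HA_12", "HN_87", 5.0)]
    print(restraint_penalty(trial, limits))
```

A structure calculation repeatedly adjusts the coordinates (or the torsion angles that generate them) so as to drive a penalty of this kind, combined with terms for torsion angle constraints and steric overlap, towards its minimum.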
The procedures outlined in the previous sections are demonstrated on a 15 kDa (134 amino acid residues) calcium binding protein from Entamoeba histolytica (EhCaBP). A number of biochemical experiments suggest that calcium (Ca2+) may be involved in the pathogenetic mechanisms of amoebiasis. Thus, in order to understand the mechanism by which Ca2+ affects virulence and to gain more insight into the function of the protein, a gene encoding a novel calcium binding protein was isolated from E. histolytica and cloned in E. coli. The protein was isotopically labelled with 13C and/or 15N for NMR experiments. Ssr_assignments were achieved using a series of heteronuclear multidimensional NMR experiments. Subsequently, the 1265 distance and 200 torsion angle constraints obtained for the protein were used as input to the molecular dynamics program DYANA, which computes 3D structures of proteins from experimental structural constraints. A minimum energy structure thus obtained is shown in [Figure - 2]. The 3D structure of EhCaBP reveals that it belongs to the well-known family of EF-hand proteins, a popular member of which is calmodulin. This structure of EhCaBP can now be used as a template to further investigate structure-function relationships in this protein.
The facilities provided by the National Facility for High Field NMR, supported by the Department of Science and Technology (DST), the Department of Biotechnology (DBT), the Council of Scientific and Industrial Research (CSIR) and the Tata Institute of Fundamental Research, Mumbai, are gratefully acknowledged. We dedicate this paper to the memory of the late Prof. G. N. Ramachandran (1922-2001).