High-Throughput Toxin Structure Determination
NMR spectroscopy has historically been the method of choice for the structural characterization of DRPs. Indeed, the first protein NMR structure was that of a proteinase inhibitor containing six cysteine residues and adopting a Kazal-like fold (Williamson, Havel, & Wüthrich, 1985). Crystallography has typically not been applied to toxins, in particular those with disordered regions, presumably because such regions prevent the formation of sufficiently stable crystal contacts. The flexible loops and tails of DRPs make up a much larger fraction of their surface area than in large globular proteins, and this is likely one of the reasons NMR dominates structure determination for these molecules. No EM structure of an individual DRP has been reported, as these peptides are far too small to be resolved with current EM methods. Nevertheless, EM structures of toxin:channel complexes have emerged recently, and EM is likely to become the dominant method for such studies in the future (Cao, Liao, Cheng, & Julius, 2013).
The early NMR studies of DRPs used homonuclear NMR, in which resonance assignments and distance restraints rely on 1H nuclei only. For small DRPs of up to ~ 4 kDa, this approach still yields well-resolved structures of good quality. For proteins > 8 kDa, however, the approach becomes increasingly difficult to apply, because spectral crowding in 2D homonuclear (1H–1H) spectra causes signal overlap. This overlap introduces ambiguity into the structure determination process and ultimately leads to poorly resolved structures. The homonuclear approach has the additional disadvantage of requiring manual analysis; even for small (~ 4 kDa) DRPs, automated resonance assignment and structure determination would be nearly impossible. Homonuclear NMR is therefore poorly suited as a component of the high-throughput sequence-to-toxin approach described earlier. In the classical toxin-to-sequence approach, however, homonuclear NMR remains a central and critical component.
In a high-throughput setting, modern multidimensional heteronuclear NMR methods are far more applicable. The heteronuclear approach uses through-bond correlations between 1H, 13C, and 15N nuclei to establish unambiguous connectivities. This, of course, requires the production of toxins uniformly labeled with 13C and 15N isotopes (Atreya, 2012). The isotope-labeled sample is then used to acquire multiple 2D, 3D, and 4D NMR experiments. Each of these multidimensional spectra contains far fewer signals than a 2D homonuclear spectrum, because it is filtered to show only very specific correlations (much as a 2D gel spreads proteins out into separate spots). Analysis can nevertheless be time consuming. This is not because the data are complex or difficult to interpret, but because signals must be located within the vast empty regions of multiple 3D spectra. Fortunately, this procedure is highly amenable to automation with spectral analysis software (Lee, Petit, Cornilescu, Stark, & Markley, 2016). The NMR experiments used for heteronuclear structural analysis are summarized in Table 2.
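Because the signals occupy only a small part of each spectrum, locating them is well suited to automation. The sketch below is a minimal, hypothetical illustration that treats a spectrum as a NumPy intensity array. It reports every point that lies above a noise threshold and is at least as intense as all of its immediate neighbours. The function name `pick_peaks`, the threshold, and the synthetic test data are assumptions. This is not the algorithm used by any particular analysis package.

```python
import itertools

import numpy as np


def pick_peaks(spectrum: np.ndarray, threshold: float) -> list[tuple[int, ...]]:
    """Return indices of points above `threshold` that are >= all neighbours.

    Works for any dimensionality (2D, 3D, 4D arrays): each point is compared
    with its 3**ndim - 1 immediate neighbours, including diagonals.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    # Pad with -inf so that points on the array edge can still qualify as maxima.
    padded = np.pad(spectrum, 1, mode="constant", constant_values=-np.inf)
    is_peak = spectrum > threshold
    for offset in itertools.product((-1, 0, 1), repeat=spectrum.ndim):
        if not any(offset):
            continue  # skip comparing the point with itself
        # View of the padded array shifted by `offset`, same shape as `spectrum`.
        window = tuple(
            slice(1 + o, padded.shape[axis] - 1 + o)
            for axis, o in enumerate(offset)
        )
        is_peak &= spectrum >= padded[window]
    return [tuple(int(i) for i in idx) for idx in np.argwhere(is_peak)]
```

On a real spectrum, the threshold would usually be set at some multiple of the estimated noise level. Note that this simple test reports every point of a perfectly flat plateau as a peak.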
The general procedure begins with resonance assignment. The backbone is assigned using a pair of experiments that together correlate the resonances of backbone atoms in two adjacent residues, a step referred to as sequential backbone resonance assignment. A series of 3D NMR experiments is then used to correlate the frequencies of side-chain 1H, 13C, and 15N nuclei. Once resonance assignment is complete, a number of 3D NOESY experiments are acquired that correlate 1H nuclei that are close in space. These correlations are further resolved by correlating one of the two 1H nuclei with its covalently attached 13C or 15N nucleus. Typically, a few thousand distances are measured in these experiments, and they are subsequently used as restraints to calculate the structure of the protein computationally.
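The sequential-linking idea can be sketched as follows, under simplifying assumptions. Each amide peak is assumed to have already been grouped into a "spin system". Each spin system carries its own Cα/Cβ shifts (as an HNCACB would provide) and the Cα/Cβ shifts of the preceding residue (as a CBCA(CO)NH would provide). Residue A precedes residue B when A's own shifts match B's "preceding" shifts. The class and function names and the matching tolerances are hypothetical. This greedy linking is a deliberate simplification and does not describe how FLYA or other assignment programs work internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpinSystem:
    """Backbone carbon shifts (ppm) grouped on one amide (HN/N) peak.

    ca, cb: the residue's own Cα/Cβ; cb is None for glycine (no Cβ).
    ca_prev, cb_prev: Cα/Cβ of the preceding residue.
    """
    name: str
    ca: float
    cb: Optional[float]
    ca_prev: float
    cb_prev: Optional[float]


def is_sequential(a: SpinSystem, b: SpinSystem,
                  tol_ca: float = 0.2, tol_cb: float = 0.4) -> bool:
    """True if b's 'preceding residue' shifts match a's own shifts (a -> b)."""
    if abs(a.ca - b.ca_prev) > tol_ca:
        return False
    if a.cb is None or b.cb_prev is None:
        # Consistent only if both lack a Cβ (i.e. residue a is glycine).
        return a.cb is None and b.cb_prev is None
    return abs(a.cb - b.cb_prev) <= tol_cb


def build_fragments(systems: list[SpinSystem]) -> list[list[str]]:
    """Chain spin systems into fragments, following only unambiguous links."""
    successor: dict[str, str] = {}
    for a in systems:
        candidates = [b for b in systems if b is not a and is_sequential(a, b)]
        if len(candidates) == 1:  # keep a link only if it is unique
            successor[a.name] = candidates[0].name
    has_predecessor = set(successor.values())
    fragments = []
    for s in systems:
        if s.name in has_predecessor:
            continue  # not the first residue of a fragment
        chain = [s.name]
        while chain[-1] in successor and successor[chain[-1]] not in chain:
            chain.append(successor[chain[-1]])
        fragments.append(chain)
    return fragments
```

In a full assignment procedure, the resulting fragments would then be mapped onto the protein sequence, for example using the residue types suggested by the Cα/Cβ shifts.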
When coupled to spectral analysis software, the entire procedure can be fully automated. The NMR community is developing a universal platform for this purpose called NMRbox (https://www.nmrbox.org/). NMRbox is a virtual machine environment that bundles the software tools needed for automated structure determination. Such approaches are an area of intense research, and this level of automation is likely to become routine in the near future.
One aspect of the heteronuclear NMR approach that makes it ill suited to high-throughput applications is the time required for the many multidimensional NMR experiments. Furthermore, the success of automation depends critically on resolution and signal-to-noise ratio. At a given resolution, the acquisition time grows exponentially with the number of dimensions. If every indirect dimension is sampled with the same number of increments, each added dimension multiplies the acquisition time by that number. With a factor of 8 per dimension, an 8-h 2D experiment thus becomes a 64-h 3D experiment and a 512-h 4D experiment. It is therefore not uncommon for data acquisition for a single protein structure to take several weeks, if not months. This would certainly be stretching the definition of high throughput.
Fortunately, most multidimensional heteronuclear NMR spectra are largely empty, with signals occupying only a few percent of the spectrum. Statistically, it should then be possible to extract this information without the heavy time burden imposed by the uniform sampling that the discrete Fourier transform requires in traditional NMR data acquisition. The problem is closely analogous to image compression, in which very large raw images are reduced to small files by methods such as JPEG. The same compression philosophy can be applied in NMR. It has been shown that 3D and 4D NMR data can be acquired in the time traditionally required for a 2D NMR spectrum (Mobli, 2015; Mobli & Hoch, 2014; Mobli, Stern, Bermel, King, & Hoch, 2010). Using these advanced statistical methods of spectral reconstruction from nonuniformly sampled (NUS) multidimensional NMR data, the weeks of NMR time typically required can be reduced to less than 1 week. In theory, this time can be reduced even further if optimal sampling can be achieved. Such optimized sampling methods have not yet emerged. If the full potential of NUS is realized, however, fully automated DRP structure determination is likely to become possible in the near future, requiring a few hours of acquisition time for concentrated samples (> 2 mM) and a few days for dilute samples. At present, a fully occupied NMR magnet can be used to solve ~ 50 DRP structures per year, but we anticipate that this number can be increased to between 100 and 500. The relatively low throughput of current structural characterization methods therefore requires prioritization of the peptides to be analyzed, which can be guided by preliminary functional studies.
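As the text notes, optimal NUS schedules have not yet emerged. The sketch below is a hedged illustration of what a sampling schedule is. It uses a common, simple heuristic: random sampling of the grid of indirect-dimension increments, weighted so that early increments (where the signal is strongest) are favoured. This is not necessarily the schedule used in any of the work cited here. The function names, the grid size, and the decay constant are hypothetical. Acquisition time is assumed to scale with the number of increments actually measured, with the same number of scans per increment.

```python
import numpy as np


def exponential_nus_schedule(n1: int, n2: int, fraction: float,
                             decay: float = 1.0,
                             seed: int = 0) -> list[tuple[int, int]]:
    """Randomly choose `fraction` of an (n1 x n2) grid of indirect increments.

    Sampling probability decays exponentially with the (normalised) evolution
    time, so early increments are chosen more often than late ones.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    i, j = np.meshgrid(np.arange(n1), np.arange(n2), indexing="ij")
    # Evolution time normalised to 0..1 in each indirect dimension, then summed.
    t = i / max(n1 - 1, 1) + j / max(n2 - 1, 1)
    weights = np.exp(-decay * t).ravel()
    weights /= weights.sum()
    n_samples = max(1, round(fraction * n1 * n2))
    rng = np.random.default_rng(seed)
    picked = rng.choice(n1 * n2, size=n_samples, replace=False, p=weights)
    rows, cols = np.unravel_index(np.sort(picked), (n1, n2))
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


def acquisition_hours(uniform_hours: float, n_sampled: int, n_total: int) -> float:
    """Scale the uniformly sampled time by the fraction of increments measured."""
    return uniform_hours * n_sampled / n_total
```

For example, sampling 5% of a 64 × 64 grid of increments yields 205 points. If the uniformly sampled 3D experiment would have taken 64 h, `acquisition_hours(64.0, 205, 4096)` returns 3.203125, i.e. about 3.2 h. Spectral reconstruction from such data requires a method such as maximum entropy rather than the discrete Fourier transform.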
In the research group of one of the authors, a pipeline for automated DRP structure determination called ASAP-NMR is being developed. In its current form, isotopically labeled proteins are used to acquire multidimensional heteronuclear NMR data at high field (typically on the 900 MHz NMR spectrometer at UQ). The following 3D datasets are acquired using NUS and used for backbone resonance assignment: 3D HNCO, 3D CBCA(CO)NH, and 3D HNCACB. In general, 5–10% of the points of a very high-resolution “master” dataset are sampled. Together, these three 3D experiments require ~ 12 h of NMR time (see also Table 2). A 4D HCC(CO)NH experiment is then collected using NUS to assign the aliphatic side-chain atoms; this experiment requires ~ 1 day of NMR time (Mobli et al., 2010). Three NOESY datasets are collected: one 3D 15N-edited NOESY-HSQC experiment and two 13C-edited HSQC-NOESY experiments, focused on the aliphatic and aromatic regions, respectively. These 3D experiments are currently acquired using traditional uniform sampling. This is primarily because these spectra are not as sparse as those used for resonance assignment, so that, in the absence of optimal sampling, NUS offers only modest time savings. The NOESY data require a total of 3 days, giving a total data acquisition time of ~ 4 days per toxin. NUS datasets are processed automatically using maximum entropy (MaxEnt) reconstruction (Mobli, Maciejewski, Gryk, & Hoch, 2007), while the uniformly sampled NOESY datasets are processed by Fourier transformation. Peak information is extracted automatically from the spectra using in-house software (PEAKY). PEAKY performs peak picking followed by line-shape fitting with the Levenberg–Marquardt algorithm and can handle spectra with up to four dimensions. Automated sequence-specific resonance assignment is achieved using the FLYA algorithm (López-Méndez & Güntert, 2006).
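The implementation of PEAKY is not described here. The sketch below is a hedged, generic illustration of the kind of Levenberg–Marquardt line-shape fitting step it performs: it fits a single 1D Lorentzian to a trace using NumPy only. The Lorentzian model, the function names, the damping schedule, and the starting values are all assumptions. Extending the fit to 2D–4D spectra, or to overlapping peaks fitted jointly, is not shown.

```python
import numpy as np


def lorentzian(x, amp, x0, hwhm):
    """1D Lorentzian: height `amp`, centre `x0`, half-width at half-maximum `hwhm`."""
    return amp * hwhm**2 / ((x - x0) ** 2 + hwhm**2)


def lorentzian_jacobian(x, params):
    """Partial derivatives of the Lorentzian with respect to (amp, x0, hwhm)."""
    amp, x0, g = params
    u = x - x0
    d = u**2 + g**2
    return np.stack(
        [g**2 / d,                      # d/d amp
         2 * amp * g**2 * u / d**2,     # d/d x0
         2 * amp * g * u**2 / d**2],    # d/d hwhm
        axis=1,
    )


def fit_lorentzian_lm(x, y, p0, lam=1e-3, max_iter=200, tol=1e-12):
    """Least-squares fit of one Lorentzian using Levenberg-Marquardt damping."""
    p = np.asarray(p0, dtype=float)
    cost = np.sum((y - lorentzian(x, *p)) ** 2)
    for _ in range(max_iter):
        r = y - lorentzian(x, *p)               # residuals
        J = lorentzian_jacobian(x, p)
        A = J.T @ J                             # Gauss-Newton approximation to the Hessian
        # Marquardt's scaling: damp each parameter by its own diagonal element.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        trial = p + step
        trial_cost = np.sum((y - lorentzian(x, *trial)) ** 2)
        if trial_cost < cost:    # accept the step and move towards Gauss-Newton
            p, cost = trial, trial_cost
            lam /= 10
        else:                    # reject the step and move towards gradient descent
            lam *= 10
        if np.linalg.norm(step) < tol * (np.linalg.norm(p) + tol):
            break
    p[2] = abs(p[2])  # the line shape depends only on hwhm**2; report it as positive
    return p
```

In this sketch, the starting values are taken from the most intense point of the trace, mimicking a fit seeded by a preceding peak-picking step.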
Backbone dihedral angles are predicted from chemical shifts using TALOS (Cornilescu, Delaglio, & Bax, 1999; Shen, Delaglio, Cornilescu, & Bax, 2009), and automated NOESY assignment and structure calculation are performed using CYANA (Güntert, 2004). Structure calculations are currently supervised manually and amended until all inconsistencies have been resolved; this final step generally requires 1 day of manual work. So far, we have used this approach to obtain complete NMR resonance assignments for > 20 toxins and near-complete assignments (~ 90%) for several larger proteins (10–15 kDa) (Casey et al., 2016; Klint, Chin, & Mobli, 2015; Lau, King, & Mobli, 2016).
Low-field nuclear magnetic resonance (NMR) spin–spin relaxation (T2) measurements were used to study the denaturation and aggregation of β-lactoglobulin (β-LG) solutions of varying concentrations (1–80 g L−1) as they were heated at temperatures ranging from ambient up to 90 °C. For concentrations of 1–10 g L−1, the T2 of β-LG solutions did not change, even after heating to 90 °C. A decrease in T2 was only observed when solutions having higher concentrations (20–80 g L−1) were heated. Circular dichroism (CD) spectroscopy and fluorescence tests using the dye 1-anilino-8-naphthalene sulfonate (ANS) on 0.2 and 1 g L−1 solutions, respectively, indicated there were changes in the protein's secondary and tertiary conformations when the β-LG solutions reached 70 °C and above. In addition, dynamic light scattering (DLS) showed that protein aggregation occurred only at concentrations above 10 g L−1 and for heating at 70 °C and above. The hydrodynamic radius increased as T2 decreased. When excess 2-mercaptoethanol was added, the changes in both T2 and the hydrodynamic radius followed the same trend for all β-LG protein concentrations between 1 and 40 g L−1. These observations led to the conclusion that the changes in T2 were due to protein aggregation, not protein unfolding. Copyright © 2007 Society of Chemical Industry