Fast closure of long loops at the initiation of the folding transition of globular proteins studied by time-resolved FRET-based methods

*Corresponding author: Elisha Haas, The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan 52900, Israel, Phone: +972 546270012, E-mail: Elisha.haas@biu.ac.il
Tomer Orevi, Gil Rahamim, Sivan Shemesh, Eldad Ben Ishay and Dan Amir: The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel

The protein folding problem would be considered "solved" when it becomes possible to "read genes", i.e., to predict the native fold of proteins, their dynamics, and the mechanism of fast folding based solely on sequence data. The long-term goal should be the creation of an algorithm that would simulate the stepwise mechanism of folding, which constrains the conformational space and in which random search for stable interactions is possible. Here, we focus attention on the initial phases of the folding transition starting with the compact disordered collapsed ensemble, in search of the initial sub-domain structural biases that direct the otherwise stochastic dynamics of the backbone. Our studies are designed to test the "loop hypothesis", which suggests that fast closure of long loop structures by non-local interactions between clusters of mainly non-polar residues is an essential conformational step at the initiation of the folding transition of globular proteins. We developed and applied experimental methods based on time-resolved resonance excitation energy transfer (trFRET) measurements combined with fast mixing methods and studied the initial phases of the folding of Escherichia coli adenylate kinase (AK). A series of AK mutants was prepared, in which the ends of selected backbone segments that form long closed loops or secondary structure elements were labeled by donors and acceptors of excitation energy. The end-to-end distance distributions of such segments were determined under equilibrium and during the fast folding transitions. These experiments show that three out of seven long loops that were labeled in the AK molecule are closed very early in the transition. The N terminal 26-residue loop (loop I) is closed in <200 μs after the initiation of folding, while the β strand included in loop I is still disordered. The closure of the second 44-residue loop (loop II, which starts at the end of loop I) is also complete in <300 μs. Four other loops as well as five secondary structures of the CORE domain of AK (an α helix and four β strands) are formed at a late step, at a rate of 0.5 ± 0.3 s⁻¹, the rate of the cooperative folding of the molecule. These experiments reveal a hierarchically ordered pathway of folding of the AK molecule, ranging from microseconds to seconds. The results reviewed here, obtained mainly from studying a small number of model proteins, support the counterintuitive mechanism whereby non-local interactions are effective in the initiation of the folding pathways. The experiments presented demonstrate the importance of mapping the rates of sub-domain structural transitions along the folding transition, in situ, in the context of the other sections of the chain, whether folded or disordered. These experiments also show the power of the time-resolved FRET measurements in achieving this goal. A large body of data obtained by theoretical and experimental studies that support, or can accommodate, the loop hypothesis is reviewed. We suggest that mapping multiple sub-domain structural transitions during the refolding transition of many proteins using the approach presented here will refine the conclusions and help reveal some common principles of the initiation of folding. To achieve this goal, the trFRET measurements should be combined with mutagenesis experiments in which the role of selected residue clusters is tested by perturbation mutations.
Nevertheless, the solution of the protein folding problem depends on the application of many additional approaches, both experimental and theoretical, while the approach presented here is only a small section of the big puzzle.

Keywords: FRET; globular proteins; long loops; loop closure; protein folding.

DOI 10.1515/bams-2014-0018
Received September 18, 2014; accepted October 24, 2014

Orevi et al.: The loop hypothesis of protein folding

Introduction

The fast and efficient transition of an ensemble of disordered proteins under physiological conditions to a much more compact and uniform ensemble of folded molecules has been the subject of numerous studies for more than seven decades and is still not fully understood. An ensemble of disordered protein molecules, which populates a large conformational space and undergoes fast transition to an ensemble of ordered molecules, explores a very large number of pathways, due to the stochastic nature of the search. Yet, the folding transition is fast and efficient, and all the pathways converge to an ensemble of ordered molecules that occupy only a very small portion of the conformational space. This led to the hypothesis that the mechanism of the folding transition of globular proteins is not a random search (the thermodynamic hypothesis), but rather a programmed hierarchical sequence of subdomain or domain transitions that gradually constrains the conformational space available to the polypeptide chain. The "thermodynamic hypothesis" [1–8] suggests that, in principle, one should be able to predict the product of the conformational search (i.e., the folded conformation) by searching for the free energy minimum for a given system of protein plus solvent [3–8]. However, the high rate and efficiency of the folding transition are not explained by this approach.
This led to the hypothesis that it should be possible to characterize transient ensembles of partially ordered molecules in which specific molecular interactions are effective and specific structural features (whether native-like or nonnative-like) are formed. The assumption was that a series of such partial transitions forms a pathway that should be characteristic of the protein; a search for such generalized principles is still ongoing. The underlying assumption was that the mechanism of folding is hierarchical and that the sequence information encoded in each globular protein includes a set of "instructions" for the construction of the folded states through the gradual formation of sub-domain structures. Thus, the genetic information should be viewed as a "blueprint" for the construction process, and, hence, the challenge is to decipher the relationship between the sequence information and the specific steps in the transition process. The challenge inherent in this approach is the need to characterize transient ensembles that include a wide range of conformations, with large variance of structural characteristics; each of these conformational states is poorly populated and very short lived. The common methods, which rely on the determination of population averages of structural characteristics, are insufficient, and methods for very fast capture of the mean and the variance of each measured structural characteristic should be developed. Historically, inspired by the chemical concepts of reaction intermediates, it was assumed that the detection of kinetic intermediates along the pathway [9] and residual structures [10] should reveal some principles of the folding mechanism [9, 11, 12]. Metastable intermediates, however, may actually slow the folding transition and might not be the correct target for the search [13, 14]. When considering the possible mechanisms of stepwise structure formation, the "bottom-up" model seems intuitively to be the first choice [11, 15].
In this model, local interactions (LIs) between near-neighbor residues along the chain are assumed to first form and stabilize short structural elements, such as secondary structures. Those elements then coalesce to form a higher level of contacts. The entropy change resulting from such a transition is relatively small, and the probability of formation of pairwise interactions internal to short chain segments is high. An alternative model is based on the hypothesis that the dominant interactions that are essential for the early determination of the folding pathway, enabling rapid and early elimination of major sections of the conformational space, are non-local interactions (NLIs) between monomers separated by segments of the chain. This model is counterintuitive, since the probability of interaction between widely separated sites is lower than that of near neighbors, and, thus, these NLIs should be harder to form than LIs. The relative importance of local vs. non-local interactions in determining the mechanism and rate of folding has been investigated for many years based on experiments, simulations, and theoretical considerations [16–18]. Globular proteins are very compact, and, hence, contact between residues that are separated by long chain segments is a common theme of their folded structures. Furthermore, unlike the coil-to-globule transition of homopolymers, the dependence of inter-residue distances on the segment length, n, in the native structure of globular proteins is very weak. Domains of globular proteins are strongly cross-linked by NLIs. NLIs can be contributed by specific interactions between clusters of residues or by third-party cross-linkers such as ligands [19] that form specific interactions with non-contiguous sections of the backbone. Disulfide bonds are another example of cross-linking that accelerates refolding by reduction of the entropy of the disordered state [20].
Yet, covalent crosslinking is a slow reaction, and during the initial folding of most proteins, the folded state is maintained by a large number of non-covalent NLIs. It is therefore reasonable to assume that NLIs are also instrumental in the mechanism responsible for the fast and efficient protein folding transition. Berezovsky and Trifonov [21, 22] showed that protein structure can be viewed as a compact linear array of closed loops. Furthermore, they proposed that protein folding occurs through consecutive looping of the chain, with the loops ending primarily at hydrophobic nuclei. The role of non-local interactions in the early phases of folding pathways was described in many studies [23–34]. The view that non-local contacts can be effective early in the folding transition was first suggested based on simple lattice-based protein folding simulations [35, 36]. Using a cubic lattice simulation, Abkevich et al. [37] found that NLIs enhance the rate of folding. Molecular dynamics and a Monte Carlo Go model simulation were used to examine the events that lead to the formation of the transition state ensemble (TSE) [38–40]. Simulations of the folding transition by several groups showed the initiation of the pathway by NLIs [41, 42]. Zhang and Chan [43] performed explicit-chain simulations using coarse-grained chain models of natural proteins and computed the transient distributions of conformations sampled along the trajectories of many molecules. They found that conformations in the initial phases of the faster pathways were enriched with NLIs. Juraszek and Bolhuis [42] found that, in the folding of the small protein Trp-cage, 80% of the pathways are initiated by NLIs, while secondary structures appear later. Chomilier et al. [44–46] used native fold analysis and simulations to develop the concept of closed loops as early folding structures and to elucidate the mechanism of folding [44].
An algorithm was developed for the identification of the residues involved in the loop closure interaction, which were named most interacting residues (MIR). Segments that form loops in the folded state were identified [47]. The importance of the MIR elements was further validated by the simulation of mutations in these sequences and by the correlation with the known stabilities of 385 proteins with published stability data [48]. These studies of the MIR elements, which are located at the ends of long loops, strongly support the loop hypothesis. Insights into the relative roles of local and global interactions in determining folding mechanisms and rates were recently obtained by Peter and Shea [49]. The folding kinetics of a small protein, Trp-cage, was simulated in two pathways (with either the secondary or the tertiary structure formation as the rate-determining step), and it was concluded that the competition between the secondary and the tertiary structure formation affects the folding rate. However, many other studies supported the counterhypothesis that the folding mechanism is dominated by LIs, while the non-local ones provide non-specific stabilization of the compact conformers [6, 50–58]. A large number of studies provided evidence for a dominant role of LIs in the initiation of the folding transition. On the basis of entropic considerations, Rose et al. [59–62] suggested that helix and strand formation are guiding events in protein folding. Folding prediction by a fragment assembly mechanism, which is based mainly on early formed LIs, was successfully used by many groups [63–71]. The success of this approach strongly supports the importance of early formed LIs in stabilizing the final fold of small proteins. In these models, the NLIs are assumed to have a lesser role and can be non-specific. In the zipping and assembly (ZA) mechanism suggested by Dill et al. [53], local structuring happens first at independent sites along the chain, then those structures either grow (zip) or coalesce (assemble) with other structures [72–74]. Dill et al. [53] argue that the ZA mechanism provides a model for the physical pathways of protein folding. The fragment assembly (FA) (or ZA) models for protein folding predictions show impressive success when applied to small proteins. However, does the success of the FA methods mean that they predict the folding routes? This should be tested by experiments, e.g., where the kinetics of each one of multiple sub-domain elements of model proteins will be studied in situ, during the time leading to the formation of the TSE. Daggett and Fersht [57, 75] conclude that only when the propensity of stable secondary structures is high do such structures form first, followed by their assembly. But in the majority of proteins, the unstable secondary structures are stabilized by NLIs.

The interplay of local and non-local interactions in the folding transition

The interplay between non-local hydrophobic contacts and local secondary structure was the subject of a series of reports on several well-studied model proteins. Numerous works showed that local and non-local contacts are critical for proper folding in both hierarchical and nucleation-condensation folding models. Studies of various mutants of apomyoglobin from different sources were reported by several groups, in which contributions of both local and non-local interactions were found [33, 76–80]. Numerous studies of the folding of SH3 domains showed a similar sequence of events where non-local contacts lead to the formation of a nucleus [23–25, 81–85], and the context dependence of secondary structure formation was also shown [86]. Folding transitions where non-local hydrophobic interactions stabilize secondary structure elements were reported for several proteins [87–90].
The early steps of folding of dihydrofolate reductase were shown to involve stabilization of secondary structures by hydrophobic contacts [91]. A systematic study of mutants of an immunoglobulin V(L) domain suggests that local and non-local interactions contribute to the stability by an approximately equal amount, but that local interactions stabilize by increasing the resistance to denaturation, while non-local interactions increase folding cooperativity [92]. The role of specific LIs or NLIs, their relative importance at the initiation of folding or in determining the direction and rate of the folding transition, and their mutual dependence remain open questions that call for more experiments and theoretical studies. The focus of the current review is the search for the role of local and non-local interactions in the earliest detected folding steps, where only a few specific interactions are found. In order to achieve this goal, we took advantage of the speed of fluorescence detection and the power of various modes of FRET and trFRET measurements.

Experimental strategy

The search for the time and order of formation of sub-domain structures stabilized by either NLIs or LIs depends on our ability to map selected conformational changes in ensembles of structures with an otherwise disordered backbone. Those ensembles should be captured in submicrosecond time frames during the fast folding transition starting at the ensemble of fast collapsed (disordered) molecules [93–99]. This is a major technical challenge, and, hence, many kinetic studies focused on the rate-determining steps, i.e., the transition state at which the outline of the chain fold is already stabilized. To meet this challenge, methods based on time-resolved fluorescence resonance energy transfer (trFRET) were developed to characterize the ensembles of unfolded, collapsed, and partially folded globular protein molecules [100–105]. The method is based on a combination of site-specific labeling of selected pairs of residues by fluorescent donor and acceptor probes, and on the determination of distributions of intramolecular distances in ensembles of the labeled protein molecules by means of trFRET measurements. This approach enables monitoring fine changes of the end-to-end distance of preselected chain elements such as loops or secondary structure elements one at a time, in situ, in the context of the whole molecule [106–108]. Preparation of a series of labeled mutants of one protein enables monitoring of the conformational changes in each sub-domain structure. Very fast data collection enables determination of a series of sequential distance distributions along the folding pathway [99, 109–117]. Such experiments yield meaningful information describing specific conformational changes and the order of their occurrence along the folding pathway.

The loop hypothesis

We first proposed the "loop hypothesis" in 1995 based on the results of trFRET studies of the folding of double-labeled (donor and acceptor) reduced bovine pancreatic trypsin inhibitor (BPTI) (Figure 1) [118, 119]. We hypothesized that the earliest steps in the folding transition are the closure of a small number of long loop segments by NLIs between clusters of mostly non-polar residues at their ends [107, 119]. In contrast to the "bottom-up" folding mechanism, we assumed a "top-down" pathway, whereby loop structure elements are formed at the initiation of folding. Secondary structures can form either at the same time or at a later time along the transition, but the NLIs that close the loops do not always depend on the secondary structures of interacting clusters. The biological advantages of such a mechanism are the (a) major constraint of the disordered ensemble and hence a large backbone entropy reduction per interaction, (b) reduced chance of aggregation or misfolding, (c) fast partial protection from proteolysis mechanisms in the cell [120], and (d) reduction of the number of available non-productive pathways. It was further suggested that a very small number of such closed loops sufficiently restrict the conformational space and force the protein into an ensemble of conformations, with the characteristics of the outline of the conformations populating the TSE and the folded ensembles. This was presented as a working hypothesis, which was meant to guide the research that would test the relative contribution and timing of the formation of LIs and NLIs at the earliest steps of the folding transition of globular proteins. A viable possible result might be that both types of interactions are effective at the early steps of the folding transition, and the challenge for both experiments and computations is to resolve the relative contributions and fine resolution of the sequence of formation of specific interactions at the initial phases of the folding transition. The feasibility of very fast specific long loop closure at the initiation of folding is supported by several studies in which the kinetics of loop closure of long polypeptide segments were measured mainly by fluorescence quenching experiments. Fast loop closure on a nanosecond time scale was reported [121–126].
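As a rough orientation for how strongly loop length is expected to penalize closure, the textbook ideal (Gaussian) chain estimate can be worked through in a few lines. This is a minimal sketch under stated assumptions, not the analysis used in the studies cited in this section: it assumes the ideal-chain scaling in which the end-to-end contact probability falls as n^(-3/2), and the function names are hypothetical. For 10- vs. 40-residue segments, this scaling gives a factor of (40/10)^(3/2) = 8, i.e., less than one order of magnitude.

```python
import math

R_GAS = 8.314  # gas constant, J/(mol K)


def ideal_closure_ratio(n_short, n_long, exponent=1.5):
    """Ratio P(n_short) / P(n_long) of end-to-end contact probabilities.

    Assumes the ideal-chain scaling P(n) ~ n**(-exponent); the default
    exponent of 3/2 is the Gaussian-chain value (an assumption here).
    """
    return (n_long / n_short) ** exponent


def extra_entropy_cost(n_short, n_long, exponent=1.5):
    """Extra entropy cost, in J/(mol K), of closing the longer loop.

    Follows from the same scaling: delta_S = exponent * R * ln(n_long / n_short).
    """
    return exponent * R_GAS * math.log(n_long / n_short)


if __name__ == "__main__":
    ratio = ideal_closure_ratio(10, 40)
    d_s = extra_entropy_cost(10, 40)
    print(f"closure probability ratio (10 vs. 40 residues): {ratio:.1f}")
    print(f"extra entropy cost: {d_s:.1f} J/(mol K)")
    print(f"T*dS at 298 K: {298.0 * d_s / 1000.0:.1f} kJ/mol")
```

Real disordered or collapsed polypeptides deviate from ideal-chain behavior, so the exponent is exposed as a parameter rather than fixed.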
Concerns about the possibly large entropic cost of loop closure reactions, which call into question the feasibility of specific NLIs in the disordered ensemble, can be addressed by considering the following observations: (a) a study of the probability of loop closure showed less than an order of magnitude change of probability when comparing 10- vs. 40-residue chain segments [127], and (b) several studies showed that the entropy change in long loop closure is not large enough to make it improbable [128, 129]. One explanation for these observations could be that only the loop ends are constrained, while the rest of the chain between them is free to occupy a large number of conformations under the constraints of the chain collapse. The entropy change is expected to be balanced by the interactions between the clusters at the nodes of the loop.

The loop hypothesis and the nucleation-condensation mechanism

The nucleation step was suggested to be a mechanism ensuring a fast and efficient folding transition [55, 130–132]. The nucleus is formed by residues from different parts of the chain, with unstructured loops between them [132]. The common theme of most of those mechanisms is the presence of a small number of NLIs that are formed prior to the main folding transition. In many cases, these contacts form long closed loops. The loop hypothesis is compatible with the nucleation-condensation mechanism [57] but describes a very different part of the folding pathway. The nucleation-condensation mechanism describes the formation of the nucleus, which is the hallmark of the TSE; all conformations belonging to the TSE have an obligate nucleus, which is often a specific well-defined set of contacts, both local and non-local.
In the nucleation-condensation mechanism, the nucleation process is coupled to the TSE formation, while the "loop hypothesis" describes the formation of the earliest sub-domain structures, which precede the formation of the nucleus and are coupled to the initiation of the folding transition. The loop hypothesis assumes that the first step that follows the transfer of a fully disordered polypeptide to the folding conditions is a non-specific adaptation by a collapse to an ensemble of disordered molecules, whose global dimensions are between those of the unfolded and the folded ensembles. This is followed by the fast formation of the small set of specific non-local contacts, which are assumed to contribute to the subsequent formation of the folding nucleus.

Figure 1 Cartoon description of the "closed loop model". Five snapshots along the folding trajectory of a model protein: the unfolded state is represented by one of the many possible conformers. The three pairs of clusters of mostly hydrophobic residues that can form specific loop ends' lock are shown in three colors. Upon change to folding conditions, rapid chain collapse leads to a compact still disordered globule. During or immediately after chain collapse (collapsed I), the fast-formed loop structure locked by a pair of specific hydrophobic clusters is shown. In the next step (collapsed II, red and orange clusters), the second loop is formed within a very short time. The three closed loops already fix the overall native-like topology of the chain. At this point, a diffuse nucleus has been formed, and with activation-limited delay or right away (in the single-domain fast folders) the transition state ensemble (TSE) is formed. Finally, packing and complete desolvation are achieved and the folded state is stabilized (native).
It is assumed that short clusters of (mostly nonpolar) residues form the loop end nodes by specific steric complementarity and that these nodes form marginally stable specific non-local contacts that impose a partial order on the overall disordered molecules. In that sense, the experiments that identify the earliest contacts do not demonstrate nucleation, which is a later event coupled to the rate-limiting folding step of TSE formation. There is a relationship between fast formation of first persistent NLIs and nucleation. It is reasonable to assume that the stability of the few earliest native NLIs formed in the disordered collapsed ensemble would facilitate the subsequent completion of nucleus formation and determine its overall topology. Thus, the key distinction is that loop formation is a facilitating step for nucleus formation, directing its topology, and occurring in the collapsed disordered ensemble, while nucleation is an advanced folding event, indicating the formation of the TSE, which is the rate-limiting step of the global folding transition. The well-established "contact order correlation" [54, 133–136], which leads to the conclusion that the rate of folding correlates well with the number of interactions within short chain segments, seems to be at odds with the loop hypothesis. However, the contact order correlation is summed over the entire polypeptide chain and, hence, is not sensitive to a small number of closed loops that might affect the folding rate. Furthermore, that correlation refers to the rate of the global folding (the formation of the TSE), while the loop hypothesis describes the initiation of folding.

The loop as a basic folding unit based on the analysis of folded structures

Berezovsky and Trifonov [22, 44, 137–139] analyzed the outline of the chain fold of the crystal structures of 302 proteins and proposed that protein structures can be viewed as compact linear arrays of closed loops. They proposed that protein folding progresses through the consecutive looping of the chain, with the loops ending primarily at hydrophobic nuclei termed "locks". An alternative approach to identify closed loop folding units was developed by Chintapalli et al. [140, 141]. Their strategy was based on the presence of insertions and deletions in structurally aligned pairs of proteins that share essentially the same fold but not necessarily high sequence similarity. The algorithm was able to capture loop units that correspond to the closed loops found by Berezovsky and Trifonov above. Chintapalli et al. [140, 141] showed that (a) the folding rate correlates extremely closely with total contact distance evaluated only over the lock residues and (b) the lock residues tend to have high Φ values, "as would be expected for residues that play an important role in the transition structure for folding". Chintapalli et al. further suggested that the closed loop hypothesis is able to give an alternative description of the data obtained by Englander et al. [142–144] for cytochromes c and b562 (as well as for triosephosphate isomerase), initially interpreted in terms of independent folding units (foldons). The closed loop hypothesis-based mechanism was said to be "as elegant as the published explanations as it does not invoke discontinuous foldons".

Bioinformatic analyses

The growth of the protein structure databases and the advances in bioinformatics methods enabled large-scale analysis within structural types and homologs addressing the question of the role of local vs. non-local interactions in protein folding. The work of Govindarajan and Goldstein was mentioned in the Introduction section. A very different result was obtained by Unger and Moult [145, 146], who concluded that the foldability of a sequence is determined primarily by LIs. Yew et al. [147] analyzed the degree of conservation of loop end clusters and reported that 70% of these loop ends were found to be well conserved. In a recent bioinformatics study using a contemporary database, Noivirt-Brik et al. [148] assessed the importance of the two types of interactions through their evolutionary and structural conservation. The underlying assumption was that positions forming contacts that are more critical for folding, stability, and function are likely to be more conserved. They found that, for the majority of proteins found in the current database, non-local contacts are structurally and evolutionarily more conserved than the local ones.

Mode of detection

The interaction between the loop ends in the ensemble of collapsed molecules, where cooperativity is missing, is very weak. Therefore, such interactions can be observed by characterizing the fine changes in the transient distributions of intramolecular distances between the clusters of residues that form the closing interactions. trFRET experiments are ideal for the detection of the formation of each closed long loop since it is possible to follow selected distances between two sites that are separated by a large number of residues, their distributions, and fast fluctuations [100, 104, 107, 108, 115, 149–151]. FRET-detected single-molecule approaches are available to meet this goal.
Nevertheless, we applied mainly the ensemble detection methods for several reasons: deducing the folding pathway requires ensemble characteristics; natural amino acids, or mildly modified residues can be used and the structural perturbation can be minimized; full sets of fluorescence decay curves can be collected in nanosecond time intervals; application of the double kinetics approach is more straightforward compared to the single-molecule FRET-detected (smFRET)-based methods; pairs of probes suitable for monitoring distances in the range corresponding to sub-domain structural dimensions are readily available; and the ability to resolve subpopulations of intramolecular distances, a major strength of the smFRET-based methods, is also readily provided by the ensemble-based trFRET experiments [152]. Analyses of the trFRET measurements do not yield atom-to-atom distances, and depend on the size and dynamics of the probes and on the correct determination of the Förster constant, Ro. Yet, when the same pair of probes is used for studying a selected intramolecular distance under a series of conditions, the changes in the distribution of each distance are faithfully reported. Proper selection of pairs of probes enables angstrom resolution of mean distances by the trFRET measurements, such as in the case shown in the text box.

[Text box figure: fluorescence decay traces (counts/channel vs. time in channels) of the DO and DA experiments together with the instrument response function (IRF), with residuals and autocorrelation panels, and the mean distances recovered for the folded state and at successive refolding times.]

Time-resolved FRET in the double kinetics context. (A) The difference between the traces of the fluorescence decay [nanosecond time scale (ts)] of the donor emission in the absence of an acceptor (the DO experiment) and the presence of the acceptor (the DA experiment) is the result of the FRET mechanism, and it contains the information on the distribution of the distance between the two probes.
When conformational exchange within the ensemble of labeled protein molecules is slow relative to the lifetime of the excited state of the donor, the distribution of intramolecular distance, No(r), can be recovered by modeling i(t) using the expression:

$$ i(t) = k \int_0^{\infty} N_o(r)\, \exp\left[ -\frac{t}{\tau_D^{o}} \left( 1 + \frac{R_o^{6}}{r^{6}} \right) \right] dr $$

where k is a proportionality factor and \tau_D^{o}, the fluorescence lifetime of the donor in the absence of the acceptor, is obtained from the analysis of the DO trace. Knowledge of Ro and \tau_D^{o} enables the extraction of No(r) from the multi-exponential decay curve, i(t) (the DA trace). (B) When both the DA and the DO traces are collected in very short time intervals [short relative to the rate of change of conformations (ts << tc)], a series of distributions of intramolecular distance can be obtained.

In the following, we review FRET-based folding studies relevant to the loop hypothesis. We start by briefly reviewing equilibrium folding/unfolding studies and continue with the review of studies of the collapsed state. This is followed by steady-state and trFRET-detected folding kinetics experiments of double-labeled AK mutants designed for the detection of either long loop closure or folding of secondary structure elements, first by stopped flow and then by fast mixing double kinetics experiments.

Orevi et al.: The loop hypothesis of protein folding

Studies of the mechanism and kinetics of the folding transition

Equilibrium studies

BPTI (Figure 2) is a small, stable globular protein of 58 residues, including four lysine residues and three disulfide bonds [118, 153]. Distributions of the end-to-end distance of four segments of the BPTI backbone were determined [100]. Reduced BPTI in 6 mol/L of GndHCl is fully denatured [153, 154]. Yet, at least two sub-populations, one native-like and one unfolded-like, were distinguished in the distributions of end-to-end distances of each segment. The sizes of the two sub-populations were temperature and denaturant concentration dependent [119], i.e., transition to conditions more favorable to folding increased the native-like sub-population and reduced the unfolded one. The interpretation of these observations led to the suggestion that the native-like non-local contacts between the ends of the labeled segments form loop structures, which are the basic folding unit [119]. A similar effect was reported by Klein-Seetharaman et al. [155, 156] based on the NMR study of the denatured state of lysozyme. The specificity of the observed interactions is in agreement with Baldwin and Rose [157], who suggested that a specific stereochemical code for intramolecular interactions directs the folding transition.

Buckler et al. [158] and Navon et al. [159] applied the FRET-based experimental approach to the study of unfolded and partially folded states of reduced bovine ribonuclease A (RNase A) (124 residues). The distance distribution between the C-terminal residue (residue 124) and residue 76 (a 49-residue chain segment, representing a long C terminal loop) was another example in which two sub-populations were resolved. The distributions represent an equilibrium between native-like and unfolded-like distances between the ends of the loop under partially denaturing conditions. Interestingly, under the same conditions, but in phosphate buffer, the native-like sub-population was dominant. In that case, the C terminal loop structure of RNase A was closed by the phosphate ions, which bind in the active site and thereby cross-link residues 12, 19, and 119 (Figure 3).

Multiprobe FRET studies of the initial collapse transition

The introduction of FRET-detected single-molecule detection methods enabled the study of sub-populations of collapsed molecules at equilibrium with sub-populations of folded molecules.
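The expression for i(t) given in the text box above can be evaluated numerically to see how a distance distribution shapes the donor decay. The following is a minimal sketch, not the analysis code used in the reviewed studies: the Gaussian form of No(r), its mean and width, the donor lifetime of 5 ns, the value Ro = 24 Å, and all function names are illustrative assumptions.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoid-rule integral of y over x along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def da_decay(t, r, n_o, tau_d, r_o, k=1.0):
    """Donor decay in the presence of the acceptor:
    i(t) = k * integral of N_o(r) * exp(-(t / tau_d) * (1 + (r_o / r)**6)) dr
    """
    rate = (1.0 + (r_o / r) ** 6) / tau_d          # decay rate for each distance r
    integrand = n_o * np.exp(-np.outer(t, rate))   # shape (len(t), len(r))
    return k * trapezoid(integrand, r)

# Hypothetical inputs, chosen only for illustration.
r = np.linspace(5.0, 80.0, 751)                    # distance grid, angstrom
n_o = np.exp(-0.5 * ((r - 26.0) / 5.0) ** 2)       # assumed Gaussian N_o(r)
n_o /= trapezoid(n_o, r)                           # normalize to unit area
t = np.linspace(0.0, 20.0, 201)                    # time axis, ns
tau_d = 5.0                                        # assumed donor-only lifetime, ns
r_o = 24.0                                         # assumed Foerster constant, angstrom

i_da = da_decay(t, r, n_o, tau_d, r_o)             # DA trace (multi-exponential)
i_do = np.exp(-t / tau_d)                          # DO trace (mono-exponential here)
```

In an actual analysis the direction is reversed: the parameters of No(r) are varied until the computed DA trace, convolved with the instrument response, matches the measured one. The sketch only shows the forward model.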
Studying single molecules under partially folding conditions one by one, and then collecting those that show short intramolecular distances into an ensemble of collapsed molecules at equilibrium with the disordered ensemble, enables the mapping of intramolecular distances in the collapsed ensemble (reviewed by Haran [97], Schuler and Eaton [160], and Ferreon and Deniz [161]). The collapsed state of protein molecules was studied by many researchers at equilibrium under partial folding conditions in the presence of the unfolded ensemble. Single-molecule FRET was applied [150, 161–163], and the dimensions of the ensemble of collapsed molecules were compared with the results obtained by small-angle X-ray scattering [96, 164–166]. The FRET experiments showed that the dimensions of the ensemble of collapsed protein molecules are smaller than those of the ensemble of unfolded molecules but larger than those of the folded state [99]. The collapsed state was described as a globular state of proteins that is akin to the collapsed state of polymers, as it is predominantly disordered (reviewed by Haran [9] and Udgaonkar [99, 167–169]). However, in the ensembles of collapsed molecules collected at equilibrium under partial folding conditions, all early sub-domain structures that appear under folding conditions are already at least partially formed [99]. Thus, for the analysis of the folding mechanism starting with the ensemble of disordered molecules under folding conditions, ultrafast kinetics must be used. Rapid initiation of folding, combined with ultrafast collection of data to determine the distributions of intramolecular distances prior to the onset of fast partial stabilization of the first sub-domain structures, should enable the ensemble of collapsed molecules to be characterized. This ensemble is the starting point of the folding pathway. Fast collection of data at consecutive time points should enable the detection of the sequence of formation of sub-domain structures that are the leading building blocks of the folding mechanism and, hence, determine its path. To this end, we applied the "double kinetics" approach.

Figure 2. The backbone structure of BPTI and the pairs that were labeled in four different preparations. The sites of the four lysine residues that were labeled by the acceptor (15, 26, 41, and 46) as well as the N terminal residue where the donor was attached are shown. The closure of the N terminal loop is monitored by the measurements of the distribution of the distance between residues 1 and 26.

Figure 3. Effective cross-linking by a substrate-mimicking ligand enhances the closure of the C terminal loop of the RNase A backbone. The phosphate ions can keep the three residues, which are widely separated along the chain, in close proximity and thus might increase the probability of closure of the long C terminal loop labeled at residues 76 and 124 or 115. The coordinates for this drawing were taken from the crystal structure of a complex of RNase A with a phosphate ion (PDB file 5RSA).

Steady-state and time-resolved FRET-detected folding kinetics experiments

An ideal folding kinetics experiment would produce a time-dependent series of three-dimensional structures. These structures describe the transition of an ensemble of molecules from an unfolded state under folding conditions to the folded state. The path would probably start with a collection of all possible compact configurations and gradually reduce the number of populated conformations. Either a parallel continuous change at all segments or, more likely, partially ordered structures or folding initiation sites in various domains would become visible at different time points. Analysis of such an ideal experiment should reveal the order of structural transitions and the formation of structural elements along very broad and gradually narrowing pathways to the native state. Such a series of structures should enable the inference of the basic principles of the master plan of the folding mechanism. If such an ideal experiment could be combined with site-directed perturbation mutagenesis, it might also be possible to search for the "sequence signals", i.e., inter-residue interactions that stabilize and lock structural elements, either simultaneously or sequentially. Such experiments could enable the compilation of a "dictionary" that would relate the types of clusters of residues, either contiguous or non-contiguous, to the types of conformational events and structures, and the timing of their appearance during the folding transition.

A pioneer of the application of the distance dependence of the resonance energy transfer effect [170] is I.Z. Steinberg [171], who introduced the early applications of time-resolved FRET to biopolymer studies. A first application of intramolecular FRET in peptides was reported by Edelhoch et al. [172]. Steady-state and time-resolved FRET-detected kinetics experiments are far from the "ideal" experiment described previously, as they are unable to yield atomic resolution, are limited to a few pairs of sites, and involve modifications of the protein. This is a low-resolution measurement, but the arsenal of FRET experiments displays some unique qualities that justify the efforts required to produce site-specifically labeled protein samples. The goal of these experiments is the production of a time series of distributions of selected key intramolecular distances (e.g., distances between or internal to structural elements). Such series can enable the transient structures of folding intermediates to be characterized. Other factors that add to the unique strength of kinetic FRET experiments include the range of detected distances, the time resolution (sub-nanosecond), the interpretation of spectroscopic data in terms of intramolecular distances (10–100 Å, i.e., the dimensions of protein molecules), the capacity for real-time detection, and the ability to determine the transient distributions of distances. Both steady-state and time-resolved detection may be applied together with any method of fast initiation of the folding or unfolding transition, e.g., stopped flow, continuous flow, or T-jump. In general, we preferred to induce refolding from chemically denatured ensembles since, under such conditions, the number of residual structures is minimal and the probability of refolding starting from a short-lived fully disordered collapsed ensemble is high.

Steady-state FRET detection of the kinetics of folding

Elegant studies based on steady-state detected FRET monitoring of the kinetics of folding were reported, and we describe next some early representative examples. Chan et al. [173] used a FRET-detected ultrarapid-mixing continuous-flow method to study the sub-millisecond folding of chemically denatured cytochrome c. A fast collapse followed by a second folding transition was resolved. Another early FRET experiment was reported by Teilum et al. [115], who studied the early conformational events during the refolding of acyl-CoA binding protein, an 86-residue α-helical protein. Udgaonkar [99] is another pioneer of the application of FRET-detected folding kinetics experiments. Multiple distances were determined in model proteins to characterize the ensemble of collapsed molecules at the earliest possible time after the initiation of folding [99]. A study of the folding of barstar (89 residues) [99, 151] showed a fast reduction of the intersegmental distances in a small number of labeled pairs, while some other pairs showed only a slow transition to native intramolecular distances. Such heterogeneity is a hallmark of the formation of specific non-local contacts at some parts of the chain. Ultrafast (microsecond) formation of a small number of non-local interactions at the initiation of folding was reported for several systems [78, 80, 174–177]. Mirny and Shakhnovich [132] reviewed a number of folding kinetics studies and found that, in all cases, secondary interactions play, at most, a minor role in determining folding kinetics. Instead, strong and specific hydrophobic NLIs seem to dominate. A number of experiments demonstrated the role of clusters of non-polar residues in the formation of NLIs at the initial phases of folding transitions [91, 83, 178–181]. These experiments shed light on the mechanism by which specific hydrophobic contacts can rapidly form. The less specific force is the local hydrophobicity that directs parts of the chain to the interior, which, in turn, increases the probability of forming close contacts with other (non-local) hydrophobic regions. Then, more specific recognition is achieved on the basis of stereo-specific alignment between defined clusters of residues. These features are encoded in the linear sequence of the chain.

trFRET detection of rapid folding kinetics: the "double kinetics" experiment

The transient transfer efficiencies determined by steady-state detection of the fluorescence intensities of the donor or the acceptor probes report rapid changes in distances. But since the conformations found in ensembles of partially folded protein molecules are inherently heterogeneous, the mean transfer efficiency cannot be used to determine any meaningful mean distances. However, the mean and width of the distributions of distances in these rapidly changing ensembles of partially folded protein molecules can be determined by the rapid recording of time-resolved fluorescence decay curves of the probes. This may be achieved by the double kinetics experiment (see text box). The double kinetics [114, 115, 166, 182] folding/unfolding experiments combine the fast initiation of folding/unfolding transitions by a rapid change in solution conditions with the synchronized, rapid determination of fluorescence decay curves. The challenge here is twofold: first, to collect fluorescence decay curves with a sufficiently high signal-to-noise ratio to enable the determination of statistically significant parameters of the transient distribution of distances at each time point, No(r, t), and, second, to synchronize the refolding initiation mechanism with the pulsed laser source that excites the probes. The instruments that were developed enable the determination of the time series of transient distributions of the distance between pairs of probes attached to the ends of selected chain segments during the fast refolding transition. Two time regimes are involved in this experimental approach: the "chemical time regime" (tc) (microseconds to seconds), which is the duration of the conformational transition, and the "spectroscopic time regime" (ts), which is the nanosecond fluorescence decay of the probes. Combining this instrumental approach with the production of a series of protein samples, site-specifically labeled with donor and acceptor pairs, enables the characterization of the backbone fold and flexibility in transient intermediate states during the protein folding transitions.

Kimura et al. [183] used trFRET to determine the distance distributions of two loops/residue pairs in cytochrome c (125 residues), 150 µs after the refolding initiation.
It was found that one distance distribution (Trp32-heme) was native-like, while the distribution of the second pair (72-heme) was still unfolded. The double kinetics approach was applied by Matthews et al. [103, 117, 184, 185], who studied several different proteins, revealing the very early formation of structural elements and highlighting the role of the Ile, Leu, and Val residues in the formation of the intramolecular interactions. The early steps (30 µs) of the folding transition of the α subunit of tryptophan synthase [185] and cytochrome c were reported.

The ensemble of collapsed molecules

Many experiments show that the mean radius of the collapsed globule is about 30% larger than that of the same molecule in its fully folded state [95, 103, 165, 186]. Thus, almost half of its volume is occupied by solvent molecules. This implies that, in the collapsed state, few residue-residue interactions are effective, chain dynamics are not inhibited, and the probability of residue-residue encounters is enhanced. Remaining questions regarding this folding stage include identifying the key interactions that appear in the collapsed globule and contribute to the entry into the programmed pathway, and the residues that contribute them. FRET-detected fast kinetics experiments show that, under strongly stabilizing conditions, the initial non-specific collapse reaction is quickly followed by a structure-forming reaction that is completed within a millisecond or so of the folding and that leads to the formation of a partially structured and collapsed intermediate [99, 103, 117, 123, 126, 175, 184, 187, 188]. The microsecond folding kinetics detected by FRET and other probes was used to characterize the ensemble of fast collapsed disordered molecules, which include a few specific interactions [105, 113, 117, 151, 166, 184]. Barrier-limited chain contraction and specific sub-domain interactions, in particular by isoleucine, leucine, and valine (ILV) clusters, were observed upon transfer of the GndHCl denatured state ensemble to native-like conditions [166, 185, 189, 190]. An added value of the time-resolved FRET experiments is the ability to resolve subpopulations under conditions of fast exchange [106, 159, 191]. As suggested earlier [149], the collapsed molecules do not constitute a separate thermodynamic state, but belong to the disordered state in which a small number of specific interactions can be formed.

Estimation of the radius of the ensemble of collapsed globular protein molecules

Unlike the case of a fully unfolded polypeptide, where the end-to-end distance of chain segments depends on the number of residues (n) in each segment [192, 193], in the non-specifically collapsed ensemble, where non-specific monomer-monomer interactions exist, the mean end-to-end distances are constrained within the dimensions of the ensemble. Thus, it is expected that, in the initially collapsed ensemble, segments of very different numbers of residues could exhibit quite similar values of mean end-to-end distances. Sinha and Udgaonkar [151] studied the mean distance between nine pairs of sites in the collapsed ensemble of refolding barstar (an 89-residue protein) molecules and found that segments of 12–51 residues have approximate mean distances in the range 18–21 Å without a clear length dependence. Camacho and Thirumalai [194] found only a weak dependence of the probability of end-to-end interaction on chain length.

Escherichia coli adenylate kinase (AK) is a 214-residue, three-domain bacterial protein that catalyzes the transfer of a phosphoryl group between ATP and AMP [195–198]. We used this protein as a model for testing the loop hypothesis since it is a large protein in which site-specific labeling methods, combined with the strength of trFRET methods, enabled us to study the sequence of folding transitions of each domain and sub-domain structure in the context of the full-size molecule in situ. The native topology of AK includes seven long loops. The residues at the opposite termini of each loop-forming chain segment form a strong loop node, as judged by the number of distances that are <5 Å between the atoms of the contacting termini. Methods for the site-specific labeling of pairs of sites with probes that cause minimal structural perturbations were developed. Each labeled mutant was tested by a series of control experiments, including assessing enzymatic activity, far UV CD spectroscopy, and reversibility of unfolding/folding transitions [106, 199]. The experimental strategy is based on the preparation of protein mutants labeled at pairs of sites selected so that the distribution of the distance between them is sensitive to a specific sub-domain structural element, either a long loop or a secondary structure element. Each double-labeled mutant (donor and acceptor, "DA mutant") is accompanied by a second mutant of exactly the same sequence as that of the DA mutant, except that the site for the acceptor is blocked by acetamide (the "DO mutant"). This second mutant is used for reference measurements to determine the fluorescence decay of the donor in the absence of an acceptor under each experimental setup.

To estimate the size of the collapsed form of AK, we can choose the most compact structure of the AK molecule, that of its complex with the inhibitor Ap5A, as the starting structure [196]. That structure can be approximated as a sphere whose radius is 23 Å, leading to a prediction of a radius of 30 Å for the collapsed ensemble of AK molecules under folding conditions.
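The arithmetic behind this estimate, and the geometric factor used in the next step, can be checked with a short calculation. This is a sketch under simplifying assumptions that are not part of the original analysis: the folded molecule is treated as a solid sphere, the collapsed globule as a uniformly filled sphere with a 30% larger radius, and the function name, sample size, and random seed are arbitrary choices. The analytical mean distance between two random points in a sphere of radius R is 36R/35, which is about 1.03R.

```python
import numpy as np

def mean_pair_distance_in_sphere(radius, n_pairs=200_000, seed=0):
    """Monte Carlo estimate of the mean distance between two points
    drawn uniformly and independently inside a sphere."""
    rng = np.random.default_rng(seed)

    def sample(n):
        # Uniform direction from a normalized Gaussian vector;
        # radial coordinate u**(1/3) gives uniform density in volume.
        v = rng.normal(size=(n, 3))
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        u = rng.random(n) ** (1.0 / 3.0)
        return radius * u[:, None] * v

    a, b = sample(n_pairs), sample(n_pairs)
    return np.linalg.norm(a - b, axis=1).mean()

folded_radius = 23.0      # angstrom, spherical approximation of the AK-Ap5A complex
expansion = 1.3           # collapsed radius about 30% larger than the folded radius

collapsed_radius = folded_radius * expansion          # close to the quoted 30 angstrom
solvent_fraction = 1.0 - 1.0 / expansion ** 3         # roughly half of the globule volume
analytic_mean = 36.0 / 35.0 * collapsed_radius        # exact result for a uniform sphere
monte_carlo_mean = mean_pair_distance_in_sphere(collapsed_radius)

print(f"collapsed radius: {collapsed_radius:.1f} A")
print(f"solvent volume fraction: {solvent_fraction:.2f}")
print(f"mean pair distance: analytic {analytic_mean:.1f} A, sampled {monte_carlo_mean:.1f} A")
```

The sampled and analytical values agree to within sampling error, which supports using the globule radius itself as the reference value for a randomly collapsed segment.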
The mean distance between two points randomly located within a sphere is 1.03-fold larger than the radius of the sphere, i.e., almost identical to the sphere radius. Based on these estimations of the radius of the collapsed globule (30 Å), and taking into account that FRET-determined distances can be extended by 3 Å due to the size of the probes, any end-to-end distance that is smaller than 26 Å, for a chain segment that is longer than a few persistence lengths [200], can be considered as an indication of bias in the randomly collapsed ensemble and of specific interactions. Similarly, significantly larger mean distances are also indications of specific interactions that cause a deviation from the randomness of the collapsed ensemble. In the collapsed ensemble of the AK molecules, the length dependence of the segmental end-to-end distances for segments that are longer than approx. 15 residues should be very weak, and the distances are expected to be close to the radius of the collapsed globule. Ratner et al. [110] reported the transient end-to-end distance of three segments of the AK molecule at 5 ms after the transfer from denaturing to folding conditions. These segments, which were shown to fold only in the seconds time regime (much longer than 5 ms), had n = 15, 20, and 176 residues, and their corresponding mean end-to-end distances under folding conditions prior to the folding transition were 22, 25, and 26 Å, respectively.

Detection of the initial sub-domain structures in the collapsed ensemble of refolding molecules

The next step was a test of the loop hypothesis by searching for the rate of closure of long loops in the AK molecule using the FRET-detected double kinetics method. Equipped with the capacity to probe selected fast-changing distributions of distances even at nanosecond time intervals, we designed and measured the folding transition of a series of mutants, monitoring mainly putative loop closure and secondary structure elements. Our studies were focused on the ensemble of collapsed molecules, and, hence, even mutants in which the labeling procedures caused moderate perturbation detected by the control experiments could be used in this study since, in the mostly disordered ensembles of interest, such perturbations have no effect.

Stopped flow double kinetics studies

A measurement system developed by Ratner and Haas [114] was based on low-frequency (10 MHz) laser pulses and a fast digitizer oscilloscope. The time resolution of the spectroscopic time scale, ts, in this mode of double kinetics experiment is 250 ps. Up to 20 fluorescence decay curves can be measured with an acceptable signal-to-noise ratio using a single stopped-flow run. The time resolution was further enhanced by using a femtosecond laser source and averaging multiple emission pulses [113] (Figure 4). AK is fully disordered at 2 mol/L of GndHCl at pH 7. Upon transfer to the folding conditions (by fast dilution of the denaturant), the initial transient collapsed ensemble of AK conformers appears disordered and refolds to a native structure through an apparent (depending on the probe and detection method) two-state mechanism with a rate constant of 0.5 s⁻¹. At the end of the dead time of the stopped flow device, all the mutants showed a fast increase in the transfer efficiency and a correspondingly reduced mean distance. This is a clear manifestation of the expected non-specific fast collapse upon transfer to a poor solvent.

Figure 4. Instrument for the time-resolved FRET measurements in the stopped flow mode. The 297-nm third harmonic beam of the Ti:sapphire laser was used as the excitation source, operating with a repetition rate of 8 MHz. The excitation beam was focused at the center of the stopped flow cell along its long dimension. Emission was collected at 90°, filtered, and then focused onto the photocathode of the micro-channel plate photomultiplier tube. The signal was sampled at 40 GSa/s with 13 GHz of bandwidth. A digital delay generator was used for the triggering of each data acquisition sweep of the oscilloscope. A photodiode module was used to determine the time position of every excitation pulse.

trFRET determination of the dimensions of the collapsed ensemble of AK

Three mutant pairs were used to monitor the end-to-end distance of very long chain segments [131 residues (73–203); 176 residues (28–203), and 186 residues (18–203)] (Figure 5A). The last two segments include all three domains of the protein, but the labels are located in the CORE domain and report the intramolecular distances in that domain. In the collapsed ensemble, at the end of the dead time of the stopped flow device, the mean of the distribution of the intramolecular distances of the 131-residue segment was 26 ± 2 Å, in accordance with the expected radius of the ensemble of collapsed AK molecules. The distance between residues 18 and 203 was larger, although much smaller than their distance in the denatured state. The full width at half maximum (FWHM) of the distribution of this distance at the initial collapsed state was relatively small, which is an indication of a deviation from full disorder in the ensemble, i.e., the 5-ms collapsed ensemble was not randomly collapsed. It is possible that the spherical approximation is not valid for this state due to several specific interactions in the N terminal section of the chain. The mean of the distribution of the intramolecular distance between residues 28 and 203 in the collapsed ensemble was also larger than 26 Å and only 10% larger than that found for the native state ensemble. This is probably also an effect of the specific interactions of the fast closed loops of the CORE domain, as manifested by the relatively small width of the distributions in the collapsed ensemble. These results help establish a reference.

Two subpopulations

Analysis of the double kinetics experiment monitoring the donor fluorescence decay in the 18–203 mutants during the folding transition revealed a gradual change of proportion between the two subpopulations. As the folding proceeded, a sub-population with native-like intramolecular distance increased concomitantly with a gradual reduction of the sub-population characterized by a larger mean distance of the collapsed ensemble. At equilibrium, all AK molecules reached the native distance between residues 18 and 203 (Figure 6). The native ensemble has a high degree of order, which is reflected in the narrow intramolecular distance distribution characterized by a mean distance of 15.2 ± 0.4 Å and width (FWHM) of 8 ± 1 Å. The rate of the growth of the native subpopulation was 0.26 ± 0.06 s⁻¹, which is lower than the apparent rate constant obtained from the mean FRET efficiency (0.43 ± 0.01 s⁻¹). This difference is a typical result of the non-linearity of the distance dependence of the FRET efficiency. A small increase in the short distance sub-population causes a large increase in the mean transfer efficiency when the change brings the pair closer to Ro.

Figure 5. Ribbon diagram representing the fast- and slow-folding sub-domain loops and the secondary structure elements marked on the native-state chain fold of the backbone of the E. coli AK molecule (PDB ID code: 4AKE). (A) Secondary structure elements of the AK molecule (color coded). Those that remained disordered at 5 ms after the initiation of the mixing into the folding buffer (the dead time) are colored in black. Those that gained native-like mean end-to-end distance at the 5-ms detection time are colored green. Parts of the chain that were not tested are colored in gold. Most of the segments that formed secondary structures that constituted the core domain of AK were still disordered in the initial phase of the folding transition. (B) Pairs of sites that were separated by long chain segments that were labeled and studied by the stopped flow-based kinetics methods. (C) Seven loops of the native-state structure of the AK molecule that were labeled and used in the folding kinetics studies.

Fast closure of long loops in the AK CORE domain

Seven pairs of sites at the ends of sub-domain structures of interest were labeled in six sets of mutants comprising several groups (Table 1): (a) the N terminal loop [residues 1–24; Figure 5C (loop I)] and the AMPbind domain loop (loop II) (residues 29–73, Figure 5C), which are typical elementary long closed loop structures; (b) two loop elements (loop III, residues 1–86, and loop IV, residues 1–109; Figure 5C), which represent "merged" loop elements, i.e., long segments (>40 residues) whose ends are in close contact, stabilized by hydrophobic interactions, and that also enclose at least one elementary loop within their boundaries; (c) two loops associated with the LID domain (loop V, residues 121–155) and its extended version (loop VI, residues 113–169); and (d) a small loop included in loop IV (residues 66–95). Replacement of any one of the six N terminal residues of the AK backbone caused a loss of expression, which is an indication of their importance for the folding mechanism. Therefore, to label the N terminus of the AK molecule, we extended the polypeptide backbone using a four-residue insert (Met−4-Lys−3-Cys−2-Ala−1). The inserted Cys residue at position −2 was labeled in mutants designed to study the folding of the closed loop structures (I, III, and IV). A scheme of the N terminal loops of AK and the non-polar residues that may contribute interactions in the loop nodes is shown in Figure 7.
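The non-linear distance dependence of the FRET efficiency noted in the discussion of the two subpopulations can be illustrated with a two-state toy ensemble. This is a sketch only: it uses the standard Förster relation E = 1/(1 + (r/Ro)^6), and the two distances, the value of Ro, and the function names are hypothetical choices rather than fitted values from the AK experiments.

```python
import numpy as np

def fret_efficiency(r, r_o):
    """Foerster transfer efficiency for a single donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r_o) ** 6)

def distance_from_efficiency(e, r_o):
    """Distance that a single-distance model would assign to efficiency e."""
    return r_o * (1.0 / e - 1.0) ** (1.0 / 6.0)

r_o = 24.0                         # assumed Foerster constant, angstrom
r_native, r_collapsed = 15.0, 32.0 # hypothetical subpopulation distances, angstrom

# Fraction of molecules in the native-like subpopulation, from 0 to 1.
native_fraction = np.linspace(0.0, 1.0, 11)

# Ensemble-averaged efficiency, as a steady-state measurement would report it.
mean_efficiency = (native_fraction * fret_efficiency(r_native, r_o)
                   + (1.0 - native_fraction) * fret_efficiency(r_collapsed, r_o))

true_mean_distance = native_fraction * r_native + (1.0 - native_fraction) * r_collapsed
apparent_distance = distance_from_efficiency(mean_efficiency, r_o)

for f, e, d_true, d_app in zip(native_fraction, mean_efficiency,
                               true_mean_distance, apparent_distance):
    print(f"native fraction {f:.1f}: <E> = {e:.3f}, "
          f"mean r = {d_true:.1f} A, apparent r = {d_app:.1f} A")
```

For any mixed population the distance inferred from the mean efficiency differs from the true mean distance, which is why the distributions recovered from the decay curves, rather than the mean efficiency alone, are used to follow the subpopulations.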
Kinetics of sub-domain transitions The closure of the seven labeled loops in the AK molecule was monitored by the determination of the mean FRET efficiency, (t) in the stopped flow device. The closure , of the two N terminal loops (I and II) was also studied Orevi et al.: The loop hypothesis of protein folding183 A 0.06 0.04 p (r) 0.02 0 0.03 0.02 p (r) 5 ms: collapsed state 30% 1.5 s: appearance of native population 70% C 0.04 0.03 p (r) 0.02 55% 0.01 0 3 s: partial folding 45% D 0.06 0.04 p (r) 0.02 6 s: advanced folding 80% 20% p (r) 0 0.1 Equilibrium 0.05 40 60 Distance, Å Figure 6The relative population (percent) of the native ensemble of the mutant labeled at residues 18 and 203 as a function of time after the initiation of refolding. Fitting to a mono-exponential function (y +be­kt) gave a rate =a constant of 0.26 .06 s­1 for the transition from collapsed to native ±0 ensemble. The uncertainty in the value of the folded ensemble parameter at 1.5, 3, and 6 s was %. ±4 i.e., in the seconds time regime. For loop IV, which is a long "merged" loop containing the three fast closing N terminal loops, a biphasic time course was observed, i.e., a fast partial closure synchronized with the closure of the included N terminal loops, followed by a slow full closure to native-like end-to-end distance. This pattern of fast closure of only a few loops and a slow closure of others is a strong indication for the specificity in the folding pathway and of a hierarchic folding "plan". In loops I and II, which were closed first, the number of residues between the labeled sites were 26 and 44, respectively. It is reasonable to assume that, in a fully disordered but collapsed ensemble, the corresponding mean of the distributions of segment end-to-end distances would also be in the range of 26 Å. 
by the double kinetics experiment (Figures 8 and 9).

On the millisecond time scale, the mean transfer efficiency ⟨E⟩(t) between the ends of the three N terminal loops, I, II, and III, was already native-like within the dead time of the stopped flow device [111, 201]. In sharp contrast, the FRET efficiency between the ends of loops V, VI, and VII increased at a much slower rate, comparable to the rate of the global folding transition.

Microsecond kinetics

Preliminary results of double kinetics experiments based on a continuous mixing device, performed in the laboratory of Bilsel et al. [112], revealed microsecond kinetics of closure of loops I and II. Within 60 μs, both loops were close to the native end-to-end distance, and within 250 μs, full closure of the loop was observed.

The rapidly closed loops are associated with the N terminal section of the CORE domain, and it is therefore reasonable to assume that they contribute to the formation of a non-contiguous folding nucleus. Loop I includes the cluster of N-terminal non-polar residues (1–6), which form several non-local interactions in the native state and thus seem to be the "king pin" of the native structure of the CORE domain. Several attempts to produce foldable AK mutants in which residues 2, 3, or 4 were replaced by Cys, Trp, or Ala failed. These are further indications that the residues responsible for the closure of loop I, which are also connected to loops III and IV, are essential for the folding of the whole molecule, probably by involvement in the folding initiation step as an apparent chain folding initiation site [202]. The slow closure of loops V and VI, which include the LID domain, shows that this domain is not natively folded at the completion of the fast mixing. The kinetics of change of FRET efficiency between the ends of three labeled chain segments, which included sections of loop II and additional chain sections at its C terminal end, were monitored (Table 1). These segments, which included part of loop II, reached native-like mean transfer efficiency, ⟨E⟩(t), only at the rate-limiting step, i.e., slowly. That is an indication that the internal sections of loop II are not folded prior to the rate-limiting step. Internal segments in the closed loops seem to be partially disordered.

Table 1 Pairs of sites that were labeled in the AK molecule and monitored by folding kinetics experiments.a
Columns: structural element; residues labeled; time of transition to native mean distance or FRET efficiencyb
Loops
Loop I (residues 1–26); −2 and 24; <200 μs
Loop II (residues 29–73); 28 and 71; <200 μs
Merged loop III (residues 1–75); −2 and 75; <5 ms
Extended loop IV (1–102); −2 and 102; Partial fast (<5 ms) closure
Loop V, the LID domain (121–155); 121 and 155; Slow
Loop VI, "extended" loop V; 113 and 169; Slow
Loop VII, internal section of loop IV; 66 and 95; Slow
Secondary structure elements
Helix 8; 169 and 188; Slow at the cooperative folding transition
Strand 9 (192–198); 188 and 203; Slow
Strand 1 (1–6); −2 and 8; Slow
Strand 3 (79–85); 79 and 86; Slow
Strand 4 (104–110); 104 and 109; Slow
Long segments incorporating the three domains
Overall dimensions (186 residues); 18 and 203
Overall dimensions (176 residues); 28 and 203
Overall dimensions (131 residues); 73 and 203
Section of loop II; 28 and 86
Section of loop II; 58–86
Transition entries for the five long-segment rows, in source order: Slow two-state transition; Close to native in the collapse; Slow; Slow
Remaining columns (mean distance in the collapsed ensemble, Åc; number of residues in the labeled segmentd), values in source order: 20; 17; 25 ± 1; 22 ± 1; 19.4 ± 0.8; >30 Å; 26 ± 2 Å; >30 Å; 26 ± 2 Å; nd
aThis table summarizes the results of many experiments.
b"Slow" indicates that the transition to the native value of the parameters is in the seconds time regime.
cMean distance between the ends of the labeled segment in the collapsed state at the end of the dead time of the mixing (either stopped flow or continuous mixing), available for the mutants studied by the double kinetics experiment. For the other mutants, the rate of folding was deduced only from the rate of change of the mean FRET efficiency.
dNumber of residues in the labeled chain segment.

[Figure 7 panel labels: Loop I (N terminal), Loop II, Loop III, Loop IV. Panel C sequence: MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAKDIMDAGKLVTDELVIALVKERIAQEDCRNGFLLDGFPRTIPQADAMKEAGINVDYVLEFDV]
Figure 7 Schematic representation of the loop structures in the CORE domain of the AK molecule. (A) The backbone fold of the AK molecule (PDB 4AKE). The hydrophobic clusters of residues involved in the formation of the non-local contacts that form the nodes of the closed loops are colored. (B) Schematic representation of the native structure loops formed by the chain segments that connect the interacting clusters. (C) The amino acid sequence of the N terminal section of the AK molecule, highlighting the clusters of hydrophobic residues that form the loop nodes [color coded as in (A) and (B)].

Kinetics of folding of secondary structure elements in the AK molecule

In order to test the loop hypothesis, we asked whether long loop closure in the CORE domain of the AK molecule depends on the earlier formation of secondary structure elements. Five secondary structure elements in the CORE domain were labeled (Table 1), and the folding kinetics were studied by the FRET-detected stopped flow system.

[Figure 8 plot: P(r) versus distance (Å); traces at 5 ms, 0.5 s, 1 s, 1.5 s, 3 s, 6 s, 9 s, and 5 min, and for the unfolded state.]
Figure 8 Intra-molecular distance distribution between residues 71 and 28 during refolding and at equilibrium.
The broken black line indicates the distance distribution under denaturing conditions (2 mol/L of GndHCl) measured with the tryptophan and coumarin FRET pair (R0 = 24 Å). The mean and FWHM in the denatured state were 38 and 34 Å, respectively. The blue and black solid lines mark the 5 ms transient distribution and the distribution at 5 min after mixing, respectively.

All of them were disordered at the initial phase of the folding pathway [110, 152]. Analysis of the trFRET double kinetics experiments showed the shift of the mean end-to-end distance of the strands and the rate of reduction of the FWHM of the distance distributions. Strands that were included in long closed loops (e.g., 1 and 3) were disordered, while the loops (I and III) were closed. This was also the case for strand 4, which forms the C terminal side of the node of loop IV (with strand 1 on the N terminal side). Thus, the closure of loops I, III, and IV is not dependent on the secondary structures of their nodes. The segment forming the long helix 8 (20 residues, 170–189) was also fully disordered at 5 ms and changed to the native end-to-end distance at the rate of the cooperative folding transition of the AK molecule [110]. Strand 9, the fifth (C terminal) strand of the (parallel) sheet structure that dominates the CORE domain of the AK molecule (labeled at residues 188 and 203), also formed only in the slow cooperative folding transition. Thus, we can conclude that, at least in some cases, loop nodes were formed independently of the formation of secondary structure elements, particularly in the case of strands.

The aforementioned results provide strong evidence in support of the loop hypothesis. Yet, further tests must be performed in order to firmly establish the role of the early closed long loop in the initiation and propagation of the folding pathway of the AK molecule. In preliminary results, replacement of non-polar residues at the termini of loop II resulted in a reduced rate of loop closure. When residue Leu35 was replaced by Ala, the protein was destabilized and the yield of folding was extremely poor. The loss of protein expression following all attempts to mutate residues 2–4 is another possible example of the effect of loss of hydrophobic interactions in the loop nodes. Our results are in good agreement with the sequential collapse model for protein folding [203, 204] and several theoretical works [21, 43, 205, 206]. The overview of the FRET-detected AK folding experiments reviewed here portrays a sequence of sub-domain and global transitions that form a hierarchically ordered pathway spread over four (or more) time scales, from microseconds to seconds (Figure 10).

[Figure 9 plots: (A) photon counts versus time (ns); (B, C) probability versus distance (Å); traces for the unfolded state and at 15 ms, 1.5 s, 3.0 s, 4.5 s, and 5 min.]
Figure 9 A series of transient segmental end-to-end distance distributions monitoring the progress of folding of the N terminal loop I and strand 3, obtained by trFRET measurement in the double kinetics experiment. (A) A series of fluorescence decay curves of the tryptophan residue 24 in the absence of an acceptor at residue −2. Each curve was collected during a 2-ms time interval at the predetermined refolding time point as indicated by the color code. The black traces represent the best-fit calculated decay curve. The gradual reduction of the mean fluorescence lifetime of the probe, which reflects local structural changes, is in contrast to the immediate change of the distance between the ends of loop I (loop closure), which was found by global analysis of the series of DO and DA experiments. (B) A series of transient distributions of the distance between the ends of the segment forming strand 3, obtained by global analysis of the DO and DA double kinetics experiment of the mutant AK [24, 79]. The distance distribution of the 15-ms species fully overlaps with that obtained at equilibrium under unfolding conditions. The narrow native-like end-to-end distance distribution characteristic of the extended conformation of the native strand structure was completed only 3 s after the refolding initiation. (C) Series of transient distributions of the distance between the labeled ends of the N terminal loop I, obtained by global analysis of the transient fluorescence decay curves of the Trp residue in the DO and DA mutants by means of the double kinetics experiment. The 15-ms distribution fully overlaps with the native fold distribution. Thus, this loop is closed within or immediately after the initiation of folding. The shift of the mean of the distribution to longer distance and the reduction of the corresponding width values at intermediate time points indicate that the early formed closed loop structure later deforms during the slow ordering of the chain segments, which seem to perturb the loop closure interaction.

Figure 10 Partial map of the folding pathway of AK. The folding of E. coli adenylate kinase (AK) included transitions whose rates differed by six orders of magnitude. The denatured ensemble (D), assumed to be fully disordered, collapsed at a sub-microsecond (?) time scale to a disordered compact globule (C). Closure of loops I and II formed I1. This was followed by additional loop closure (cross-links) and formation of secondary structure sub-domain elements on the millisecond-to-second time scale. The rate-determining step led to the native state (N).

Concluding remarks

The protein folding problem will be considered "solved" when we are able to "read" genes, i.e., to predict the native fold of the proteins, their dynamics, and the mechanism of fast folding, based simply on primary amino acid sequence. The long-term goal in folding studies is the development of an algorithm that would simulate the stepwise mechanism of folding, which constrains the conformational space in which a random search for stable interactions is possible. Our working hypothesis is that the fast and efficient folding transition is based on the ordered formation of sub-domain structures that are "instructed" by sequence signals in terms of intramolecular interactions between clusters of residues, and between them and the solvent.

The very long term goal of our studies is to create a "dictionary" of generic sequence clusters that can form loop nodes by non-local interactions and to define the order of formation of their interactions during the search for the native combination of interactions. This dictionary could be part of a database and used as part of the folding prediction algorithm. Here we focused on only a single element in the complex problem, the structural constraints generated by the loop closure and the secondary structure, and applied trFRET-based methods to capture the timing of specific steps at the initiation of the folding transition. The results reviewed here demonstrate the power of the FRET-based methods in studying sub-domain transitions in situ. Non-local contacts were detected by other methods (e.g., hydrogen exchange [207–210]), but FRET measurements are unique in their ability to directly yield distributions of intramolecular distances while the rest of the molecule is "transparent". We believe that this approach, or other methods with the ability to characterize the distribution of intramolecular distances at selected sections of the protein backbone, together with ultrafast data collection, should be an essential component of the arsenal of protein folding research. The kinetics of formation of specific key sub-domain structures should be probed directly, in situ, on the background of the rest of the chain, where mutual influences exist.

The possible significance of the loop closure effect in folding in vivo is an intriguing question that might be addressed with currently available FRET and time-resolved fluorescence methods [77, 211–214]. Co-translational folding of nascent polypeptides on the ribosome is the subject of a major field of current research. N terminal loops can fold on the ribosome or upon emergence and thus might have a major role in the folding and trafficking of proteins.

The ideal experiment should be able to monitor multiple intramolecular distances in all parts of the molecule, in situ, with sub-microsecond time resolution and with a spatial resolution of single angstrom units. Moreover, since the folding transition starts from an ensemble of a very large number of conformations, the early folding transitions should be studied, among other means, by methods that can yield probability distributions of selected intramolecular distances that report the folding status of key segment structures and their development from the moment of transition to folding conditions. For that reason, we chose to apply and further develop the trFRET-based methods and, in particular, the double kinetics method. A major task for the near future is enhancement of the resolution of the "chemical time base" (by faster initiation of the refolding transition and by reduction of the instrument dead time) for routine multiple double kinetics experiments. New devices have been developed [215], and we hope that they will become more practical for routine applications.

The results reviewed here, obtained mainly from studying just a few model proteins, support the counterintuitive mechanism whereby the NLIs are effective in the initiation of the folding pathways.
Alternatively, it is possible that there are two types of folding mechanism, one in which NLIs are dominant and the other in which the LIs predominate. This was suggested by Daggett and Fersht [57] and is supported by several results reviewed here. It is reasonable to assume that a combination of both local and non-local interactions is effective at the early steps of folding of many proteins. The challenge for future work is to develop the experimental approach that will enable the determination of the relative contribution of both types of interaction, the exact timing of the formation of each interaction at high resolution, the specificity of selected early non-local interactions, and the extent of inter-dependence of selected early interactions. We suggest that mapping multiple sub-domain structural transitions during the refolding transition of many proteins will refine the conclusions and help reveal some common principles of the initiation of folding, enabling the first steps towards the preparation of a folding "dictionary". In order to achieve this goal, the trFRET measurements should be combined with mutagenesis experiments in which the role of selected residue clusters will be tested by perturbation mutations. A positive test for the loop closure capacity of node-forming sequence clusters can be achieved by measurements of the distribution of end-to-end distance of flexible polypeptides whose ends include such clusters of residues. Yet, we must remember that the solution to the protein folding problem depends on many more approaches, both experimental and theoretical, while the approach presented here is only a small portion of the big puzzle.

Acknowledgments: We are grateful to Mr. E. Zimerman and D. Freedman for excellent technical assistance. We are grateful to Eldad Ben Ishay, Eitan Lerner, and Asaf Grupi for their contributions and discussions.
Author contributions: All the authors have accepted the responsibility for the entire content of this submitted manuscript and approved the submission.
Research funding: This study was supported by grants from the Israel Science Foundation (ISF1464/10 and the I-CORE 1902/12), the EU Marie Curie TOK grant (29936), the US-Israel Binational Science Foundation (BSF 2011143), and by the Damadian Center for Magnetic Resonance Research, Bar-Ilan University.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

dopsis thaliana cytochrome c(6A). Biochim Biophys Acta 2012;1824:311–8.
21. Berezovsky IN, Trifonov EN. Loop fold nature of globular proteins. Protein Eng 2001;14:403–7.
22. Berezovsky IN, Trifonov EN. Van der Waals locks: loop-n-lock structure of globular proteins. J Mol Biol 2001;307:1419–26.
23. Hubner IA, Edmonds KA, Shakhnovich EI. Nucleation and the transition state of the SH3 domain. J Mol Biol 2005;349:424–34.
24. Lindorff-Larsen K, Vendruscolo M, Paci E, Dobson CM. Transition states for protein folding have native topologies despite high structural variability. Nat Struct Mol Biol 2004;11:443–9.
25. Lindorff-Larsen K, Rogen P, Paci E, Vendruscolo M, Dobson CM. Protein folding and the organization of the protein topology universe. Trends Biochem Sci 2005;30:13–9.
26. Paci E, Clarke J, Steward A, Vendruscolo M, Karplus M. Self-consistent determination of the transition state for protein folding: application to a fibronectin type III domain. Proc Natl Acad Sci USA 2003;100:394–9.
27. Geierhaas CD, Paci E, Vendruscolo M, Clarke J. Comparison of the transition states for folding of two Ig-like proteins from different superfamilies. J Mol Biol 2004;343:1111–23.
28. Lappalainen I, Hurley MG, Clarke J. Plasticity within the obligatory folding nucleus of an immunoglobulin-like domain. J Mol Biol 2008;375:547–59.
29. Sosnick TR, Dothager RS, Krantz BA. Differences in the folding transition state of ubiquitin indicated by phi and psi analyses. Proc Natl Acad Sci USA 2004;101:17377–82.
30. Krantz BA, Dothager RS, Sosnick TR. Discerning the structure and energy of multiple transition states in protein folding using psi-analysis. J Mol Biol 2004;337:463–75.
31. Tsong TY, Hu CK, Wu MC. Hydrophobic condensation and modular assembly model of protein folding. Biosystems 2008;93:78–89.
32. Fulton KF, Main ER, Daggett V, Jackson SE. Mapping the interactions present in the transition state for unfolding/folding of FKBP12. J Mol Biol 1999;291:445–61.
33. Samatova EN, Katina NS, Balobanov VA, Melnik BS, Dolgikh DA, Bychkova VE, et al. How strong are side chain interactions in the folding intermediate? Protein Sci 2009;18:2152–9.
34. Rader AJ, Yennamalli RM, Harter AK, Sen TZ. A rigid network of long-range contacts increases thermostability in a mutant endoglucanase. J Biomol Struct Dyn 2012;30:628–37.
35. Go N, Taketomi H. Respective roles of short- and long-range interactions in protein folding. Proc Natl Acad Sci USA 1978;75:559–63.
36. Taketomi H, Ueda Y, Go N. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int J Pept Protein Res 1975;7:445–59.
37. Abkevich VI, Gutin AM, Shakhnovich EI. Impact of local and nonlocal interactions on thermodynamics and kinetics of protein folding. J Mol Biol 1995;252:460–71.
38. Dokholyan NV, Buldyrev SV, Stanley HE, Shakhnovich EI. Identifying the protein folding nucleus using molecular dynamics. J Mol Biol 2000;296:1183–8.
39. Hubner IA, Oliveberg M, Shakhnovich EI. Simulation, experiment, and evolution: understanding nucleation in protein S6 folding. Proc Natl Acad Sci USA 2004;101:8354–9.
40. Hubner IA, Shimada J, Shakhnovich EI. Commitment and nucleation in the protein G transition state. J Mol Biol 2004;336:745–61.

Bio-Algorithms and Med-Systems, de Gruyter

Fast closure of long loops at the initiation of the folding transition of globular proteins studied by time-resolved FRET-based methods

Publisher
de Gruyter
Copyright
Copyright © 2014 by the
ISSN
1895-9091
eISSN
1896-530X
DOI
10.1515/bams-2014-0018

Abstract

The protein folding problem will be considered "solved" when it becomes possible to "read genes", i.e., to predict the native fold of proteins, their dynamics, and the mechanism of fast folding based solely on sequence data. The long-term goal should be the creation of an algorithm that would simulate the stepwise mechanism of folding, which constrains the conformational space and in which random search for stable interactions is possible. Here, we focus attention on the initial phases of the folding transition starting with the compact disordered collapsed ensemble, in search of the initial sub-domain structural biases that direct the otherwise stochastic dynamics of the backbone. Our studies are designed to test the "loop hypothesis", which suggests that fast closure of long loop structures by non-local interactions between clusters of mainly non-polar residues is an essential conformational step at the initiation of the folding transition of globular proteins. We developed and applied experimental methods based on time-resolved resonance excitation energy transfer (trFRET) measurements combined with fast mixing methods and studied the initial phases of the folding of Escherichia coli adenylate kinase (AK). A series of AK mutants were prepared, in which the ends of selected backbone segments that form long closed loops or secondary structure elements were labeled by donors and acceptors of excitation energy. The end-to-end distance distributions of such segments were determined under equilibrium and during the fast folding transitions. These experiments show that three out of seven long loops that were labeled in the AK molecule are closed very early in the transition. The N terminal 26-residue loop (loop I) is closed in <200 μs after the initiation of folding, while the β strand included in loop I is still disordered. The closure of the second 44-residue loop (loop II, which starts at the end of loop I) is also complete within <300 μs. Four other loops as well as five secondary structures of the CORE domain of AK (an α helix and four β strands) are formed at a late step, at a rate of 0.5 ± 0.3 s⁻¹, the rate of the cooperative folding of the molecule. These experiments reveal a hierarchically ordered pathway of folding of the AK molecule, ranging from microseconds to seconds. The results reviewed here, obtained mainly from studying a small number of model proteins, support the counterintuitive mechanism whereby non-local interactions are effective in the initiation of the folding pathways. The experiments presented demonstrate the importance of mapping the rates of sub-domain structural transitions along the folding transition, in situ, in the context of the other sections of the chain, whether folded or disordered. These experiments also show the power of the time-resolved FRET measurements in achieving this goal. A large body of data obtained by theoretical and experimental studies that support, or can accommodate, the loop hypothesis is reviewed. We suggest that mapping multiple sub-domain structural transitions during the refolding transition of many proteins using the approach presented here will refine the conclusions and help reveal some common principles of the initiation of the folding. To achieve this goal, the trFRET measurements should be combined with mutagenesis experiments where the role of selected residue clusters will be tested by perturbation mutations. Nevertheless, the solution of the protein folding problem depends on the application of many additional approaches, both experimental and theoretical, while the approach presented here is only a small section of the big puzzle.

Keywords: FRET; globular proteins; long loops; loop closure; protein folding.

DOI 10.1515/bams-2014-0018
Received September 18, 2014; accepted October 24, 2014

*Corresponding author: Elisha Haas, The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan 52900, Israel, Phone: +972 546270012, E-mail: Elisha.haas@biu.ac.il
Tomer Orevi, Gil Rahamim, Sivan Shemesh, Eldad Ben Ishay and Dan Amir: The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel

Introduction

The fast and efficient transition of an ensemble of disordered proteins under physiological conditions to a much more compact and uniform ensemble of folded molecules has been the subject of numerous studies for more than seven decades and is still not fully understood. An ensemble of disordered protein molecules, which populates a large conformational space and undergoes fast transition to an ensemble of ordered molecules, explores a very large number of pathways, due to the stochastic nature of the search. Yet, the folding transition is fast and efficient, and all the pathways converge to an ensemble of ordered molecules that occupy only a very small portion of the conformational space. This led to the hypothesis that the mechanism of the folding transition of globular proteins is not a random search (the thermodynamic hypothesis), but rather a programmed hierarchical sequence of sub-domain or domain transitions that gradually constrains the conformational space available to the polypeptide chain. The "thermodynamic hypothesis" [1–8] suggests that, in principle, one should be able to predict the product of the conformational search (i.e., the folded conformation) by searching for the free energy minimum for a given system of protein plus solvent [3–8]. However, the high rate and efficiency of the folding transition are not explained by this approach.
This led to the hypothesis that it should be possible to characterize transient ensembles of partially ordered molecules in which specific molecular interactions are effective and specific structural features (whether native-like or nonnative-like) are formed. The assumption was that a series of such partial transitions forms a pathway that should be characteristic of the protein; a search for such generalized principles is still ongoing. The underlying assumption was that the mechanism of folding is hierarchical and that the sequence information encoded in each globular protein includes a set of "instructions" for the construction of the folded states through the gradual formation of sub-domain structures. Thus, the genetic information should be viewed as a "blue print" for the construction process, and, hence, the challenge is to decipher the relationship between the sequence information and the specific steps in the transition process. The challenge inherent in this approach is the need to characterize transient ensembles that include a wide range of conformations, with large variance of structural characteristics; each of these conformational states is poorly populated and very short lived. The common methods, relying on the determination of population averages of structural characteristics, are insufficient, and methods for very fast capture of the mean and the variance of each measured structural characteristic should be developed. Historically, inspired by the chemical concepts of reaction intermediates, it was assumed that the detection of kinetic intermediates along the pathway [9] and residual structures [10] should reveal some principles of the folding mechanism [9, 11, 12]. Metastable intermediates, however, may actually slow the folding transition and might not be the correct target for the search [13, 14]. When considering the possible mechanisms of stepwise structure formation, the "bottom-up" model seems intuitively to be the first choice [11, 15]. 
In this model, local interactions (LIs) between near-neighbor residues along the chain are assumed to first form and stabilize short structural elements, such as secondary structures. Those elements then coalesce to form a higher level of contacts. The entropy change resulting from such a transition is relatively small, and the probability of formation of pairwise interactions internal to short chain segments is high. An alternative model is based on the hypothesis that the dominant interactions that are essential for the early determination of the folding pathway, enabling rapid and early elimination of major sections of the conformational space, are non-local interactions (NLIs) between monomers separated by segments of the chain. This model is counterintuitive, since the probability of interaction between widely separated sites is lower than that of near neighbors, and, thus, these NLIs should be harder to form than LIs. The relative importance of local vs. non-local interactions in determining the mechanism and rate of folding has been investigated for many years based on experiments, simulations, and theoretical considerations [16–18]. Globular proteins are very compact, and, hence, contact between residues that are separated by long chain segments is a common theme of their folded structures. Furthermore, unlike the coil-to-globule transition of homopolymers, the dependence of inter-residue distances on the segment length, n, in the native structure of globular proteins is very weak. Domains of globular proteins are strongly cross-linked by NLIs. NLIs can be contributed by specific interactions between clusters of residues or by third-party cross-linkers such as ligands [19] that form specific interactions with non-contiguous sections of the backbone. Disulfide bonds are another example of cross-linking that accelerates refolding by reduction of the entropy of the disordered state [20]. 
Yet, covalent crosslinking is a slow reaction, and during the initial folding of most proteins, the folded state is maintained by a large number of non-covalent NLIs. It is therefore reasonable to assume that NLIs are also instrumental in the mechanism responsible for the fast and efficient protein folding transition. Berezovsky and Trifonov [21, 22] showed that protein structure can be viewed as a compact linear array of closed loops. Furthermore, they proposed that protein folding occurs through consecutive looping of the chain, with the loops ending primarily at hydrophobic nuclei. The role of non-local interactions in the early phases of folding pathways was described in many studies [23–34]. The view that non-local contacts can be effective early in the folding transition was first suggested based on simple lattice-based protein folding simulations [35, 36]. Using a cubic lattice simulation, Abkevich et al. [37] found that NLIs enhance the rate of folding. Molecular dynamics and a Monte Carlo Go model simulation were used to examine the events that lead to the formation of the transition state ensemble (TSE) [38–40]. Simulations of the folding transition by several groups showed the initiation of the pathway by NLIs [41, 42]. Zhang and Chan [43] performed explicit-chain simulations using coarse-grained chain models of natural proteins and computed the transient distributions of conformations sampled along the trajectories of many molecules. They found that conformations in the initial phases of the faster pathways were enriched with NLIs. Juraszek and Bolhuis [42] found that, in the folding of the small protein Trp-cage, 80% of the pathways are initiated by NLIs, while secondary structures appear later. Chomilier et al. [44–46] used native fold analysis and simulations to develop the concept of closed loops as early folding structures and to elucidate the mechanism of folding [44]. 
An algorithm was developed for the identification of the residues involved in the loop closure interaction, which were named most interacting residues (MIR). Segments that form loops in the folded state were identified [47]. The importance of the MIR elements was further validated by the simulation of mutations in these sequences and by the correlation with the known stabilities of 385 proteins with published stability data [48]. These studies of the MIR elements, which are located at the ends of long loops, strongly support the loop hypothesis. Insights into the relative roles of local and global interactions in determining folding mechanisms and rates were recently obtained by Peter and Shea [49]. The folding kinetics of a small protein, Trp-cage, was simulated in two pathways (with either the secondary or the tertiary structure formation as the rate-determining step), and it was concluded that the competition between the secondary and the tertiary structure formation affects the folding rate. However, many other studies supported the counterhypothesis that the folding mechanism is dominated by LIs, while the non-local ones provide non-specific stabilization of the compact conformers [6, 50–58]. A large number of studies provided evidence for a dominant role of LIs in the initiation of the folding transition. On the basis of entropic considerations, Rose et al. [59–62] suggested that helix and strand formation are guiding events in protein folding. Folding prediction by a fragment assembly mechanism, which is based mainly on early formed LIs, was successfully used by many groups [63–71]. The success of this approach strongly supports the importance of early formed LIs in stabilizing the final fold of small proteins. In these models, the NLIs are assumed to have a lesser role and can be non-specific. In the zipping and assembly (ZA) mechanism suggested by Dill et al. 
[53], local structuring happens first at independent sites along the chain, then those structures either grow (zip) or coalesce (assemble) with other structures [72–74]. Dill et al. [53] argue that the ZA mechanism provides a model for the physical pathways of protein folding. The fragment assembly (FA) (or ZA) models for protein folding predictions show impressive success when applied to small proteins. However, does the success of the FA methods mean that they predict the folding routes? This should be tested by experiments, e.g., where the kinetics of each one of multiple sub-domain elements of model proteins will be studied in situ, during the time leading to the formation of the TSE. Daggett and Fersht [57, 75] conclude that only when the propensity of stable secondary structures is high do such structures form first, followed by their assembly. But in the majority of proteins, the unstable secondary structures are stabilized by NLIs.

The interplay of local and non-local interactions in the folding transition

The interplay between non-local hydrophobic contacts and local secondary structure was the subject of a series of reports on several well-studied model proteins. Numerous works showed that local and non-local contacts are critical for proper folding in both hierarchical and nucleation-condensation folding models. Studies of various mutants of apomyoglobin from different sources were reported by several groups, where contributions of both local and non-local interactions were found [33, 76–80]. Numerous studies of the folding of SH3 domains showed a similar sequence of events where non-local contacts lead to the formation of a nucleus [23–25, 81–85], and the context dependence of secondary structure formation was also shown [86]. Folding transitions where non-local hydrophobic interactions stabilize secondary structure elements were reported for several proteins [87–90]. 
The early steps of folding of dihydrofolate reductase were shown to involve stabilization of secondary structures by hydrophobic contacts [91]. A systematic study of mutants of an immunoglobulin V(L) domain suggests that local and non-local interactions contribute to the stability by an approximately equal amount, but that local interactions stabilize by increasing the resistance to denaturation, while non-local interactions increase folding cooperativity [92]. The question of the roles of specific LIs and NLIs, their relative importance at the initiation of folding or in determining the direction and rate of the folding transition, and their mutual dependence remains open and calls for more experiments and theoretical studies. The focus of the current review is the search for the role of local and non-local interactions in the earliest detected folding steps, where only a few specific interactions are found. In order to achieve this goal, we took advantage of the speed of fluorescence detection and the power of various modes of FRET and trFRET measurements.

enables monitoring of the conformational changes in each sub-domain structure. Very fast data collection enables determination of a series of sequential distance distributions along the folding pathway [99, 109–117]. Such experiments yield meaningful information describing specific conformational changes and the order of their occurrence along the folding pathway.

The loop hypothesis

We first proposed the "loop hypothesis" in 1995 based on the results of trFRET studies of the folding of double-labeled (donor and acceptor) reduced bovine pancreatic trypsin inhibitor (BPTI) (Figure 1) [118, 119]. We hypothesized that the earliest steps in the folding transition are the closure of a small number of long loop segments by NLIs between clusters of mostly non-polar residues at their ends [107, 119].
In contrast to the "bottom-up" folding mechanism, we assumed a "top-down" pathway, whereby loop structure elements are formed at the initiation of folding. Secondary structures can form either at the same time or at a later time along the transition, but the NLIs that close the loops do not always depend on the secondary structures of interacting clusters. The biological advantages of such a mechanism are the (a) major constraint of the disordered ensemble and hence a large backbone entropy reduction per interaction, (b) reduced chance of aggregation or misfolding, (c) fast partial protection from proteolysis mechanisms in the cell [120], and (d) reduction of the number of available non-productive pathways. It was further suggested that a very small number of such closed loops sufficiently restrict the conformational space and force the protein into an ensemble of conformations, with the characteristics of the outline of the conformations populating the TSE and the folded ensembles. This was presented as a working hypothesis, which was meant to guide the research that would test the relative contribution and timing of the formation of LIs and NLIs at the earliest steps of the folding transition of globular proteins. A viable possible result might be that both types of interactions are effective at the early steps of the folding transition, and the challenge for both experiments and computations is to resolve the relative contributions and fine resolution of the sequence of formation of specific interactions at the initial phases of the folding transition. 
The feasibility of very fast specific long loop closure at the initiation of the folding is supported by several studies in which the kinetics of loop closure of long polypeptide segments were measured mainly by fluorescence quenching experiments. Fast loop closure on a nanosecond time scale was reported [121–126].

Experimental strategy

The search for the time and order of formation of subdomain structures stabilized by either NLIs or LIs depends on our ability to map selected conformational changes in ensembles of structures with an otherwise disordered backbone. Those ensembles should be captured in submicrosecond time frames during the fast folding transition starting at the ensemble of fast collapsed (disordered) molecules [93–99]. This is a major technical challenge, and, hence, many kinetic studies focused on the rate-determining steps, i.e., the transition state at which the outline of the chain fold is already stabilized. To meet this challenge, methods based on time-resolved fluorescence resonance energy transfer (trFRET) were developed to characterize the ensembles of unfolded, collapsed, and partially folded globular protein molecules [100–105]. The method is based on a combination of site-specific labeling of selected pairs of residues by fluorescent donor and acceptor probes, and on the determination of distributions of intramolecular distances in ensembles of the labeled protein molecules by means of trFRET measurements. This approach enables monitoring fine changes of the end-to-end distance of preselected chain elements such as loops or secondary structure elements one at a time, in situ, in the context of the whole molecule [106–108]. Preparation of a series of labeled mutants of one protein
Concerns about the possibly large negative entropy cost of loop closure reactions, which call into question the feasibility of specific NLIs in the disordered ensemble, can be addressed by considering the following observations: (a) a study of the probability of loop closure showed a less than an order of magnitude change of probability when comparing 10- vs. 40-residue chain segments [127], and (b) several studies showed that the entropy change in long loop closure is not large enough to make it improbable [128, 129]. One explanation for these observations could be that only the loop ends are constrained, while the rest of the chain between them is free to occupy a large number of conformations under the constraints of the chain collapse. The entropy change is expected to be balanced by the interactions between the clusters at the nodes of the loop.

The loop hypothesis and the nucleation-condensation mechanism

The nucleation step was suggested to be a mechanism ensuring a fast and efficient folding transition [55, 130–132]. The nucleus is formed by residues from different parts of the chain, with unstructured loops between them [132]. The common theme of most of those mechanisms is the presence of a small number of NLIs that are formed prior to the main folding transition. In many cases, these contacts form long closed loops. The loop hypothesis is compatible with the nucleation-condensation mechanism [57] but describes a very different part of the folding pathway. The nucleation-condensation mechanism describes the formation of the nucleus, which is the hallmark of the TSE; all conformations belonging to the TSE have an obligate nucleus, which is often a specific well-defined set of contacts, both local and non-local.
In the nucleation-condensation mechanism, the nucleation process is coupled to the TSE formation, while the "loop hypothesis" describes the formation of the earliest sub-domain structures, which precede the formation of the nucleus and are coupled to the initiation of the folding transition. The loop hypothesis assumes that the first step that follows the transfer of a fully disordered polypeptide to the folding conditions is a non-specific adaptation by a collapse to an ensemble of disordered molecules, whose global dimensions are between those of the unfolded and the folded ensembles. This is followed by the fast formation of the small set of specific non-local contacts, which are assumed to contribute to the subsequent formation of the folding nucleus.

Figure 1 Cartoon description of the "closed loop model". Five snapshots along the folding trajectory of a model protein: the unfolded state is represented by one of the many possible conformers. The three pairs of clusters of mostly hydrophobic residues that can form the specific locks at the loop ends are shown in three colors. Upon change to folding conditions, rapid chain collapse leads to a compact, still disordered globule. During or immediately after chain collapse (collapsed I), the fast-formed loop structure locked by a pair of specific hydrophobic clusters is shown. In the next step (collapsed II, red and orange clusters), the second loop is formed within a very short time. The three closed loops already fix the overall native-like topology of the chain. At this point, a diffuse nucleus has been formed, and with activation-limited delay or right away (in the single-domain fast folders) the transition state ensemble (TSE) is formed. Finally, packing and complete desolvation are achieved and the folded state is stabilized (native).
It is assumed that short clusters of (mostly non-polar) residues form the loop end nodes by specific steric complementarity and that these nodes form marginally stable specific non-local contacts that impose a partial order on the overall disordered molecules. In that sense, the experiments that identify the earliest contacts do not demonstrate nucleation, which is a later event coupled to the rate-limiting folding step of TSE formation. There is a relationship between the fast formation of the first persistent NLIs and nucleation. It is reasonable to assume that the stability of the few earliest native NLIs formed in the disordered collapsed ensemble would facilitate the subsequent completion of nucleus formation and determine its overall topology. Thus, the key distinction is that loop formation is a facilitating step for nucleus formation, directing its topology, and occurring in the collapsed disordered ensemble, while nucleation is an advanced folding event, indicating the formation of the TSE, which is the rate-limiting step of the global folding transition. The well-established "contact order correlation" [54, 133–136], which leads to the conclusion that the rate of folding correlates well with the number of interactions within short chain segments, seems to be at odds with the loop hypothesis. However, the contact order correlation is summed over the entire polypeptide chain and, hence, is not sensitive to a small number of closed loops that might affect the folding rate. Furthermore, that correlation refers to the rate of the global folding (the formation of the TSE), while the loop hypothesis describes the initiation of folding.

was based on the presence of insertions and deletions in structurally aligned pairs of proteins that share essentially the same fold but not necessarily high sequence similarity. The algorithm was able to capture loop units that correspond to the closed loops found by Berezovsky and Trifonov above. Chintapalli et al.
[140, 141] showed that (a) the folding rate correlates extremely closely with total contact distance evaluated only over the lock residues and (b) the lock residues tend to have high Φ values, "as would be expected for residues that play an important role in the transition structure for folding". Chintapalli et al. further suggested that the closed loop hypothesis is able to give an alternative description of the data obtained by Englander et al. [142–144] for cytochromes c and b562 (as well as for triosephosphate isomerase) initially interpreted in terms of independent folding units (foldons). The closed loop hypothesis-based mechanism was said to be "as elegant as the published explanations as it does not invoke discontinuous foldons".

Bioinformatic analyses

The growth of the protein structure databases and the advances in bioinformatics methods enabled large-scale analysis within structural types and homologs addressing the question of the role of local vs. non-local interactions in protein folding. The work of Govindarajan and Goldstein was mentioned in the Introduction section. A very different result was obtained by Unger and Moult [145, 146], who concluded that the foldability of a sequence is determined primarily by LIs. Yew et al. [147] analyzed the degree of conservation of loop end clusters and reported that 70% of these loop ends were found to be well conserved. In a recent bioinformatics study using a contemporary database, Noivirt-Brik et al. [148] assessed the importance of the two types of interactions through their evolutionary and structural conservation. The underlying assumption was that positions that form contacts more critical for folding, stability, and function are likely to be more conserved. They found that, for the majority of proteins found in the current database, non-local contacts are structurally and evolutionarily more conserved than the local ones.
The loop as a basic folding unit based on the analysis of folded structures

Berezovsky and Trifonov [22, 44, 137–139] analyzed the outline of the chain fold of the crystal structures of 302 proteins and proposed that protein structures can be viewed as compact linear arrays of closed loops. They proposed that protein folding progresses through the consecutive looping of the chain, with the loops ending primarily at hydrophobic nuclei termed "locks". An alternative approach to identify closed loop folding units was developed by Chintapalli et al. [140, 141]. Their strategy

Mode of detection

The interactions between the loop ends in the ensemble of collapsed molecules, where cooperativity is missing, are very weak. Therefore, they could be observed by characterization of the fine changes in the transient distributions of intramolecular distances between the clusters of residues that form the closing interactions. trFRET experiments are ideal for the detection of the formation of each closed long loop since it is possible to follow selected distances between two sites that are separated by a large number of residues, their distributions, and fast fluctuations [100, 104, 107, 108, 115, 149–151]. FRET-detected single-molecule approaches are available to meet this goal.
Nevertheless, we applied mainly the ensemble detection methods for several reasons: deducing the folding pathway requires ensemble characteristics; natural amino acids or mildly modified residues can be used, and the structural perturbation can be minimized; full sets of fluorescence decay curves can be collected in nanosecond time intervals; application of the double kinetics approach is more straightforward compared to the single-molecule FRET-detected (smFRET)-based methods; pairs of probes suitable for monitoring distances in the range corresponding to sub-domain structural dimensions are readily available; and the ability to resolve subpopulations of intramolecular distances, a major strength of the smFRET-based methods, is also readily provided by the ensemble-based trFRET experiments [152]. Analyses of the trFRET measurements do not yield atom-to-atom distances, and depend on the size and dynamics of the probes and on the correct determination of the Förster constant, Ro. Yet, when the same pair of probes is used for studying a selected intramolecular distance under a series of conditions, the changes in the distribution of each distance are faithfully reported. Proper selection of pairs of probes enables angstrom resolution of mean distances by the trFRET measurements, such as in the case shown in the text box.

[Text box figure: fluorescence decay traces of the DO and DA experiments with the instrument response function (IRF), plotted as counts/channel vs. time (channels), with residuals and autocorrelation panels.]

Time-resolved FRET in the double kinetics context. (A) The difference between the traces of the fluorescence decay [nanosecond time scale (ts)] of the donor emission in the absence of an acceptor (the DO experiment) and the presence of the acceptor (the DA experiment) is the result of the FRET mechanism, and it contains the information on the distribution of the distance between the two probes.
When the conformational exchange within the ensemble of labeled protein molecules is slow relative to the lifetime of the excited state of the donor, the distribution of intramolecular distance, No(r), can be recovered by modeling i(t) using the expression:

$$ i(t) = k \int_0^{\infty} N_o(r) \exp\left[ -\frac{t}{\tau_D^{o}} \left( 1 + \left( \frac{R_o}{r} \right)^{6} \right) \right] dr $$

where k is a proportionality factor and $\tau_D^{o}$, the fluorescence lifetime of the donor in the absence of the acceptor, is obtained from the analysis of the DO trace. Knowledge of Ro and $\tau_D^{o}$ enables the extraction of No(r) from the multi-exponential decay curve, i(t) (the DA trace). (B) When both the DA and the DO traces are collected in very short time intervals [short relative to the rate of change of conformations (ts << tc)], a series of distributions of intramolecular distance can be obtained.

In the following, we review FRET-based folding studies relevant to the loop hypothesis. We start by briefly reviewing equilibrium folding/unfolding studies and continue with the review of studies of the collapsed state. This is followed by steady-state and trFRET-detected folding kinetics experiments of double-labeled AK mutants designed for the detection of either long loop closure or folding of secondary structure elements, first by stopped flow and then by fast mixing double kinetics experiments.

the distributions of end-to-end distances of each segment. The sizes of the two sub-populations were temperature and denaturant concentration dependent [119], i.e., transition to conditions more favorable to folding increased the native-like sub-population and reduced the unfolded one. The interpretation of these observations led to the suggestion that the native-like non-local contacts between the ends of the labeled segments form loop structures, which are the basic folding unit [119]. A similar effect was reported by Klein-Seetharaman et al. [155, 156] based on the NMR study of the denatured state of lysozyme.
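As a numerical illustration of the text-box expression that relates the DA decay trace i(t) to the distance distribution No(r), the following minimal Python sketch evaluates the integral for an assumed Gaussian No(r). The Gaussian shape, the values chosen for Ro and for the donor lifetime, the distance grid, and all function names are assumptions made for illustration only; they are not taken from the studies reviewed here, and the fitting step that recovers No(r) from measured traces is not shown.

```python
import math

def gaussian_distance_distribution(r_values, mean, sigma):
    """Illustrative N0(r): Gaussian weights normalized on a uniform grid."""
    weights = [math.exp(-0.5 * ((r - mean) / sigma) ** 2) for r in r_values]
    step = r_values[1] - r_values[0]
    area = sum(weights) * step
    return [w / area for w in weights]

def donor_decay(t, r_values, n0, tau_d, r0, k=1.0):
    """i(t) = k * integral of N0(r) * exp[-(t/tau_d) * (1 + (r0/r)**6)] dr.

    The integral is approximated by a rectangle rule on the uniform grid.
    """
    step = r_values[1] - r_values[0]
    total = 0.0
    for r, p in zip(r_values, n0):
        total += p * math.exp(-(t / tau_d) * (1.0 + (r0 / r) ** 6)) * step
    return k * total

# Hypothetical parameters (assumptions, not values from the reviewed studies)
R0 = 25.0      # Forster constant, angstrom
TAU_D = 5.0    # donor lifetime without acceptor, ns
r_grid = [5.0 + 0.1 * i for i in range(600)]   # 5.0 to 64.9 angstrom
n0 = gaussian_distance_distribution(r_grid, mean=20.0, sigma=4.0)

times = [0.0, 1.0, 2.0, 5.0]                   # ns
da_trace = [donor_decay(t, r_grid, n0, TAU_D, R0) for t in times]
do_trace = [math.exp(-t / TAU_D) for t in times]  # donor-only reference decay

for t, da, do in zip(times, da_trace, do_trace):
    print(f"t = {t:.1f} ns  DO = {do:.4f}  DA = {da:.4f}")
```

Under these assumptions, the computed DA trace lies below the single-exponential DO trace at every time point after zero; this difference is what carries the distance information.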
The specificity of the observed interactions is in agreement with Baldwin and Rose [157], who suggested that a specific stereochemical code for intramolecular interactions directs the folding transition. Buckler et al. [158] and Navon et al. [159] applied the FRET-based experimental approach to the study of unfolded and partially folded states of reduced bovine ribonuclease A (RNase A) (124 residues). The distance distribution between the C-terminal residue (residue 124) and residue 76 (a 49-residue chain segment, representing a long C terminal loop) was another example in which two sub-populations were resolved. The distributions represent an equilibrium between native-like and unfolded-like distances between the ends of the loop under partial denaturing conditions. Interestingly, under the same conditions, but in phosphate buffer, the native-like sub-population was dominant. In that case, the C terminal loop structure of RNase A was closed by the phosphate ions, which bind in the active site and thereby cross-link residues 12, 19, and 119 (Figure 3).

Studies of the mechanism and kinetics of the folding transition

Equilibrium studies

BPTI (Figure 2) is a small stable globular protein of 58 residues, including four lysine residues and three disulfide bonds [118, 153]. Distributions of the end-to-end distance of four segments of the BPTI backbone were determined [100]. Reduced BPTI in 6 mol/L of GndHCl is fully denatured [153, 154]. Yet, at least two sub-populations, one native-like and one unfolded-like, were distinguished in

Multiprobe FRET studies of the initial collapse transition

The introduction of FRET-detected single-molecule detection methods enabled the study of sub-populations of collapsed molecules at equilibrium with sub-populations of folded molecules.
Studying single molecules under partially folding conditions one by one and then collecting those that show short intramolecular distances into an ensemble of collapsed molecules at equilibrium with the disordered ensemble enables the mapping of intramolecular distances in the collapsed ensemble (reviewed by Haran [97], Schuler and Eaton [160] and Ferreon and Deniz [161]). The collapsed state of protein molecules was studied by many researchers at equilibrium under partial folding conditions in the presence of the unfolded ensemble. Single-molecule FRET was applied [150, 161–163], and the dimensions of the

Figure 2 The backbone structure of BPTI and the pairs that were labeled in four different preparations. The sites of the four lysine residues that were labeled by the acceptor (15, 26, 41, and 46) as well as the N terminal residue where the donor was attached are shown. The closure of the N terminal loop is monitored by the measurements of the distribution of the distance between residues 1 and 26.

Steady-state and time-resolved FRET-detected folding kinetics experiments

An ideal folding kinetics experiment would produce a time-dependent series of three-dimensional structures. These structures describe the transition of an ensemble of molecules from an unfolded state under folding conditions to the folded state. The path would probably start with a collection of all possible compact configurations and gradually reduce the number of populated conformations. Either a parallel continuous change at all segments or, more likely, partially ordered structures or folding initiation sites in various domains would become visible at different time points. Analysis of such an ideal experiment should reveal the order of structural transitions and the formation of structural elements along very broad and gradually narrowing pathways to the native state.
Such a series of structures should enable the inference of the basic principles of the master plan of the folding mechanism. If such an ideal experiment could be combined with site-directed perturbation mutagenesis, it might also be possible to search for the "sequence signals", i.e., inter-residue interactions that stabilize and lock structural elements, either simultaneously or sequentially. Such experiments could enable the compilation of a "dictionary" that would relate the types of clusters of residues, either contiguous or non-contiguous, to the types of conformational events and structures, and the timing of their appearance during the folding transition. A pioneer of the application of the distance dependence of the resonance energy transfer effect [170] is I.Z. Steinberg [171], who introduced the early applications of time-resolved FRET to biopolymer studies. A first application of intramolecular FRET in peptides was reported by Edelhoch et al. [172]. Steady-state and time-resolved FRET-detected kinetics experiments are far from the "ideal" experiment described previously, as they are unable to yield atomic resolution, are limited to a few pairs of sites, and involve modifications of the protein. This is a low-resolution measurement, but the arsenal of FRET experiments displays some unique qualities that justify the efforts required to produce site-specifically labeled protein samples. The goal of these experiments is the production of a time series of distributions of selected key intramolecular distances (e.g., distances between or internal to structural elements). Such series can enable the transient structures of folding intermediates to be characterized.
Other factors that add to the unique strength of kinetic FRET experiments include the range of detected distances, the time resolution (sub-nanosecond), the interpretation of spectroscopic data in terms of intramolecular distances

Figure 3 Effective cross-linking by a substrate-mimicking ligand enhances the closure of the C terminal loop of the RNase A backbone. The phosphate ions can keep the three residues, widely separated along the chain, in close proximity and thus might increase the probability of closure of the long C terminal loop labeled at residues 76 and 124 or 115. The coordinates for this drawing were taken from the crystal structure of a complex of RNase A with a phosphate ion (PDB file 5RSA).

ensemble of collapsed molecules were compared with the results obtained by small-angle X-ray scattering [96, 164–166]. The FRET experiments showed that the dimensions of the ensemble of collapsed protein molecules are smaller than those of the ensemble of unfolded molecules but larger than those of the folded state [99]. The collapsed state was described as a globular state of proteins that is akin to the collapsed state of polymers, as it is predominantly disordered (reviewed by Haran [9] and Udgaonkar [99, 167–169]). However, in the ensembles of collapsed molecules collected at equilibrium under partial folding conditions, all early sub-domain structures that appear under folding conditions are already at least partially formed [99]. Thus, for the analysis of the folding mechanism starting with the ensemble of disordered molecules under folding conditions, ultrafast kinetics must be used. Rapid initiation of folding, combined with ultrafast collection of data to determine the distributions of intramolecular distances prior to the onset of fast partial stabilization of the first sub-domain structures, should enable the ensemble of collapsed molecules to be characterized. This ensemble is the starting point of the folding pathway.
Fast collection of data at consecutive time points should enable the detection of the sequence of formation of sub-domain structures that are the leading building blocks of the folding mechanism and, hence, determine its path. To this end, we applied the "double kinetics" approach.

(10–100 Å, i.e., the dimensions of protein molecules), the capacity for real-time detection, and the ability to determine the transient distributions of distances. Both steady-state and time-resolved detection may be applied together with any method of fast initiation of the folding or unfolding transition, e.g., stopped flow, continuous flow, or T-jump. In general, we preferred to induce refolding from chemically denatured ensembles since, under such conditions, the number of residual structures is minimal and the probability of refolding starting from a short-lived fully disordered collapsed ensemble is high.

specific hydrophobic contacts can rapidly form. The less specific force is the local hydrophobicity that directs parts of the chain to the interior, which, in turn, increases the probability of forming close contacts with other (non-local) hydrophobic regions. Then, more specific recognition is achieved on the basis of stereo-specific alignment between defined clusters of residues. These features are encoded in the linear sequence of the chain.

trFRET detection of rapid folding kinetics: the "double kinetics" experiment

The transient transfer efficiencies determined by steady-state detection of the fluorescence intensities of the donor or the acceptor probes report rapid changes in distances. But since the conformations found in ensembles of partially folded protein molecules are inherently heterogeneous, the mean transfer efficiency cannot be used to determine any meaningful mean distances.
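The limitation of the mean transfer efficiency in a heterogeneous ensemble can be illustrated with a small worked example. The sketch below assumes the standard single-distance relation E = 1/(1 + (r/Ro)^6) and a hypothetical ensemble made of two equally populated distances; the Ro value, the two distances, and the function names are illustrative assumptions, not data from the studies discussed here.

```python
def transfer_efficiency(r, r0):
    """FRET transfer efficiency for a single donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)

def apparent_distance(mean_efficiency, r0):
    """Distance a single fixed donor-acceptor pair would need to give this efficiency."""
    return r0 * (1.0 / mean_efficiency - 1.0) ** (1.0 / 6.0)

R0 = 25.0  # hypothetical Forster constant, angstrom

# Hypothetical two-population ensemble: (distance in angstrom, fraction of molecules)
populations = [(15.0, 0.5), (40.0, 0.5)]

mean_distance = sum(r * f for r, f in populations)
mean_efficiency = sum(transfer_efficiency(r, R0) * f for r, f in populations)
r_apparent = apparent_distance(mean_efficiency, R0)

print(f"true mean distance: {mean_distance:.1f} angstrom")
print(f"mean transfer efficiency: {mean_efficiency:.3f}")
print(f"apparent distance from mean efficiency: {r_apparent:.1f} angstrom")
```

In this hypothetical case, the distance inferred from the mean efficiency differs from the true mean distance by more than 2 Å and carries no information about the two sub-populations, because the efficiency depends on the distance in a strongly non-linear way.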
However, the mean and width of the distributions of distances in these rapidly changing ensembles of partially folded protein molecules can be determined by the rapid recording of time-resolved fluorescence decay curves of the probes. This may be achieved by the double kinetics experiment (see text box). The double kinetics [114, 115, 166, 182] folding/unfolding experiments combine the fast initiation of folding/unfolding transitions by a rapid change in solution conditions with the synchronized rapid determination of fluorescence decay curves. The challenge here is twofold: first, to collect fluorescence decay curves with a sufficiently high signal-to-noise ratio to enable the determination of statistically significant parameters of the transient distribution of distances at each time point, No(r, t), and, second, to synchronize the refolding initiation mechanism with the probe pulsed laser source. The instruments that were developed enable the time series of transient distributions of the distance between pairs of probes attached to the ends of selected chain segments during the fast refolding transition to be determined. Two time regimes are involved in this experimental approach: the "chemical time regime" (tc) (microseconds to seconds), which is the duration of the conformational transition, and the "spectroscopic time regime" (ts), which is the nanosecond fluorescence decay of the probes. Combining this instrumental approach with the production of a series of protein samples, site-specifically labeled with donor and acceptor pairs, enables the characterization of the backbone fold and flexibility in transient intermediate states during the protein folding transitions.

Steady-state FRET detection of the kinetics of folding

Elegant studies based on steady-state detected FRET monitoring of the kinetics of folding were reported, and we describe next some early representative examples. Chan et al.
[173] used a FRET-detected ultrarapid-mixing continuous-flow method to study the sub-millisecond folding of chemically denatured cytochrome c. A fast collapse followed by a second folding transition was resolved. Another early FRET experiment was reported by Teilum et al. [115], who studied the early conformational events during the refolding of acyl-CoA binding protein, an 86-residue α-helical protein. Udgaonkar [99] is another pioneer of the application of FRET-detected folding kinetics experiments. Multiple distances were determined in model proteins to characterize the ensemble of collapsed molecules at the earliest possible time after the initiation of folding [99]. A study of the folding of barstar (89 residues) [99, 151] showed a fast reduction of the intersegmental distances in a small number of labeled pairs, while some other pairs showed only a slow transition to native intramolecular distances. Such heterogeneity is a hallmark of the formation of specific non-local contacts at some parts of the chain. Ultrafast (microsecond) formation of a small number of non-local interactions at the initiation of folding was reported for several systems [78, 80, 174–177]. Mirny and Shakhnovich [132] reviewed a number of folding kinetics studies and found that, in all cases, secondary interactions play, at most, a minor role in determining folding kinetics. Instead, strong and specific hydrophobic NLIs seem to dominate. A number of experiments demonstrated the role of clusters of non-polar residues in the formation of NLIs at the initial phases of folding transitions [91, 83, 178–181]. These experiments shed light on the mechanism by which

Kimura et al. [183] used trFRET to determine the distance distributions of two loops/residue pairs in cytochrome c (125 residues), 150 s after the refolding initiation.
It was found that one distance distribution (Trp32-heme) was native-like, while the distribution of the second pair (72-heme) was still unfolded. The double kinetics approach was applied by Matthews et al. [103, 117, 184, 185], who studied several different proteins, revealing the very early formation of structural elements and highlighting the role of the Ile, Leu, and Val residues in the formation of the intramolecular interactions. The early steps (30 s) of the folding transition of the α subunit of tryptophan synthase [185] and cytochrome c were reported.

in which a small number of specific interactions can be formed.

Estimation of the radius of the ensemble of collapsed globular protein molecules

Unlike the case of a fully unfolded polypeptide, where the end-to-end distance of chain segments depends on the number of residues (n) in each segment [192, 193], in the non-specifically collapsed ensemble, where non-specific monomer-monomer interactions exist, the mean end-to-end distances are constrained within the dimensions of the ensemble. Thus, it is expected that, in the initially collapsed ensemble, segments of very different numbers of residues could exhibit quite similar values of mean end-to-end distances. Sinha and Udgaonkar [151] studied the mean distance between nine pairs of sites in the collapsed ensemble of refolding barstar (an 89-residue protein) molecules and found that segments of 12–51 residues have approximate mean distances in the range 18–21 Å without a clear length dependence. Camacho and Thirumalai [194] found only a weak dependence of the probability of end-to-end interaction on chain length. Escherichia coli adenylate kinase (AK) is a 214-residue, three-domain bacterial protein that catalyzes the transfer of a phosphoryl group between ATP and AMP [195–198].
We used this protein as a model for testing the loop hypothesis since it is a large protein in which site-specific labeling, combined with the strength of trFRET methods, enabled us to study the sequence of folding transitions of each domain and sub-domain structure in the context of the full-size molecule in situ. The native topology of AK includes seven long loops. The residues at the opposite termini of each loop-forming chain segment form a strong loop node, as judged by the number of distances that are <5 Å between the atoms of the contacting termini. Methods for the site-specific labeling of pairs of sites with probes that cause minimal structural perturbations were developed. Each labeled mutant was tested by a series of control experiments, including assessment of enzymatic activity, far-UV CD spectroscopy, and reversibility of the unfolding/folding transitions [106, 199]. The experimental strategy is based on the preparation of protein mutants labeled at pairs of sites selected so that the distribution of the distance between them is sensitive to a specific sub-domain structural element, either a long loop or a secondary structure element. Each double-labeled mutant (donor and acceptor, "DA mutant") is accompanied by a second mutant of exactly the same sequence as that of the DA mutant, except that the site for

The ensemble of collapsed molecules

Many experiments show that the mean radius of the collapsed globule is about 30% larger than that of the same molecule in its fully folded state [95, 103, 165, 186]. Thus, roughly half of its volume is occupied by solvent molecules. This implies that, in the collapsed state, few residue-residue interactions are effective, chain dynamics are not inhibited, and the probability of residue-residue encounters is enhanced.
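The step from a 30% larger radius to the solvent content of the collapsed globule is a short volume calculation. The sketch below assumes, for illustration only, that both states are spheres and that the chain itself occupies the same volume in the folded and collapsed forms; the function name `solvent_volume_fraction` is hypothetical and does not come from the studies cited.

```python
def solvent_volume_fraction(radius_expansion: float) -> float:
    """Fraction of the expanded sphere's volume not occupied by the chain.

    Assumption (not from the source): the folded molecule is treated as a
    solvent-free sphere and the chain volume is unchanged on expansion.
    """
    if radius_expansion < 1.0:
        raise ValueError("radius_expansion must be >= 1")
    # Volume scales with the cube of the radius.
    volume_ratio = radius_expansion ** 3
    return 1.0 - 1.0 / volume_ratio


if __name__ == "__main__":
    fraction = solvent_volume_fraction(1.30)  # radius 30% larger than native
    print(f"estimated solvent fraction: {fraction:.2f}")
```

Under these assumptions a 30% increase in radius roughly doubles the volume (1.3 cubed is about 2.2), so about half of the collapsed globule is solvent.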
Remaining questions regarding this folding stage include identifying the key interactions that appear in the collapsed globule and contribute to the entry into the programmed pathway, and the residues that contribute them. FRET-detected fast kinetics experiments show that, under strongly stabilizing conditions, the initial non-specific collapse reaction is quickly followed by a structure-forming reaction that is completed within a millisecond or so of folding and that leads to the formation of a partially structured and collapsed intermediate [99, 103, 117, 123, 126, 175, 184, 187, 188]. The microsecond folding kinetics detected by FRET and other probes were used to characterize the ensemble of fast collapsed disordered molecules, which includes a few specific interactions [105, 113, 117, 151, 166, 184]. Barrier-limited chain contraction and specific sub-domain interactions, in particular by isoleucine, leucine, and valine (ILV) clusters, were observed upon transfer of the GndHCl-denatured state ensemble to native-like conditions [166, 185, 189, 190]. An added value of the time-resolved FRET experiments is the ability to resolve subpopulations under conditions of fast exchange [106, 159, 191]. As suggested earlier [149], the collapsed molecules do not constitute a separate thermodynamic state, but belong to the disordered state

the acceptor is blocked by acetamide (the "DO mutant"). This second mutant is used for reference measurements to determine the fluorescence decay of the donor in the absence of an acceptor under each experimental setup.

To estimate the size of the collapsed form of AK, we can choose the most compact structure of the AK molecule, that of its complex with the inhibitor Ap5A, as the starting structure [196]. That structure can be approximated as a sphere whose radius is 23 Å, leading to a prediction of a radius of 30 Å for the collapsed ensemble of AK molecules under folding conditions.
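The size estimate for the collapsed AK ensemble can be checked numerically. The sketch below scales the 23 Å native radius by the 30% expansion quoted for collapsed globules, then compares the closed-form mean distance between two uniformly random points in a sphere, 36R/35, with a Monte Carlo estimate. The function names, sample size, and random seed are illustrative choices, and the uniformly filled sphere is an idealization, not a measured property of AK.

```python
import math
import random


def mean_pair_distance_exact(radius: float) -> float:
    """Closed-form mean distance between two uniform random points in a sphere."""
    return 36.0 / 35.0 * radius


def mean_pair_distance_mc(radius: float, n_pairs: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, using rejection sampling."""
    rng = random.Random(seed)

    def random_point():
        # Draw from the enclosing cube and keep only points inside the sphere.
        while True:
            point = tuple(rng.uniform(-radius, radius) for _ in range(3))
            if sum(c * c for c in point) <= radius * radius:
                return point

    total = 0.0
    for _ in range(n_pairs):
        total += math.dist(random_point(), random_point())
    return total / n_pairs


if __name__ == "__main__":
    native_radius = 23.0                     # Å, spherical approximation of AK with Ap5A
    collapsed_radius = native_radius * 1.30  # Å, assuming a 30% larger radius
    print(f"collapsed radius: {collapsed_radius:.1f} Å")
    print(f"exact mean pair distance: {mean_pair_distance_exact(collapsed_radius):.1f} Å")
    print(f"Monte Carlo estimate:     {mean_pair_distance_mc(collapsed_radius):.1f} Å")
```

The factor 36/35 is about 1.03, so for a randomly collapsed globule the expected mean distance between two labeled sites is close to the globule radius itself.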
The mean distance between two points randomly located within a sphere is 1.03-fold larger than the radius of the sphere, i.e., almost identical to the sphere radius. Based on these estimations of the radius of the collapsed globule (30 Å), and taking into account that FRET-determined distances can be extended by 3 Å due to the size of the probes, any mean end-to-end distance smaller than 26 Å, for a chain segment that is longer than a few persistence lengths [200], can be considered an indication of bias in the randomly collapsed ensemble and of specific interactions. Similarly, significantly larger mean distances are also indications of specific interactions that cause a deviation from the randomness of the collapsed ensemble. In the collapsed ensemble of the AK molecules, the dependence of the segmental end-to-end distances on segment length should be very weak for segments that are longer than approx. 15 residues, and these distances are expected to be close to the radius of the collapsed globule. Ratner et al. [110] reported the transient end-to-end distance of three segments of the AK molecule at 5 ms after the transfer from denaturing to folding conditions. These segments, which were shown to fold only in the seconds time regime (much longer than 5 ms), had n = 15, 20, and 176 residues, and their corresponding mean end-to-end distances under folding conditions prior to the folding transition were 22, 25, and 26 Å, respectively.

studies were focused on the ensemble of collapsed molecules, and, hence, even mutants in which the labeling procedures caused moderate perturbation detected by the control experiments could be used in this study since, in the mostly disordered ensembles of interest, such perturbations have no effect.

Stopped flow double kinetics studies

A measurement system developed by Ratner and Haas [114] was based on low-frequency (10 MHz) laser pulses and a fast digitizer oscilloscope.
The time resolution of the spectroscopic time scale, ts, in this mode of double kinetics experiment is 250 ps. Up to 20 fluorescence decay curves can be measured with an acceptable signal-to-noise ratio using a single stopped-flow run. The time resolution was further enhanced by using a femtosecond laser source and averaging multiple emission pulses [113] (Figure 4). AK is fully disordered at 2 mol/L of GndHCl at pH 7. Upon transfer to the folding conditions (by fast dilution of the denaturant), the initial transient collapsed ensemble of AK conformers appears disordered and refolds to a native structure through an apparent (depending on the probe and detection method) two-state mechanism with a rate constant of 0.5 s⁻¹. At the end of the dead time of the stopped flow device, all the mutants showed a fast increase in the transfer efficiency and a correspondingly reduced mean distance. This is a clear manifestation of the expected non-specific fast collapse upon transfer to a poor solvent.

trFRET determination of the dimensions of the collapsed ensemble of AK

Three mutant pairs were used to monitor the end-to-end distance of very long chain segments [131 residues (73–203); 176 residues (28–203); and 186 residues (18–203)] (Figure 5A). The last two segments include all three domains of the protein, but the labels are located in the CORE domain and report the intramolecular distances in that domain. In the collapsed ensemble, at the end of the dead time of the stopped flow device, the mean of the distribution of the intramolecular distances of the 131-residue segment was 26 ± 2 Å, in accordance with the expected radius of the ensemble of collapsed AK molecules. The distance between residues 18 and 203 was larger, although much smaller than their distance in the denatured state.
The full width at half maximum (FWHM)

Detection of the initial sub-domain structures in the collapsed ensemble of refolding molecules

The next step was a test of the loop hypothesis by searching for the rate of closure of long loops in the AK molecule using the FRET-detected double kinetics method. Equipped with the capacity to probe selected fast-changing distributions of distances even at nanosecond time intervals, we designed and measured the folding transition of a series of mutants, monitoring mainly putative loop closure and secondary structure elements. Our

Figure 4 Instrument for the time-resolved FRET measurements in the stopped flow mode. The 297-nm third harmonics beam of the Ti:sapphire laser was used as the excitation source, operating with a repetition rate of 8 MHz. The excitation beam was focused at the center of the stopped flow cell along its long dimension. Emission was collected at 90°, filtered, and then focused onto the photocathode of the micro-channel plate photomultiplier tube. The signal was sampled at 40 GSa/s with 13 GHz of bandwidth. A digital delay generator was used for the triggering of each data acquisition sweep of the oscilloscope. A photodiode module was used to determine the time position of every excitation pulse.

of the distribution of this distance at the initial collapsed state was relatively small, which is an indication of a deviation from full disorder in the ensemble, i.e., the 5-ms collapsed ensemble was not randomly collapsed. It is possible that the spherical approximation is not valid for this state due to several specific interactions in the N terminal section of the chain. The mean of the distribution of the intramolecular distance between residues 28 and 203 in the collapsed ensemble was also larger than 26 Å and only 10% larger than that found for the native state ensemble.
This is probably also an effect of the specific interactions of the fast closed loops of the CORE domain, as manifested by the relatively small width of the distributions in the collapsed ensemble. These results help establish a reference.

Two subpopulations

Analysis of the double kinetics experiment monitoring the donor fluorescence decay in the 18–203 mutants during the folding transition revealed a gradual change in the proportion between the two subpopulations. As the folding proceeded, a sub-population with native-like intramolecular distance increased concomitantly with a gradual reduction of the sub-population characterized by the larger mean distance of the collapsed ensemble. At equilibrium, all AK molecules reached the native distance between residues 18 and 203 (Figure 6). The native ensemble has a high degree of order, which is reflected in the narrow intramolecular distance distribution characterized by a mean distance of 15.2 ± 0.4 Å and a width (FWHM) of 8 ± 1 Å. The rate of growth of the native subpopulation was 0.26 ± 0.06 s⁻¹, which is lower than the apparent rate constant obtained from the mean FRET efficiency (0.43 ± 0.01 s⁻¹). This difference is a typical result of the non-linearity of the distance dependence of the FRET efficiency. A small increase in the short-distance sub-population causes a large increase in the mean transfer efficiency when the change brings the pair closer to R0.

Fast closure of long loops in the AK CORE domain

Seven pairs of sites at the ends of sub-domain structures of interest were labeled in six sets of mutants comprising several groups (Table 1): (a) the N terminal loop [residues 1–24; Figure 5C (loop I)] and the AMPbind domain loop

Figure 5 Ribbon diagram representing the fast- and slow-folding sub-domain loops and the secondary structure elements marked on the native-state chain fold of the backbone of the E. coli AK molecule (PDB ID code: 4AKE).
(A) Secondary structure elements of the AK molecule (color coded). Those that remained disordered at 5 ms after the initiation of the mixing into the folding buffer (the dead time) are colored in black. Those that gained native-like mean end-to-end distance at the 5-ms detection time are colored green. Parts of the chain that were not tested are colored in gold. Most of the segments that formed secondary structures that constituted the core domain of AK were still disordered in the initial phase of the folding transition. (B) Pairs of sites that were separated by long chain segments that were labeled and studied by the stopped flow-based kinetics methods. (C) Seven loops of the native-state structure of the AK molecule that were labeled and used in the folding kinetics studies.

(loop II) (residues 29–73, Figure 5C), which are typical elementary long closed loop structures; (b) two loop elements (loop III, residues 1–86, and loop IV, residues 1–109; Figure 5C), which represent "merged" loop elements, i.e., long segments (>40 residues) whose ends are in close contact, stabilized by hydrophobic interactions, and which also enclose at least one elementary loop within their boundaries; (c) two loops associated with the LID domain (loop V, residues 121–155) and its extended version (loop VI, residues 113–169); and (d) a small loop included in loop IV (residues 66–95). Replacement of any one of the six N terminal residues of the AK backbone caused a loss of expression, which is an indication of their importance for the folding mechanism. Therefore, to label the N terminus of the AK molecule, we extended the polypeptide backbone using a four-residue insert (Met(−4)-Lys(−3)-Cys(−2)-Ala(−1)). The inserted Cys residue at position −2 was labeled in mutants designed to study the folding of the closed loop structures (I, III, and IV). A scheme of the N terminal loops of AK and the non-polar residues that may contribute interactions in the loop nodes is shown in Figure 7.
Kinetics of sub-domain transitions

The closure of the seven labeled loops in the AK molecule was monitored by the determination of the time-dependent mean FRET efficiency in the stopped flow device. The closure of the two N terminal loops (I and II) was also studied

Figure 6 The relative population (percent) of the native ensemble of the mutant labeled at residues 18 and 203 as a function of time after the initiation of refolding. The panels show distance distributions, p(r) versus distance (Å), at 5 ms (collapsed state), 1.5 s (appearance of native population; subpopulation labels 30% and 70%), 3 s (partial folding; 55% and 45%), 6 s (advanced folding; 80% and 20%), and at equilibrium. Fitting to a mono-exponential function (y = a + b·e^(−kt)) gave a rate constant of 0.26 ± 0.06 s⁻¹ for the transition from collapsed to native ensemble. The uncertainty in the value of the folded ensemble parameter at 1.5, 3, and 6 s was ±4%.

i.e., in the seconds time regime. For loop IV, which is a long "merged" loop containing the three fast closing N terminal loops, a biphasic time course was observed, i.e., a fast partial closure synchronized with the closure of the included N terminal loops, followed by a slow full closure to native-like end-to-end distance. This pattern of fast closure of only a few loops and slow closure of others is a strong indication of specificity in the folding pathway and of a hierarchic folding "plan". In loops I and II, which were closed first, the numbers of residues between the labeled sites were 26 and 44, respectively. It is reasonable to assume that, in a fully disordered but collapsed ensemble, the corresponding mean of the distributions of segment end-to-end distances would also be in the range of 26 ± 2 Å.
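The Figure 6 caption reports a mono-exponential fit, y = a + b·e^(−kt), of the native population against refolding time. A minimal sketch of such a fit is given here using only the Python standard library: for each trial value of k the amplitudes a and b follow from linear least squares, and the k with the smallest residual is kept. The data are synthetic, generated from the model with k = 0.26 s⁻¹ purely for illustration; they are not the measured populations, and the scan range, step count, and function name are assumptions.

```python
import math


def fit_mono_exponential(times, values, k_min=0.01, k_max=5.0, steps=2000):
    """Fit y = a + b*exp(-k*t) by scanning k and solving a, b linearly.

    Returns (a, b, k) for the trial k with the smallest sum of squared residuals.
    """
    n = len(times)
    best = None
    for i in range(steps + 1):
        k = k_min + (k_max - k_min) * i / steps
        x = [math.exp(-k * t) for t in times]
        sx, sy = sum(x), sum(values)
        sxx = sum(v * v for v in x)
        sxy = sum(u * v for u, v in zip(x, values))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue  # basis is numerically constant, a and b are not separable
        b = (n * sxy - sx * sy) / denom
        a = (sy - b * sx) / n
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, values))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    _, a, b, k = best
    return a, b, k


if __name__ == "__main__":
    # Synthetic native-population curve (percent), not experimental data.
    times = [0.005, 0.5, 1.0, 1.5, 3.0, 6.0, 9.0]  # seconds after mixing
    values = [100.0 * (1.0 - math.exp(-0.26 * t)) for t in times]
    a, b, k = fit_mono_exponential(times, values)
    print(f"a = {a:.1f} %, b = {b:.1f} %, k = {k:.3f} 1/s")
```

Scanning k keeps the example dependency-free; with real, noisy data a nonlinear least-squares routine that also returns parameter uncertainties would be the more usual choice.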
The rapidly closed loops are associated with the N terminal section of the CORE domain, and it is therefore reasonable to assume that they contribute to the formation of a non-contiguous folding nucleus. Loop I includes the cluster of N-terminal non-polar residues (1–6), which form several non-local interactions in the native state and thus seem to be the "king pin" of the native structure of the CORE domain. Several attempts to produce foldable AK mutants in which residues 2, 3, or 4 were replaced by Cys, Trp, or Ala failed. These are further indications that the residues that are responsible for the closure of loop I, and that are also connected to loops III and IV, are essential for the folding of the whole molecule, probably by involvement in the folding initiation step as an apparent chain folding initiation site [202]. The slow closure of loops V and VI, which include the LID domain, shows that this domain is not natively folded at the completion of the fast mixing. The kinetics of change of FRET efficiency between the ends of three labeled chain segments, which included sections of loop II and additional chain sections at its C terminal end, were monitored (Table 1). These segments, which included part of loop II, reached native-like mean transfer efficiency only at the rate-limiting step, i.e., slowly. That is an indication that the internal sections of loop II are not folded prior to the rate-limiting step. Internal segments in the closed loops seem to be partially disordered.

by the double kinetics experiment (Figures 8 and 9). On the microsecond-to-millisecond time scale, the mean FRET efficiency between the ends of the three N terminal loops, I, II, and III, reached native-like values within the dead time of the stopped flow device [111, 201].
In sharp contrast, the FRET efficiency between the ends of loops V, VI, and VII increased at a much slower rate, comparable to the rate of the global folding transition,

Microsecond kinetics

Preliminary results of the application of double kinetics experiments based on a continuous mixing device, obtained in the laboratory of Bilsel et al. [112], revealed microsecond kinetics of closure of loops I and II. Within 60 μs, both loops were close to the native end-to-end distance, and within 250 μs, full closure of the loops was observed.

Table 1 Pairs of sites that were labeled in the AK molecule and monitored by folding kinetics experiments.a

Structural element | Residues labeled | Time of transition to native mean distance or FRET efficiency (b)
Loops
Loop I (residues 1–26) | −2 and 24 | <200 μs
Loop II (residues 29–73) | 28 and 71 | <200 μs
Merged loop III (residues 1–75) | −2 and 75 | <5 ms
Extended loop IV (1–102) | −2 and 102 | Partial fast (<5 ms) closure
Loop V, the LID domain (121–155) | 121 and 155 | Slow
Loop VI, "extended" loop V | 113 and 169 | Slow
Loop VII, internal section of loop IV | 66 and 95 | Slow
Secondary structure elements
Helix 8 | 169 and 188 | Slow at the cooperative folding transition
Strand 9 (192–198) | 188 and 203 | Slow
Strand 1 (1–6) | −2 and 8 | Slow
Strand 3 (79–85) | 79 and 86 | Slow
Strand 4 (104–110) | 104 and 109 | Slow
Long segments incorporating the three domains
Overall dimensions (186 residues) | 18 and 203 | Slow two-state transition
Overall dimensions (176 residues) | 28 and 203 | Close to native in the collapse
Overall dimensions (131 residues) | 73 and 203 | Slow
Section of loop II | 28 and 86 | Slow
Section of loop II | 58–86 | Slow

Mean distance in the collapsed ensemble (Å) (c), column values in extracted order (their row assignment is not recoverable from this copy): 20; 17; 25 ± 1; 22 ± 1; 19.4 ± 0.8; >30; 26 ± 2; >30; 26 ± 2; nd.

a This table summarizes the results of many experiments. b "Slow" indicates that the transition to the native value of the parameters is in the seconds time regime.
c Mean distance between the ends of the labeled segment in the collapsed state at the end of the dead time of the mixing (either stopped flow or continuous mixing), available for the mutants studied by the double kinetics experiment. For the other mutants, the rate of folding was deduced only from the rate of change of the mean FRET efficiency. d Number of residues in the labeled chain segment.

Figure 7 Schematic representation of the loop structures in the CORE domain of the AK molecule. (A) The backbone fold of the AK molecule (PDB 4AKE). The hydrophobic clusters of residues involved in the formation of the non-local contacts that formed the nodes of the closed loops are colored. (B) Schematic representation of the native structure loops (loop I, N terminal; loop II; loop III; loop IV) formed by the chain segments that connected the interacting clusters. (C) The amino acid sequence of the N terminal section of the AK molecule, highlighting the clusters of hydrophobic residues that form the loop nodes [color coded as in (A) and (B)]: MRIILLGAPGAGKGTQAQFIMEKYGIPQISTGDMLRAAVKSGSELGKQAKDIMDAGKLVTDELVIALVKERIAQEDCRNGFLLDGFPRTIPQADAMKEAGINVDYVLEFDV.

Kinetics of folding of secondary structure elements in the AK molecule

In order to test the loop hypothesis, we asked whether long loop closure in the CORE domain of the AK molecule depends on the earlier formation of secondary structure elements. Five secondary structure elements in the CORE domain were labeled (Table 1), and the folding kinetics were studied by the FRET-detected stopped flow system.

Figure 8 Intra-molecular distance distribution, P(r), between residues 28 and 71 during refolding and at equilibrium (curves at 5 ms, 0.5 s, 1 s, 1.5 s, 3 s, 6 s, 9 s, and 5 min after mixing, and for the unfolded state; horizontal axis: distance, Å).
The broken black line indicates the distance distribution under denaturing conditions (2 mol/L of GndHCl) measured with the tryptophan and coumarin FRET pair (R0 = 24 Å). The mean and FWHM in the denatured state were 38 and 34 Å, respectively. The blue and black solid lines mark the 5 ms transient distribution and the distribution at 5 min after mixing, respectively.

of the folding pathway of the AK molecule. In preliminary results, replacement of non-polar residues at the termini of loop II resulted in a reduced rate of loop closure. When residue Leu35 was replaced by Ala, the protein was destabilized and the yield of folding was extremely poor. The loss of protein expression following all attempts to mutate residues 2–4 is another possible example of the effect of loss of hydrophobic interactions in the loop nodes. Our results are in good agreement with the sequential collapse model for protein folding [203, 204] and several theoretical works [21, 43, 205, 206]. The overview of the FRET-detected AK folding experiments reviewed here portrays a sequence of sub-domain and global transitions that form a hierarchically ordered pathway spread over four (or more) time scales, from microseconds to seconds (Figure 10).

Concluding remarks

The protein folding problem will be considered "solved" when we are able to "read" genes, i.e., to predict the native fold of the proteins, their dynamics, and the mechanism of fast folding, based simply on primary amino acid sequence. The long-term goal in folding studies is the development of an algorithm that would simulate the stepwise mechanism of folding, which constrains the conformational space in which a random search for stable interactions is possible. Our working hypothesis is that the fast and efficient folding transition is based on the ordered formation of sub-domain structures that are "instructed" by sequence signals in terms of intramolecular interactions between clusters of residues, and between them and the solvent.
The very long term goal of our studies is to create a "dictionary" of generic sequence clusters that can form loop nodes by non-local interactions and to define the order of formation of their interactions during the search for the native combination of interactions. This dictionary could be part of a database and used as part of the folding prediction algorithm. Here we focused on only a single element in the complex problem, the structural constraints generated by the loop closure and the secondary structure, and applied trFRET-based methods to capture the timing of specific steps at the initiation of the folding transition. The results reviewed here demonstrate the power of the FRET-based methods in studying sub-domain transitions in situ. Non-local contacts were detected by other methods (e.g., hydrogen exchange [207–210]), but FRET measurements are unique in their ability to directly yield

All of them were disordered at the initial phase of the folding pathway [110, 152]. Analysis of the trFRET double kinetics experiments showed the shift of the mean end-to-end distance between the ends of strands and the rate of reduction of the FWHM of the distance distributions. Strands that were included in long closed loops (e.g., 1 and 3) were disordered, while the loops (I and III) were closed. This was also the case for strand 4, which forms the C terminal side of the node of loop IV (with strand 1 on the N terminal side). Thus, the closure of loops I, III, and IV is not dependent on the secondary structures of their nodes. The segment forming the long helix 8 (20 residues, 170–189) was also fully disordered at 5 ms and changed to the native end-to-end distance at the rate of the cooperative folding transition of the AK molecule [110]. Strand 9, the fifth (C terminal) strand of the (parallel) sheet structure that dominates the CORE domain of the AK molecule (labeled at residues 188 and 203), also formed only in the slow cooperative step of the folding transition.
Thus, we can conclude that, at least in some cases, loop nodes were formed independently of the formation of secondary structure elements, particularly in the case of strands. The aforementioned results provide strong evidence in support of the loop hypothesis. Yet, further tests must be performed in order to firmly establish the role of the early closed long loop in the initiation and propagation

Figure 9 A series of transient segmental end-to-end distance distributions monitoring the progress of folding of the N terminal loop I and strand 3, obtained by trFRET measurements in the double kinetics experiment. Panel (A) plots photon counts versus time (ns); panels (B) and (C) plot probability versus distance (Å). Traces are shown for the unfolded state and for 15 ms, 1.5 s, 3.0 s, 4.5 s, and 5 min of refolding. (A) A series of fluorescence decay curves of the tryptophan residue 24 in the absence of an acceptor at residue −2. Each curve was collected during a 2-ms time interval at the predetermined refolding time point as indicated by the color code. The black traces represent the best-fit calculated decay curve. The gradual reduction of the mean fluorescence lifetime of the probe, which reflects local structural changes, is in contrast to the immediate change of the distance between the ends of loop I (loop closure), which was found by global analysis of the series of DO and DA experiments. (B) A series of transient distributions of the distance between the ends of the segment forming strand 3, obtained by global analysis of the DO and DA double kinetics experiment of the mutant AK [24, 79]. The distance distribution of the 15-ms species fully overlaps with that obtained at equilibrium under unfolding conditions.
The narrow native-like end-to-end distance distribution characteristic of the extended conformation of the native strand structure was completed only 3 s after the refolding initiation. (C) Series of transient distributions of the distance between the labeled ends of the N terminal loop I obtained by global analysis of the transient fluorescence decay curves of the Trp residue in the DO and DA mutants by means of the double kinetics experiment. The 15-ms distribution fully overlaps with the native fold distribution. Thus, this loop is closed immediately after the initiation of folding. The shift of the mean of the distribution to longer distance and the reduction of the corresponding width values at intermediate time points indicate that the early formed closed loop structure later deforms during the slow ordering of the chain segments, which seem to perturb the loop closure interaction.

distributions of intramolecular distances while the rest of the molecule is "transparent". We believe that this approach, or other methods with the ability to characterize the distribution of intramolecular distances at selected sections of the protein backbone, together with ultrafast data collection, should be an essential component of the arsenal of protein folding research. The kinetics of formation of specific key sub-domain structures should be probed directly, in situ, on the background of the rest of the chain, where mutual influences exist. The possible significance of the loop closure effect in folding in vivo is an intriguing question that might be addressed with currently available FRET and time-resolved fluorescence methods [77, 211–214]. Co-translational folding of nascent polypeptides on the ribosome is the subject of a major field of current research. N terminal

Figure 10 Partial map of the folding pathway of AK. The folding of E.
coli adenylate kinase (AK) included transitions whose rates differed by six orders of magnitude. The denatured ensemble (D), assumed to be fully disordered, collapsed at a sub-microsecond (?) time scale to a disordered compact globule (C). Closure of loops I and II formed I1. This was followed by additional loop closure (cross-links) and secondary structure sub-domain elements in the millisecond-to-second time scale. The rate-determining step led to the native state (N).

loops can fold on the ribosome or upon emergence and thus might have a major role in the folding and trafficking of proteins.

The ideal experiment should be able to monitor multiple intramolecular distances in all parts of the molecule, in situ, with sub-microsecond time resolution and with a spatial resolution of single angstrom units. Moreover, since the folding transition starts from an ensemble of a very large number of conformations, the early folding transitions should be studied, among other means, by methods that can yield probability distributions of selected intramolecular distances that report the folding status of key segment structures and their development from the moment of transition to folding conditions. For that reason, we chose to apply and further develop the trFRET-based methods and, in particular, the double kinetics method. A major task for the near future is enhancement of the resolution of the "chemical time base" (by faster initiation of the refolding transition and by reduction of the instrument dead time) for routine multiple double kinetics experiments. New devices have been developed [215], and we hope that they will become more practical for routine applications. The results reviewed here, obtained mainly from studying just a few model proteins, support the counterintuitive mechanism whereby the NLIs are effective in the initiation of the folding pathways.
Alternatively, it is possible that there are two types of folding mechanism, one in which NLIs are dominant and the other in which the LIs predominate. This was suggested by Daggett and Fersht [57] and is supported by several results reviewed here. It is reasonable to assume that a combination of both local and non-local interactions is effective at the early steps of folding of many proteins. The challenge for future work is to develop an experimental approach that will enable the determination of the relative contribution of both types of interaction, the exact timing of the formation of each interaction at high resolution, the specificity of selected early non-local interactions, and the extent of inter-dependence of selected early interactions. We suggest that mapping multiple sub-domain structural transitions during the refolding transition of many proteins will refine the conclusions and help reveal some common principles of the initiation of folding, enabling the first steps towards the preparation of a folding "dictionary". In order to achieve this goal, the trFRET measurements should be combined with mutagenesis experiments in which the role of selected residue clusters will be tested by perturbation mutations. A positive test for the loop closure capacity of node-forming sequence clusters can be achieved by measurements of the distribution of the end-to-end distance of flexible polypeptides whose ends include such clusters of residues. Yet, we remember that the solution to the protein folding problem depends on many more approaches, both experimental and theoretical, while the approach presented here is only a small portion of the big puzzle.

Acknowledgments: We are grateful to Mr. E. Zimerman and D. Freedman for excellent technical assistance. We are grateful to Eldad Ben Ishay, Eitan Lerner, and Asaf Grupi for their contributions and discussions.
Author contributions: All the authors have accepted the responsibility for the entire content of this submitted manuscript and approved the submission.
Research funding: This study was supported by grants from the Israel Science Foundation (ISF1464/10 and the I-CORE 1902/12), the EU Marie Curie TOK grant (29936), the US-Israel Binational Science Foundation (BSF 2011143), and by the Damadian Center for Magnetic Resonance Research, Bar-Ilan University.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

dopsis thaliana cytochrome c(6A). Biochim Biophys Acta 2012;1824:311-8.
21. Berezovsky IN, Trifonov EN. Loop fold nature of globular proteins. Protein Eng 2001;14:403-7.
22. Berezovsky IN, Trifonov EN. Van der Waals locks: loop-n-lock structure of globular proteins. J Mol Biol 2001;307:1419-26.
23. Hubner IA, Edmonds KA, Shakhnovich EI. Nucleation and the transition state of the SH3 domain. J Mol Biol 2005;349:424-34.
24. Lindorff-Larsen K, Vendruscolo M, Paci E, Dobson CM. Transition states for protein folding have native topologies despite high structural variability. Nat Struct Mol Biol 2004;11:443-9.
25. Lindorff-Larsen K, Rogen P, Paci E, Vendruscolo M, Dobson CM. Protein folding and the organization of the protein topology universe. Trends Biochem Sci 2005;30:13-9.
26. Paci E, Clarke J, Steward A, Vendruscolo M, Karplus M. Self-consistent determination of the transition state for protein folding: application to a fibronectin type III domain. Proc Natl Acad Sci USA 2003;100:394-9.
27. Geierhaas CD, Paci E, Vendruscolo M, Clarke J. Comparison of the transition states for folding of two Ig-like proteins from different superfamilies. J Mol Biol 2004;343:1111-23.
28. Lappalainen I, Hurley MG, Clarke J. Plasticity within the obligatory folding nucleus of an immunoglobulin-like domain. J Mol Biol 2008;375:547-59.
29. Sosnick TR, Dothager RS, Krantz BA. Differences in the folding transition state of ubiquitin indicated by phi and psi analyses. Proc Natl Acad Sci USA 2004;101:17377-82.
30. Krantz BA, Dothager RS, Sosnick TR. Discerning the structure and energy of multiple transition states in protein folding using psi-analysis. J Mol Biol 2004;337:463-75.
31. Tsong TY, Hu CK, Wu MC. Hydrophobic condensation and modular assembly model of protein folding. Biosystems 2008;93:78-89.
32. Fulton KF, Main ER, Daggett V, Jackson SE. Mapping the interactions present in the transition state for unfolding/folding of FKBP12. J Mol Biol 1999;291:445-61.
33. Samatova EN, Katina NS, Balobanov VA, Melnik BS, Dolgikh DA, Bychkova VE, et al. How strong are side chain interactions in the folding intermediate? Protein Sci 2009;18:2152-9.
34. Rader AJ, Yennamalli RM, Harter AK, Sen TZ. A rigid network of long-range contacts increases thermostability in a mutant endoglucanase. J Biomol Struct Dyn 2012;30:628-37.
35. Go N, Taketomi H. Respective roles of short- and long-range interactions in protein folding. Proc Natl Acad Sci USA 1978;75:559-63.
36. Taketomi H, Ueda Y, Go N. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int J Pept Protein Res 1975;7:445-59.
37. Abkevich VI, Gutin AM, Shakhnovich EI. Impact of local and nonlocal interactions on thermodynamics and kinetics of protein folding. J Mol Biol 1995;252:460-71.
38. Dokholyan NV, Buldyrev SV, Stanley HE, Shakhnovich EI. Identifying the protein folding nucleus using molecular dynamics. J Mol Biol 2000;296:1183-8.
39. Hubner IA, Oliveberg M, Shakhnovich EI. Simulation, experiment, and evolution: understanding nucleation in protein S6 folding. Proc Natl Acad Sci USA 2004;101:8354-9.
40. Hubner IA, Shimada J, Shakhnovich EI. Commitment and nucleation in the protein G transition state. J Mol Biol 2004;336:745-61.

Journal

Bio-Algorithms and Med-Systems, de Gruyter

Published: Dec 19, 2014

References