Protein structure



Structural biology uses techniques such as X-ray crystallography and NMR spectroscopy to determine the structure of proteins.

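A minimum number of residues is necessary for a protein to perform a particular biological function. At the other extreme, very large aggregates can form from protein subunits; for example, many thousands of actin molecules assemble into an actin filament.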

Levels of protein structure

  Biochemistry refers to four distinct aspects of a protein's structure:

  • Primary structure - the amino acid sequence of the peptide chains.
  • Secondary structure - highly regular sub-structures (alpha helix and strands of beta sheet) which are locally defined, meaning that there can be many different secondary motifs present in one single protein molecule.
  • Tertiary structure - three-dimensional structure of a single protein molecule; a spatial arrangement of the secondary structures.
  • Quaternary structure - complex of several protein molecules or polypeptide chains, usually called protein subunits in this context, which function as part of the larger assembly or protein complex.

In addition to these levels of structure, a protein may shift between several similar structures in performing its biological function. In the context of these functional rearrangements, these tertiary or quaternary structures are usually referred to as chemical conformations, and transitions between them are called conformational changes.

The primary structure is held together by covalent peptide bonds, which are formed during protein biosynthesis, or translation. These peptide bonds provide rigidity to the protein. The two ends of the amino acid chain are referred to as the C-terminal end or carboxyl terminus (C-terminus) and the N-terminal end or amino terminus (N-terminus), based on the nature of the free group at each end.

The various types of secondary structure are defined by their patterns of hydrogen bonds between the main-chain peptide groups. However, these hydrogen bonds are generally not stable by themselves, since the water-amide hydrogen bond is generally more favorable than the amide-amide hydrogen bond. Thus, secondary structure is stable only when the local concentration of water is sufficiently low, e.g., in the molten globule or fully folded states.

Similarly, the formation of molten globules and tertiary structure is driven mainly by structurally non-specific interactions, such as the rough propensities of the amino acids and hydrophobic interactions. However, the tertiary structure is fixed only when the parts of a protein domain are locked into place by structurally specific interactions, such as ionic interactions (salt bridges), hydrogen bonds and the tight packing of side chains. The tertiary structure of extracellular proteins can also be stabilized by disulfide bonds, which reduce the entropy of the unfolded state; disulfide bonds are extremely rare in cytosolic proteins, since the cytosol is generally a reducing environment.

Structure of the amino acids

  An α-amino acid consists of a part that is present in all amino acid types and a side chain that is unique to each type of residue. The Cα atom is bound to four different groups: an amino group, a carboxyl group, a hydrogen atom and a side chain specific to that type of amino acid. Because the Cα atom carries four different groups, it is chiral, but only one of the two isomers, the L-form, occurs in biological proteins. Glycine, however, is not chiral, since its side chain is a hydrogen atom. A simple mnemonic for the L-form is "CORN": when the Cα atom is viewed with the H in front, the groups read "CO-R-N" in a clockwise direction. The side chain determines the chemical properties of the α-amino acid and may be any one of the 20 different side chains:

Name           3-letter  1-letter  Abundance (%)  MW (Da)  pK    VdW volume (Å³)  Class
Alanine        ALA       A         13.0           71       -     67               H
Arginine       ARG       R         5.3            157      12.5  148              C+
Asparagine     ASN       N         9.9            114      -     96               P
Aspartate      ASP       D         9.9            114      3.9   91               C-
Cysteine       CYS       C         1.8            103      -     86               P
Glutamate      GLU       E         10.8           128      4.3   109              C-
Glutamine      GLN       Q         10.8           128      -     114              P
Glycine        GLY       G         7.8            57       -     48               N
Histidine      HIS       H         0.7            137      6.0   118              P,C+
Isoleucine     ILE       I         4.4            113      -     124              H
Leucine        LEU       L         7.8            113      -     124              H
Lysine         LYS       K         7.0            129      10.5  135              C+
Methionine     MET       M         3.8            131      -     124              H
Phenylalanine  PHE       F         3.3            147      -     135              H
Proline        PRO       P         4.6            97       -     90               H
Serine         SER       S         6.0            87       -     73               P
Threonine      THR       T         4.6            101      -     93               P
Tryptophan     TRP       W         1.0            186      -     163              P
Tyrosine       TYR       Y         2.2            163      10.1  141              P
Valine         VAL       V         6.0            99       -     105              H

Abundance: relative abundance in E. coli (E.C.). MW: mass of the residue. pK: side-chain pK. Class: C+ = positively charged, C- = negatively charged, P = polar, H = hydrophobic, N = neutral.
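The class column can be used directly to summarize a sequence. The following Python sketch counts residues per class for a sequence written in one-letter code. The dictionary simply transcribes the table; counting histidine (listed as "P,C+") as polar only is an assumption made for simplicity, and the function name is hypothetical.

```python
from collections import Counter

# Side-chain classes transcribed from the last column of the amino acid table:
# H = hydrophobic, P = polar, C+ / C- = charged, N = neutral.
# Histidine is listed as "P,C+"; here it is counted as polar only (an assumption).
RESIDUE_CLASS = {
    "A": "H", "R": "C+", "N": "P", "D": "C-", "C": "P",
    "E": "C-", "Q": "P", "G": "N", "H": "P", "I": "H",
    "L": "H", "K": "C+", "M": "H", "F": "H", "P": "H",
    "S": "P", "T": "P", "W": "P", "Y": "P", "V": "H",
}


def class_composition(sequence: str) -> dict[str, int]:
    """Count how many residues of a one-letter sequence fall into each class."""
    counts = Counter()
    for residue in sequence.upper():
        if residue not in RESIDUE_CLASS:
            raise ValueError(f"unknown residue code: {residue!r}")
        counts[RESIDUE_CLASS[residue]] += 1
    return dict(counts)


print(class_composition("MKTAYIAKQR"))  # {'H': 4, 'C+': 3, 'P': 3}
```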

The 20 naturally occurring amino acids can be divided into several groups based on their chemical properties. Important factors are charge, hydrophobicity or hydrophilicity, size and functional groups. The way the different side chains interact with the aqueous environment plays a major role in shaping protein structure. Hydrophobic side chains tend to be buried in the interior of the protein, whereas hydrophilic side chains are exposed to the solvent. Examples of hydrophobic residues are leucine, isoleucine, phenylalanine and valine, and to a lesser extent tyrosine, alanine and tryptophan. The charge of the side chains plays an important role in protein structure, since ionic bonding can stabilize protein structures, and an unpaired charge in the interior of a protein can disrupt them. Charged residues are strongly hydrophilic and are usually found on the outside of proteins. Positively charged side chains are found in lysine and arginine, and in some cases in histidine. Negative charges are found in glutamate and aspartate. The rest of the amino acids have smaller, generally hydrophilic side chains with various functional groups. Serine and threonine have hydroxyl groups, and asparagine and glutamine have amide groups. Some amino acids have special properties: cysteine can form covalent disulfide bonds with other cysteines, proline is cyclic, and glycine is small and more flexible than the other amino acids.
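Because side-chain charge depends on pH, a rough net side-chain charge can be estimated from the pK values in the amino acid table with the Henderson-Hasselbalch equation. The sketch below is a deliberate simplification: it ignores the chain termini, uses only the six side chains that have a pK in the table, and assumes the groups titrate independently. The function name is hypothetical.

```python
# Side-chain pK values taken from the amino acid table. The N- and C-termini
# are ignored, as are residues without a listed pK (simplifying assumptions).
BASIC_PK = {"R": 12.5, "K": 10.5, "H": 6.0}   # charge +1 when protonated
ACIDIC_PK = {"D": 3.9, "E": 4.3, "Y": 10.1}   # charge -1 when deprotonated


def side_chain_charge(sequence: str, ph: float = 7.0) -> float:
    """Estimate the net side-chain charge of a one-letter sequence at a given pH."""
    charge = 0.0
    for residue in sequence.upper():
        if residue in BASIC_PK:
            # Fraction of the group that is protonated, i.e. positively charged.
            charge += 1.0 / (1.0 + 10 ** (ph - BASIC_PK[residue]))
        elif residue in ACIDIC_PK:
            # Fraction of the group that is deprotonated, i.e. negatively charged.
            charge -= 1.0 / (1.0 + 10 ** (ACIDIC_PK[residue] - ph))
    return charge


print(round(side_chain_charge("MKTAYIAKQR"), 2))
```

For "MKTAYIAKQR" at pH 7 the estimate is close to +3, dominated by the two lysines and the arginine; at pH 6.0 a single histidine comes out half protonated (+0.5).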

The peptide bond

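    Two amino acids can be combined in a condensation reaction that forms a peptide bond and releases a water molecule. Some important bond lengths are given in the table below; note that the peptide C - N bond (133 pm) is shorter than a C - N single bond (148 pm), reflecting its partial double-bond character.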

Peptide bond   Average length   Single bond   Average length   Hydrogen bond   Average length (±30 pm)
Cα - C         153 pm           C - C         154 pm           O-H --- O-H     280 pm
C - N          133 pm           C - N         148 pm           N-H --- O=C     290 pm
N - Cα         146 pm           C - O         143 pm           O-H --- O=C     280 pm
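Average bond lengths such as these are often used to sanity-check atomic models. The sketch below assumes coordinates are given in ångströms as (atom name, (x, y, z)) pairs in backbone order (N, CA, C, N, ...) and compares each consecutive backbone bond with the table. The 5 pm tolerance, the atom-name convention and the function name are illustrative choices, not a standard.

```python
import math

# Average peptide-backbone bond lengths from the table, in picometres.
BACKBONE_LENGTHS_PM = {("CA", "C"): 153, ("C", "N"): 133, ("N", "CA"): 146}


def check_backbone(atoms, tolerance_pm=5.0):
    """Compare consecutive backbone bonds in a list of (atom_name, (x, y, z))
    tuples with coordinates in angstroms against the table values.

    Returns a list of ((name1, name2), length_pm, within_tolerance) tuples.
    """
    results = []
    for (name1, xyz1), (name2, xyz2) in zip(atoms, atoms[1:]):
        expected = BACKBONE_LENGTHS_PM.get((name1, name2))
        if expected is None:
            continue  # this atom pair is not a bond covered by the table
        length_pm = math.dist(xyz1, xyz2) * 100.0  # 1 angstrom = 100 pm
        ok = abs(length_pm - expected) <= tolerance_pm
        results.append(((name1, name2), length_pm, ok))
    return results


# Toy coordinates: only the bond lengths are realistic, not the angles.
toy = [("N", (0.0, 0.0, 0.0)), ("CA", (1.46, 0.0, 0.0)),
       ("C", (1.46, 1.53, 0.0)), ("N", (1.46, 1.53, 1.33))]
for pair, length, ok in check_backbone(toy):
    print(pair, round(length, 1), ok)
```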

Primary structure

Main article: Primary structure

The sequence of the different amino acids is called the primary structure of the protein. Post-translational modifications such as disulfide formation, phosphorylation and glycosylation are usually also considered part of the primary structure, and cannot be read from the gene.
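Sequences are conventionally written from the N-terminus to the C-terminus, using either the three-letter or the one-letter codes from the amino acid table. A minimal conversion sketch, assuming the input is a separator-delimited string of three-letter codes (the function name is hypothetical):

```python
# Three-letter to one-letter codes, as listed in the amino acid table.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(three_letter_sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Lys-Thr' (N-terminus first) to 'MKT'.

    Raises KeyError for codes not in the table.
    """
    codes = three_letter_sequence.split(sep)
    return "".join(THREE_TO_ONE[code.strip().upper()] for code in codes)


print(to_one_letter("Met-Lys-Thr-Ala"))  # MKTA
```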

Secondary structure

Main article: Secondary structure

By building models of peptides using known information about bond lengths and angles, Linus Pauling and coworkers proposed the first elements of secondary structure, the alpha helix[1] and the beta sheet. Each of these two secondary structure elements has a regular geometry, meaning that it is constrained to specific ranges of the backbone dihedral angles φ and ψ, so it occupies a specific region of the Ramachandran plot. Parts of the chain that do not adopt a regular secondary structure are often described as random coil.
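To make the idea of Ramachandran regions concrete, the sketch below assigns a (φ, ψ) pair to the nearest of a few reference points, taking the 360° periodicity of the angles into account. The reference values are approximate centres chosen for illustration, and the function names are hypothetical; real secondary-structure assignment is usually based on hydrogen-bond patterns rather than on a nearest-point rule.

```python
import math

# Approximate (phi, psi) centres in degrees; illustrative assumptions only.
REFERENCE_REGIONS = {
    "alpha helix (right-handed)": (-57.0, -47.0),
    "beta strand": (-120.0, 130.0),
    "left-handed helix": (57.0, 47.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_region(phi: float, psi: float) -> str:
    """Return the name of the reference region closest to (phi, psi)."""
    def distance(centre):
        return math.hypot(angular_difference(phi, centre[0]),
                          angular_difference(psi, centre[1]))
    return min(REFERENCE_REGIONS, key=lambda name: distance(REFERENCE_REGIONS[name]))


print(nearest_region(-60.0, -45.0))   # alpha helix (right-handed)
print(nearest_region(-120.0, -170.0)) # beta strand: psi wraps around to near +130
```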

 


Turns, loops and a few other secondary structure elements, such as the 3-10 helix, complete the picture. With these pieces, a complete protein can be assembled, displaying its typical tertiary structure.

Tertiary structure

Main article: Tertiary structure

The elements of secondary structure are usually folded into a compact shape using a variety of loops and turns. The formation of tertiary structure is usually driven by the burial of hydrophobic residues, but other interactions such as hydrogen bonding, ionic interactions and disulfide bonds can also stabilize the tertiary structure. The tertiary structure encompasses all the interactions that are not considered secondary structure; it defines the overall fold of the protein and is usually indispensable for its function.

Quaternary structure

Main article: Quaternary structure

The quaternary structure is the interaction between several polypeptide chains. The individual chains are called subunits. The subunits are not necessarily covalently connected, but they may be linked by a disulfide bond. Not all proteins have quaternary structure, since many are functional as monomers. The quaternary structure is stabilized by the same range of interactions as the tertiary structure. Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers. Specifically, a complex is called a dimer if it contains two subunits, a trimer if it contains three subunits, and a tetramer if it contains four subunits. Multimers made up of identical subunits may be referred to with the prefix "homo-" (e.g. a homotetramer) and those made up of different subunits with the prefix "hetero-" (e.g. a heterodimer). Quaternary structures vary greatly from one protein to another.
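The naming rules for multimers are mechanical enough to express in a few lines. In this sketch, subunits are given as arbitrary identifiers (for example chain or protein names); the function name and the "n-mer" fallback for more than six subunits are assumptions for illustration.

```python
# Names for small oligomers; larger assemblies fall back to an "n-mer" form.
OLIGOMER_NAMES = {1: "monomer", 2: "dimer", 3: "trimer", 4: "tetramer",
                  5: "pentamer", 6: "hexamer"}


def oligomer_name(subunits: list[str]) -> str:
    """Name a complex from the identifiers of its subunits, e.g. ['A', 'A', 'B', 'B']."""
    n = len(subunits)
    if n == 0:
        raise ValueError("a complex needs at least one subunit")
    if n == 1:
        return OLIGOMER_NAMES[1]  # a single chain gets no homo-/hetero- prefix
    # Identical subunits give "homo-", any difference gives "hetero-".
    prefix = "homo" if len(set(subunits)) == 1 else "hetero"
    if n in OLIGOMER_NAMES:
        return prefix + OLIGOMER_NAMES[n]
    return f"{prefix}-{n}-mer"  # e.g. "homo-12-mer" for larger assemblies


print(oligomer_name(["alpha", "alpha", "beta", "beta"]))  # heterotetramer
```

For example, hemoglobin, built from two α and two β chains, is classified as a heterotetramer.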

Side chain conformation

The atoms along the side chain are named with Greek letters in alphabetical order: α, β, γ, δ, ε and so on. Cα refers to the carbon atom closest to the carbonyl group of the amino acid, Cβ to the second closest, and so on. Cα is usually considered part of the backbone. The dihedral angles around the bonds between these atoms are named χ1, χ2, χ3, etc. For example, the first and second carbon atoms in the side chain of lysine are named α and β, and the dihedral angle around the α-β bond is named χ1. Side chains can adopt different conformations, called gauche(-), trans and gauche(+). Side chains generally tend to adopt a staggered conformation around χ2, driven by the minimization of the overlap between the electron orbitals of the hydrogen atoms.
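Side-chain angles such as χ1 (defined by the atoms N, Cα, Cβ and Cγ) and the backbone angles φ and ψ are all dihedral angles computed from four consecutive atoms. Below is a minimal pure-Python sketch using the standard atan2 formula, plus a helper that snaps an angle to the nearest staggered value (-60°, +60° or 180°). The helper returns numbers rather than gauche/trans labels because sources differ on which gauche state is called (+) and which (-); both function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) around the p1-p2 bond,
    e.g. chi1 from the N, CA, CB and CG coordinates of a lysine."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def staggered_bin(chi: float) -> int:
    """Snap a side-chain dihedral to the nearest staggered value: -60, 60 or 180."""
    def diff(a, b):
        d = (a - b) % 360.0
        return min(d, 360.0 - d)
    return min((-60, 60, 180), key=lambda ref: diff(chi, ref))


chi = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0))
print(round(chi, 1), staggered_bin(chi))  # -90.0 -60
```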

Domains, motifs, and folds in protein structure

Many proteins are organized into several units. A amino acids. This is true not only because of the complicated relationship between tertiary and primary structure, but also because the size of the elements varies from one protein to the next. Although there are about 100,000 different proteins expressed in eukaryotic systems, there are far fewer different domains, structural motifs and folds. This is partly a consequence of evolution, since genes or parts of genes can be duplicated or moved around within the genome. This means that, for example, a protein domain might be moved from one protein to another, giving the protein a new function. As a result, pathways and mechanisms tend to be reused in several different proteins.

Protein folding

Main article: Protein folding

The process by which the higher structures form is called protein folding and is a consequence of the primary structure. A unique polypeptide may have more than one stable folded conformation, which could have a different biological activity, but usually, only one conformation is considered to be the active, or native conformation.

Structure classification

Several methods have been developed for the structural classification of proteins. These seek to classify the data in the Protein Data Bank in a structured order. Several databases exist which classify proteins using different methods; SCOP, CATH and FSSP are the largest ones. The methods they use are, respectively, purely manual, manual and automated, and purely automated. Work is being done to better integrate the current data. The classification is consistent between SCOP, CATH and FSSP for the majority of proteins which have been classified, but there are still some differences and inconsistencies.

Protein structure determination

Around 90% of the protein structures available in the Protein Data Bank have been determined by X-ray crystallography. Cryo-electron microscopy has recently become a means of determining protein structures at resolutions better than 5 angstroms (0.5 nanometer) and is anticipated to increase in power as a tool for high-resolution work in the next decade. Cryo-electron microscopy is a valuable resource for researchers working with very large protein complexes such as virus coat proteins and amyloid fibers.

A rough guide to the resolution of protein structures
Resolution (Å)  Meaning
> 4.0           Individual coordinates meaningless.
3.0 - 4.0       Fold possibly correct, but errors are very likely. Many side chains placed with wrong rotamer.
2.5 - 3.0       Fold likely correct except that some surface loops might be mismodelled. Several long, thin side chains (lys, glu, gln, etc.) and small side chains (ser, val, thr, etc.) likely to have wrong rotamers.
2.0 - 2.5       As 2.5 - 3.0, but the number of side chains in wrong rotamers is considerably smaller. Many small errors can normally be detected. Fold normally correct and number of errors in surface loops is small. Water molecules and small ligands become visible.
1.5 - 2.0       Few residues have wrong rotamer. Many small errors can normally be detected. Folds are extremely rarely incorrect, even in surface loops.
0.5 - 1.5       In general, structures have almost no errors at this resolution. Rotamer libraries and geometry studies are made from these structures.
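The resolution guide can be turned into a simple lookup. Because adjacent ranges in the guide share their end points, this sketch assigns a boundary value to the better-resolution bin, which is an arbitrary choice; the descriptions are shortened paraphrases of the table rows and the function name is hypothetical.

```python
# Upper bound of each bin (in angstroms) and a shortened description from the guide.
RESOLUTION_GUIDE = [
    (1.5, "Almost no errors; used for rotamer libraries and geometry studies."),
    (2.0, "Few residues have a wrong rotamer; folds extremely rarely incorrect."),
    (2.5, "Fold normally correct; fewer wrong rotamers; waters and small ligands visible."),
    (3.0, "Fold likely correct; some surface loops and rotamers may be wrong."),
    (4.0, "Fold possibly correct, but errors are very likely."),
]


def interpret_resolution(resolution_angstrom: float) -> str:
    """Return the rough meaning of a crystallographic resolution value."""
    if resolution_angstrom <= 0:
        raise ValueError("resolution must be positive")
    for upper, meaning in RESOLUTION_GUIDE:
        if resolution_angstrom <= upper:  # boundary values go to the better bin
            return meaning
    return "Individual coordinates meaningless."


print(interpret_resolution(1.8))
```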

Computational prediction of protein structure

Threading methods predict the structure of a protein by using existing protein structures as templates.

Rosetta@home is a distributed computing project which tries to predict the structures of proteins with massive sampling on thousands of home computers.

Software

There are many available software packages, such as the free web-based STING, used to visualize and analyze protein structures. Another example is the FeatureMap3D web-server, which can visualize the quality of a protein-protein alignment in 3D and be used to map sequence feature annotation, such as the underlying exon structure, onto a protein structure.

Several packages, such as Quantum Pharmaceuticals software[2], can be used to predict conformational changes of proteins and their influence on protein function.

Several methods have been developed to compare the structures of different proteins; see structural alignment.
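Many structural comparisons report the root-mean-square deviation (RMSD) between corresponding atoms. The sketch below computes it for two coordinate lists that are assumed to be already matched residue by residue and optimally superimposed; a full structural alignment method has to find both the correspondence and the superposition (for example with the Kabsch algorithm), which is not shown. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    coordinates, e.g. matched C-alpha atoms of two already superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # 1.0
```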

Computational tools are also frequently employed to check experimental and theoretical models of protein structures for errors (examples: ProSA, NQ-Flipper, Verify3D, ANOLEA, WHAT_CHECK).

References

  1. Pauling L, Corey RB, Branson HR (1951). "The structure of proteins; two hydrogen-bonded helical configurations of the polypeptide chain". Proc Natl Acad Sci U S A 37 (4): 205-11. PMID 14816373.
  2. Quantum Pharmaceuticals software

Further reading

  • Habeck M, Nilges M, Rieping W (2005). "Bayesian inference applied to macromolecular structure determination". Physical review. E, Statistical, nonlinear, and soft matter physics 72 (3 Pt 1): 031912. PMID 16241487. (Bayesian computational methods for the structure determination from NMR data)
  • NQ-Flipper Check for unfavorable rotamers of Asn and Gln residues in protein structures
  • servers that check nearly 200 aspects of protein structure, such as packing, geometry, unfavourable rotamers in general or for Asn, Gln and His especially, strange water molecules, backbone conformations, atom nomenclature, symmetry parameters, etc.
  • Bioinformatics course. An interactive, fully free, course explaining many of the aspects discussed in this wiki entry.
 
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Protein_structure". A list of authors is available in Wikipedia.