Protein sequencing

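Protein sequencing is the determination of the amino acid sequence of a protein. Knowing the structures and functions of proteins is an important tool for understanding cellular processes, and allows drugs that target specific metabolic pathways to be invented more easily.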

The two major direct methods of protein sequencing are the Edman degradation and mass spectrometry. The amino acid sequence can also be deduced from the DNA or mRNA sequence encoding the protein, if this is known. In addition, a number of other reactions give more limited information about protein sequences; these can be used as preliminaries to the methods above or to overcome specific inadequacies within them.

Determining amino acid composition

It is often desirable to know the unordered amino acid composition of a protein before attempting to find the ordered sequence, as this knowledge can be used to detect errors in the sequencing process or to resolve ambiguous results. Knowing the frequency of certain amino acids can also help in choosing which protease to use to digest the protein. A generalised method for determining the composition is as follows:

  1. Hydrolyse a known quantity of protein into its constituent amino acids.
  2. Separate the amino acids, for example by chromatography.
  3. Determine the respective quantities of the amino acids.
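As a rough illustration of how a known composition can expose errors, the Python sketch below compares the composition implied by a candidate sequence with a measured one. The function names and the measured values are hypothetical. It assumes the measured counts have already been rounded to whole residues per chain, and it merges Asn with Asp (Asx, one-letter code B) and Gln with Glu (Glx, Z), because acid hydrolysis converts the amides into the corresponding acids.

```python
from collections import Counter


def composition(sequence: str, merge_amides: bool = True) -> Counter:
    """Unordered amino acid composition of a sequence given in one-letter codes."""
    seq = sequence.upper()
    if merge_amides:
        # Acid hydrolysis turns Asn into Asp and Gln into Glu, so a measured
        # composition only reports the totals Asx (B) and Glx (Z).
        seq = seq.translate(str.maketrans("NDQE", "BBZZ"))
    return Counter(seq)


def composition_mismatches(candidate: str, measured: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Residues whose count in the candidate differs from the measured count.

    Returns {residue: (count_in_candidate, measured_count)}; an empty dict
    means the candidate is consistent with the measured composition.
    """
    expected = composition(candidate)
    residues = sorted(set(expected) | set(measured))
    return {
        aa: (expected.get(aa, 0), measured.get(aa, 0))
        for aa in residues
        if expected.get(aa, 0) != measured.get(aa, 0)
    }


# Hypothetical measured composition, already rounded to whole residues per chain
measured = {"G": 1, "A": 2, "K": 1, "B": 1}
print(composition_mismatches("GAKD", measured))  # {'A': (1, 2)}
```

An empty result only means the candidate is consistent with the measurement; composition alone cannot confirm the order of the residues.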

Hydrolysis

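Hydrolysis is done by heating a sample of the protein in 6 molar hydrochloric acid at about 110 °C, typically for 24 hours. Under these conditions the side-chain amides of asparagine and glutamine are hydrolysed to aspartic acid and glutamic acid, releasing ammonia; the quantity of ammonia evolved can be measured to determine the extent of amide hydrolysis.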

Separation

The amino acids can be separated by ion-exchange chromatography, eluting them from the column with an elution gradient.

Quantitative analysis

Once the amino acids have been separated, their respective quantities are determined by adding a reagent that forms a coloured derivative. If more than about 10 nmol of each amino acid is present, ninhydrin can be used: it gives a yellow colour with proline and a deep blue-purple colour with the other amino acids. The concentration of each amino acid is proportional to the absorbance of the resulting solution. For very small quantities, down to about 10 pmol, fluorescamine can be used instead: it forms a fluorescent derivative on reacting with an amino acid.
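Because absorbance is proportional to concentration over the working range, the quantity of each amino acid is usually read off a calibration line made with standards of known amount. The sketch below fits such a line by ordinary least squares and inverts it. The function names, standard amounts and absorbances are invented for illustration, and in practice each amino acid would need its own calibration, since the colour yield with ninhydrin differs between amino acids.

```python
def fit_calibration(amounts: list[float], absorbances: list[float]) -> tuple[float, float]:
    """Least-squares straight line A = slope * amount + intercept through the standards."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, absorbances))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_absorbance(absorbance: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to get the amount in the unknown sample."""
    return (absorbance - intercept) / slope


# Hypothetical standards for one amino acid: amount in nmol vs measured absorbance
slope, intercept = fit_calibration([0, 10, 20, 40], [0.0, 0.2, 0.4, 0.8])
print(round(amount_from_absorbance(0.3, slope, intercept), 1))  # 15.0
```

The line is only trustworthy within the range spanned by the standards; readings outside it should be diluted or re-measured rather than extrapolated.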

N-terminal amino acid analysis

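Determining which amino acid forms the N-terminus of a peptide chain is useful for two reasons: it helps to order the sequences of individual peptide fragments into a whole chain, and the first round of Edman degradation is often contaminated by impurities and therefore does not give an accurate determination of the N-terminal amino acid. A generalised method for N-terminal amino acid analysis follows: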

  1. React the peptide with a reagent which labels its free N-terminal amine group.
  2. Hydrolyse the labelled peptide.
  3. Identify the labelled amino acid by chromatography and comparison with standards.

There are many different reagents which can be used to label terminal amino acids. They all react with amine groups and will therefore also bind to amine groups in the side chains of amino acids such as lysine, so chromatograms must be interpreted carefully to ensure that the right spot is chosen. Two of the more common reagents are Sanger's reagent (2,4-dinitrofluorobenzene) and dansyl derivatives such as dansyl chloride. Phenylisothiocyanate, the reagent for the Edman degradation, can also be used.

The same considerations apply here as in the determination of amino acid composition, with two differences: no stain is needed, because the labelling reagents themselves produce coloured or fluorescent derivatives, and only qualitative analysis is required, so the amino acid does not have to be eluted from the chromatography column, only compared with a standard. In addition, since the amine groups have reacted with the labelling reagent, ion-exchange chromatography cannot be used, and high-pressure liquid chromatography (HPLC) should be used instead.
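The following sketch, built around a hypothetical helper `labelled_products`, predicts which labelled amino acids appear after a peptide is labelled and then hydrolysed. It models only the N-terminal alpha-amino group and the epsilon-amino group of lysine side chains, ignoring any other side-chain reactions, and shows why an epsilon-labelled lysine spot must not be mistaken for the N-terminal residue.

```python
def labelled_products(peptide: str) -> list[str]:
    """Predict which amino acids carry a label after labelling the intact
    peptide's free amine groups and then hydrolysing it.

    Only the N-terminal alpha-amino group and lysine side-chain (epsilon)
    amino groups are modelled; other side-chain reactions are ignored.
    """
    products = []
    for position, residue in enumerate(peptide.upper()):
        sites = []
        if position == 0:
            sites.append("alpha")    # the free N-terminal amine
        if residue == "K":
            sites.append("epsilon")  # lysine side-chain amine
        if sites:
            products.append(f"{residue}({'+'.join(sites)})")
    return products


print(labelled_products("GAKLK"))  # ['G(alpha)', 'K(epsilon)', 'K(epsilon)']
print(labelled_products("KAG"))    # ['K(alpha+epsilon)']
```

Only the alpha-labelled derivative reports the N-terminus; the epsilon-labelled lysine spots are the ones a careless reading of the chromatogram might pick instead.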

C-terminal amino acid analysis

Far fewer methods are available for C-terminal amino acid analysis than for N-terminal analysis. The most common method is to add carboxypeptidases to a solution of the protein, take samples at regular intervals, and determine the terminal amino acid by analysing a plot of amino acid concentrations against time. Because carboxypeptidases remove residues one at a time from the C-terminus, the amino acid whose concentration rises first is the C-terminal residue.
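A minimal way to read such a time course is to order the released amino acids by the first sampling time at which each reaches a chosen threshold. The sketch below does that; the threshold, the data and the function name `release_order` are all assumptions for illustration. Different residues are released at different rates, so the inferred order becomes less reliable beyond the first few residues.

```python
def release_order(times: list[float], timecourse: dict[str, list[float]], threshold: float) -> list[str]:
    """Order residues by the first sampling time at which their released amount
    reaches `threshold` (mol per mol of protein).

    Residues that never reach the threshold are omitted.
    """
    first_seen = {}
    for residue, amounts in timecourse.items():
        for t, amount in zip(times, amounts):
            if amount >= threshold:
                first_seen[residue] = t
                break
    return sorted(first_seen, key=first_seen.get)


times = [0, 5, 10, 20, 40]  # minutes after adding carboxypeptidase
timecourse = {              # hypothetical mol released per mol of protein
    "G": [0.0, 0.0, 0.05, 0.2, 0.6],
    "L": [0.0, 0.6, 0.9, 1.0, 1.0],
    "F": [0.0, 0.1, 0.4, 0.8, 1.0],
}
print(release_order(times, timecourse, threshold=0.3))  # ['L', 'F', 'G']
```

With these made-up data the C-terminus would be read as ...-Gly-Phe-Leu-COOH.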

Edman degradation

The Edman degradation is a very important reaction for protein sequencing, because it allows the ordered amino acid sequence of a protein to be determined. Automated Edman sequencers are now in widespread use and can sequence peptides up to approximately 50 amino acids long. An outline of sequencing a protein by the Edman degradation follows; some of the steps are elaborated on subsequently.

  1. Break any disulfide bridges in the protein by oxidising with performic acid.
  2. Separate and purify the individual chains of the protein complex, if there is more than one.
  3. Determine the amino acid composition of each chain.
  4. Determine the terminal amino acids of each chain.
  5. Break each chain into fragments under 50 amino acids long.
  6. Separate and purify the fragments.
  7. Determine the sequence of each fragment.
  8. Repeat with a different pattern of cleavage.
  9. Construct the sequence of the overall protein.
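Steps 8 and 9 rely on fragments from two different cleavage patterns overlapping each other. One simple way to automate the reconstruction is a greedy overlap merge, a heuristic for the shortest common superstring, sketched below in Python. It is not necessarily how the ordering is done in practice, and the fragments are a made-up example. The minimum overlap of two residues is an assumption that suits this toy case; with real proteins short overlaps can be ambiguous, so longer overlaps or manual checking are needed.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments: list[str], min_len: int = 2) -> str:
    """Greedily merge the pair of fragments with the longest overlap until one sequence remains."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        # A fragment lying wholly inside another adds no ordering information.
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        if len(frags) == 1:
            return frags[0]
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            raise ValueError("fragments do not overlap enough to be ordered")
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]


digest_1 = ["AGK", "LMWESR", "PMTYK"]  # hypothetical fragments, first cleavage pattern
digest_2 = ["AGKLM", "WESRPM", "TYK"]  # hypothetical fragments, second cleavage pattern
print(assemble(digest_1 + digest_2))   # AGKLMWESRPMTYK
```

Each fragment from one digest spans a junction between two fragments of the other, which is exactly the property that makes the overall order recoverable.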

Digestion into peptide fragments

Peptides longer than about 50-70 amino acids cannot be sequenced reliably by the Edman degradation. Because of this, long protein chains need to be broken up into small fragments which can then be sequenced individually. Digestion is done either by endopeptidases such as pepsin or by chemical reagents such as cyanogen bromide. Different enzymes and reagents give different cleavage patterns, and the overlaps between fragments can be used to construct an overall sequence.
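The cleavage rules can be applied in silico to predict fragments. The sketch below assumes that cyanogen bromide cuts on the C-terminal side of every methionine and, as an extra example not named in the text, that trypsin cuts after lysine or arginine unless the next residue is proline; real digests can be incomplete or show other exceptions. The helper names are hypothetical, the 50-residue limit is the approximate figure quoted for the Edman degradation, and the example sequence is made up.

```python
def cleave(sequence: str, after: str, not_before: str = "") -> list[str]:
    """Cut the chain on the C-terminal side of any residue in `after`,
    unless the following residue is in `not_before`."""
    fragments, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in after and sequence[i + 1] not in not_before:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


def cyanogen_bromide(sequence: str) -> list[str]:
    # Cuts after Met (which becomes homoserine lactone; still written "M" here).
    return cleave(sequence, after="M")


def trypsin(sequence: str) -> list[str]:
    # Assumed simple rule: after K or R, but not before P.
    return cleave(sequence, after="KR", not_before="P")


MAX_EDMAN_LENGTH = 50  # approximate reliable read length
for name, digest in [("CNBr", cyanogen_bromide), ("trypsin", trypsin)]:
    pieces = digest("AGKLMWESRPMTYK")
    too_long = [p for p in pieces if len(p) > MAX_EDMAN_LENGTH]
    print(name, pieces, "too long:", too_long)
```

Any fragment listed as too long would need a further digestion with a different enzyme or reagent before it could be sequenced.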

The Edman degradation reaction

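The peptide to be sequenced is adsorbed onto a solid support, and the Edman reagent, phenylisothiocyanate, is added together with a mildly basic buffer containing trimethylamine. The reagent reacts with the amine group of the N-terminal amino acid.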

The terminal amino acid derivative can then be selectively detached by the addition of anhydrous acid. The detached derivative then isomerises to give a substituted phenylthiohydantoin, which can be washed off and identified by chromatography, and the cycle can be repeated. The efficiency of each step is about 98%, which allows about 50 amino acids to be reliably determined.
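The link between step efficiency and read length can be made explicit with a simple repetitive-yield model, which assumes every cycle independently succeeds with the same probability. Under that assumption the fraction of chains still in phase after n cycles is the efficiency raised to the power n, so with 98% per step roughly 36% of the chains remain in phase after 50 cycles. The sketch below computes this; the function names and the 50% cut-off in the last line are illustrative choices.

```python
import math


def in_phase_fraction(efficiency: float, cycles: int) -> float:
    """Fraction of chains that have completed every cycle, assuming each cycle
    succeeds independently with probability `efficiency` (repetitive-yield model)."""
    return efficiency ** cycles


def max_cycles(efficiency: float, min_fraction: float) -> int:
    """Largest n for which efficiency**n is still at least min_fraction."""
    if not (0 < efficiency < 1 and 0 < min_fraction < 1):
        raise ValueError("efficiency and min_fraction must lie strictly between 0 and 1")
    return math.floor(math.log(min_fraction) / math.log(efficiency))


for n in (10, 30, 50, 70):
    print(n, round(in_phase_fraction(0.98, n), 3))
print(max_cycles(0.98, 0.5))  # 34
```

Chains that miss a cycle do not disappear: they lag one residue behind and add a growing background, which is why the signal for the correct residue becomes harder to distinguish as the cycles accumulate.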

Limitations of the Edman degradation

Because the Edman degradation proceeds from the N-terminus of the protein, it will not work if the N-terminal amino acid has been chemically modified or if it is concealed within the body of the protein. It also requires the use of either guesswork or a separate procedure to determine the positions of disulfide bridges.

Mass spectrometry

The other major direct method by which the sequence of a protein can be determined is mass spectrometry. One way of delivering peptides to the spectrometer is electrospray ionisation, for which John Fenn shared the Nobel Prize in Chemistry in 2002. The protein is digested by an endoprotease, and the resulting solution is passed through a high-pressure liquid chromatography column. At the end of this column, the solution is sprayed out of a narrow nozzle charged to a high positive potential into the mass spectrometer. The charge on the droplets causes them to break up until only single ions remain.

The peptides are then fragmented and the mass-to-charge ratios of the fragments are measured. Peaks that correspond to multiply charged fragments can be recognised because they are accompanied by auxiliary peaks corresponding to other isotopes; the spacing between these peaks is inversely proportional to the charge on the fragment. The mass spectrum is analysed by computer and often compared against a database of previously sequenced proteins in order to determine the sequences of the fragments. This process is then repeated with a different digestion enzyme, and the overlaps in the sequences are used to construct a sequence for the protein.

Predicting protein sequence from DNA/RNA sequences

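The amino acid sequence of a protein can also be determined indirectly from the mRNA or, in organisms that do not have introns (such as prokaryotes), from the DNA that codes for the protein: the nucleotide sequence is read in codons of three bases, each of which is translated into an amino acid according to the genetic code.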
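A minimal translation sketch is shown below. It assumes the standard genetic code and an input that is already an intron-free coding sequence in the correct reading frame (DNA or mRNA, read from its first base). It stops at the first stop codon and would raise a KeyError on ambiguous bases such as N; the function name is hypothetical, and the table string is the standard code written in TCAG codon order.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_sequence: str) -> str:
    """Translate a coding sequence (DNA or mRNA) up to the first stop codon."""
    seq = coding_sequence.upper().replace("U", "T")  # treat mRNA like DNA
    protein = []
    for pos in range(0, len(seq) - 2, 3):  # whole codons only
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCAUUGUAAUGGGCCGCUGA"))  # MAIVMGR
```

The result is only as good as the assumed reading frame and gene model; post-translational modifications and processing, such as removal of the initial methionine, are not visible in it.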

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Protein_sequencing". A list of authors is available in Wikipedia.