- Born 1937
- B.S. (1960) Seoul National University, South Korea
- M.S. (1962) Seoul National University, South Korea
- Ph.D. in Physical Chemistry (X-ray Crystallography), University of Pittsburgh (1966)
- Research Associate, Massachusetts Institute of Technology (M.I.T.) (1966-70)
- Senior Research Scientist, M.I.T. (1970-72)
- Assistant and Associate Professor, Biochemistry, Duke University (1972-78)
- Professor and Professor of the Graduate School, Department of Chemistry, UC Berkeley (1978- )
- Faculty Scientist, Physical Biosciences Division; Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley (1979- )
- Fulbright Travel Fellow (1962)
- N.I.H. Research Career Development Award (1976-79)
- Miller Research Professorship (UC Berkeley, 1983)
- Guggenheim Fellow (1985)
- E.O. Lawrence Award (US Department of Energy, 1987)
- Princess Takamatsu Award (Tokyo, Japan, 1989)
- The Ho-Am Prize (Samsung Foundation, South Korea, 1994)
- Fellow, American Academy of Arts & Sciences (1994)
- Member, U.S. National Academy of Sciences (1994)
- Korean Academy of Science and Technology Prize in Science (South Korea, 2000)
- Legacy Laureate Award, University of Pittsburgh (2005)
- The Pride of Seoul National Univ. Alumni Award (Seoul, South Korea, 2006)
- Department of Chemistry Alumni Award, Univ. of Pittsburgh (2008)
- Alexander Rich Medal, M.I.T. (2014)
- Fellow, The American Association for the Advancement of Science (2018)
- Member of the American Society of Biological Chemists, American Crystallographic Association, American Chemical Society, and Biophysical Society
A. Whole-proteome “Tree of Life (ToL)”
An “Organism Tree of Life (ToL)” can be considered a metaphorical and conceptual tree that captures a simplified narrative of the complex and unpredictable evolutionary courses of all living organisms. Currently, the most common approach is to construct a “gene ToL” as a surrogate for the organism ToL, by selecting a group of highly alignable regions from a set of selected genes/proteins representing each organism. Such selected regions, however, account for only a small fraction of all genes/proteins and an even smaller fraction of the whole genome of an organism. Over the last few decades, whole-genome sequences of many extant organisms have become available, providing an opportunity to construct a “whole-genome or whole-proteome ToL” using Information Theory without sequence alignment. Our group developed the “Feature Frequency Profile (FFP)” method, a variation of the “Word Frequency Profile” method commonly used to compare two books using Natural Language Analysis algorithms based on Information Theory. Using the FFP method, we have been able to construct a “Whole-proteome ToL” for over 4,000 extant organisms for which whole-genome sequences are available in the public genome database. The most surprising feature of our ToL was that the founders of all five kingdoms of living organisms (Bacteria, Archaea, Fungi, Plants, and Animals) emerged in a “deep Burst” near the root of the ToL, a feature not observed in any earlier ToL. Encouraged by this observation, we have started to construct whole-genome/proteome ToLs at the separate Phylum, Class, and Order levels.
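The alignment-free idea behind a frequency-profile comparison can be illustrated with a minimal sketch. It is not the group's implementation: the feature length `k`, the use of Jensen-Shannon divergence as the information-theoretic distance, the function names, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(sequence, k):
    """Return relative frequencies of all overlapping length-k features."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {feature: n / total for feature, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two profiles, in [0, 1]."""
    divergence = 0.0
    for feature in set(p) | set(q):
        pf = p.get(feature, 0.0)
        qf = q.get(feature, 0.0)
        mf = 0.5 * (pf + qf)  # frequency in the averaged profile
        if pf > 0:
            divergence += 0.5 * pf * log2(pf / mf)
        if qf > 0:
            divergence += 0.5 * qf * log2(qf / mf)
    return divergence


# Toy "proteomes": hypothetical strings, not real data.
proteomes = {
    "org_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "org_b": "MKTAYIAKQRQISFVKSHFSRA",
    "org_c": "GGSWLLPPNNDDEECCHHWWYY",
}
profiles = {
    name: feature_frequency_profile(seq, k=3) for name, seq in proteomes.items()
}

# Pairwise divergences; a matrix of such values could feed a tree-building step.
d_ab = jensen_shannon_divergence(profiles["org_a"], profiles["org_b"])
d_ac = jensen_shannon_divergence(profiles["org_a"], profiles["org_c"])
```

In this toy example, `org_a` and `org_b` differ by a single residue, so their divergence is smaller than that between `org_a` and `org_c`, which share no 3-residue features. No alignment is performed at any point; only the feature frequencies are compared.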
B. Whole-genome variation of the human species
Most regions of the genomes of normal human cells have the same sequence among individuals, but a small fraction, spread throughout the genome, varies within a population. Of these variations, single nucleotide variations (SNVs) are the most numerous and have been identified at over 80 million of the 3 billion positions (loci) in a whole haploid genome. It has been widely accepted that analysis of SNVs may allow one to predict the genomic component of individuals' susceptibility to complex diseases such as cancers, neurological diseases, and autoimmune diseases, as well as other traits. So far, however, the current analysis methods (e.g., Genome-wide Association Studies) and the interpretation of their results have yielded information of limited predictive value and practical utility for making health-related decisions at the individual or population level when family history is not available.
Prevention and early diagnosis of cancer are the most effective ways of avoiding psychological, physical, and financial suffering. We developed a machine-learning method for statistically predicting individuals’ inherited susceptibility (and, by inference, the contribution of environmental/lifestyle factors) to acquiring the most likely type among a panel of 20 major common cancer types plus one “healthy” type. The results show that, depending on the type, about 33% to 88% of a cancer cohort has acquired its cancer type primarily due to inherited genomic susceptibility factors, and the rest primarily due to environmental/lifestyle factors. These cohort genomic susceptibilities, with their associated probabilities, may provide practical information for health professionals and health policy makers concerned with prevention and/or early intervention of cancer. We are in the process of extending this approach to predict individual susceptibility.
C. Genomic studies of ethnic populations
The term “ethnic population” means different things to different people, but it generally refers to a group of people who have a “perceived notion” that its members share a set of unique inherited (genomic) and acquired (non-genomic) traits, such as ancestry, social and cultural norms, religion/belief, language, and lifestyle. Thus, ethnic group identity has a strong emotional component that divides people into opposing categories of “us” and “them”, one of the primary causes of human conflict and suffering.
The recent availability of genomic sequences from a large number of ethnic populations throughout the world (over 160 ethnic groups) provides an opportunity to estimate quantitatively the fraction of the whole genome that may account for the inherited genomic component of ethnicity, and to look for any relationship between ethnic grouping and genomic grouping. To address these questions, we have developed a method to compare whole-genome variations, specifically single nucleotide variations (SNVs), between individuals using a text analysis method based on Information Theory.