Basilis Gidas received his B.Sc. from the National Technical University of Athens, Greece. He has an M.A. degree in Mathematics, an M.S. degree in Physics, and a Ph.D. degree in Mathematical Physics, all from the University of Michigan. He is an elected Fellow of the Institute of Mathematical Statistics. Before he joined the faculty at Brown, he held appointments at Rockefeller University and the Institute for Advanced Study (Princeton). In the past, he has made contributions in Mathematical Physics (quantum field theory) and in partial differential equations/differential geometry. He has worked in Computer Vision, Speech Recognition, Nonparametric Statistics, and, since the early 2000s, in Computational Molecular Biology. He has served on the National Research Council Advisory Panel for "Spatial Statistics and Image Processing", and is on the editorial board of the International Journal of Imaging Science and Technology.

Over the past eight years, the research interests of Professor Basilis Gidas have been in transcriptional regulatory networks, signal transduction pathways, and ab initio protein folding, using Bayesian statistics and Chomsky-type grammars. The work emphasizes: Myc regulatory networks and pathways in cell growth, cell proliferation, and apoptosis, using microarray and ChIP-chip data and cross-species comparison; finding phosphorylation site motifs via tandem mass spectrometry and structural information about kinases and substrates; and ab initio protein folding using compositional/syntactic representations of proteins.

Probability theory on spaces of generalized functions. Gibbs distributions on spaces of tempered distributions. Construction of 2-D and 3-D quantum field theories. Renormalization of quantum field theory Hamiltonians. Spectral properties of quantum Hamiltonians. Borel summability of ground-state asymptotic expansions.

Singular solutions of the Yang-Mills equations. Free boundary problems and quark confinement. Symmetry properties, uniqueness, and a priori bounds of solutions of elliptic partial differential equations. Classification of singularities of conformal deformations of Riemannian metrics and other nonlinear elliptic equations.

Metropolis-type Monte Carlo simulation algorithms and simulated annealing. Simulation and optimization via the Langevin equation. Markov Random Field (MRF) estimation and consistency of pseudo-likelihood estimators, and of maximum likelihood estimators from complete or incomplete data. A variational method for estimating MRFs. Nonparametric estimation for continuous-time stochastic processes arising in speech recognition. Object identification via classification trees and stochastic grammars. Renormalization group methods for multiscale/multilevel image processing. Texture representation via MRFs with polynomial interactions. Tracking of moving objects via particle filters. Speech signal representation via nonlinear transformations and wavelets. Classification and clustering of stop consonants via nonlinear transformation and nonlinear discriminant analysis.

Probabilistic hierarchical/syntactic models (analogous to Chomsky grammars) for identifying, representing, and analyzing transcription regulatory networks and signal transduction pathways. Identification of genes regulated directly and indirectly by combining microarray expression data, ChIP-chip data, and cross-species comparison information; identification of downstream pathways through which Myc functions in cell growth, cell-cycle proliferation, and apoptosis. Identifying phosphorylation site motifs on the basis of tandem mass spectrometry data, protein-protein interactions, and structural information about kinases and substrates. Protein representation and ab initio folding via hierarchical/syntactic (also known as compositional) models.

Cellular processes such as the cell cycle, cell proliferation, apoptosis, cell growth, cell differentiation, genome instability, cellular communication, and responses to external stimuli are governed by interactions among DNA, proteins, RNAs, and a host of other molecules. Understanding the principles and the regulatory mechanisms underlying these processes is a central goal in biology. Our research addresses two aspects of the problem that have been studied extensively and seem to be within reach: (i) transcription regulatory networks and downstream pathways through which transcription factors (TFs) function in specific cellular processes, and (ii) signal transduction pathways that transmit, process, and integrate external and internal signals. Our research also addresses structural proteomics, especially the ab initio protein folding problem. Advances in these problems over the past few decades have been made possible by the genome sequencing of several species and the rapid development of experimental technologies (such as microarrays, tile-arrays, ChIP-chip, real-time PCR, yeast two-hybrid assay, tandem mass spectrometry, NMR, and crystallography), as well as the development of recent tools such as RNAi screening and fluorescent proteins.

A complete understanding of the regulatory networks and signaling pathways entails mathematical/probabilistic models that articulate complex biochemical phenomena and integrate multiple sources of biological knowledge and experimental data from more than one technology. The models need to represent phenomena at multiple levels. At the local level, the models must articulate the spatio-temporal cooperation and coherence of complex interactions of DNA, proteins, RNAs, and signal transducers, as well as the spatio-temporal distributions and abundance profiles of the molecules; these dependencies underlie the regulatory controls that determine, for example, gene expression profiles and cellular decisions such as apoptosis and transitions from one cell-cycle phase to the next. At the global level, the models must articulate global regularities or patterns that represent the "syntax" or overall architecture of a network, pathway, or 3-D structure of a protein. The precise nature of the global and local aspects of a model is problem dependent. For example, a gene-finding model at the global level must represent the "syntax" of the concept "gene" as a collection of "motifs" or genomic sequences (e.g. TATA box, 5'UTR region, initial exon, alternating exon/intron, 3'UTR, poly-A tail, intergenic regions, etc.) concatenated according to precise but "random" rules that allow, for example, the absence of a TATA box, a single exon, or an arbitrary number of exons; at the local level, the model must articulate the local variability of each motif or signal. Similar two-level descriptions are necessary for models predicting the secondary structure of rRNAs or the 3-D structure of a protein. In transcription regulatory networks and pathways, the global representation includes the concatenation of a hierarchy of entities, e.g. small motifs or patterns that concatenate to form a module, modules that concatenate to form larger modules, which in turn concatenate to form networks.
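The "precise but random" concatenation rules for the gene example can be illustrated with a toy stochastic grammar. The following is only a minimal sketch of the global level: the function name, the motif labels, and the probabilities `p_tata` and `p_more_exons` are hypothetical choices made for illustration, not parameters of any model described here, and the local variability of each motif is left out entirely.

```python
import random

def sample_gene_structure(rng, p_tata=0.7, p_more_exons=0.6):
    """Sample one sequence of motif labels from a toy stochastic grammar.

    Hypothetical rules (illustrative only):
      gene  -> [TATA] 5UTR exon tail 3UTR polyA
      tail  -> intron exon tail   (with probability p_more_exons)
             | empty              (otherwise)
    """
    parts = []
    if rng.random() < p_tata:          # the TATA box may be absent
        parts.append("TATA")
    parts += ["5UTR", "exon"]          # initial exon is always present
    while rng.random() < p_more_exons: # zero or more intron/exon pairs
        parts += ["intron", "exon"]
    parts += ["3UTR", "polyA"]
    return parts

rng = random.Random(0)                 # fixed seed for reproducibility
samples = [sample_gene_structure(rng) for _ in range(5)]
for s in samples:
    print(" ".join(s))
```

Every sampled structure obeys the syntax (exons and introns alternate, the poly-A tail comes last) while the number of exons and the presence of the TATA box vary from sample to sample. This particular toy is a regular grammar, the simplest class in the Chomsky hierarchy; richer structures would need context-free or more general rules.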

Bayesian statistics and probability provide a natural framework for designing both the local and global aspects of the models, and for accommodating multiple sources of data. The framework supports powerful computational algorithms such as dynamic programming and Monte Carlo-type simulation and optimization algorithms. In many ways, the study of the problems of transcription regulation, signaling pathways, and the structure of proteins and RNAs has a great deal of similarity to the study of computer vision, speech recognition, and other cognition problems. Our research aims at exploring existing, and developing novel, hierarchical/syntactic models similar to Chomsky grammars (which include HMMs and context-free grammars) for articulating the global properties of specific tasks in genomics, proteomics, and structural proteomics. Our current focus is on the following three projects:

Finding the genes targeted by Myc, correlating and quantifying the effect of Myc binding on gene expression level, identifying the crucial targets of Myc, and assigning target genes involved in cell cycle and apoptosis are problems of fundamental interest. In our work we study these problems by exploring hierarchical models and employing Bayesian computational algorithms that integrate three types of information or data: (i) Cross-species DNA sequence comparison (especially human and mouse) to identify genome segments that have been conserved by evolution. Such regions typically have a functional role, and Myc binding sites tend to be conserved by evolution; (ii) Chromatin immunoprecipitation array (ChIP-chip) data; this high-throughput technology localizes Myc (or any specific transcription factor) binding sites within 1000-2000 DNA base pairs; we combine this information with known Myc motifs (E-box) and cross-species comparison information to find potential binding sites for Myc via a Monte Carlo-type procedure; (iii) Gene expression microarray data; these data are employed to cluster genes into Myc target genes and genes that are not affected by Myc, as well as to group genes according to their expression profiles over time.
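To make the combination of motif and cross-species information concrete, here is a deliberately simplified sketch. It is not the Monte Carlo-type procedure mentioned above: it is a deterministic filter that scans a region (for example, one localized by ChIP-chip) for the general E-box pattern CANNTG and keeps only the sites that are identical in an aligned sequence from a second species. The function name, the assumption of a gap-free alignment of equal length, and the toy sequences are all hypothetical.

```python
import re

# General E-box consensus CANNTG; the lookahead lets overlapping sites match.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def conserved_ebox_sites(human, mouse):
    """Return (position, site) pairs for E-boxes in `human` that are
    identical at the same position in the aligned `mouse` sequence.

    Assumes (for illustration) a gap-free alignment of equal length.
    """
    if len(human) != len(mouse):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for match in EBOX.finditer(human):
        start = match.start()
        site = match.group(1)
        if mouse[start:start + len(site)] == site:  # conserved exactly
            hits.append((start, site))
    return hits

# Toy aligned sequences (made up): two E-boxes in human, one conserved.
human_seq = "TTCACGTGAACAGCTGTT"
mouse_seq = "TTCACGTGAACAGTTGTT"
sites = conserved_ebox_sites(human_seq, mouse_seq)
print(sites)
```

A probabilistic treatment would replace the exact-match test with a score that weighs motif strength, degree of conservation, and ChIP-chip signal, and would sample candidate sites rather than enumerate them.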

Mast cells have a physiological role (they contribute positively to the immune system) and a pathological role (they play a central role in allergies, including asthma). Our project focuses primarily on their pathological role. Upon activation by an allergen, mast cell signaling pathways have three main branches: (a) one towards degranulation (and the associated production of toxic molecules such as histamine), (b) another towards gene transcription of cytokines and chemokines, and (c) yet another towards production of eicosanoids (lipid-type mediators). Tandem mass spectrometry (MS/MS) is the most promising high-throughput technology for collecting data for mast cell signaling, and signaling pathways in general. MS/MS produces time series data for phosphorylated proteins. The project addresses three fundamental mathematical/computational problems: (i) identification of the proteins that participate in the pathways, (ii) clustering of the proteins on the basis of their phosphorylation profiles, and (iii) determining the topology of the pathway network and analyzing its dynamical behavior. A key tool for solving problem (i) is a Bayesian statistical model for peptide fragmentation and generation of "theoretical" MS/MS spectra. Clustering of proteins (problem (ii)) is based on a generalization of the well-known K-means clustering algorithm; the generalization involves a mixture of Gaussian probability densities whose parameters are learned via the EM algorithm. Problem (iii) is the most challenging and is far from being fully understood, primarily because the current data do not contain sufficient information to determine the topology of the pathway network. For this reason, we focus mainly on a sub-network near the receptor that contains a universal motif or module. To study this sub-network we devise a stochastic dynamical system.
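The step from K-means to a Gaussian mixture fitted by EM can be sketched in a few lines. The example below is a minimal illustration under strong simplifying assumptions: the data are one-dimensional scalars rather than phosphorylation time series, the numbers are made up, and the function name `em_gmm_1d` and its initialization are hypothetical. K-means assigns each point to exactly one cluster; EM instead computes soft responsibilities and also learns a weight and a variance for every component.

```python
import math

def em_gmm_1d(xs, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a degenerate component
    return weights, means, variances

# Made-up scalar data with two well-separated groups.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, init_means=[1.0, 4.0])
print(weights, means, variances)
```

With equal, vanishing variances and hard assignments this procedure reduces to K-means, which is the sense in which the mixture model generalizes it. For real profiles one would use multivariate densities and compute the E-step in log space for numerical stability.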

Many grants have been awarded.

Year | Degree | Institution |
---|---|---|
1970 | PhD | University of Michigan Ann Arbor |
1967 | MS | University of Michigan Ann Arbor |
1966 | MA | University of Michigan Ann Arbor |
1965 | BS | National Technical University of Athens |

Elected Fellow of the Institute of Mathematical Statistics

American Mathematical Society

American Statistical Association

Institute of Mathematical Statistics


APMA 1660 - Statistical Inference II

APMA 2670 - Mathematical Statistics I

APMA 2680 - Mathematical Statistics II