Program for Research in Computing and Information Sciences and Engineering  
COMPUTATIONAL STATISTICS AND DATA ANALYSIS GROUP
This group conducts research in two areas:
• Computational Statistics: the explicit impact of computers on statistical methodology, including algorithms, computer graphics, computer-intensive inferential methods, expert systems, neural networks, parallel computing, and statistical databases.
• Statistical Methodology for data analysis: new data analysis strategies and methodologies, including classification, data exploration, density estimation, design of experiments, pattern recognition/image analysis, robust procedures, comparison of statistical methodologies, and simulation of experiments.
“Combining Classifiers Involving Kernel Density Estimates and Gaussian Mixtures”, sponsored by the Office of Naval Research (PI Edgar Acuña), 2003-2005.
Medical University of South Carolina: Barbara Tilley, Zhen Zhang
Research Summaries
Improvement of Supervised Pattern Recognition Techniques, Dr. Edgar Acuña
The research deals with the use of computer-intensive methods in statistics to improve pattern recognition techniques. Computer-intensive methods involve three aspects: first, the use of powerful computers, including parallel computers; second, the development of efficient algorithms to carry out the procedures; and third, efficient programming that runs the algorithms accurately while minimizing running time. Pattern recognition has many applications; we are most interested in engineering and biomedical (bioinformatics) applications. The expected research outcomes are:
• Feature selection for nonparametric classifiers. This part has been the master's thesis of my student Frida Coaquira, who will continue working on this topic in her doctoral thesis.
• Improvement of classifiers based on Gaussian mixtures through the use of bagging and boosting. Combining classifiers has many applications, even in unsupervised pattern recognition, and involves parallel computation. This part has been considered in the master's thesis of my student Luis Daza, who will continue working on this topic in his doctoral thesis.
• Reduction of dimensionality using partial least squares. We are combining partial least squares with logistic regression to reduce dimensionality. This technique will be an alternative to the overused principal component method. This topic will be the doctoral thesis of my student Jose Vega, who began his research in January 2003.
• Application of statistical pattern recognition to microarray data. We are exploring the application of nonparametric classifiers and new clustering algorithms to gene expression data obtained using microarrays. These themes will be considered in the master's theses of my students Marggie Gonzalez and Santiago Velasco.
They have already started their research work and expect to finish by summer 2004.
• Application of parallel computation to pattern recognition. The main disadvantage of the nonparametric techniques that we use is that they take a lot of computing time. However, most of the algorithms needed can be parallelized. We have already seen very good results in a parallel environment of 8 processors. My student Elio Lozano is working on this topic in his master's thesis, which he will defend in July 2003. Elio will extend this research in his doctoral thesis.
• Visualization techniques for microarray data. The preprocessing of microarray data and the analysis of the images obtained from them is the research topic of my master's student Caroline Rodriguez.
Bioinformatics, Dr. Daniel McGee
The research is conducted in coordination with the BioInformatics Department of the Medical University of South Carolina and concentrates on the application and development of bioinformatics techniques for medical databases. In particular, neural networks, cluster analysis, genetic algorithms, principal component analysis, and standard statistical methodologies will be used and improved upon in these environments.
Goals:
• Make significant improvements to the training process for neural networks when used on medical and educational databases
Bioinformatics, Dr. Jaime Ramirez-Vick
Development of an expert system for microarray statistical data analysis. An essential source of genetic information for drug and functional genomics research
Probabilistic Inference of Gene Function and Regulation
More recently in the study of gene function and regulation, approaches are based on
Publications
Journals
Acuña, E. and Rojas, A. (2001). Bagging classifiers based on kernel density estimators. Proceedings of the International Conference on New Trends in Computational Statistics with Biomedical Applications, August 2001, pp. 343-350 (an extended version of this paper will appear in the Journal of the Japanese Society of Computational Statistics by the end of this year).
Acuña, E. (2002). Combining classifiers based on kernel density classifiers and Gaussian mixtures. Computing Science and Statistics, Vol. 33.
Acuña, E., Rojas, A., and Coaquira, F. (2002). The effect of feature selection on combining classifiers based on kernel density estimates. In K. Jajuga, A. Sokołowski, H.-H. Bock (Eds.), Classification, Clustering and Data Analysis. Springer, Heidelberg, pp. 161-168.
McGee, D., Lackland, D. et al. (2003). Trends in blood pressure treatment: Some observations based on the Framingham study. Cardiovascular Reports and Reviews (in press).
Acuña, E. (2003). Combining classifiers based on kernel density estimators. Submitted to the Journal of Statistical Computation and Simulation.
Acuña, E. (2003). Filters and wrappers for supervised classification. Submitted to Communications in Statistics: Simulation and Computation.
Acuña, E. (2002). Combining classifiers based on kernel density classifiers and Gaussian mixtures. Proceedings of the Interface 2002, Computing Science and Statistics, Vol. 33.
Lozano, E. and Acuña, E. (2003). Parallel computation of kernel density estimates classifiers and their ensembles. To appear in Proceedings of the Conference in Computers, Communications and Control 2003, July 2003.
Acuña, E. (2003). A comparison of filters and wrappers for feature selection in supervised classification. Proceedings of the Interface 2003, Computing Science and Statistics, Vol. 34.
Daza, L. and Acuña, E. (2003). Combining classifiers based on Gaussian mixtures. To appear in Proceedings of the Conference in Computers, Communications and Control 2003, July 2003.
Acuña, E. and Coaquira, F. (2003). On the performance of ensembles based on kernel density estimation. To appear in Proceedings of the Conference in Computers, Communications and Control 2003, July 2003.
Acuña, E., Coaquira, F., and Gonzalez, M. (2003). A comparison of feature selection procedures for classifiers based on kernel density estimation. To appear in Proceedings of the Conference in Computers, Communications and Control 2003, July 2003.
McGee, D. and Maldonado (2003). Using coefficients of backpropagating neural networks to identify change points. To appear in Proceedings of the Conference in Computers, Communications and Control 2003, July 2003.
Book Chapters/Articles in Collections
Acuña, E. (2002). “Análisis Estadístico de Datos usando MINITAB para Windows”, Segunda Edición. John Wiley and Sons, New York.

