PRECISE

Program for Research in Computing and Information Sciences and Engineering

ADVANCE DATA MANAGEMENT GROUP

Manuel Rodriguez -Martinez, manuelr@acm.org, Coordinator
Bienvenido Vélez, bvelez@acm.org
Pedro I. Rivera Vega, pedro.rivera@ece.uprm.edu
Fernando Vega, fvega@ece.uprm.edu

Description

Our research will focus on the design and implementation of next generation database and information retrieval systems with emphasis on Web information systems, handheld devices, reliability and fault tolerance, multimedia databases and cluster computing.

Vision

Establish a premier Research and Development Center dedicated to the advancement of Information Management Technologies, including: Database Management, Information Discovery, Network Middleware, and Database-Enabled Web Applications.

Mission

• The ADM Group is committed to conduct both theoretical and practical research aimed at the discovery of new theories and the development of new systems with particular emphasis on technologies with the potential for deep impact on the improvement of the quality of life in our society.

• The ADM group is committed to the advancement of Data Management technologies. Specific areas of emphasis include: Database Management, Information Retrieval and Discovery, Fault-tolerant Systems, Integrating Systems, Interoperability, and reliable Information Storage and Dissemination.

• The ADM group is committed to the integration of research and education through the rapid transfer of discoveries to the classroom. The group will be one of the main driving forces behind the development of the curriculum pertaining to Data Management at the University of Puerto Rico.

The ADM group is currently working on the following projects:

• Data Service Composition in Peer-to-Peer Architectures (P2P)
• Verizon Intelligent Home (iHome)
• Dynamic Image Retrieval and Composition Services in Distributed Information Systems
• Adaptively Replicated Information Services (ARIS)
• Fault-tolerant Mass Transit Passenger Information Systems (TU-PIS)
• Inforadar: Document Classification using Query Lookahead
• Open Source Operating Systems (collaboration with IBM)

Current Participation in Competitive Research Grants

O2S2 Sponsored by IBM (B. Velez and J. Arroyo (Co-PI)

Strategic R&D Alliances with other Academic and Industrial Institutions

• IBM – Open Source Operating Systems Project
• Tren Urbano - Passenger Information System
• CenSSIS Engineering Research Center - Dynamic Image Retrieval and Composition Services in Distributed Information Systems

Research Summaries

Interactive Queries Hierarchies for Effective Information Discovery at UPR-Mayagüez - Dr. Bienvenido Vélez

Advances in processor technologies suggest that future search engines will be capable of spending orders of magnitude more processing capacity per user request without inducing noticeably larger response times. A new information discovery technique called query lookahead invests additional computation on the eager evaluation of multiple queries automatically generated from an initial user query.

Query lookahead has the potential of improving search systems in at least two novel ways. First, it enables the deployment of anticipatory user interfaces capable of presenting the result sets of automatically generated refined queries ahead of time. Refined queries serve as categories upon which a large and imprecise result set can be organized. Second, query lookahead has the potential of improving the effectiveness of feature (e.g. term) selection algorithms. These algorithms can be improved by exploiting information about the result set induced by each potential feature when combined with the user query. This research focus on a new network search system, InfoRadar, exploiting query lookahead along these two lines. In response to a user query, InfoRadar displays a hierarchically organized selection of refined queries that we call an interactive query hierarchy. We have developed InfoRadar as a vehicle for testing our hypothesis that interactive query hierarchies can improve information discovery effectiveness. InfoRadar has three main software components: a multi-threaded Java applet, a server module and an indexing module. InfoRadar supports boolean queries using a syntax borrowed from the popular Altavista (www.altavista.com) search engine. In response to a query request from the applet, the InfoRadar server returns a hierarchy of queries together with their individual result set.

Data Composition Services in Peer-to-Peer Architecture – Dr. Manuel Rodríguez-Martínez

Next-generation Distributed Information Systems will consist of hundreds of thousands, perhaps millions, of diverse data sources located on geographically distributed networks like the Internet. In these types of large-scale distributed environments, heterogeneity in terms of hardware devices, software components, network connectivity and system configuration will be a fundamental characteristic of the data sources. In fact, these data sources might reside on high-end servers, desktop computers, mobile laptop computers, hand-held devices, intelligent sensors and appliances, or embedded computer systems.

Data integration and interoperation between these data sources will be a critical requirement to harvest the vast amounts of valuable information stored and maintained by the data sources. Information could be extracted from any available data source, whether it is a satellite image from an Earth Science database, or a phone book list, encoded in XML, that is extracted from a Palm-Pilot. Therefore, a data source site cannot be defined based on the size of stored data sets, or on the software environment being run, but rather, on whether other sites in the system retrieve the information held by the data source. In the other words, a data source is any site that provides a service to access some kind of data. Clearly, the distinction between what constitutes a client site and what constitutes a server site will be blurred, since any site can act as a client or as a service provider to another site in the system. Moreover, the sheer number and diversity of data sources implies that there cannot be a single authority that effectively coordinates and controls the access to data, or to the computational services in the system. These observations motivate us to conduct research and point us in the direction of a peer-to-peer dynamic environment [6,26] in which any site can request or serve data, and must engage in a cooperative effort aimed at satisfying the requests for data and services associated with the queries posed by interested end-users.

We envision a decentralized Peer-to-Peer software framework in which user-defined code and control is released to the local executing sites (client or data sources), which will decide which are the sites that will supply data, computational services, and the aggregation of results. The receiving site may partially execute its code on its local environment, and pass it along with partial results to next peer site, or coalition of peer sites that will continue with the computational process. This framework is based on a model for composition of data services, where one site performs a given task and ships its results, plus some control information, to another site that will continue with the computational process.

Implementation of a Prototype for Knowledge Management in Higher Education – Dr. J. Fernando Vega

The research consists in the implementation of a prototype of a knowledge system. The implementation includes two classes of distributed agents, one denoted user agent which acts as a knowledge broker for the user; the other denoted service agent receives queries to a knowledge repository from the user agents. The knowledge-base may be distributed and conformed by several knowledge repositories. In addition, the user client includes an expert system which assists in the construction of queries making inferences on the ontology or ontologies defined for the knowledge domain or domains present in the knowledge-base using a dialog-like or conversational interaction. It is important to highlight that in this proposal we refer not only to queries for information retrieval but for information storage as well. This is based on the highly interactive model for knowledge-management that has been devised and which assumes that users generate knowledge which becomes explicit through information resources and that are produced and analyzed in the source, i.e., by the user.

The Smart Mirrors Project - Pedro I. Rivera-Vega

We consider the problem of serving data being requested through the Internet, with the goal of completing the service in minimum time (turnaround time). We are working on a distributed peer to peer mirror system, the Smart Mirrors System that continuously collects information from peers, in order to decide the best approach to follow in serving each request. The information being interchanged describes two important factors that need to be taken into consideration by the system: the work load of each server and the network bandwidth. When a request is received the system uses this information, and further collaborates with the client site, in order to finally assign the task to the most appropriate server. The different research issues being considered are: architecture of the system, cost model to estimate service time of each particular server, interchange of information and service requests among peers, and real experimentation. We discuss these research issues under consideration and present preliminary results.

At this stage of the project, we are considering only static data requests, and a system in which each of the servers (peers) have the same data. We are in the process of real experimentation of our approach. We plan to further expand this project to the case in which a more dynamic environment exists in terms of data available in each of the servers, data requested by the users, and the configuration of peers in the system.

Publications

Refereed Conferences (with proceedings)

B. Vélez, J. E. Valiente, “Interactive Query Hierarchy Generation Algorithms for Search Result Visualization,” Proceedings of Internet and Multimedia Systems Applications (IMSA 2001).

B. Vélez, J. A. Torres, “Anticipatory User Interfaces for Search Result Visualization using Query Lookahead,” Proceedings of Americas Conference on Information Systems (AMCIS 2001) Best Paper Award.

Jairo E. Valiente and Bienvenido Vélez, “Inforadar-cl: A Cross-Lingual Information Discovery Tool Exploiting Automatic Document Categorization”. In Proceedings of IASTED International Conference on Information and Knowledge Sharing (IKS 2002). St. Thomas, V.I. November 2002.

J. Torres-Berrocal, B. Vélez-Rivera. “Elastically Replicated Information Services: Sustaining the Availability OF Distributed Storage Across Dynamic Topological Changes”, Sixth Annual Conference of the Southern Association for Information Systems (SAIS), Savannah Georgia March 2003.

Manuel Rodriguez-Martinez, Nick Roussopoulos, “Wide-Area Query Execution in MOCHA”, 2002 IASTED Conference on Information and Knowledge Sharing (IKS 2002).

Enna Z. Coronado, Manuel Rodriguez-Martinez, “SRE: Search and Retrieval Engine of TerraScope Earth Science Information System”, 2003 IASTED Conference on Computer Science and Technology (CST 2003).

Alcides Alvear, Manuel Rodriguez-Martinez, Pedro I. Rivera-Vega, Angel Villalain, Angel Ferra, “Development of a Database Middleware System to Support Remote Sensing Analysis over Distributed Data Sources”, 2003 IASTED Conference on Computer Science and Technology (CST 2003).

About Precise Research Publications People CISE Technical Lecture Series Laboratories Ph.D. in CISE Computer Research Conference Reports Important Links