
Component-Based Generalized Database Index Model

Figure 2. Overall view of the system

page, however, still provides an iterator for other objects to allow for expansion.

At the code level, pages are passed to the index as a "template" type the first time the database is populated with real data or with pointers to data locations. This makes the index independent of the page design and allows the index to work with any page type that is passed to it. The index iterator therefore iterates through pages of any type, one page at a time. This is exactly the concept behind database indexes: paged structures that facilitate access and improve performance.
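As a rough illustration of this idea (the class and member names below are ours, not the original system's), the index container can be written as a C++ template over the page type, so the same index code works for any page design that exposes the expected interface:

```cpp
#include <vector>
#include <cstddef>

// Minimal sketch: an index parameterized on the page type. The index never
// looks inside a page; it only stores pages and iterates over them one page
// at a time, as described above.
template <typename Page>
class Index {
public:
    using iterator = typename std::vector<Page>::iterator;

    void add_page(const Page& p) { pages_.push_back(p); }

    iterator begin() { return pages_.begin(); }
    iterator end()   { return pages_.end(); }

private:
    std::vector<Page> pages_;  // placeholder storage; see the allocator discussion below
};
```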

Unlike the in-memory page allocator, the index uses an index allocator to manage the storage of its own elements, the index pages. In practical applications the index resides permanently on nonvolatile storage, such as a hard disk or other mass storage media, since it is normally too large to fit entirely in memory. This means that the container will use a specialized allocator that takes responsibility for retrieving a page from storage into memory for access and for coordinating the different objects accessing the same page concurrently. This ensures data integrity by applying a suitable access and locking policy (pinning the page). After pages have been modified, the allocator also needs to write them back to physical storage (flushing the page). A default, in-memory allocator is provided for simple applications where the whole index can fit in memory at one time. It is up to the system designer to use it or to override it by providing a storage-dependent specialized allocator. Figure 2 provides the overall system model.
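A hedged sketch of such a storage-aware allocator is shown below. The pin and flush behavior is only indicated in comments; the type and member names are illustrative and go beyond the minimal std::allocator interface, which only requires allocate() and deallocate():

```cpp
#include <cstdlib>
#include <cstddef>
#include <new>

// Illustrative disk-backed allocator in the spirit described above.
template <typename PageT>
struct DiskPageAllocator {
    using value_type = PageT;

    PageT* allocate(std::size_t n) {
        // A real implementation would locate the page in the index file,
        // read it into a buffer frame, and pin it so that concurrent users
        // of the same page are coordinated by the buffer manager.
        void* mem = std::malloc(n * sizeof(PageT));
        if (!mem) throw std::bad_alloc();
        return static_cast<PageT*>(mem);
    }

    void deallocate(PageT* p, std::size_t) {
        // A real implementation would flush the page back to storage if it
        // is dirty, then unpin and release the buffer frame.
        std::free(p);
    }
};
```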

The separation between the container and the allocator allows for the overall system to be completely independent of the physical storage of the data. The container uses the standard allocator interface to ask for storage services without any knowledge of the real storage dynamics. A stack is added to the system to support general domain applications where each internal page can point to several nonlinearly ordered pages. In this case, all matching entries need to be temporarily pushed into a stack. Again, the stack is built as another container with its iterator and allocator. A stack is a specialized type of container that is obtained in the STL model by providing a stack adaptor that limits the access of a more flexible container to the access operations allowed by a stack, namely, push () and pop (). This is a good example of the use of adaptors provided by STL.
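The stack adaptor mentioned above already exists in the STL as std::stack, which restricts an underlying container (std::deque by default, or any container with the required operations) to push() and pop(). A small sketch follows, with an illustrative Entry type standing in for whatever the index actually stores:

```cpp
#include <stack>
#include <vector>

struct Entry { long page_id; };  // stand-in for a matching index entry

int main() {
    // Adaptor over a vector: only stack-style access is exposed.
    std::stack<Entry, std::vector<Entry>> pending;
    pending.push(Entry{42});
    pending.push(Entry{7});
    while (!pending.empty()) {
        Entry e = pending.top();  // most recently pushed match
        pending.pop();
        (void)e;                  // ... process the entry ...
    }
    return 0;
}
```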

Finally, the query results are pushed into a cursor, which is yet another container assembled from STL components. In the end, we were able to provide a fully functional system with the careful design of models using STL components.

THE MODEL LIFE CYCLE

As in the case of building generalized database systems, the system model will evolve through four major phases during its life cycle:

1. The Generic Design phase, where the generic model framework is designed once (in this case, our system offers the generic design layout).


2. The Adaptation phase, where system architects identify the requirements of a specific domain application and determine the corresponding system components needed in the model. System developers then instantiate this particular system, in which all generic domain attributes have been made specific, such as the data and key types, the type of database tree structure, the access methods, the need for stacks, and so on. We have adapted the generic model to implement several concrete designs supporting different data, trees, and access methods.

3. The Data Loading phase, where the initial loading of data into the system will determine the structure of the database and, hence, the access methods supported by the index. Data loading is generally carried out by the database administrator and/or system developer. It is done in one of two ways. For a small amount of data, and for testing and debugging the system, the normal interface of the index, namely the insert method, is used to insert the necessary data into the database. This is referred to as transactional data loading. Although it needs to be done only once while populating the database, it can still be expensive when fully loading a large database. For this latter case, data can be loaded from an ASCII file into the database by writing a small program (often called a loader) that reads the whole ASCII file, separates the data records, and loads them directly into database pages (a sketch of such a loader appears after this list). This approach is referred to as bulk loading and is capable of efficiently loading a large amount of data into the database index.

4. The Use phase, where the database is accessed by other applications through its standard interface.

Application programmers are interested only in this phase. The interface supports common database transactions like insertion, deletion, and updating.
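The following is a hedged sketch of such a bulk loader. The file format (one whitespace-separated record per line), the Record and Page types, and the page capacity are assumptions made for illustration; they are not taken from the original system:

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Record { std::string key; std::string value; };

struct Page {
    static const std::size_t kCapacity = 256;      // assumed page capacity
    std::vector<Record> records;
    bool full() const { return records.size() >= kCapacity; }
};

// Read the whole ASCII file, separate the records, and append them directly
// to database pages instead of going through the transactional insert().
std::vector<Page> bulk_load(const std::string& path) {
    std::vector<Page> pages(1);
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {                // one record per line
        std::istringstream fields(line);
        Record r;
        if (!(fields >> r.key >> r.value)) continue;  // skip malformed lines
        if (pages.back().full()) pages.push_back(Page());
        pages.back().records.push_back(r);          // write straight to the page
    }
    return pages;
}
```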

Tree-based database indexes can contain the actual data in their nodes and leaves (B-trees) or only in their leaves (B+-trees and the like). Indexes can also be totally separated from physical data (like the case of multimedia applications, where each record can be a large image, a sound file, or even a whole video session). In this case, actual data is typically handled through pointers, which are much smaller in size than the actual records. The system architect determines if the index will load the actual data or will only load pointers to the actual data (along with the appropriate keys). In our modeling concepts, both cases are supported, as they are seen as different data types loaded to the index.

FUTURE TRENDS


Index structures are fundamental to any industrial-quality database. So far, most commercial databases have used proprietary designs and implementations. Generalized index frameworks help in producing industrial-quality databases; however, they suffer from the complexity of their code and the lack of modularity in their design. As component-based development is proving to be powerful and reliable, software applications and frameworks that rely on components are gaining momentum; CORBA, JavaBeans, and Web services already follow this trend. So far the approach has mostly been adopted in small-scale database implementations, but the trend toward applying it to full-scale industrial database systems is emerging. Nystorm (2003) provides a good example of a component-based real-time application that demonstrates these concepts in the database domain.

CONCLUSION

We have applied component-based system concepts to introduce the paradigm of modular design to the database index system in order to improve the quality of its design. We used STL components to build a generalized model. The STL concept allowed us to put together a model for the system from the early design phases. We put components together to build subsystems and then connected the subsystems iteratively until we had a full index model. Implementation is manageable: the components are collected according to the architectural design model and instantiated into the final code. A system built after this architectural model is easier to implement than a non-component index of the same capability; however, it still requires a significant level of programming. The system can be adapted to different contexts and requirements by rewriting only some of the components. All the basic components used in the system (containers, iterators, and algorithm components) are already provided as a complete COTS implementation, which greatly reduces the effort needed for the development of the index system.

REFERENCES

Aoki, P. M. (1998). Generalizing search in generalized search trees. In Proceedings of the 14th IEEE International Conference on Data Engineering, Orlando, FL (pp. 380-389). IEEE Computer Science Publishers.


Austern, M. H. (1999). Generic programming and the STL. Boston: Addison-Wesley.

Breymann, U. (2000). Designing components with the C++ STL: A new approach to programming. Boston: Addison-Wesley.

Butler, G., Chen, L., Chen, X., Gaffar, A., Li, J., & Xu, L. (2002). The know-it-all project: A case study in framework development and evolution. In K. Itoh & S. Kumagai (Eds.), Domain oriented system development: Perspectives and practices (pp. 101-117). London: Taylor & Francis Publishers.

Folk, M. J., & Zoellick, B. (1992). File structures (2nd ed.). Boston: Addison-Wesley.

Hellerstein, J. M., Naughton, J. F., & Pfeffer, A. (1995). Generalized search trees for database systems. In Proceedings of the 21st International Conference on Very Large Databases (pp. 562-573).

Hellerstein, J. M., Papadimitriou, C. H., & Koutsoupias, E. (1997). Towards an analysis of indexing schemes. In Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 562-573).

Lynch, C. A., & Stonebraker, M. (1988). Extended user-defined indexing with application to textual databases. In Proceedings of the Fourth International Conference on Very Large Databases (pp. 306-317). San Francisco: Morgan Kaufmann.

Musser, R. D., Derge, J. G., & Saini, A. (2001). STL tutorial and reference guide: C++ programming with the Standard Template Library. Boston: Addison-Wesley.

Nystorm, D. (2003). COMET: A component-based real-time database for vehicle control systems. Vasteras, Sweden: Malardalen University, Computer Science and Engineering Department.

Quartel, D. A. C., Sinderen, M. J. van, & Ferreira Pires, L. (1999). A model-based approach to service creation. In Proceedings of the Seventh IEEE Computer Society Workshop on Future Trends of Distributed Computing Systems (pp. 102-110). Cape Town, South Africa: IEEE Computer Society.

Sanlaville, R., Faver, J. M., & Ledra, Y. (2001). Helping various stakeholders to understand a very large software product. In European Conference on Component-Based Software Engineering, ECBSE, Warsaw, Poland.

Schmidt, R. (1997). Component-based systems, composite applications and workflow-management. In Proceedings of Workshop Foundations of Component-Based Systems, Zurich, Switzerland (pp. 206-214).

Stonebraker, M. R. (1986). Inclusion of new types in relational database systems. In Proceedings of the Second IEEE International Conference on Data Engineering, Washington, DC (pp. 590-597).

KEY TERMS

Access Methods: In the database domain, indexes are designed to access data that are stored in a specific structure. The type of data and the type of the structure used determine the procedures followed by the index to access these data, which is referred to as the access method.

Aging Effect: Software artifacts evolve over time due to the changes in domain requirements, platform, and even language. After their release, software needs to be modified regularly. These modifications introduce errors to the software and reduce its overall quality over time, which is often called the aging effect.

ANSI C++: The complete set of standards provided by the American National Standards Institute in collaboration with the ISO standardization committee to define an industrial standard for the C++ programming language.

COTS: Commercial off-the-shelf software components that are generic enough to be obtained and used in different applications. They are often well designed and well implemented to offer good performance.

Logical Connectivity Rules: In STL, certain logical rules are defined to allow for connecting components to produce semantically correct artifacts. Incorrect connections are not allowed.

System Reengineering: When an existing software system requires major changes, typically by altering its original design, specific methodologies are used to help modify the design successfully. This process is referred to as system reengineering.

Template: At the programming level, a template allows programmers to fully write and debug their code without specifying certain types of variables. Later the template can be instantiated by providing it with the necessary variable types as needed. The same template can be instantiated several times with different sets of variable types.
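As a brief illustration of the definition above (the struct and field names are ours, not from the original text), the same template can be instantiated with different type parameters:

```cpp
#include <iostream>
#include <string>

// A generic index entry written once, without fixing the key or data types.
template <typename Key, typename Data>
struct IndexEntry {
    Key key;
    Data data;
};

int main() {
    IndexEntry<int, double> e1{10, 3.14};            // instantiated with an int key
    IndexEntry<std::string, long> e2{"id-7", 99L};   // instantiated with a string key
    std::cout << e1.key << " " << e2.key << "\n";
    return 0;
}
```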


Consistency in Spatial Databases

M. Andrea Rodríguez-Tastets
University of Concepción and University of Chile, Chile

INTRODUCTION

During the past several years, traditional databases have been enhanced to include spatially referenced data. Spatial database management (SDBM) systems aim at providing models for the efficient manipulation of data related to space. Such manipulation is useful for any type of application based on large spatial data sets, such as computer-aided design (CAD), very large scale integration (VLSI), robotics, navigation systems, and image processing.

Spatial data applications (in particular, geospatial applications) differ from traditional data applications for the following reasons (Voisard & David, 2002):

spatial information deals with spatial and nonspatial data, which implies that spatial data types must be closed under the operations applicable to them;

many spatial data are inherently uncertain or vague, which may lead to conflicting data (e.g., the exact boundary of a lake or pollution area is often unclear);

topological and other spatial relations are very important and are usually implicitly represented;

data are highly structured by the notion of object aggregation;

user-defined operations require an extensible underlying model; and

functions exist at both a low level of abstraction (e.g., points, lines, polylines) and a high level of abstraction (e.g., maps, thematic layers, configurations).

Spatial databases often deal with different kinds of data imperfections, which can be classified into uncertainty, imprecision/vagueness, incompleteness, and inconsistency (Pason, 1996). Whereas uncertainty, imprecision, vagueness, and incompleteness are usually seen as different types of data inaccuracy that arise from problems in data collection, inconsistency is the kind of data imperfection that results from the existence of contradictory data.

Contradictory data in spatial databases arise from different forms of errors (Cockcroft, 1997). A primary source of errors that can give rise to contradictions is

the inconsistency generated by conflicting descriptions of locations or the characteristics and qualities of spatial features. This kind of inconsistency is commonly associated with problems of positional or data inaccuracy. A secondary source of errors that can result in contradictions is the mismatch between stored data and structural or semantic consistency rules underlying the model of reality (e.g., a region that is represented by a polyline that is not closed). Database designers usually attempt to avoid this second kind of error by enforcing integrity constraints.

Research progress on consistency in spatial databases has been the result of an interdisciplinary effort. This effort has dealt with ontological issues (Frank, 2001) concerning the definition of semantic and topological consistency. It has also considered the appropriate conceptual frameworks for analyzing spatial consistency, the specification language of integrity constraints, and the design of computational–geometry algorithms to implement consistency checkers.

The following section describes models for defining consistency of spatial data that focus on topological consistency in the presence of multiple representation levels or in the integration of heterogeneous databases. Subsequently, the specification of integrity constraints in spatial databases is discussed to complete the background for presenting challenges and future trends in the treatment of consistency in spatial databases.

BACKGROUND

The conceptual bases for modeling consistency are topological and other geometric characteristics of spatial objects. In particular, spatial primitives and spatial relations are fundamental in the definition of consistency rules that enforce a particular model of space. Spatial primitives depend on the classical distinction between field-based and entity-based models (Shekhar, Coyle, Goyal, Liu, & Sakar, 1997). Each of these approaches to modeling space implies spatial primitives with their own definitions and rules. For example, the field-based approach to space includes the definition of tessellations, isolines, and triangular irregular networks, with their corresponding definition rules. Likewise, the entity-based approach includes definition rules for lines and polygons.

In addition to spatial primitives, spatial relations play an important role in spatial information systems, because such relations describe spatial information and are used as the spatial criteria for querying spatial databases. Even though positional information is often a basis for specifying the spatial component of objects, spatial relations such as adjacency and containment do not require absolute positional data. Common spatial relations are typically grouped into three kinds: topological, orientation, and distance. Topological relations deal mainly with the concept of connectivity and are invariant under topological transformations, such as rotation, translation, and scaling. Orientation relations presuppose the existence of a vector space and are subject to change under rotation, while they are invariant under translation and scaling. Distance relations express spatial properties that reflect the concept of a metric and, therefore, change under scaling, but are invariant under translation and rotation.

Models for each kind of spatial relation exist, which have led to a productive research area referred to as qualitative reasoning (Stock, 1997). Such models give definition rules for each type of spatial relation as well as consistency rules for combinations of spatial relations. In particular, there is a comprehensive method for analyzing the topological consistency of spatial configurations based on the logical consistency expressed by the composition of relations (Egenhofer & Sharma, 1993). For example, given that a spatial object A is inside a second object B, and B is disjoint from a third object C, then to keep the topological consistency, A must also be disjoint from C.
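A minimal sketch of this composition-based consistency check follows. Only three relations and the single composition rule from the example above are encoded; the names and the simplified relation set are our own assumptions, and a full implementation would use the complete composition table over the eight standard region-region topological relations:

```cpp
#include <cassert>
#include <set>

enum class Topo { Inside, Disjoint, Overlap };

// compose(r1, r2): the set of relations possible between A and C, given
// r1 = relation(A, B) and r2 = relation(B, C).
std::set<Topo> compose(Topo r1, Topo r2) {
    if (r1 == Topo::Inside && r2 == Topo::Disjoint)
        return {Topo::Disjoint};   // A inside B, B disjoint C  =>  A disjoint C
    // ... the remaining entries of the composition table would go here ...
    return {Topo::Inside, Topo::Disjoint, Topo::Overlap};  // unconstrained
}

// The stored relation between A and C must be admissible under composition.
bool consistent(Topo ab, Topo bc, Topo ac) {
    return compose(ab, bc).count(ac) > 0;
}

int main() {
    assert(consistent(Topo::Inside, Topo::Disjoint, Topo::Disjoint));
    assert(!consistent(Topo::Inside, Topo::Disjoint, Topo::Inside));
    return 0;
}
```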

Two important cases that have been addressed in the modeling of consistency, beyond the basic definition rules, are consistency at multiple representations and consistency for data integration. The problem of multiple representations consists of data changing their geometric and topological structures due to changes in scale. Conceptually, multiple representations may be considered as different data sets that cover the same area with different levels of detail. Within the context of assessing consistency at multiple representation levels, topological relations are considered as first-class information, which must prevail in case of conflict (Egenhofer & Clementini, 1994; Kuipers, Paredaens, & den Busshe, 1997). Multiple representation levels in spatial databases may not imply inconsistent information, but rather, merely different levels of detail or scale. In such cases, topological consistency at the level of objects and objects' interrelations must be enforced.

The modeling of topological consistency at multiple representation levels has been based on the definition of topological invariants across multiple representations (Tryfona & Egenhofer, 1997). Examples of topological invariants are the set intersections of boundaries and interiors as well as the sequence, dimension, type, and boundedness of boundary-boundary intersections. The comparison of topological invariants is used to define the topological consistency of objects and topological relations at multiple representation levels. In addition, topological invariants and consistency-checking of topological configurations have been the basis for defining consistency of composite objects at multiple representations (Egenhofer, Clementini, & Di Felice, 1994).

Spatial data sets to be integrated are assumed to contain the same features or objects that can be extracted from several sources at different times. These data may vary in reliability, accuracy and scale of representation. Thus, integrating spatial information may create conflicts due to the different representations for the same features concerning, for example, shape, dimension, and positional accuracy. In this context, different types of consistency can be distinguished (Abdelmoty & Jones, 1997): total consistency, which occurs when two data sets are identical; partial consistency, which occurs when certain subsets of two data sets are identical; conditional consistency, which occurs when by applying a set of functions over a data set it becomes totally consistent with respect to another data set; and inconsistency level, which occurs when there is nothing in common between data sets.

The common approach to integrating different representations has assumed that, when no further information exists about the origin of data, both representations are adopted. The idea is to merge both representations such that the resulting representation is modeled as a vague or unclear one. In modeling these unclear boundaries, three alternatives are found: fuzzy models (Schneider, 2000; Usery, 1996), which are based on the theory of fuzzy sets and have been applied to spatial uncertainty; probabilistic models (Burrough, 1996; Finn, 1993), which are based on probability theory to model positional and measurement uncertainty; and exact models (Clementini & Di Felice, 1996, 1997; Erwing & Schneider, 1997), which map data models for spatial objects with sharp boundaries onto spatial objects with broad boundaries.

INTEGRITY CONSTRAINTS IN SPATIAL DATABASES

Integrity constraints enforce consistency in spatial databases. Integrity constraints must be taken into account when updating a database such that the semantics and quality of data are preserved. In the spatial domain, integrity constraints have been mainly used for preventing inconsistency concerning restrictions with respect to rules of geometric abstractions (e.g., polygons, lines, networks) that are used in the representation of spatial objects, whereas conflicting positional information has been treated as a problem of data accuracy.

In addition to traditional integrity constraints concerning static, transition, and transactional aspects of database systems, general rules about spatial data must ensure consistent updating of spatial databases. A typical classification of these spatial constraints follows (Cockcroft, 1997):

Topological constraints address geometrical properties and spatial relations. They may be associated with structural considerations, such as the requirement that polygons must be closed polylines, or with topological conditions, such as the requirement that centerlines must meet at intersections. Considering a subset of topological constraints, Servigne, Ubeda, Puricelli, and Laurini (2000) defined topo-semantic constraints as those that relate geometry with semantic conditions, as in the constraint that a city's administrative region must be contained within its city limits.

Semantic integrity constraints are concerned with the meaning of geographic features (e.g., roads should not run through bodies of water).

User-defined integrity constraints are equivalent to business rules in nonspatial database management systems (DBMS) (e.g., one must have the appropriate legal permission in order to install a gas station).

Like traditional database systems, spatial databases inherit constraints defined at the conceptual and logical levels down to the implementation or physical level. These constraints are translated into a proprietary scripting language or into explicit constraints coded into the application programs. At the logical level, constraints based on topological relations have been defined (Hadzilacos & Tryfona, 1992). These constraints use a model for defining topological relations that is independent of absolute positional data, called the 9-intersection model (Egenhofer & Franzosa, 1991, 1994). In this model, spatial relations, and any geometric operators, can be declared as atomic topological formulae and combined into topological sentences to specify topological constraints.
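The following is a rough sketch, under our own naming, of how the 9-intersection idea can be represented in code: a 3x3 boolean matrix records whether the interior, boundary, and exterior of one object intersect those of another, and a relation such as disjoint becomes a predicate over that matrix. A full classifier would test all nine entries against the matrices of the eight standard region-region relations; only two relations are sketched here:

```cpp
#include <array>

// Rows: interior, boundary, exterior of A; columns: the same for B.
using NineIntersection = std::array<std::array<bool, 3>, 3>;

// "A disjoint B": interiors and boundaries of the two objects do not meet.
bool is_disjoint(const NineIntersection& m) {
    return !m[0][0] && !m[0][1] && !m[1][0] && !m[1][1];
}

// "A inside B" (sketch): A's interior meets B's interior, but neither A's
// interior nor A's boundary reaches B's boundary or exterior.
bool is_inside(const NineIntersection& m) {
    return m[0][0] && !m[0][1] && !m[0][2] && !m[1][1] && !m[1][2];
}
```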

With an object-oriented perspective, an extension of the OMT (object modeling technique) model to OMT-G (an object-oriented data model for geographic applications) provides primitives to represent spatial data with their respective integrity constraints (Borges, Davis, & Laender, 2001, 2002). Constraints are defined by using spatial primitives and constructs for spatial relationships. Within such an object-oriented data model, constraints are encapsulated as methods associated with spatial classes.

Some of these constraints, in particular aggregation rules, are formulated with operators over parts (i.e., spatial objects) and quantifications over sets, such that they are not expressed in first-order logic.

The specification of constraints can be done by different techniques of knowledge representation. Some attempts have been made to provide end users with easy mechanisms that hide the logic involved in specifying constraints (Cockcroft, 2001; Servigne, Ubeda, Puricelli, & Laurini, 2000). A proposal for constraint specification allows users to define constraints in an English-like language (Servigne, Ubeda, Puricelli, & Laurini, 2000). Basic components of the language are entity classes, relations, and qualifiers (e.g., forbidden, at least n times, at most n times, or exactly n times). Following the same idea, a more recent work (Cockcroft, 2001) extends the previous specification to include attribute values in the topological constraints. For example, “a butterfly valve must not intersect a pipe if the diameter of the pipe is greater than 40 inches.” Interfaces with high-level languages to specify integrity constraints are standalone software tools that are integrated with a geographic information system (GIS).
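As an illustration of the kind of internal representation such an English-like specification could be compiled into, consider the following sketch; all type and field names here are our own assumptions, not those of the cited systems:

```cpp
#include <optional>
#include <string>

enum class Qualifier { Forbidden, AtLeastN, AtMostN, ExactlyN };

// e.g., "the diameter of the pipe is greater than 40 inches"
struct AttributeCondition {
    std::string attribute;   // "diameter"
    std::string op;          // ">"
    double value;            // 40.0
};

// e.g., "a butterfly valve must not intersect a pipe if ..."
struct TopologicalConstraint {
    std::string entity_a;    // "butterfly valve"
    std::string relation;    // "intersect"
    std::string entity_b;    // "pipe"
    Qualifier qualifier;     // Qualifier::Forbidden
    int n = 0;               // used only by the counting qualifiers
    std::optional<AttributeCondition> condition;  // constraint applies only when true
};
```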

FUTURE TRENDS

The treatment of consistency in spatial databases has focused mainly on defining integrity constraints and enforcing them for data manipulation. Although there are advances in issues concerning formal models for detecting and defining inconsistency in different processes of spatial data manipulation, such as multiple representation and data integration, most of the effort is still at the conceptual level or in isolated implementations. In this context, there exist important issues and challenges in the treatment of consistency for current spatial databases, which are summarized as follows:

Inaccuracy as inconsistency: The conflicting information that arises from data inaccuracy (e.g., conflicting positional information) may be treated as inconsistency if there exists an explicit integrity constraint that reflects this kind of conflicting information. What will be treated as a problem of data inaccuracy or a problem of inconsistency is still something that needs clarification.

Space does not automatically generate attribute constraints: The spatial dimension of an object does not automatically constrain the attributes of the object or the attributes or spatial components of other objects. There should, however, be an easy way to specify spatial integrity constraints.

Partial consistency: A geometric representation can be totally or partially consistent such that queries based on spatial criteria over such representation are not necessarily inconsistent.

Spatial relations: Spatial relations are usually implicitly represented and may not need positional accuracy. So, what is inconsistency with respect to objects’ positional information may not be inconsistency with respect to spatial relations between objects.

Consistency of composite objects: Composite objects treat aggregations of objects and impose constraints with respect to their parts to enforce consistency.

Application-independent vs. application-dependent integrity constraints: Application-independent integrity constraints associated with spatial primitives can be built into the system with ad-hoc implementations. Application-dependent constraints, in contrast, require facilities for the generation of code to impose the constraints.

Consistency with propagation of updates: A modification in a spatial database may cause simultaneous updates in a large number of records with consequences in the consistency of data.

Consistency at multiple representation levels: Spatial databases may need to treat different levels of detail in the spatial representation. Consistency at multiple representations needs to be based on research that goes beyond considering topological relations, to incorporate, for example, orientation and distance relations.

Consistency across heterogeneous spatial databases: Distributed and interoperating spatial information systems are not always designed under the same conceptual model, with the result that what is consistent in one database is inconsistent in another. Further studies are needed to analyze the consistent integration of data not only at the geometric level but also at the semantic level.

Management of inconsistency tolerance: In the presence of inevitable inconsistencies in a database, it is necessary to find strategies that provide consistent answers despite the fact that the database is inconsistent with respect to a set of integrity constraints. For example, a database with conflicting positional information may still provide consistent answers with respect to topological relations.


Context dependence of integrity constraints: Some integrity constraints are associated with the computational application of particular spatial operators (e.g., area and intersection). Context may determine the integrity constraint that is required for a particular use.

Integration of solutions: Advances in modeling consistency in spatial databases, such as models of consistency at multiple representations and data integration, should be integrated to give solutions to real problems.

Efficient algorithms: The complexity of spatial data requires efficient algorithms for implementing consistency models of spatial data.

CONCLUSION

This article has discussed models of consistency and the specification of integrity constraints that emphasize the current issues and challenges for handling consistency in spatial databases. In this discussion, relevant aspects are composite objects by spatial aggregation, topological relations, multiple representation levels and integration of spatial databases. It is argued that consistency cannot be considered as a binary condition, with data being either consistent or inconsistent, but not both. Rather, consistency is best seen as a spectrum of relative conditions that can be exploited depending on the use of data.

Since spatial databases and spatial–temporal databases have vast domains of applications, it is expected that they will become a widely used technology. In addition, with the free access of available information, more data from different sources is available. In this context, data consistency in spatial databases is a current and challenging area of research and development.

REFERENCES

Abdelmoty, A., & Jones, C. (1997). Toward maintaining consistency in spatial databases. In Proceedings of the Sixth International Conference on Information and Knowledge Management (pp. 293-300). Las Vegas: ACM Press.

Borges, K., Davis, C., & Laender, A. (2001). OMT-G: An object-oriented data model for geographic information applications. GeoInformatica, 5, 221-260.

Borges, K., Davis, C., & Laender, A. (2002). Integrity constraints in spatial databases. In J. Doorn & L. Rivero (Eds.), Database integrity: Challenges and solutions (pp. 144-171). Hershey, PA: Idea Group Publishing.


Burrough, P. (1996). Natural objects with indeterminate boundaries. In A. Frank (Ed.), Geographic objects with indeterminate boundaries, GISDATA (pp. 3-28). London: Taylor & Francis.

Clementini, E., & Di Felice, P. (1996). An algebraic model for spatial objects with indeterminate boundaries. In A. Frank (Ed.), Geographic objects with indeterminate boundaries GISDATA (pp. 155-169). London: Taylor & Francis.

Clementini, E., & Di Felice, P. (1997). Approximate topological relations. International Journal of Approximate Reasoning, 16, 173-204.

Cockcroft, S. (1997). A taxonomy of spatial integrity constraints. GeoInformatica, 1, 327-343.

Cockcroft, S. (2001). Modelling spatial data integrity constraints at the metadata level. In D. Pullar (Ed.), GeoComputation. Retrieved February 15, 2004, from http://www.geocomputation.org/2001/

Egenhofer, M. (1997). Consistency revised. GeoInformatica, 1(4), 323-325.

Egenhofer, M., Clementini, E., & Di Felice, P. (1994). Evaluating inconsistency among multiple representations. Spatial Data Handling, 901-920.

Egenhofer, M., & Franzosa, R. (1991). Point-set topological spatial relations. International Journal of Geographic Information Systems, 5, 161-174.

Egenhofer, M., & Franzosa, R. (1994). On the equivalence of topological relations. International Journal of Geographic Information Systems, 8, 133-152.

Egenhofer, M., & Sharma, J. (1993). Assessing the consistency of complete and incomplete topological information. Geographical Systems, 1, 47-68.

Erwing, M., & Schneider, M. (1997). Vague regions. Symposium on Advances in Spatial Databases, LNCS, 1262 (pp. 298-320). Berlin, Germany: Springer-Verlag.

Finn, J. (1993). Use of the average mutual information index in evaluating error and consistency. International Journal of Geographic Information Science, 7, 349-366.

Frank, A. (2001). Tiers of ontology and consistency constraints in geographic information systems. International Journal of Geographic Information Science, 15, 667-678.

Hadzilacos, T., & Tryfona, N. (1992). A model for expressing topological constraints in geographic databases. In A. Frank, A. Campari, & U. Formentini (Eds.), Theories and methods of spatio-temporal reasoning in geographic space, COSIT '92, LNCS, 639 (pp. 252-268). Berlin, Germany: Springer-Verlag.

Kuipers, B., Paredaens, J., & den Busshe, J. (1997). On topological equivalence of spatial databases. In F. Afrati & P. Kolaitis (Eds.), 6th International Conference on Database Theory, LNCS, 1186 (pp. 432-446). Berlin, Germany: Springer-Verlag.

Pason, S. (1996). Current approaches to handling imperfect information in data and knowledge bases. IEEE Transactions on Knowledge and Data Engineering, 8, 353-371.

Schneider, M. (2000). Metric operations on fuzzy spatial objects in databases. 8th ACM Symposium on Geographic Information Systems (pp. 21-26). ACM Press.

Servigne, S., Ubeda, T., Puricelli, A. & Laurini, R. (2000). A methodology for spatial consistency improvement of geographic databases. GeoInformatica, 4, 7-24.

Shekhar, S., Coyle, M., Goyal, B., Liu, D.-R., & Sakar, S. (1997). Data models in geographic information systems. Communications of the ACM, 40, 103-111.

Stock, O. (1997). Spatial and temporal reasoning. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Tryfona, N., & Egenhofer, M. (1997). Consistency among parts and aggregates: A computational model. Transactions on GIS, 1, 189-206.

Usery, E. A. (1996). Conceptual framework and fuzzy set implementation for geographic features. In A. Frank, (Ed.), Geographic objects with indeterminate boundaries GISDATA (pp. 71-85). London: Taylor & Francis.

Voisard, A., & David, B. (2002). A database perspective on geospatial data modeling. IEEE Transactions on Knowledge and Data Engineering, 14, 226-246.

KEY TERMS

Inconsistency Tolerance: The strategy that lets a system answer queries and process data despite the fact that the database is inconsistent.

Multiple Spatial Representations: Multiple representation levels encompass changes in geometric and topological structure of a digital object that occur with the changing resolution at which the object is encoded.

Objects with Indeterminate Boundaries: These are objects that fall into one of two categories: objects with sharp boundaries whose position and shape are unknown or cannot be measured exactly, or objects whose boundaries are not well defined or for which it is useless to fix boundaries.

Spatial Consistency: It refers to the agreement between data representation and a model of space.

Spatial-Integrity Constraints: They refer to constraints that address properties with respect to a model of the space. They are usually classified into topological integrity constraints, semantic integrity constraints, and user-defined integrity constraints.

Topological Consistency: Two representations are consistent if they are topologically equivalent or if the topological relations are valid under certain consistency rules.

Topological Invariants: These are properties that are invariant under topological transformations. Examples of topological invariants are the interior, boundary, and exterior of spatial objects.

Topological Equivalence: Two representations are said to be topologically equivalent if one can be mapped into the other by a topological transformation of the real plane. Examples of topological transformations are rotation, scale change, translation, and symmetry.


Converting a Legacy Database to Object-Oriented Database

Reda Alhajj
University of Calgary, Canada

Faruk Polat
Middle East Technical University, Turkey

INTRODUCTION

We present an approach to transfer the content of an existing conventional relational database to a corresponding existing object-oriented database. The major motivation is that organizations often have two generations of information systems: the first is based on the relational model, and the second is based on the object-oriented model. This has several drawbacks. First, it is impossible to get unified global reports that involve information from the two databases without providing a wrapper that facilitates accessing one of the databases within the realm of the other. Second, organizations must keep professional staff familiar with both systems. Finally, most of the people familiar with conventional relational technology are willing to learn and move to the emerging object-oriented technology. Therefore, one appropriate solution is to transfer the content of conventional relational databases into object-oriented databases; the latter are extensible by nature and hence more flexible to maintain, whereas it is very difficult to extend and maintain a conventional relational database.

We initiated this study based on our previous research on object-oriented databases (Alhajj & Arkun, 1993; Alhajj & Elnagar, 1998; Alhajj & Polat, 1994, 1998); we also benefit from our research on database re-engineering (Alhajj, 1999; Alhajj & Polat, 1999). We developed a system and implemented a first prototype that takes characteristics of the two existing schemas as input and performs the required data transfer. For each of the two schemas, some minimum characteristics must be known to the system in order to be able to perform the transfer from tuples of the relational database into objects in the object-oriented database. We assume that all relations are in the third normal form, and the two schemas are consistent; that is, for every class, there is a corresponding relation. This consistency constraint necessitates that attributes with primitive domains in a certain class have their equivalent attributes in the corresponding relation, and values of non-primitive attributes in a class are determined

based on values of the corresponding foreign keys in the corresponding relation. Concerning the migrated data, it is necessary that consistency and integrity are maintained. Finally, the transfer process should not result in any information loss.
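A hedged sketch of the schema correspondence this consistency constraint implies is given below: every class maps to a relation, primitive attributes map to columns, and object-valued attributes are resolved through foreign keys. All names and structures here are illustrative assumptions, since the actual system derives this information from the catalogs of the two schemas:

```cpp
#include <map>
#include <string>
#include <vector>

struct AttributeMapping {
    std::string class_attribute;   // attribute in the object-oriented class
    std::string column;            // column (or foreign key) in the relation
    bool is_foreign_key;           // true => resolve to an object reference
    std::string target_class;      // class the foreign key points to, if any
};

struct ClassMapping {
    std::string class_name;        // e.g., "Employee"
    std::string relation_name;     // e.g., "EMPLOYEE" (third normal form)
    std::vector<AttributeMapping> attributes;
};

// One entry per class: tuples of relation_name become objects of class_name,
// and foreign-key columns become references to already-created objects.
using SchemaMap = std::map<std::string, ClassMapping>;
```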

BACKGROUND

There are several approaches described in the literature to facilitate accessing the content of a relational database from an object-oriented application. DRIVER (Lebastard, 1995) proposes an object wrapper that allows relational database reuse. It uses relational database management systems as intelligent file managers and proposes object models on top of them. The user is expected to provide the mapping between the two schemas. Persistence (Agarwal, Keene & Keller, 1995; Keller, Agarwal & Jensen, 1993) is an application development tool that uses an automatic code generator to merge C++ applications with relational databases. The application object model is mapped to a relational schema in the underlying database. Therefore, object operations are transformed into relational operations and vice versa. The benefits and risks of the migration process are discussed in Keller and Turner (1995). The authors argue that storing data in a relational database and providing a wrapper to allow programming in an object programming language provides more benefit at significantly reduced risks and costs as compared with migrating to an object-oriented database. We argue that providing a wrapper adds a performance cost to be paid every time the data is processed, because the mapping between objects and tuples is repeated dynamically in each session. In contrast, the mapping cost is paid only once by migrating to an object-oriented database.

MAIN THRUST

In this section, we present the basic necessary analysis that results in the information required to transfer the
