Papers
Topics
Authors
Recent
2000 character limit reached

Web data modeling for integration in data warehouses (0705.1457v1)

Published 10 May 2007 in cs.DB

Abstract: In a data warehousing process, the data preparation phase is crucial. Mastering this phase allows substantial gains in terms of time and performance when performing a multidimensional analysis or using data mining algorithms. Furthermore, a data warehouse can require external data. The web is a prevalent data source in this context, but the data broadcasted on this medium are very heterogeneous. We propose in this paper a UML conceptual model for a complex object representing a superclass of any useful data source (databases, plain texts, HTML and XML documents, images, sounds, video clips...). The translation into a logical model is achieved with XML, which helps integrating all these diverse, heterogeneous data into a unified format, and whose schema definition provides first-rate metadata in our data warehousing context. Moreover, we benefit from XML's flexibility, extensibility and from the richness of the semi-structured data model, but we are still able to later map XML documents into a database if more structuring is needed.

Citations (5)

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.