Andrew Jones CO7107 Data Warehousing and
CO7107 Data Warehousing and Multi-dimensional Databases
Table of Contents
“What is our business? And what business are we in?” Peter Drucker.
In this report, the nature of a DBMS (Database Management System) will be considered. In addition, how it works will be investigated and an exploration of the market will be undertaken to ascertain what is considered to be leading in the field. An organisation’s needs for data, how this may have changed over the years and how DBMS has evolved to cope with these changes will be considered. PICK OUT KEY POINTS RE. THESE FOR CONCLUSION
A database, (db), is “A structured set of data held in
computer storage and typically accessed or manipulated by means of specialized
This specialized software introduces the term database management system (DBMS) which is used to describe the application suite which administers and manages the database. Within the database, software tools are principally employed for the: storing, indexing, modification, retrieval, transitioning, and querying of data within the system.
Gartner defines a DBMS as a complete software system used to define, create, manage, update and query a database, by which we mean an organized collection of data that may be structured in multiple formats and stored in some form of storage medium.
The DBMS market is defined by relational and nonrelational
database management products that are suitable for a broad range of
enterprise-level transactional applications.
The DBMS is an interface between the database and the entities that need and require access to the database.
Figure 1 DBMS as an interface
Advantages of the DBMS
Disadvantages of a DBMS
· Data sharing
· Increased costs
· Data security
· Management complexity
· Data integration
· Maintaining currency
· Referential integrity
· Frequent upgrade/replacement cycles
· Minimized data inconsistency
· Improved data access
· Improved decision making
· Increased end-user productivity
Characteristics of a DBMS the advantages and disadvantages
A DBMS is composed of three principal elements, these are:
1. Physical database management
Transparent to the user is the storage and retrieval of the logical database over multiple physical files in the storage medium.
Figure 2 Physical and Logical Databases
2. Database engine
The Database Engine is the principal service for storing, processing, and securing data; it is this package element that is responsible for the: create, read, update and delete (CRUD) functionality of the DBMS.
Figure 3 Composition of a DBMS
Figure 3 Composition of a DBMS
The db schema is a logical data storage structure. The schema is composed of a number of schema objections, known as a collection. Some examples of schema objects are:
· database links
The DBMS manages a single logical presentation of the data safeguarding concurrency, data integrity, security and backup.
Of the DBMS on the market, those considered to be leading
products are Oracle, Microsoft, IBM and SAP. These organisations are so market
dominant that there is little left for rival companies other than those
involved in very specific market niches, such as Neo Technology with their
graph product. Hybrid capabilities
extend the functionality of the RDBMS to more use cases, making a stronger case
for their ubiquity.
Figure 4 Gartner Magic Quadrant of DBMS providers
Figure 5 five DBMS models
Figure 5 five DBMS models
Figure 6 DBMS Rankings by popularity
Figure 7 DBMS Model by popularity
The other models employ a BASE methodology. Basically Available, Soft state, Eventual consistency. This BASE model does not provide the guarantees in data transactions that ACID offers, but reflects the complexities of the data managed by these DBMS.
The final form of DBMS is an emergent form called the in memory model. These databases’ data is “stored in main memory to facilitate faster response times”. (Rouse, in-memory database, 2012)
Business run on data: the amount of a sale, the cost of a purchase, and the wages of staff: but businesses survive on cash flow and strategic planning; typically one of the first strategic Information Technology and Information Systems decision an organisation makes is its accounting system. Its first serious venture into Business Intelligence (BI). The business can now start to gather, report and plan consistently on the organisations Key Performance Indicators (KPI).
Businesses do not start big. They start small and grow.
Figure 8 Commercial Industrial Sectors
For a small business information needs are initially awfully basic and can be addressed predominantly utilising lists, but as the business grows, and the models for management start to formalise, there becomes a requirement for the enhanced management of data and information sources. During this business growth, the business starts to develop formal reporting structures for its operational data, tactical information, together with its strategic management information. Sources of this data and information include:
· accounting information
o sales order processing
o purchase order processing
o profit and loss accounts
o balance sheet
· Inventory management
· Production data
Over time the organisation will have generated empirical data to work with, consequently it can begin to forecast, complete empirical comparisons and plan for the future needs. If the business itself continues to grow, it may require other functions to perform its operational and strategic activities. Such functions are:
Also there will be an extension of IT into other areas of business. Unless planned for, local and workgroup developed databases and technology infrastructure will prove difficult and costly to integrate into an overall strategy. There may also be data model alignment issues as a result of mergers or acquisitions.
As the business grows into the large phase, data security issues will rise to prominence. The changes in data value are now very real and the threats to them incredibly serious. Breaches in security are costly to the organisation.
In this large phase of business development the physicality of data requirements will also change. There will be a lot more computers, tablets, phones, scanners, copiers, printers and cameras to manage. In addition there will be the large scale server, mainframe and networking installations required to service the demands, needs and wants of the business. Fundamentally it comprises a lot of infrastructure.
Business Intelligence (BI), is an analytical data analysis process aimed at enhancing business performance by facilitating key decision makers with the ability to collect, store, retrieve and analyze data to make better-informed decisions. It provides for a consistency, which may otherwise be lacking, in reporting and analysis. As the maxim goes: knowledge is power: but the transitioning from data, through information to result in knowledge is time consuming.
With this in mind there is and should only be “one version of the truth”, a single set of data that provides the source for all analysis, planning and decision making.
Data warehousing and BI have become almost interchangeable terms, as definitions vary for what is a very broad term. It is not just a decision support system, nor is it a methodology, it does not require a data warehouse, and it is not the tool set.
Figure 9 consistency in reporting
Data quality is important in the creation of the data warehouse, as the adage goes: garbage in garbage out.
BI is/can be about monitoring change. As a business organization it is beneficial to be able to react to change. In addition the business may want to cause change. Small businesses can survive and prosper on the instincts and intuition of the proprietor; decisions are not necessarily made from factual evidence. As the business grows, layers of management are introduced and “buy in” from stakeholders is required. The business formalizes and professionalizes.
Change happens at a rate; the rate of this change, n, will
affect an organisations’ ability to deal with the results, positive or negative
of the change. The faster the rate of n the more agile the organisation needs
to be in order to facilitate a successful outcome to the change. The less agile
In databases a flat file refers to data files that contain
records with no structured relationships.
In SQL for Dummies, an introduction to Structured Query Language, Allen G. Taylor notes that the advantage of a flat file is that it takes up less space than a structured file. However, it does require the application to have knowledge of how the data is organized within the file.
Almost pervasively thought of as:
A flat file database is a database that stores data in a
plain text file. Each line of the text file holds one record, with fields
separated by delimiters, such as commas or tabs.
A hierarchical database consists of a collection of records that are connected to each other through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records.
Two influential database systems that rely on the hierarchical model are IBM’s Information Management System (IMS) [IBM 1978a, McGee 1977] and MRI’s System 2000 [MRI 1974, 1979]. The first IMS version was developed in the late 1960s by IBM and by North American Aviation (Rockwell International) for the Apollo moon-landing program. (Silberschatz, Korth & Sudarshan, 2010)
A network database consists of a collection of records connected to one another through links. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. (Silberschatz, Korth & Sudarshan, 2010)
The relational model is cantered on this idea: the organization of data
into collections of two-dimensional tables called “relations.”
Figure 10 the mathematics of ER Model
ER Model is based on Strong Mathematical Foundations. The ER model is based on:
1. Set Theory,
2. Mathematical Relations,
3. Modern Algebra,
4. Logic, and
5. Lattice Theory.
In the relational model the mathematical relation construct is used to express the “structure of data values,” while in the ER model the same construct is used to express the “structure of entities.”
Model has Explicit Linkage between Entities. The linkage between
entities is explicit in the ER model while in the relational model is implicit.
In addition, the cardinality information is explicit in the ER model, and some
of the cardinality information is not captured in the relational model.
An object-oriented database management system (OODBMS),
sometimes shortened to ODBMS for object database management system), is a
database management system (DBMS) that supports the modelling and creation of
data as objects. This includes some kind of support for classes of objects and
the inheritance of class properties and methods by subclasses and their
objects. There is currently no widely agreed-upon standard for what constitutes
Also known as JSON store or database
Document Stores provide
more functionality: the system recognizes the structure of the objects stored.
Objects (or documents) may have a variable number of named attributes of
various types (integers, strings, and possibly nested objects), objects can
grouped into collections, and the system provides a simple query mechanism to
search collections for objects with particular attribute values.
Formally, a graph is just a collection of vertices and edges—or, in less intimidating language, a set of nodes and the relationships that connect them. Graphs represent entities as nodes and the ways in which those entities relate to the world as relationships. This general-purpose, expressive structure allows us to model all kinds of scenarios. (Robinson, Webber & Eifrem, 2013)
are the simplest NoSQL databases. Every single item in the database is stored
as an attribute name (or "key"), together with its value.
Key-value Stores provide
a distributed index for object storage, where the objects are typically not
interpreted by the system: they are stored and handed back to the application
as BLOBs. However, these systems usually provide object replication for
recovery, partitioning of the data over many machines, and rudimentary object
Are optimized for
queries over large datasets, and store columns of data together, instead of
Don’t forget to comment on how data requirements haven’t changed significantly however as the organisation grows there are additional functionalities that generate data that need to be included in the information system TRY TO FIND REF.
On the horizon are a number of other db related ideas: CloudDB, DBaaP, DBaaS, DBaaA
In conclusion, the market for DBMS is dominated by four major organisations: Oracle, Microsoft, IBM and SAP. All of these organisations have successfully read the market requirements for data management systems and users of other systems have been profiled and collectively marginalised into exceptionally narrow market niches.