How do people make decisions in their everyday life? We recall information related to a particular sphere, analyze previous experience, and shape our activities correspondingly. What makes this modus operandi possible? Slews of data our brain stores and retrieves whenever we need it.
Companies adopt a similar approach when managing their business affairs, using the information they have accumulated to develop insights, streamline workflow, and map out new marketing strategies or improve the existing ones. Realizing the vital importance of data, organizations, however, don’t always handle it properly which loses them more than $600 billion a year. Evidently, companies as often as not fail to store, organize, and process coal-and-ice data efficiently. To counter such gruesome statistics, enterprise data warehouse concepts (DWH) have been introduced.
The acronym EDW stands for an enterprise data warehouse – a repository that hoards and manages corporate information related to the organization’s historical functioning. This electronic data warehouse receives information provided by a number of flat file sources, including CRM, ERP, and various manual recordings. Keeping all the data in one virtual place rules out managing multiple storage facilities and allows stakeholders to query and get access to it with the further utilization for business intelligence (BI). The latter encompasses a set of techniques that turn raw data into insights to be acted upon.
Lately, the idea of an enterprise-wide data warehouse has been gaining traction causing the appearance of related or similar models, that, however, have some differences from the classical enterprise warehouse of information.
Conventionally, EDW (aka relational database) is understood as a virtual depot of structured data that have prescribed relations between them, can be arranged in the form of a table, easily accessed and retrieved by users manually or via specific querying tools. Such enterprise data warehouse solutions have a significant volume never going below 100GB.
Smaller size repositories called data marts are hardly suitable for enterprises. Today, data marts are typically viewed as banks containing information organized by domains united into a system. This model is known as enterprise data warehouse bus architecture that enables to segment the unwieldy warehouse into smaller sections that are easier to deal with.
Another term related to the EDW umbrella concept is a data lake. Unlike the two previous data tanks, the data lake (aka non-relational database) contains unstructured or mixed information. Being too awkward and messy to be leveraged for BI purposes, these huge data collections nevertheless find wide application in machine learning and data mining.
Whatever type you opt for, it can definitely add value to your business.
Our experience in implementing EDW projects for various organizations shows that this technology brings a number of perks to the end users.
All these benefits stem from the peculiar mode of organization EDW has.
10 tips to implement a DWH for bank in 9 months
There are five basic elements of EDW architecture.
These components of the data pipeline can be arranged differently, which conditions the type of EDW architecture.
It is the bread-and-butter model in which the database has a direct connection to the UI where users make their queries. Being theoretically foolproof, this scheme is subject to a number of issues when operated practically.
First of all, working with volumes of data over 100GB slows down processing speed and yields confused query results as often as not. Secondly, such a system can sieve away non-required data only if the input query is precise, so UI wielding requires additional care. Finally, this data warehouse configuration has restricted flexibility, with analytical capabilities and report complexity being significantly hamstrung if not handicapped.
In view of its sluggishness and volatility, one-tier warehouse architecture is seldom employed for handling large amounts of data and performing advanced queries.
Here, between the UI and the database a data mart level is inserted where the data is distributed according to the sphere it refers to. These middle-level instances serve as a database extension and considerably streamline data access since every mart contains only domain-specific dossiers. Moreover, in such a model data availability can be restricted to certain employees or departments only, which enhances the security of the entire EDW.
Yet, solving the querying problem doesn’t come without a price. Opting for this model, the organization has to allocate resources both for additional hardware and for the integration of data marts with the core platform.
Both types of EDW mentioned above practice two-dimensional data representation, similar to what Excel or Google Sheets do. The three-tier warehouse design additionally utilizes online analytical processing (OLAP) cubes with their multi-dimensional data representation model. It looks like a combination of a number of Excel tables and exceedingly facilitates data processing and analyzing since users can move between the dimensions at will. Being very flexible, this architecture is susceptible to fine-tuning, so the cubes can be configured to provide access to the whole warehouse data or each mart individually.
OLAP model proved to be very efficient for data processing and analyzing, so most data warehouse providers include it into their set of services.
Traditionally, organizations have relied on on-premises EDWs that were located on the hardware belonging to the company and thus simplified the work and responsibilities of data engineers. However, this approach involves exorbitant expenditures – both for the technological and physical infrastructure and for a team of experts who set up, maintain, operate and provide security to the system.
Realizing the inadequacy of this practice, the majority of businesses today have switched to cloud-based EDWs. What makes these data warehouses superior to the legacy on-premises ones?
It is the cloud data warehouse with the longest history dating back to 2013 which, together with the high-quality of services and familiar PostgreSQL-based syntax, accounts for its leading position as to the number of deployments. Its column-oriented storage facilities are split into clusters of nodes, each having its own CPU, RAM, and storage room. As a client’s needs evolve, they can hire another node to make use of.
A definite forte of this warehouse is its seamless integration with a plethora of tools and platforms – both by the same vendor (like Amazon QuickSight) and by a number of third-parties (Periscope Data, Tableau, and IBM Cognos, to mention a few).
Unlike Amazon Redshift, the architecture of this EDW allows for separate upscaling of computing and storage facilities. Such a model makes it more flexible and user-friendly, allowing users to focus on either enhancing the speed of data processing or the augmentation of storage space. Plus, it can be set up on top of either Microsoft Azure or AWS infrastructure, which significantly broadens its functional capacities.
Being a Microsoft product, this ecosystem makes use of the vendor’s relational cloud database (Azure SQL Database) and enables access to and support of a whole gamut of Microsoft-powered technologies and tools.
Like other cloud databases, this EDW has an MPP architecture enabling it to draw data not only from numerous databases but from a variety of SaaS platforms as well. What lets this data warehouse stand out among the competitors is a rich scope of pricing options for customers to choose from in accordance with their unique needs.
Launched by Google, this EDW works well with its technologies (like MapReduce and Google Cloud Storage). A fine sample of serverless architecture, it performs continuous and dynamic computational and storage provisioning without users observing details of the allocation process. Alas, it falls short of products of other vendors in terms of tunability, but its highly scalable billing-per-query pricing plan dovetails exactly into what users need and pay for.
As you see, the range of available options is rather wide, so making the right choice may be a challenge.
There are several crucial questions you have to ask yourself when selecting an EDW solution for your organization.
If it is structured data that can be represented by rows and columns of a spreadsheet (like inventories or user data), a relational database is what the doctor ordered for your company. If the data to be stored consists of emails, videos, podcasts, pictures, texts, and other semi-structured information, a non-relational database suits it better. With totally unstructured data, you should go for a data lake instead of the conventional EDW.
If the data exceeds 2 terabytes and is likely to accrue, think non-relational warehouse. If your data is smaller in size but requires advanced analytics, a relational data warehouse will cost you less.
Also, the method of scaling matters greatly. Some solutions (like Snowflake) perform scaling automatically while others (for example, Amazon Redshift) require manual addition of nodes.
If you are more after analytics than immediate response, speed doesn’t matter that much. But if every little change in the data is of the moment for your business goals, your choice of EDW is conditioned by (guess what?) the previous parameter. The scale and the performance have a direct correlation, so a bigger scale spells a greater speed. In case both scale and analytics are what you look for in EDW, a wise tradeoff between these two indices must be sought.
Providers take various criteria as a unit of payment calculation. Thus, you may have to pay for the amount of data you store, the total size of storage facilities you rent, the period of service, or the number of queries you run. So you should base your choice on the type of activities you will perform most while working with EDW and find the solution with the lowest prices for this very activity.
If you are a startup with a small staff you are sure to want a self-optimizing data warehouse and free your employees from the headache of managing the facilities you rent. Yet, manual maintenance of the warehouse pipeline will enable you to tailor it to your specific needs and thus have greater control over its flexibility, performance, and your expenditures.
Every entrepreneur would like to spend minimal time and money on adopting new technologies and tools and training personnel to use them. So engaging the services of an EDW provider make sure your conventional tools work well within the database and your current software needs as little customization as possible.
Let’s hope these guidelines were instrumental in your choosing an enterprise data warehouse that suits you to a tee. Now you have to implement it in your organization.
While executing enterprise data warehouse projects, we at DICEUS have developed a universal roadmap that streamlines and facilitates the implementation process tremendously.
The ultimate purpose of a data warehouse is to offer stakeholders accurate and timely information on the company’s performance to take steps to improve it. By interviewing the organization’s decision-makers we determine what data is crucial for them to enter into the warehouse to be analyzed and acted upon.
We gather all requirements for hardware, software, testing, analysis, reporting, and employee training that condition the successful implementation of the project. This stage also includes determining data backup strategy and failure recovery plan.
At this stage, we identify data sources and define the way data structures in the warehouse will be stored, processed, and made available. This model serves as a wireframe for developing logical and physical data structures determined by the established requirements.
Now we can start building tables or OLAP cubes. The latter model is more elaborate and takes longer to implement since it includes numerical values (aka grouping measures), dimensions (like geographical areas, time units), and data granulation. A matter of our special care is making sure OLAP cubes are promptly updated as soon as the same procedure is applied to the warehouse.
Next comes the choice of the form of sharing data stored in the warehouse that must be accessed via any device, be it a desktop or a smartphone. To do that, the experts of DICEUS choose a front-end tool that fits with the customer’s requirements and business goals. The two key elements of the front-end that we get as the output are reports and dashboard features.
Once the data warehouse is ready, it must be tested. Thus, we check the quality of operation, reveal possible problems, and address them before the actual deployment is implemented.
At this stage, the completed enterprise data warehouse is launched and the company’s employees are taught to operate it. Adequate training is important since, if the end users fail to learn its ins and outs, all the previous effort may be rendered useless.
In the data-driven world of the 21st century, master of information is master of situation. By pooling critical information in a single enterprise data warehouse and having constant access to it, companies can analyze trends, develop meaningful insights, and plan their professional activities correspondingly. DICEUS can be your reliable partner in implementing top-notch EDW projects that will propel your business on the way to a financial bonanza.