How do people make decisions in their everyday life? We recall information related to a particular sphere, analyze previous experience, and shape our activities correspondingly. What makes this modus operandi possible? Slews of data our brain stores and retrieves whenever we need it.
Companies adopt a similar approach when managing their business affairs, using the information they have accumulated to develop insights, streamline workflow, and map out new marketing strategies or improve the existing ones. Realizing the vital importance of data, organizations, however, don’t always handle it properly which loses them more than $600 billion a year. Evidently, companies as often as not fail to store, organize, and process coal-and-ice data efficiently. To counter such gruesome statistics, enterprise data warehouse concepts (DWH) have been introduced.
Enterprise DWH: What is it and what is it not?
The acronym EDW stands for an enterprise data warehouse – a repository that hoards and manages corporate information related to the organization’s historical functioning. This electronic data warehouse receives information provided by a number of flat file sources, including CRM, ERP, and various manual recordings. Keeping all the data in one virtual place rules out managing multiple storage facilities and allows stakeholders to query and get access to it with the further utilization for business intelligence (BI). The latter encompasses a set of techniques that turn raw data into insights to be acted upon.
Lately, the idea of an enterprise-wide data warehouse has been gaining traction causing the appearance of related or similar models, that, however, have some differences from the classical enterprise warehouse of information.
Conventionally, EDW (aka relational database) is understood as a virtual depot of structured data that have prescribed relations between them, can be arranged in the form of a table, easily accessed and retrieved by users manually or via specific querying tools. Such enterprise data warehouse solutions have a significant volume never going below 100GB.
Smaller size repositories called data marts are hardly suitable for enterprises. Today, data marts are typically viewed as banks containing information organized by domains united into a system. This model is known as enterprise data warehouse bus architecture that enables to segment the unwieldy warehouse into smaller sections that are easier to deal with.
Another term related to the EDW umbrella concept is a data lake. Unlike the two previous data tanks, the data lake (aka non-relational database) contains unstructured or mixed information. Being too awkward and messy to be leveraged for BI purposes, these huge data collections nevertheless find wide application in machine learning and data mining.
Whatever type you opt for, it can definitely add value to your business.
Guaranteed software project success with a free 30-minute strategy session!
Enterprise data warehouse benefits
Our experience in implementing EDW projects for various organizations shows that this technology brings a number of perks to the end users.
- A reliable single storage place. Even if your data is strewn across far-sundered regions and several providers, the enterprise warehouse behaves as if it were a unified repository. Moreover, the stored data is futureproofed and never meant for deletion, so you can be sure that once it gets there, the dossier won’t be lost.
- 24×7 data access. The data is made available for any authorized viewer in real-time mode. This access is significantly facilitated by the ETL (extract-transform-load) approach adopted by contemporary warehouse providers. Instead of preliminary data processing before uploading it to the warehouse (which was practiced previously), now the unchanged data is directly transferred to the data depot enabling swifter access and analysis.
- Convenient data arrangement. The data model structuring of the information that segments it into thematic chunks (with metadata referring to the source of information) enables chapter-and-verse orientation among the plethora of data kept in the warehouse. Plus, it is divided according to the time of its origin, which makes observing tendencies a breeze.
- Foolproof operationality. Numerous cases of employing this technology testify to the fact that various industries can benefit from EDW, healthcare, sports, insurance, and transportation included. Thus, even if your employees are far removed from the IT realm, making use of enterprise data warehouse services will present no difficulty to them.
- Tracking opportunities. State-of-the-art EDWs enable monitoring, auditing, and vetting data sources to find and dispose of errors. Also, keeping compliance with data protection regulations is smooth without the necessity to examine multiple data locations.
- Presenting a 3D image of the customer. Enterprise data warehouses are meant to furnish a comprehensive picture of the company’s clients and their buying behavior. This essential knowledge will help the organization to perform predictive analysis and tweak or even overhaul its business strategy and ultimately enhance revenues.
All these benefits stem from the peculiar mode of organization EDW has.
Related article: 10 tips to implement a DWH for bank in 9 months
Enterprise data warehouse architecture and its components
There are five basic elements of EDW architecture.
- Data sources. The system draws upon financial and banking apps, IoT gadgets, organizational software (ERP and CRM), and all mobile and online resources that provide raw information.
- The warehouse per se. In its ecosystem, data is aggregated and transformed (standardized and dimensionalized).
- Meta-data unit. Controlled by a metadata manager, this module is a separate component within the EDW. It contains explanations on what a piece of data relates to. All this “data about data” is divided into technical (for instance, information on the initial source) and business (like sales region).
- A collection of APIs. These are data tool integrations that enable data manipulation. Today, the outdated ETL model with tools handling the data before it reaches the warehouse is increasingly replaced by the ETL scheme where data handling occurs within the warehouse.
- UI. It is viewed as a presentation space where a user can access the data leveraging analytical tools and then share it with other stakeholders.
These components of the data pipeline can be arranged differently, which conditions the type of EDW architecture.
It is the bread-and-butter model in which the database has a direct connection to the UI where users make their queries. Being theoretically foolproof, this scheme is subject to a number of issues when operated practically.
First of all, working with volumes of data over 100GB slows down processing speed and yields confused query results as often as not. Secondly, such a system can sieve away non-required data only if the input query is precise, so UI wielding requires additional care. Finally, this data warehouse configuration has restricted flexibility, with analytical capabilities and report complexity being significantly hamstrung if not handicapped.
In view of its sluggishness and volatility, one-tier warehouse architecture is seldom employed for handling large amounts of data and performing advanced queries.
Here, between the UI and the database a data mart level is inserted where the data is distributed according to the sphere it refers to. These middle-level instances serve as a database extension and considerably streamline data access since every mart contains only domain-specific dossiers. Moreover, in such a model data availability can be restricted to certain employees or departments only, which enhances the security of the entire EDW.
Yet, solving the querying problem doesn’t come without a price. Opting for this model, the organization has to allocate resources both for additional hardware and for the integration of data marts with the core platform.
Both types of EDW mentioned above practice two-dimensional data representation, similar to what Excel or Google Sheets do. The three-tier warehouse design additionally utilizes online analytical processing (OLAP) cubes with their multi-dimensional data representation model. It looks like a combination of a number of Excel tables and exceedingly facilitates data processing and analyzing since users can move between the dimensions at will. Being very flexible, this architecture is susceptible to fine-tuning, so the cubes can be configured to provide access to the whole warehouse data or each mart individually.
OLAP model proved to be very efficient for data processing and analyzing, so most data warehouse providers include it into their set of services.
Enterprise data warehouse examples
Traditionally, organizations have relied on on-premises EDWs that were located on the hardware belonging to the company and thus simplified the work and responsibilities of data engineers. However, this approach involves exorbitant expenditures – both for the technological and physical infrastructure and for a team of experts who set up, maintain, operate and provide security to the system.
Realizing the inadequacy of this practice, the majority of businesses today have switched to cloud-based EDWs. What makes these data warehouses superior to the legacy on-premises ones?
- Consistent availability. If an internally-based EDW goes down, all the work is halted until it is put into operation again. It never happens with cloud-powered warehouses since their facilities are distributed across several data hubs. In case one of them experiences issues, other data centers step into the breach unbeknownst to users.
- Scalability. The growth of an organization spells the need for additional on-premises data warehouse facilities and its shrinkage leaves a lot of valuable machinery unused. Cloud EDWs allow for prompt downward or (typically) upward scalability of the warehouse capacity to meet the shifting purposes of the customer.
- Cost-saving. Typically, clients pay for renting EDW cloud facilities with software update payments included in the subscription fee. Thus, companies never splurge on purchasing and maintaining costly equipment (which was often bought with a room to grow into and therefore a part of it remained unused for quite a time), nor on keeping a whole bunch of IT specialists on the regular payroll.
- Security. Cloud EDW providers take professional care of security measures and keep abreast of compliance standards, which takes one more bother off the customer’s hands.
What are the most popular cloud EDW providers?
It is the cloud data warehouse with the longest history dating back to 2013 which, together with the high-quality of services and familiar PostgreSQL-based syntax, accounts for its leading position as to the number of deployments. Its column-oriented storage facilities are split into clusters of nodes, each having its own CPU, RAM, and storage room. As a client’s needs evolve, they can hire another node to make use of.
A definite forte of this warehouse is its seamless integration with a plethora of tools and platforms – both by the same vendor (like Amazon QuickSight) and by a number of third-parties (Periscope Data, Tableau, and IBM Cognos, to mention a few).
Unlike Amazon Redshift, the architecture of this EDW allows for separate upscaling of computing and storage facilities. Such a model makes it more flexible and user-friendly, allowing users to focus on either enhancing the speed of data processing or the augmentation of storage space. Plus, it can be set up on top of either Microsoft Azure or AWS infrastructure, which significantly broadens its functional capacities.
Azure SQL Data Warehouse
Being a Microsoft product, this ecosystem makes use of the vendor’s relational cloud database (Azure SQL Database) and enables access to and support of a whole gamut of Microsoft-powered technologies and tools.
Like other cloud databases, this EDW has an MPP architecture enabling it to draw data not only from numerous databases but from a variety of SaaS platforms as well. What lets this data warehouse stand out among the competitors is a rich scope of pricing options for customers to choose from in accordance with their unique needs.
Launched by Google, this EDW works well with its technologies (like MapReduce and Google Cloud Storage). A fine sample of serverless architecture, it performs continuous and dynamic computational and storage provisioning without users observing details of the allocation process. Alas, it falls short of products of other vendors in terms of tunability, but its highly scalable billing-per-query pricing plan dovetails exactly into what users need and pay for.
As you see, the range of available options is rather wide, so making the right choice may be a challenge.
How to choose the right enterprise data warehouse solution?
There are several crucial questions you have to ask yourself when selecting an EDW solution for your organization.
- What types of data are going to be stored in the warehouse?
If it is structured data that can be represented by rows and columns of a spreadsheet (like inventories or user data), a relational database is what the doctor ordered for your company. If the data to be stored consists of emails, videos, podcasts, pictures, texts, and other semi-structured information, a non-relational database suits it better. With totally unstructured data, you should go for a data lake instead of the conventional EDW.
- How much storage space do I need?
If the data exceeds 2 terabytes and is likely to accrue, think non-relational warehouse. If your data is smaller in size but requires advanced analytics, a relational data warehouse will cost you less.
Also, the method of scaling matters greatly. Some solutions (like Snowflake) perform scaling automatically while others (for example, Amazon Redshift) require manual addition of nodes.
- How fast do I need the queried data?
If you are more after analytics than immediate response, speed doesn’t matter that much. But if every little change in the data is of the moment for your business goals, your choice of EDW is conditioned by (guess what?) the previous parameter. The scale and the performance have a direct correlation, so a bigger scale spells a greater speed. In case both scale and analytics are what you look for in EDW, a wise tradeoff between these two indices must be sought.
- What will I be paying for?
Providers take various criteria as a unit of payment calculation. Thus, you may have to pay for the amount of data you store, the total size of storage facilities you rent, the period of service, or the number of queries you run. So you should base your choice on the type of activities you will perform most while working with EDW and find the solution with the lowest prices for this very activity.
- How much do I want to participate in the warehouse maintenance?
If you are a startup with a small staff you are sure to want a self-optimizing data warehouse and free your employees from the headache of managing the facilities you rent. Yet, manual maintenance of the warehouse pipeline will enable you to tailor it to your specific needs and thus have greater control over its flexibility, performance, and your expenditures.
- Can I still use my regular tools working with EDW?
Every entrepreneur would like to spend minimal time and money on adopting new technologies and tools and training personnel to use them. So engaging the services of an EDW provider make sure your conventional tools work well within the database and your current software needs as little customization as possible.
Let’s hope these guidelines were instrumental in your choosing an enterprise data warehouse that suits you to a tee. Now you have to implement it in your organization.
Implementing enterprise data warehouse: An algorithm to follow
While executing enterprise data warehouse projects, we at DICEUS have developed a universal roadmap that streamlines and facilitates the implementation process tremendously.
Stage 1. Defining business goals
The ultimate purpose of a data warehouse is to offer stakeholders accurate and timely information on the company’s performance to take steps to improve it. By interviewing the organization’s decision-makers we determine what data is crucial for them to enter into the warehouse to be analyzed and acted upon.
Stage 2. Collecting requirements for implementation
We gather all requirements for hardware, software, testing, analysis, reporting, and employee training that condition the successful implementation of the project. This stage also includes determining data backup strategy and failure recovery plan.
Stage 3. Data modeling
At this stage, we identify data sources and define the way data structures in the warehouse will be stored, processed, and made available. This model serves as a wireframe for developing logical and physical data structures determined by the established requirements.
Stage 4. OLAP or tabular design
Now we can start building tables or OLAP cubes. The latter model is more elaborate and takes longer to implement since it includes numerical values (aka grouping measures), dimensions (like geographical areas, time units), and data granulation. A matter of our special care is making sure OLAP cubes are promptly updated as soon as the same procedure is applied to the warehouse.
Stage 5. Front-end development
Next comes the choice of the form of sharing data stored in the warehouse that must be accessed via any device, be it a desktop or a smartphone. To do that, the experts of DICEUS choose a front-end tool that fits with the customer’s requirements and business goals. The two key elements of the front-end that we get as the output are reports and dashboard features.
Stage 6. Testing
Once the data warehouse is ready, it must be tested. Thus, we check the quality of operation, reveal possible problems, and address them before the actual deployment is implemented.
Stage 7. Deployment
At this stage, the completed enterprise data warehouse is launched and the company’s employees are taught to operate it. Adequate training is important since, if the end users fail to learn its ins and outs, all the previous effort may be rendered useless.
Guaranteed software project success with a free 30-minute strategy session!
In the data-driven world of the 21st century, master of information is master of situation. By pooling critical information in a single enterprise data warehouse and having constant access to it, companies can analyze trends, develop meaningful insights, and plan their professional activities correspondingly. DICEUS can be your reliable partner in implementing top-notch EDW projects that will propel your business on the way to a financial bonanza.