We are happy to share with our readers the podcast with Khaled Gharaibeh, the Business Intelligence Director of a bank. DICEUS CEO Illia Pinchuk talked with Khaled about his IT and technological experience in the banking sector, particularly DWH development and implementation projects. This podcast is worth listening to for someone interested in data warehouses, data management in banks, new digital capabilities, and technologies. Enjoy listening to or reading the interview!
Illia: Thank you very much for joining this set of sessions with the financial experts.
Khaled: Thank you for hosting me on this podcast.
Illia: Please tell us a bit about your career in banking.
Khaled: I started my career in 1997 in web application development. In the 2000s, I worked on systems like ERP implementation, Java backend services implementation, and deployments. After that, I moved into mobile application development and went live with production and SMS gateways. Then, starting in 2013, I moved into banking and mobile telco data and the implementation of data engineering projects.
The projects I managed were polyglot persistence systems. Implementing such a system requires storing the data of a business entity or a bank in multiple versions, schemas, and engines with different representations. It was quite a big project, and we processed 1.5 billion records of data into these subsystems. After that, I applied the same concept to a different kind of financial system: not a bank, but a mobile telco.
I currently work in a local bank in Jordan, where I manage the whole data department, data engineering, data warehouse, data management systems, and business intelligence reporting.
Illia: You have pretty extensive experience. Since your journey is about data warehouses, can you tell me why it is important for a bank to have a data warehouse?
Khaled: Usually, a bank holds its customers’ non-financial information and their financial assets. The magical thing about economic systems is that there is behavior in the data, in the assets the customers hold: deposits, loans, and salary accounts. So, the bank has to know where it stands. Is it making money or not? As a bank, I need to see whether I am moving in the right direction over time. Did I perform better this month than last month? To get that knowledge, you need one source of truth where you persist this data. This is why financial institutions need data warehouses. That’s where the big value is. Now, how do you implement such a thing?
Illia: That is my next question. Can you name five phases or milestones of building a data warehouse for the bank?
Khaled: I’ll tell you five milestones from my experience. The first milestone is to understand the source systems. Where would you pull the data you need to build this data warehouse? What am I pulling data from? Am I pulling it from a banking system? Am I pulling it from a CRM system? Is it Salesforce? Is it JIRA? And the problem is not only identifying it. The big question is, “Can I trust the data in that system?”
Who adds the data? Who inserts the data in those systems? Is it inserted there correctly? Is it auditable or not? So that’s the first milestone, which is the most important one.
The second milestone is to find the golden source. For example, no single system runs the whole bank. So, the second milestone is to correctly select which fields or data points you want to use from each of these source systems.
The third one is to understand the speed of change. Is the data changing quickly? Why does this milestone matter? Because it’s very important for capacity planning and for the architecture of the data warehouse. I mean, how quickly does the balance of an account change? Is it changing daily, monthly, or hourly? And the KYC data, for example, the customer’s phone number: how often does it change? Do customers change their mobile numbers every year or once per month?
Based on that, you will design the data models to be used by a business team. So, for example, a risk team would like to have a data model for their business reporting that needs to have A, B, C, and D data points. And they want them captured daily or monthly.
Achieving that design is part of milestone number four. Milestone five is to lock down the implementation plan and the mechanics of how you would implement this data warehouse. Is it a waterfall? Is it agile? It’s a management milestone where you identify and confirm the implementation plan.
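To make the third milestone more concrete, here is a minimal Python sketch of how the speed of change could be estimated per field. It is only an illustration: the `change_log` structure, the field names, and the `average_daily_changes` helper are assumptions and do not come from the interview.

```python
from collections import Counter

# Hypothetical change log: one (field_name, change_date) pair per recorded change.
change_log = [
    ("balance", "2024-01-01"),
    ("balance", "2024-01-01"),
    ("balance", "2024-01-02"),
    ("balance", "2024-01-03"),
    ("phone_number", "2024-01-02"),
]


def average_daily_changes(log):
    """Return the average number of changes per observed day for each field."""
    days = {day for _, day in log}
    counts = Counter(field for field, _ in log)
    return {field: count / len(days) for field, count in counts.items()}


rates = average_daily_changes(change_log)
for field, rate in sorted(rates.items()):
    print(f"{field}: {rate:.2f} changes per day")
```

A field that changes many times a day, like a balance, would likely call for daily snapshots, while a slowly changing field, like a phone number, could be captured far less often.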
Illia: It’s like a manual for someone who wants to create one. Well, I know different types of architecture can be applied to building data warehouses. How do you decide on the right architecture?
Khaled: You’re asking how many tiers the data warehouse should have, how much data to hold from the source systems, and how to design the data pipeline. Well, for banks specifically, this is my own opinion.
Probably, others will disagree. It’s best to use either a two-tier or three-tier architecture. Data is mostly transactional, right? And you need a staging area to populate and load the data daily. Because we are a bank, right? And then you build on that staging area data, whatever you pull daily, with a transformation process where you crunch the data from the staging area and append it into your data warehouse. Facts and dimensions, entities, incrementally, daily.
You have to do that unless you would like to cut the whole data lineage, do it in one tier, and just push the data into the data models without a multi-step approach. But a multi-step approach, a two-tier or three-tier architecture, is also needed to reconcile your data while transforming it, over and over, automatically every day rather than manually. So, you need multiple tiers for the reconciliation and verification.
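The multi-tier flow described above (staging, transformation, reconciliation, incremental append) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the row layout, the in-memory `warehouse` list, and the function names are hypothetical, and a real bank would run the same steps against database tables.

```python
from decimal import Decimal

# Tier 1 (staging): hypothetical raw rows pulled from a source system for one day.
staging = [
    {"account_id": "A1", "amount": "100.50", "date": "2024-01-02"},
    {"account_id": "A2", "amount": "-20.00", "date": "2024-01-02"},
    {"account_id": "A3", "amount": "250.25", "date": "2024-01-02"},
]


def transform(rows):
    """Tier 2: convert raw staging rows into typed fact rows."""
    return [
        {
            "account_id": row["account_id"],
            "amount": Decimal(row["amount"]),
            "date": row["date"],
        }
        for row in rows
    ]


def reconcile(staging_rows, fact_rows):
    """Check that row counts and amount totals match between the two tiers."""
    staged_total = sum(Decimal(row["amount"]) for row in staging_rows)
    fact_total = sum(row["amount"] for row in fact_rows)
    return len(staging_rows) == len(fact_rows) and staged_total == fact_total


def load_daily(warehouse, staging_rows):
    """Transform, reconcile, then append the day's facts to the warehouse."""
    facts = transform(staging_rows)
    if not reconcile(staging_rows, facts):
        raise ValueError("reconciliation failed; daily load aborted")
    warehouse.extend(facts)  # incremental append, nothing is overwritten
    return len(facts)


warehouse = []  # stands in for the fact table of the data warehouse
loaded = load_daily(warehouse, staging)
print(f"loaded {loaded} rows")
```

Keeping the staging rows separate from the transformed facts is what makes the reconciliation step possible: there is always an untouched copy of the day's input to compare against.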
Illia: I think everyone has their own opinion, but yours, I believe, is a perfect approach to doing things. Can you please name the three most significant challenges of building a data warehouse?
Khaled: My biggest challenge was understanding the source systems. It was a tough journey. Knowing the source systems and how data is added and deleted will cover maybe more than 70% of your challenges. The most important thing about a system is why it is here. Why do I have a core banking system that manages deposits, for example, this way?
Someone, a vendor, or an implementation team built that workflow for a reason, and you can’t quickly get that reason simply out of the mouth of someone, right? That’s the hardest challenge. The second challenge was data loading. I mean, reconciliation. The automation of data loading. You don’t want to load the data manually, right? And you don’t want to wake up at 2:00 AM or maybe at 8:00 AM and discover that something broke while you were asleep. Right? So the challenge is to have the automated data loading process fix itself on the fly without waking me up.
That’s business checking. What about disk space and constraint violations on primary and foreign keys? What if someone, during the day, entered an alphabetic value in a field that should hold numeric values such as 1, 2, 3, and things broke? This is a challenge all data engineers face now and then.
The third challenge is to find the best stack to use. So, you have to pick the proper resources, hardware, software, and licenses to meet that KPI. I mean, the stack plus the people who will work on it, as a combo.
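The second challenge, where an alphabetic value lands in a field that should be numeric, is the kind of problem a pre-load check can catch so that one bad row does not break the whole nightly load. The following Python sketch is only an assumption about how such a check might look; the field names and helper functions are hypothetical.

```python
def is_integer(value):
    """Return True if the value can be parsed as an integer."""
    try:
        int(str(value).strip())
        return True
    except ValueError:
        return False


def split_valid_rows(rows, numeric_fields):
    """Separate loadable rows from rows with non-numeric values in numeric fields."""
    valid, rejected = [], []
    for row in rows:
        bad_fields = [f for f in numeric_fields if not is_integer(row.get(f))]
        if bad_fields:
            # Keep the row and the reason so someone can review it later.
            rejected.append({"row": row, "bad_fields": bad_fields})
        else:
            valid.append(row)
    return valid, rejected


# Hypothetical daily extract: branch_code should be numeric.
rows = [
    {"customer_id": "1001", "branch_code": "1"},
    {"customer_id": "1002", "branch_code": "alpha"},
    {"customer_id": "1003", "branch_code": "3"},
]

valid, rejected = split_valid_rows(rows, ["branch_code"])
print(f"{len(valid)} rows ready to load, {len(rejected)} quarantined")
```

Quarantining the rejected rows instead of failing the job lets the automated load finish overnight, with the bad records left for review in the morning.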
Illia: What are the pros and cons of deploying a data warehouse in the cloud versus on-premises?
Khaled: On the cloud, you can expand horizontally and vertically as flexibly as you want. You don’t worry about stretching your resources. That’s the only pro; everything else about the cloud is a con. I mean cost and monitoring: although things work well on the cloud, you are exposed to any downtime. On-prem has lots of cons too, like limited elasticity. It’s not easy to stretch your resources on premises. Your data center has its capacity, and you are sharing this data center with all the banks.
That’s not okay. It requires the data center to procure more virtualization, which introduces lots of management overhead, like DevOps or InfoSec.
So, that’s a problem. But the pros are that you hold the data in your house, and nobody can attack it. You store your data in your building, not on the cloud. That said, the cloud has its own security rules and governance, and they are becoming much more rigid.
Illia: Khaled, I would like to speak about Oracle. Many banks build DWHs using the Oracle database management system. Can you recommend an open-source database management system that can be used for building a DWH for a bank?
Khaled: My recommendation would not be too scientific because some people would disagree with it. I find PostgreSQL an exemplary database for data warehouse implementation. It has its problems, but, in general, Postgres is good.
Illia: What project management methodology is more suitable for such a project? You already mentioned agile and waterfall, but what is your personal opinion?
Khaled: I pushed to use the Agile practice. I treated the project as an experiment with the team. Let’s see how we can build our first small, newbie data warehouse. An initial one, like version 0.1, right? It involved lots of exploration, data reporting, and reporting on status. What did you find in the core banking system? What did you see in the KYC update database? Did you find a field that talks about mobile numbers or not? Where did you find the balances of the loans? Lots of questions with no answers. So, that’s why we chose the Agile approach.
After six months of iterations, we kept using an agile approach. But now we work mechanically, with a known delivery velocity, and we know what to do every sprint. It is somewhat like a waterfall approach, but a very short one.
It’s a two-week waterfall approach. So, I went agile, but you can also go waterfall if your customer is patient enough to wait a year for you to deliver a version of your data warehouse.
Illia: Worldwide regulations often change, meaning some changes should be made in core banking systems. How often do you make changes in your data marts and adapt those extract, transform, and load mechanisms to be compliant with the changes in the source systems?
Khaled: Our core banking system depends on configuration. It’s not implementation-based. When we introduce a new loan product, it’s a configuration matter in the core banking system. When a new product is introduced, it’s dynamically populated in our automated daily data load, which is one of the challenges you asked me about.
We wake up the next day and see the new service or product populated in the data warehouse. So we need a new data model for that specific new product.
We introduce new services and products every month or six weeks. So we either update or change an existing data model in the data warehouse or present a new data model for that particular service or product. So, for example, we introduced two data models in November and December.
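As a rough illustration of how a newly configured product could be flagged after the automated daily load, the Python sketch below compares the product codes seen in the load with the products that already have a data model. The product codes, the row layout, and the `find_unmodeled_products` helper are all hypothetical.

```python
def find_unmodeled_products(daily_rows, modeled_products):
    """Return product codes seen in the daily load that have no data model yet."""
    seen = {row["product_code"] for row in daily_rows}
    return sorted(seen - set(modeled_products))


# Hypothetical rows from the daily load; GREEN_LOAN was configured yesterday.
daily_rows = [
    {"product_code": "SAVINGS", "balance": 1200},
    {"product_code": "CAR_LOAN", "balance": -8000},
    {"product_code": "GREEN_LOAN", "balance": -3000},
]

# Products that already have a data model in the warehouse (assumed list).
modeled_products = ["SAVINGS", "CAR_LOAN"]

new_products = find_unmodeled_products(daily_rows, modeled_products)
print(new_products)
```

A check like this could run at the end of each load so the team learns about a new product from a report rather than by noticing it in the data.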
Illia: How much time does it take to implement the first version of the data warehouse for the bank?
Khaled: It’s a six-month period. We had version 0.1. It was not a very big data warehouse, yet the framework was there. The data loading was there.
Illia: Would you recommend any solution for visualizing the data?
Khaled: As a technologist, I’ll give you a sincere answer. Don’t use offline tools, such as Excel or the Power BI Desktop version, because you can’t track changes offline.
When you build things like charts and dashboards, you will probably write your own aggregation functions, so use a tool that works with version control. You have to track your changes. There are lots of excellent visualization tools now, and they all work fine.
Illia: What are the top three recommendations for someone who would like to build a DWH but has never done it before?
Khaled: I don’t want to make a cheesy statement, but everybody says it: start small, think big.
You have to start small and have a bigger scope in your mind, but you have to start with small data. Data is an important term. There’s lots of data, so you can’t absorb it in one day, right? Instead, you must zoom in on one small, specific, low-hanging-fruit success.
You must keep looking ahead and building on that small initiative, one step at a time. You can do that by running multiple short iterations, like sprints, in an agile approach where you can see your progress every two weeks.
The second recommendation: go with small-footprint software to run your data loading, and use simple tools like Jenkins. For example, don’t jump straight into an enterprise edition of Oracle.
Illia: You have been managing this project for five years. Can you name the biggest lessons learned from your experience?
Khaled: The biggest lesson I learned is understanding the deliverable’s real value. I’ll give you an example. Say we need to build a data model for salaries, a model that captures salary transfers for our customers. The lesson I learned is to ask what the true value of that model is. I mean the profit, and not necessarily the monetary profit, that we as a bank will get if this model goes live today or tomorrow and people use it.
I need to understand the true value of the deliverable before implementing it. Why? So that I don’t waste my time, your time, experience, and the bank’s money just because we want something excellent.
That’s the biggest lesson I learned during these five years, and I’m still learning it, because once people find out that we have rich data models, they become hungry for reports and analysis. They start asking for a lot of trends, analysis, and aggregations.
Illia: Can an external service provider like DICEUS build a data warehouse for a bank?
Khaled: You can select DICEUS as an external vendor, but you have to ensure the journey for both sides is in sync. You need to make them work as you want them to work in a scientifically correct way.
Many success stories have been delivered with external vendors. Therefore, I recommend partnering with external vendors to build the data warehouse.
Illia: Can you share with our audience what master data is?
Khaled: Master data is the critical data in the bank. The most important data is the person’s data. I mean customer information such as name, date of birth, and gender, as well as their accounts and balances. Another critical kind of data is the complexity of the values, for example, the known balance.
You have to know the master data in your bank. You have to know where they are stored. I mean, who stores them or manages them in the bank? This is master data.
Illia: In your opinion, where should master data be stored: in a data warehouse or in the core banking platform?
Khaled: It should be in a separate system. Let’s call it MDM, a Master Data Management system. But it should be highly integrable with the other systems, for example, the data warehouse, core banking, the business process management system, and CRM, because a data steward and a data committee will manage the master data in that system.
So the catalog, data complexity, and governance are all in a separate system. I don’t want to call it separate because it’s integrable, right? But other systems, like the data warehouse and the core banking system, should call the MDM to know what rules and controls to apply to that data.
Illia: Speaking about the fast-growing demand for the next-level customer experience in banks, what’s your opinion on digital banks like Revolut and similar?
Khaled: As we look at it now, digitally transformed banks and new banks are ruling the world, and so are lending applications. So the world will converge on digital banks. I am not saying that you will never go to a branch. But most of your transactions will happen online or through apps.
Much work has already been done to build trust in the approaches of digitally transformed banks. Even existing banks are going online. Going digital brings a lot of benefits because it’s a win-win for everybody.
If you cut the cost of the non-digital processes, you can use that money to lend to people at lower interest rates, because you lower your expenses when you go digital.
Illia: How do you see the online banking sector’s future, and what differentiators could help banks gain a competitive edge?
Khaled: It depends on one factor that would give banks a competitive edge over others: localization. Let’s say, for example, understanding the local market. A digital bank with 10 services in country A might not work with exactly those 10 services in country B; you’ll probably need only three services in country B.
So, if you can understand the demographics and the economy of that target country, that would differentiate you as a bank, especially if it’s a big bank.
Imagine that there’s a banking application that automatically shows features based on your IP address. The localization of that service is based on regulations, compliance, AML, and language.
The US business differs from Europe, the Middle East, Africa, and Southeast Asia, right? So, localization might be a differentiator.
Illia: Thank you very much for all your answers. It was a pleasure to see you. I’m sure that our audience will enjoy our discussion.