ETL process development and data modeling for REsurety

Project overview
As part of its cooperation with DICEUS, REsurety wants to modernize an existing system for downloading various constantly updated energy data sets. The collaboration with REsurety started in June 2021. Since then, we have successfully completed a pilot project and several larger projects related to a data warehouse. Our Python data engineer demonstrated his expertise by joining the project to build a new functionality module for REsurety. He is now working on data modeling and ETL process development for data analytics purposes.
Client information
REsurety is a leading analytics company supporting the clean energy economy. Operating at the intersection of weather, power markets, and financial modeling, it empowers the industry’s key decision-makers with best-in-class value and risk intelligence, and the tools to act on it.
Business challenge
The client’s fundamental need is to extract large data sets from various data sources. The data must be structured and prepared so that data science analysts can process it further, for risk management and project portfolio purposes, and work effectively on customer-related projects. The data is updated quickly (some sets can change every minute). For terabytes of data to reach the required destinations efficiently, and in the format the data science team needs, a data pipeline has to be created for each new energy data source in the appropriate format. For example, a wind energy company is treated as a single data source that needs its data delivered in a specific required format.
Technical challenges
Developing the ETL process is a challenging task. The key technical challenges our team faced were unstable data sources, the many variations in how the data sets have to be loaded, the continuous data flow, and the speed and frequency of data updates. In addition, data analysts sometimes change their requirements and requests, which can slow the process down.
Solution delivered
Our developer designed, and continues to extend, the ETL processes from scratch for each database and data set, building them on a set of shared classes and methods. The project’s end goal is to deliver a software product running on AWS that gathers data from a given data source and outputs it to Snowflake tables.
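As an illustration only, not the project’s actual code, a pipeline built around shared base classes might look roughly like the sketch below. The class names, table name, and sample URL are hypothetical, and the Snowflake connection details are placeholders.

from abc import ABC, abstractmethod

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas


class BaseEtlPipeline(ABC):
    """Shared skeleton for every data source: extract, transform, load to Snowflake."""

    target_table: str  # name of the Snowflake table this pipeline writes to

    def __init__(self, connection_params: dict):
        self.connection_params = connection_params

    @abstractmethod
    def extract(self) -> pd.DataFrame:
        """Pull the raw data set from the external source."""

    def transform(self, raw: pd.DataFrame) -> pd.DataFrame:
        """Reshape the raw data into the format the analysts expect (override per source)."""
        return raw

    def load(self, df: pd.DataFrame) -> None:
        """Write the prepared frame into the target Snowflake table."""
        conn = snowflake.connector.connect(**self.connection_params)
        try:
            write_pandas(conn, df, self.target_table)
        finally:
            conn.close()

    def run(self) -> None:
        self.load(self.transform(self.extract()))


class WindFarmPricesPipeline(BaseEtlPipeline):
    """Hypothetical pipeline for a single energy data provider."""

    target_table = "WIND_FARM_PRICES"

    def extract(self) -> pd.DataFrame:
        # Placeholder: a real pipeline would call the provider's API or read from S3.
        return pd.read_csv("https://example.com/wind-farm-prices.csv")

A pipeline structured this way can then be scheduled on AWS so that each new data source only needs its own extract and transform logic, while loading into Snowflake stays shared.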

Key features
Structured and consistent data
With effective ETL pipelines, data scientists have access to structured data in the convenient format required by each customer-related project.
Data formats suitable for data science
In line with the data analysts’ requirements, data is extracted, processed, and uploaded to the respective destination in a convenient format.
Continuous data flow
Our team ensured appropriate management of the continuous data flow so that the needed data reaches the Snowflake tables on time.
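As a rough sketch only, where the polling interval, pipeline registry, and error handling are assumptions rather than the project’s actual scheduling setup, continuous ingestion can be approximated by re-running each registered pipeline on a fixed interval:

import time

# Hypothetical registry of pipelines built on the base class sketched above.
PIPELINES = [
    WindFarmPricesPipeline(
        connection_params={"account": "...", "user": "...", "password": "..."}
    ),
]

POLL_INTERVAL_SECONDS = 60  # some of the sources update roughly every minute


def run_forever() -> None:
    """Re-run every registered pipeline on a fixed interval, keeping the loop alive on failures."""
    while True:
        for pipeline in PIPELINES:
            try:
                pipeline.run()
            except Exception as exc:  # an unstable source should not stop the other pipelines
                print(f"{type(pipeline).__name__} failed: {exc}")
        time.sleep(POLL_INTERVAL_SECONDS)

In production, this loop would more likely be replaced by an AWS scheduler triggering each pipeline than by a single long-running process.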

Technologies
SQL
AWS