ETL process development and data modeling for REsurety

Project overview
As part of its cooperation with DICEUS, REsurety wants to modernize an existing system for downloading various constantly updated energy data sets. The collaboration with REsurety started in June 2021. Since then, we have successfully completed a pilot project and several larger projects related to a data warehouse. Our Python data engineer demonstrated his expertise by joining the project to build a new functionality module for REsurety. He is now working on data modeling and ETL process development for data analytics purposes.
Client information
REsurety is a leading analytics company supporting the clean energy economy. Operating at the intersection of weather, power markets, and financial modeling, it empowers the industry’s key decision-makers with best-in-class value and risk intelligence, and the tools to act on it.
Business challenge
The client’s fundamental need is to extract large data sets from various data sources. The data must be structured and prepared so that data science analysts can process it further, for risk management and project portfolio purposes, and work effectively on customer-related projects. The data is updated quickly (some sets can change every minute). For terabytes of data to reach the required destinations efficiently, and in the format the data science team needs, a data pipeline has to be created for each new energy data source in the appropriate format. For example, a wind energy company is treated as a single data source that needs its data delivered in a specific required format.
Technical challenges
Developing the ETL process is a challenging task. The key technical challenges our team faced were unstable data sources, the many variations in how the data sets have to be loaded, the continuous data flow, and the speed and frequency of data updates. In addition, data analysts sometimes change their requirements and requests, which can slow the process down.
Solution delivered
Our developer designed, and continues to extend, the ETL processes from scratch for each database and data set, building them on a set of shared classes and methods. The project’s end goal is to deliver a software product running on AWS that gathers data from a given data source and outputs it to Snowflake tables.
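As an illustration only, not the project’s actual code, a pipeline built around shared base classes might look roughly like the sketch below. The class names, table name, and sample URL are hypothetical, and the Snowflake connection details are placeholders.

from abc import ABC, abstractmethod

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas


class BaseEtlPipeline(ABC):
    """Shared skeleton for every data source: extract, transform, load to Snowflake."""

    target_table: str  # name of the Snowflake table this pipeline writes to

    def __init__(self, connection_params: dict):
        self.connection_params = connection_params

    @abstractmethod
    def extract(self) -> pd.DataFrame:
        """Pull the raw data set from the external source."""

    def transform(self, raw: pd.DataFrame) -> pd.DataFrame:
        """Reshape the raw data into the format the analysts expect (override per source)."""
        return raw

    def load(self, df: pd.DataFrame) -> None:
        """Write the prepared frame into the target Snowflake table."""
        conn = snowflake.connector.connect(**self.connection_params)
        try:
            write_pandas(conn, df, self.target_table)
        finally:
            conn.close()

    def run(self) -> None:
        self.load(self.transform(self.extract()))


class WindFarmPricesPipeline(BaseEtlPipeline):
    """Hypothetical pipeline for a single energy data provider."""

    target_table = "WIND_FARM_PRICES"

    def extract(self) -> pd.DataFrame:
        # Placeholder: a real pipeline would call the provider's API or read from S3.
        return pd.read_csv("https://example.com/wind-farm-prices.csv")

A pipeline structured this way can then be scheduled on AWS so that each new data source only needs its own extract and transform logic, while loading into Snowflake stays shared.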

Key features
Structured and consistent data
With effective ETL pipelines, data scientists have access to structured data in the convenient format required by each customer-related project.
Data formats suitable for data science
In line with the data analysts’ requirements, data is extracted, processed, and uploaded to the respective destination in a convenient format.
Continuous data flow
Our team ensured appropriate management of the continuous data flow so that the needed data reaches the Snowflake tables on time.
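As a rough sketch only, where the polling interval, pipeline registry, and error handling are assumptions rather than the project’s actual scheduling setup, continuous ingestion can be approximated by re-running each registered pipeline on a fixed interval:

import time

# Hypothetical registry of pipelines built on the base class sketched above.
PIPELINES = [
    WindFarmPricesPipeline(
        connection_params={"account": "...", "user": "...", "password": "..."}
    ),
]

POLL_INTERVAL_SECONDS = 60  # some of the sources update roughly every minute


def run_forever() -> None:
    """Re-run every registered pipeline on a fixed interval, keeping the loop alive on failures."""
    while True:
        for pipeline in PIPELINES:
            try:
                pipeline.run()
            except Exception as exc:  # an unstable source should not stop the other pipelines
                print(f"{type(pipeline).__name__} failed: {exc}")
        time.sleep(POLL_INTERVAL_SECONDS)

In production, this loop would more likely be replaced by an AWS scheduler triggering each pipeline than by a single long-running process.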

Technologies
SQL
AWS