
Sentia Solutions

I began working at Sentia in January 2023 as a co-op student, an eight-month placement (ending in August 2023) that was the final co-op term of my undergraduate degree. Directly afterwards, during the last eight months of my degree, I continued with the company as a part-time employee. After graduation, I was welcomed into a permanent full-time position, which I held from June 2024 to December 2024. The following is a major project I worked on: parts of it began during my co-op term, and the more complex elements were finished during my time as a permanent full-time employee.

Building a Centralized Log Management System: A Deep Dive into Scalability, Automation, and Reliability

At Sentia, I led the design and implementation of a centralized log management system that aggregated and processed logs from a wide variety of customer devices, each generating logs in different formats and at different frequencies, which presented unique challenges for data ingestion. The project was a culmination of my studies in data engineering and software architecture: I implemented a medallion architecture and built the pipelines with Azure Databricks, PySpark, and Azure Data Factory. The result transformed how we monitored customer infrastructure.


The Problem: Siloed Logs and Delayed Insights

Our customers operated in distributed cloud environments, generating gigabytes of logs daily from switches, servers, storage devices, virtual machines (VMs), and other network appliances. These logs were critical for monitoring system health and detecting security breaches. However, they were stored in siloed systems across multiple cloud providers (e.g., AWS, Azure, GCP) and on-premises servers. Retrieving and analyzing this data was a manual, time-consuming process, often taking days. This delay hindered incident response and posed significant risks to our service.

The challenge was clear: How could we consolidate, process, and analyze logs from 50+ diverse environments in near real time, while ensuring scalability, reliability, and cost efficiency?


The Solution: A Scalable, Automated Log Management System

1. Data Ingestion: Bridging Diverse Environments with Azure Data Factory

The first step was to build a robust data ingestion pipeline that could collect logs from 50+ customer environments, each with its own format, protocol, and storage system. I chose Azure Data Factory (ADF) for this task due to its flexibility and scalability.
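
As a rough sketch of how such a pipeline can be driven per environment (the subscription, resource group, factory, pipeline, and parameter names below are hypothetical, not the ones actually used), a parameterized ADF ingestion pipeline can be started for each customer from Python with the azure-mgmt-datafactory SDK:

```python
# Hypothetical sketch: starting one run of a parameterized ADF ingestion
# pipeline per customer environment. All names here are illustrative.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Each entry describes where one customer's logs live and what format they arrive in.
customer_environments = [
    {"name": "customer-a", "source": "aws-s3", "log_format": "syslog"},
    {"name": "customer-b", "source": "azure-blob", "log_format": "json"},
    {"name": "customer-c", "source": "on-prem-sftp", "log_format": "csv"},
]

for env in customer_environments:
    run = adf_client.pipelines.create_run(
        resource_group_name="rg-log-platform",
        factory_name="adf-log-ingestion",
        pipeline_name="pl_ingest_customer_logs",
        parameters={
            "customerName": env["name"],
            "sourceSystem": env["source"],
            "logFormat": env["log_format"],
        },
    )
    print(f"Started ingestion run {run.run_id} for {env['name']}")
```

One advantage of this pattern is that per-customer differences live in pipeline parameters rather than in separate pipelines, so onboarding a new environment becomes a configuration change rather than new development.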

2. Data Processing: Transforming Raw Logs with Azure Databricks and PySpark

Once the logs were ingested, the next challenge was to process and transform them into a structured format suitable for analysis. This is where Azure Databricks and PySpark came into play.
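
To make the transformation concrete, here is a minimal sketch of the kind of parsing job this involves, assuming syslog-style lines; the regex, storage paths, and column names are illustrative rather than the production definitions:

```python
# Minimal sketch of parsing raw text logs into structured records.
# Paths, the regex, and column names are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-parsing-sketch").getOrCreate()

# Raw log lines are read as plain text from the landing (bronze) storage.
raw = spark.read.text("abfss://bronze@<storage-account>.dfs.core.windows.net/logs/*/*.log")

# Example line: "2024-03-15T08:12:01Z switch-03 ERROR Port 12 flapping detected"
pattern = r"^(\S+)\s+(\S+)\s+(\w+)\s+(.*)$"

parsed = (
    raw.select(
        F.regexp_extract("value", pattern, 1).alias("timestamp_raw"),
        F.regexp_extract("value", pattern, 2).alias("device"),
        F.regexp_extract("value", pattern, 3).alias("level"),
        F.regexp_extract("value", pattern, 4).alias("message"),
    )
    .withColumn("event_time", F.to_timestamp("timestamp_raw"))
    .withColumn("event_date", F.to_date("event_time"))
    .filter(F.col("event_time").isNotNull())  # drop lines that failed to parse
)

# Structured records are written out as Delta, partitioned by date for cheap pruning.
(
    parsed.write.format("delta")
    .mode("append")
    .partitionBy("event_date")
    .save("abfss://silver@<storage-account>.dfs.core.windows.net/logs/")
)
```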

3. Storage: Optimizing Costs and Performance with Medallion Architecture

To store the processed logs, I designed a medallion architecture, a multi-layered data storage pattern that balances cost, performance, and accessibility: a bronze layer holds the raw logs as ingested, a silver layer holds the cleaned and structured records, and a gold layer holds aggregated, analysis-ready tables.
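
As an illustration of how the silver layer rolls up into gold (the paths, table name, and metric definitions below are placeholders, not the production ones), a daily summary per device can be precomputed so that downstream reporting queries hit a small table instead of scanning the full silver layer:

```python
# Sketch of a silver-to-gold rollup in the medallion layout.
# Storage paths, table layout, and metric definitions are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gold-rollup-sketch").getOrCreate()

silver_path = "abfss://silver@<storage-account>.dfs.core.windows.net/logs/"
gold_path = "abfss://gold@<storage-account>.dfs.core.windows.net/device_daily_summary/"

silver = spark.read.format("delta").load(silver_path)

# Gold layer: compact daily aggregates that dashboards can query directly
# instead of scanning the much larger silver tables.
daily_summary = (
    silver.groupBy("event_date", "device")
    .agg(
        F.count("*").alias("total_events"),
        F.sum(F.when(F.col("level") == "ERROR", 1).otherwise(0)).alias("error_events"),
    )
    .withColumn("error_rate", F.col("error_events") / F.col("total_events"))
)

(
    daily_summary.write.format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .save(gold_path)
)
```

Because the gold tables stay small, reporting queries remain fast even as the raw bronze data grows.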

4. Real-Time Monitoring and Alerts: Power BI and Azure Logic Apps

To provide customers with real-time insights, I built a monitoring dashboard using Power BI. The dashboard visualized key metrics like server uptime, error rates, and security compliance status.
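
As a purely illustrative sketch of how alerting can sit alongside the dashboard (the Logic App URL, threshold, and payload shape below are hypothetical, and this is not necessarily how the production alerts were wired), a scheduled job could check the gold aggregates and post threshold breaches to an Azure Logic App exposed through an HTTP request trigger, which then fans out to email or Teams:

```python
# Hypothetical alerting sketch: read gold aggregates and notify a Logic App
# (HTTP request trigger) when today's error rate crosses a threshold.
import requests
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("alerting-sketch").getOrCreate()

LOGIC_APP_URL = "<logic-app-http-trigger-url>"   # placeholder
ERROR_RATE_THRESHOLD = 0.05                      # illustrative threshold: 5% errors

gold = spark.read.format("delta").load(
    "abfss://gold@<storage-account>.dfs.core.windows.net/device_daily_summary/"
)

breaches = (
    gold.filter(F.col("event_date") == F.current_date())
    .filter(F.col("error_rate") > ERROR_RATE_THRESHOLD)
    .collect()
)

for row in breaches:
    # The Logic App receives the payload and handles routing (email, Teams, tickets).
    requests.post(
        LOGIC_APP_URL,
        json={
            "device": row["device"],
            "errorRate": round(row["error_rate"], 4),
            "date": str(row["event_date"]),
        },
        timeout=30,
    )
```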


Challenges and Root Cause Analysis

One of the most challenging aspects of the project was diagnosing and resolving performance bottlenecks in the data processing pipeline. For example, during load testing, we noticed that certain PySpark jobs were taking significantly longer to complete than expected.
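
As a generic illustration of that kind of investigation (not the specific root cause or fix in our pipeline), the usual first steps in PySpark are to read the physical plan and to check whether a few keys dominate the data, since a skewed key concentrates most of a shuffle's work on a handful of tasks:

```python
# Generic PySpark diagnosis sketch; dataframe, paths, and keys are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("slow-job-diagnosis-sketch").getOrCreate()

df = spark.read.format("delta").load(
    "abfss://silver@<storage-account>.dfs.core.windows.net/logs/"
)

# 1. Inspect the physical plan for unexpected full scans or extra shuffles.
df.groupBy("device").count().explain(mode="formatted")

# 2. Check for key skew: if a few devices account for most rows, the tasks
#    handling those keys become the stragglers that stretch out the whole job.
df.groupBy("device").count().orderBy(F.desc("count")).show(10)

# 3. One common mitigation is repartitioning on a composite, higher-cardinality
#    key before the heavy aggregation so work is spread evenly across tasks.
balanced = df.repartition(200, "device", "event_date")
```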

The medallion architecture (bronze, silver, and gold layers) was implemented to optimize storage costs and query performance. However, this approach also introduced challenges of its own.


Impact and Takeaways

The centralized log management system delivered transformative results.

This project was a testament to the power of data-driven decision-making and continuous optimization. It also reinforced the importance of root cause analysis in building reliable, high-performance systems.