Automation2023

Automated Report Generation System

Developed a backend-driven analytics and automated report generation platform with structured outputs and real-time data processing.

Year

2023

Category

Automation

Stack

NestJS, Flutter, PostgreSQL

Automated Report Generation System

Key Results

4 min

Per Report

down from 6–8 hrs

Report Volume

no added headcount

180+

Monthly Reports

fully automated

0

Format Errors

since launch

01

The Challenge

A healthcare consultancy was producing 50+ client reports per month entirely by hand — pulling data from three incompatible systems, formatting in Word, and emailing PDFs. Each report took 6–8 hours. The team couldn't take on new clients without hiring, but hiring wasn't in the budget.

02

What We Built

A NestJS backend with BullMQ job queues for async report generation, a PostgreSQL warehouse normalising inputs from all three source systems, and a WeasyPrint PDF renderer driven by HTML templates. A Flutter app lets account managers trigger and track reports from their phones.

03

Results

Report generation time dropped from 6–8 hours to 4 minutes. Monthly report volume tripled without adding headcount. Formatting errors fell to zero. The system now generates over 180 reports per month automatically.

04

Before & After

Before

Analysts manually exported data from three incompatible systems every report cycle

Each report took 6–8 hours of senior analyst time to compile and format

Word document formatting was inconsistent — every report looked slightly different

Reports were emailed manually — no tracking, no audit trail, no self-serve

After

Scheduled ETL jobs pull, normalise, and warehouse data from all three sources automatically

Reports generated in under 4 minutes via an async job queue

HTML templates with WeasyPrint produce pixel-perfect, brand-consistent PDFs every time

Clients receive a secure download link automatically. Account managers track delivery in the Flutter app

05

How We Built It

We started with data before UI — until the ETL pipeline and warehouse schema were solid, building a report generator on unreliable data would have been wasted effort.

1

Data audit & warehouse design

Audited all three source systems, mapped their schemas, and designed a normalised PostgreSQL warehouse. Built ETL jobs in NestJS scheduled tasks that run nightly and on-demand.

2

Report template system

Designed HTML/CSS report templates with the client's brand team. Built the NestJS template engine that injects warehouse data into templates and passes the result to WeasyPrint for PDF rendering.

3

BullMQ job queue & delivery

Wrapped report generation in a BullMQ job so the API returns immediately with a job ID. Jobs run in workers, upload the finished PDF to S3, and notify the client with a signed download URL.

4

Flutter management app

Built the Flutter app for triggering ad-hoc reports, monitoring job status via WebSocket, configuring delivery schedules, and viewing report history with download links.

06

System Architecture

A NestJS API handles requests. BullMQ workers do the heavy lifting asynchronously. WeasyPrint runs as a sidecar process. Generated PDFs go to S3 — the app only ever touches signed URLs.

1

ETL Pipeline

NestJS Scheduled Tasks

Three NestJS services, one per source system, run on a nightly cron. Each pulls incremental data, normalises to the warehouse schema, and upserts to PostgreSQL.

2

Data Warehouse

PostgreSQL

Normalised schema as the single source of truth. Materialized views pre-compute expensive aggregations so report generation runs fast SELECTs, not heavy joins.

3

Report API

NestJS REST

Accepts report trigger requests, validates parameters, enqueues a BullMQ job, and returns a job ID immediately. Exposes job status and history endpoints.

4

Job Queue

BullMQ + Redis

Workers pick up report jobs, query the warehouse, render the HTML template, call WeasyPrint, and upload the PDF to S3. Redis stores job state for real-time status updates.

5

PDF Renderer

WeasyPrint

Runs as a Python sidecar called via child_process from NestJS. Converts fully-rendered HTML/CSS to PDF with pixel-perfect fidelity to designed templates.

6

Flutter App

Flutter + Riverpod + WebSocket

Account managers trigger reports, monitor job progress via WebSocket subscription, and download finished PDFs. Riverpod manages real-time job state.

07

Tech Stack

Backend

NestJSTypeScriptNode.js

Queue

BullMQRedis

Mobile

FlutterDartRiverpod

Database

PostgreSQL

PDF

WeasyPrintPython

Storage

AWS S3

08

How We Approached the Problem

The core insight was that this problem was 90% data plumbing and 10% PDF generation. Most teams would reach for a reporting tool. We went the other way — warehouse the data first, then report generation becomes a simple template render against clean, reliable data.

Alternatives considered & rejected

Off-the-shelf reporting tool (Metabase / Looker)

These tools are excellent for internal dashboards but poor for client-facing branded PDFs. The client needed reports that looked like they came from their design team.

Puppeteer for PDF generation

Puppeteer added a full Chromium dependency. WeasyPrint produces equivalent output from pure HTML/CSS with a much smaller footprint and is easier to run in Docker.

Direct DB connections to source systems

Two source systems were SaaS products with rate-limited APIs. The ETL-to-warehouse approach decoupled us from source system availability and rate limits entirely.

09

Data Modelling

The warehouse schema is deliberately denormalised for read performance. Materialized views pre-aggregate KPIs so the template render is a simple SELECT, not a multi-join aggregation.

10

API Layer

ReportService enqueues BullMQ jobs and returns immediately. The ReportProcessor runs in a worker — queries the warehouse, renders the template, generates the PDF, uploads to S3, and updates the job record.

11

Database Functions

Refresh the materialized view after each ETL run and fetch report history with a single query for the Flutter history screen.

12

Frontend Connection

The Flutter app triggers jobs via REST and tracks real-time progress via WebSocket. A Riverpod notifier subscribes to the WebSocket gateway and rebuilds the UI as job state changes — no manual refresh needed.

13

Lessons Learned

Warehouse first, reporting second

Starting with a clean warehouse meant report generation was a 2-day task once the data was right. The ETL work took 3 weeks — that ratio was exactly correct and paid dividends immediately.

Materialized views are underused

Every expensive aggregation timed out in production against real data volumes. Moving them to materialized views — refreshed after each ETL run — dropped report query time from 45 seconds to under 200ms.

WeasyPrint needs explicit page break rules

Tables split across pages at awkward points. Adding CSS page-break-inside: avoid on rows and page-break-before: always on section headers gave clean pagination without custom logic.

BullMQ progress events need throttling

The processor emitted a progress event on every DB row processed, flooding the WebSocket gateway. Throttling to once per second reduced WebSocket traffic by 95% with no noticeable UX difference.

Start a project

Have something similar in mind?