UCReview AI
AI pipeline for auditing university course sheets
Overview
During my Research Initiation Grant at FEUP, I built UCReview AI — a pipeline that automatically scrapes university course sheets, parses their structure, and uses LLMs to audit them for completeness and quality. The goal: make course information more transparent and comparable for students.
Stack
What I learned
- Designed a robust scraping pipeline handling malformed and inconsistent HTML
- Built a provider-agnostic AI factory (OpenAI, Anthropic, local models) for flexible LLM calls
- Engineered prompts for structured data extraction from messy academic documents
- Learned when AI adds genuine value vs. when regex or heuristics are just better
- Wrote a research report documenting methodology, results, and limitations
Build log
Struggles, findings, decisions, breakthroughs — the honest story.
Course sheets have zero consistency
Every faculty formats its course sheets differently: some are PDFs, some HTML, some Word documents exported to the web. The parser had to be fault-tolerant by design.
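A minimal sketch of what fault-tolerant HTML extraction can look like using only Python's standard library, whose `html.parser` keeps going on unclosed or malformed tags instead of raising. The class and function names here are illustrative, not the project's actual code:

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Collects text chunks from a page, tolerating unclosed/malformed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text; tag soup around it is simply ignored.
        text = data.strip()
        if text:
            self.chunks.append(text)

def extract_text(raw_html: str) -> list[str]:
    parser = FieldExtractor()
    parser.feed(raw_html)  # never raises on broken markup, unlike strict XML parsers
    return parser.chunks
```

The same dispatch idea extends to PDFs and Word exports: detect the content type first, then route to a format-specific extractor that degrades gracefully.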
Multi-provider factory pattern
Instead of hardcoding OpenAI, I built a factory that could swap providers. This saved the project when one provider had downtime during a critical testing phase.
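The factory idea can be sketched roughly like this: providers register themselves under a name, and callers ask the factory for one without knowing which SDK sits behind it. The `EchoProvider` stand-in and all names here are hypothetical, included so the sketch runs offline:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""

_REGISTRY: dict[str, type[LLMProvider]] = {}

def register(name: str):
    """Class decorator that makes a provider available by name."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

def make_provider(name: str, **kwargs) -> LLMProvider:
    if name not in _REGISTRY:
        raise ValueError(f"unknown provider {name!r}; available: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)

@register("echo")  # stand-in provider; real ones would wrap OpenAI, Anthropic, etc.
class EchoProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

Swapping providers then becomes a one-line config change (`make_provider("anthropic")` instead of `make_provider("openai")`), which is what makes failover during an outage cheap.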
LLMs hallucinate on academic jargon
Early runs had the model confidently misinterpreting Portuguese academic terminology. Had to add a validation layer and constrain outputs to defined categories.
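One way to constrain outputs to defined categories is a thin validation layer that rejects anything outside a closed vocabulary, so a hallucinated label fails loudly instead of polluting results. The category set below is hypothetical, not the project's actual taxonomy:

```python
# Hypothetical closed vocabulary for an audit verdict.
ALLOWED_VERDICTS = {"complete", "partial", "missing"}

def validate_verdict(raw: str) -> str:
    """Normalize a model answer and reject anything outside the allowed set."""
    verdict = raw.strip().lower()
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(
            f"model returned {raw!r}, expected one of {sorted(ALLOWED_VERDICTS)}"
        )
    return verdict
```

Failed validations can then be retried with a stricter prompt or flagged for manual review, rather than silently accepted.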
Structured output solved everything
Switching to JSON-mode / structured outputs dramatically improved reliability. Constraining the model's response format cut hallucination rate by ~80%.
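Even with JSON mode guaranteeing parseable output, it helps to verify the shape before trusting it. A minimal sketch, assuming hypothetical audit field names (`objectives`, `bibliography`, `assessment`):

```python
import json

# Hypothetical set of fields the audit is expected to report on.
REQUIRED_KEYS = {"objectives", "bibliography", "assessment"}

def parse_audit(response_text: str) -> dict:
    data = json.loads(response_text)  # JSON mode makes this parse reliably
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"audit response missing keys: {sorted(missing)}")
    return data
```

Constraining the format this way turns vague free-text answers into a schema check: the model either fills every field or the run flags the sheet for a retry.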
Rate limits at scale
FEUP has hundreds of course sheets. Hitting API rate limits mid-run was painful. Built an async queue with backoff and checkpointing so runs could resume without starting over.
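The resumable-runs idea can be sketched with `asyncio`: a semaphore caps concurrency, failed calls retry with exponential backoff plus jitter, and completed IDs are checkpointed to disk so a rerun skips them. Function names and the checkpoint format are assumptions for illustration:

```python
import asyncio
import json
import random
from pathlib import Path

async def run_with_backoff(task, retries=5, base=1.0):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return await task()
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.random())

async def audit_all(course_ids, audit_one, checkpoint=Path("done.json"),
                    concurrency=4):
    """Audit every course sheet, skipping IDs already in the checkpoint file."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    sem = asyncio.Semaphore(concurrency)  # stay under the provider's rate limit

    async def worker(cid):
        async with sem:
            await run_with_backoff(lambda: audit_one(cid))
            done.add(cid)
            # Persist progress after each success so a crash loses at most one task.
            checkpoint.write_text(json.dumps(sorted(done)))

    await asyncio.gather(*(worker(c) for c in course_ids if c not in done))
    return done
```

Writing the checkpoint after every success trades a little I/O for the ability to kill and resume a multi-hour run at any point.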