UCReview AI
AI pipeline for auditing university course sheets
Overview
During my Research Initiation Grant at FEUP, I built UCReview AI — a pipeline that automatically scrapes university course sheets, parses their structure, and uses LLMs to audit them for completeness and quality. The goal: make course information more transparent and comparable for students.
Stack
What I learned
- Designed a robust scraping pipeline handling malformed and inconsistent HTML
- Built a provider-agnostic AI factory (OpenAI, Anthropic, local models) for flexible LLM calls
- Engineered prompts for structured data extraction from messy academic documents
- Learned when AI adds genuine value vs. when regex or heuristics are just better
- Wrote a research report documenting methodology, results, and limitations
Build log
Struggles, findings, decisions, breakthroughs — the honest story.
Course sheets have zero consistency
Every faculty formats its course sheets differently: some are PDFs, some HTML, some Word documents exported to the web. The parser had to be fault-tolerant by design.
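A minimal sketch of what fault-tolerant HTML extraction can look like using only Python's standard library, whose `html.parser` keeps going on unclosed or malformed tags instead of raising. The class and function names here are illustrative, not the project's actual code:

```python
from html.parser import HTMLParser

class FieldExtractor(HTMLParser):
    """Collects text chunks from a page, tolerating unclosed/malformed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Keep only non-empty text; tag soup around it is simply ignored.
        text = data.strip()
        if text:
            self.chunks.append(text)

def extract_text(raw_html: str) -> list[str]:
    parser = FieldExtractor()
    parser.feed(raw_html)  # never raises on broken markup, unlike strict XML parsers
    return parser.chunks
```

The same dispatch idea extends to PDFs and Word exports: detect the content type first, then route to a format-specific extractor that degrades gracefully.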
Multi-provider factory pattern
Instead of hardcoding OpenAI, I built a factory that could swap providers. This saved the project when one provider had downtime during a critical testing phase.
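The factory idea can be sketched roughly like this: providers register themselves under a name, and callers ask the factory for one without knowing which SDK sits behind it. The `EchoProvider` stand-in and all names here are hypothetical, included so the sketch runs offline:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""

_REGISTRY: dict[str, type[LLMProvider]] = {}

def register(name: str):
    """Class decorator that makes a provider available by name."""
    def deco(cls):
        _REGISTRY[name] = cls
        return cls
    return deco

def make_provider(name: str, **kwargs) -> LLMProvider:
    if name not in _REGISTRY:
        raise ValueError(f"unknown provider {name!r}; available: {sorted(_REGISTRY)}")
    return _REGISTRY[name](**kwargs)

@register("echo")  # stand-in provider; real ones would wrap OpenAI, Anthropic, etc.
class EchoProvider(LLMProvider):
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

Swapping providers then becomes a one-line config change (`make_provider("anthropic")` instead of `make_provider("openai")`), which is what makes failover during an outage cheap.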
LLMs hallucinate on academic jargon
Early runs had the model confidently misinterpreting Portuguese academic terminology. Had to add a validation layer and constrain outputs to defined categories.
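One way to constrain outputs to defined categories is a thin validation layer that rejects anything outside a closed vocabulary, so a hallucinated label fails loudly instead of polluting results. The category set below is hypothetical, not the project's actual taxonomy:

```python
# Hypothetical closed vocabulary for an audit verdict.
ALLOWED_VERDICTS = {"complete", "partial", "missing"}

def validate_verdict(raw: str) -> str:
    """Normalize a model answer and reject anything outside the allowed set."""
    verdict = raw.strip().lower()
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(
            f"model returned {raw!r}, expected one of {sorted(ALLOWED_VERDICTS)}"
        )
    return verdict
```

Failed validations can then be retried with a stricter prompt or flagged for manual review, rather than silently accepted.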
Structured output solved everything
Switching to JSON-mode / structured outputs dramatically improved reliability. Constraining the model's response format cut hallucination rate by ~80%.
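Even with JSON mode guaranteeing parseable output, it helps to verify the shape before trusting it. A minimal sketch, assuming hypothetical audit field names (`objectives`, `bibliography`, `assessment`):

```python
import json

# Hypothetical set of fields the audit is expected to report on.
REQUIRED_KEYS = {"objectives", "bibliography", "assessment"}

def parse_audit(response_text: str) -> dict:
    data = json.loads(response_text)  # JSON mode makes this parse reliably
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"audit response missing keys: {sorted(missing)}")
    return data
```

Constraining the format this way turns vague free-text answers into a schema check: the model either fills every field or the run flags the sheet for a retry.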
Rate limits at scale
FEUP has hundreds of course sheets. Hitting API rate limits mid-run was painful. Built an async queue with backoff and checkpointing so runs could resume without starting over.
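The resumable-runs idea can be sketched with `asyncio`: a semaphore caps concurrency, failed calls retry with exponential backoff plus jitter, and completed IDs are checkpointed to disk so a rerun skips them. Function names and the checkpoint format are assumptions for illustration:

```python
import asyncio
import json
import random
from pathlib import Path

async def run_with_backoff(task, retries=5, base=1.0):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return await task()
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base * 2 ** attempt + random.random())

async def audit_all(course_ids, audit_one, checkpoint=Path("done.json"),
                    concurrency=4):
    """Audit every course sheet, skipping IDs already in the checkpoint file."""
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    sem = asyncio.Semaphore(concurrency)  # stay under the provider's rate limit

    async def worker(cid):
        async with sem:
            await run_with_backoff(lambda: audit_one(cid))
            done.add(cid)
            # Persist progress after each success so a crash loses at most one task.
            checkpoint.write_text(json.dumps(sorted(done)))

    await asyncio.gather(*(worker(c) for c in course_ids if c not in done))
    return done
```

Writing the checkpoint after every success trades a little I/O for the ability to kill and resume a multi-hour run at any point.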