LLM Scientific QA

Citation-grounded QA system over scientific literature

Focus: Retrieval
Scope: End-to-end
Stack: Python, LangChain, Semantic Scholar API
GitHub →

Why I built this

Scientific QA needs citations and multi-paper synthesis. Most systems produce fluent answers with weak traceability, and I wanted to explore how to enforce citation discipline.

What I learned

Structured synthesis templates enforced citation discipline better than post-hoc verification. Multi-paper retrieval requires careful query expansion and reranking. What didn't work: an initial over-reliance on single-paper retrieval.

TL;DR

Built a citation-grounded QA system that retrieves from multiple scientific papers, synthesizes answers with explicit citation anchors, and verifies citations post-generation.

Problem

Answering scientific questions well means synthesizing across multiple papers and tracing every claim back to a source. Most QA systems produce fluent answers with weak traceability.

Constraints

Multi-paper scope: answers must synthesize information across multiple research papers.
Citation accuracy: every claim must map to specific paper sections.
Latency: retrieval and synthesis must complete in reasonable time.

System design

Retrieval pulls from multiple papers and uses reranking to find the most relevant sources. Synthesis generates answers with citation anchors attached to each claim. After generation, there's a verification step that checks citation coverage. The system has contracts: it won't answer unless citation coverage meets a threshold.

Evaluation

I use a scientific query set plus a paper corpus. Metrics include citation coverage, claim support checks, and multi-paper coverage. Failure modes I track: retrieval misses, citation gaps, and unsupported claims. Regression strategy: citation accuracy and claim support checks catch regressions before deployment.
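The claim-support and multi-paper-coverage metrics could be approximated as below. The lexical-overlap support check, the 0.5 threshold, and every function name are stand-in assumptions, not the project's actual verifier.

```python
# Illustrative metric sketches for the evaluation above. Lexical overlap
# stands in for the real claim-support check; threshold is an assumption.

def token_overlap(claim: str, passage: str) -> float:
    """Fraction of the claim's tokens that also appear in the passage."""
    claim_tokens = set(claim.lower().split())
    passage_tokens = set(passage.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & passage_tokens) / len(claim_tokens)

def claim_supported(claim: str, cited_passages: list[str],
                    min_overlap: float = 0.5) -> bool:
    """A claim counts as supported if any cited passage overlaps enough."""
    return any(token_overlap(claim, p) >= min_overlap for p in cited_passages)

def multi_paper_coverage(cited_ids: set[str], retrieved_ids: set[str]) -> float:
    """Fraction of retrieved papers that the answer actually cites."""
    if not retrieved_ids:
        return 0.0
    return len(cited_ids & retrieved_ids) / len(retrieved_ids)
```

Running these per query over the corpus surfaces the tracked failure modes directly: low `multi_paper_coverage` flags retrieval misses, and `claim_supported` returning False flags citation gaps and unsupported claims.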

Results

Stronger grounding and traceability through citation-first generation and verification gates.

Trade-offs & Lessons

Multi-paper retrieval increases compute but improves reliability, and it demands careful query expansion and reranking. Structured synthesis templates enforced citation discipline better than post-hoc verification alone; the initial over-reliance on single-paper retrieval was the main misstep.
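The query-expansion and reranking step can be sketched roughly as follows. The synonym-substitution expander and lexical-overlap scorer are deliberately simplified stand-ins (a real pipeline would use an LLM expander and a cross-encoder reranker), and every name here is hypothetical.

```python
# Simplified sketch of query expansion + reranking for multi-paper retrieval.
# Synonym table and overlap scoring are illustrative assumptions only.

def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Generate query variants by substituting known synonym terms."""
    variants = [query]
    for term, alternatives in synonyms.items():
        if term in query:
            variants.extend(query.replace(term, alt) for alt in alternatives)
    return variants

def rerank(query: str, passages: list[str], top_k: int = 5) -> list[str]:
    """Order passages by token overlap with the query, keep the top_k."""
    query_tokens = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: -len(query_tokens & set(p.lower().split())),
    )
    return scored[:top_k]
```

Each expanded variant would be retrieved independently and the pooled candidates reranked once, which is what makes the expansion step affordable despite the extra retrieval calls.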

What I'd Improve Next

Add citation quality scoring to prioritize high-impact papers. Implement cross-paper consistency checking. Build domain-specific retrieval strategies for different scientific fields.