APPLIED AI · PROGRAM MANAGEMENT · COMMUNICATIONS
How to Build an AI That Admits It Doesn't Know
An AI assistant designed around retrieval, not open-ended generation, cleared the guidance bottleneck for engineers and gave security experts their attention back.
Inside Amazon Security, engineers were opening tickets they didn't need to open. The answers usually already existed, buried across PDFs, wikis, and old archives, while the launch clock kept running. More than a thousand such questions arrived every day. The first attempts to fix it with AI made things worse: a chatbot that invents plausible security guidance is more dangerous than no chatbot at all.
Background
Software development engineers clearing Amazon's pre-launch security bar regularly needed guidance on policy, implementation requirements, and review expectations. When they couldn't find an answer, they filed a ticket, which was correct behavior but turned senior security experts into help-desk support. Roughly 60% of the daily volume was repetitive questions that should never have reached a human.
The details
- All security documentation was consolidated into a single S3 bucket, the exclusive data source for a retrieval-augmented generation pipeline.
- Model temperature was lowered from 0.7 to 0.2, reducing sampling randomness so that responses stayed near-deterministic and close to the retrieved text instead of drifting into creative phrasing.
- A prompt template mandated exact source citations and instructed the model to return a null response when the answer was not in the verified documentation.
- Those constraints drove accuracy from 49% to 86%, and ticket volume fell 60% after launch.
- The portal now averages 1,000 daily visits across the AWS builder community.
- The comms function drove adoption with leadership briefing decks, memos, and PRFAQs, plus runbooks, live and recorded training, self-directed learning, and an internal newsletter, reaching more than 4,000 people.
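The cite-or-abstain constraint described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual implementation: the `Passage` type, the `NO_ANSWER` sentinel, the template wording, the bracketed citation format, and the sample passage are all assumptions made for the example. Only the temperature value of 0.2 comes from the article. The model call itself is left out; the sketch covers building the prompt and checking what comes back.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

TEMPERATURE = 0.2        # from the article; would be passed to whatever model API is used
NO_ANSWER = "NO_ANSWER"  # hypothetical sentinel the prompt asks the model to emit


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical identifier, e.g. a document key in the bucket
    text: str


# Hypothetical template wording; the article does not publish the real one.
PROMPT_TEMPLATE = """Answer using ONLY the passages below.
Cite every claim with its source id in square brackets.
If the passages do not contain the answer, reply with exactly: {no_answer}

Passages:
{passages}

Question: {question}
"""

# Matches bracketed citations such as [doc-1].
CITATION = re.compile(r"\[([^\[\]]+)\]")


def build_prompt(question: str, passages: List[Passage]) -> str:
    """Render retrieved passages and the question into the constrained prompt."""
    rendered = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return PROMPT_TEMPLATE.format(
        no_answer=NO_ANSWER, passages=rendered, question=question
    )


def validate_answer(raw: str, passages: List[Passage]) -> Optional[str]:
    """Return the answer only if every citation points at a retrieved passage."""
    answer = raw.strip()
    if not answer or answer == NO_ANSWER:
        return None  # the model abstained
    cited = set(CITATION.findall(answer))
    known = {p.source_id for p in passages}
    # Reject uncited answers and answers citing documents that were never retrieved.
    if not cited or not cited <= known:
        return None
    return answer


# Illustrative passage, invented for this example.
passages = [Passage("doc-1", "Rotate keys every 90 days.")]
prompt = build_prompt("How often should keys be rotated?", passages)
print(validate_answer("Rotate keys every 90 days [doc-1].", passages))
print(validate_answer("Rotate keys weekly.", passages))
```

The design choice worth noting is that abstention is enforced twice: once in the prompt, and again in code that discards any answer lacking a citation to a retrieved passage. The second check does not verify that the cited passage actually supports the claim, only that the citation refers to a document the retriever supplied.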
Why it matters
Most enterprise AI fails not because the model is weak but because nobody decided what the tool should do when it doesn't know. This one worked because someone made that call. A security environment cannot tolerate a confident wrong answer, and engineering the assistant to cite its sources or say nothing made it credible where a standard chatbot would have been a liability. The accuracy jump from 49% to 86% is the whole story. Reliability, not capability, is what gets an AI tool trusted inside a high-stakes workflow.