Artificial intelligence has generated new risks for consumers, businesses, and society itself. So far, MIT has categorized more than 1,600 risks posed by AI, a number that will only increase as AI becomes more powerful and autonomous. The risks of untrustworthy AI applications include:
- Reputational harm
- Financial damages
- Real physical harm (e.g., an autonomous vehicle failing)
These risks put the responsibility on organizations to design, build, and launch trustworthy AI applications by developing standards of care. At WillowTree, we believe that identifying and addressing AI risks begins before we write a line of code.
But this stance isn’t just about risk; it’s also about opportunity. Building trustworthy AI from the ground up opens doors to innovation that would otherwise remain closed due to concerns about risk. Our clients move forward with confidence, knowing their AI solutions have been developed with safety, security, explainability, and privacy in mind.
The Cost of Quality: Why We Identify and Address Issues Before We Build
In software development, “cost of quality” refers to the cost a team willingly takes on to catch and mitigate issues as early in development as possible. The later an issue is detected, the more it costs to fix, and the nature of AI means damages can spiral exponentially compared to traditional software.
The IBM System Science Institute found that repairing an issue during maintenance in production can cost up to 100x as much as preventing it during the design phase, as detailed in the graphic below.

Now consider two other findings from IBM on the cost of data breaches:
- The average cost of a data breach is $4.88 million and growing each year.
- Companies that use AI security and automation extensively in prevention save an average of $2.22 million compared with those that don’t.
To reduce costs, then, we want to prevent as many issues as possible during the product design phase, before we start building anything. But that has a tradeoff: we might spin our wheels endlessly, looking for hidden errors and trying to predict hypothetical issues before we even start work.
This problem only gets more difficult as organizations like MIT identify more AI risks. To ensure trustworthy AI, then, we need an approach that can keep pace with the growing number of risks.
Risk Management Frameworks: How We Accelerate the Development of Trustworthy AI Solutions
To stay fast while managing risk and preventing as many issues as possible, we leverage resources like the NIST AI Risk Management Framework (RMF) developed by the National Institute of Standards and Technology, an agency of the United States Department of Commerce.
NIST’s playbook enables us to rapidly identify the steps and safeguards we need at every stage of the AI development lifecycle, reducing the time it takes to define and document requirements.
From there, we conduct an exercise as part of our Agentic AI Use Case Workshops wherein we review solutions architecture against NIST’s trustworthiness principles, as shown in the following graphic.

We then map each trustworthiness principle to the elements in our architecture that support it. For example, an AI-powered transcription tool that records medical notes must adhere to the principle of reliability. Because of that, we’d need to design extra safeguards against problems like hallucination: we’d plan for the extra effort and development cost to capture LLM outputs and check them for hallucination likelihood before sending the output to the user. A simplified sketch of that kind of check follows.
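As an illustration only, the sketch below gates a generated answer on a simple groundedness score before it reaches the user. The helper names (`groundedness_score`, `generate_draft_answer`), the token-overlap heuristic, and the 0.6 threshold are assumptions made for this example, not our production implementation; a real system might instead use an LLM-as-a-Judge or a dedicated fact-checking model.

```python
# Illustrative sketch of a post-generation groundedness check.
# All names and thresholds here are assumptions for the example.
import re

def _content_words(text: str) -> set[str]:
    """Lowercase the text and keep words longer than 3 characters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def groundedness_score(answer: str, source_context: str) -> float:
    """Fraction of the answer's content words that also appear in the source."""
    answer_words = _content_words(answer)
    if not answer_words:
        return 0.0
    source_words = _content_words(source_context)
    return len(answer_words & source_words) / len(answer_words)

def respond(question: str, source_context: str, generate_draft_answer) -> str:
    """Generate an answer, then gate it on a hallucination-likelihood check."""
    draft = generate_draft_answer(question, source_context)
    if groundedness_score(draft, source_context) < 0.6:  # illustrative threshold
        # Low overlap with the source material: fall back instead of risking
        # an unsupported (possibly hallucinated) answer.
        return "I'm not confident in that answer; please contact your care team."
    return draft
```

The point is the shape of the safeguard: the check runs after generation and before delivery, so an unsupported answer never reaches the user.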
This approach bakes transparency into our AI solutions from the start through monitoring and observability, and it helps ensure fairness and reliability through robust evaluation. A minimal sketch of what that observability instrumentation can look like follows.
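To show one form that observability can take, here is a minimal sketch of structured logging around LLM calls. The field names and logging setup are illustrative assumptions; a production system would typically feed these records into a tracing or monitoring platform.

```python
# Illustrative sketch of structured observability for LLM calls.
# Field names and logging setup are assumptions for this example.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("llm_observability")

def log_llm_call(prompt: str, response: str, model: str, latency_ms: float) -> None:
    """Emit one structured record per LLM interaction for later auditing."""
    logger.info(json.dumps({
        "trace_id": str(uuid.uuid4()),   # correlate with downstream events
        "timestamp": time.time(),
        "model": model,
        "prompt_chars": len(prompt),     # log sizes, not raw text, by default
        "response_chars": len(response),
        "latency_ms": latency_ms,
    }))

log_llm_call("What should I do about swelling?", "Please contact your care team.", "example-model", 412.0)
```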
With this approach, we can identify gaps in our product design before we’ve written a single line of code. That’s how we prevent issues before they happen rather than fixing them later. With emerging techniques like AI-First software development making it faster than ever to go from design to production, our trustworthiness-first approach becomes even more impactful as we ensure that solutions deliver maximum value with maximum safety from the very beginning.
A Real-World Example of Building Trustworthy AI: Improving Patient Experience After Emergency Care Discharge
We recently designed an AI system to support patients after their discharge from emergency care. To help patients transition without feeling overwhelmed once they no longer have direct access to a physician or medical assistant, our AI system could:
- Understand users’ care questions in natural language
- Provide reputable knowledge resources
- Record clinician appointments
By reviewing our solutions architecture against NIST’s AI trustworthiness principles, we rapidly identified the design and implementation requirements we needed to plan for:
- Safe: What if the AI system gave poor or incorrect information? We needed to build in mechanisms to prevent AI hallucinations and add a supervisor guardrail to stop the system from giving improper medical advice.
- Secure & Resilient: Healthcare demands high standards for securing patient medical data. To be maximally secure, our new system had to integrate with an existing data platform and leverage its security measures. It also needed to be resilient, meaning we’d need to implement well-architected principles to ensure high availability.
- Explainable & Interpretable: Enabling an LLM to explain its rationale helps us improve explainability, but this doesn’t happen automatically. Using techniques like LLM-as-a-Judge helps us measure the quality of LLM responses and provides visibility into the reasoning behind some outputs.
- Privacy-Enhanced: Finally, we had to consider how users’ health data would be protected. In this case, it was clear our AI system must not share any data outside of the user and their healthcare provider. We also needed to ensure that if the user entered other private information into the conversational interface (e.g., a Social Security number or other PII), that data could be detected and anonymized rather than unnecessarily preserved (see the sketch after this list).
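To make that privacy requirement concrete, here is a minimal sketch of pattern-based PII detection and anonymization applied before a message is persisted. The regex patterns and placeholder tokens are illustrative assumptions; a production system would combine them with a dedicated PII detection service and clinical entity recognition rather than relying on regexes alone.

```python
# Illustrative sketch of PII detection and anonymization before storage.
# The patterns and placeholders are assumptions, not an exhaustive detector.
import re

PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def anonymize(message: str) -> str:
    """Replace detected PII with typed placeholders before persistence."""
    redacted = message
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[{label} REDACTED]", redacted)
    return redacted

print(anonymize("My SSN is 123-45-6789 and my email is jane@example.com"))
# -> "My SSN is [SSN REDACTED] and my email is [EMAIL REDACTED]"
```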
After evaluating the planned solution in light of each trustworthiness principle, we identified the specific techniques and architectural components our project plan needed. We surfaced these requirements in a one-hour workshop rather than painfully uncovering them over the life of the project.
Build Trustworthy AI Solutions With Confidence
The importance of developing trustworthy AI solutions cannot be overstated, nor can the importance of proactive risk management and thoughtful design to create safe, reliable, and ethical AI products. By using tools like risk management frameworks, we identify and address AI risks early, saving time and cost while ensuring our clients’ AI solutions are built on a foundation of trust and reliability.
This blog looked at trustworthy AI at the system level (i.e., the requirements to build a trustworthy AI solution for a specific use case). However, trustworthy AI is ultimately an organizational capability, one where overarching governance frameworks guide the development of all AI solutions.
Whether you need help developing trustworthy AI at the system level or organizational level, we can help. Learn more by exploring our AI Strategy & Governance services.