Using AWS Bedrock Agents to control 3rd party APIs

CREATED ON
February 5, 2025

There are many articles out there about Retrieval Augmented Generation (RAG) and the use of tools, functions and APIs to enrich responses with contextual information.

AWS Bedrock (and other managed AI services) now has built-in capabilities for tool use and knowledge bases to help with this. But instead of using them to augment the information returned to users, I wanted to know:

  • Can an agent autonomously chain multiple requests to an API to fulfil a user’s request?
  • Instead of using tools or functions to provide a contextual response to a user, can I get it to perform operations using just an API spec?

And whilst I was having this inner monologue, I was staring at my smart home heating app and I thought… why not get it to control my central heating?

I have a Tado smart heating system - a number of smart thermostatic valves attached to my radiators, with accompanying thermostats to control my boiler. This allows me to independently control the temperature in all of my rooms through an app, heating my home efficiently (and only when I’m using specific rooms). You can find out more about it here - https://www.tado.com/

So I went about building an AI agent capable of taking the instruction manual of a 3rd party API and a user request, and figuring out what calls to make to fulfil the request. This is what I ended up with - it’s a fully working chatbot client shamelessly ripped from Vercel’s AI Chatbot Starter Template, along with Terraform to deploy a Bedrock Agent:

https://github.com/foyst/tado-bedrock-agent-app

You can see more about Vercel’s starter template here:

https://chat.vercel.ai/

https://github.com/vercel/ai-chatbot

What you need

To build an agent similar to this, you need:

  • An AWS Bedrock Agent
  • A suitable foundation model (I went with Anthropic’s Claude 3.5 Sonnet, for no reason other than it was the best rated for general tool use and overall accuracy - YMMV)
  • An OpenAPI spec for the 3rd party system you wish to integrate with (or function definitions if you’re connecting to AWS Lambdas)

The AI agent takes an OpenAPI spec and a user request, and decides what calls it needs to make and in what order. It can handle single, simple requests or more complex chained calls. It’s not perfect, but it effectively figures out the middleware needed on the fly, flexing to whatever request is made.
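To give a sense of what the agent actually consumes, here is a heavily abridged, illustrative OpenAPI fragment covering the two zone endpoints used later in this post. This is my own paraphrasing for illustration rather than the real Tado spec - the descriptions are the part the agent leans on most when deciding which endpoint to call:

paths:
  /api/v2/homes/{homeId}/zones:
    get:
      operationId: getZones
      description: List all zones (rooms) in the home, with their zone ids and names
      parameters:
        - name: homeId
          in: path
          required: true
          schema:
            type: integer
  /api/v2/homes/{homeId}/zones/{zoneId}/state:
    get:
      operationId: getZoneState
      description: Get the current state of a zone, including the measured room temperature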

The Architecture

AWS Bedrock is capable of calling third-party APIs directly, given either an OpenAPI spec or function definitions (if you’d prefer to call Lambda functions within AWS). The challenge with Tado is that it requires authentication, so it’s not something an AI agent can do entirely on its own.

Instead of making API calls directly, AWS Bedrock can also return control to the calling system by providing details of the API call it wants to make. So I used this to build a thin chat client that passes user requests to Bedrock, makes authenticated calls to Tado using the API details the agent suggests, passes the Tado response back to the agent, and finally returns a response to the user.

The front-end client itself does nothing more than take the URL, method and JSON payload the agent generates and forward them on to Tado with an authorisation header. I used a third-party Node.js library (https://github.com/mattdavis90/node-tado-client) to handle this.
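To make the shape of that forwarding step concrete, here is a rough sketch written with plain fetch. It is illustrative only - the repository delegates this to node-tado-client - and tadoBearerToken is a placeholder for whatever access token you have obtained:

// Illustrative sketch only - the actual client uses node-tado-client for this.
// tadoBearerToken is a placeholder for an access token obtained separately.
async function forwardToTado(
  url: string,
  method: string,
  payload: unknown,
  tadoBearerToken: string
): Promise<unknown> {
  const response = await fetch(`https://my.tado.com${url}`, {
    method,
    headers: {
      Authorization: `Bearer ${tadoBearerToken}`,
      "Content-Type": "application/json",
    },
    body: payload ? JSON.stringify(payload) : undefined,
  });
  return response.json();
}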

So why use a service like AWS Bedrock? I was building this stuff from scratch in 2023 - creating AI PoCs with GPT-3 that could take user requests and, combined with vector databases and embeddings, provide contextual and accurate responses. Bedrock does a lot of that heavy lifting now, with integrations such as S3 for storing knowledge bases and OpenAPI specs for integrating other tools.

Importantly, it also utilises techniques like Chain of Thought (CoT) and provides tracing information that opens up the AI “black box” a little, making it easier to understand how these agents arrive at the responses they generate.

For example, when testing the agent within the Bedrock UI you can provide canned API responses for the request it makes, and then view the trace steps it uses to generate its response. On the right in the screenshot below you can see the agent's “thought process” within the <thinking></thinking> tags.

The Code

There are a few key segments of code in my repository that handle the agent and Tado interactions, which I’ll walk through here.

The first step is to create an instance of the Bedrock Agent client:

import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
  type InvokeAgentRequest,
  type ResponseStream,
} from "@aws-sdk/client-bedrock-agent-runtime";

const bedrockClient = new BedrockAgentRuntimeClient({
  region: "us-east-1",
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN!,
  },
});

N.B. always try to avoid passing in credentials like this if your environment allows other methods of sourcing an IAM role. I do this purely for demo purposes.
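For example, where an IAM role or AWS profile is available, you can construct the client without explicit credentials and let the SDK resolve them for you:

// Preferred where possible: let the AWS SDK v3 default credential provider chain
// (environment variables, shared config/credentials file, or an attached IAM role)
// supply credentials instead of passing them in explicitly.
const bedrockClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });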

Next we craft an agent request, which takes the initial user prompt and the agentId of the deployed Bedrock Agent. Sending this to the bedrockClient returns the first API call the agent wants to make:

const agentInput: InvokeAgentRequest = {
  agentId: process.env.AGENT_ID!,
  agentAliasId: "TSTALIASID",
  sessionId: chatId,
  inputText: input,
};

const command = new InvokeAgentCommand(agentInput);
const response = await bedrockClient.send(command);

Next, I pass this response into a method called processResponseCompletion that does the following:

  1. Checks if an API request has been suggested by the agent
  2. Parses this agent request into an API request and passes it to Tado
  3. Parses the response from Tado and passes it back to the agent
  4. Calls processResponseCompletion recursively, passing in the next API request, until the agent has made all the calls it needs to fulfil the initial user request

const completion = await processResponseCompletion(
  agentInput,
  response.completion
);

async function processResponseCompletion(
  agentInput: InvokeAgentRequest,
  chunks: AsyncIterable<ResponseStream>
): Promise<string | undefined> {
	...

To process an agent response, we iterate over each chunkEvent in the completion stream:

for await (const chunkEvent of chunks) {

Then we check if that chunk contains details about an API request the agent wants to make (which is returned in the form of the returnControl metadata):

if (chunkEvent.returnControl !== undefined) {
  const apiInvocationInput =
    chunkEvent.returnControl.invocationInputs![0].apiInvocationInput!;

If an API call is suggested by the agent, we parse the details into a request and pass it to Tado:

const url = apiInvocationInput.apiPath!;
const method = apiInvocationInput.httpMethod!;
const apiParameters = apiInvocationInput.parameters!;
// parameterisedUrl and data are derived from the values above (see the sketch below)
response = await tadoClient.apiCall(parameterisedUrl, method, data);
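In the excerpt above, parameterisedUrl and data are derived from the same apiInvocationInput. A simplified sketch of one way to do that (not the exact repo code - the property names come from the SDK’s ApiInvocationInput type, and the "application/json" content key is an assumption) looks like this:

// Simplified sketch: build the concrete request from the agent's suggestion.
// Substitute any path parameters the agent supplied, e.g. {zoneId} -> "1"
const parameterisedUrl = (apiInvocationInput.parameters ?? []).reduce(
  (u, p) => u.replace(`{${p.name}}`, String(p.value)),
  apiInvocationInput.apiPath!
);

// For PUT/POST calls, the body arrives as named properties keyed by content type;
// nested objects may come through as JSON strings and need an extra JSON.parse
const bodyProperties =
  apiInvocationInput.requestBody?.content?.["application/json"]?.properties ?? [];
const data = Object.fromEntries(bodyProperties.map((p) => [p.name!, p.value]));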

Then we take the response from the Tado API and craft another InvokeAgentRequest, passing in the original agentInput combined with the response from Tado:

const agentInputWithTadoResponse: InvokeAgentRequest = {
        ...agentInput,
        sessionState: {
          returnControlInvocationResults: [
            // ReturnControlInvocationResults
            {
              // InvocationResultMember Union: only one key present
              apiResult: {
                // ApiResult
                actionGroup: apiInvocationInput.actionGroup, // required
                agentId: agentInput.agentId,
                apiPath: apiInvocationInput.apiPath,
                confirmationState: "CONFIRM",
                httpMethod: apiInvocationInput.httpMethod,
                httpStatusCode: 200,
                responseBody: {
                  TEXT: {
                    body: JSON.stringify(response),
                  },
                },
              },
            },
          ],
          invocationId: chunkEvent.returnControl.invocationId,
        },
      };

      const command = new InvokeAgentCommand(agentInputWithTadoResponse);

This updated agent request is then passed back to Bedrock:

const updatedResponse = await bedrockClient.send(command);

And finally, to handle the agent wanting to make subsequent requests, processResponseCompletion is called recursively until the agent has made all the calls it needs to fulfil the request:

const finalResponse = await processResponseCompletion(
        agentInputWithTadoResponse,
        updatedResponse.completion!
      );
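Putting those pieces together, the overall shape of processResponseCompletion looks roughly like this. It’s a condensed sketch of the flow described above rather than the exact code in the repository - URL parameterisation and request body handling are omitted, and tadoClient is the Tado wrapper used earlier:

// Condensed sketch of the flow described above - not the exact repo code.
async function processResponseCompletion(
  agentInput: InvokeAgentRequest,
  chunks: AsyncIterable<ResponseStream>
): Promise<string | undefined> {
  let finalText: string | undefined;

  for await (const chunkEvent of chunks) {
    // Case 1: the agent is returning control, asking us to make an API call for it
    if (chunkEvent.returnControl !== undefined) {
      const apiInvocationInput =
        chunkEvent.returnControl.invocationInputs![0].apiInvocationInput!;

      // Make the authenticated call to Tado on the agent's behalf
      const tadoResponse = await tadoClient.apiCall(
        apiInvocationInput.apiPath!,
        apiInvocationInput.httpMethod!,
        undefined
      );

      // Hand the Tado response back to the agent as a returnControl invocation result
      const nextInput: InvokeAgentRequest = {
        ...agentInput,
        sessionState: {
          invocationId: chunkEvent.returnControl.invocationId,
          returnControlInvocationResults: [
            {
              apiResult: {
                actionGroup: apiInvocationInput.actionGroup,
                agentId: agentInput.agentId,
                apiPath: apiInvocationInput.apiPath,
                confirmationState: "CONFIRM",
                httpMethod: apiInvocationInput.httpMethod,
                httpStatusCode: 200,
                responseBody: { TEXT: { body: JSON.stringify(tadoResponse) } },
              },
            },
          ],
        },
      };

      const nextResponse = await bedrockClient.send(new InvokeAgentCommand(nextInput));

      // Recurse until the agent stops asking for API calls and produces a final answer
      finalText = await processResponseCompletion(nextInput, nextResponse.completion!);
    }

    // Case 2: the agent has produced (part of) its final answer for the user
    if (chunkEvent.chunk?.bytes) {
      finalText = (finalText ?? "") + new TextDecoder().decode(chunkEvent.chunk.bytes);
    }
  }

  return finalText;
}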

References

https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/bedrock/
https://docs.aws.amazon.com/bedrock/latest/userguide/agents-returncontrol.html

https://docs.aws.amazon.com/bedrock/latest/userguide/agents-session-state.html#session-state-return-control

https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-agent-runtime_example_bedrock-agent-runtime_InvokeAgent_section.html

The Results

With the agent deployed and the chatbot web client running, you can do some pretty powerful stuff. In the screenshot above I sent Bedrock a relatively complex request involving querying the Tado API and then presenting the results in a tabular format - a projection of the data that isn’t something Tado can do out of the box.

The really neat bit about Bedrock is that it can make as many API calls as necessary within a single request to fulfil the user’s prompt.

To fulfil the above request, Bedrock made this API call to Tado:
/api/v2/homes/<my_home_id>/zones GET

After learning the corresponding zoneIds for the rooms I requested, it then knew to make the following calls to get their respective temperatures:
/api/v2/homes/<my_home_id>/zones/1/state GET
/api/v2/homes/<my_home_id>/zones/3/state GET
/api/v2/homes/<my_home_id>/zones/6/state GET

Remember, at this point the only code I’ve written parses the API invocation request the agent suggests, passes it to Tado, then passes the result back to the agent. From the OpenAPI spec alone, it figured out which endpoints to call.

Let’s make things a little more interesting. Up to this point, all of the API calls Bedrock has made have been GET requests. In this next prompt, it will need to submit a new target temperature to my Tado system:

This time, the agent made these requests:
/api/v2/homes/<my_home_id>/zones GET
/api/v2/homes/<my_home_id>/zones/1/state GET
/api/v2/homes/<my_home_id>/zones/1/overlay PUT

{
    "termination": { "type": "MANUAL" },
    "type": "MANUAL",
    "setting": { "type": "HEATING", "power": "ON", "temperature": { "celsius": 20.4 } }
}

Using the Tado API spec, it knew to call the state endpoint to get the current room temperature, increase it by 3 degrees and then pass that in as the JSON payload to the overlay endpoint. LLMs have historically been notoriously bad at maths, but in my experience with the Claude models it not only got the calculation right, it was also smart enough to base the increase on the current room temperature (and not the set temperature, which would have just resulted in a target matching what the room already is).

Sprinkle a bit of J.A.R.V.I.S in there

For anyone familiar with the Iron Man comics and films, Tony Stark has a pretty cool AI assistant that aids him with everything. Typically tasks a little more complex than controlling his home heating, but nonetheless we can add a bit of his style into our interactions.

To do this, you can update the agent instructions within AWS Bedrock. Navigate to your deployed agent within the AWS Bedrock Console and select the “Edit in Agent Builder” button at the top right. Then scroll down to the “Instructions for the Agent” box:

Then, add a little bit of J.A.R.V.I.S prompting in there:
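For example, you might append something along these lines to the existing instructions (illustrative wording, not the exact prompt I used):

“Adopt the persona of J.A.R.V.I.S., Tony Stark’s AI assistant. Address the user as ‘Sir’, keep your responses concise and politely witty, and confirm each heating change once it has been made.”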

Now, once you’ve saved and prepared the agent, you can go back to the chat client, make the request again and hey presto:

You can use the agent instructions for many things, such as tailoring the way it responds to users, how it handles requests, any safeguards you’d like to include, and so on.

The Tips

  • AWS Bedrock provides versioning of your agents, which is really useful for when you want to make controlled and audited changes to your model behaviour. This can become a nuisance however when you’re prototyping and want quick turnarounds. Instead of having to release a new version and alias with every change you can use the TSTALIASID alias to prepare and immediately test new model tweaks.
  • As always, YMMV when it comes to finding the right foundation model, so don’t be afraid to experiment but find a way to methodically evaluate each to find the one that’s best for you. I found some models were better at judging what the right API calls were to make.
  • Invest time in tweaking the OpenAPI spec so it’s easy for an LLM (or a human, for that matter) to understand the purpose of each endpoint and how it can be used. Replace specialised (or proprietary) terminology with common alternatives so the LLM is more likely to understand its purpose. For example with Tado, going further I’d probably reword the overlay endpoint description to refer to “active” or “current” heating overrides - see the fragment below.
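For instance, an improved description for the overlay endpoint might read something like this (illustrative wording only, not taken from Tado’s actual spec):

  /api/v2/homes/{homeId}/zones/{zoneId}/overlay:
    put:
      operationId: setZoneOverlay
      description: Set or update the active heating override for a room, for example to change its current target temperature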

The Pitfalls

  • Occasionally the agent (despite being told the Tado API expects JSON request payloads) would generate a request in XML. I’m not sure why, but I can only imagine it’s because the Bedrock agent’s internal context is written in XML, so sometimes it gets confused. A little prompt engineering coercing it to always generate API calls using JSON seemed to alleviate this.
  • It took a little while to remember how Next.js works and to catch up on server actions, and I got tripped up by the differences in runtime environment between client and server and forgetting what’s available where. Next.js makes it really easy to blend client- and server-side code, but as someone more used to completely separate front and back ends, it’s just a little unnerving.

The Use Cases

Most of what I’ve built here was mainly to satisfy my own curiosity about what AI agents can be capable of, but I can envisage some commercial uses for this too.

One in particular is freeform experimentation with user journeys. If you have an arsenal of existing API endpoints that power flows (on your website, for example), then opening them up to a chatbot interface like I’ve done here may reveal creative ways in which your customers actually want to interact with your product that you haven’t spotted before. Perhaps instead of that arduous 20-step process, your users can achieve the same result with a couple of guided prompts from an LLM?

I would suggest recording the different journeys users take here, identifying common flows and building these out in your existing product - AI agents are great for showing what’s possible, but should always be replaced by a more robust and well-tested flow.

Conclusion

I was able to prove that AI agents can be used for more than just enriching responses to users: through techniques like Chain of Thought, they are capable of understanding how to leverage an API to fulfil user requests. What I’ve shared here can be used as a foundational pattern for integrating 3rd party APIs and opening them up to new and novel user experiences.
