INSIGHT:
Using AWS Bedrock Agents to control 3rd party APIs
There are many articles out there about Retrieval Augmented Generation (RAG) and the use of tools / functions / APIs to enrich responses with contextual information.
AWS Bedrock (and other managed AI services) these days has built-in capabilities for tool use and knowledge bases to help with this. But instead of using them to augment the information returned to users, I wanted to know: could an agent use those same capabilities to actually control a third-party system on a user’s behalf?
And whilst I was having this inner monologue, I was staring at my smart home heating app and I thought… why not get it to control my central heating?

I have a Tado smart heating system - a number of smart thermostatic valves attached to my radiators, with accompanying thermostats to control my boiler. This allows me to independently control the temperature in all of my rooms through an app, heating my home efficiently (and only when I’m using specific rooms). You can find out more about it here - https://www.tado.com/

So I went about building an AI agent capable of taking the instruction manual of a 3rd party API and a user request, and figuring out what calls to make to fulfil the request. This is what I ended up with - a fully working chatbot client shamelessly ripped from Vercel’s AI Chatbot Starter Template, along with Terraform to deploy a Bedrock Agent:
https://github.com/foyst/tado-bedrock-agent-app
You can see more about Vercel’s starter template here:
https://chat.vercel.ai/
https://github.com/vercel/ai-chatbot
To build an agent similar to this, you need:
- An OpenAPI spec describing the 3rd party API you want the agent to control
- A Bedrock Agent configured with that spec as an action group (deployed here via Terraform)
- A chat client that handles the agent’s return-of-control requests and makes the authenticated API calls
The AI agent is capable of taking an OpenAPI spec and a user request, and deciding what calls it needs to make and in what order. It can make single simple requests or more complex chained calls. It’s not perfect, but it can effectively act as dynamic middleware, flexing to whatever request is made.
AWS Bedrock is capable of directly calling third-party APIs, given either an OpenAPI spec or a function definition (if you’d prefer to call Lambda functions directly within AWS). The challenge with Tado is that it requires authentication, so it’s not something an AI agent can do on its own.

Instead of making API calls directly, AWS Bedrock can also return control to the calling system by providing details of the API call it wants to make. I used this to build a thin chat client that passes user requests to Bedrock, makes authenticated calls to Tado using the API details the agent suggests, passes the Tado response back to the agent, and finally creates a response to the user.
The front-end client itself does nothing more than take the URL, method and JSON payload the agent generates, and forward them on to Tado with an authorisation header. I used a third-party Node.js library (https://github.com/mattdavis90/node-tado-client) to handle this.
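As a rough idea of what that looks like, here’s a minimal sketch using node-tado-client (the TADO_USERNAME and TADO_PASSWORD environment variable names are my own, and the authentication method varies between library versions):

import { Tado } from "node-tado-client";

// A minimal sketch: authenticate once, then forward whatever calls the
// agent suggests. Older releases of the library accept username/password;
// newer ones use a device-code flow instead.
const tadoClient = new Tado();
await tadoClient.login(process.env.TADO_USERNAME!, process.env.TADO_PASSWORD!);

// Forward an agent-suggested call, e.g. listing the zones in my home
const zones = await tadoClient.apiCall("/api/v2/homes/<my_home_id>/zones", "GET");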
So why use a service like AWS Bedrock? I was building this stuff from scratch in 2023 - creating AI PoCs using GPT-3 at the time that could take user requests and, combined with vector databases and embeddings, provide contextual and accurate responses. Bedrock does a lot of this heavy lifting now, providing integrations with tooling such as S3 for storing knowledge bases and OpenAPI specs for integrating other tools.
Importantly, it also utilises techniques like Chain of Thought (CoT) and provides tracing information to open up the “black box” of AI a little, making it easier to understand how these agents arrive at the responses they generate.
For example, when testing the agent within the Bedrock UI you can provide canned API responses for the request it makes, and then view the trace steps it uses to generate its response. On the right in the screenshot below you can see the agent's “thought process” within the <thinking></thinking> tags.
There are a few key segments of code in my repository that handle the processing of the Agent and Tado interactions, which I’ll discuss here.
The first step is to create an instance of the Bedrock Agent client:
import {
  BedrockAgentRuntimeClient,
  InvokeAgentCommand,
  InvokeAgentRequest,
  ResponseStream,
} from "@aws-sdk/client-bedrock-agent-runtime";

const bedrockClient = new BedrockAgentRuntimeClient({
  region: "us-east-1",
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN!,
  },
});
N.B. always try to avoid passing in credentials like this if your environment allows other methods of sourcing an IAM role. I do this purely for demo purposes.
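If your environment can resolve credentials itself (an instance profile, SSO session or environment variables, for example), the SDK’s default credential provider chain will pick them up automatically and you can drop the credentials block entirely:

// Let the AWS SDK v3 default provider chain source credentials
const bedrockClient = new BedrockAgentRuntimeClient({ region: "us-east-1" });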
Next we craft an agent request, which takes the initial user prompt and the agentId of the deployed Bedrock Agent. Sending this to the bedrockClient results in the first API call the agent wants to make:
const agentInput: InvokeAgentRequest = {
  agentId: process.env.AGENT_ID!,
  agentAliasId: "TSTALIASID", // Bedrock's built-in alias for the working draft of an agent
  sessionId: chatId,
  inputText: input,
};

const command = new InvokeAgentCommand(agentInput);
const response = await bedrockClient.send(command);
Next, I pass this response into a method called processResponseCompletion that does the following:
const completion = await processResponseCompletion(
  agentInput,
  response.completion
);

async function processResponseCompletion(
  agentInput: InvokeAgentRequest,
  chunks: AsyncIterable<ResponseStream>
): Promise<string | undefined> {
...
To process an agent response, we iterate over the chunk events in the response stream:
for await (const chunkEvent of chunks) {
Then we check whether that chunk contains details of an API request the agent wants to make (which is returned in the form of the returnControl metadata):
if (chunkEvent.returnControl !== undefined) {
  const apiInvocationInput =
    chunkEvent.returnControl.invocationInputs![0].apiInvocationInput!;
If an API call is suggested by the agent, we parse the details into a request and pass it to Tado:
  const url = apiInvocationInput.apiPath!;
  const apiParameters = apiInvocationInput.parameters!;
  …
  response = await tadoClient.apiCall(parameterisedUrl, method, data);
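The elided step above turns the agent’s apiPath and parameters into the parameterisedUrl that gets sent to Tado. A minimal sketch of how that substitution might look (the helper name and shape are my own, not the repository’s exact code):

// Hypothetical helper: substitute path template variables such as {homeId}
// in /api/v2/homes/{homeId}/zones with the values the agent supplies.
// Query-string parameters would need similar handling.
function parameteriseUrl(
  apiPath: string,
  parameters: { name?: string; value?: string }[]
): string {
  let url = apiPath;
  for (const param of parameters) {
    if (param.name && param.value) {
      url = url.replace(`{${param.name}}`, param.value);
    }
  }
  return url;
}

const parameterisedUrl = parameteriseUrl(url, apiParameters);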
Then we take the response from the Tado API and craft another InvokeAgentRequest, passing in the original agentInput combined with the response from Tado:
const agentInputWithTadoResponse: InvokeAgentRequest = {
  ...agentInput,
  sessionState: {
    returnControlInvocationResults: [
      // ReturnControlInvocationResults
      {
        // InvocationResultMember Union: only one key present
        apiResult: {
          // ApiResult
          actionGroup: apiInvocationInput.actionGroup, // required
          agentId: agentInput.agentId,
          apiPath: apiInvocationInput.apiPath,
          confirmationState: "CONFIRM",
          httpMethod: apiInvocationInput.httpMethod,
          httpStatusCode: 200,
          responseBody: {
            TEXT: {
              body: JSON.stringify(response),
            },
          },
        },
      },
    ],
    invocationId: chunkEvent.returnControl.invocationId,
  },
};

const command = new InvokeAgentCommand(agentInputWithTadoResponse);
This updated agent request is then passed back to Bedrock:
const updatedResponse = await bedrockClient.send(command);
And finally, to handle the agent wanting to make subsequent requests, the loop calls processResponseCompletion recursively until the agent has exhausted all the calls it needs to fulfil the request:
const finalResponse = await processResponseCompletion(
  agentInputWithTadoResponse,
  updatedResponse.completion!
);
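One piece not shown above is how the final answer itself comes out of the stream: once the agent has no more API calls to make, the chunk events carry the response text as UTF-8 bytes rather than returnControl metadata. A minimal sketch of that branch (my own reconstruction, assuming the standard chunk shape from the SDK):

// Inside the for-await loop: accumulate plain text chunks into the string
// that processResponseCompletion eventually returns to the chat client.
let completion = "";
for await (const chunkEvent of chunks) {
  if (chunkEvent.chunk?.bytes) {
    completion += new TextDecoder("utf-8").decode(chunkEvent.chunk.bytes);
  }
  // ...returnControl handling as shown above...
}
return completion;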
https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/bedrock/
https://docs.aws.amazon.com/bedrock/latest/userguide/agents-returncontrol.html
https://docs.aws.amazon.com/bedrock/latest/userguide/agents-session-state.html#session-state-return-control
https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-agent-runtime_example_bedrock-agent-runtime_InvokeAgent_section.html
With the agent deployed and the chatbot web client running, you can do some pretty powerful stuff. In the above screenshot I sent Bedrock a relatively complex request involving querying the Tado API and then presenting the results in a tabular format. A projection of the data in this format isn’t something Tado can do out of the box.

The real neato bit about Bedrock is that it can make as many API calls as necessary within a single request to fulfil the user’s prompt.
To fulfil the above request, Bedrock made this API call to Tado:
/api/v2/homes/<my_home_id>/zones GET
After learning the corresponding zoneIds for the rooms I requested, it then knew to make the following calls to get their respective temperatures:
/api/v2/homes/<my_home_id>/zones/1/state GET
/api/v2/homes/<my_home_id>/zones/3/state GET
/api/v2/homes/<my_home_id>/zones/6/state GET
Remember, at this point the only code I’ve written parses the API invocation request the agent suggests, passes it to Tado, then passes the result back to the agent. From the OpenAPI spec alone, it figured out which endpoints to call.
Let’s make things a little more interesting. Up to this point, all of the API calls Bedrock has made have been GET requests. In this next prompt, it will need to submit a new desired target temperature to my Tado system:
This time, the agent made these requests:
/api/v2/homes/<my_home_id>/zones GET
/api/v2/homes/<my_home_id>/zones/1/state GET
/api/v2/homes/<my_home_id>/zones/1/overlay PUT
{
  "termination": { "type": "MANUAL" },
  "type": "MANUAL",
  "setting": { "type": "HEATING", "power": "ON", "temperature": { "celsius": 20.4 } }
}
Using the Tado API spec, it knew to call the state endpoint to get the current room temperature, increase it by 3 degrees and then pass that in as the JSON payload to the overlay endpoint. To date, LLMs have been notoriously bad at maths. In my experience with the Claude models, however, it not only got the calculation right but was also smart enough to increase the temperature based on the current room temperature (and not the set temperature, which would’ve just resulted in the target matching what the room already was).
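For comparison, here’s roughly what that flow looks like written by hand against the Tado API (a sketch: the zone number and the state response shape are assumptions based on the calls above):

// Read the current room temperature for zone 1, then set a manual overlay
// 3 degrees warmer - the same sequence the agent worked out for itself.
const state = await tadoClient.apiCall(
  "/api/v2/homes/<my_home_id>/zones/1/state",
  "GET"
);
const currentCelsius = state.sensorDataPoints.insideTemperature.celsius;

await tadoClient.apiCall("/api/v2/homes/<my_home_id>/zones/1/overlay", "PUT", {
  termination: { type: "MANUAL" },
  type: "MANUAL",
  setting: {
    type: "HEATING",
    power: "ON",
    temperature: { celsius: currentCelsius + 3 },
  },
});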
For anyone familiar with the Iron Man comics and films, Tony Stark has a pretty cool AI assistant that aids him with everything. Typically that’s tasks a little more complex than controlling his home heating, but nonetheless we can add a bit of his style into our interactions.
To do this, you can update the agent instructions within AWS Bedrock. Navigate to your deployed agent within the AWS Bedrock Console and select the “Edit in Agent Builder” button at the top right. Then scroll down to the “Instructions for the Agent” box:
Then, add a little bit of J.A.R.V.I.S. prompting in there:
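Something along these lines works (an illustrative example rather than the exact prompt from the screenshot):

Respond to all requests in the style of J.A.R.V.I.S. from Iron Man: polite, dryly witty and unfailingly helpful. Address the user as “Sir”.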
Now, once you’ve saved and prepared the model, you can go back to the chat client and make the request again and hey presto:
You can use the agent instructions for many things, such as tailoring the way it responds to users, how it handles requests, safeguards you’d like to include, etc.
Most of what I’ve built here was mainly to satisfy my own curiosity about what AI agents can be capable of, but I can envisage some commercial usages for this too.

One in particular is freeform experimentation with user journeys. If you have an arsenal of existing API endpoints that power flows (in your website, for example), then by opening them up to a chatbot interface like I’ve done here you may discover creative ways in which your customers actually want to interact with your product that you haven’t spotted before. Perhaps instead of that 20-step arduous process, your users can achieve the same result using a couple of guided prompts from an LLM?

I would suggest recording the different journeys users take here, identifying common flows and building these out in your existing product - AI agents are great for showing what’s possible, but should always be replaced by a more robust and well-tested flow.
I was able to prove that AI agents can be used for more than just enriching user requests: through techniques like Chain of Thought, they’re capable of understanding how to leverage an API to fulfil user requests. What I’ve shared here can be used as a foundational pattern for integrating and opening up 3rd party APIs to new and novel user experiences.