How to call a Lambda function in another account from AWS Redshift¶
Frequently when making ML predictions, you want to write those predictions back to your data warehouse. This is common in the case of lead scoring, customer churn predictions, or product recommendations based on past purchases. The source data lives in a warehouse, and the predictions should too.
Often the prediction model lives in AWS Lambda and the warehouse is AWS Redshift. Calling the lambda function from the Redshift cluster should be straightforward. However, often the data science AWS account containing the prediction models is different from the engineering or DBA AWS account where the Redshift cluster lives. This is frequently done to separate out billing or to assign the AWS costs to different cost centers.
Calling from one to the other becomes tricky. It's a problem we had to solve for our Redshift integration at Modelbit, and we couldn't find a guide to solve the problem. Fortunately we were able to make it work, and we're excited to share our solution with you!
Getting Started, and Running this Notebook¶
As this is a Python notebook, we'll provide runnable Python code so you can do all of this yourself. You can do all of this from the AWS console or the CLI as well. In any case, we'll need a few things to get started:
# We'll need boto3, the AWS SDK for Python, to do all the permissions work. We'll use redshift_connector
# to make the actual function call. Both are installable via pip.
import boto3

# Paste in the account IDs of the source account (with the Redshift cluster) and the destination account (with the Lambda function)
destination_aws_account_id = "123456789012"
source_aws_account_id = "987654321098"

# Let's also get the cluster identifier of the calling Redshift cluster (i.e. its name)
# and the ARN of the callee Lambda function
destination_lambda_arn = f"arn:aws:lambda:<REGION>:{destination_aws_account_id}:function:<FUNCTION_NAME>"
source_redshift_cluster_identifier = "<CLUSTER_IDENTIFIER>"
You'll note we assume you're already logged into AWS, as is usually the case. If not, most folks prefer to run aws configure at the command line. You can also pass AWS access keys directly to the boto3.client calls. To start, make sure you're logged into the AWS account that contains the Redshift cluster.
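If you'd rather not rely on a configured default profile, here's a minimal sketch of passing credentials explicitly. The key values are placeholders for your own source-account credentials, and the region is wherever your cluster lives:
# A minimal sketch of creating a boto3 client with explicit credentials instead of
# relying on `aws configure`. The key values below are placeholders for your own
# source-account credentials.
iam_client = boto3.client(
    'iam',
    aws_access_key_id = "<SOURCE_ACCOUNT_ACCESS_KEY_ID>",
    aws_secret_access_key = "<SOURCE_ACCOUNT_SECRET_ACCESS_KEY>",
    region_name = "<REGION>"
)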
Permissions: Role for Redshift¶
For this to work, we'll need to create an IAM role that Redshift will assume to run the lambda function. The role itself will need to be a Redshift service role, i.e. it has the ability to be assumed by AWS Redshift. Its trust policy document will look like so:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "redshift.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
You can create this role via the Create Role button on the IAM Roles page in the AWS console, via the CLI, or with this Python code. Here we'll name it RedshiftCallLambdaRole:
client = boto3.client('iam')
response = client.create_role(
    RoleName = 'RedshiftCallLambdaRole',
    AssumeRolePolicyDocument = """{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "redshift.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }"""
)
redshift_role_arn = response['Role']['Arn']
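One optional extra: new IAM entities can take a few seconds to propagate. If the next step complains that the role doesn't exist yet, a sketch like this waits until IAM reports it as visible:
# Optional: wait until IAM reports the new role as visible before attaching policies to it.
waiter = client.get_waiter('role_exists')
waiter.wait(RoleName = 'RedshiftCallLambdaRole')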
Permissions: Policy for Redshift¶
Now that we have a role, it needs a policy. The policy needs to allow the role, which Redshift will assume, to call the Lambda function in your other account. Let's create it here:
response = client.create_policy(
    PolicyName = 'RedshiftCallLambdaPolicy',
    PolicyDocument = """{
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": "%s"
            }
        ]
    }""" % (destination_lambda_arn) # destination_lambda_arn gets substituted into the "%s" earlier in the string
)
redshift_policy_arn = response['Policy']['Arn']
Permissions: Attach the Policy to the Role, and the role to your Redshift cluster¶
Let's attach the policy we just created to the role we made a moment ago:
client.attach_role_policy(
    RoleName = 'RedshiftCallLambdaRole',
    PolicyArn = redshift_policy_arn
)
Finally, we need to associate the role with the Redshift cluster so Redshift can use it when taking actions -- like calling our Lambda function:
client = boto3.client('redshift')
client.modify_cluster_iam_roles(
    ClusterIdentifier = source_redshift_cluster_identifier,
    AddIamRoles = [redshift_role_arn]
)
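The association isn't instantaneous. If you want to confirm it took, a quick optional check is to poll describe_clusters and watch the role's ApplyStatus flip from "adding" to "in-sync":
# Optional check: list the IAM roles attached to the cluster and their apply status.
cluster = client.describe_clusters(
    ClusterIdentifier = source_redshift_cluster_identifier
)['Clusters'][0]
print([(role['IamRoleArn'], role['ApplyStatus']) for role in cluster['IamRoles']])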
Voilà! We're officially done with the Redshift side of the permissions jungle.
Permissions: Resource-based policy for the Lambda function¶
The last piece of the permissions puzzle is to tell your lambda function to accept invocations from your Redshift cluster -- or specifically from the role your Redshift cluster will be acting as. That's the role you just created.
As always, you can do this from the CLI, or from the "Configuration" tab of your Lambda function in the AWS console. We'll give you the Python code below.
Important: Log out of AWS and back in! You're currently logged into AWS with the source account, but the Lambda function is in the destination account. You'll need to log into the destination account to continue. As before, do so with aws configure in the terminal, or pass keys into your boto3.client calls.
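If you keep credentials for both accounts as named profiles in your AWS config, another option is to build the client from a dedicated boto3 session. This is just a sketch; the profile name "destination" is an assumption about your local setup:
# A sketch of switching to the destination account via a named profile, assuming a
# profile called "destination" exists in your local AWS config. If you go this route,
# build the Lambda client from this session instead of the default client below.
destination_session = boto3.Session(profile_name = "destination")
lambda_client = destination_session.client('lambda')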
Now that you're logged into the destination account, let's give Lambda those extra permissions:
client = boto3.client('lambda')
client.add_permission(
    FunctionName = destination_lambda_arn, # This parameter also accepts full ARNs, which we make use of here
    StatementId = 'AllowCrossAccountRedshift',
    Action = 'lambda:InvokeFunction',
    Principal = redshift_role_arn
)
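To double-check that the statement landed, you can read back the function's resource-based policy (an optional sanity check):
# Optional sanity check: read back the function's resource-based policy and confirm
# the "AllowCrossAccountRedshift" statement is present.
import json
policy = client.get_policy(FunctionName = destination_lambda_arn)['Policy']
print(json.dumps(json.loads(policy), indent = 2))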
There you have it! redshift_role_arn now has permission to run lambda:InvokeFunction on your Lambda function.
Creating and executing a SQL function that runs your lambda function¶
Now that permissions are squared away, we're ready to call our cross-account Lambda function from our Redshift cluster! To do that, we're going to create a Redshift external function.
We're going to make use of redshift_connector, AWS's Python connector for Redshift, for this part. You'll need the usual Redshift connection info here.
import redshift_connector

conn = redshift_connector.connect(
    host = "<REDSHIFT HOST ADDRESS>",
    database = "<REDSHIFT DB NAME>",
    user = "<REDSHIFT USERNAME>",
    password = "<REDSHIFT PASSWORD>"
)
Now let's create the external function that calls our lambda function!
cursor = conn.cursor()
cursor.execute("""
CREATE EXTERNAL FUNCTION callMyLambda(int, varchar) -- the argument values go straight to Lambda
RETURNS varchar
STABLE
LAMBDA '%s'
IAM_ROLE '%s'
""" % (destination_lambda_arn, redshift_role_arn))
conn.commit() # persist the new function; redshift_connector doesn't autocommit by default
Boom! Now we have a Redshift function that calls out to a Lambda function in another account. It does so by specifying destination_lambda_arn, the full ARN of your Lambda function, and running as the role redshift_role_arn we created, which has the cross-account privileges.
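One last note on the Lambda side: Redshift invokes Lambda UDFs with a batched JSON payload and expects a matching JSON response -- each batch arrives as a list of argument rows under arguments, and the function must return a success flag plus one result per row. Here's a rough sketch of a handler following that contract; the string-formatting logic is purely illustrative:
# A rough sketch of the destination Lambda handler. Redshift sends the batched rows in
# event['arguments'] (one inner list per row) and expects JSON back with 'success' and a
# 'results' list of the same length (or 'error_msg' on failure). The logic below is
# illustrative only.
import json

def lambda_handler(event, context):
    rows = event['arguments']  # e.g. [[1, 'Hello, world!'], [2, 'Hi there']]
    try:
        results = [f"{arg2} (arg1={arg1})" for arg1, arg2 in rows]
        return json.dumps({"success": True, "results": results})
    except Exception as err:
        return json.dumps({"success": False, "error_msg": str(err)})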
You can call your function from Redshift any time:
conn.execute("SELECT callMyLambda(1, 'Hello, world!')")
That's all for now!¶
We hope this gives you a helpful guide to calling Lambda functions in separate AWS accounts. If you get stuck, please email me!
And if you're using this to deploy ML models into your warehouse, consider trying Modelbit. We think it's pretty cool. ;)