Many Teradata customers are interested in integrating Teradata Vantage with AWS first-party services. This Getting Started Guide will help you connect Teradata Vantage with the AWS Kinesis service.
Although this approach has been implemented and tested internally, it is offered on an as-is basis. Neither AWS nor Teradata provides validation of Teradata Vantage with AWS services.
We encourage your feedback. We want to understand what you found useful and how we can improve this guide.
Please send your feedback to shamira.joshua@teradata.com and wenjie.tehan@teradata.com.
Disclaimer: This guide includes content from both AWS and Teradata product documentation.
Overview
AWS Kinesis is a streaming service that makes it easy to collect, process, and analyze real-time, streaming data.
The Kinesis streaming data platform offers Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. Kinesis Data Streams is manually managed and can retain data in the stream for up to seven days, during which the data can be transformed. Kinesis Data Firehose is fully managed; it collects data and delivers it to destinations such as Amazon S3, Redshift, Splunk, and Elasticsearch. Kinesis Video Streams is used to stream live video, and Kinesis Data Analytics processes and analyzes streaming data using standard SQL.
Teradata Vantage Native Object Store (NOS) makes it easy for users to explore data in external object stores like Amazon S3 using standard SQL and application interfaces like ODBC, JDBC, .NET, Python and R native drivers. No special object storage-side compute infrastructure is required to use NOS. You can explore data located in an Amazon S3 bucket by simply creating a NOS table definition that points to a bucket you are authorized to access.
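To make the idea of a NOS table definition concrete, the following is a minimal sketch that builds the DDL text for a foreign table pointing at an S3 bucket. The table name kinesis_json and the authorization name DefAuth are hypothetical; the bucket name awspilbucket is the JSON bucket used later in this guide. The DEFINER TRUSTED form is an assumption based on the 17.0-era NOS syntax, so check it against the NOS documentation for your release. The authorization object holding the AccessKeyId and SecretAccessKey is assumed to exist already.

```python
def nos_location(bucket: str, prefix: str = "") -> str:
    """Build the NOS LOCATION string for an Amazon S3 bucket and optional prefix."""
    path = f"/s3/{bucket}.s3.amazonaws.com/"
    if prefix:
        path += prefix.strip("/") + "/"
    return path


def foreign_table_ddl(table: str, authorization: str, bucket: str, prefix: str = "") -> str:
    """Return CREATE FOREIGN TABLE text (assumed 17.0-style syntax).

    `authorization` names an existing authorization object that stores
    the S3 credentials; it is not created here.
    """
    return (
        f"CREATE FOREIGN TABLE {table}\n"
        f", EXTERNAL SECURITY DEFINER TRUSTED {authorization}\n"
        f"USING (LOCATION('{nos_location(bucket, prefix)}'));"
    )


# Hypothetical table and authorization names, for illustration only.
ddl = foreign_table_ddl("kinesis_json", "DefAuth", "awspilbucket")
print(ddl)
```

The resulting statement can be submitted through any of the interfaces listed above; building it as a string keeps the sketch independent of a particular driver.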
This guide describes how to stream data from a source to Amazon S3 via AWS Kinesis Firehose, transform it to JSON format with an AWS Glue ETL job, and then use Teradata NOS to access the data in Amazon S3. Lambda functions and a CloudWatch event rule are also created to automate the whole process.
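The first stage of this pipeline, pushing source records into a Firehose delivery stream, can be sketched as follows. This is an illustration under stated assumptions, not the producer used in this guide: the source is assumed to be an iterable of text lines, the stream name is hypothetical, and firehose_client is assumed to be a boto3 Firehose client (boto3.client("firehose")) created by the caller. The sketch respects the 500-record limit of PutRecordBatch but does not check the per-call size limit or retry failed records.

```python
from typing import Iterable, Iterator, List

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_batches(lines: Iterable[str], size: int = MAX_BATCH) -> Iterator[List[dict]]:
    """Group text lines into Firehose record batches of at most `size` records."""
    batch: List[dict] = []
    for line in lines:
        # Terminate each record with a newline so objects written to S3
        # remain line-delimited.
        batch.append({"Data": (line.rstrip("\n") + "\n").encode("utf-8")})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_lines(firehose_client, stream_name: str, lines: Iterable[str]) -> int:
    """Send all lines to the delivery stream; return the number of failed records."""
    failed = 0
    for batch in to_batches(lines):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

A caller would pass a real client and the name of an existing delivery stream, then inspect the returned count to decide whether any records need to be resent.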
Prerequisites
You are expected to be familiar with AWS Kinesis, Lambda, CloudWatch services, and Teradata Vantage.
You will need the following accounts and systems:
• An AWS account
• A Teradata Vantage instance with SQLE 17.0+
• An Amazon S3 bucket to store streaming data
• An Amazon S3 bucket to store JSON files
• IAM roles for the Glue crawler, Glue ETL, and Lambda services
• AccessKeyId and SecretAccessKey
Getting Started
Create Amazon S3 buckets
Amazon S3 buckets can be created using instructions here. Two buckets are needed in this example: one to store streaming data (ptctstoutput in this example), and another to store JSON files after transformation (awspilbucket in this example).
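As an alternative to the console, the two buckets could be created programmatically. The sketch below assumes s3_client is a boto3 S3 client (boto3.client("s3")) supplied by the caller, and that the region is chosen by you. Bucket names are globally unique, so the example names from this guide will most likely need to be replaced with your own.

```python
from typing import Iterable, List


def create_buckets(s3_client, names: Iterable[str], region: str) -> List[str]:
    """Create one S3 bucket per name in the given region and return the names."""
    created = []
    for name in names:
        kwargs = {"Bucket": name}
        # us-east-1 is the default location and must not be passed as a
        # LocationConstraint; every other region requires it.
        if region != "us-east-1":
            kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
        s3_client.create_bucket(**kwargs)
        created.append(name)
    return created
```

For example, create_buckets(client, ["my-stream-bucket", "my-json-bucket"], "us-west-2") would create one bucket for the raw stream output and one for the transformed JSON files; both names here are hypothetical.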
Create IAM role
AWS services require you to use roles to allow a service to access resources in other services on your behalf. In this example, three roles are needed: a role for Kinesis Firehose, a role for Glue, and a role for Lambda.
The Kinesis Firehose role will be created on the fly. The instructions below create roles for the Glue and Lambda services.
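What lets a service act on your behalf is the role's trust policy, which names the service principal allowed to assume the role. The sketch below builds such a policy and creates a role with it. It assumes iam_client is a boto3 IAM client (boto3.client("iam")) supplied by the caller, and the role names in the usage note are hypothetical. Permissions policies (for example, access to the S3 buckets) still have to be attached separately and are not covered here.

```python
import json


def trust_policy(service_principal: str) -> str:
    """Return a trust policy JSON document allowing the service to assume the role."""
    return json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {"Service": service_principal},
                    "Action": "sts:AssumeRole",
                }
            ],
        }
    )


def create_service_role(iam_client, role_name: str, service_principal: str) -> str:
    """Create a role the given service can assume and return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=trust_policy(service_principal),
    )
    return response["Role"]["Arn"]
```

The Glue role would use the principal glue.amazonaws.com and the Lambda role lambda.amazonaws.com, for instance create_service_role(client, "my-glue-role", "glue.amazonaws.com").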
Create Firehose Delivery Stream
Create Glue ETL Transformation Job
Accessing Streaming Data Using NOS
Create Lambda functions, Trigger, and CloudWatch event
Run