
AWS Kinesis Firehose and Teradata Vantage

Many Teradata customers are interested in integrating Vantage with Amazon Web Services (AWS) first-party services. This Getting Started Guide will help you connect Vantage with the AWS Kinesis service.

Wenjie Tehan

September 22, 2021


Although this approach has been implemented and tested internally, it is offered on an as-is basis. Neither AWS nor Teradata provides validation of Teradata Vantage with AWS services.

We encourage your feedback. We want to understand what you found useful and how we can improve this guide.

Please send your feedback to shamira.joshua@teradata.com and wenjie.tehan@teradata.com.

Disclaimer: This guide includes content from both AWS and Teradata product documentation.

Overview

AWS Kinesis is a streaming service that makes it easy to collect, process, and analyze real-time, streaming data.

The Kinesis streaming data platform offers Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. With Kinesis Data Streams, you manage the stream's capacity yourself, and records stay in the stream for a configurable retention period (24 hours by default), during which consumers can read and transform them. Kinesis Data Firehose is fully managed: it collects the data and delivers it to destinations such as Amazon S3, Amazon Redshift, Splunk, and Amazon Elasticsearch Service. Kinesis Video Streams is used to stream live video, and Kinesis Data Analytics processes and analyzes streaming data using standard SQL.
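To make the Firehose side concrete, here is a minimal producer sketch using the boto3 Firehose client's `put_record_batch` call. It assumes a boto3 client is passed in (for example `boto3.client("firehose")`) and a delivery stream whose name you supply; the event shape is hypothetical. Because Firehose by default concatenates records into S3 objects without adding separators, each record here is written as one line of JSON so that later steps can split the objects back into individual documents.

```python
import json
from typing import Any, Iterable, List

# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH_RECORDS = 500


def to_firehose_records(events: Iterable[dict]) -> List[dict]:
    """Serialize each event as one newline-terminated JSON line.

    Firehose does not insert delimiters between records by default, so the
    trailing newline keeps one JSON document per line in the S3 objects.
    """
    return [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in events]


def send_events(firehose: Any, stream_name: str, events: Iterable[dict]) -> int:
    """Send events in batches of up to 500 and return how many were rejected.

    `firehose` is assumed to be a boto3 Firehose client. Rejected records are
    only counted here, not retried.
    """
    records = to_firehose_records(events)
    failed = 0
    for start in range(0, len(records), MAX_BATCH_RECORDS):
        batch = records[start:start + MAX_BATCH_RECORDS]
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response["FailedPutCount"]
    return failed
```

In a real producer you would inspect `RequestResponses` in each response and resend the entries that carry an `ErrorCode`; this sketch leaves that out to keep the batching logic visible.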

Teradata Vantage Native Object Store (NOS) makes it easy for users to explore data in external object stores such as Amazon S3 using standard SQL and application interfaces such as the ODBC, JDBC, .NET, Python, and R native drivers. No special compute infrastructure on the object storage side is required to use NOS. You can explore data in an Amazon S3 bucket simply by creating a NOS table definition that points to a bucket you are authorized to access.
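As an illustration of such a table definition, the sketch below builds two Teradata SQL statements: an authorization object holding the AccessKeyId and SecretAccessKey, and a foreign table whose `LOCATION` points at the JSON bucket. It follows the `CREATE AUTHORIZATION ... AS DEFINER TRUSTED` and `CREATE FOREIGN TABLE ... USING ( LOCATION (...) )` forms from the NOS documentation, but the authorization name, table name, and prefix are hypothetical and the exact options your Vantage release supports should be checked against its documentation. The function only produces strings; it does not connect to Vantage.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def nos_ddl(auth_name: str, table_name: str, bucket: str, prefix: str,
            access_key_id: str, secret_access_key: str) -> List[str]:
    """Return [CREATE AUTHORIZATION, CREATE FOREIGN TABLE] statements for NOS.

    auth_name and table_name are hypothetical identifiers chosen by the caller;
    the location uses the /s3/<bucket>.s3.amazonaws.com/<prefix>/ form.
    """
    clean_prefix = prefix.strip("/")
    path = clean_prefix + "/" if clean_prefix else ""
    location = f"/s3/{bucket}.s3.amazonaws.com/{path}"
    create_auth = (
        f"CREATE AUTHORIZATION {auth_name} AS DEFINER TRUSTED "
        f"USER {sql_literal(access_key_id)} "
        f"PASSWORD {sql_literal(secret_access_key)};"
    )
    create_table = (
        f"CREATE FOREIGN TABLE {table_name}, "
        f"EXTERNAL SECURITY DEFINER TRUSTED {auth_name} "
        f"USING ( LOCATION ({sql_literal(location)}) );"
    )
    return [create_auth, create_table]
```

For example, `nos_ddl("DefAuth", "stream_json", "awspilbucket", "", key_id, secret)` targets the JSON bucket used later in this guide; the resulting statements can then be run through any of the drivers listed above, followed by an ordinary `SELECT` against the foreign table.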

This guide describes how to stream data from a source to Amazon S3 through AWS Kinesis Data Firehose, transform it into JSON format with an AWS Glue ETL job, and then use Teradata NOS to access the data in Amazon S3. Lambda functions and a CloudWatch event rule are also created to automate the whole process.
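The automation step can be wired in several ways. One possible sketch, assuming a Lambda function subscribed to S3 object-created notifications on the streaming bucket, extracts each new object's location and starts the Glue ETL job for it with the boto3 Glue client's `start_job_run`. The job name and the `--input_path` job argument are hypothetical; the Glue script would need to read that argument itself (for example with `getResolvedOptions`).

```python
import urllib.parse
from typing import Any, Callable, Dict, List


def s3_uris_from_event(event: Dict[str, Any]) -> List[str]:
    """Return s3:// URIs for the objects listed in an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (spaces as '+'), so decode them first.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def make_handler(glue: Any, job_name: str) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Build a Lambda handler that starts one Glue job run per new S3 object.

    `glue` is assumed to be a boto3 Glue client; `job_name` is hypothetical.
    """
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        run_ids = []
        for uri in s3_uris_from_event(event):
            response = glue.start_job_run(
                JobName=job_name,
                # Hypothetical argument that the Glue script is assumed to read.
                Arguments={"--input_path": uri},
            )
            run_ids.append(response["JobRunId"])
        return {"started": run_ids}
    return handler
```

In the Lambda module this would be exposed as, for example, `lambda_handler = make_handler(boto3.client("glue"), "firehose-to-json")`. Starting one run per object is the simplest wiring, but a Glue job's maximum concurrent runs defaults to 1, so a scheduled CloudWatch event rule that processes a batch of objects per run may scale better.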

Prerequisites

You are expected to be familiar with the AWS Kinesis, Lambda, and CloudWatch services, as well as Teradata Vantage.
You will need the following accounts and systems:

•   An AWS account
•   A Teradata Vantage instance with SQLE 17.0+
•   An Amazon S3 bucket to store streaming data
•   An Amazon S3 bucket to store JSON files
•   IAM roles that grant access for the Glue crawler, Glue ETL, and Lambda services
•   An AWS AccessKeyId and SecretAccessKey

Getting Started

Create Amazon S3 buckets
Amazon S3 buckets can be created by following the instructions in the Amazon S3 documentation. Two buckets are needed in this example: one to store the streaming data (e.g., ptctstoutput) and another to store the JSON files produced by the transformation (e.g., awspilbucket).
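If you prefer to script this step, a minimal sketch with the boto3 S3 client's `create_bucket` call is shown below. It assumes a client such as `boto3.client("s3", region_name=region)` is passed in. The one detail worth encoding is that `us-east-1` must not be sent as a `LocationConstraint`, while every other region requires it.

```python
from typing import Any, Iterable, List


def create_buckets(s3: Any, names: Iterable[str], region: str) -> List[str]:
    """Create each bucket in `region` and return the names that were requested.

    `s3` is assumed to be a boto3 S3 client for the same region.
    """
    created = []
    for name in names:
        if region == "us-east-1":
            # us-east-1 is the default location; passing it explicitly is rejected.
            s3.create_bucket(Bucket=name)
        else:
            s3.create_bucket(
                Bucket=name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )
        created.append(name)
    return created
```

For this guide that would be `create_buckets(boto3.client("s3", region_name="us-west-2"), ["ptctstoutput", "awspilbucket"], "us-west-2")`, with the region being whatever region you run the pipeline in. Bucket names are globally unique, so you will likely need your own variants of the example names.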

Create IAM role
AWS services require you to use roles to allow a service to access resources in other services on your behalf. In this example, three roles are needed: one for Kinesis Firehose, one for Glue, and one for Lambda.
The Kinesis Firehose role is created on the fly. The instructions below create the roles for the Glue and Lambda services.
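Each of these roles needs a trust policy naming the service that may assume it. The sketch below builds that policy for the Glue and Lambda service principals and creates the role with the boto3 IAM client's `create_role`, assuming the client is passed in (for example `boto3.client("iam")`). Role names are hypothetical, and the sketch covers only the trust relationship, not the permissions the role grants.

```python
import json
from typing import Any, Dict

# Service principals allowed to assume the Glue and Lambda roles.
SERVICE_PRINCIPALS = {
    "glue": "glue.amazonaws.com",
    "lambda": "lambda.amazonaws.com",
}


def trust_policy(service_principal: str) -> Dict[str, Any]:
    """Return an assume-role (trust) policy for a single AWS service principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_service_role(iam: Any, role_name: str, service: str) -> str:
    """Create a role that `service` ("glue" or "lambda") can assume; return its ARN."""
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy(SERVICE_PRINCIPALS[service])),
    )
    return response["Role"]["Arn"]
```

A trust policy only controls who may assume the role. Permissions still have to be attached separately, for example with `iam.attach_role_policy`, using the AWS managed policies `arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole` and `arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole` plus access to the two S3 buckets.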

Create Firehose Delivery Stream

Create Glue ETL Transformation Job

Accessing Streaming Data Using NOS

Create Lambda Functions, Trigger, and CloudWatch Event

Run
