Hi, I’m Scott Dykstra. I lead Teradata’s Cloud Solutions team. I’m a data scientist, and I help Teradata’s largest global customers deploy their analytics environments. You already know that Teradata is the undisputed industry leader for enterprise analytics, but you’d probably be surprised by how optimized we are for the cloud. Today I’m going to demonstrate Teradata’s integration with cloud services, our self-service simplicity, and a couple of powerful analytic functions. Let’s get started.

I have three demonstrations today. I’m going to start with Vantage Developer, where I can run code and show machine learning engine and advanced SQL engine functions embedded within SQL, as well as the native object store capability to pull data in from an object store. Then I’m going to show Vantage Analyst, where we can review the result set of analytic functions like nPath, for example, without running any code. Finally, I’m going to show Vantage Operations, where we can manage and monitor sites and perform DBA activities like managing our backups and scaling up and down.

So let’s get started with Vantage Developer. The first use case I’m going to demonstrate is the native object store, where we can directly access object store services like S3 on AWS or Azure Blob Storage. Let’s say we have a data set of customer churn events for telco data, and let’s say we have call detail records for our customers stored in S3. They’re infrequently accessed, so we put them in S3, and we have our customer record, which is much more frequently accessed, in Vantage. The first thing we’d want to do to join the S3 data with the data in Vantage using native object store is to create a foreign table object that points Vantage to the S3 bucket or file. That CREATE FOREIGN TABLE statement simply tells Vantage where in S3 the data resides. So I’m going to run this CREATE FOREIGN TABLE statement in my code editor, and it creates a new object in Vantage that simply tells Vantage where in S3 the data is. Now an end user like a business analyst or a data scientist can run queries against this table as if it were simply a table within Vantage; they don’t have to know exactly where the S3 bucket is or type in the S3 location.

Now that we have this foreign table with access to S3, let’s pull in this customer call detail record ad hoc from S3 and join it with our core customer record data in Vantage. With just four lines of SQL, I can create that join. So I’m going to take these four lines of SQL and paste them into my code editor, and when I run this query, I’m joining data ad hoc from S3 with the customer record in Vantage. You’ll see the result set here as a new joined table of our call detail records in S3 with our customer record in Vantage.

Oh, great question: “Why not store everything in S3?” Well, at the enterprise level, performance is important. A lot of our customers put their frequently accessed data in Vantage for high performance, and newer or less frequently accessed data sets they store in S3 at lower cost.
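To make that flow concrete, here is a minimal sketch of the two statements from this part of the demo. The database, table, column, and bucket names are hypothetical stand-ins rather than the ones used on screen; a real deployment would typically also reference an authorization object for the S3 credentials, and the exact USING options vary by file format and Vantage version.

```sql
-- Create a foreign table that points Vantage at an S3 location.
-- All names here (telco.cdr_s3, the bucket, the columns) are hypothetical.
CREATE FOREIGN TABLE telco.cdr_s3 (
    customer_id BIGINT,
    event_type  VARCHAR(64),
    event_ts    TIMESTAMP(0)
)
USING (
    LOCATION ('/s3/my-demo-bucket.s3.amazonaws.com/cdr/')
    ROWFORMAT ('{"field_delimiter":",", "record_delimiter":"\n"}')
);

-- The foreign table now queries like any other table in Vantage...
SELECT TOP 10 * FROM telco.cdr_s3;

-- ...so the ad hoc join with the in-Vantage customer record is four lines of SQL.
SELECT c.customer_id, c.plan_type, cdr.event_type, cdr.event_ts
FROM   telco.customers AS c
JOIN   telco.cdr_s3    AS cdr
       ON c.customer_id = cdr.customer_id;
```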
So now that we have this joined table, let’s perform some analytics on it. We have a joined table of call detail records from S3 with our customer record in Teradata, and now let’s use Teradata’s powerful nPath function to view the paths of our customers to an outcome; in this case, it’s telco churn.

You’ll see here that this nPath function is an advanced SQL function from Teradata that we can embed within standard SQL, and we can provide some parameters about how we want to run it: what table to run it on, where to partition it, how to order it, and so on (a sketch of the full call appears below). This nPath function, which I’m now running, is going to take this joined table of S3 data and data within Vantage, and it’s going to show us the paths of our customers to an outcome, in this case telco churn. As you see from the result set here, we can see the first step the customer took, maybe calling into a call center, and the final step, closing an account.

“What if you used Spark?” Great question: why would I want to use Vantage instead? Well, three huge benefits. First, SQL users can benefit from these machine learning engine and advanced SQL engine functions. Second, you don’t have to export data out of the data warehouse into another analytics platform like Spark. And finally, it’s a single workflow deployed at scale: you can run this single workflow using the SQL you’re already using, where the data already resides, and deploy it at scale with Vantage in the cloud or on-premises.
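Before we move on, here is a rough sketch of the shape of that embedded nPath call. The table, column, and symbol names (telco.customer_events, event_type, event_ts) are hypothetical, and the pattern, any run of non-churn events ending in account closure, is just one illustrative way to express the analysis; treat it as a sketch of the syntax rather than the exact query from the demo.

```sql
-- nPath runs as a table operator inside standard SQL; names are hypothetical.
SELECT * FROM nPath (
    ON telco.customer_events          -- the joined S3 + Vantage data
    PARTITION BY customer_id          -- one path per customer
    ORDER BY event_ts                 -- events in time order
    USING
        Mode (NONOVERLAPPING)
        -- any run of non-churn events that ends in account closure
        Pattern ('EVENT*.CHURN')
        Symbols (event_type <> 'Account Closure' AS EVENT,
                 event_type =  'Account Closure' AS CHURN)
        Result (FIRST (event_type OF ANY (EVENT, CHURN)) AS first_step,
                ACCUMULATE (event_type OF ANY (EVENT, CHURN)) AS path,
                LAST (event_type OF CHURN) AS last_step)
) AS dt;
```

The Symbols clause tags each row, and the Pattern clause is a regular-expression-like sequence over those tags; that combination is what turns path-to-outcome questions into a few lines of SQL.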
We just demonstrated, using the developer environment, how easy it is to embed machine learning engine and advanced SQL engine functions like nPath, joining data ad hoc from S3 with data already in Vantage. But let’s say we wanted to review the result set of that without running any code. Opening up our Vantage Analyst environment here, you’ll see that we have four different capabilities: a workflow editor, where we can create and manage end-to-end data and analytic workflows; rules, where we can define data transformations and analytic calculations; predictive models, where we can create sophisticated models without running any code; and path analysis, where we can explore the paths of our customers to a certain outcome, in this case telco churn.

So let’s take that nPath function we just ran in our code editor and actually view the result set of that joined table and see how our customers are getting to a churn event. What is the path? I’m going to log into this system, reconnect to that telco churn database, choose the start and end dates for the analysis, and now we’re running the nPath function behind the scenes using Teradata’s machine learning engine. And now we have a powerful visualization of our customers from the start of an account all the way to account closure. You’ll see that a lot of the time the path runs from viewing the account summary online, to reviewing the contract, to reviewing the bill, to calling into the call center about a bill dispute, for example. Maybe it’s a service complaint: 9,350 customers in this data set viewed their account summary and then went to a service complaint. Telco companies want to understand why customers are churning, what takes them from logging onto the website to a bill dispute to a service complaint to account closure, and this nPath function, which we can simply embed within Teradata SQL or visualize via Vantage Analyst, lets us understand that in a unique visualization.

Great question: “How do I productionalize this?” Well, Teradata is unique in its flexibility to deploy these machine learning engine and native object store capabilities both on-premises and in the public cloud: AWS, Azure, and soon GCP. So we can productionalize this on-premises or in the cloud; we’re not limited in where we can deploy these capabilities.

Finally, I want to show the Vantage Operations environment, where we can manage and monitor sites and perform DBA-like activities, scaling up and down, for example. In this case we’re only looking at one AWS site, but we could quickly provision another site on AWS or Azure in just a few minutes. Viewing this AWS site, we have some site information: where our Vantage environment is deployed, what AWS region, information about the SQL and machine learning engines that are running, how many terabytes, the size, and so on. We can self-service scale our compute and scale our storage.

That’s a good question: “Can we scale our compute and storage independently?” Of course; that’s absolutely a requirement of any modern cloud architecture. Self-service, using this Vantage Operations environment, you as a customer can increase the size of your compute instances, add more compute instances on the fly, and increase your storage incrementally, and you can do each of those independently; you don’t have to do both together. I could go from 100 terabytes to 110, or from 10 compute instances to 12, independently of each other. It also allows us to connect to S3: we already showed the native object store capability, but we also have the ability to self-service connect to our own customer bucket in S3. Finally, we can view utilization of our CPU and disk space, and we can manage our backups. Let’s say I want to schedule my backup for 1 a.m. Monday morning to avoid conflict with any ETL jobs that are running; I can do that self-service in this operations environment.

As a data scientist or business analyst, the biggest challenges we all face are data prep, getting access to the right data; having the power and scale to leverage data sets that are large enough and of high enough quality to get valuable results; and then, once we’ve done all that, productionalizing, actually deploying your analytics in a way that drives business outcomes. That’s where a lot of experiments fall apart. What I’ve demonstrated today is how to leverage all the data in your data warehouse plus external data sets in your ecosystem, like object store data; how to process it with powerful analytic and machine learning engine functions, scaling up and down when needed; and the flexibility to productionalize both on-premises and in the cloud, all with Vantage.