Teradata Vantage - MAPS: 2. Small Tables in Sparse Maps
This is Carrie Ballinger. I’m a Senior Technologist in the Technology & Innovation Office here at Teradata. This is the second of a series of 3 TechBytes on the topic of the MAPS feature. So let’s get started. This is the one we are going to talk about sparse maps. In our previous TechByte, we looked at the contiguous maps, which are the maps that cover a range of AMPs. And that you get a new contiguous map every time you expand your system. The contiguous maps are only created at system expansion time. However the second type of map – the sparse map – is something that you actually have more control over. You can create your own sparse maps if you wish. The purpose is to allow tables to reside on one or just a small number of the AMPs in the system instead of being spread across all the AMPs in the system. Sparse maps reside within contiguous maps; so if you look at the graphic on the slide, you see we have a sparse map that’s holding a Codes table and a Stores table and it is within TD_Map1. So the AMPs that are selected to support tables moved into the sparse maps will always be within that one contiguous map. You automatically get 2 sparse maps when you get onto the 16.10 Teradata Database software level. These default sparse maps – one of them will be a single–AMP sparse map, and the other will be a multi-AMP sparse map. The multi-AMP sparse map will cover a number of AMPs that will be equal to the number of nodes in your configuration. So if you got a 4-node system, you’ll get a one-AMP sparse map, and you’ll get a 4-AMP sparse map. Now, you can start moving your small tables in to these sparse maps immediately as soon as you get on the appropriate software level. So you don’t have to wait for a hardware expansion to start taking the advantage of the MAPS facility to accommodate these small tables. So let’s take a look at why you might want to move small tables into a sparse map. Consider a table, like a Promotion table, where you have 5 rows. If you are scanning that Promotion table, you are going to be engaging every single AMP in your configuration whether or not that AMP owns one of those 5 promotion rows. And that’s going to be an all-AMP step. Every AMP is actually going to do a lot of different things – Now, these are all very, very small activities but have to be done on every AMP. Every AMP has to process the BYNET message, has to get an AMP worker task, has to read the table header, get a little memory. It has to do at least on 1 physical I/O even though it’s not going to find anything. So all of this can add up if every AMP in the configuration is doing this unnecessary work when it is really only 5 AMPs that are being productive. The other thing is if it got a busy system, it’s possible that one AMP or may be several AMPs are getting close to being congested – Maybe they are out of AMP worker tasks temporarily, and they might cause a delay in getting the completion of that step done. So, any time you can reduce the number of AMPs involved in the process, the more efficient it’s going to be and more elapsed-time consistent it’s going to be. So if you can put – and the MAPS feature lets you do this – if we can put all of those 5 rows in one AMP, then doing a scan of a Promotion table becomes a single-AMP operation, and all of the other AMPs then are free to do other work, free to support other activities that are going on at the same time as less locking contention, certainly less congestion potential on other AMPs in the system. And it’s likely to be a more consistent experience if you’re repetitively accessing something like the Promotion table. So, I/O is going to be a big reduction here because now only single AMP – the AMP with that table is going to do the I/O. And chances are, all of those 5 row are in 1 data block. In addition, if you set up your Workload Management where a single-AMP steps or queries that only access a single AMP are given a special priority – and some people do this – they get a tactical priority for their single-AMP requests then access to the Promotion table all of a sudden becomes a tactical request and this can further accelerate the consistency and the elapsed time involving accessing that table, or similar small tables. So how do you know which tables then are appropriate to move into the sparse maps? Well, certainly any tables that have fewer rows than AMPs is a good candidate. You can probably think of a few of those in your own environment. And among the tables that are small, you really want focus on the tables that are frequently accessed. It’s not going to be much benefit to you to move tables in the sparse map if nobody ever touches it anyway. In fact, before you start moving tables to sparse maps, it’s not a bad idea to clean up your environment a little bit maybe delete those small tables that people put out there over the years for experimental reasons, or applications that are no longer in that form anymore. So think about doing some cleanup work before you move your tables. In addition, we are going to give you some guidelines on what tables are appropriate for sparse maps – There’s a map sizing view that’s available to you that you can run at any time, and it will tell all the tables that are appropriate for the single AMP or multi-AMP sparse maps. It uses these criteria that I am showing you on this slide. Single-AMP sparse maps are appropriate for tables that are 128KB or less. And this is minus the table header. What this view does is it strips off the table header overhead and just looks at the perm data size minus table header. If that perm size is less than 128KB, that’s a good choice to move that table to a single-AMP sparse map. For the multi-AMP sparse maps, it’s a sort of the same guideline, but just take that 128KB size and multiply it by the number of AMPs that multi-AMP sparse map covers. On a 4-node system, that would be 4 AMPs; so you just take tables that are 128KB times 4. Tables that are smaller than that would be a great fit for that multi-AMP sparse map. I just want to give you one word of warning here on empty tables. It may be tempting to move all of your empty tables into sparse maps. However some of these empty tables you need to be careful with because they may be used as staging tables in ETL processes. If you were to move them, for example into a single AMP sparse map, and you run your ETL processes that night, you may find that they are staging for a million rows or more. And that’s going to severely impact that ETL job. So be vigilant about empty tables. And the other tip I want to leave with you here is that some tables are small today but they might be bigger tomorrow. Maybe Promotions today for us has 5 rows – maybe we only have 5 promotions – but our company is growing and next month or three months out maybe we’ve got 4,000 promotions or 10,000 promotions. So occasionally you need to review these small tables. It is very easy to do – Viewpoint will do it for you, or the table sizing view I mentioned will do it for you as well. But make sure that’s in your mind somewhere when you start working with sparse maps. So we got a single-AMP sparse map. Let’s say we’ve got 100 small tables. How do we control which AMPs those tables are going to be targeted to when they are moved into the sparse map? Do we need to create an individual sparse map for each of these tables? Well – No. In fact you don’t even have to think about which AMP these tables are being placed in because the Database is going to do this for you. A single-AMP sparse map can be home for hundreds of small tables. The system will automatically decide for you where your small table is going to reside, which AMP is going to be a recipient of its rows when you move it to, for example, a single-AMP sparse map. And when the system does this, it attempts to spread these tables across different AMPs. Not only different AMPs but different nodes. So the first table you move in to is going to go on 1 AMP on node A. And second table you move into a sparse map is going to go on another AMP on node B, and the third table you move into the same sparse map is going to go on some AMP on node C, and so forth. And then we go back and fill in other AMPs as needed. So if you got lots of AMPs in your configuration, you may be just fine with one single-AMP sparse map. In fact most of you will be. And the same as too for your multi-AMP sparse maps. The system will spread among different AMPs reducing pressure on any single one AMP and really minimizing the chance that there’s going to be a bottleneck. The Viewpoint portlet, which I mentioned in the previous TechByte is very helpful for getting small tables into the right sparse map without you having to give any thought. When you run the analysis job the output will tell you which of the tables – and if scoped by the database – it will tell you which of the tables in your database are appropriate for sparse maps and which are appropriate for contiguous maps. And then among the different sparse maps, it will tell you which of those is appropriate for that given small table. You can look that over – You can look it over and see some of those empty tables might be used in ETL processes, for example, and then you run the Mover job it will do all the moving for you automatically without you being involved. Moving small tables into sparse maps is a very quick process so you don’t have to plan too carefully about when you are going to do this kind of move because it happens in a very short amount of time. When you get on 16.10, as I mentioned, you can begin to move those small tables to your sparse maps right away. But at some point, it’s likely that you are going to be expanding that system. And when you do, you are going to get that second contiguous map, TD_Map2, which incorporates a larger number of AMPs. At that point, you’re going to get additional two default sparse maps. They will be named slightly different; you will be able to tell that they belong to TD_Map2. And those new default sparse maps are going to incorporate all of the AMPs in the system based on TD_Map2. So they have more AMPs, more choices to put those single-AMP sparse map tables. So what I recommend you do is – Once you are up and running with TD_Map2, even if you haven’t moved any of your large tables into TD_Map2, move all of those small tables that were in the previous default sparse maps into the new default sparse maps. This is an extremely simple process – I know it sounds a little complicated – but you just run the Viewpoint MAPS Manager portlet. And it will automatically recommend the new default maps for you. It will see those small tables in the old default maps it will immediately recommend you’ll move them into the new default map, and the Mover job will do that all for you. So, very simple process. The advantage of doing that – I am going to show you here – if you look at this graphic on the left, that’s our old configuration. And let’s say it’s 8 AMPs and we’ve got 10 small tables moved into that single-AMP sparse map. And we got them pretty much spread around – now there are a couple of AMPs now that are supporting 2 of those small tables. In a real world you probably more likely to have 50, 100, 500, or 1,000 of these small tables possibly in a single-AMP sparse map. When we move them into the new default sparse map that is associated with the new contiguous map, TD_Map2, we’ve got more AMPs to spread these small tables across. So we’ve now eliminated any doubling up of tables on the same AMP; and we’ve reduced any possibility that there might be more contention on that AMP. So this is just a suggestion I leave with you. You don’t re-config or expand your system that often and this just a very good practice to undertake to clean up those sparse maps. And of course you can drop old sparse maps when you have emptied them out. Just a small recap of the benefits of sparse maps. Thing I like about them is that you can use them immediately, and they solve immediate problems for many customers in terms of minimizing the resources required to access a small amount of data. If you are doing repetitive access of some of your small tables then you’ll definitely see more consistency in the queries that access those particular tables. So we are saving some resources, increasing some efficiencies in a few places, and it’s almost effortless to do the move of tables. So this is a feature I think a lot of people are going to enjoy using and will benefit from as soon as they get to Teradata Database 16.10.