Armchair Architects: How Data is Changing – Part 1
Published Apr 24, 2023

Welcome back to Armchair Architects, part of the Azure Enablement Show. We've talked on this show in the past about how applications and workloads have changed, but what is that doing to data and how we deal with it? Our host is David Blank-Edelman, joined by our armchair architects, Uli Homann and Eric Charran.

 

In previous episodes, we talked a little bit about how apps have been changing for various reasons, for example because they are embedded in the Teams client. These changes have an impact on the data.

 

Data is changing based on the way that people are using apps. 

 

From Eric's perspective, data is changing based on the way people are using apps. The customers he has spoken with, across multiple verticals, are very interested in understanding what's happening right now, because they need to either answer questions and gain insights from the data or drive actions from it. For example, if an oven in an industrial customer's manufacturing plant is trending in the wrong direction in terms of temperature, I need to act right now to address that issue so that I don't have costly downtime. That's an example of an action; the insight is understanding why it happened, how often it has happened, and whether it's part of a larger trend. Eric thinks these real-world questions from customers are driving us toward realizing that data is just a stream.

 

We used to live in a batch world, in which you would collect transactions, do some enrichment, copy them someplace, and try to derive insights from that. That still works and is still valid today, but Eric thinks of batch as an example of a very cold stream of data: it occurs infrequently and arrives in large chunks. More and more, we're seeing insights and actions that need to be driven from hot data, or data at warmer temperatures. That is: I'm getting messages, those messages are being enriched, and they're transformed from telemetry into events. Those events mean things, and now I have an entire ecosystem of event-driven apps that rely on that data.
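To make that telemetry-to-event step concrete, here is a minimal sketch in Python; the message shape, field names, and enrichment rule are illustrative assumptions, not anything specified in the episode.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical raw telemetry: a bare sensor reading with no business meaning yet.
@dataclass
class Telemetry:
    device_id: str
    temperature_c: float
    timestamp: datetime

# An event adds interpretation: what the reading means to downstream apps.
@dataclass
class Event:
    device_id: str
    kind: str
    detail: str
    timestamp: datetime

def enrich(reading: Telemetry, previous: Optional[Telemetry]) -> Optional[Event]:
    """Turn raw telemetry into an event when something meaningful happened."""
    if previous is not None and reading.temperature_c > previous.temperature_c:
        return Event(
            device_id=reading.device_id,
            kind="TEMPERATURE_RISING",
            detail=f"{previous.temperature_c}C -> {reading.temperature_c}C",
            timestamp=reading.timestamp,
        )
    return None  # nothing event-worthy in this reading
```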

 

Uli seeks to clarify terms such as data temperature and talks about real-time analytics vs. synchronous processing. 

 

Uli commented that Eric introduced a couple of terms we might want to talk a little about. The term "cold data" means you put the data you generated on storage, and at some point in time you query that storage using a relational database, a NoSQL system, or another system, and it shows up in your application. The closer you get to the point of origin where the data was generated (in Eric's example, the oven in the manufacturing plant), the hotter the data gets, because it comes right off the system.

 

You want to be able to react to it very quickly, and Uli thinks that's the other dimension Eric folded into the conversation: we used to work on data fairly late, and now we want to work on it very early. Analytics used to be a post-processing step that lagged far behind the events that had already happened; we would analyze what happened and potentially do something about it. That step has been shifting closer and closer to the point of origin, where you want to run analytics right then and there to make decisions and potentially drive different behaviors.

Uli thinks that's a big shift, and he cautions that people shouldn't confuse real time with synchronous processing. Keep introducing asynchronous processing alongside real-time thinking, so you don't end up with tightly coupled systems because you believed synchronous behavior equals real time; it doesn't. From an architecture perspective, be aware of this little trap, because it can get you into real trouble from a systems architecture perspective as well as a systems update perspective.

From the data perspective, Eric's points are spot on, but Uli would change Eric's definition slightly: "analytics is moving to the point of origin and becoming a key part of the business application and the transaction, rather than what we used to do, which is just record and save the record." That's why those systems are called systems of record. On top of them, we have an insight application, or analytics application, which runs batches and then says, "Oh my gosh, if we had gone left instead of right, that would have been so much smarter." What we're doing now is bringing intelligence closer to the point of origin: determining whether to go left or right based on know-how and pattern experience, and then driving the workflow left or right depending on what the right decision is.
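To illustrate Uli's caution that real time does not require synchronous coupling, here is a minimal sketch of asynchronous, queue-based processing; the queue, device names, and threshold are illustrative assumptions.

```python
import queue
import threading

# The producer emits events without waiting for the analytics step:
# the systems stay decoupled even though decisions happen in near real time.
events: queue.Queue = queue.Queue()

def producer() -> None:
    for temp in (180, 195, 210):  # hypothetical oven temperature readings
        events.put({"device": "oven-1", "temperature_c": temp})
    events.put(None)  # sentinel: no more events

def analytics_consumer() -> None:
    while (event := events.get()) is not None:
        # The decision runs close to the point of origin, but asynchronously,
        # so a slow consumer never blocks the producer.
        if event["temperature_c"] > 200:
            print(f"act now: {event}")

threading.Thread(target=producer).start()
analytics_consumer()
```

The producer never waits on the analytics step, so a slow consumer or a redeployed model doesn't block the transactional path; that is the decoupling Uli is asking architects to preserve.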

 

Uli discusses the historical shifts in data from systems of record to an increased prominence of data analytics. 

 

Before going deeper into this discussion, let's step back a little and talk about the structural changes that are happening. In the past, you effectively had only systems of record, and you had primarily relational databases. Uli grew up with relational databases being the answer to whatever the data question was. Yes, there were files, but in general it was relational databases. Over the last 15 years, you saw the rise of NoSQL databases, which deal with less structured data; there is still structure in the data, but far less of it, and "schemaless" became a common term. That has been a big shift in data, but the bigger shift is that analytics is taking over more space in the data environment than it used to, which aids decision making. Looking at traditional business applications, Uli estimated that in the 1990s, 80-90% of what they were worried about was transactional systems-of-record work, and 10-20% was focused on reporting. Now we live in a data-rich world, and we want to use that data to drive a new category of applications. We call them insights applications; other people call them systems of intelligence. It depends on where you come from what you call them, but you're now seeing 80% of the work happening in the analytics space and only 20% in the systems of record, because you want to use all this data to detect patterns and repeatability.

 

The timeliness and hierarchy of decision-making can be viewed as a funnel. 

 

The data world has changed from relational as the only answer to mixing relational with non-relational, NoSQL systems. Then there is the big shift from systems of record to systems of intelligence as the thing people are focused on. When you then look at timeliness, decision-making works like a funnel. At the beginning of the funnel is the hot data: it was just generated, and we can apply machine learning models to it, though we don't necessarily have to; it could also be a piece of code that makes a very localized decision.

 

In Eric's earlier example, the oven in the manufacturing plant has generated data points A, B, and C. Those are three signals, and based on those three signals and my past knowledge, I can now make a decision about what I'm seeing. If I introduce advanced machine learning concepts like reinforcement learning, I can get a better result, because over time I will learn which of the three signals is more important than the others, or whether their combination is what matters. So it's really a very localized decision. The further you go into the funnel, the more information you gather; the decision surface gets bigger, but it also gets slower. If you look at modern analytics platforms, they go from stream analytics to what Uli calls streaming database analytics, where the stream gets terminated but the data stays in memory and you can start running queries.
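A minimal sketch of such a localized decision over three signals might look like the following; the weights, normalization constants, and threshold are illustrative assumptions that, in practice, something like reinforcement learning would tune over time.

```python
# Three hypothetical signals from the oven and assumed relative importance.
SIGNALS = ("temperature_c", "vibration_hz", "power_kw")
WEIGHTS = {"temperature_c": 0.6, "vibration_hz": 0.3, "power_kw": 0.1}
SAFE_MAX = {"temperature_c": 250.0, "vibration_hz": 120.0, "power_kw": 50.0}

def decide(reading: dict, threshold: float = 0.8) -> str:
    """Make a very localized decision from the latest reading alone."""
    # Normalize each signal to 0..1 against its assumed safe maximum,
    # then combine them into one weighted risk score.
    score = sum(WEIGHTS[s] * reading[s] / SAFE_MAX[s] for s in SIGNALS)
    return "intervene" if score > threshold else "continue"

print(decide({"temperature_c": 240.0, "vibration_hz": 100.0, "power_kw": 45.0}))
# -> "intervene" (score ~0.92 exceeds the 0.8 threshold)
```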

 

Uli discusses modern analytics using the street behind him to illustrate his point. 

 

A way to think about stream analytics is to envision standing in the street, watching the cars come by, and counting how many red cars show up in a period of time. At that moment, I am just looking at the number of red cars that pass me; when they are gone, I have no memory of where they're going or what they're doing. I only see them at this point in time and count, or do something else with that data. If you go to a streaming database, the cars instead get directed to a parking lot; in this case, it's a hot parking lot, meaning it's in memory. A great database that does this is Azure Data Explorer. That ultimately is a streaming database: you stream the data in, the stream gets terminated there, but the data stays in memory, and you can start running queries on top.
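Here is a minimal sketch of the red-car example as windowed stream analytics; the window length and observation shape are illustrative assumptions.

```python
from collections import deque

# Count "red cars" seen in the last 60 seconds, forgetting each observation
# as soon as it leaves the window -- no memory beyond the point in time.
WINDOW_SECONDS = 60
red_car_times: deque = deque()

def observe(color: str, now: float) -> int:
    if color == "red":
        red_car_times.append(now)
    # Evict observations older than the window.
    while red_car_times and now - red_car_times[0] > WINDOW_SECONDS:
        red_car_times.popleft()
    return len(red_car_times)

print(observe("red", now=0.0))    # 1
print(observe("blue", now=10.0))  # still 1
print(observe("red", now=70.0))   # first red car left the window -> 1
```

The contrast with a streaming database such as Azure Data Explorer is that there, nothing is forgotten when it leaves the window: the observations land in memory (the "parking lot"), where ad hoc queries can run over them.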

 

Why do people want to do this? Because you want to compose the data with other data; you want intelligent queries instead of just looking at one point in the street. You are now looking across the whole parking lot to see what happened, but you still do it in memory. Then you persist the data and run, for example, a Spark analytics job over it to really go deep: not only looking at the parking lot I just created, but also at past parking lots and other data that I bring in. Then you might put some of that data into a data warehouse because you want structured reporting, KPIs, or whatever business analytics models you need.

So, from Uli's perspective, you have this hot data and very localized decisions; then you increase the scope and capabilities of the analytics; then you go deep, though it takes longer for big data; and you might end up in reporting. All of that has various outputs, with business reports being one. Another is analytics models using AI: the big data system may determine that the pattern we were promoting turned out to be wrong, and unless you're using reinforcement learning, which has a little flexibility to change based on its reward system, you have to redeploy a new model whenever you learn something new. Uli thinks that's really the big change in data: the focus is now on analytics rather than "just record keeping."

 

When is the right time to use stream analytics and when should you employ prescriptive analytics? 

 

Eric added some points of consideration to the historical overview Uli provided of where we were and where we are now. Eric grew up in the Ralph Kimball days of star schemas, relational databases, and ETL, and it's amazing where we are now. A question some of our viewers might be asking is, "how do I know when to do stream analytics at a hot temperature versus a colder temperature?" It comes down to whether you're trying to answer questions such as "what just happened, what is happening, what's about to happen, and what should I do?" Those questions span the gamut from real-time analytics to prescriptive analytics in real time, close to the edge, close to where the data is being generated; and in certain scenarios, for certain customers in certain verticals, having that capability there matters.

 

The underpinning question is: if the data is hot and I don't have a whole bunch of reference data to apply in order to make inferences and understand what's happening, what do I do? There are certain things we can do, and we don't always have to use AI, though we can. AI can do probabilistic inferencing to say, "hey, I don't have hard and fast rules about this trend, but over the last 30 minutes I've seen things going in the wrong direction, and I don't think that's the way things should go. Let me tell a human being about that." The other approach is hard thresholds: the temperature for this particular oven should never go above 40, and now it is above 40, so let me tell human beings that something bad is about to happen. So, there are different types of inferencing at the hotter end that you can use to drive those actions.
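A minimal sketch of those two inferencing styles follows; the hard limit, window size, and drift test are illustrative assumptions, and a simple trend check stands in for full probabilistic inferencing.

```python
from statistics import mean

HARD_LIMIT_C = 40.0  # assumed: this oven should never exceed 40

def check(readings: list) -> list:
    """Return alerts from a hard threshold rule and a simple trend check."""
    alerts = []
    # Style 1: hard threshold -- the latest reading breaks a fixed rule.
    if readings[-1] > HARD_LIMIT_C:
        alerts.append(f"hard threshold breached: {readings[-1]}C > {HARD_LIMIT_C}C")
    # Style 2: trend -- no fixed rule, but recent readings drift upward
    # relative to earlier ones, so tell a human being about it.
    if len(readings) >= 6:
        older, recent = readings[:-3], readings[-3:]
        if mean(recent) > mean(older) * 1.1:
            alerts.append("trend alert: readings drifting in the wrong direction")
    return alerts

print(check([30.0, 31.0, 30.5, 34.0, 36.0, 41.0]))  # both alerts fire
```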

 

Contextualizing data coming from the edge is key. 

 

Conversely, there's also a growing concept called ephemeral storage, in which we take some of that data, for example time-series data, persist it in the edge analytics platform, and do things like look at the past hour or the past shift, or compare shifts to each other. Then, through some policy or rule, that data gets slid off, ideally to the cloud. Once it's there, we can do the longer retrospective, historical analytics that Uli was talking about as well. The other trick we probably need is to help contextualize the information coming from the edge.
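A minimal sketch of that ephemeral edge buffer, with the retention policy and upload function as illustrative assumptions:

```python
from collections import deque

SHIFT_SECONDS = 8 * 60 * 60  # assume one shift is eight hours

buffer: deque = deque()  # (timestamp, temperature_c) pairs held at the edge

def upload_to_cloud(batch: list) -> None:
    """Stand-in for the policy that slides expired readings off to the cloud."""
    print(f"sliding {len(batch)} readings off to the cloud")

def record(timestamp: float, temperature_c: float) -> None:
    buffer.append((timestamp, temperature_c))
    expired = []
    # Evict anything older than the retained window ("the past shift").
    while buffer and timestamp - buffer[0][0] > SHIFT_SECONDS:
        expired.append(buffer.popleft())
    if expired:
        upload_to_cloud(expired)

def shift_average() -> float:
    """Edge-local analytics over the retained window, e.g. the past shift."""
    return sum(t for _, t in buffer) / len(buffer)
```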

 

The messages coming from these assets are important. They contain important information, but not enough information. If an oven is having a problem, I need more information about where that oven is: what plant it's at, where it is physically, and what part it plays in manufacturing, so that I can gain some insights, do something about it, or even send people to remediate the issue. Without that, I only know that it's experiencing problems. So we end up with a world in which we can take some of that historical information, package it up, move it closer to where the data is being generated, and use it in a data fusion process to contextualize the telemetry coming from these devices and transform that telemetry into events.
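A minimal sketch of that data fusion step, with the metadata fields and lookup table as illustrative assumptions:

```python
# Asset metadata pushed down to the edge so events carry enough context to act on.
ASSET_CONTEXT = {
    "oven-7": {
        "plant": "Plant 3, Detroit",
        "line": "Line B",
        "role": "final cure stage",
    },
}

def contextualize(telemetry: dict) -> dict:
    """Fuse raw telemetry with asset context to produce an actionable event."""
    context = ASSET_CONTEXT.get(telemetry["device_id"], {})
    # The fused event tells responders where the problem is and what part the
    # asset plays, not just that a reading is out of range.
    return {**telemetry, **context, "kind": "ASSET_ALERT"}

print(contextualize({"device_id": "oven-7", "temperature_c": 44.0}))
```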

 

Continue reading How Data is Changing – Part 2. 

 

Recommended Next Steps:

 

If you'd like to learn more about the general principles prescribed by Microsoft, we recommend the Microsoft Cloud Adoption Framework for platform and environment-level guidance and the Azure Well-Architected Framework. You can also register for an upcoming workshop led by Azure partners on cloud migration and adoption topics; these incorporate click-through labs to ensure effective, pragmatic training.

 

View the whole video at the links below. 

 

Armchair Architects: How Data is Changing – Part 1 (video)

Armchair Architects: How Data is Changing – Part 2 (video)

 
