Microsoft Azure AZ-801 — Section 12: Implement disaster recovery by using Azure Site Recovery

74. Understanding Azure Site Recovery

I’d like to now go over the concept of Azure Site Recovery.

So, when you create a Recovery Services vault, one of the options that you’ll have available to you is to utilize something called Azure Site Recovery.

Now, this is all about adopting what’s called BCDR, which stands for business continuity and disaster recovery. The game plan here is to plan for a scenario where an entire location goes down and you lose access to your apps, workloads, virtual machines, whatever it may be. So, you can use Azure Site Recovery to replicate your Azure virtual machines between different regions.

For example, I could have some virtual machines and resources in the East US region and have them replicated to West US. Another great thing about this is that you can have on-premises VMs and what are called Azure Stack VMs, as well as physical server data, replicated into the cloud.

So, an entire location could go offline and you could still be up and running. And with the proper services and equipment, you can even have traffic that is routed to these locations in a situation where there’s an outage.

So, for example, I might have some kind of a web service that’s running on a VM in East US, and I’ve got a replica of that over in the West US region, and traffic is all flowing into East US. Then, if the East US region were to go offline, everything would be routed to West US. So, that’s one of the great things that you’re going to get with this.
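To make that routing idea a bit more concrete, here is a minimal sketch in Python. It is not how Azure routes traffic internally; the `Region` class and `pick_active_region` function are hypothetical names made up for illustration, and the health flags are set by hand rather than by any real probe.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool  # set by hand here; a real setup would use health probes


def pick_active_region(primary: Region, secondary: Region) -> str:
    """Return the name of the region that should receive traffic."""
    if primary.healthy:
        return primary.name
    if secondary.healthy:
        return secondary.name
    raise RuntimeError("no healthy region available")


east = Region("East US", healthy=True)
west = Region("West US", healthy=True)
print(pick_active_region(east, west))  # East US

east.healthy = False  # simulate the East US outage
print(pick_active_region(east, west))  # West US
```

The point is only the decision itself: traffic stays on the primary while it is healthy and moves to the replica when it is not.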

Now, what exactly does Azure Site Recovery provide? Being a BCDR solution, site recovery is the main goal of this whole thing: an entire site could go down and we could still be up and running. It does Azure VM replication, it can support VMware replication, and it can support on-premises VM replication. So, I can have on-premises Hyper-V VMs and all that replicated out to the cloud, along with other workload-based replication. So, for your different services running on Windows or Linux or whatever it may be, this is going to provide you with data resiliency.

So, basically you have what’s known as site recovery orchestration, which is going to replicate information without having to intercept any kind of application data or any of that. It’s just straight replication.

Now, another thing is the RTO and RPO side of this. It’s going to improve your RTO and RPO.

Now, if you’re not familiar with that, an RTO is a recovery time objective, and it involves how long it takes to get things back up and running when there’s some kind of a failure. So, you’re going to be lowering that. If a virtual machine goes down, your RTO is going to be, maybe, 30 seconds. If somebody was connected to a virtual machine, there might be a slight delay of 30 seconds at most, and it usually doesn’t even take that long.

And then what is RPO? RPO is a recovery point objective, and that gets into how much data you can recover in case of a failure. So, we’re not just talking about virtual machine replication. Imagine a whole location goes down. If your data is being replicated to another site all the time, even if it’s just asynchronous replication, where data gets written in one site first and then immediately replicated to the other, your RPO in some cases could be milliseconds to, maybe, just a few seconds. So, if a site goes down, you might lose a couple of seconds of data that was being transmitted at the time. But ultimately all the rest of your data will be in that other site, so you’re not really losing a whole lot. So, that’s the idea of RPO.
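If it helps to see RTO and RPO as plain arithmetic, here is a small Python sketch. The timestamps and the target values are made up for illustration (the 30-second and few-second figures simply echo the numbers mentioned above), and `measure_recovery` is a hypothetical helper, not part of any Azure tool.

```python
from datetime import datetime, timedelta


def measure_recovery(last_replicated, outage_start, service_restored):
    """Return (data_loss_window, downtime) for a single outage."""
    data_loss_window = outage_start - last_replicated  # compare against the RPO
    downtime = service_restored - outage_start         # compare against the RTO
    return data_loss_window, downtime


# Made-up timestamps for illustration only.
last_replicated = datetime(2024, 1, 1, 12, 0, 0)
outage_start = datetime(2024, 1, 1, 12, 0, 3)
service_restored = datetime(2024, 1, 1, 12, 0, 33)

loss, downtime = measure_recovery(last_replicated, outage_start, service_restored)

rpo_target = timedelta(seconds=5)   # assumed objective
rto_target = timedelta(seconds=30)  # assumed objective

print("data at risk (s):", loss.total_seconds())   # 3.0
print("downtime (s):", downtime.total_seconds())   # 30.0
print("RPO met:", loss <= rpo_target)              # True
print("RTO met:", downtime <= rto_target)          # True
```

So RPO is measured backwards from the failure (how much recent data was not yet replicated), while RTO is measured forwards from it (how long until service is back).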

The other thing, of course, is keeping apps consistent over failover. Everything stays in sync. And you have the ability to test all of this, which is really nice. You have a lot of flexibility over any kind of failover that occurs: there are lots of settings you can change, and you can customize your plan for this. You have what’s called a recovery plan, and you can customize how you want that to be laid out. This can also be integrated with SQL Server and things like that.

So, a lot of this information and data in your back-end environment could be backed up into SQL. And this can provide failover solutions for various types of workloads because Microsoft utilizes what is known as zone redundancy, availability groups and all that, which means you’ve got multiple replicas of data stored on different racks of equipment in the data center itself. But you’ve also usually got at least three data centers in each region, and you can have replicas of your data in those three data centers.

So, not only are you dealing with data being replicated amongst different locations that can be hundreds or even thousands of miles apart, you’ve also got multiple redundancy within the same region, which is really nice. Another thing about all this is that it’s all automated; it’s got what’s called automation integration, so there isn’t anything manual here you’ve got to deal with. It also supports network integration.

So, the Azure networking you have, like VNets and all that, is going to work perfectly with this, but you can also link your on-premises environment into the cloud using something like a VPN gateway or ExpressRoute, so you’ve got a direct connection into the Azure environment.

So, what exactly can be replicated? You can replicate Azure VMs from one Azure region to another. You can replicate on-premises VMware VMs, Hyper-V VMs, physical servers and Azure Stack VMs, and you can even replicate AWS (Amazon Web Services) Windows instances. You can also replicate your on-premises VMware VMs, Hyper-V VMs managed by System Center Virtual Machine Manager, and physical servers to other locations. The majority of the regions that we have in Azure are supported. As far as replicated machines go, Microsoft lets you replicate pretty much any cloud-based VMs as well as the on-premises ones I just listed. And as far as workloads go, you can replicate just about any type of workload, meaning it could be some kind of a service or some kind of data. Azure can support just about anything as far as hosting something in the cloud.

So, if you’re wanting to get something from the on-premises environment out to the cloud, then this is going to be a great starting point for setting all of that up. Another thing I think you’re going to find is that it’s very easy to configure, set up and manage. You definitely want to check out the Azure Calculator website, though; just go to Google or Bing, type Azure Calculator, and you’ll find it. It’ll give you a better idea of the cost of this stuff, because of course storing all this data and the Azure services that are involved in replication do cost some money. But hopefully that gives you an idea of the concept of Azure Site Recovery and its capabilities.

75. Configure Azure Site Recovery networking

Let’s talk about the networking side of Azure disaster recovery.

If you’re going to utilize site replication, or maybe you’re going to be replicating virtual machines, there are a couple of things to consider, so I want to talk about that. Microsoft has a nice little article for when you are configuring this to support your on-premises environment. It outlines a bunch of important stuff, and that is what I want to get across to you right now. By the way, you can find this article by doing a quick Google or Bing search on these keywords: Networking and Azure VM disaster recovery.

Here’s a typical look at an Azure environment. Maybe you’ve got a couple of virtual machines on a VNet. This is in East US, as you see here, with a couple of virtual disks that are tied to those VMs. And we care about redundancy, right? We care about what happens if East US goes down or something like that. The other thing to think about is whether our company is connecting an on-premises network with Azure.

So, if we’re doing that, there are two main ways to do it. We have an on-premises network here, and we usually connect with ExpressRoute or a VPN gateway. A VPN gateway is a solution that connects our on-premises network to the cloud using VPN equipment. Preferably, if your company can afford it, you want to go with ExpressRoute. That’s a dedicated telecommunications link that you set up. You have to work with a telecommunications provider, but they can establish a direct connection with Azure that is very high speed and private.

So, if you’re doing that, maybe you’re using Azure as your secondary site, so you want replication to occur from on-premises to Azure, or vice versa, where Azure virtual machines are replicated on-premises, or you can do Azure to Azure, which is one of the most common solutions. The big thing I want to get across to you here is that there are two main things. Number one, if you are doing this on-premises, you’ve got to make sure that these URLs, these domain names, are not being blocked by your firewall. So, that’s the first thing of note: all of the ones listed here have to be open on your firewall going out.

The second most important thing is that if you’re doing Azure to Azure, with your virtual machines on your virtual networks, you’ve got to make sure that you don’t have an NSG that’s blocking port 443 going outbound. So, outbound 443.

So, let me show you what I mean. If I go to portal.azure.com, click the menu button here and go to my virtual machines, I have a virtual machine called VM1 that I created earlier. This might not be something you have in your environment, but all you need is a VM to see what I’m talking about. If I go to networking, I can see that I have a network interface linked to this VM, and if I click on that network interface, I can see the IP configuration and all that. You can see that it’s using this IP address, and there is this thing called a network security group. A network security group is what controls the IP filtering for the NIC. If I click on that, currently I don’t have a network security group, so it’s not performing any kind of filtering in this case.

Ultimately, though, with a lot of virtual machines you are going to have a network security group that’s automatically enabled, and you’ve got to make sure that it is not blocking port 443. In fact, let’s go to All services and do a search for network security group. You’ll see network security groups right there. I do have one right here, and right here are the inbound and outbound rules. So, you’ve got to make sure that outbound is not blocking port 443. As you can see right now, outbound is not blocking it: it allows any VNet traffic out on any port and allows all Internet traffic out on any port.

So, we are fine. But if somebody has gone in there and tweaked any of these rules and you’re blocking a bunch of ports, you’ve got to make sure that port 443, which is HTTPS, is not being blocked.
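Here is a rough Python sketch of how you might reason about that check. It is a simplified model, not the Azure SDK: it only looks at priority and destination port and ignores source, destination and protocol. NSG rules are evaluated in priority order with the lowest number first, and the first match decides. The three default outbound rules are modeled with their usual names and priorities; the `BlockWeb` rule and the `OutboundRule` class are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutboundRule:
    name: str
    priority: int   # lower number is evaluated first
    port_low: int
    port_high: int
    allow: bool


def outbound_port_allowed(rules: List[OutboundRule], port: int) -> bool:
    """First matching rule in priority order decides; no match means deny."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port_low <= port <= rule.port_high:
            return rule.allow
    return False


# Simplified stand-ins for the default outbound rules of an NSG.
default_rules = [
    OutboundRule("AllowVnetOutBound", 65000, 0, 65535, True),
    OutboundRule("AllowInternetOutBound", 65001, 0, 65535, True),
    OutboundRule("DenyAllOutBound", 65500, 0, 65535, False),
]

# Hypothetical custom rule that somebody added and that blocks HTTPS.
tweaked_rules = default_rules + [OutboundRule("BlockWeb", 100, 443, 443, False)]

print(outbound_port_allowed(default_rules, 443))  # True
print(outbound_port_allowed(tweaked_rules, 443))  # False
```

With only the defaults in place, outbound 443 is allowed. As soon as a higher-priority deny rule covers port 443, replication traffic would be blocked, which is exactly the situation to look for in the portal.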

So, those are the two main things to keep in mind if you’re going to start up virtual machine replication of any sort. For on-premises, you’ve got to make sure all these URLs are open and port 443 is open, and for your VNets and all of that in Azure-to-Azure replication, you’ve got to make sure that port 443 is open on the NSG.

So, those are the considerations in regards to doing a site recovery with Azure.

76. Understanding recovery plans

Something else that you can set up inside of a Recovery Services vault with Site Recovery is called a recovery plan.

So, what is an Azure recovery plan? This is something that allows you to pull your virtual machines together into what are called recovery groups. Recovery groups let you group machines together so that you can set a systematic order in which the machines should fail over and come online in a disaster recovery scenario.

So, in other words, if a site goes down and another site is taking over, there might be a certain order in which things have to come back up. These recovery groups that you’re setting up are units that represent whatever app you have, for example some kind of a web app, let’s say a sales app. It’s a website that does sales, takes orders, advertises and things like that. You would have a unit that represents that. It might be one group of VMs that have to come on, or it might be that you have multiple groups that need to come on in a certain order.

So, the way the recovery plan process works is that you define how many machines are going to be part of the failover and set the sequence in which they’re going to start. With recovery plans, you’re controlling both the failover process and the failback.

So, take a situation where my site has gone down and I’ve failed over to the second site. When the first site comes back online, we have a scenario in which it fails back, right? You’re allowed to have up to 100 protected instances as part of a recovery plan.

So, you have up to 100 instances of your resources. You can also customize it: put things in any order you want and set whatever tasks you want it to perform, in whatever order you want it to perform them. You can set all of that up yourself when you’re configuring this. The other thing you can do is test the deployment, so you can actually run a series of tests where you fail over to the other site and then fail back, and make sure it all works like it’s supposed to.

So, why should you use a recovery plan? Well, one reason is to help model the way that a certain app works. Again, if you’re dealing with a web app, maybe it’s a bunch of virtual machines that are producing a website of some sort, maybe a sales website. You model the servers and point out which virtual machine does what. Maybe there’s a set of virtual machines that deal with the website itself, and maybe there’s a set of virtual machines that deal with the database, and you would model that out into groups of machines through this recovery plan. With the help of all of that, you can automate things so that when a failure occurs, the process of failing over and then failing back is automated. This is going to help protect you by reducing your RTO, your recovery time objective, meaning how long something is offline. And again, you can run tests and verify that it all works. And finally, here’s an example of this.

So, I’ve got this picture of a sales app here. This shows the processing order, as you see there. You can have different groups with different machines. In this case, we’ve got a SQL Server, a sales app controller and a sales front end, and these are all different virtual machines. Notice how it’s broken up into groups: Group1, Group2, Group3.

So, what would happen is Group1 would need to start up first, because that’s your SQL database, right? You want your database to be online because the other two virtual machines are going to want to query that database. Group2, the sales app controller, would start after Group1 has started. And of course Group3, the front end, which is the website itself, has to talk to the sales app controller, so you would want it to start last. So, that gives you a visual look at what this would be like.
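The sales app ordering can be sketched in a few lines of Python. This is only a model of the idea, not the Site Recovery API: the `failover_order` function and the dictionary layout are hypothetical, and the 100-instance check simply reuses the limit mentioned earlier in this lesson.

```python
from typing import Dict, List

MAX_PROTECTED_INSTANCES = 100  # limit quoted earlier in this lesson


def failover_order(groups: Dict[int, List[str]]) -> List[str]:
    """Return VM names in start order: lowest group number first."""
    total = sum(len(vms) for vms in groups.values())
    if total > MAX_PROTECTED_INSTANCES:
        raise ValueError(f"{total} instances exceeds {MAX_PROTECTED_INSTANCES}")
    order: List[str] = []
    for number in sorted(groups):
        order.extend(groups[number])
    return order


# Groups deliberately listed out of order to show that the group number decides.
sales_app = {
    3: ["Sales Front End"],
    1: ["SQL Server"],
    2: ["Sales App Controller"],
}

print(failover_order(sales_app))
# ['SQL Server', 'Sales App Controller', 'Sales Front End']
```

The database group comes up first, then the controller that queries it, then the front end that talks to the controller, which matches the Group1, Group2, Group3 sequence described above.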

Now, unfortunately, this can get a bit pricey, and it can also use up your Azure credit pretty quickly if you’re going to play around with it.

So, just a word to the wise there if you start playing around with this a whole lot. But it’s definitely something you can check out and a pretty neat idea, and hopefully that gives you an understanding of what recovery plans do for us.