Microsoft Azure AZ-800 — Section 19: Configure Windows Server storage Part 3
147. Configure Data Deduplication
Windows Server also has a pretty nice little feature known as Data Deduplication. The idea is that on a volume we often have lots of data that’s just duplicated: lots of duplicate files, and lots of chunks of data that get duplicated inside files. What Data Deduplication does is index your files. It breaks your files up into chunks, and then it can deduplicate the chunks of data that are repeated over and over, so each one only has to be stored once. This is going to really conserve disk space, especially on a big file server or backup server, or in virtual desktop infrastructure (VDI) environments where you have a bunch of virtual machines that are being backed up, for example.
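To make the chunk idea concrete, here is a toy sketch in PowerShell. It is not how Windows implements the feature (this sketch uses tiny fixed-size chunks and an in-memory table, and the file names and contents are made up); it only shows why storing each unique chunk once saves space.

```powershell
# Toy illustration only: fixed 4-character chunks, hypothetical file contents.
$chunkSize = 4
$files = @{
    'a.txt' = 'AAAABBBBCCCC'
    'b.txt' = 'AAAABBBBDDDD'
}

$sha   = [System.Security.Cryptography.SHA256]::Create()
$store = @{}   # chunk hash -> chunk text; each unique chunk is kept once
$total = 0     # how many chunks the files reference in total

foreach ($name in $files.Keys) {
    $text = $files[$name]
    for ($i = 0; $i -lt $text.Length; $i += $chunkSize) {
        $len   = [Math]::Min($chunkSize, $text.Length - $i)
        $chunk = $text.Substring($i, $len)
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($chunk)
        $hash  = [System.BitConverter]::ToString($sha.ComputeHash($bytes))
        $store[$hash] = $chunk   # a repeated chunk overwrites the same key
        $total++
    }
}

"Chunks referenced: $total, unique chunks stored: $($store.Count)"
```

The two sample files share two of their three chunks, so fewer chunks are stored than are referenced. That gap is the saving.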
So this is a great little feature, but of course, the first thing you have to do is turn it on.
So here we are on NYC Server 1. I’m going to open up Server Manager. I’m going to go to Manage, Add Roles and Features, click Next, Next, and Next, go to File and Storage Services, and then File and iSCSI Services. And we’re going to turn on Data Deduplication right here. We’re going to click Next, Next, and Install. And of course, I’m going to pause the recording while this is being installed.
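If you would rather skip the wizard, the same role service can be installed from an elevated PowerShell prompt. This is a minimal sketch, assuming you are running it locally on the server:

```powershell
# Install the Data Deduplication role service (run as administrator).
Install-WindowsFeature -Name FS-Data-Deduplication

# Confirm that it now shows as installed.
Get-WindowsFeature -Name FS-Data-Deduplication
```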
OK, after that’s done, I’m just going to hit Close, and here in Server Manager, I’m going to go to File and Storage Services and we’re going to click on Volumes.
OK, so this is where you’re going to turn this on. Keep in mind, now that you’ve installed the feature, you can actually enable data deduplication when you’re creating a new volume. In my case, I’ve already got another volume here called E:. From there, I can right-click that, and you’ll notice that I can click on Configure Data Deduplication. Keep in mind that’ll be grayed out unless you’ve installed the feature. From there, it’s going to bring up this little wizard that’s going to let me turn this on.
Now there are three options for enabling it. I can choose General purpose file server. This is if you are doing this for a file server, and that’s its main job. If you’ve got a VDI environment, virtual desktop infrastructure with virtual machines, and you want to use this for your virtual machines, the Virtual Desktop Infrastructure (VDI) server option would be the one you would go with. And then you also have Virtualized Backup Server. This is geared towards using this for backups.
OK, so sort of like a backup server of some kind. But in our case, we’re going to go with General purpose file server.
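The three choices in the wizard correspond to the `-UsageType` values of `Enable-DedupVolume`. Here is a sketch of the PowerShell equivalent, assuming the volume is E: as in this demo:

```powershell
# -UsageType values:
#   Default = General purpose file server
#   HyperV  = Virtual Desktop Infrastructure (VDI) server
#   Backup  = Virtualized Backup Server
Enable-DedupVolume -Volume "E:" -UsageType Default
```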
OK, so the next thing you’ll notice is that we have the ability to exclude certain files. Generally speaking, you want to exclude databases. You really don’t want this thing trying to deduplicate databases, because they usually have a high transaction rate, and for any kind of file that’s being changed a lot, it’s not really a great idea to use data deduplication. It’s not that you can’t do it, it’s that it does affect performance. So we can exclude certain things here.

I’m going to go to File Explorer real quick. Here we’ll go to the E: drive, which is what we’re turning this on for, and we’re going to create a folder called Dedup, and we’ll create another folder called No Dedup. All right. And then inside the Dedup folder, I’m just going to create a text file called data, and we’ll just type "this is a test". All right. We’ll save that file and close out of that. Now we’re going to go ahead and exclude, so we’ll click on Exclude right here and we’ll exclude the No Dedup folder, right? So that folder is no longer being deduplicated, OK? Also, we can set a schedule.
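The folder exclusion can also be set from PowerShell with `Set-DedupVolume`. This is a sketch, assuming the volume is E: and the excluded folder is named No Dedup as in this demo; adjust the path to your own folder name:

```powershell
# Exclude a folder from deduplication on E: (folder name taken from this demo).
Set-DedupVolume -Volume "E:" -ExcludeFolder "E:\No Dedup"

# Check what is currently excluded on the volume.
Get-DedupVolume -Volume "E:" | Format-List Volume, ExcludeFolder, ExcludeFileType
```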
So, if we want to set a schedule, there’s Enable background optimization. This tells you that it’ll regularly run data deduplication at a low priority and pause data deduplication when the system’s busy. You can also do Enable throughput optimization. It tells you that during certain hours it’ll run deduplication at normal priority, and you can set what those hours will be. And if you want to have a secondary throughput optimization schedule, it’ll run during those hours as well.
OK, so you kind of have a primary schedule and a secondary schedule that you can set, right? So from there, I can click OK. All right, and I’ll go ahead and click OK again, and this is going to go ahead and turn this on. And we’ve now officially got deduplication enabled on our E: volume.
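Schedules can be listed and created from PowerShell too. In this sketch, the schedule name, days, start time, and duration are all made-up values for illustration, not the ones from the demo:

```powershell
# List the deduplication schedules that already exist on this server.
Get-DedupSchedule

# Hypothetical throughput window: weeknights at 1:00 AM for 6 hours,
# running optimization at normal priority.
New-DedupSchedule -Name "NightlyThroughput" -Type Optimization `
    -Days Monday,Tuesday,Wednesday,Thursday,Friday `
    -Start 01:00 -DurationHours 6 -Priority Normal
```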
OK, now I’m going to go back over to File Explorer and take a look at my drive again.
So there’s my drive. I’ve got the Dedup folder, and I’m going to copy this text file over to No Dedup. So we’ve got two copies of it, one in the Dedup folder and one in the No Dedup folder, right? So we’re good there.
Now, I don’t want to wait on the schedule for this to happen, so I’m going to open up PowerShell. Right-click Start, go to PowerShell. All right, and I’m going to run a command that’s going to go ahead and trigger this deduplication process. We’re going to say Start-DedupJob -Volume, and it’s going to be E:, and then for the type we’ll say Optimization.
OK. We’re going to go ahead and hit Enter, and we’re going to let that run.
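Written out, the command described above looks like this, assuming the volume is E: as in the demo:

```powershell
# Manually start an optimization (deduplication) job on E:
# instead of waiting for the schedule.
Start-DedupJob -Volume "E:" -Type Optimization
```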
OK, and from there it should now be officially activated and the deduplication process started.
Now the problem is you’re not really going to see anything here, and there are a couple of reasons for that. First off, we didn’t really store much data. Secondly, if we right-click here and click Configure Data Deduplication, you’ll notice it says to deduplicate files older than three days.
OK, so obviously we don’t have any files older than three days. So we’re going to set that to zero, and we’ll go ahead and click OK and apply that. All right.
So now it can actually deduplicate stuff that is not necessarily older than three days. All right. The next thing I want to do, though, is store some data in there that’s a little bit bigger.
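The same file-age setting can be changed from PowerShell. Here is a sketch, again assuming the E: volume. Zero is handy for a lab like this one, but on a real server you may want to leave a minimum age in place so files that are still changing get skipped:

```powershell
# Let deduplication process files regardless of age (lab setting).
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 0

# Confirm the new value.
Get-DedupVolume -Volume "E:" | Format-List Volume, MinimumFileAgeDays
```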
So, if you go to Google and search for VLC media player, you can download the VLC media player, which I’ve already done and stored right here in my Downloads folder. I’m going to copy this, and first we’re going to just paste that into the Dedup folder here, and then I’m going to create two more folders, Dedup 2 and Dedup 3.
OK, just to kind of demonstrate this. We’ll paste this here, and we’ll paste this here.
So now we should have three copies of the VLC media player. All right. There, and there. OK. And I’m also going to copy my data file, with the same data in it.
OK, so we should have three copies of both of those files now. Excellent. Currently, I did not put a copy of VLC in the No Dedup folder, so there’s a copy of the data file there, but not the VLC file.
OK, so that’s fine. And so now we’ll go back to PowerShell here and we can run this command to start the optimization again. And then also, if we type Get-DedupJob, you can see that this job is currently running. So it’s going to take just a bit of time to finish that up. It says progress... OK, it looks like it is officially completed now, so I’m going to go up to the little refresh button here, click that, and give that just a moment to refresh.
OK, so as you can see, it’s done, and it’s telling me that the deduplication rate was 12 percent and the savings was 204 megs.
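You can watch the job and read back the same numbers without Server Manager. This is a sketch, assuming the E: volume; the five-second polling interval is an arbitrary choice:

```powershell
# Poll until no deduplication job is listed for E: any more.
while (Get-DedupJob -Volume "E:" -ErrorAction SilentlyContinue) {
    Start-Sleep -Seconds 5
}

# Savings rate and saved space for the volume.
Get-DedupVolume -Volume "E:" | Format-List Volume, SavingsRate, SavedSpace

# File-level counters for the volume.
Get-DedupStatus -Volume "E:" |
    Format-List Volume, FreeSpace, SavedSpace, OptimizedFilesCount, InPolicyFilesCount
```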
Now what we’ll do is go back in here and copy this VLC player, and we’re going to paste it in the No Dedup folder, which is not being deduplicated, right? And we’ll go back down here to PowerShell. We’re going to run the command again, Start-DedupJob.
OK, and then we’ll run Get-DedupJob just to see if it’s done.
OK, not done yet.
OK, it’s done. Progress, a hundred percent.
OK, it’s complete. Once the job is no longer appearing, it’s complete.
So what you’re going to see is, drum roll, nothing. Actually, it shouldn’t affect this number at all, because remember, this is only showing you what’s being deduplicated.
So since I’ve put that data in the No Dedup folder, it’s not going to affect it.
So we’re just waiting on that to finish processing. And hopefully, if I’m correct, this should not change. And as you can see, it has not changed as of yet. Still loading data... it looks like it’s complete now, and it did not change.
So as you can see, deduplication is working like it’s supposed to, and it’s pretty easy to configure and set up. I encourage you to give that a shot if you’ve never done it before.
148. Configure SMB Direct
So another nice feature that we can implement on our servers is a feature called SMB Direct.
Now, SMB Direct is actually a feature that works in conjunction with a hardware-based feature that your network adapter card can support. That feature’s called RDMA, which stands for remote direct memory access. RDMA is a feature that allows your network adapter card to interact directly with RAM on your server. Instead of everything having to be processed through your operating system to get to memory, the network adapter card communicates directly with RAM. And so this can really increase your throughput between your machines when transmitting data over SMB, which is Server Message Block, Microsoft’s main file server protocol.
Now, the first thing you’ve got to understand is that your physical network adapter card must support this, so you need to find out what kind of physical network adapter cards you’ve got in your server and then make sure they support it. If your physical network adapter card supports it, then if you are using virtual machines, the virtual network adapter card can support it as well. All right, so let me show you how to turn this on. It’s actually pretty easy to get working. The first thing we need to do is go into Device Manager on our server.
So, I’m going to right-click my Start button and go to Device Manager. The next thing to do is find my network adapter card, right-click the network adapter card, go to Properties, click Advanced, and scroll down. You’ll see Network Direct (RDMA). We’re going to turn it on there.
OK.
So then I’m just going to click OK, and then I’m going to go into PowerShell.
So right-click Start and go to PowerShell. All right. And once PowerShell is officially loaded up, I’m going to run a command called Get-NetAdapterRdma. Hit Enter on that, and it’s going to tell you that currently I have this turned on on one Ethernet adapter, but I do have a virtual adapter that’s being used in conjunction with Hyper-V.
So, if I want to force that one to be turned on, because of course it doesn’t show up inside of my Device Manager, I can run a command to do that, called Enable-NetAdapterRdma.
OK.
So that’s Enable-NetAdapterRdma, and then I want to put in -Name, and then I’m going to put in Ethernet. Actually, it’s vEthernet, isn’t it? vEthernet (NAT). All right. And then from there, I should be able to hit Enter and it should turn it on. We’ll just run Get-NetAdapterRdma again, and it’s now turned on.
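Put together, the commands from this step look like the following. The adapter name "vEthernet (NAT)" is the one from this demo; yours will likely be different, so check the first command’s output for the real name:

```powershell
# Show every adapter and whether RDMA is enabled on it.
Get-NetAdapterRdma

# Enable RDMA on the Hyper-V virtual adapter (name taken from this demo).
Enable-NetAdapterRdma -Name "vEthernet (NAT)"

# Run the check again to confirm the Enabled column changed.
Get-NetAdapterRdma
```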
Now again, you’re not going to see any difference unless, of course, this is supported by your hardware. But if your hardware supports it, you’ve now got it turned on. You would be able to transmit files over SMB and utilize that extra performance.
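One way to check whether SMB itself sees an RDMA-capable interface is with the SMB cmdlets. This is a sketch; the last command only returns rows while there is an active SMB connection from this machine to another server:

```powershell
# Interfaces as the SMB server and SMB client see them;
# look at the "RDMA Capable" column.
Get-SmbServerNetworkInterface
Get-SmbClientNetworkInterface

# With a file copy in progress to another server, show the
# connections in use and whether each side is RDMA capable.
Get-SmbMultichannelConnection
```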