Mastering Active Directory Health: Dcdiag Troubleshooting Tips for IT Pros

[Music] ... Vanderpool, and this channel is dedicated to IT students, IT professionals, and anyone who enjoys learning technical subjects. [Music] Thank you. I don't know of too many things that can happen to an IT professional like coming in on Monday morning and being told, first thing, that Active Directory is not functioning correctly and no one can log on. You talk about pressure and ruining your day; that'll do it. Keeping Active Directory healthy and being proactive with diagnostics and troubleshooting is key to preventing that scenario. In an enterprise that works with Active Directory, so much runs on it and so many systems depend on it that when it isn't functioning right, you get a cascading problem where all these systems come to a grinding halt. If you're not fully aware of how to troubleshoot Active Directory, those problems, combined with the errors Active Directory generates, make things very difficult. You end up in a fog of problems: how do you drill down and find the right avenue to troubleshoot, instead of getting led off in the wrong direction, say troubleshooting certificates when the problem is actually the application? You really don't want an Active Directory failure, because it can literally bring your organization to a halt. Whether you like it or not, and whether I like it or not as an IT professional, Active Directory is complex, and when failures occur it can be extremely stressful for the admin. Active Directory cannot function without some key components. It cannot function without DNS, it cannot function without a working RPC (Remote Procedure Call) protocol, it cannot function without LDAP, and it cannot function without Kerberos. For our look at how to maintain a healthy Active Directory and troubleshoot it effectively, here is the lab scenario. I've got a forest, homelab.techsavvyproductions.com, and I'm going to have three domain
controllers in one site. I'm going to try to keep it simple, because if you understand the simple procedures, it helps a lot as things get more and more complex. The good news is there are many Active Directory diagnostic tools, but we're going to focus on one: dcdiag.exe. The reason is that it's so extensive. If you don't understand the tests it can run, the results it gives you, and what those results mean, it doesn't matter how many tools you have. You'd better learn one or two really well so that they at least get you started on the right track for troubleshooting. dcdiag.exe analyzes and examines the status of a domain controller, or of multiple domain controllers. Using it well requires understanding the tests it runs and, just as importantly, the results, because the results are what point you in the right direction for troubleshooting. dcdiag.exe requires an elevated command prompt or a PowerShell session with administrative rights. So let's get started with the utility: the syntax, the parameters, and how to apply them so we get the diagnostics and results we're looking for. Here we're going to run dcdiag /s:, which specifies that the tests run against a specific domain controller, followed by that domain controller's name. /v is verbose, which gives us the most information in our log file or on screen. /f: lets us designate a path and file name where all the test results will be saved. So here's my domain controller, I've got an elevated command prompt, and I'm just going to paste in my command; notice it executed already. I'm saving the output in the DCDiag folder on the C: drive, in a file called dcdiag_report.log. Let's open that directory, there's my report, and now I can double-click it.
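If you script this run, for example from a scheduled task, the same switches can be assembled in code. The following is a minimal Python sketch that only builds the argument list. The helper name build_dcdiag_args, the C:\DCDiag folder, and the date-stamped file name are illustrative assumptions, not part of dcdiag itself. Actually running the command would still require dcdiag.exe in an elevated session on Windows.

```python
from datetime import date
from pathlib import PureWindowsPath


def build_dcdiag_args(dc_name=None, all_in_site=False,
                      log_dir=r"C:\DCDiag", run_date=None):
    """Assemble a dcdiag command line as a list of arguments.

    Hypothetical helper: the folder and file-naming scheme are assumptions.
    """
    run_date = run_date or date.today()
    # Date-stamped log name so older reports are kept for comparison.
    log_file = PureWindowsPath(log_dir) / f"dcdiag_report_{run_date.isoformat()}.log"

    args = ["dcdiag"]
    if dc_name:
        args.append(f"/s:{dc_name}")  # run against this domain controller
    if all_in_site:
        args.append("/a")             # test all DCs in the site
    args.append("/v")                 # verbose output
    args.append(f"/f:{log_file}")     # save the results to this file
    return args


if __name__ == "__main__":
    cmd = build_dcdiag_args("DC1", run_date=date(2023, 7, 4))
    print(" ".join(cmd))
    # prints: dcdiag /s:DC1 /v /f:C:\DCDiag\dcdiag_report_2023-07-04.log
    # On an elevated DC session you could run it with:
    # subprocess.run(cmd, check=False)
```

Setting all_in_site=True adds /a, which runs the tests against every domain controller in the site. Keeping the arguments as a list avoids hand-building a quoted command string.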
Now I can open it in Notepad and start walking through the report. Here's the critical part: being able to understand what I'm seeing in the results, which should point me in the right direction for troubleshooting. One of the first tests dcdiag runs against Active Directory on your domain controller is called Advertising, and it's very important that we see what is taking place and what we're looking for. Here we see that this domain controller passes the Advertising test: it sees itself as a DC that has a directory service, and it sees itself as an LDAP server. So we can look at the results, see that it passed, and know that this portion of Active Directory seems to be functioning okay. Notice that right below the Advertising result there are two tests that were omitted by user request. I'll explain that in a minute, so we can ignore those for now. Next it started a test called FrsEvent, which looks at the log for the File Replication Service, and it skipped this test because we're running DFSR. So the next test is DFSREvent, which looks at the DFS Replication event log over the last 24 hours to see whether there have been any events related to this service. We see a couple, and we'll have to drill into those to see whether they're problematic and worth looking at, but the test brings events concerning that service to our attention right away. Here's a great best practice: as you make these reports, put the date in the file name, and save all of the copies so you can review past results. It's really helpful to go back through previous reports to see what your conditions and errors were in
previous reports compared with where you are right now. In this dcdiag syntax we're going to add /a, which runs dcdiag.exe against all the domain controllers in the site. We'll keep the verbose switch and the /f: switch, so the report is saved to a file. Here's my domain controller; let's copy the command and just let it run. This time it should create a file with the date in its name, and then we can look at how all three domain controllers respond to the diagnostics. Okay, the command prompt is back. In our folder we now see a file with the date in its name; let's double-click it and take a look. Here we can see it's testing all of our domain controllers: here's DC1, here's DC2, and here's DC3. So now we're running dcdiag.exe against all the domain controllers in this site. This presentation depends very heavily on you getting the video notes. In the description of this video you'll find a link to them in PDF or Microsoft Word format. You need those, because there's no way I can cover every detail of each test and explain everything you'll see in your diagnostic report so that you clearly understand what each test is telling you. Keep in mind that a significant Active Directory failure impacts everything: you'll have errors in the cloud, errors on your file server, and phone calls about all kinds of enterprise activity. Before you go chasing all the fires burning in your enterprise, do your due diligence and run these tests to answer one question: is my Active Directory good, or do I have a problem? Otherwise you'll waste a lot of time chasing a file server with login problems when the bottom line has been Active
Directory all along. Get the video notes. Now, honestly, I'm not going to log on to a domain controller, open an elevated command prompt, and run dcdiag against my domain controllers by hand. I'm going to automate it. I'll put it into Task Scheduler and run it daily. Once a day, when I have a chance, I'll open the log file and see whether anything stands out that I can proactively look at right away, because I know anything that happens in Active Directory is going to impact a whole lot of other things. Here you can see I've put dcdiag with my switches and arguments in Task Scheduler under the Start a program action. You can put your switches in the Add arguments box instead if you want; it doesn't matter. If you put them in the program box the way I do, Task Scheduler will prompt you and ask whether you really want to do it that way. I'm going to run it daily, early in the morning, because I don't want the tests running while people are trying to log on at 8 o'clock. I'm not going to go over Task Scheduler in detail, but you've got a lot of options under Conditions and Settings that you can tweak to your enterprise's and your company's needs. If I need to run dcdiag, I can just go to Task Scheduler, run the task on demand, and look at the report right then. And of course, occasionally check the task's History in Task Scheduler to make sure it really is running every day. This slide shows a list of all the tests you can run with dcdiag, either individually or all together. I typically run all the tests every day just so I can see what's going on. You can see we're doing Advertising, we're going to be looking at CheckSDRefDom, and so on. But having all these tests and not understanding what they tell you is worthless. So in the notes, and for some time in this video, I'll go over what each test is telling you: what
is this test? Once you understand that, it has tremendous value. Here's an example from my notes, to show you the level of detail I go into to help you better understand each of these tests. Let's look at the Advertising test. In the notes I explain that the Advertising test is all about checking the Netlogon service. The Netlogon service on your domain controller is critical for creating SRV records in DNS. The test then goes into DNS itself and checks that there's an SRV record for LDAP, an SRV record for Kerberos, and an SRV record for the global catalog. Then it looks at your event logs to make sure nothing related to these checks is showing up there. So that's what the Advertising test is telling you. What causes Advertising test failures? Primarily the Netlogon service, so check that it's running, and make sure you've got good network connectivity. A failure may also indicate an improper or incorrect DNS configuration. Okay, let's look at a series of tests that dcdiag runs. We've already covered the test called Advertising. What I'm going to do is walk through the remaining four, briefly describe them, jump into what they're actually testing, and then come back and do a little deep dive on each test. First we have CheckSDRefDom, which looks at application partitions in Active Directory; we'll get into that in a minute. Then CrossRefValidation, which looks at another Active Directory partition called the configuration partition. Then we've got two FRS tests: one looks at the events and one looks at SYSVOL, which is a very important share. If you have a server that's 2016 or above, you're almost certainly running DFSR, the newer replication service, rather than FRS. When the FRS test runs, it essentially says: this server uses DFSR, so skip this test and run the DFSR test instead. Active Directory has partitions; they're
also known as naming contexts. One way to think of Active Directory is as a file cabinet with drawers. Partitions are the drawers where you organize your data: one drawer holds a specific kind of information, and another drawer holds another group of information. In Active Directory, three partitions are required and the rest are optional. The first required partition is the domain partition, which is what we generally think of as Active Directory. That's where you put users, groups, computers, all the things we normally think of as Active Directory. The schema partition is also critical, because it sets the rules that describe all these objects. A user, for example, has a password, a username, a telephone number, and so on. The key thing about the schema partition is that it has to be absolutely consistent across your forest. I can't have a password defined as one thing in this domain and as something else in another domain. The third required partition is the configuration partition, which holds information about the physical structure of your forest; I'll get into that one in a minute. Then there are application partitions, which are optional. An application partition is only created when something, such as an AD-aware application, needs to store its own data in Active Directory and creates a partition to hold it. AD-integrated DNS, for example, stores zones in the DomainDnsZones and ForestDnsZones application partitions. Just be aware that you can have other partitions depending on the complexity of your environment; I'm not going to go into them now. One of the tests we're talking about looks at the SYSVOL share. Every domain
controller shares a folder named SYSVOL. That share holds Group Policy files, which we need to make sure are replicated across all our domain controllers. It also holds logon scripts, if your organization uses them, and possibly shared system files such as administrative templates or system policies. The replication service, whether that's FRS (the File Replication Service) or DFSR (DFS Replication), has to make sure that what's on one domain controller is also on every other domain controller, and that they stay consistent. FRS replicated SYSVOL from the early days of Active Directory until Windows Server 2008, when Microsoft introduced DFSR for SYSVOL. DFSR is part of the Distributed File System (DFS) technologies, and it's a powerful service that keeps these files consistent across domain controllers. DFSR lets you synchronize folders on multiple servers across local and wide area networks. It also uses the Remote Differential Compression (RDC) protocol to replicate only the portions of a file that have changed since the last replication. So what does the CheckSDRefDom test do? It digs into the application partitions in Active Directory. Those partitions hold application-specific data. The test checks that each one has an appropriate security descriptor reference domain, which is the domain used to resolve default security descriptors for objects in that partition. That matters because the permissions and access controls on that data have to be consistent across the forest. You don't want an application to crash and burn because a user has rights on one domain controller and doesn't even have rights to the same application data on another. CrossRefValidation is a really complicated test. It looks at many object attributes in the configuration partition, which holds important information about the physical structure of Active Directory, things like sites and subnets, trust relationships between domains, and domain controllers. Now
this test looks for a specific type of object called crossRef. It examines the attributes of the crossRef objects in the configuration partition, and if it finds inconsistencies or missing crossRef objects, it fails. That's an ugly test to fail, because it probably means you have serious Active Directory issues, so hold your breath until you see it pass. Here I'm looking at my dcdiag log file, and you can see it starting the FrsEvent test. It recognizes that this 2016 server runs DFSR rather than FRS, so it immediately moves on to the DFSR test and begins looking at my event logs. You can see I've got an event ID here and another one here; these are different events that happened in the last 24 hours. At the very end of this test it captured a few more events, and notice that this test failed. I had deliberately turned off the other domain controllers, so this one, the only one running, couldn't replicate. I did that so you could see a failed DFSR event test. All right, let's take a look at the next five tests. We'll quickly describe each one, then go into the Active Directory components they actually test. Then we'll come back and execute some of these tests to see how they behave. The first one is FsmoCheck and the next is Intersite. Then comes KccEvent, which again looks at event IDs, this time for the KCC. After that come KnowsOfRoleHolders and MachineAccount, which tests the secure channel between a client and a server. Active Directory has what are known as FSMO roles, flexible single master operation roles. There are five of them: infrastructure master, RID master, PDC (primary domain controller) emulator, schema master, and domain naming master. Notice that some of them are strictly domain-wide and some are forest-wide. By default, all the roles are assigned to your
first domain controller. In my lab, when I created my first domain controller, it got all of these roles. You can move those five roles to any domain controller you want, but if you don't, the very first domain controller you create keeps all five. To find out which domain controllers hold these roles, you can run netdom query fsmo. In my case you can see all of the FSMO roles are on my DC1 domain controller. Next is the Knowledge Consistency Checker, or KCC. The KCC is the component of Active Directory replication that builds the replication topology. It creates logical connections that form spanning trees between domain controllers so that replication is efficient. The goal is to keep all data across Active Directory consistent while using as few network resources as possible. Keep in mind that the connections the KCC creates are logical, not physical. We're already seeing that dcdiag runs a lot of tests on Active Directory replication. Now, back to our five tests. When dcdiag runs these and we look at the results in the report, it basically says FsmoCheck: running, passed; Intersite: running, passed. It doesn't give us much information, and in a production environment you don't want to start turning off domain controllers to see what happens. That's the beauty of having a home lab. I can do that, so we can see not only success but also what happens when these tests fail. When I first looked at FsmoCheck, I thought it would check the FSMO roles on the domain controllers. According to the description of the test, though, it checks for a time server, the PDC (the only FSMO role it checks), the KDC, and a few other things, and that's it. So let's run it. We'll see it do some basic initialization testing of the domain controller it's on, it
looks at the partitions, and then it basically says starting test, passed test, and that's it. It doesn't give us any more information than that. Now I'm going to turn off my PDC, which is one of the things it says it tests, and see what happens. I'll go to my primary domain controller, the one that holds these roles, and just pause it so it won't function. Back to the test, and let's run it again. Notice it's sitting there: it's starting FsmoCheck, and we'll see whether it checks that PDC, since the PDC isn't working. Yes, we can see it's attempting to find the primary domain controller role, and the role could not be located. So it gave me the result I expected with the PDC down, which is exactly what this test is supposed to find. Next we're going to run dcdiag /test:intersite. This test checks for anything that could cause a failure or prevent intersite replication. I tried all kinds of bugs and problems on the domain controllers, and it still passed. Let's run it, and again we can see it started the test down here and passed. Next I go to my domain controllers and pause this one, which is in the same site, and then pause my other domain controller too. Now two domain controllers can't participate in replication, and when I run it again, it starts the test and passes. Go figure, although that may simply be because all three DCs are in a single site, so there's no intersite replication for it to evaluate. Next we'll run the KccEvent test. It ran and passed, so it didn't detect any problems with those two DCs down. Our next test is KnowsOfRoleHolders, and with this one we should see a failure, because the domain controller holding all these roles is off. We can see it starting the test, and we should start seeing errors as it can't find any of these roles. Don't be deceived by how quick it looks in the video; when you run this test and it fails, it takes quite a while. Here we
see the first failure: all five roles were not found, the errors say so, and it failed the test. You can see it reports that DC3 failed test KnowsOfRoleHolders. Our last test is MachineAccount, and there were no problems with that one. It basically tests this domain controller, DC3, against Active Directory and reports that the secure channel is there. If you're watching me right now, you're the very person we're trying to reach with our channel: people with a real interest in learning technical topics and skills. Our content is free on YouTube and our audience is a relatively small group of viewers, so if any of this material is helpful to you, we'd appreciate your support. You can support the channel as simply as liking a video. Hit that like button, because it helps others like you find our content. You can subscribe, which is an effective way of supporting us. If you're able, you can also become a member of the channel for 2.99 a month, less than a cup of coffee. We really want your comments and feedback on any video we produce, and thank you for supporting Tech Savvy Productions. I did do a video on rejoining a server to the domain without a reboot. It fixes that secure channel, or machine account, and lets you reconnect either a client or a server to the domain without a reboot. Check out the QR code and it'll take you right to that video. Now, why, Mr. Vanderpool, are you taking all this time going through each of these tests in dcdiag? Well, when you have problems with Active Directory, they affect all kinds of things. The test we have before us is called naming context security descriptors. If you have problems here, it's going to cause problems with logons and a lot of unpredictable behavior, and Microsoft gives you 14 words about it. Would you rather have as much information as possible about that test and what you can do about it, or Microsoft's 14 words? I can tell you which one I want. The shorthand for this
test is NCSecDesc (naming context security descriptors). It looks at the partitions you have in Active Directory and checks that they carry the appropriate permissions for replication. If those aren't set correctly, replication is going to fail, and that impacts logons and causes unpredictable behavior. A common cause is incorrect permissions. Network issues such as firewalls are another, for example when you've got a domain controller across the WAN with a firewall in between. DNS misconfigurations can also cause replication to fail. Other causes are schema mismatches and replication latency, where a domain controller takes too long to get up to speed. The last is Active Directory database corruption, which is the ugliest phrase in the country. Those are some of the things that can cause, or show up as, a naming context security descriptor test failure. Why are the video notes so important? Because I've included event IDs you can look up that are tied directly to this test's failures. Our next one is NetLogons. When we run the NetLogons test, we're testing the Netlogon service on the domain controller. That service is key to adding SRV records to DNS and to authentication processes, as well as operations like updating and applying group policies. Failures can come from several places. The service may be stopped or not running properly on the domain controller. You can have network issues such as incorrect DNS settings. Replication errors can keep the Netlogon service from getting the information it needs. And the SYSVOL or NETLOGON shares may have improper permissions. The same goes for file system permissions: if the NTFS or share permissions on the SYSVOL folder are wrong, you're again going to have NetLogons failures. Our next test is ObjectsReplicated, which checks that the machine account and the Directory System Agent (DSA) object are replicating. There are some additional switches and
parameters you can use with the ObjectsReplicated test. Here again we're looking at replication, just in a different area, and the usual causes apply. Those include incorrect DNS settings, incorrect Sites and Services settings, and unreachable domain controllers. Active Directory database corruption, which we don't like, is another, along with insufficient permissions on the objects being replicated. Our next test is OutboundSecureChannels. We've talked some about this already. It checks the secure channels from all the domain controllers in the domain, and it can extend the test beyond the domain controllers in the local site. Two switches, /testdomain: and /nositerestriction, give this test some additional flexibility. Remember, domain controllers use secure channels to communicate with other domain controllers, so if that's not working, we're going to have issues. What causes secure channel problems? Incorrect DNS settings, firewalls blocking traffic, and incorrect time. If there's a significant time difference between domain controllers, Kerberos authentication can fail; by default, Kerberos tolerates a clock skew of five minutes. So time is a big one here. You can use the video I just mentioned, which re-establishes those secure channels with a PowerShell script. Another cause is a domain controller that's unreachable or down. Remember, in the notes I have additional event IDs to help you dig deeper into this test. Here's the next series of tests dcdiag can execute. We'll go through these quickly and then look at the components we need to understand better in order to understand what these tests do against Active Directory. The first one is Replications. We've seen a lot of replication testing because it's so critical. So much has to be consistent across our domain controllers that replication is a big part of a healthy Active Directory. RidManager is another test that's
going to look at our RID master. Remember those flexible single master operation roles; one of them is the RID master. When Active Directory creates an object, it must assign it a security identifier, or SID. One component of that SID, the relative identifier (RID), comes from the RID master. The RID master hands each domain controller a pool of unique numbers that become the last part of the SID. We'll see that in just a minute. Next is Services. Obviously there are a lot of services running on a domain controller that are critical for Active Directory, and this test checks that they're running. SystemLog looks at the System event log for warnings and errors to gauge the overall health of the system, because that can definitely impact Active Directory on the domain controller. VerifyEnterpriseReferences looks at read-only domain controllers: if you have an RODC, this test checks it and verifies that things are working correctly. In VerifyReferences we look at backlinks and forward links as they relate to objects in Active Directory. Our last test is VerifyReplicas. It focuses on the application directory partitions instantiated on your domain controllers and makes sure our data is consistent across the domain. Now let's take a look at some components. The RID master, remember, is one of the five FSMO roles assigned to the first domain controller in your Active Directory environment, though you can move them to other domain controllers. The RID master needs to be available. When it's down, a domain controller can keep creating objects only until its current RID pool runs out. A related placement practice is to avoid putting the infrastructure master on a domain controller that's also a global catalog server, unless every domain controller in the domain is a global catalog. Here's a SID. You can see it's a long value that begins with S, which stands for SID. The 1 after the first dash is the revision level, which is always 1. The 5 in S-1-5 identifies the authority that issued the SID, which in most
cases is the NT Authority. Apart from the final number, the rest of the value is the domain identifier, which is consistent across the domain. Here's where the RID comes in. The last portion of the SID is the relative identifier, taken from the pool the RID master allocated, and this is where the uniqueness comes from. The RID can be up to a 10-digit number, and it makes the SID value unique. SIDs are generally no more than 60 to 100 characters long. If you'd like to find your SID and RID, the notes include some PowerShell, some WMIC, and a registry location you can use to look them up. When we run the replication test, one of the things that can really throw a wrench into your replication is what's known as AD tombstones. When you delete an object from the Active Directory database, it isn't simply removed; it's marked as a tombstone object. That lets the deletion itself replicate to the other domain controllers and offers some protection against accidental deletion. Recovering a deleted object can still mean an AD restore, though, which is ugly. These tombstones have a lifespan called the tombstone lifetime, and they're allowed to remain in Active Directory only for that period. If a domain controller was off for longer than the tombstone lifetime and you then power it back up, it can cause replication errors because of these tombstoned objects. If you put two domain controllers with a firewall between them, you've got to open a lot of ports. Here's a list of the ports you need to open so those domain controllers can do their work and the protocols can communicate effectively between them; I also have it in the notes. Active Directory also requires a number of services to be running for AD to work on any domain controller. I've got a list here on the slide as well as in the notes. In our SystemLog test, dcdiag looks at System events that are errors or warnings. You can see I've created a custom view in Event Viewer that does exactly what the diagnostic
test does: it's looking at the System log and highlighting errors and warnings. We're also going to run a test that looks at read-only domain controllers. A read-only domain controller holds a read-only copy of the partitions. It has everything a regular Active Directory domain controller has, but it's read-only. RODCs are typically deployed at sites with few users, poor physical security, limited network bandwidth, or little local IT expertise, so you just put a read-only one there. An RODC has unidirectional replication and credential caching. Let's go ahead and run the replication test. This will give us some problems, because I had two domain controllers down for a full 24 hours, so we should see significant problems in the results. Remember, I'm using video magic. Some of these tests take a long time, so don't judge how long a test takes by what you see in the video. In the results, as we scroll down, we can see it's testing DC1, DC2, and DC3. It's finding lots of problems, because those DCs aren't available. These are very common results when a domain controller is unavailable or down. Microsoft also gives you another very powerful utility for troubleshooting and analyzing replication, called repadmin. You can run repadmin /? to see its options. One nice switch is the replication summary, /replsummary, which is a very handy review of your replication. It gives you a table for each domain controller showing how many replication attempts failed out of the total. It says DC1 is not available, while DC2 and DC3 look available. I've also added a number of PowerShell scripts to the notes that you can run for additional help with replication, along with event IDs if you're troubleshooting. Our next test is RidManager. The domain controller holding that role is down, so it should fail. It's starting the RidManager test, and we can see it failed. Let's do the SystemLog test. We'll now run
VerifyEnterpriseReferences, which is for read-only domain controllers, and it passed even though we don't have a read-only domain controller. Next we'll run VerifyReferences, which covers backlinks and forward links. It passed as well, and it basically runs against the current domain controller. The last one is VerifyReplicas. I don't have any AD-aware applications that create their own application partitions, but we'll run it anyway, and it passed. In the video notes you'll find event IDs for each of these tests and additional PowerShell scripts you can use to dig further into failures; you want to get those. dcdiag also has additional tests for DNS, which I'm going to cover in another video focused on troubleshooting an embedded DNS server. As we wrap up dcdiag, there are just a few more switches that are really helpful. Say you're working from a workstation under an account that doesn't have the needed rights. You can use /u: with a domain\username, together with /p: for the password, to supply other credentials. /a tests all the domain controllers in the site. /e tests all the domain controllers in your enterprise, so that's forest-wide. /q shows only error messages, if that's what you want. [Music] Thank you.
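A verbose report that covers every domain controller in a site gets long quickly. A small script can pull out just the pass/fail result lines, which is similar in spirit to the /q switch. Below is a minimal Python sketch. It assumes result lines shaped like "......................... DC3 failed test KnowsOfRoleHolders", the layout seen in the report above. The regular expression, the helper names, and the sample text are illustrative assumptions, not real output from this lab.

```python
import re
from collections import defaultdict

# Assumed result-line layout (dots, target, outcome, test name):
# "......................... DC3 failed test KnowsOfRoleHolders"
RESULT_LINE = re.compile(r"\.{3,}\s*(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def summarize_dcdiag(report_text):
    """Return {target: {"passed": [...], "failed": [...]}} from report text."""
    summary = defaultdict(lambda: {"passed": [], "failed": []})
    for line in report_text.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            target, outcome, test_name = match.groups()
            summary[target][outcome].append(test_name)
    return dict(summary)


def failed_tests(summary):
    """Flatten the summary into (target, test) pairs that failed."""
    return [(target, test)
            for target, results in summary.items()
            for test in results["failed"]]


if __name__ == "__main__":
    # Hypothetical sample text, made up for illustration.
    sample = """
       Starting test: Advertising
         ......................... DC3 passed test Advertising
       Starting test: KnowsOfRoleHolders
         ......................... DC3 failed test KnowsOfRoleHolders
       Starting test: MachineAccount
         ......................... DC3 passed test MachineAccount
    """
    summary = summarize_dcdiag(sample)
    for target, test in failed_tests(summary):
        print(f"{target}: {test} FAILED")
```

Run against a saved dated report each morning, a summary like this makes it quick to spot which domain controller and which test need attention first.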
Info
Channel: TechsavvyProductions
Views: 7,654
Id: bWCQG61Z7-M
Length: 42min 59sec (2579 seconds)
Published: Tue Jul 04 2023