Getting The Most Out Of Your Epyc Server With Proxmox!

Video Statistics and Information

Captions
This Proxmox guide, hopefully the first of many, is brought to you by Linode, so big thanks to Linode. There's a coupon below for $20. We've been using Linode forever. There's tons and tons of old [unclear], and I'm trying to get this thing going again. I've got some other content planned for you, some Bluetooth stuff, but big thanks to Linode for sponsoring this video. I can say we've been using them for years, since even before the Level1 YouTube channels. They're currently running the Level1 website. You get $20, and there's all kinds of fun DIY stuff you can do with Linode. It's hosting in the cloud: you can host your own VPN, or your own file sharing thing, file storage thing. You could have encrypted backups if you want, although if you're strictly doing file backups there are probably some other options for that. But running your own private VPN server, that's actually pretty legit, that's pretty cool. So big thanks to Linode, and there's a link below, just check that out. And for now, on with the guide.

OK, so I did a video on consolidation, how much consolidating I was doing with a new 128-core, 256-thread EPYC server from Gigabyte. I mean, that's no moon, that's a space station, with 128 cores and 256 threads. When you're talking about consolidating older Xeon machines and replacing everything up to, and you could make the argument to replace, even some socket 3647 CPUs, because, and the SQL people in the audience will totally back me up on this, much past Broadwell, Xeon v4, per-core performance basically just flatlines. If you're doing SQL transactions per second or anything like that, from Broadwell v4 all the way up to Xeon Gold on socket 3647 it's basically the same story. So enter AMD with EPYC. That's the 30-second backstory. And 128 cores, 256 threads: it's like, yeah, they crammed more cores in here, but Intel crammed a bunch more cores on that older
socket, and when Intel did that, when you had really high-speed network interfaces and really high-speed disk interfaces, you couldn't keep the cores fed. But this is not what I'm seeing from real-world production use on EPYC Rome, even with 128 cores in a box. If you want really, really high density, especially with virtual machine type workloads, this is what you'll be running, because any single application will struggle to use 128 cores, even across both sockets, depending on whether you run that as a two-NUMA-node setup or you've got four local NUMA nodes and four remote NUMA nodes, the near-far thing, which works a little better in terms of latency. But for most workloads you don't need to bother with setting any of that. This is a complete game-changer for density inside the data center, and it is not unreasonable to consolidate dozens of machines into a single rack-mount machine.

So the point of this video is some hints for configuring Proxmox that I've gathered over the years. Proxmox, if you're not familiar with it, is a virtualization platform distribution of Linux. It is commercial; you should subscribe to them, they put a lot of work into making it really easy. But it goes beyond being just a distribution of Linux in that you can run clustered services with it. So you can set up a cluster of, say, three of these servers, which is what I've done for failover and redundancy. A lot of the people on the last video kind of missed the point. It's like, if the server goes down, aren't you going to lose everything? Well, yeah, if you only have one. But we're not putting three racks of machines into one machine, we're putting three racks of machines into three machines, and that cluster with high availability has better availability than a hodgepodge of machines that were added over a period of like 10 years. It's a much better situation. I mean, all of your risk
management people will tell you that you're running on borrowed time on hardware that old; every day something is going to fail. There's physically less stuff to fail on three physical servers, because it's a software construct that's managing the high availability now. With the old stuff, some of those were standalone machines. It's like, what's the high availability? We're going to go to Veeam and restore the backup, and that's not great.

Proxmox is something that you can play with on literally anything. You don't have to have a 128-core EPYC super server to do this. There are tons of great resources online, and tons of other YouTubers have done videos on how to set up Proxmox and the super basic stuff. It is very basic: if you can't get through the Proxmox installation and you've got reasonably standard hardware, something is wrong. They've made it that easy.

Now, on the commercial side, for the setup: yes, I was using VMware and VMware ESXi. VMware makes a lot of great products. ESXi is a great virtualization platform: clustering, commercial support, the whole nine yards. Proxmox doesn't have quite as many bells and whistles, but it is an extremely competent virtualization platform. So if you're a school or a nonprofit organization or a church or something like that (although you can get nonprofit pricing from VMware), if you are budget constrained, then Proxmox is worth a look on the Linux platform. I mean, Microsoft: because VMware is so good and the Linux stuff is so free, that's where we are with Hyper-V on the Microsoft side. You can get Hyper-V for free as well. You can get Windows Server for free if it's just the Hyper-V Windows Server that does nothing but be a hypervisor, because that's the competitive landscape we're in. And then you can do the PowerShell thing and manage it remotely and all this kind of stuff.

The Proxmox
box, and some hard-won lessons from working on Proxmox over the years: I've got a guide on the Level1 forum that covers the stuff I usually do to a new Proxmox box.

The first step for getting the most out of your EPYC CPU is configuring your BIOS, or at least looking around to make sure everything is good. You can configure things for performance, but it's also useful for troubleshooting. So you configure things like: you've got onboard video, maybe you're using an add-in video card. I'm using a Tesla V100; sometimes the BIOS, different versions of the BIOS for example, will detect that as a card that you can actually do video out on, and it's like, not so much. So before you install that you can come in here and pick. There's nothing in here because there's no GPU installed in the chassis at the moment, but you can pick those kinds of things. So the AMI graphic output policy is like "output select: unknown device". That's probably a BIOS bug. But if you've got more than one thing installed, in here we get the ASPEED device, that's the onboard thing. So just be aware of these options. Your SATA devices that are installed; we can see that we've got all kinds of stuff in here for the networks, the network stack configuration. There's another way to get to the AMD memory configuration status in here too, which is perhaps useful for troubleshooting. That would show you things like whether or not DRAM ECC is enabled and some other things like that.

But AMD CBS, CPU Common Options: generally all of this stuff on Auto is basically fine. On the Gigabyte chassis you don't have to worry about any of this. Some of this, like the Power Supply Idle Control, is a real concern if you're retrofitting something older. The machine will lock or hang if you have a power supply that can't tolerate the ultra-low power states of EPYC, meaning that the power supply will not supply clean power, and then the system hangs, and this is
not a good situation. You can also do custom P-states, but generally if you're running EPYC, don't do this. This is crazy, it just doesn't make sense. You can disable SMT, multithreading. When you do that you have to power cycle the machine: the CPUs have to be turned completely off, because turning threading on or off is not really a thing you can do soft; that's a hard thing.

So, AMD CBS, Northbridge Common Options: most of the stuff that you would actually want to change is under here. IOMMU and ACS: access control and IOMMU go hand in hand, so access control being on is generally going to improve your IOMMU groups. Also, on a lot of motherboards there's a difference between IOMMU Auto, which is a partial enablement, and IOMMU Enabled, which is a full enablement, and the reason for that is some bugs in Windows, and let's just leave it at that.

There's also PCIe ARI support. It's the Alternative Routing-ID Interpretation. If you've got a PCIe device that has a lot of downstream devices, the downstream devices would normally use the bits on the PCIe packets for specifying, hey, this is a downstream device. It doesn't necessarily work correctly, and Auto used to mean enabled, but I think it's since been changed to disabled. So if you're having trouble with a device, especially an older PCIe device, set this to disabled. 10-bit tag support is a new thing for devices that have a whole bunch of downstream devices. I think it's just moving from 8-bit tags to 10-bit tags so that the device can have more tags on the bus, which is fine as long as you don't have any devices that are problematic with that setting.

The SMU Common Options: this is the option here, so we can do cTDP, 240 watts. Now, I had one device that required 480 watts. It was a two-socket system, and setting 240 watts actually hurt my performance, so I ran some benchmarks, that
kind of thing, because it seemed we were running each CPU at 110 or 120 watts. I set it to 480; I was like, that's weird. So set it to 480 and then each CPU was running at 240. So I don't know. I think that has since been fixed, but just so you know. There are options in here for fan control; you can also control that from the IPMI, so you don't have to fiddle with the BIOS on that. There's also the determinism control. With the determinism slider we can say we want performance, and the system will tend toward giving its best performance. It's just the same as it is on desktop, except all of the cores can do 3.35 gigahertz, so it doesn't have to shuffle your load around or have a particular loading order. That might be coming in future generations, but for now that's a thing on Threadripper, it's not really a thing on EPYC. Boost Fmax enable: yeah, just like the desktop counterparts, although I fiddled with this a little bit and you really do have to abuse the SMU to unlock full EPYC overclocking. I think AMD has locked that down. But the danger is not really to the processor, it's really to the motherboard, because overclocking Threadripper, like the 3970X, if you're going to run 4.4 gigahertz on all 32 cores, that thing's going to suck down like 700 watts of electricity. It's really a bathtub curve to get those really, really high clocks, but you can score like 20k in Cinebench R20. So, I mean, 700 watts, it's a little bit of a space heater there.

In general you won't really need to change much of anything else in the BIOS. I will mention the Chipset tab: PCIe link training type and PCIe compliance mode. I mean, PCIe compliance mode, that's just if you're running some of those older PCIe adapters that are not exactly stable to begin with, or you've got specialty equipment or lab equipment that is running off a PCI Express interface with an FPGA. Compliance mode being off, I mean, I don't know, maybe we can, like,
for political reasons we could label that something else. But know that those options are there. I personally have never had a situation where changing these has resolved an issue; it's always been one of the other issues. And you can get yourself into a situation where changing these might actually prevent the system from POSTing correctly. I've run into situations where the ASPEED IPMI video device was not able to interface correctly anymore after making some changes here. So just be aware of that, and be aware of these changes that you can make to the UEFI to get the most out of your system. I'm going to configure the boot order for Proxmox, save and exit.

With Proxmox installed, when it boots up it will give you a login, and you can log in here. It's good to know how to log in here, because if something goes wrong you will need to do that. But it gives you an address: https://, then whatever IP address it got from your DHCP server, then :8006. You can use that to load the Proxmox web interface and finish configuring the system.

I wanted to use the ZFS GUI. The ZFS GUI works great. If you're going to use the ZFS GUI you're going to have to clean your disks so that they're blank, because otherwise, for safety, it's going to say no, you'll do it from the command line. Not a big deal. There we go: RAIDZ1, our tank is online. And now if we go to the ZFS option, hey look, it sees the tank, everything is online. It's going to report our status; it will email us in case one of the drives dies or whatever. You've got some really awesome stuff there.

So there's the zpool, mounted. We can click on Datacenter and go to Storage and create a directory. Now, because I installed to just a SATA SSD (normally I would want to use an NVMe, but for the purposes of this video I'm using two SATA SSDs, which is fine, but, you know, again, [unclear] would be good), it creates a directory called local which is stored on that local SSD. Not
good enough. I've created a ZFS pool called tank, and I've told it that, hey, this directory is available on ZFS. It's under the Storage tab; you actually have to click on Datacenter under Proxmox. So then I'm going to create another folder under tank and just copy the /var/lib/vz, which is by default that local folder. I'm just going to call it local ISO. I'm really only going to use it for ISOs, but the content, you know, there could be multiple things in here: disk images, dumps, whatever. We could store backups in here. Not really a big deal. So now I've got local ISO. It looks pretty similar, except it's in my ZFS tank, and it's enabled, so now when I create a virtual machine we can see it in the storage here under Datacenter.

So now when I right-click on this node and go Create Virtual Machine, we call this VM 101 or whatever, then go to my OS tab. Instead of local I can pick local ISO, and my Debian ISO should be in there, unless I screwed up somehow. I did screw up slightly: it goes in the ISO folder, which is why I copied that. You can tell I've done this a time or two, right? There we go, good to go. Yeah, kernel, that's fine, whatever. Yeah, this is all fine. Hard drive, this is fine: we're going to store it in tank, and disk size 30 gigabytes, that's fine. Cache: no cache, unsafe, write back. I will do four cores and four gigs of RAM. So now we've got both a container and a virtual machine. It's going to connect and boot and do stuff.

Proxmox really is a pretty slick operating system. It's worth money; it's worth paying for support, I mean, that's what you're paying for, right? But just to experiment with it, you don't need the subscription, it's totally OK. And you see how just mind-bendingly fast this was. I think I accidentally chose Algerian as our install language, but that's completely fine, it's a totally recoverable situation. And this is installing; imagine how fast this would be with, like, actual SATA. This is
pretty much just doing everything from RAM. Oh, IPv6 is not enabled, so there's that.

Now, if you had a cluster of these, notice that you've got options in Proxmox like Replication and Ceph. I got a couple of questions on the last video that were like, oh my gosh, you put an entire rack in one server, what about redundancy and reliability? Well, got news for you. We're talking about 15 to 20 servers per rack, because the racks were not super overloaded; they were not super high density, we're not talking blade centers or anything like that. But because of the replication, the actual storage of these virtual machines will replicate from system to system. So that means that if I've got three of these brand new AMD EPYC servers running, there are copies of my machines distributed on two out of the three VM hosts, so any one of them could blow up and I've got an up-to-the-minute replica of whatever I was doing on the other machine.

Now, if it's a controlled shutdown, like say we've had a catastrophic power supply failure and fire has shot out and we need to change the chassis, but the machine is still running, which is the most common type of failure mode (not the power supply catching on fire, but something has happened and you're going to have to shut down the machine), Proxmox will let you migrate those running virtual machines from system to system. That does take a little configuration. You have to set that up, and the whole shared storage thing, you need a fast interconnect, and the whole split-brain, shoot-the-other-node-in-the-head thing; there are some details there. But you can totally do that for the kinds of relatively pedestrian workloads that we're running here. Or you could run a clustered app, like clustered Microsoft SQL Server, where you've got VMs on each node and the clustering is happening literally inside the virtual machine, but all those individual machines will show up here under the Datacenter tab in Proxmox. So, you know, if there's enough interest in this video and
enough people have not seen that before and would like to see that kind of a setup, I could show you, but it's pretty cut-and-dried. I mean, if you're comfortable moving around to this extent with your system (you know, we're just setting a password, it's not secure at all), if you're comfortable moving around the system like this, and I'm just showing you there's not really a lot of insanity here, then you're pretty much good to go in terms of setting up the clustering. Well, I probably should do a how-to on that, actually, with Ceph, by the time you get into storage replication and that kind of thing. But it really is pretty much point and click.

Now, for this, because I'm setting up new virtual machines, we're actually running through the Debian installer. But if you used Disk2vhd or the VMware tools for creating a virtual machine from a physical machine (there's a ton of different tools out there for doing that), you could just boot up the virtual machine pretty much immediately. All you'll really have to do is configure the network. You can maybe move from a virtualized network adapter to a paravirtualized network adapter. It really is worth installing the VirtIO drivers for both disk I/O and the paravirtualized NIC, because those have less overhead. Just imagine the overhead of emulating a full network card versus just enough of the network card to get it to work and pass it through to the host operating system: way less overhead.

It's the same deal with containerization. We get the PVE containers here; you can also set this up for Docker, it's not a problem. The overhead from that type of virtualization is much less. I mean, this EPYC server could run thousands and thousands of containers, whereas it might only be able to run merely hundreds of virtual machines, depending on how beefy those virtual machines need to be. But 128 cores: this is probably the easiest way that I can explain it. It is nuts that we've had a generational leap that
will reasonably let you consolidate machines that are five to ten years old, entire racks at a time. That's what we're talking about here in the data center: that level of compression, for lack of a better way to describe it. And it really is game-changing, because all that legacy hardware, all those older physical machines, all that electricity that's being used: gone. And when you implement a clustered solution like this, the reliability is actually better than the reliability of all that old hardware.

I'm Wendell, this is Level1. This has been a quick peek at Proxmox and also optimizing your servers, like the Gigabyte EPYC server chassis that we've been taking a look at. So if you found this useful, give it a thumbs up, and if not, there's another thing there, I don't know. But give me some feedback on the Level1 forums as to the kinds of things that I can produce that would be useful, and be sure to check out the how-to guide that goes with this video, because it's a little bit more step-by-step and a little bit less me rambling. I'm Wendell, this is Level1, I'm signing out, and I'll see you later.

So there you go, that's getting the most out of your AMD EPYC server. Now, our AMD EPYC servers happen to be Gigabyte chassis, and the configurability here is pretty much off the scale. These are an incredible value for what they are. And if you're worried about reliability, and it's like, I need a four-hour support thing, I need service, well, there's probably a local company that will actually sell you Gigabyte chassis and provide the level of support that you need. But when we're talking about a three-server cluster, it's not a four-hour response time, it's a 30-second response time, because if one of the machines in the cluster goes down, immediately the other two start spinning up virtual machines and they take over the load. Gigabyte doesn't play any games with the IPMI licensing. It's fully unlocked, it supports everything, virtual media,
it's the $300 IPMI option. The cost is better than other major OEMs, at least here in the US. I feel like I have to plug them a little bit because nobody knows about them, but in the data center and in other markets they're actually sort of a force to be reckoned with. Here in America it's like, a Gigabyte rack server? But then you look at it and it's like, OK, I want to use all my PCIe lanes for NVMe: OK, the whole 128 lanes are going to the front. It's like, oh, you don't want to do that? Well, you can do 64 PCIe lanes with the PCIe muxes and then take 64 lanes for other stuff, or 100-gig network interfaces, or mix 100-gig network interfaces plus storage, in single or dual socket. We've covered pretty much everything; check out our other videos on these chassis if you're curious. They're all available in their own configurations. This machine has the 12 three-and-a-half-inch drive configuration, so a little bit of bulk storage. I don't know that I would pair that with 128 CPU cores, but you can, and you can mix and match and do whatever, because it's all modular. Look at the OCP 2 and 3 slots. So again, it's all modular: you can mix whatever network interfaces you want, and all the PCIe for storage internally that you could ever need. It's pretty awesome. I love not being nickel-and-dimed. There's nothing more frustrating than getting a really high-end server, and then you're going to DIY some upgrades to save some money, and you pull the power supplies out and it's like, 500-watt power supplies? What is this? No, these chassis, everything is top end. It's good stuff. These are all thousand-watt-plus modular power supplies. Good job, Gigabyte.

I've rambled enough. I'm Wendell, this is Level1. This has been getting the most out of your Proxmox configuration while I'm troubleshooting that ZFS thing, but I think the ZFS thing's been fixed, so run the
patch on your kernel and let me know, because I've run it for a few days now and it's been fine. I'm signing out, I'll see you later. [Music]
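
The captions mention that turning on IOMMU and ACS in the BIOS tends to improve the IOMMU groups. One way to see the effect after booting the host is to list which PCI devices share a group. What follows is a minimal sketch and not something shown in the video: it assumes a Linux host that exposes groups under /sys/kernel/iommu_groups, and the function names list_iommu_groups and shared_groups are made up for illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Assumes the usual sysfs layout: <root>/<group number>/devices/<address>.
    Returns an empty dict when the directory is missing, which typically
    means the IOMMU is disabled or not supported.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Skip anything that is not a numbered group with a devices folder.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def shared_groups(groups):
    """Keep only the groups holding more than one device."""
    return {number: devs for number, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found; the IOMMU may be disabled or unsupported.")
    for number in sorted(found):
        print(f"group {number}: {', '.join(found[number])}")
```

Running it before and after changing the BIOS options gives a rough comparison. Devices that land in the same group generally have to be passed through to a virtual machine together, so smaller groups are usually what you want.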
Info
Channel: Level1Linux
Views: 83,961
Keywords: technology, science, design, ux, computers, linux, software, programming, level1, l1, level one, l1Linux, Level1Linux
Id: fqa3npnLimg
Length: 24min 30sec (1470 seconds)
Published: Thu Jan 23 2020