Documenting My Home Network/Lab

I’m a big believer in learning by doing. Over the last couple of years I’ve built up a nice little home network/lab where I host various applications and test new tech. I’m hardly the first person to make this recommendation, but if you are able to get your hands on some hardware to set up a home lab, I highly recommend it. Any old computer you can turn into an ESXi host is enough to get started. I have learned a ton about various technologies that I wouldn’t have had much opportunity to learn elsewhere. First, the obligatory picture:

The breakdown:

  • Ubiquiti Edgerouter X
  • Ubiquiti UniFi AP AC LR (not pictured as it’s in a location central to the house)
  • ARRIS SURFboard DOCSIS 3.0 Cable Modem (SB6183)
  • Cisco SG300-28
  • Cisco SF302-08MPP
  • TRENDnet 24-Port Cat6 Patch Panel (22 runs in the house, minimum of 2 per room)
  • Startech 12U 19-Inch Desktop Open Frame 2 Post Rack
  • Raspberry Pi 3 Model B
  • Lenovo TS140
    • Xeon E3-1226 v3 3.3 GHz
    • 16 GB RAM
    • 44 TB (raw) of various spinning disk, mostly WD Reds (yeah, yeah, I know. I’ll get a NAS eventually)
  • Old computer serving as my ESXi host
    • Intel Core i7-2600K Sandy Bridge Quad-Core 3.4 GHz
    • 16 GB RAM
    • Samsung 840 EVO 250GB SSD
    • 5 TB (raw) of spinning disk
  • 2x CyberPower CP1500AVRLCD UPS
    • One is for my “core” equipment (network stuff/TS140), the other is on the ESXi host

I am a huge fan of the Ubiquiti hardware and I’ve helped multiple people set up the same combo. It’s pretty fantastic to not have to regularly “reset the router,” as seems to be necessary for just about every consumer router/AP. I picked up both switches off of eBay and have been pleased with them. Some Cisco purists dislike the small business line, but they’ve been great for my purposes. The SG300-28 (gigabit) is my main switch and the SF302-08MPP (Fast Ethernet, PoE) runs my cameras. I didn’t need separate switches, but based on what was available on eBay this setup was perfect for my goals. If we’re being honest, need went out the window a long time ago anyway. I run a handful of VLANs and 2 separate physical LANs to segment off the different portions of my network. Necessary? Probably not, but where’s the fun in that? This has let me tinker with VLANs and firewall rules. Fun stuff!

The TS140 runs AD, DNS, Blue Iris (cameras), and acts as a file server. My storage situation could certainly be improved, but all important data is backed up locally and remotely, and any data that I consider less important is at least mirrored to separate local storage. My Raspberry Pi runs PiVPN, which is a super simple way to run OpenVPN and is much more manageable than my previous setup, where I ran OpenVPN directly on my router. The ESXi host runs the following VMs and applications:

  • Plex and related apps
  • Borg Backup
  • Crashplan
  • Nextcloud
  • Kali box for security tinkering
  • Windows VM for when I need to run Windows specific apps
  • LibreNMS
  • NGINX
  • Whatever I feel like playing with at the moment

Each of my Linux boxes is running Ubuntu 16.04. All VMs are regularly backed up using ghettoVCB.

That’s pretty much it for now. I will update when I inevitably add more!

SQL Saturday #357 Recap

In January I posted about a couple of upcoming speaking engagements and, after some nagging for a follow-up from @JayDue216, here it is. First up was the Ohio North SQL Server User Group’s January meeting. This wasn’t my first time speaking in a professional setting, but it was by far the most SQL Server-experienced audience I had ever been in front of. All in all, I thought it went alright. I didn’t feel too great about it immediately afterward, but I received some good feedback from those in attendance, which is always great to hear. Nerves got to me a bit and I caught myself tripping over my own words at a few points. I had a great deal of confidence in the content I developed, and I can’t express enough how valuable that is when your delivery isn’t going as smoothly as planned. Having that confidence allows you to slow yourself down, gather yourself, move to the next slide, and keep powering through the session. At the end of the day it felt good to get the first one out of the way and identify areas for improvement.

After getting my feet wet next up was SQL Saturday #357 in Cleveland. I was initially scheduled to speak early in the day but an out of town speaker was hoping to catch an early flight home and I agreed to move to the last time slot of the day. I caught a few sessions early in the day, but I ended up spending a great deal of the time in the speaker room going through my presentation to make sure I was ready to go.

[Photo] I only did it for the sweet SQL Saturday pullover

Unlike the user group, I felt very good about the presentation after I had finished. I didn’t feel myself tripping over my own words this time and my demos went smoothly. Despite being the last session of the day I had a full classroom and the audience seemed pretty engaged throughout the entire presentation.

I definitely plan to submit to more SQL Saturdays going forward. For anyone who is interested in speaking but still on the fence: just go for it. Everyone has fears and reasons for avoiding them, but the feeling of learning a subject inside and out and then delivering it successfully is absolutely worth taking on those fears.

Happy 2015

New year, new updates. After a brief stint as a DBA at another organization, I have returned to my previous employer to take on a new and exciting role as a Database Administrator in a large SaaS environment. This environment has customers and data centers all over the world and will certainly present some new challenges. It’s no secret that IT as a whole is trending toward cloud architecture and I’m excited to be right in the thick of it.

I’m also excited to announce that I will be presenting my session “Troubleshooting SQL Server Performance Using Wait Stats” at the Ohio North SQL Server User Group on January 6 and then again at SQLSaturday #357 – Cleveland on February 7. These events will be my first speaking engagements and I am definitely looking forward to them.

I have again become quiet on this blog as the last few months have seen two career moves that have kept me busy. Now that I have started to settle in I hope to increase my posting frequency. My new role should provide me with plenty to share!

Meet the New and Improved MCSA Certification

When I was working through my MCSA certification the least enjoyable part, for me, was the 463 exam (Implementing a Data Warehouse with Microsoft SQL Server 2012). I wasn’t really a fan for a number of reasons. It was the least relevant to my role at the time and it felt like portions of the exam were less about practical knowledge and more about promoting those technologies. Still, being a nerd I enjoyed the challenge of learning something new.

On Monday, Microsoft announced changes to the current exam structure. The 463 exam is no longer a requirement to receive the certification. The 461 (Querying Microsoft SQL Server 2012) and 462 (Administering Microsoft SQL Server 2012 Databases) exams are still required, but now you have four choices for your third exam. Those choices are the 463 exam, 411 (Administering Windows Server 2012), 412 (Configuring Advanced Windows Server 2012 Services), and 483 (Programming in C#). This gives more flexibility to those like me whose interests aren’t really on the BI side of things. I would have loved to spend more time learning Windows Server or resurrecting my C# skills.

I think this change will allow the certification to appeal to a wider range of SQL Server professionals which is a win for everyone. Well done, Microsoft.


The WAIT_AT_LOW_PRIORITY Option

My first post on this blog detailed a scenario where a read uncommitted select statement could ultimately block an insert/update/delete. In that scenario, a long-running read uncommitted select holds a schema stability (Sch-S) lock. That lock prevents an online index rebuild from acquiring the schema modification (Sch-M) lock it needs, and an update statement then gets queued up behind the waiting rebuild.

SQL Server 2014 introduced an option that allows more control over how online rebuilds behave in a scenario like the one I described. The WAIT_AT_LOW_PRIORITY option gives you a few different choices for dealing with blocking scenarios. The syntax, from Books Online, is below.
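
Paraphrasing the Books Online entry, the relevant piece of the ALTER INDEX grammar looks roughly like this:

WAIT_AT_LOW_PRIORITY
(
    MAX_DURATION = <time> [ MINUTES ],
    ABORT_AFTER_WAIT = { NONE | SELF | BLOCKERS }
)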

MAX_DURATION is the time, in minutes, the rebuild will wait to acquire the necessary locks before taking action. ABORT_AFTER_WAIT tells it what to do after that time period has passed. Setting it to NONE means SQL Server will just continue to wait, while setting it to SELF will cause the rebuild to give up. If the rebuild absolutely must finish, this could be set to BLOCKERS, which kills the session or sessions preventing it from completing. To see this in action, let’s first create our old scenario without the WAIT_AT_LOW_PRIORITY option.
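
As a rough sketch of that scenario (the table and index names here are stand-ins, not the ones from my original post):

-- Session 1: a long-running read uncommitted select holds a schema stability (Sch-S) lock
SELECT COUNT(*)
FROM dbo.Orders o1 WITH (NOLOCK)
CROSS JOIN dbo.Orders o2;  -- something slow enough to hold the lock for a while

-- Session 2: the online rebuild ends up waiting for a schema modification (Sch-M) lock
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD WITH (ONLINE = ON);

-- Session 3: this update queues up behind the waiting Sch-M request
UPDATE dbo.Orders SET Quantity = Quantity WHERE OrderID = 1;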

[Screenshot: the update queued up behind the online rebuild, which is blocked by the read uncommitted select]

Now let’s try the same thing except we will use the WAIT_AT_LOW_PRIORITY option with a MAX_DURATION of 1 and ABORT_AFTER_WAIT set to SELF. This means after a minute of waiting the rebuild will give up.
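
The rebuild statement, using the same stand-in names as above:

ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD
WITH (ONLINE = ON (
    WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)
));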

[Screenshot: the rebuild giving up after waiting one minute]

As we can see, after a minute of waiting the rebuild gave up and we would not have been able to recreate our previous scenario.

Breaking Radio Silence

Well, it has been a few months since I have posted here, but it’s time for me to share an update. I recently accepted a new position as a SQL Server DBA and I am excited to continue moving forward in my career. My previous role doing database support for a software vendor provided me with a great deal of experience working in both SQL Server and Oracle environments across many industries. In my new position I will get to experience taking ownership of an environment of my own and facing all the challenges that come with that responsibility. I believe this move will allow me to become a more well-rounded SQL Server professional and I am incredibly excited for the opportunity.

Between buying a house, planning a wedding, and accepting my new position I have been quite busy lately and that has resulted in me being pretty quiet on here. I aim to be more active on this blog and more active in the community once the chaos settles down.

Automatic Page Repair in Action

One of the cool features of Mirroring and Always On Availability Groups is Automatic Page Repair. With Mirroring or Availability Groups, when SQL Server encounters certain types of corruption it will look to the Mirror/Replicas to see if they have a clean copy of the page. If they do, it will repair the page in the Principal/Primary and make your life easy. Obviously any occurrence of corruption requires further investigation, but this is a handy feature where SQL Server can bail you out of certain scenarios. In this post I will step through an example of what happens when you corrupt a page in a normal database and then what happens when you corrupt the same page in a database that is part of an Availability Group.

For this example I have two SQL Server 2012 servers (V-SQL1, V-SQL2) that live on a Windows Server Failover Cluster. They are both part of an Availability Group (AG1) that contains the BaseballData database. These replicas are in synchronous commit mode. V-SQL1 also has a copy of the BaseballData database called BaseballData2.

[Screenshot: V-SQL1 and V-SQL2 in Object Explorer showing the AG1 Availability Group and the BaseballData databases]

We will start by corrupting a page in BaseballData2 to see how SQL Server normally behaves when it encounters corruption, and then we will do the same thing with the AG database (BaseballData). Let’s begin by picking a random page to corrupt.
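
The commands look roughly like this; the Teams table name is an assumption on my part, so swap in whichever table you want to poke at:

-- List the pages that belong to the table
DBCC IND ('BaseballData2', 'dbo.Teams', 1);

-- Dump an individual page; trace flag 3604 sends DBCC PAGE output to the client
DBCC TRACEON (3604);
DBCC PAGE ('BaseballData2', 1, 7071, 3);  -- database, file id, page id, print option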

[Screenshot: DBCC IND output]

[Screenshot: DBCC PAGE output]

Here we have picked page 7071, slot 33 of file 1, which contains data about the 1942 Yankees.

[Screenshot: query results showing the 1942 Yankees row]

Next we are going to use a hex editor to corrupt the page we previously identified by writing some zeroes to the file. We will first need to take the database offline.
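
For reference, taking the database offline and working out where the page lives in the data file can be done like this:

-- Take the database offline so the data file can be opened in a hex editor
ALTER DATABASE BaseballData2 SET OFFLINE;

-- A page's byte offset within its data file is page_id * 8192
-- 7071 * 8192 = 57,925,632; zero out a few bytes past the 96-byte page header
SELECT 7071 * 8192 AS page_offset_bytes;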

[Screenshot: the page being zeroed out in a hex editor]

Now that it is corrupt, let’s bring the database back online and try to retrieve some information about the 1942 Yankees.
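
Something along these lines (the column names are guesses based on the Lahman baseball data):

ALTER DATABASE BaseballData2 SET ONLINE;

SELECT name, W, L
FROM BaseballData2.dbo.Teams
WHERE yearID = 1942 AND teamID = 'NYA';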

[Screenshot: incorrect checksum error]

As you can see, we encountered an invalid checksum. In this situation, our best bet is to go to a backup to get that data back. Let’s take a look at msdb..suspect_pages and sys.dm_hadr_auto_page_repair.
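
Those queries are simply:

SELECT * FROM msdb.dbo.suspect_pages;

SELECT * FROM sys.dm_hadr_auto_page_repair;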

[Screenshot: the corrupt page listed in suspect_pages; dm_hadr_auto_page_repair is empty]

We can see our corrupt page in suspect_pages and we see nothing in dm_hadr_auto_page_repair. As you can probably guess, we will see something in dm_hadr_auto_page_repair later in this example. Let’s go ahead and step through the same process on the BaseballData database. To do this, I will stop the V-SQL1 instance (causing failover) and edit the data file the same way as we did for BaseballData2. I will then start the instance back up.

[Screenshot: shutting down the V-SQL1 instance]

[Screenshot: the page being zeroed out in a hex editor]

[Screenshot: starting the V-SQL1 instance back up]

Now that our instance is running again, let’s fail back to V-SQL1 and then try to run our select statement.

[Screenshot: the same incorrect checksum error]

It threw the same error. Great example, right? Let’s try again and see what happens.

[Screenshot: query results showing the 1942 Yankees row]

Success! It looks like Automatic Page Repair had our back. We can verify this by querying suspect_pages and dm_hadr_auto_page_repair.

[Screenshot: the page listed in both suspect_pages and dm_hadr_auto_page_repair]

Now we see our page listed in dm_hadr_auto_page_repair, because SQL Server repaired that page with Automatic Page Repair. Running CHECKDB returns no errors as our corruption has successfully been repaired.
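
For completeness, that check is just:

DBCC CHECKDB ('BaseballData') WITH NO_INFOMSGS;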

[Screenshot: DBCC CHECKDB completing with no errors]

Sort order without an ORDER BY

If you’ve worked with SQL Server for any length of time, you know that sort order is never guaranteed without an ORDER BY. I was recently asked why a query (without an ORDER BY) brings back results in alphabetical order in production while in test it returns them in the order they were inserted to the table. To explain this, I showed a quick example that I thought I would share.

First, let’s create a table and populate it with some data. We will be creating a clustered index on an identity column and a nonclustered index on the lname column. I then used generatedata.com to create 100 dummy rows to insert.
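
A minimal version of that setup might look like the following; apart from the lname column, the table and column names are placeholders:

CREATE TABLE dbo.Person
(
    PersonID int IDENTITY(1,1) NOT NULL,
    fname    varchar(50) NOT NULL,
    lname    varchar(50) NOT NULL,
    CONSTRAINT PK_Person PRIMARY KEY CLUSTERED (PersonID)
);

CREATE NONCLUSTERED INDEX IX_Person_lname ON dbo.Person (lname);

-- The 100 dummy rows from generatedata.com were inserted here, e.g.:
INSERT INTO dbo.Person (fname, lname)
VALUES ('Lana', 'Lambert'), ('Victor', 'Lowe');  -- and so on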

Now let’s go ahead and select all rows from the table where the last name begins with the letter L and grab the execution plan.
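
The query itself, with “Include Actual Execution Plan” turned on:

SELECT PersonID, fname, lname
FROM dbo.Person
WHERE lname LIKE 'L%';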

[Screenshot: results returned in the order they were inserted]
[Screenshot: execution plan showing a clustered index scan]

Looking at the identity column, we can see that it brought back the results in the order they were inserted into the table. With the table being so small, it was most efficient to scan the clustered index, and since we have a clustered index on the identity column, the rows are stored on the pages in that order. Now let’s hint the same query to use the nonclustered index on lname.
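
The hinted version, assuming the index name from the sketch above:

SELECT PersonID, fname, lname
FROM dbo.Person WITH (INDEX (IX_Person_lname))
WHERE lname LIKE 'L%';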

[Screenshot: results returned in order by lname]

[Screenshot: execution plan using the nonclustered index]

Here we can see that it returned the results in order by lname, which is the order the lname values (along with the clustering key) are stored at the leaf level of our nonclustered index. This turned out to be exactly what was happening in the environment I was asked about: the table in their production system was much larger, and the optimizer was correctly choosing to use a nonclustered index. This is just a simple example of why sort order is never guaranteed without an ORDER BY.

All About Collation

Collation is one of the easiest settings to overlook when configuring SQL Server but it’s one that can come back to bite you if it isn’t properly set. In this post I intend to cover the basics of collations, common issues resulting from collation misconfiguration, and basic resolution of those common issues.

So what is a collation? One way to think of a collation is as a dialect of a language. For example, in the U.S. we use the word color. Other English speaking countries use the word colour instead. Who is right? Well, that would depend entirely on where you are from. If you are in the U.S., color is probably going to be considered correct. If you had never even seen the word colour, would you know it means the same thing as color?

Collations can be thought of in a similar way. There are many variations of a language, each with different rules associated. These rules are set by the collation and govern things like sorting, case sensitivity, and accent sensitivity. In one collation, ‘ABC’ may be the same thing as ‘abc’ while in others that may not be the case. The “language” associated with a collation is going to be a code page.

Single-byte (8-bit) character types (char, varchar, text) can represent a total of 256 different values. Since there are far more than 256 characters in use around the world, there is no way to represent every possible character in a single byte. This is where code pages come in. Code pages exist at the Windows level and define which characters those 256 values represent.

For example, English-speaking countries are going to use the Latin code page (1252). The 256 characters on that code page are the characters used in the English language. We wouldn’t be able to fit every Greek character in that same code page, so Greek has its own code page as well (1253). All collations, their descriptions, and their code pages can be found with the query below.
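
That query, give or take the formatting:

SELECT name,
       description,
       COLLATIONPROPERTY(name, 'CodePage') AS code_page
FROM sys.fn_helpcollations();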

Unicode data types (nchar, nvarchar, ntext) are stored using two bytes per character, so they can store characters from multiple languages without having to reference a code page. Unicode data types still play by the sorting and comparison rules of the collation, so you don’t get to ignore collation just because you are using Unicode data types.

Collation is set at the instance level, the database level, and the column level. When a new database is created it will use the instance collation unless otherwise specified. Your system databases will also match that instance collation. Any character columns created in your user databases will use the database collation when they are created unless a collation is defined in the create statement.

When problems with collation occur, it’s typically because something was overlooked in the planning stages. At some point somebody realizes the wrong collation was used and they decide to change it at the database level. Problem solved, right? Unfortunately not, because the database collation is simply what SQL Server looks to when new objects are created. Changing this setting has no impact on existing objects. So in this scenario, the change is made, time goes on, and new tables are added to the database using the new collation. At that point you have two collations actively used in your database, and if there is ever a need to compare columns in different collations you will encounter errors.

The other issues commonly seen are TempDB collation mismatches. We know that TempDB’s default collation is going to match the instance collation so any objects (such as temp tables) created in TempDB will use that collation by default. So what happens if your user database uses a different collation and you have to compare it to data in a temp table created with TempDB’s default collation? You guessed it, you get collation mismatch errors.

The proper fix for these issues really depends on the scenario. With collation mismatches within a user database you can simply alter the columns if there are only a few tables impacted. Unfortunately these issues, in my experience, are rarely caught that early. I have worked with databases that have had multiple collations throughout the database and manually altering them was more work than it was worth. In this scenario I would just migrate all the data to a new database that was created in the correct collation.

Your other option if there are only a few tables is to collate the columns in question within the query. This is done by adding a COLLATE clause to the column. Unless this is a report or some other ad-hoc query, this probably isn’t going to be a very viable option.

With TempDB collation mismatches the fix is somewhat similar. You can take the sledgehammer approach and either change the collation of the database and all existing objects to match the instance (and therefore TempDB), or change the instance collation. Neither is an easy task, and either can introduce even more issues if done incorrectly. Alternatively, you could modify the code as in the other scenario. This could be accomplished by specifying a collation when creating the temporary object, or by modifying the queries to collate the columns on the fly.

Not scared yet? Well, the above are best-case scenarios, assuming that the collations in question share the same code page. When they use different code pages, a potential for data loss exists when changing between collations, and you probably won’t know when it happens. Your best bet is to avoid these issues altogether by making sure the proper collation is used in the first place. As a general rule, your instance and all databases on that instance should use the same collation. This is the best way to be sure that you will never encounter issues within your user databases or in TempDB.

Let’s create a test database and step through a few examples.
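
A minimal version of that setup (the person1 table name comes up later in this post; the column names are placeholders):

CREATE DATABASE CollationTest;  -- inherits the instance collation
GO
USE CollationTest;
GO
CREATE TABLE dbo.person1
(
    personid int IDENTITY(1,1) NOT NULL,
    fname    varchar(50) NOT NULL,
    lname    varchar(50) NOT NULL
);

INSERT INTO dbo.person1 (fname, lname)
VALUES ('John', 'Smith'), ('Jane', 'Doe'), ('Sam', 'Jones');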

Now let’s take a look at the collation at the instance, database, and column levels.
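
Queries along these lines will show all three:

SELECT SERVERPROPERTY('Collation') AS instance_collation;

SELECT name, collation_name
FROM sys.databases
WHERE name = 'CollationTest';

SELECT name AS column_name, collation_name
FROM sys.columns
WHERE object_id = OBJECT_ID('dbo.person1')
  AND collation_name IS NOT NULL;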

[Screenshot: collation settings at the instance, database, and column levels]

We can see that our instance collation is SQL_Latin1_General_CP1_CI_AS. As we learned, our database collation will match that along with our columns since we did not specify otherwise. Now let’s change the database collation of our database and create a new table with that collation. If we select from the two tables, we will see that the data looks identical.
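
Roughly, using the second collation that shows up later in this post:

ALTER DATABASE CollationTest COLLATE Latin1_General_100_CI_AI;

CREATE TABLE dbo.person2
(
    personid int IDENTITY(1,1) NOT NULL,
    fname    varchar(50) NOT NULL,  -- picks up the new database collation
    lname    varchar(50) NOT NULL
);

INSERT INTO dbo.person2 (fname, lname)
VALUES ('John', 'Smith'), ('Jane', 'Doe'), ('Sam', 'Jones');

SELECT * FROM dbo.person1;
SELECT * FROM dbo.person2;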

[Screenshot: identical-looking result sets from person1 and person2]

Both sets of data look identical, but watch what happens when we try to join them together.
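
For example:

SELECT p1.fname, p1.lname
FROM dbo.person1 p1
INNER JOIN dbo.person2 p2
    ON p1.lname = p2.lname;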

[Screenshot: the “cannot resolve the collation conflict” error]

And now we have the dreaded “cannot resolve the collation conflict” error. This is why collation mismatches can be a silent killer: you generally don’t know it’s an issue until it jumps out at you. As we discussed earlier, there are two ways to fix this: alter the collation of the columns themselves, or modify the query. First, we will alter the columns.
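
Altering the columns back to the instance collation looks like this:

ALTER TABLE dbo.person2 ALTER COLUMN fname varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL;
ALTER TABLE dbo.person2 ALTER COLUMN lname varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL;

-- The join from above now succeeds
SELECT p1.fname, p1.lname
FROM dbo.person1 p1
INNER JOIN dbo.person2 p2 ON p1.lname = p2.lname;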

[Screenshot: the join succeeding after altering the column collations]

The other option is to use the COLLATE clause within our query. I have set our columns back to the wrong collation for this example.
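
For example, collating person2’s column to match person1 in the join itself:

SELECT p1.fname, p1.lname
FROM dbo.person1 p1
INNER JOIN dbo.person2 p2
    ON p1.lname = p2.lname COLLATE SQL_Latin1_General_CP1_CI_AS;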

[Screenshot: the join succeeding with the COLLATE clause]

Let’s now look at TempDB collation mismatches. As it stands currently, our person1 table is using the SQL_Latin1_General_CP1_CI_AS collation and person2 is using Latin1_General_100_CI_AI. Here we will create and populate a temp table and then try to join to it.
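
A sketch of that, using a temp table populated from person1:

CREATE TABLE #person_staging
(
    fname varchar(50) NOT NULL,  -- created with TempDB's default collation
    lname varchar(50) NOT NULL
);

INSERT INTO #person_staging (fname, lname)
SELECT fname, lname FROM dbo.person1;

-- Joining the temp table to person2 (Latin1_General_100_CI_AI) produces the conflict
SELECT p2.fname, p2.lname
FROM dbo.person2 p2
INNER JOIN #person_staging s ON p2.lname = s.lname;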

[Screenshot: the collation conflict error on the temp table join]

Once again, our join fails because our temp table was created using the default collation of TempDB. If we were to include a collation in our create table statement, this would work without issue. Alternatively we could modify our select query to use the COLLATE clause like in the previous example.
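
Both options, sketched out:

-- Option 1: specify a collation (or DATABASE_DEFAULT) when creating the temp table
CREATE TABLE #person_staging2
(
    fname varchar(50) COLLATE DATABASE_DEFAULT NOT NULL,
    lname varchar(50) COLLATE DATABASE_DEFAULT NOT NULL
);

-- Option 2: collate the temp table's column on the fly in the join
SELECT p2.fname, p2.lname
FROM dbo.person2 p2
INNER JOIN #person_staging s
    ON p2.lname = s.lname COLLATE Latin1_General_100_CI_AI;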

[Screenshot: the temp table join succeeding]

Unfortunately, most real world examples I have encountered aren’t always this simple to resolve. If thousands of objects are using multiple collations the solution becomes a little less clear. It becomes even more difficult when different code pages are in play as data loss is a possibility. With proper planning, however, these situations can be avoided entirely.