Errors stating the 8060 byte per row limit has been exceeded when the sum of the columns is well under 8060 bytes

A while back I worked on an issue where a client reached the 8060 byte limit for a row in SQL Server. 8060 bytes is the maximum size a single row can be, as that is the size of a page minus the header (this doesn’t include LOB data types, which are handled differently). They were using software functionality that adds and removes columns on the table as fields are added and removed within the software. The issue this particular client was seeing was that they had reached the limit when adding new fields (columns), but when they removed fields (dropped columns) to try to make room, they still saw the error message.

At first this didn’t quite make sense. The column had been dropped, so that space should be available, right? Ultimately we found this Microsoft bug report, which was closed as functioning to spec.

From Microsoft:

“This behaviour is expected. Dropping a column is a metadata-only change and can leave gaps in column-offsets on the row. When new columns are added to such a table the space left by the dropped columns may or may not be reused for the new column; it depends on the size and type of the old/new columns. Bottom-line is that if you drop/add columns enough times there will be a point at which the max-fixed-size of the row will be exceeded due to the “holes” left behind by the dropped columns.

How to fix this? If your table has a clustered index, then just rebuilding the clustered index will recompact the rows getting rid of the holes left behind by dropped columns. If your table is a heap, then there is no direct way, but you could create/drop a dummy clustered index just to get this above effect. You could use DBCC CleanTable, but it only reclaims space from dropped variable length columns (not fixed length columns). We are looking into the heap issue for a future release.”

We rebuilt the clustered index as instructed and this resolved the issue. While it was great that the issue was resolved and the explanation made sense, I still wanted to see it for myself. Below are the steps I took to step through this.

Let’s create a test table with eight CHAR(1000) columns and populate it with some data. This brings us to 8000 bytes per row.

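A minimal sketch of that setup might look like the following; the table name RowSizeTest and the single-letter values are placeholders of my own (col5 holding ‘E’ matches what we will look for on the page later):

    CREATE TABLE dbo.RowSizeTest
    (
        col1 CHAR(1000), col2 CHAR(1000), col3 CHAR(1000), col4 CHAR(1000),
        col5 CHAR(1000), col6 CHAR(1000), col7 CHAR(1000), col8 CHAR(1000)
    );

    -- One row with every column populated: 8000 bytes of fixed-length data
    INSERT INTO dbo.RowSizeTest
    VALUES ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H');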

Now let’s try to add another CHAR(1000) column. It will fail as this would exceed the 8060 byte limit. This is what the client saw in their environment.

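Against the sketch table, that attempt would look something like this (col9 is an assumed name):

    ALTER TABLE dbo.RowSizeTest ADD col9 CHAR(1000);
    -- Fails: the minimum row size would exceed the maximum
    -- allowable table row size of 8060 bytes.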

Now let’s drop one of the columns, col5. This should free up 1000 bytes, right?
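Dropping it is a one-liner:

    ALTER TABLE dbo.RowSizeTest DROP COLUMN col5;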

With 1000 bytes free we should be able to add another column. However, when we try we see that same familiar error.

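Retrying the same add against the sketch table gives the same result:

    ALTER TABLE dbo.RowSizeTest ADD col9 CHAR(1000);
    -- Still fails with the 8060 byte row size error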

But we dropped a column so shouldn’t that free up space? It certainly appears that way when we look at sys.columns.

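A quick way to check, assuming the sketch table from above:

    SELECT name, column_id, max_length
    FROM sys.columns
    WHERE object_id = OBJECT_ID('dbo.RowSizeTest');
    -- col5 no longer appears in the results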

Something has to give, so let’s take a look at what is written to the actual data page. First we will use DBCC IND to locate one of the pages we will look at.

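Roughly like this; the database name TestDB is an assumption, and -1 returns every page belonging to the object:

    DBCC IND ('TestDB', 'dbo.RowSizeTest', -1);
    -- Note the PageFID and PagePID of a row where PageType = 1 (a data page)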

Next we will turn on trace flag 3604 and use DBCC PAGE to view the page contents. We dropped col5, which contained the value ‘E’, so let’s scroll down and look for it. We see the values ‘D’ and ‘F’, and between them, where ‘E’ used to exist, the output shows ‘DROPPED = NULL’, still taking up 1000 bytes.

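Plugging in the file and page IDs returned by DBCC IND (print option 3 lists each row column by column):

    DBCC TRACEON (3604);
    DBCC PAGE ('TestDB', <PageFID>, <PagePID>, 3);
    -- Between the columns holding 'D' and 'F', the output shows a slot for the
    -- dropped column: NULL in value, but still 1000 bytes in physical length.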

There is no clustered index on this table so rather than rebuilding the clustered index we will just rebuild the heap. We will follow the same steps to view one of our data pages.

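As a sketch, the heap rebuild and re-inspection would be along these lines (the rebuild rewrites the rows onto new pages, so DBCC IND needs to be re-run first):

    ALTER TABLE dbo.RowSizeTest REBUILD;

    DBCC IND ('TestDB', 'dbo.RowSizeTest', -1);
    DBCC PAGE ('TestDB', <PageFID>, <PagePID>, 3);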

Now when we look at the page, we no longer see the 1000 bytes col5 used to consume, and if we try to add another column it will succeed.

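Against the sketch table:

    ALTER TABLE dbo.RowSizeTest ADD col9 CHAR(1000);
    -- Succeeds now that the hole left by col5 has been compacted away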

Deleting duplicate rows from a table without a primary key

I have encountered several scenarios where it has been necessary to clean up duplicate rows in a table. When there is a primary key or some sort of unique identifier, this is an easy task. However, when the rows are truly duplicates and there is no way to distinguish between them, it becomes a little trickier.

Because of this, I have often seen people take the approach of:

  • Select distinct values from table A into a temporary table B
  • Delete from table A where the row exists in table B
  • Insert the values from table B back into table A

While this works, I prefer fewer moving parts. Below are my preferred methods for cleaning up duplicates in both SQL Server and Oracle.

In SQL Server, my weapon of choice in this situation is the ROW_NUMBER window function. Let’s start by creating a table with duplicate rows.

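A minimal sketch, using column names of my own choosing for the employees table referenced later:

    CREATE TABLE dbo.employees
    (
        first_name VARCHAR(50),
        last_name  VARCHAR(50),
        department VARCHAR(50)
    );

    INSERT INTO dbo.employees VALUES
        ('John', 'Smith', 'Sales'),
        ('John', 'Smith', 'Sales'),
        ('Jane', 'Doe',   'Finance'),
        ('Jane', 'Doe',   'Finance'),
        ('Jane', 'Doe',   'Finance');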

The ROW_NUMBER function will allow us to create that unique identifier we were looking for to distinguish between the duplicates. A simple select statement using the ROW_NUMBER function:

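Assuming the sketch table above, partitioning by every column numbers each set of duplicates starting at 1:

    SELECT first_name, last_name, department,
           ROW_NUMBER() OVER (PARTITION BY first_name, last_name, department
                              ORDER BY first_name) AS rownum
    FROM dbo.employees;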

Looking at the results we can see that deleting everything with a rownum greater than 1 would leave us with one copy of each unique row. We will do that next using a derived table and this same window function.

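One way to write that against the sketch table:

    DELETE dupes
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY first_name, last_name, department
                                  ORDER BY first_name) AS rownum
        FROM dbo.employees
    ) AS dupes
    WHERE dupes.rownum > 1;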

And with that we have a duplicate free table.

In Oracle, we could simply use the ROWID pseudocolumn to clean up the same data. If we were to select all columns from the employees table along with the ROWID pseudocolumn, we would see that we actually do have a unique identifier to work with.
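For example, against an Oracle version of the same employees table:

    SELECT e.ROWID, e.*
    FROM employees e;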

We would then pick out the minimum ROWID for each unique row in the table, grouping by all columns, and delete everything else.

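A sketch of that delete, grouping by the assumed columns:

    DELETE FROM employees
    WHERE ROWID NOT IN (
        SELECT MIN(ROWID)
        FROM employees
        GROUP BY first_name, last_name, department
    );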

And once again we have a nice clean table.

How a read uncommitted/nolock select can block other processes

I thought I’d start this blog off by sharing something I have encountered more than once in the past few months. Most people believe that if they run their select queries in read uncommitted/nolock there is no way they can block other processes. This belief is mostly true, but it is actually still possible to unintentionally block other processes when running a select with nolock.

So how can a read uncommitted/nolock select query block other processes? To answer that, I will first explain two types of locks in SQL Server. Schema Stability (Sch-S) and Schema Modification (Sch-M) locks are designed to maintain the integrity of our objects while they are being accessed by DML or DDL queries. When running any select query in read uncommitted/nolock, a Schema Stability lock is acquired to prevent any other process from altering the underlying object while the query is executing. These locks are compatible with any other lock except a Schema Modification lock and will never block a DML query. Schema Modification locks on the other hand are acquired any time DDL is run against a table and are incompatible with every other lock type. SQL Server doesn’t want anyone modifying the table while it is being selected from (thus the Sch-S lock), nor does it want anyone accessing the data while the table structure is being modified (where the Sch-M lock comes into play). As a Schema Modification lock cannot be acquired while a Schema Stability lock exists, the Schema Modification lock would have to wait for existing Schema Stability locks on that table to clear.

Say, for example, you need to kick off a report that will run overnight. You know there shouldn’t be users in the system, but you run it in the read uncommitted isolation level just to be safe. Let’s then say that you have a maintenance job that is supposed to rebuild indexes that same night. If a long query/report is running at the time your index rebuild job kicks off, you can encounter a scenario where that long running query blocks other processes. When rebuilding an index (even online), a Schema Modification lock is required. Since a Sch-M lock is incompatible with any other lock, it must wait for that Sch-S lock to be released. If, at the same time, another process attempts to update/insert/delete against the table(s) in question, it will be unable to complete as it will now be queued up behind the waiting Sch-M lock.

Let’s step through an example. First, let’s create a table big enough to cause trouble and add a nonclustered index on that table.
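Here is a sketch of that setup. The table name schemamodlock comes from later in this post; the column names, row count, and index name are my own, and the row count may need adjusting so the rebuild runs long enough to work with:

    CREATE TABLE dbo.schemamodlock
    (
        id   INT IDENTITY (1, 1),
        junk CHAR(36)
    );

    -- Load enough rows that both the rebuild and the select take a while
    INSERT INTO dbo.schemamodlock (junk)
    SELECT TOP (1000000) CAST(NEWID() AS CHAR(36))
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b;

    CREATE NONCLUSTERED INDEX ix_schemamodlock_junk
        ON dbo.schemamodlock (junk);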

Why a GUID CAST as a CHAR you ask? I have no good answer, but since we are being destructive here I decided we should let them join the party. The next part is tricky as you have to time it just right. As I stated previously, even online rebuilds need a Sch-M lock. For online rebuilds, this is very short lived and happens at the end of the rebuild process. Because the Sch-M lock is our problem lock, we want to start our long running query against the table just as the rebuild is about to finish. To prepare, I first rebuilt the index online and noted the time (let’s say 15 seconds).

Now we’re ready to do some blocking. We will first kick off our online rebuild (SPID 66 for me).
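In the first tab (ONLINE = ON requires an edition that supports online index operations):

    ALTER INDEX ix_schemamodlock_junk ON dbo.schemamodlock
        REBUILD WITH (ONLINE = ON);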

Since my previous rebuild took 15 seconds, about 10 seconds into the rebuild we will start the next step. In another query tab (SPID 70 for me) we will start our “reporting” query that will require a Sch-S lock.
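Any sufficiently slow select will do; one option is to cross join the table against itself under read uncommitted, which holds a Sch-S lock for the duration of the statement:

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    -- Deliberately long running; cancel it once the demo is over
    SELECT COUNT(*)
    FROM dbo.schemamodlock a
    CROSS JOIN dbo.schemamodlock b;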

Now let’s say our application needs to update a row in the schemamodlock table. The application (SPID 68) issues an update statement.
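Something along these lines (the particular row doesn’t matter):

    UPDATE dbo.schemamodlock
    SET junk = CAST(NEWID() AS CHAR(36))
    WHERE id = 1;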

If you timed it up right, the update query is now blocked and sitting behind the Sch-S and Sch-M locks we created. If we run sp_who2 we can see that.

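In sketch form, with the SPIDs from above:

    EXEC sp_who2;
    -- The BlkBy column should show the update (SPID 68 here) waiting on the
    -- rebuild (SPID 66), which in turn is waiting on the NOLOCK select (SPID 70).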

And that is how a NOLOCK/read uncommitted select can ruin your day. Of course, this can be described as a “perfect storm” situation, and most likely the rebuild wouldn’t be blocked for excessively long. Regardless, always be sure to keep your maintenance plans in mind when doing reporting, even when using nolock.