Simple (and incomplete) note about storage of BLOBs in SQL Server

Version : 0.8
Date : 02/12/2012
By : Albert van der Sel



We all know that relational databases, in the "past", were typically used to store traditional "administrative"
and "business-like" data like customer names, addresses, order numbers, prices, amounts, product id's etc..

So, the common SQL Server datatypes in use are simply character-based or numeric, like "char", "varchar", "int",
"numeric" (or decimal), "datetime" and the like.
These datatypes are still very important (and much used of course), but the later SQL Server versions became
progressively better equipped to handle "binary" data.

This "binary" data can be very diverse: .pdf files, Excel sheets, images, videos, and completely unstructured
data as well. Usually, people call that type of data "BLOBs", or Binary Large Objects.

It sounds a bit strange: if we just look at the physical implementation, SQL Server uses pages of 8192 bytes,
in which table rows are stored (details will follow soon!).
So, we can easily imagine that just characters and some numeric data can easily be stored in such a page.
Now what happens if a 5MB image is stored? As we will see, SQL Server (in case of "inline" storage) will simply store
a pointer in the first page, which points to a whole separate "tree" of pages storing this blob data.

So, problem solved? We have not seen any details yet, but the topic remains lively discussed among
various SQL Server experts. Fact is, a true relational database does not seem to be very good at
storing and handling traditional data together with BLOBs.

On the other hand: the business has changed. Many applications want to show their customers images and movies of their
products and services. And the best way to store them seems to be a database.
And Microsoft responded quite well. What you can store in SQL Server nowadays, using SQL 2008, or even better,
SQL 2012, is amazing.

There exist several options to store binary data in SQL Server; we will encounter them in the chapters below.

In this note, we try to explore some stuff on BLOBs: how BLOBs are stored, how to load them,
how to retrieve them (using TSQL and other programmatic interfaces), and hopefully some more interesting facts.

First, we will explore the "traditional" inline storage, and work our way through the other options later on.

Hopefully, you will like this simple note.....




Main Contents:

Chapter 1. Inline storage of a blob.
  1.1 Some preparations first (create a database etc..).
  1.2 Adding a blob to a table.
  1.3 Analysis of the (inline) blob storage.

Chapter 2. Pages, extents, and other structures.
  2.1 Structure of a data page, and special pages.
  2.2 Extents.
  2.3 System pages.

Chapter 3. Using Filestream for storage of blobs.
  3.1 Enabling "Filestream".
  3.2 Implementing "Filestream".
  3.3 Adding a blob.
  3.4 Analysis of blob storage.

Chapter 4. Some methods for storage and retrieval of blobs. (not ready)




Chapter 1. Inline storage of a blob.

Here we take a quick look at how we can "load" a blob (in this case, a .jpg file) into a SQL Server database.
In this chapter, we will investigate "inline" storage, where the blob is stored on internal database pages.

1.1 Some preparations first (create a database etc..).


Now, we will do a practical example. Hopefully, you have SQL Server installed (like 2008 or 2012), and you
are invited to follow along.

First, we will create a simple database. However, it is composed of several files. This is so because I want
to avoid storing our objects in the "primary" .mdf file.
Actually, having multiple (data) files is good practice, and we should leave the .mdf file for the dictionary.

All you need to do is create a folder "c:\mssql\data" on your test system.
Next, using Management Studio, log on to SQL Server as sa or as an Administrator, and run the following script:

-- 1. Create the database:

create database SALES
on PRIMARY
(
name='SALES',
filename='c:\mssql\data\SALES.mdf',
size=40MB,
filegrowth= 10MB,
maxsize= 100MB
),
FILEGROUP SALESDATA01
(
name='SALES_DATA_01',
filename='c:\mssql\data\SALES_DATA_01.ndf',
size= 40MB,
filegrowth= 10MB,
maxsize= 100MB
),
FILEGROUP SALESINDEX01
(
name='SALES_INDEX_01',
filename='c:\mssql\data\SALES_INDEX_01.ndf',
size= 40MB,
filegrowth= 10MB,
maxsize= 100MB
)
LOG ON
(
name='SALES_LOG_001',
filename='c:\mssql\data\SALES_LOG_001.ldf',
size= 40MB,
filegrowth= 10MB,
maxsize= 100MB
)
GO

ALTER DATABASE SALES
MODIFY FILEGROUP SALESDATA01 DEFAULT
GO

USE SALES
GO


-- Create a sample table:

CREATE TABLE dbo.EMPLOYEE
(
EMPID INT NOT NULL,
EMPNAME VARCHAR(20) NOT NULL,
EMPSALARY DECIMAL(7,2),
EMPPHOTO VARBINARY(MAX)
)
ON [SALESDATA01]

-- Note the traditional datatypes like INT, VARCHAR, and DECIMAL,
-- as well as the datatype "VARBINARY" for BLOBs.


-- Insert some character-based data (no BLOBs yet) into the EMPLOYEE table:

insert into EMPLOYEE
(EMPID,EMPNAME,EMPSALARY)
values
(1,'Harry',2000.50)

insert into EMPLOYEE
(EMPID,EMPNAME,EMPSALARY)
values
(2,'Nadia',3000.00)

insert into EMPLOYEE
(EMPID,EMPNAME,EMPSALARY)
values
(3,'Albert',5000.00)

GO

So, we have created a database, and an EMPLOYEE table in that database. This table has three simple
character or numeric columns, for storing data like "employee name" (EMPNAME).
Note however, that the last column is of type VARBINARY, which means that SQL Server is now prepared
to store "binary" data (like .jpg or .pdf etc..) in that column. Soon we will see details on that.

Then, we inserted 3 rows into that table, filling the first 3 columns and leaving EMPPHOTO NULL for now.
Obviously, the EMPPHOTO column is supposed to store a photo (a blob) of an employee.

Note:

In older SQL Server versions, binary data could be stored using the "image" datatype.
Although this datatype is still available, the "varbinary(n)" or "varbinary(max)" datatypes should be used for storing blobs.
First, "image" is deprecated (though still around), and second, varbinary is much better in
scenarios where space must be reclaimed after blobs are deleted or updated. The varbinary type is indeed of
variable length.

Let's find out how the table is physically stored.

If you created the database exactly as shown in the script above, then I am sure that our table is "page no 8"
in the file 'c:\mssql\data\SALES_DATA_01.ndf'.

This is so because the first couple of pages of any datafile are for administrative purposes (for SQL Server itself),
like the fileheader (page 0), the "Page Free Space (PFS)" page (page 1), etc..
And, when the database was created, we told SQL Server that the filegroup "SALESDATA01" (consisting of 'c:\mssql\data\SALES_DATA_01.ndf')
is to be the DEFAULT filegroup for new objects.

If you did not use the script, or used an existing test database, the EMPLOYEE table is on different pages.
Even then, you can still follow the next "experiment":
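
Note: if your pages differ, you don't have to guess where a row lives. Since SQL Server 2008, the undocumented
%%physloc%% virtual column, combined with the (equally undocumented) sys.fn_PhysLocFormatter() function,
shows the (file:page:slot) address of each row. A minimal sketch:

SELECT EMPID, EMPNAME,
       sys.fn_PhysLocFormatter(%%physloc%%) AS row_location
FROM EMPLOYEE

-- if the table indeed sits in page 8 of file 3, expect locations
-- like (3:8:0), (3:8:1), (3:8:2)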

We are going to use the DBCC PAGE command to dump page contents. Just follow along...

In order to get the output of the DBCC PAGE command sent to our client session, we first enable trace flag 3604.

DBCC TRACEON (3604)
GO

The DBCC PAGE() statement uses some parameters. Together, these parameters simply form the complete logical
address of the page: which database, the file id in that database, the page number in that file,
and the output mode (print option).

So, it's like this:

DBCC PAGE (database name, file id, page no, mode)

Now, we already know the database name, the page number, and the mode we want (3, for verbose output).

So, maybe we are not sure about the file_id where the page is stored. Well, let's simply take a look
in the system view "sysfiles".

USE SALES
GO

select fileid, filename from sysfiles

fileid filename

1 c:\mssql\data\SALES.mdf
2 c:\mssql\data\SALES_LOG_001.ldf
3 c:\mssql\data\SALES_DATA_01.ndf
4 c:\mssql\data\SALES_INDEX_01.ndf

Since we know that the EMPLOYEE table is stored on the default filegroup, which consists of the
c:\mssql\data\SALES_DATA_01.ndf file, we now know that the file id=3.

The way most log entries refer to pages is like this example: "3:55", meaning
page 55 in file 3.

Now, let's dump the page:

DBCC PAGE('sales',3,8,3)

output:
...
some output skipped
...
EMPID = 3
Slot 2 Column 2 Offset 0x14 Length 6 Length (physical) 6
EMPNAME = Albert
Slot 2 Column 3 Offset 0x8 Length 5 Length (physical) 5
EMPSALARY = 5000.00
Slot 2 Column 4 Offset 0x0 Length 0 Length (physical) 0
EMPPHOTO = [NULL]

Above, much output was skipped. Here, we see all fields of the third row.

Since our table is still extremely small (it only has 3 rows), all of it "sits" in one page.
Let's double-check that with the following.

DBCC SHOWCONTIG(EMPLOYEE)

DBCC SHOWCONTIG scanning 'EMPLOYEE' table...
Table: 'EMPLOYEE' (2105058535); index ID: 0, database ID: 8
TABLE level scan performed.
- Pages Scanned................................: 1
- Extents Scanned..............................: 1
- Extent Switches..............................: 0
- Avg. Pages per Extent........................: 1.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1:1]
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 8014.0
- Avg. Page Density (full).....................: 0.99%

As you can see from the output above, only one page needed to be scanned, so the EMPLOYEE table
just sits completely in page 8 of file c:\mssql\data\SALES_DATA_01.ndf.
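
Note: DBCC SHOWCONTIG is deprecated in the later versions. On SQL 2005 and higher, the same information
can be obtained from the sys.dm_db_index_physical_stats function. A sketch for our table:

SELECT alloc_unit_type_desc, index_type_desc, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats
  (DB_ID('SALES'), OBJECT_ID('dbo.EMPLOYEE'), NULL, NULL, 'DETAILED')

-- page_count should be 1 for the in-row data of our tiny table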


1.2 Adding a blob to the EMPLOYEE table.

Up to this point, we only have "simple" data in our EMPLOYEE table, like varchar and numeric values,
but no blob yet.

SELECT * FROM EMPLOYEE

EMPID...EMPNAME...EMPSALARY..EMPPHOTO
1.......Harry.....2000.50....NULL
2.......Nadia.....3000.00....NULL
3.......Albert....5000.00....NULL


Let's update the third record and store a binary file in the EMPPHOTO column. That is, we will place
a photo of Albert (yuk!) into the EMPLOYEE table, and then see what has changed.

There are many functions in SQL Server for import/export of data. The OPENROWSET() function can also
be used to load a blob into a table. So let's try that. Suppose that in C:\TEMP, we have the photo "albert.jpg".

TSQL statement for adding a BLOB using OPENROWSET():

UPDATE EMPLOYEE
SET EMPPHOTO =
(SELECT * FROM OPENROWSET (BULK 'C:\TEMP\albert.jpg', SINGLE_BLOB) a)
WHERE EMPID=3

SELECT * FROM EMPLOYEE

EMPID...EMPNAME...EMPSALARY..EMPPHOTO
1.......Harry.....2000.50....NULL
2.......Nadia.....3000.00....NULL
3.......Albert....5000.00....0xFFD8FFE000104A4649...
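
By the way, to verify that the full blob really arrived in the table, the DATALENGTH() function
returns the number of bytes stored in a varbinary column:

SELECT EMPID, EMPNAME, DATALENGTH(EMPPHOTO) AS photo_bytes
FROM EMPLOYEE

-- for EMPID=3, photo_bytes should equal the size (in bytes) of the
-- original albert.jpg file; for the other rows it returns NULL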


Ok, we see the leading bytes of binary data in the EMPPHOTO column where EMPID=3 (0xFFD8FFE0... is in fact how a JPEG file starts), but how is the blob (albert.jpg) stored?


1.3 Analysis of the (inline) blob storage.

We know that the EMPLOYEE table is stored in page 8. Now, there are some special pages in any database file,
at various locations, but it's reasonable to expect that SQL Server will not store the blob, say, starting at page 1298,
or page 633 or something. No, it should be quite "close", so let's play with DBCC PAGE('sales',3,x,3), where
we take a look at pages 8, 9, 10, 11 and a few other pages "nearby".
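
Note: instead of browsing "nearby" pages by hand, there is also the undocumented DBCC IND command,
which lists all pages allocated to a table (including the IAM and text/blob pages). A sketch,
assuming it is available on your build:

-- undocumented: list all pages of a table (-1 means: all indexes)
DBCC IND('SALES','EMPLOYEE',-1)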

Now it's getting interesting....

First, let's look at page 8 again, at the data related to EMPID=3 (Albert).


EMPID = 3
Slot 2 Column 2 Offset 0x16 Length 6 Length (physical) 6
EMPNAME = Albert
Slot 2 Column 3 Offset 0x8 Length 5 Length (physical) 5
EMPSALARY = 5000.00
EMPPHOTO = [BLOB Inline Root] Slot 2 Column 4 Offset 0x1c Length 24 Length (physical) 24 (more stuff..)

Interesting. At the EMPPHOTO column it now says "[BLOB Inline Root]", meaning that SQL Server stored
the binary file inline, that is, really inside the database.
Now let's look at some other nearby pages.

At page no 12, I have a "hit"! Take a look at this:

DBCC PAGE('sales',3,12,3)

partial output..

Blob Id: 929038336 Level: 0 MaxLinks: 501 CurLinks: 56
Child 0 at Page (3:16) Slot 0 Size: 8040 Offset: 8040
Child 1 at Page (3:17) Slot 0 Size: 8040 Offset: 16080
Child 2 at Page (3:18) Slot 0 Size: 8040 Offset: 24120
Child 3 at Page (3:19) Slot 0 Size: 8040 Offset: 32160
Child 4 at Page (3:20) Slot 0 Size: 8040 Offset: 40200
Child 5 at Page (3:21) Slot 0 Size: 8040 Offset: 48240
..
(some entries omitted)
..
Child 45 at Page (3:61) Slot 0 Size: 8040 Offset: 369840
Child 46 at Page (3:62) Slot 0 Size: 8040 Offset: 377880
Child 47 at Page (3:63) Slot 0 Size: 8040 Offset: 385920
Child 48 at Page (3:13) Slot 0 Size: 8040 Offset: 393960
Child 49 at Page (3:14) Slot 0 Size: 8040 Offset: 402000
Child 50 at Page (3:15) Slot 0 Size: 8040 Offset: 410040
Child 51 at Page (3:64) Slot 0 Size: 8040 Offset: 418080
Child 52 at Page (3:65) Slot 0 Size: 8040 Offset: 426120
Child 53 at Page (3:66) Slot 0 Size: 8040 Offset: 434160
Child 54 at Page (3:72) Slot 0 Size: 8040 Offset: 442200
Child 55 at Page (3:10) Slot 0 Size: 2361 Offset: 444561


You see that? It clearly shows that the pages from page 16 all the way up to page 66 are dedicated
to storing the blob data (the .jpg file).
Then, from the output, you can see that pages 3:13, 3:14, and 3:15 are used too, as well as page 3:72.
Actually, page 10 is the "tail" of the blob (only 2361 bytes), but the output above shows clearly that
pages 16-66, pages 13-15, and pages 10 and 72, are used for the blob.

Evidently, page 12 stands a bit "apart", and it looks like the "directory" for the blob.
Actually, it's the root node (page), and it contains all information on where to find the blob chunks.

If you would dump, for example, page 3:66 with DBCC PAGE, you would indeed see 8040 bytes of binary data of the photo.

Except for page 10, each page stores exactly one 8040-byte chunk of that photo.
Indeed, SQL Server "returned" to page 10 (after filling all the way up to page 72) to be conservative with pages (and extents).

So, the last bytes of the blob are stored in page 10. But that does not matter: the directory (so to speak)
in page 12 tells SQL Server exactly on which pages all chunks are located, and which file offset is associated with each,
so any application can get a perfectly rebuilt picture if requested.

Fig 1. Simplified representation of the pages involved in our example.



Maybe figure 1 is helpful in understanding our example. You see that our (small) EMPLOYEE table exists
in page 8. But, we have loaded a blob, and the very first page of the blob is page 12. This is the root node
(or root page) of the blob, containing the directory (so to speak) which tells SQL Server which page stores which blob chunk.
In the figure, these are the "blue" pages.

Now, maybe you think the page distribution is a bit "random". It's not. We have not discussed "extents" yet,
but SQL Server organizes collections of 8 pages into extents (so each extent consists of 8 contiguous pages).
Do you notice, from the "root node", that the actual blob data starts at page 16? This is also the start
of the third extent in the file. So actually, it is pretty clean. SQL Server then keeps filling pages from page 16
onwards as necessary. Only the last portion of the blob data is stored in the second extent, just in order
not to waste space. So the second extent contains a normal regular table (in page 8), and also some pages
containing blob data.

Now, the file "albert.jpg" is 444,561 bytes in size.

How much "space" is then "spent" in SQL Server? Above you can see the answer:

(child 0 up to child 54) x 8040 + (the bytes in page 10) = 55 x 8040 + 2361 = 444,561 bytes.

So, from this, you might say that there is hardly any "overhead" in storing a blob, compared to the filesystem.
Not exactly. First, a page is 8192 bytes, of which SQL Server uses 8040 bytes for storing blob data.
But that's not really much overhead.

The point is, as we will see later, that if many blobs are stored, and over time some are deleted and updated
(by the regular applications), some "gaps" will arise throughout the "extents".
Before discussing this, we need to know how SQL Server organizes its pages for various purposes.

You can easily "play" this example by yourself. Just create a new database, and the EMPLOYEE table as shown above.
Then, just use a ".jpg" file, like a photo, of say, a few hundred KB in size.
Next, load the blob into the table (as shown above) and play around a bit with DBCC PAGE().



Chapter 2. Pages, extents, and other structures.

2.1 Structure of a data page, and special pages.

A page is a sort of "atomic" structure in a SQL Server database file (except for the transaction log files).
Below, you see a very schematic representation of a data page, as used for tables.

Fig 2. Simplified representation of a data page.



The page header identifies the page, states which "object id" it belongs to, and holds some further housekeeping info.
At the end of the page is the "row offset table". It records, for each row, the distance in bytes of that row
from the very start of the page. So, the start of any row can be found.

Now, in the figure, you see three example rows, and below those, there exists "free space".
If there is room for new rows, they will simply be added. Now, if at a certain moment, a new row does not "fit"
anymore, a whole new page will be allocated for this object, and the row
will be stored in that newly allocated page instead.

Note: the term "page split" is usually reserved for the situation where a page which was already "quite full"
has a record updated (or, in an index, inserted) with data larger than the former data, resulting in the fact that
the records don't "fit" anymore in that page. In this case, SQL Server allocates a new page for the object, and
moves record(s) as necessary.
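
You can inspect the row offset table of our EMPLOYEE page yourself: print option 1 of DBCC PAGE
shows per-row hex dumps, plus the offset table at the end of the output.

DBCC TRACEON (3604)
GO
DBCC PAGE('sales',3,8,1)
GO

-- near the end of the output, look for the "OFFSET TABLE" section,
-- which lists the byte offset of each row (slot) within the page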


So, if you would just have "heaps" of pages, together forming tables, it would be simple indeed.
But this is not how it is organized. We will see that in section 2.2.

What types of pages do we have? The most important types are:

=> System pages: only in the first extent, and a small number distributed throughout the file.

PFS: Page Free Space page - information about page allocation, and about the free space available on pages.
GAM/SGAM: Global Allocation Map (and Shared GAM) pages - information about whether extents are allocated.
IAM: Index Allocation Map page - information about the extents used by a table or index, per allocation unit.

=> Special pages: a small number distributed throughout the file.

Bulk Changed Map pages - information about extents modified by bulk operations
since the last BACKUP or BACKUP LOG statement, per allocation unit.
Differential Changed Map pages - information about extents that have changed since the last BACKUP DATABASE
or BACKUP DATABASE with Differential statement, per allocation unit.

=> Data pages: these are the common pages in the database file.

Data (table) page - used for "normal" tables, with a few exceptions for certain column datatypes like
text/ntext, image, varbinary(max) and a few others.
Index page - almost "the same" as a data page, except for a few things like pointers.
Text/image page - used for text datatypes, or BLOBs.

So, in general, the "data" and "index" pages are of course the most common pages in a database, unless you have
stored a lot of BLOBs as well.

About the "system" and "special" pages:

You know, this is just how Microsoft has implemented the physical structure. Of course, a lot of new terms
are introduced, which we really have to discuss first.

As shown in figure 1, the first pages in any database file are system related. So, in the first 8 pages (0-7),
you will never find any of your objects (like regular tables, indexes etc...).

Let's dump the second page of the "c:\mssql\data\SALES_DATA_01.ndf" file (this is the file with our table and blob).
This page (page 1) is the "PFS", or "Page Free Space" page.

You can follow along (if you indeed created the database as shown in chapter 1). Just use the DBCC PAGE command again.
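
In our case (file id 3, page 1, verbose mode), that is:

DBCC TRACEON (3604)
GO
DBCC PAGE('sales',3,1,3)
GO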

Allocation Status

GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED 0_PCT_FULL
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED

PFS: Page Alloc Status @0x000000000C95A000

(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL
(3:8)..............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:9)..............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:10).............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:11).............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:16)...- (3:63)..=.....ALLOCATED.100_PCT_FULL
(3:64)...- (3:66)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:67)...- (3:71)..= NOT ALLOCATED...0_PCT_FULL
(3:72).............=.....ALLOCATED.100_PCT_FULL
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL


Here you see the allocation status from page 0 up to page 5119 (we have only used up to page 72).
This low number for the last page comes from the fact that we created the database files
with an initial size of 40MB (which is very, very small): 40MB / 8192 bytes per page = 5120 pages, numbered 0 through 5119.

- The system pages are:

page 3:0 The fileheader
page 3:1 The PFS
page 3:2 The GAM page
page 3:3 The SGAM page
page 3:4 and 3:5 are not allocated
page 3:6 The DIFF page (which registers extent changes between backups)
page 3:7 The "Minimally Logged Map (ML Map)" page (which registers extent changes with respect to BULK LOGGED operations)

- Then, the "second" extent starts, which can hold user objects.

page 3:8 This happens to be a page of the EMPLOYEE table
page 3:9 An IAM page (Index Allocation Map)
page 3:10 In our case, it happens to contain BLOB data (from Albert's photo)
page 3:11 An IAM page (Index Allocation Map)
page 3:12 In our case, it happens to hold the root node page of the BLOB
page 3:13, 14, 15 blob data.

- Then, as of page 3:16, the third extent starts. Here we find pages with blob data.
etc..

other pages up to page 72: blob chunks.
pages (3:73) up to (3:5119): free pages.

Ok, let's first discuss a few facts about extents.


2.2 Extents.

As we already saw in the former section, SQL Server organizes pages in units called "extents".
Each extent consists of 8 contiguous pages.

Some other numeric facts:
  • Since a page is 8K (8192 bytes), an extent is 64K (65536 bytes) in size.
  • 4GB (4294967296 bytes) of space in a database file can contain 64K (65536) extents.
Two main types of extents exist:
  • Uniform (or dedicated) extent: all pages belong to the same object (like an index).
  • Mixed (or shared) extent: the pages can belong to two or more objects.
Fig 3. Uniform and Mixed extents.



Usually, SQL Server allocates multiple uniform extents for each large table.
However, if a table is small, or "begins" small, SQL Server won't allocate an entire extent for it.
Instead it will allocate one or more data pages from a mixed extent. So, a mixed extent can be thought of
as a pool of pages for small objects.

When there are quite a few small tables and indexes in your database, you might expect a certain
amount of mixed extents. SQL Server of course tries to save and compact space as well as possible,
and quite some smart algorithms are in use. Once an object gets larger than 8 pages, SQL Server tries
to allocate uniform extents for that object from then on, as much as possible.

Also, when you create a new clustered index, or rebuild one, the pages will go on uniform extents as well.
Indexes will be discussed in another section.

How SQL Server "keeps track" of free and occupied extents, will be discussed in the next section.


2.3 System pages.

The most important "system" pages (for internal administration) are located in the first 8 pages
of any database file. However, as you will read below, most of them are repeated at certain intervals.

Fig 4. System pages.




The "GAM" and "SGAM" (Global Allocation Map) pages:

A GAM page registers which extents are totally free, and which have been allocated.
Each GAM page covers about 64,000 extents, or about 4 GB of data.

Explanation:

The page has 8192 bytes. Now, the usual page header and GAM header will take some space,
so let's say that about 8000 bytes can be used for tracking extents. Since a bitmap is used, something like
8000 x 8 = 64,000 bits are available. Each such bit identifies whether an extent is totally free,
or already (partly or fully) allocated.

  • If the bit is 1, the extent is totally free.
  • If the bit is 0, the extent is (partly) allocated.
So, 64K extents can be "covered" by one GAM page. Since each extent is 64K, this amounts to about 4GB of data space.
So, if a datafile is larger than 4GB, a GAM bitmap page is needed at every 4GB interval.

A similar story holds for the SGAM page. Only here, the bitmap tracks the following:
if an extent is a mixed extent with at least one page free, the bit is 1.
If an extent is not a mixed extent, or it is a completely full mixed extent, then the bit is 0.

So, this explains how SQL Server can discriminate between free or (partially) used extents.

As you have seen in the former sections, the first GAM is page 2, and the first SGAM is page 3, in any database file.
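
If you want, you can dump these pages too (with trace flag 3604 still enabled):

DBCC PAGE('sales',3,2,3)  -- the GAM page of file 3
DBCC PAGE('sales',3,3,3)  -- the SGAM page of file 3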

The "Page Free Space" (PFS) pages:

This is page 1 in any ordinary .ndf database file, right after the fileheader (page 0).
It registers which pages and page ranges are in use, or are free.
If you have very small database files, then even just one PFS page might be sufficient per file.
This will be explained below.
In our example sales database, we use 40MB sizes, which is ridiculous small ofcourse.

But for larger database files, a PFS page needs to be repeated after about 8000 pages.
This is so, because a PFS does not use a bitmap. The PFS uses one byte for each page, which records whether the page
is allocated or not. So, since the PFS has about 8000 usable bytes for this purpose, other PFS pages are needed in (about)
8000 page intervals.
It needs a byte per page, because it tries to describe for each page, the level of "fullness", like
0_PCT_FULL, 50_PCT_FULL, 100_PCT_FULL (and a few others), so to register that, one bit per page
is not sufficient. So, one byte per page is used.

Here again, you can see a dump of the PFS of the "c:\mssql\data\SALES_DATA_01.ndf" database file,
as used in our example SALES database.

Allocation Status

GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED 0_PCT_FULL
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED

PFS: Page Alloc Status @0x000000000C95A000

(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL
(3:8)..............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:9)..............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:10).............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:11).............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:16)...- (3:63)..=.....ALLOCATED.100_PCT_FULL
(3:64)...- (3:66)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:67)...- (3:71)..= NOT ALLOCATED...0_PCT_FULL
(3:72).............=.....ALLOCATED.100_PCT_FULL
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL

The "ML" (or Bulk Changed Map pages) and "DIFF" (Differential Changed Map pages):

=> The Differential Changed Map pages, track which extents have been changed between differential backups.
Ever wondered how SQL Server knows what changes to backup between a Full backup, and the following
differential backups? The differential backups are generally much smaller compared to the full backup.
This is due to the fact that SQL Server registers which extents have been changed. So, unmodified extents
do not need to be backupped between differential backups.

=> The ML pages track which extents are affected with "Bulk logged" operations.

However, both types of pages are not very relevant for our discussion.
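
Just to make the connection with the DIFF pages concrete, here is a minimal sketch of a full backup
followed by a differential backup (the "c:\mssql\backup" folder is my assumption; it must exist):

BACKUP DATABASE SALES
TO DISK='c:\mssql\backup\SALES_FULL.bak'
GO

-- later on; only extents marked in the DIFF pages are read:
BACKUP DATABASE SALES
TO DISK='c:\mssql\backup\SALES_DIFF.bak'
WITH DIFFERENTIAL
GO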

The "IAM" pages:

I like to pospone this one, until we have covered Btree and index structures in some more detail.



Chapter 3. Using FILESTREAM for storage of blobs (2008/2012).

Up to now, we have seen one type of BLOB storage, namely "inline" storage, where the blob
really is physically stored on database pages.

If you would ask me: I think the (traditional) inline storage is quite "ok", since all information
(metadata + the blobs themselves) is collected in one and the same database. For transactional reasons, and
for availability, this is good.

⇒ Drawbacks of inline storage:

However, if the amount of blobs is very large, DBAs might be confronted with long backup/recovery times.
But that could also be due to the fact that certain appliances have "versioning" switched on, which may result
in documents (blobs) being stored several times, corresponding to their "versions".
So, if you could throttle that back a little, it might have quite an impact on the database size.

Secondly, performance might be an issue too. But don't forget that "middleware/application" servers
might be involved as well.

A commonly heard phrase is that "for large blobs", the filesystem has better performance than inline storage,
while "for smaller blobs", inline storage offers good performance.
Now, define "large" and "small"....
It seems that Microsoft takes over 1MB as "large", and smaller than 1MB as "small".

The suggested improvement in performance is partly attributed to the fast file IO services of the OS, and
the use of the "NT file cache".

⇒ Alternatives for inline storage:

It's possible to store the blobs on a filesystem (using block IO, as well as file IO).
This means that SQL Server uses tables and views solely for metadata, while the blobs themselves
are not stored in SQL Server, but are files on disk.

There exist third-party solutions for this, as well as Microsoft implementations.
Since SQL 2008, the Microsoft "FILESTREAM" feature has been available.
In this chapter, we are going to take a quick look at its main features.

⇒ What is the FILESTREAM feature then?:

In essence: the DBA needs to create a new "filegroup" with the "filestream clause". This filegroup
then physically is a folder on a filesystem, where the blobs are going to be saved and accessed.
From a "transactional viewpoint", the filegroup is just a container, accessible using the SQL Server interface,
which then guarantees consistency. This is OK, but if IO were possible using other methods (using the OS for example),
there would be serious risks of inconsistencies.

Once filestream is enabled and a filestream filegroup is created, you can build tables with a varbinary(max) column, using
the "filestream clause" for that column, which makes sure the blobs are then saved to, and accessed from, this special filegroup.

You "enable" filestream "globally" at the instance level. By default, it's "off".
However, filestream is also a database feature: under one instance, you can have databases without filestream,
and databases with filestream.


3.1 Enabling "Filestream".

=> While installing SQL Server:

When you install SQL Server, somewhere "halfway", there is an option to configure the database engine.
You have to watch carefully, if you would do an interactive install, because it's easy to overlook this option.
See figure 5.

Fig 5. Enabling the FILESTREAM feature at the installation of SQL Server.



=> After SQL Server already was installed:

- Using a graphical utility:

If SQL Server was already installed, you can still use the "SQL Server Configuration Manager"
to enable the filestream feature. Start the utility, right-click the "SQL Server service", and choose "properties".
In the dialog box that shows up, you can choose among several settings.

Fig 6. Enabling the FILESTREAM feature using the "SQL Server Configuration Manager".



- Using TSQL:

Instead of using the graphical utility, you may use TSQL as well, to enable the Filestream feature:

USE master
GO
EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'filestream access level', 1
GO
RECONFIGURE WITH OVERRIDE
GO

-- the possible access levels 0 (none), 1 (TSQL access), or 2 (TSQL access, and file I/O streaming access)
-- will be explained below.
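
To check the result, you can query some server properties (available on SQL 2008 and higher):

SELECT SERVERPROPERTY('FilestreamEffectiveLevel') AS effective_level,
       SERVERPROPERTY('FilestreamConfiguredLevel') AS configured_level,
       SERVERPROPERTY('FilestreamShareName') AS share_name

Note that the "effective" level also depends on the Windows-level setting made in the
SQL Server Configuration Manager, as described above.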


=> Configuring the Filestream settings:

Note from figures 5 and 6 that you can configure Filestream with various settings.
Although those figures show the same possible configurations, it's clearest in figure 6.
  • Option 1: "Enable FILESTREAM for Transact-SQL access"
    Here you limit access to the blobs using TSQL only. This is the safest way to go, albeit (seemingly)
    not the most flexible option.
  • Option 2: "Enable FILESTREAM for file I/O streaming access"
    If you want this too, then TSQL access is enabled, and "file i/o access" is enabled.
    This is very flexible, but you need to do some further research here.
  • Option 3: "Allow remote clients to have streaming access to FILESTREAM data"
    If you want this, then TSQL access is enabled, and "share access" is enabled.
    Furthermore any client can, in principle, access the share. You really need to do some further research here.

From a "transactional viewpoint", the filegroup is just a container, accessible using the SQL Server interface,
which then guarantees consistency, if you use the first or second option.
In this case, you can use TSQL, and also use Win32 APIs to work with the blobs. For example, the "columnname.PathName()"
method (on the varbinary column) provides a logical path to the file, from which a handle can be obtained, and further operations can take place.
One important consequence is thus that applications can use streaming APIs and the performance of the file system,
and at the same time maintain transactional consistency between the unstructured data (the files) and the
corresponding structured data, that is, the other fields of the table, and all optionally related tables.

If you enable the third option, allowing share access for remote clients, I would say that fully guaranteed
consistency is "at risk", unless the applications in use are truly "ironclad". You need to do more research here (if you are interested).
So, I guess I am trying to say that there might be security issues as well as transactional consistency issues.


3.2 Implementing "Filestream".

Let's try to add a filestream filegroup to our SALES database.
I suggest we make a suitable folder first. I choose to create a folder "c:\fsblobs".
Inside that, I want to have a "c:\fsblobs\documents" folder; that will be the folder/container
that's associated with the new filestream filegroup.

Create "c:\fsblobs", but do not create the second folder "c:\fsblobs\documents", because SQL Server
will do that for us (it even expects that directory not to exist yet).

So, here we go:

ALTER DATABASE SALES ADD FILEGROUP Documents CONTAINS FILESTREAM
GO
ALTER DATABASE SALES ADD FILE (NAME='Documents', FILENAME='c:\fsblobs\documents') TO FILEGROUP Documents
GO

In my case, it succeeded. Let's see what happened on the filesystem:

C:\> cd fsblobs
C:\fsblobs> cd d*

C:\fsblobs\documents>dir

30.11.2012 20:14 DIR $FSLOG
30.11.2012 20:14 422 filestream.hdr

C:\fsblobs\documents>cd $*

C:\fsblobs\documents\$FSLOG>dir
30.11.2012 20:14 DIR .
30.11.2012 20:14 DIR ..

Of course, the folder is still empty. We have not stored anything "in" the filestream filegroup yet.

Note the "$FSLOG" directory. This is a sort of "transaction log" (container) for events on blobs that are going
to be stored in the filestream container.

As we will see later on, anytime you create a table using filestream for blob storage, a subfolder is added
in the form of "C:\fsblobs\documents\guid", like for example "C:\fsblobs\documents\25892e17-80f6-415f-9c65-7395632f0223".
Inside such a GUID-named container, we will find the blobs associated with that table. More on this later.

Note:
The "$FSLOG" directory is (ofcourse) very sensitive to "foreign" files. If, on a test system, you place any object there
it will have an effect to the status of the database of which this filestream container is associated with.
You might even end up with a Suspect database. It's easy to recover from this, ofcourse.
But it's an illustration that any interactive access to filestream containers (e.g. using OS commands) must be avoided.


Let's now see what SQL Server "thinks" what we have as database files:

SELECT file_id, type_desc, name, physical_name FROM sys.database_files

file_id...type_desc.....name............physical_name
1.........ROWS..........SALES...........c:\mssql\data\SALES.mdf
2.........LOG...........SALES_LOG_001...c:\mssql\data\SALES_LOG_001.ldf
3.........ROWS..........SALES_DATA_01...c:\mssql\data\SALES_DATA_01.ndf
4.........ROWS..........SALES_INDEX_01..c:\mssql\data\SALES_INDEX_01.ndf
65537.....FILESTREAM....Documents.......c:\fsblobs\documents

So, the regular database files are always of "type" ROWS, or LOG (in the case of transaction log files).
Indeed, we now have a new type of file, of type FILESTREAM, associated with the physical location "c:\fsblobs\documents".


What storage can be used:

The disk(s) that hold the filestream containers do not need to be local disks.
They can just as well be LUNs from a SAN.
However, all filesystems involved should be NTFS formatted.

There are many other considerations, especially for obtaining the best performance.
For example, cluster (block) size can be important, as well as the RAID level, to name a few.


3.3 Adding a blob.

Let's create a DOCS table, with some regular datatypes, and of course a column of datatype varbinary(max).

CREATE TABLE DOCS
(
  FileStreamID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
  DOC_EXTENSION VARCHAR(10),
  DOC_NAME VARCHAR(256),
  DOCUMENT VARBINARY(MAX) FILESTREAM
) FILESTREAM_ON Documents
GO

Let's take a look at the columns I chose here.

=> The "DOC_EXTENSION" and "DOC_NAME" are simply optional. They are just there for informational purposes.If you want,
we could have left those out. It's only that it might be handy to have the "document name" (blobname), and
the file extension (like. jpg or .xls) registered in the table as well.

=> The name "FileStreamID" is not a required column name. It could just as well have been named "id" or another reasonable name.
But we must have such a column, and it must be of the "uniqueidentifier" datatype.
This is a 16-byte binary value that should actually function as a "world wide" globally unique identifier (GUID).

As we will see, it is used to uniquely determine the file associated with that record, on the filesystem.
Often, we use the NEWID() function, or NEWSEQUENTIALID() function, to let SQL Server itself automatically generate GUIDs
for any new record. Note that we have used "DEFAULT NEWSEQUENTIALID()" as a default, so SQL Server will handle it
automatically, if we (or an application) do not provide a GUID for a new record.

As "UNIQUEIDENTIFIER" should result in unique identifiers anyway, you might wonder what the "ROWGUIDCOL" is doing here.
I believe it is not absolutely neccessary, but the efficiency gets up by using UNIQUEIDENTIFIER with the ROWGUIDCOL property.

=> Lastly, we have a column (DOCUMENT) of datatype "varbinary(max)", and this one refers to our blob.
Note that in this column declaration, the clause "FILESTREAM" is necessary to inform SQL Server that
we are going to use the FILESTREAM feature for storage of the blobs.


Suppose that we have the file (or the "blob") c:\temp\sales.xls. We are going to store this as a blob in our
"Documents" filestream filegroup (which actually is the "c:\fsblobs\documents" container).

Take a look at the following TSQL:

INSERT INTO DOCS (DOC_EXTENSION, DOC_NAME, Document)
SELECT
'xls' AS DOC_EXTENSION,
'sales.xls' AS DOC_NAME,
* FROM OPENROWSET(BULK 'c:\temp\sales.xls', SINGLE_BLOB) AS Document
GO

Now, let's see what record we have in the table DOCS:

SELECT * FROM DOCS

FileStreamID............................DOC_EXTENSION....DOC_NAME.........DOCUMENT
6423DA82-5523-E211-B687-000AE4B3F060... xls..............sales.xls........0xD0CF11E0A1B11 (etc)



3.4 Analysis of blob storage.

⇒ Let's take a look at the database pages first:

The "sales.xls" file, I loaded "into" the DOCS table (in reality in the filestream tablespace),
is 8921 KB (circa 9MB) in size, which qualifies as a large BLOB.

In the chapters above, we have seen which pages in the SALES database were allocated, after (only) loading
an employee photo (albert.jpg). You know how to do that using the DBCC PAGE statement.
We know that after page 3:72, all pages were "free". Here is a partial output again:

- Situation before filestream and before loading "sales.xls":

Allocation Status

GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED 0_PCT_FULL
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED

PFS: Page Alloc Status @0x000000000C95A000

(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL
(3:8)..............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:9)..............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:10).............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:11).............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:16)...- (3:63)..=.....ALLOCATED.100_PCT_FULL
(3:64)...- (3:66)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:67)...- (3:71)..= NOT ALLOCATED...0_PCT_FULL
(3:72).............=.....ALLOCATED.100_PCT_FULL
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL

Now, we have "loaded" our large blob, but it should NOT have been loaded into database pages.
The blob "sales.xls" is supposed to live on the filesystem. So let's dump the PFS page again:

- Situation after enabling filestream and after loading "sales.xls":

Allocation Status

GAM (3:2) = ALLOCATED....SGAM (3:3) = NOT ALLOCATED....PFS (3:1) = 0x40 ALLOCATED 0_PCT_FULL
DIFF(3:6) = CHANGED........ML (3:7) = NOT MIN_LOGGED

PFS: Page Alloc Status @0x000000000C95A000

(3:0)....- (3:3)...=.....ALLOCATED...0_PCT_FULL
(3:4)....- (3:5)...= NOT ALLOCATED...0_PCT_FULL
(3:6)....- (3:7)...=.....ALLOCATED...0_PCT_FULL
(3:8)..............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:9)..............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:10).............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:11).............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:12)...- (3:15)..=.....ALLOCATED.100_PCT_FULL Mixed Ext
(3:16)...- (3:63)..=.....ALLOCATED 100_PCT_FULL
(3:64)...- (3:66)..=.....ALLOCATED 100_PCT_FULL Mixed Ext
(3:67).............=.....ALLOCATED..50_PCT_FULL Mixed Ext
(3:68).............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:69).............=.....ALLOCATED...0_PCT_FULL Mixed Ext
(3:70).............=.....ALLOCATED...0_PCT_FULL IAM Page Mixed Ext
(3:71).............= NOT ALLOCATED...0_PCT_FULL
(3:72).............=.....ALLOCATED.100_PCT_FULL
(3:73)...- (3:5119)= NOT ALLOCATED...0_PCT_FULL

From this, it is easy to conclude that our sales.xls blob (circa 9MB in size) is not stored inside the database.
If it were, then well over a thousand database pages would have been allocated, which is not the case.
You can see from the output that the range of pages (3:73) - (3:5119) is still free, as it was before.

So, the blob is not in the database. Now let's take a look at how it is organized in "c:\fsblobs\documents",
which is the container of our filestream filegroup.

Note: there are just a few changes though, like the storage of the DOCS table in page 3:67, and
the two new IAM pages. But there are no new pages due to blob storage. We will come back to the IAM later on.

⇒ Let's take a look at the filesystem:

On a Test system, you could browse around through the subdirs within the "c:\fsblobs\documents" directory.
Don't do that on anything labeled "production".

In the figure below, you see how the storage is organized.

Fig 7. Folder structure of the FILESTREAM container/filegroup after storing one blob.



Here you see my "sales.xls" blob, with exactly the same size as the original file.
Note the different levels of the subdirectories, each named with a GUID identifier.

The "upper" GUID represents the DOCS table.
The "lower" GUID represents the "DOCUMENT" column of the DOCS table.

Let's see what happens if we load a second blob "into" the DOCS table.
This time, we use the "sales2.xls" file, with a size of 5283KB, stored in "c:\temp".
To load it into the filestream container, we can use:

INSERT INTO DOCS (DOC_EXTENSION, DOC_NAME, Document)
SELECT
'xls' AS DOC_EXTENSION,
'sales2.xls' AS DOC_NAME,
* FROM OPENROWSET(BULK 'c:\temp\sales2.xls', SINGLE_BLOB) AS Document
GO


Fig 8. After storing the second blob.




TSQL and Win32 API:

- TSQL:

Using TSQL, you have full control over columns with blob data. You can use INSERT, UPDATE, and DELETE
on the tables in the "usual" way. However, if you delete a row with binary data,
you will probably not see the object removed from the filestream folder immediately.
First, the object is "tombstoned", and a sort of "garbage collector" will remove it permanently later.
Sometimes, it is observed that this can take quite a while.

There are some "best practices" in using TSQL in relation to tables with blobs.
See chapter 4.

- Programmatic API's to Filestream:

Using C#, VB.NET, C++ etc., you can access the blobs "in a neat way" using SQL Server
services. This means that manipulating blobs using the filestream file API is possible.
This also means that authentication to SQL Server must have occurred.

Usually, a handle to the object is acquired. Such a handle can be derived from a sort of UNC path
to the object. Take a look at this example. Although it's TSQL, it shows how such a path
can be obtained. Similar code can be placed in other development environments.

DECLARE @uncpath varchar(max)

SELECT @uncpath = Document.PathName()
FROM DOCS
WHERE DOC_NAME = 'sales.xls'

PRINT @uncpath

\\W2K8srv1\MSSQLSERVER\v1\SALES\dbo\DOCS\DOCUMENT\FB30ECE5-A33C-E211-8FDC-F04DA2915E69
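
What such a programmatic client actually needs is the path above, plus a transaction context.
A minimal TSQL sketch of what the client requests (the Win32 side, which calls OpenSqlFilestream(),
is not shown here):

BEGIN TRANSACTION

SELECT Document.PathName() AS uncpath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS txcontext
FROM DOCS
WHERE DOC_NAME = 'sales.xls'

-- the client passes both values to OpenSqlFilestream(), performs its
-- streaming reads/writes, and then ends the transaction:
COMMIT TRANSACTION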


Some Further remarks:

Up to now, we have seen two types of blob "storage":

- "Inline", that is, the blobs are really stored inside the database, in database pages.
- "Filestream", where blobs are stored in the filesystem.

Another option is the SQL 2012 "File Tables" option, which is actually a nifty refinement of the Filestream feature.

Other options to store blobs on the filesystem exist as well, like Microsoft's RBS (Remote Blob Storage), or third-party
proprietary solutions, which can sometimes be observed in certain document workflow appliances.

If you would be in a situation where you must select a storage model, then ultimately the application that will
be used to access the objects should be "primary" in your decision.
Anyway, if you are in such a situation, you have a lot of research to do.

However, in many documents and blogs you will find the "general" advice to store objects over 1MB
on the filesystem, and smaller objects inline.

Of course, what a certain "Albert" says is not very relevant, but my two cents are:

I was never very disappointed with inline storage, with respect to general performance,
and throughput of backup/recovery. So, if the database is not expected to grow very large,
the inline option stays appealing to me.
Furthermore, I was never too keen on a "split" situation where one part exists on the filesystem
and other parts inside the database.

But, an application can favour one model over the other, so it's probably best to follow the application's favourite.

However, many well-known SQL authorities advise this:
Use the filesystem for large objects, and inline for smaller blobs.



Chapter 4. Some methods for storage and retrieval of blobs. (not ready)