SQL Server vs PostgreSQL: fighting for resources in the same server


Notice: This is not the typical performance’s comparative between SQL Server and PostgreSQL. I only test their performance’s affectation when they run in the same server, that I would never do.

We were asked: “Is it possible to run SQL Server and a PostgreSQL in the same server?”

Obviously, the server would run under Windows. The first thought was: no.

We said that because we would experience problems with I/O and memory pressure, due to the high use of disk and memory that is typical on. We were asked to prove this.


We decide to use the DELL’s DVD store benchmark. It’s a benchmark that simulates a DVD store with a simple OLTP database. The benchmark can be run on several databases and SQL Server and PostgreSQL are among them.

We want to see how they would behave when working together, compared to when they’re working alone. It’s not a SQL Server vs. PostgreSQL benchmark because we don’t want to go deep on how DVD store is build, DB drivers’ performance and so on.

We create a DB of 10GB in each RDBM.


A brief description about the server:

  • Intel Xeon E3-1220 3.1Ghz
  • Memory: 8 GB
  • RAID controller: LSI Adapter SAS2 2004 Spitfire with 2 SATA disks in RAID 1.


  • Microsoft Windows Server 2008R2 Standard SP1 64-bits
  • SQL Server 2008 Workstation Edition
  • PostgreSQL 9.2.4


After install the RDBM it’s time to install DVD Store:

For SQL Server, I use this article. It’s very comprehensive.

For PostgreSQL in Windows I haven’t found something similar so I do it on my own, finding this issues:

  • The script for creating the database and importing the data is for Linux bash pgsqlds2_create_all.sh so I port it to Windows, pgsqlds2_create_all.bat,.
  • Executing ds2pgsqldriver.exe I get a runtime error.

To solve the runtime error in ds2pgsqldriver.exe I modify the source code: in file c:\ds2\pgsqlds2\ds2pgsqlfns.cs at line 209 I change this block:

      int i_row = 0;
      if ((customerid_out > 0) && Rdr.NextResult())
        while (Rdr.Read())
          title_out[i_row] = Rdr.GetString(0);
          actor_out[i_row] = Rdr.GetString(1);
          related_title_out[i_row] = Rdr.GetString(2);

For this one:

      int i_row = 0;
           if ((customerid_out > 0) && Rdr.NextResult())
                while (Rdr.Read() && i_row < title_out.Length)
                       title_out[i_row] = Rdr.GetString(0);
                       actor_out[i_row] = Rdr.GetString(1);
                       related_title_out[i_row] = Rdr.GetString(2);
           rows_returned = i_row;
      catch (Exception e)
           rows_returned = i_row;

           Console.WriteLine("Length of arrays: {0,3}, {1,3}, {2,3}", title_out.Length, actor_out.Length, related_title_out.Length);
           Console.WriteLine("i_row value {0}", i_row);
           Console.WriteLine("Thread {0}: Error inside login: {1}", Thread.CurrentThread.Name, e.Message);
           return (false);

And I compile it:

csc.exe /out:ds2pgsqldriver.exe ds2pgsqlfns.cs c:\ds2\drivers\ds2xdriver.cs /d:WIN_32_TIMER /d:GEN_PERF_CTRS "/r:c:\ds2\drivers\Npgsql.dll"


We execute them with the following parameters:

c:\ds2\pgsqlds2\ds2pgsqldriver.exe --target= --run_time=20 --db_size=10GB --n_threads=40 --ramp_rate=10 --pct_newcustomers=0 --warmup_time=0 --think_time=0 > c:\Temp\testresults.txt 2>&1
c:\ds2\sqlserverds2\ds2sqlserverdriver.exe --target=localhost --run_time=20 --db_size=10GB --n_threads=40 --ramp_rate=10 --pct_newcustomers=0 --warmup_time=0 --think_time=0 > c:\Temp\testresults_sqlserver.txt 2>&1

The benchmark result is at the end of the output file:

n_purchase_from_start= 206466 n_rollbacks_from_start= 4110


Case SQL Server PostgreSQL Total % Performance decrease
Only SQL Server



Only PostgreSQL




Both servers: PostgreSQL y SQL Server





Performance decrease





The numbers are the results of the benchmark. Higher are better.

When running alone, SQL Server is much better than PostgreSQL, but as I said before it is not our goal and it can be for a lot of reasons, for example better implementation of DVD store or better drivers. But still, I’m impressed to see that difference. Test PostgreSQL in Linux is beyond the scope of our analysis, but I’m sure that it would improve a lot.


When working together, SQL Server is the winner, it only decreases 23% but PostgreSQL decreases 82% when they run together.

Here we’ve to notice that SQL Server is a native Windows application, while PostgreSQL is multi platform and probably this compatibility across multiple OS is paid with the efficiency in Windows. So, installing SQL Server and PostgreSQL in the same server, when there’s some workload, the PostgreSQL have important performance degradation.

Learning in Internet’s era

In the last post of the year I’ll talk about knowledge and how we are continuously learning. The idea of the topic comes from a Packt Publishing offer: selling books for Xmas for $5.

It doesn’t matter our job, it’s changing, so we’ve to evolve too. In fact, I don’t know any job that you don’t have to improve your skills continuously usually due to technology improvements. That’s why I’ve always believed that we have to learn during all our life. In fact I also like to learn things that have nothing to do with my career.

But how way learn has changed a lot. Before Internet people had to go to libraries and get some books. With Internet, a lot of information has been available. First in personal blogs, later on online communities like SQLServerCentral or StackOverflow. I remember the first time I found SQLServerCentral, that I was amazed by the high quality of its articles and comments. I learned a lot.

Last year a workmate show me Coursera and I other MOOC (Massive Open Online Courses) like Khan Academy, Codecademy, edX and Udacity. That’s a very big step. I made some really great courses in Coursera:

In general all courses are free, but you can find at Udacity some courses that you’ve to pay for them.
This is a very important point: if the courses are free what is really valuable is your time. If you learn below what you expect, you can just stop doing it. If you pay for it, probably you’ll continue doing it even if it’s not as good as you would desire. Once I paid for one and I couldn’t finish it: I was loosing my time, but I had already lost my money. Probably the future is having the two or three weeks free at the beginning of the courses, so you can decide if it is worth paying it or not.

And now it comes Packt Publishing with this offer: any e-book for $5. I wouldn’t have believed some years ago to have books at $5. Of course it’s an e-book, not a physical book, but it’s acknowledgment for very little money compared to some years ago. May be other publishers will join with this kind of offers. Sell a lot for a low-cost or sell much less for a more expensive price.

Anyway, knowledge and learning is very important nowadays. Now there’s no barriers. Everybody can learn.

SETVAL for all sequences in a schema

In PostgreSQL, when you’re working with sequences, if you insert a future value due to the incrementing values, you will get an error when that value is going to be inserted. I like much more how SQL Server handles autoincrement columns with its IDENTITY property, that would be like the sequences linked to a table like SERIAL, but it’s much more restrictive and by default you cannot INSERT a register specifying the value of this column as you can do with PostgreSQL.

The PostgreSQL setval() function, explained in Sequence Manipulation Functions, is the way that PostgreSQL has to change the value of a sequence. But only accepts one table as a parameter. So, if you need to set all the sequences in a schema to the max(id) of every table, you can do can use the following script, based on Updating sequence values from table select.

CREATE OR REPLACE FUNCTION setval_schema(schema_name name, raise_notice boolean = false)
-- Sets all the sequences in the schema "schema_name" to the max(id) of every table

    row_data RECORD;
    sql_code TEXT;

    IF ((SELECT COUNT(*) FROM pg_namespace WHERE nspname = schema_name) = 0) THEN
        RAISE EXCEPTION 'The schema "%" does not exist', schema_name;
    END IF;

    FOR sql_code IN
        SELECT 'SELECT SETVAL(' ||quote_literal(N.nspname || '.' || S.relname)|| ', MAX(' ||quote_ident(C.attname)|| ') ) FROM ' || quote_ident(N.nspname) || '.' || quote_ident(T.relname)|| ';' AS sql_code
            FROM pg_class AS S
            INNER JOIN pg_depend AS D ON S.oid = D.objid
            INNER JOIN pg_class AS T ON D.refobjid = T.oid
            INNER JOIN pg_attribute AS C ON D.refobjid = C.attrelid AND D.refobjsubid = C.attnum
            INNER JOIN pg_namespace N ON N.oid = S.relnamespace
            WHERE S.relkind = 'S' AND N.nspname = schema_name
            ORDER BY S.relname
        IF (raise_notice) THEN
            RAISE NOTICE 'sql_code: %', sql_code;
        END IF;
        EXECUTE sql_code;


Example to execute setval to all sequences in a schema:

SELECT setval_schema('public');

Example to execute setval to all sequences in all user schemas in the database, showing the sentences that are being executed:

SELECT setval_schema(nspname, true)
FROM pg_namespace
WHERE nspname !~ '^pg_.*' AND nspname <> 'information_schema';

Full example

In this example, a table is created and some registers are inserted. The 3rd insert “Third Value – Jumping” is forced to id = 7 instead of using the sequence. But the sequence hasn’t been modified so when it would arrive to 7 it would get an error because the value already exists.

CREATE TABLE test_setval
    id serial NOT NULL,
    info text NOT NULL,
    CONSTRAINT test_setval_pkey PRIMARY KEY (id)

INSERT INTO test_setval (info) VALUES ('First Value');
INSERT INTO test_setval (info) VALUES ('Second Value');
INSERT INTO test_setval (id, info) VALUES (7,'Third Value - Jumping');
INSERT INTO test_setval (info) VALUES ('Forth Value');

-- 'Forth Value' is inserted with id = 3
SELECT * FROM test_setval;

select after insert Forth Value

 SELECT setval_schema('public', true);

INSERT INTO test_setval (info) VALUES ('Fifth Value - after setval');

-- You can see how the data has been inserted in the same sequencial order:
SELECT * FROM test_setval;


-- or sortering it:
SELECT * FROM test_setval ORDER BY id;

select order by

You can download the all code from here.

Review of Getting Started with OrientDB book

Disclaimer: I was asked to review the Getting Started with OrientDB book and I received a free ebook copy of it.

OrientDB is defined in its website as: an Open Source GraphDB with a mix of features taken from Document Databases and Object Orientation. It’s a new and promising database. But it’s has little documentation and it’s quite chaotic to find what you’re looking for and it’s spread out in multiple places. Maybe I’m used to the amazing PostgreSQL documentation!

Getting Started with OrientDB is written by Claudio Tesoriero and it has been published on August 2013. It’s the only OrientDB book out there. At least it’s the only book listed in OrientDB website and the only one I have found.

My very first surprise is that it has 138 pages, below the average for a computer engineer book. So, as the title says, it’s a getting starter guide, an introduction to OrientDB. Probably this is the main cause why in many times I would have liked more information about some topics.

The book gives a wide, but not in depth, introduction to OrientDB: installation, architecture, administration, querying, performance tuning and advanced features like clustering.

 I would have changed the order of the book: first it discusses administration and later on programming, while I would have done the other way around because you should already be familiar with classes, properties and querying data before starting the administration part.

I also find that some information is in the wrong place, like:

  • Classes and abstract classes are explained between OrientDB’s data files and security roles, when the best place would have been in programming part.
  • The OrientDB console and the OrientDB Studio, the web console, are explained in the same chapter, but there’s other stuff in the middle, instead of being together.
  • How to install OrientDB as a daemon/service should be in Advanced Feature and not in the first chapter.

OrientDB is itself confusing when it talks about the editions (Graph and Standard) and the types of databases it has (graph and document), you’ve to read a little before knowing the difference and the book could have helped more on it and also to explain the differences between the graph and the document database in depth.

On the other hand, I like the Programming chapter, where the author explains with good and complete examples the document database and trying with these examples is enough to feel confident with the document database, but it covers the graph database with just 4 pages. In this chapter you also get an introduction how to configure the JDBC, use the Java API and use the RESTful APIs.

It’s also interesting the Performance Tuning and Advanced Feature chapters, where clustering it’s explained. They are interesting chapters, although sometimes I would have enjoyed more detailed information, for example when the author is explaining how OrientDB uses NIO to map data files in memory or the network.socketBufferSize property.


As I said, there are two things that would have made it a much better book:

  • The order of the chapters should have been different and some information rearranged.
  • More information on some points, but 138 pages are not enough for explaining OrientDB in depth.

There’s always the question if it’s worth buying the book. Although these two big complains, I would have bought the book because:

  • The book explains a lot of things that are not available in OrientDB’s documentation.
  • I would have saved a lot of time instead of looking into the little and bad organized OrientDB’s documentation.

Hidden cartesian product


There’s a project in my company that has a dedicated PostgreSQL virtual server with 12 cores and 32 GB RAM and it’s exclusively being used by one developer. He told me that he had an error with a query and checking the log file I could find: “ERROR: could not write block XXXX of temporary file: No space left”. But there were 27 GB free.
I asked him to run the query again to be able to see what was going on.


The directory of PostgreSQL /data/pg_stat_tmp started growing until fill up the 27 GB! and PostgreSQL gave the out of space error again. Amazing.

The query had the typical INNER JOIN between a Products table and Orders table, with an aggregation on Orders table. I was really surprised for this behaviour. Here is the size (MB and rows) of the tables:

  • public.Orders: 314 MB and 4.242.561 rows.
  • public.Products: 189 MB and 1.108.514 rows.

The query plan was quite suspicious because I wasn’t expectiong a Nested Loop:

I changed the query doing the aggregation in a Common Table Expression and with the result I made the JOIN. It was just a fast test to see if I could find any clue about what was going on. It worked: 66 seconds. The query plan had the expected MERGE JOIN but it was doing a TABLE SCAN instead of using the indexes!

But 66 seconds was so much. So, diving into the problem I found that instead of NULL values, there were in both tables ‘1’ (number one). So we had a lot of ‘1’ in the joining columns! To be exact:

  • public.Orders: 318.884 rows. 7,5% of total rows.
  • public.Products: 184.324 rows.  16,6% of total rows.

So, we’ve: 318884 * 184324 = 58,777,974,416 rows!!

Yes, it’s like a cartasian product!


Of course the obvious one in this case was to change the ones for nulls.

This is a development environtment and it’s far from critical. But in production, if users can make queries that are not controled by an application, we can set statement_timeout. It can be set globally in postgresql.conf or in connection basis.

Get-Hash, a powershell hash function

I needed a hash function that I could pass a text string or a file to compute its hash (MD5, SHA1, etc…). But unfortunately Powershell hasn’t got one, so I started to look for alternatives:

  • There’s a Get-Hash cmdlet (function embedded in a DLL written in C#) in PowerShell Community Extensions (PSCX) module.
  • I found two small tools command-line tools:
  • There’s an interesting post called Using Powershell for MD5 Checksums  in Brian Hartsock’s Blog that explains how to calculate directly in Powershell using NET Framework classes.

Any of these options exactly matched my needs:

  • PSCX is an amazing module, but I didn’t want to install it in production servers just to use one cmdlet.
  • The two small tools do they work without flawness, althought I prefer to have as less external applications as possible. I should parse the result and I prefer to have everything in Powershell.
  • Brian Hartsock’s post is a good point of start. I only misses the possibility to compute file’s hashes.

So based on Brian Hartsock’s post I implemented this function for Powershell 2.0 (because I use parameter definition and try…catch in it) that gets the hash value of a file or a text string. It uses HashAlgorithm Class  and FileStream Class. Both classes are supported since NET Framework 1.0.

The Get-Hash function

The following code can also be downloaded from here.

Gets the hash value of a file or string

Gets the hash value of a file or string
It uses System.Security.Cryptography.HashAlgorithm (http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.aspx)
and FileStream Class (http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx)
Based on: http://blog.brianhartsock.com/2008/12/13/using-powershell-for-md5-checksums/ and some ideas on Microsoft Online Help

Be aware, to avoid confusions, that if you use the pipeline, the behaviour is the same as using -Text, not -File

File to get the hash from.

Text string to get the hash from

.PARAMETER Algorithm
Type of hash algorithm to use. Default is SHA1

C:\PS> Get-Hash "hello_world.txt"
Gets the SHA1 from myFile.txt file. When there's no explicit parameter, it uses -File

Get-Hash -File "C:\temp\hello_world.txt"
Gets the SHA1 from myFile.txt file

C:\PS> Get-Hash -Algorithm "MD5" -Text "Hello Wold!"
Gets the MD5 from a string

C:\PS> "Hello Wold!" | Get-Hash
We can pass a string throught the pipeline

Get-Content "c:\temp\hello_world.txt" | Get-Hash
It gets the string from Get-Content

Get-ChildItem "C:\temp\*.txt" | %{ Write-Output "File: $($_)   has this hash: $(Get-Hash $_)" }
This is a more complex example gets the hash of all "*.tmp" files

DBA daily stuff (http://dbadailystuff.com) by Josep Martínez Vilà
Licensed under a Creative Commons Attribution 3.0 Unported License

Original post: http://dbadailystuff.com/2013/03/11/get-hash-a-powershell-hash-function/
function Get-Hash
          [parameter(Mandatory=$true, ValueFromPipeline=$true, ParameterSetName="set1")]
          [parameter(Position=0, Mandatory=$true, ValueFromPipeline=$false, ParameterSetName="set2")]
          $file = "",
          [parameter(Mandatory=$false, ValueFromPipeline=$false)]
          [ValidateSet("MD5", "SHA", "SHA1", "SHA-256", "SHA-384", "SHA-512")]
          $algorithm = "SHA1"
          $hashAlgorithm = [System.Security.Cryptography.HashAlgorithm]::Create($algorithm)
          $md5StringBuilder = New-Object System.Text.StringBuilder 50
          $ue = New-Object System.Text.UTF8Encoding

          if ($file){
               try {
                    if (!(Test-Path -literalpath $file)){
                         throw "Test-Path returned false."
               catch {
                    throw "Get-Hash - File not found or without permisions: [$file]. $_"
               try {
                    [System.IO.FileStream]$fileStream = [System.IO.File]::Open($file, [System.IO.FileMode]::Open);
                    $hashAlgorithm.ComputeHash($fileStream) | % { [void] $md5StringBuilder.Append($_.ToString("x2")) }
               catch {
                    throw "Get-Hash - Error reading or hashing the file: [$file]"
               finally {
          else {
               $hashAlgorithm.ComputeHash($ue.GetBytes($text)) | % { [void] $md5StringBuilder.Append($_.ToString("x2")) }

          return $md5StringBuilder.ToString()

Some examples how to call it

Here are some examples on how to use Get-Hash. Note that node.exe is a binary file.

Write-Output "`nSome examples how to call it:"
Get-Hash "c:\temp\myScriptFile.sql"
Get-Hash "c:\temp\br[a{ets.txt"
Get-Hash "c:\temp\node.exe"
Get-Hash -Algorithm "MD5" -Text "A MD5 checksum!"

Write-Output "`nFour hello world examples that return the same hash value:"
Get-Hash "c:\temp\hello_world.txt"
"Hello Wold!" | Get-Hash
Get-Hash -Text "Hello Wold!"
Get-Content "c:\temp\hello_world.txt" | Get-Hash
Get-ChildItem "C:\temp\*world*.txt" | %{ Write-Output "File: $($_)   has this hash: $(Get-Hash $_)" }


Remove IDENTITY property from a primary key column in SQL Server

When there’s a table with a primary key that is an IDENTITY and you want to remove the IDENTITY property, there’s no simple SQL code to do it. It is not a trivial task. In fact, there’re many steps to be done and it’s better to do all of them inside a single transaction, to avoid to leave the database in a inconsistent way (without FK, or PKs).


Steps to be done:

  • Add a new temporary column.
  • Update the new column with the same values.
  • Set the new column as NOT NULL.
  • Drop Foreign Keys Constraints.
  • Drop Primary Key.
  • Drop IDENTITY column.
  • Rename the new column with the name of the old one.
  • Add new Primary Key.
  • Create again constraints (without rechecking them).

An example

In AdventureWorks database, there’s a master table called [Person].[AddressType] that has only 6 Rows.
All master tables in my databases cannot have IDENTITYs because the values can be inserted in different order depending on the server. So, we’re going to remove it.

To compare the differences between the original table definition and the result after the process explained, I have copied the tables’ schema, where we can see that two things have changed:

  • There’s no IDENTITY.
  • The order of the column in the definition has changed.
-- The original table:
CREATE TABLE [Person].[AddressType](
 [AddressTypeID] [int] IDENTITY(1,1) NOT NULL,
 [Name] [dbo].[Name] NOT NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_AddressType_rowguid]  DEFAULT (newid()),
 [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_AddressType_ModifiedDate]  DEFAULT (getdate()),
 [AddressTypeID] ASC

-- The result after removing the IDENTITY property:
CREATE TABLE [Person].[AddressType](
 [Name] [dbo].[Name] NOT NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_AddressType_rowguid]  DEFAULT (newid()),
 [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_AddressType_ModifiedDate]  DEFAULT (getdate()),
 [AddressTypeID] [int] NOT NULL,
 [AddressTypeID] ASC

In our example, it’s important to note that the Person.AdressType has references to two tables:

  • [Purchasing].[VendorAddress].[AddressTypeID] named [FK_VendorAddress_AddressType_AddressTypeID]
  • [Sales].[CustomerAddress].[AddressTypeID] named [FK_CustomerAddress_AddressType_AddressTypeID]

The script is, as explained before, inside a transactions and the looks like this:


    -- Adding a new temporary column
    ALTER TABLE Person.AddressType
        ADD [new_AddressTypeID] INT NULL;

    -- Updating the new column with the values sorted as we want
    EXEC sp_executesql N'UPDATE Person.AddressType SET [new_AddressTypeID] = AddressTypeID'

    -- Setting the new column as NOT NULL
    ALTER TABLE Person.AddressType
        ALTER COLUMN [new_AddressTypeID] INT NOT NULL;

    -- Disable Foreign Keys Constraints
    ALTER TABLE [Purchasing].[VendorAddress]
        DROP CONSTRAINT [FK_VendorAddress_AddressType_AddressTypeID]
    ALTER TABLE [Sales].[CustomerAddress]
        DROP CONSTRAINT [FK_CustomerAddress_AddressType_AddressTypeID]

    -- Drop Primary Key
    ALTER TABLE Person.AddressType
    DROP CONSTRAINT [PK_AddressType_AddressTypeID]

    -- Drop IDENTITY column
    ALTER TABLE Person.AddressType
    DROP COLUMN [AddressTypeID]

    -- Rename column ID_EXENCION_NEW --> ID_EXENCION
    EXEC sp_rename 'Person.AddressType.new_AddressTypeID', 'AddressTypeID', 'COLUMN';

    -- Add new Primary Key
    ALTER TABLE Person.AddressType
        [AddressTypeID] ASC

    -- Enable constraints (without rechecking them)
    ALTER TABLE [Purchasing].[VendorAddress] WITH NOCHECK
         ADD CONSTRAINT [FK_VendorAddress_AddressType_AddressTypeID]
         FOREIGN KEY([AddressTypeID]) REFERENCES [Person].[AddressType] ([AddressTypeID])
    ALTER TABLE [Sales].[CustomerAddress] WITH NOCHECK
        ADD CONSTRAINT [FK_CustomerAddress_AddressType_AddressTypeID]
        FOREIGN KEY([AddressTypeID]) REFERENCES [Person].[AddressType] ([AddressTypeID])

    PRINT 'IDENTITY removed successfully'


The UPDATE of the new column must be done with dynamic SQL because we’re adding the column in the very same script and SQL Server tries to validate script before the execution and it doesn’t find the column and gives an error.


Get every new post delivered to your Inbox.