Hidden cartesian product

Introduction

There’s a project in my company that has a dedicated PostgreSQL virtual server with 12 cores and 32 GB RAM and it’s exclusively being used by one developer. He told me that he had an error with a query and checking the log file I could find: “ERROR: could not write block XXXX of temporary file: No space left”. But there were 27 GB free.
I asked him to run the query again to be able to see what was going on.

Analysis

The directory of PostgreSQL /data/pg_stat_tmp started growing until fill up the 27 GB! and PostgreSQL gave the out of space error again. Amazing.

The query had the typical INNER JOIN between a Products table and Orders table, with an aggregation on Orders table. I was really surprised for this behaviour. Here is the size (MB and rows) of the tables:

  • public.Orders: 314 MB and 4.242.561 rows.
  • public.Products: 189 MB and 1.108.514 rows.

The query plan was quite suspicious because I wasn’t expectiong a Nested Loop:

I changed the query doing the aggregation in a Common Table Expression and with the result I made the JOIN. It was just a fast test to see if I could find any clue about what was going on. It worked: 66 seconds. The query plan had the expected MERGE JOIN but it was doing a TABLE SCAN instead of using the indexes!

But 66 seconds was so much. So, diving into the problem I found that instead of NULL values, there were in both tables ’1′ (number one). So we had a lot of ’1′ in the joining columns! To be exact:

  • public.Orders: 318.884 rows. 7,5% of total rows.
  • public.Products: 184.324 rows.  16,6% of total rows.

So, we’ve: 318884 * 184324 = 58,777,974,416 rows!!

Yes, it’s like a cartasian product!

Solutions

Of course the obvious one in this case was to change the ones for nulls.

This is a development environtment and it’s far from critical. But in production, if users can make queries that are not controled by an application, we can set statement_timeout. It can be set globally in postgresql.conf or in connection basis.

Get-Hash, a powershell hash function

I needed a hash function that I could pass a text string or a file to compute its hash (MD5, SHA1, etc…). But unfortunately Powershell hasn’t got one, so I started to look for alternatives:

  • There’s a Get-Hash cmdlet (function embedded in a DLL written in C#) in PowerShell Community Extensions (PSCX) module.
  • I found two small tools command-line tools:
  • There’s an interesting post called Using Powershell for MD5 Checksums  in Brian Hartsock’s Blog that explains how to calculate directly in Powershell using NET Framework classes.

Any of these options exactly matched my needs:

  • PSCX is an amazing module, but I didn’t want to install it in production servers just to use one cmdlet.
  • The two small tools do they work without flawness, althought I prefer to have as less external applications as possible. I should parse the result and I prefer to have everything in Powershell.
  • Brian Hartsock’s post is a good point of start. I only misses the possibility to compute file’s hashes.

So based on Brian Hartsock’s post I implemented this function for Powershell 2.0 (because I use parameter definition and try…catch in it) that gets the hash value of a file or a text string. It uses HashAlgorithm Class  and FileStream Class. Both classes are supported since NET Framework 1.0.

The Get-Hash function

The following code can also be downloaded from here.

<#
.SYNOPSIS
Gets the hash value

.DESCRIPTION
Gets the hash value of a file or string
It uses System.Security.Cryptography.HashAlgorithm (http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.aspx)
and FileStream Class (http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx)

Written by Josep Martínez Vilà: http://dbadailystuff.com/2013/03/11/get-hash-a-powershell-hash-function/
Based on: http://blog.brianhartsock.com/2008/12/13/using-powershell-for-md5-checksums/ and some ideas on Microsoft Online Help

.PARAMETER File
File to get the hash from.

.PARAMETER Text
Text string to get the hash from

.PARAMETER Algorithm
Type of hash algorithm to use. Default is SHA1

.EXAMPLE
C:\PS> Get-Hash "myFile.txt"
Gets the SHA1 from myFile.txt file. When there's no explicit parameter, it uses -File

.EXAMPLE
Get-Hash -File "C:\temp\myFile.txt"
Gets the SHA1 from myFile.txt file

.EXAMPLE
C:\PS> Get-Hash -Algorithm "MD5" -Text "Hello Wold!"
Gets the MD5 from a string

.EXAMPLE
C:\PS> "Hello Wold!" | Get-Hash
We can pass a string throught the pipeline
#>
function Get-Hash
{
     Param
     (
          [parameter(Mandatory=$true, ValueFromPipeline=$true, ParameterSetName="set1")]
          [String]
          $text,
          [parameter(Position=0, Mandatory=$true, ValueFromPipeline=$false, ParameterSetName="set2")]
          [String]
          $file = "",
          [parameter(Mandatory=$false, ValueFromPipeline=$false)]
          [ValidateSet("MD5", "SHA", "SHA1", "SHA-256", "SHA-384", "SHA-512")]
          [String]
          $algorithm = "SHA1"
     )
     Begin
     {
          $hashAlgorithm = [System.Security.Cryptography.HashAlgorithm]::Create($algorithm)
     }
     Process
     {
          $md5StringBuilder = New-Object System.Text.StringBuilder 50
          $ue = New-Object System.Text.UTF8Encoding

          if ($file){
               try {
                    if (!(Test-Path $file)){
                         throw "Test-Path returned false."
                    }
               }
               catch {
                    throw "Get-Hash - File not found or without permisions: [$file]. $_"
               }
               try {
                    [System.IO.FileStream]$fileStream = [System.IO.File]::Open($file, [System.IO.FileMode]::Open);
                    $hashAlgorithm.ComputeHash($fileStream) | % { [void] $md5StringBuilder.Append($_.ToString("x2")) }
               }
               catch {
                    throw "Get-Hash - Error reading or hashing the file: [$file]"
               }
               finally {
                    $fileStream.Close()
                    $fileStream.Dispose()
               }
          }
          else {
               $hashAlgorithm.ComputeHash($ue.GetBytes($text)) | % { [void] $md5StringBuilder.Append($_.ToString("x2")) }
          }

          return $md5StringBuilder.ToString()
     }
}

Some examples how to call it

Get-Hash "e:\temp\mapais.txt"
Get-Hash -File "e:\temp\hash\macallejero.txt"
"Hello Wold!" | Get-Hash
Get-Hash -Text "Hello Wold!"
Get-Hash -Algorithm "MD5" -Text "Hello Wold!"

Remove IDENTITY property from a primary key column in SQL Server

When there’s a table with a primary key that is an IDENTITY and you want to remove the IDENTITY property, there’s no simple SQL code to do it. It is not a trivial task. In fact, there’re many steps to be done and it’s better to do all of them inside a single transaction, to avoid to leave the database in a inconsistent way (without FK, or PKs).

Steps

Steps to be done:

  • Add a new temporary column.
  • Update the new column with the same values.
  • Set the new column as NOT NULL.
  • Drop Foreign Keys Constraints.
  • Drop Primary Key.
  • Drop IDENTITY column.
  • Rename the new column with the name of the old one.
  • Add new Primary Key.
  • Create again constraints (without rechecking them).

An example

In AdventureWorks database, there’s a master table called [Person].[AddressType] that has only 6 Rows.
All master tables in my databases cannot have IDENTITYs because the values can be inserted in different order depending on the server. So, we’re going to remove it.

To compare the differences between the original table definition and the result after the process explained, I have copied the tables’ schema, where we can see that two things have changed:

  • There’s no IDENTITY.
  • The order of the column in the definition has changed.
-- The original table:
CREATE TABLE [Person].[AddressType](
 [AddressTypeID] [int] IDENTITY(1,1) NOT NULL,
 [Name] [dbo].[Name] NOT NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_AddressType_rowguid]  DEFAULT (newid()),
 [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_AddressType_ModifiedDate]  DEFAULT (getdate()),
 CONSTRAINT [PK_AddressType_AddressTypeID] PRIMARY KEY CLUSTERED
 (
 [AddressTypeID] ASC
 )) ON [PRIMARY]

-- The result after removing the IDENTITY property:
CREATE TABLE [Person].[AddressType](
 [Name] [dbo].[Name] NOT NULL,
 [rowguid] [uniqueidentifier] ROWGUIDCOL  NOT NULL CONSTRAINT [DF_AddressType_rowguid]  DEFAULT (newid()),
 [ModifiedDate] [datetime] NOT NULL CONSTRAINT [DF_AddressType_ModifiedDate]  DEFAULT (getdate()),
 [AddressTypeID] [int] NOT NULL,
 CONSTRAINT [PK_AddressType_AddressTypeID] PRIMARY KEY CLUSTERED
 (
 [AddressTypeID] ASC
 )) ON [PRIMARY]

In our example, it’s important to note that the Person.AdressType has references to two tables:

  • [Purchasing].[VendorAddress].[AddressTypeID] named [FK_VendorAddress_AddressType_AddressTypeID]
  • [Sales].[CustomerAddress].[AddressTypeID] named [FK_CustomerAddress_AddressType_AddressTypeID]

The script is, as explained before, inside a transactions and the looks like this:

SET NOCOUNT ON;

BEGIN TRANSACTION
BEGIN TRY
    -- Adding a new temporary column
    ALTER TABLE Person.AddressType
        ADD [new_AddressTypeID] INT NULL;

    -- Updating the new column with the values sorted as we want
    EXEC sp_executesql N'UPDATE Person.AddressType SET [new_AddressTypeID] = AddressTypeID'

    -- Setting the new column as NOT NULL
    ALTER TABLE Person.AddressType
        ALTER COLUMN [new_AddressTypeID] INT NOT NULL;

    -- Disable Foreign Keys Constraints
    ALTER TABLE [Purchasing].[VendorAddress]
        DROP CONSTRAINT [FK_VendorAddress_AddressType_AddressTypeID]
    ALTER TABLE [Sales].[CustomerAddress]
        DROP CONSTRAINT [FK_CustomerAddress_AddressType_AddressTypeID]

    -- Drop Primary Key
    ALTER TABLE Person.AddressType
    DROP CONSTRAINT [PK_AddressType_AddressTypeID]

    -- Drop IDENTITY column
    ALTER TABLE Person.AddressType
    DROP COLUMN [AddressTypeID]

    -- Rename column ID_EXENCION_NEW --> ID_EXENCION
    EXEC sp_rename 'Person.AddressType.new_AddressTypeID', 'AddressTypeID', 'COLUMN';

    -- Add new Primary Key
    ALTER TABLE Person.AddressType
        ADD CONSTRAINT [PK_AddressType_AddressTypeID] PRIMARY KEY CLUSTERED
        (
        [AddressTypeID] ASC
        )

    -- Enable constraints (without rechecking them)
    ALTER TABLE [Purchasing].[VendorAddress] WITH NOCHECK
         ADD CONSTRAINT [FK_VendorAddress_AddressType_AddressTypeID]
         FOREIGN KEY([AddressTypeID]) REFERENCES [Person].[AddressType] ([AddressTypeID])
    ALTER TABLE [Sales].[CustomerAddress] WITH NOCHECK
        ADD CONSTRAINT [FK_CustomerAddress_AddressType_AddressTypeID]
        FOREIGN KEY([AddressTypeID]) REFERENCES [Person].[AddressType] ([AddressTypeID])

    PRINT 'IDENTITY removed successfully'
END TRY

BEGIN CATCH
IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION;
PRINT 'ERROR:' + ERROR_MESSAGE()
END CATCH;
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;

The UPDATE of the new column must be done with dynamic SQL because we’re adding the column in the very same script and SQL Server tries to validate script before the execution and it doesn’t find the column and gives an error.

Get SQL Server user permissions

I’ve found three ways of getting user permissions (grants and denies) in SQL Server:

  • SQL Server Management Studio: It’s OK and user-friendly. But it’s not an option if you’ve a large farm of servers or you want to script it and create task to run the checks.
  •  Impersonate as the user and check its permissions.
  • SELECT on system tables to see the permissions.

In this post I explain the last two options and the pros and cons they have.
For this example I use AdventureWorksDB that can be found at Codeplex.

Creating logins, users and giving permissions

To test the queries in both methods, I’ll create 2 logins and one user for each login. Then I’ll give different permissions to the users to test it.

USE AdventureWorks

-- Create login and user
CREATE LOGIN BobSmith WITH PASSWORD=N'myPa$$word', DEFAULT_DATABASE=AdventureWorks, CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
CREATE USER Bob FOR LOGIN BobSmith

-- Create login and user
CREATE LOGIN AliceSimpson WITH PASSWORD=N'myPa$$word', DEFAULT_DATABASE=AdventureWorks, CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
CREATE USER Alice FOR LOGIN AliceSimpson
EXEC sp_addrolemember N'db_datareader', N'Alice'

-- Give GRANTs for example
GRANT SELECT ON Production.Product TO Bob
GRANT SELECT, INSERT, UPDATE ON Production.ProductDescription TO Bob
DENY DELETE ON OBJECT::Production.ProductSubcategory TO Bob

GRANT EXECUTE TO Bob    -- NOTE: This doesn't give execution rights to Table-valued functions!
DENY EXECUTE ON SCHEMA::HumanResources TO Bob
GRANT ALTER ON dbo.ufnLeadingZeros TO Bob

Impersonate as the user and check it’s permissions

The function fn_my_permissions  is very useful when you want to know the currents user’s permissions.
Can be used for server, database or object basis. When used for objects, you must specify the object you are asking for in the first parameter.
Don’t appear the explicit DENY options. Sometimes that’s interesting.

I like to give this query to other IT people to let them know what they can do and to check the rights they have.

Pros:

  • Shows effective rights. In our example, Bob has GRANT EXECUTE on all stored procedures and functions in all database, except the in ‘HumanResources’, so this query only shows the SP and functions it has rights on. And don’t show the DENY.

Cons:

  • You need IMPERSONATE right on the user you want to check rights.
  • In case you need to check all users, you should make an iteration and IMPERSONATE each user.
EXECUTE AS USER = 'Bob';

-- Server rights
SELECT * FROM fn_my_permissions(NULL, 'SERVER');

-- Database rights
SELECT * FROM fn_my_permissions(NULL, 'DATABASE');

-- Specific per object rigths
SELECT T.TABLE_TYPE AS OBJECT_TYPE, T.TABLE_SCHEMA AS [SCHEMA_NAME], T.TABLE_NAME AS [OBJECT_NAME], P.PERMISSION_NAME FROM INFORMATION_SCHEMA.TABLES T
CROSS APPLY fn_my_permissions(T.TABLE_SCHEMA + '.' + T.TABLE_NAME, 'OBJECT') P
WHERE P.subentity_name = ''
UNION
SELECT R.ROUTINE_TYPE AS OBJECT_TYPE, R.ROUTINE_SCHEMA AS [SCHEMA_NAME], R.ROUTINE_NAME AS [OBJECT_NAME], P.PERMISSION_NAME
FROM INFORMATION_SCHEMA.ROUTINES R
CROSS APPLY fn_my_permissions(R.ROUTINE_SCHEMA + '.' + R.ROUTINE_NAME, 'OBJECT') P
ORDER BY OBJECT_TYPE, [SCHEMA_NAME], [OBJECT_NAME], P.PERMISSION_NAME

REVERT;
GO

You can also test it with ALICE and you will see that she has SELECT rights to TABLES, FUNCTIONS and VIEWS. All the SELECT permissions are due to db_datareader database permission.

SELECT on system tables to see the permissions

The following SELECT it’s very helpful when you want to get all explicitly granted or denied permissions in a database.
Pros:

  • Shows GRANT permissions but also DENY permissions
  • Don’t need IMPERSONATE grant to check it.

Cons:

  • Explicit permissions due to database roles are not shown. You’ve to ‘calculate’ them
  • To check if a user has privileges on an object you have to check the GRANT/DENY on a database/schema/object, and check database roles. So it’s not straightforward…
SELECT DP.class_desc AS object_type, GR.type_desc AS grantor_user_type, GR.name AS grantor, GE.type_desc AS grantee_user_type, GE.name AS grantee,
DP.permission_name, state_desc,
CASE
WHEN S.name  IS NOT NULL    THEN S.name
ELSE                        ISNULL(OBJECT_SCHEMA_NAME(DP.major_id), 'all_database')
END AS [schema_name],
CASE
WHEN S.name  IS NOT NULL    THEN NULL
ELSE                        OBJECT_NAME(DP.major_id)
END AS [table_name]
FROM sys.database_permissions DP
INNER JOIN sys.database_principals GR ON GR.principal_id = DP.grantor_principal_id
INNER JOIN sys.database_principals GE ON GE.principal_id = DP.grantee_principal_id
LEFT JOIN sys.schemas AS S ON S.schema_id = DP.major_id
WHERE NOT (ISNULL(OBJECT_SCHEMA_NAME(DP.major_id), 'all_database') = 'sys' AND DP.class_desc = 'OBJECT_OR_COLUMN')

References

Some queries are based on the script Retrieve Database Object permission scripts and Role members scripts written by Gokhan Varol.

Source code

The source code of this example can be found here.

RAID controller got frozen

I write this post to explain a problem I found with a RAID controller. One morning I was notified that PostgreSQL in one development server was stopped. I connected to it and I found the service stopped. It was a physical server running Windows Server 2008 R2 and PostgreSQL 9.0. The server had a RAID 1. It’s important to notice that Windows Server Backup started at 22:00 and it usually takes 2 hours to complete, so during the time of the failure it was working and there was a lot of I/O.

Analyzing the logs

PostgreSQL logs

Then I checked the last pg_log:

2012-05-17 23:46:12 PANIC:  could not fsync log file 120, segment 174: Bad file descriptor

2012-05-17 23:46:12 STATEMENT:  COMMIT

This application has requested the Runtime to terminate it in an unusual way.

Please contact the application’s support team for more information.

2012-05-17 23:46:12 LOG:  server process (PID 4488) exited with exit code 3

2012-05-17 23:46:12 LOG:  terminating any other active server processes

These 3 lines are repeated several times:

2012-05-17 23:46:12 WARNING:  terminating connection because of crash of another server process

2012-05-17 23:46:12 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2012-05-17 23:46:12 HINT:  In a moment you should be able to reconnect to the database and repeat your command.

2012-05-17 23:46:12 WARNING:  terminating connection because of crash of another server process

2012-05-17 23:46:12 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

2012-05-17 23:46:12 HINT:  In a moment you should be able to reconnect to the database and repeat your command.

2012-05-17 23:46:12 LOG:  all server processes terminated; reinitializing

2012-05-17 23:46:22 FATAL:  pre-existing shared memory block is still in use

2012-05-17 23:46:22 HINT:  Check if there are any old server processes still running, and terminate them.

I had no idea why this crash happened. I started the server successfully. It was in a consistent state. Log after the start:

2012-05-18 09:31:43 LOG:  database system was interrupted; last known up at 2012-05-17 23:45:43 CEST

2012-05-18 09:31:43  LOG:  database system was not properly shut down; automatic recovery in progress

2012-05-18 09:31:43 LOG:  consistent recovery state reached at 78/AE000078

2012-05-18 09:31:43  LOG:  redo starts at 78/AD007298

2012-05-18 09:31:43  LOG:  record with zero length at 78/AE003C10

2012-05-18 09:31:43  LOG:  redo done at 78/AE003BD0

2012-05-18 09:31:43  LOG:  last completed transaction was at log time 2012-05-17 23:46:11.808+02

2012-05-18 09:31:44  LOG:  database system is ready to accept connections

2012-05-18 09:31:44 FATAL:  the database system is starting up

2012-05-18 09:31:44 LOG:  autovacuum launcher started

Event Viewer logs

Then I went directly to Event Viewer / Windows Logs / System and found several errors. The first was:

17/05/2012 23:46:09 LSI_SAS (eventID 11):  The driver detected a controller error on \Device\RaidPort0.

It was repeated 22 times in the elapse of 2 seconds (23:46:09 – 23:46:10).

Then the backup aborted due to IO failure. The error was shown for every drive unit:

17/05/2012 23:46:11 volsnap (eventID 14): The shadow copies of volume E: were aborted because of an IO failure on volume E:.

And then from 23:46:12 to 23:47:42 there were 33 errors like this:

17/05/2012 23:46:12 ntfs (eventID 57): The system failed to flush data to the transaction log. Corruption may occur.

And between these errors, it was one that I was expecting for:

17/05/2012 23:46:23 Service Control Manager (eventide: 7036): The postgresql-x64-9.0 – PostgreSQL Server 9.0 service entered the stopped state.

Raid software

Finally I opened the RAID software administration and one of the two disks of the RAID was KO.

What happened?

Just putting together PostgreSQL error and Event Viewer error, that happened at the very same time:

PostgreSQL error:

2012-05-17 23:46:12 CEST [unknown] 4fb350d7.1188 PANIC:  could not fsync log file 120, segment 174: Bad file descriptor

Event Viewer error:

17/05/2012 23:46:12 ntfs (eventID 57): The system failed to flush data to the transaction log. Corruption may occur.

This gives us the explanation: the system couldn’t flush PostgreSQL data to the log file and PostgreSQL stopped immediate. It was the only way to maintain the integrity of the database. I suppose that Windows Backup I/O was also helping to cause these problems. During 1 minute and 30 seconds the RAID controller was frozen and couldn’t flush to disk. After it, everything was running fine because the RAID 1 had one disk to work with. PostgreSQL and Windows Backup were affected by this problem, all the other software continued working without any problem. But we’ve to keep in mind that databases are very sensitive to this.

I goggled to find some information about the RAID controller and in one post where someone was complaining it.

I could have a wrong idea of what a RAID is. So I checked Wikipedia RAID:

RAID (redundant array of independent disks, originally redundant array of inexpensive disks[1][2]) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called “RAID levels”, depending on what level of redundancy and performance (via parallel communication) is required.

[…]

In RAID 1 (mirroring without parity or striping), data is written identically to two drives, thereby producing a “mirrored set”; at least two drives are required to constitute such an array. While more constituent drives may be employed, many implementations deal with a maximum of only two; of course, it might be possible to use such a limited level 1 RAID itself as a constituent of a level 1 RAID, effectively masking the limitation.[citation needed] The array continues to operate as long as at least one drive is functioning. With appropriate operating system support, there can be increased read performance, and only a minimal write performance reduction; implementing RAID 1 with a separate controller for each drive in order to perform simultaneous reads (and writes) is sometimes called multiplexing (or duplexing when there are only two drives).

Conclusion

Reading twice the Wikipedia quote “The array continues to operate as long as at least one drive is functioning” I think that it could be a little ambiguous after what happened to me. I thought that a RAID idea was that the system could work seamless when a disk fails. But it looks like that some implementations of RAID are not exactly this or at least some have an interruption before the array continues to operate

As a DBA I’m in charge of databases, but I don’t control the hardware where they’re running on. And things can always go worst: there are full of stories where RAIDs weren’t monitored and disks failures weren’t detected until all information was lost. So, that’s a reason why a good backup policy and a good recovery plan are the most important things.

Install Pyrseas in Windows 64bits

Pyrseas is a Database Version Control for PostgreSQL with a very interesting approach and with an easy way to script a database (database to a YAML file) and to apply changes to a database (YAML file to a database). It’s develped in Python by Joe Abbate (thank you Joe for it!)

Currently I’m using dbtoyaml to script the database in every release and I commit it together with application code, but in the future I plan to use it together with yamltodb for continuous integration.

Brief introduction

Joe describes it as follows:

Pyrseas provides a framework and utilities to upgrade and maintain a PostgreSQL database.

Features:

  • Outputs a YAML/JSON description of a PostgreSQL database’s tables and other objects (metadata), suitable for storing in a version control repository.
  • Generates SQL statements to modify a database so that it will match an input YAML/JSON specification.
  • (planned) Generates a flexible web application to update PostgreSQL  tables.

Requirements:

  • PostgreSQL 8.4 or higher
  • Python 2.6 or higher
  • (planned) Werkzeug
  • (planned) Jinja2

License:

  • Pyrseas is free (libre) software and is distributed under the BSD license.

You can download it from GitHub and the most current documentation is available here.

Installing required packages

I suppose that in Linux its installation is quite straight forward, but in Windows Server 2008 R2 you need 64bits packages and not all required packages have their official 64bits version. So, I had to search for some alternative to install Pyrseas pre-requisites.

I installed these packages in the following order:

  • Python 2.7.3 Windows X86-64 Installer (Windows AMD64 / Intel 64 / X86-64 binary — does not include source). I suggest to add python in PATH environment variable.
  • From an unofficial website, where they advice you with “The files are unofficial(meaning: informal, unrecognized, personal, unsupported) and made available for testing and evaluation purposes.” You can download and install:
    • distribute-0.6.26.win-amd64-py2.7.exe (it’s a replacement for setuptools).
    • PyYAML-3.10.win-amd64-py2.7.exe.
  • psycopg2-2.4.5.win-amd64-py2.7-pg9.1.3-release (official website).

Installing Pyrseas

After downloading it from GitHub, unzip the file and into the directory execute:

python setup.py install

Example:

E:\Temp\pyrseas-0.5.0>python setup.py install
running install
running bdist_egg
running egg_info
[…]
Using e:\python27\lib\site-packages
Searching for psycopg2==2.4.5
Best match: psycopg2 2.4.5
Adding psycopg2 2.4.5 to easy-install.pth file
Using e:\python27\lib\site-packages
Searching for distribute==0.6.26
Best match: distribute 0.6.26
Adding distribute 0.6.26 to easy-install.pth file
Installing easy_install-script.py script to E:\Python27\Scripts
Installing easy_install.exe script to E:\Python27\Scripts
Installing easy_install-2.7-script.py script to E:\Python27\Scripts
Installing easy_install-2.7.exe script to E:\Python27\Scripts
Using e:\python27\lib\site-packages
Finished processing dependencies for Pyrseas==0.5.0

Using it

Once installed, you can execute it. For example:

cd C:\Python27\Lib\site-packages\Pyrseas-0.5.0-py2.7.egg\pyrseas

python dbtoyaml.py –host=”127.0.0.1″ -U myuser mydatabase > E:\temp\mydatabase.yaml

If everything goes fine, you should have the file E:\temp\dbagn.yaml that describes your database. Now you can test easily yamltodb by deleting or adding a column in a table in YAML file and run it:

python yamltodb.py –host=”127.0.0.1″ –user=postgres –output=”deltascript.sql” mydatabase mydatabase.yaml

Then, you’ll apply the changes made in the YAML file to the database!

Conclusion

It’s interesting, isn’t it? I like to script out the database schema and nowadays with ORM products it’s even more important: you never now what’s happening behind the scenes…

Explicit date format conversions

Implicit date format conversions, when no format is specified, are always a source of troubles when:

  • Passing dates as parameters in scripts to databases, other scripts, applications, web services or other targets, because each system is waiting the dates in specific format.
  • Running a SQL script in databases with different language settings.

 Using implicit date conversions make the conversion relying on the culture-specific server’s configuration or/and every software configuration. The problem usually appears when:

  • Installing the script in another server that has a different configuration.
  • For any reason the configuration is changed and suddenly the script fails. Or even worst, a script that’s not very used fails some weeks or months after the change and then it’s more difficult to find the error.

Implicit date format conversions: It fails easily

In these examples it’s easy to identify when and why they fail, but you can get crazy when you’ve one powershell script in one server that works with a SQL Server database in another server and a PostgreSQL in a 3rd server running in Linux. And things can be worst when your are not administrating the servers.

Example in powershell with different culture server configuration

# Windows configured with: English (United States)
Get-Date -Format G
# Result: 6/26/2012 3:46:55 PM

# Windows configured with: Spanish (Spain)
Get-Date -Format G
# Result:  26/06/2012 15:46:55

Example in SQL Server with different language settings

In this example I use CAST to convert the string to date using implicit format conversion:

-- Works
SET Language Spanish
DECLARE @d DATETIME
SET @d = CAST('30/05/2012' AS DATETIME)
SELECT CONVERT(VARCHAR, @d, 120)
-- Result:  2012-05-30 00:00:00

-- Fails
SET Language us_English
DECLARE @d DATETIME
SET @d = CAST('30/05/2012' AS DATETIME)
SELECT CONVERT(VARCHAR, @d, 120)
--Error:
--   Mens. 242, Nivel 16, Estado 3, Línea 3
--   The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.

Or what’s even worst, a silent malfunction when the number of the day is <= 12 it won’t fail and will interchange the day for the month. In the following example, instead of May 12th the DATETIME value will be December 5th. Detecting and solving it is not trivial when you’ve many registers inserted or updated erroneously.

SET Language us_English
DECLARE @d DATETIME
SET @d = CAST('12/05/2012' AS DATETIME)
SELECT CONVERT(VARCHAR, @d, 120)
-- Result:  2012-12-05 00:00:00

Explicit date format conversions: Always works

 To avoid these errors, the best thing is to make conversions explicit, using the format you feel more comfortable. In my case, I like to use yyyy-MM-dd HH:mm:ss when I’m working with date + time, while when I only need the date I use yyyy-MM-dd or dd/MM/yyyy. Now I’ll show some examples how to do it in Powershell, SQL Server and PostgreSQL.

Powershell

With Powershell we use System.Date struct, available in .NET framework. To convert from date to string I use ToString function and from string to date I use  ParseExact. Powershell to output a date to a computer screen or a file has to convert it to string and uses the computer’s culture configuration.

# Date to String:
(get-date).ToString("yyyy-MM-dd HH:mm:ss")
# Result: 2012-06-27 09:55:51
# String to date:

[System.Datetime]::ParseExact('2012-05-30 09:07:39',"yyyy-MM-dd HH:mm:ss", [System.Globalization.CultureInfo]::CreateSpecificCulture("es-ES"))
#Result: miércoles, 30 de mayo de 2012 9:07:39

[System.Datetime]::ParseExact('30/05/2012',"dd/MM/yyyy",[System.Globalization.CultureInfo]::CreateSpecificCulture("es-ES"))
# Result: miércoles, 30 de mayo de 2012 0:00:00

SQL Server

As we’ve seen before, CAST is used for implicit format conversion, but to use the conversion we want, we must use CONVERT, a non SQL standard command. The numbers used in the 3rd parameter of CONVERT can be found in SQL Server help.

-- Date to String:
DECLARE @d DATETIME
SET @d = CONVERT(DATETIME, '2012-05-30 09:07:39', 120)
SELECT CONVERT(VARCHAR, @d, 120) --> 2012-05-30 09:07:39
SELECT CONVERT(VARCHAR, @d, 103) --> 30/05/2012

-- String to date:
SELECT CONVERT(DATETIME, '30/05/2012 09:07:39', 120)

PostgreSQL

The official PostgreSQL documentation it’s really good and has several examples.

-- Date to String:
SELECT to_char(current_timestamp, 'YYYY-MM-DD HH24:MI:SS')
-- String to date:
SELECT to_timestamp('2012-05-30 09:07:39', 'YYYY-MM-DD HH24:MI:SS')

Conclusion

Using explicit date format conversions will help you to write safer scripts, less prone to bugs and there’s no extra time needed to do it :-)

It’s funny but I got the idea of writing this post two weeks ago when a workmate had problems with a Powershell script because the servers had different configurations. But today, when had this post already prepared to be published, another workmate has sent me a SQL Server script to be run in a server. It failed because the SQL Server in his computer was using Spanish language while the server was in English.

Follow

Get every new post delivered to your Inbox.