Using functions on columns instead of on literals

Developers often tell me that DBAs are against the use of ORMs (like Hibernate) for no reason. Before becoming a DBA I was a developer, and I admit I like using ORMs because they make things easier, but you should always keep in mind what lies behind the ORM's classes and what queries the ORM executes.

In other words: know how the ORM works. Of course there can be some differences when running the application on SQL Server, PostgreSQL or Oracle, but usually not many.

The following example is a slow query in PostgreSQL that I found through pg_stat_statements, caused by the use of the upper function because PostgreSQL string comparisons are case sensitive. In SQL Server this specific problem could be avoided by creating the database with a case-insensitive collation, but other examples of functions applied to columns are easy to find.
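A quick illustration of this case sensitivity (a trivial example, not from the original post):

SELECT 'MyFile.txt' = 'MYFILE.TXT';          -- false
SELECT upper('MyFile.txt') = 'MYFILE.TXT';   -- true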

The query is against a table called BinaryFile, which stores the file-system path of each file in a column named ruta. The table has 181,438 rows, and there's a plain unique index (not a functional one) on ruta:

CREATE UNIQUE INDEX ix_binaryfile_ruta
ON binaryfile
USING btree (ruta);


The query generated by Hibernate:

select count(*) as y0_ from public.BinaryFile this_ 
where upper(this_.ruta) = 'C:\MYPATH\MYFILE.TXT' and this_.id<>432

The index is not used because of the UPPER function, so we have two solutions:

  • Create an index over the UPPER function (see the sketch after this list).
  • Use UPPER() on the literal side and keep the ruta column's values normalized (all in upper case).
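For reference, option 1 would be an expression index like this (a sketch; the index name is made up):

CREATE INDEX ix_binaryfile_upper_ruta
ON binaryfile
USING btree (upper(ruta));

With such an index in place, the original query filtering on upper(this_.ruta) could use it directly.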

I prefer the second option because it's simpler, and the improvement is the same either way. In Windows, paths are case insensitive, so we can do:

UPDATE BinaryFile SET ruta = upper(ruta)

The query then becomes:

select count(*) as y0_ from public.BinaryFile this_ 
where this_.ruta = upper('C:\MyPath\MyFile.TxT') and this_.id<>432

Now, the query plan:

[Screenshots: the new query plan, now using the index, and its execution time]

Now the query uses the index and the performance improvement is impressive: 7000 times faster.

I know this is a simple example, but I think it's the best way to explain the point.

SQL Server vs PostgreSQL: fighting for resources on the same server

Introduction

Notice: this is not the typical performance comparison between SQL Server and PostgreSQL. I only test how their performance is affected when both run on the same server, something I would never do in practice.

We were asked: “Is it possible to run SQL Server and PostgreSQL on the same server?”

Obviously, the server would run under Windows. The first thought was: no.

We said that because we would expect problems with I/O and memory pressure, due to the heavy disk and memory use that is typical of database servers. We were asked to prove it.

Analysis

We decided to use Dell's DVD Store benchmark, which simulates a DVD store with a simple OLTP database. It can be run against several database systems, SQL Server and PostgreSQL among them.

We wanted to see how they behave when working together, compared to working alone. It's not a SQL Server vs. PostgreSQL benchmark, because we didn't want to go deep into how DVD Store is built, database driver performance and so on.

We created a 10 GB database in each RDBMS.

Hardware

A brief description of the server:

  • Intel Xeon E3-1220 3.1Ghz
  • Memory: 8 GB
  • RAID controller: LSI Adapter SAS2 2004 Spitfire with 2 SATA disks in RAID 1.

Software

  • Microsoft Windows Server 2008R2 Standard SP1 64-bits
  • SQL Server 2008 Workstation Edition
  • PostgreSQL 9.2.4

Preparation

After installing each RDBMS, it's time to install DVD Store:

For SQL Server, I used this article, which is very comprehensive.

For PostgreSQL on Windows I couldn't find anything similar, so I did it on my own and ran into these issues:

  • The script that creates the database and imports the data, pgsqlds2_create_all.sh, is a Linux bash script, so I ported it to Windows as pgsqlds2_create_all.bat.
  • Executing ds2pgsqldriver.exe, I got a runtime error.

To solve the runtime error in ds2pgsqldriver.exe I modified the source code: in the file c:\ds2\pgsqlds2\ds2pgsqlfns.cs, at line 209, I changed this block:

      int i_row = 0;
      if ((customerid_out > 0) && Rdr.NextResult())
        {
        while (Rdr.Read())
          {
          title_out[i_row] = Rdr.GetString(0);
          actor_out[i_row] = Rdr.GetString(1);
          related_title_out[i_row] = Rdr.GetString(2);
          ++i_row;
          }
        }
 

For this one:

      int i_row = 0;
      try
      {
           if ((customerid_out > 0) && Rdr.NextResult())
           {
                // Bounds check added (i_row < title_out.Length): the original code
                // could overflow the output arrays, which caused the runtime error.
                while (Rdr.Read() && i_row < title_out.Length)
                {
                     title_out[i_row] = Rdr.GetString(0);
                     actor_out[i_row] = Rdr.GetString(1);
                     related_title_out[i_row] = Rdr.GetString(2);
                     ++i_row;
                }
           }
           Rdr.Close();
           t.Commit();
           rows_returned = i_row;
      }
      catch (Exception e)
      {
           // On error, clean up anyway and log how far the read got before failing.
           Rdr.Close();
           t.Commit();
           rows_returned = i_row;

           Console.WriteLine("Length of arrays: {0,3}, {1,3}, {2,3}", title_out.Length, actor_out.Length, related_title_out.Length);
           Console.WriteLine("i_row value {0}", i_row);
           Console.WriteLine("Thread {0}: Error inside login: {1}", Thread.CurrentThread.Name, e.Message);
           return (false);
      }
  

And I compiled it:

csc.exe /out:ds2pgsqldriver.exe ds2pgsqlfns.cs c:\ds2\drivers\ds2xdriver.cs /d:WIN_32_TIMER /d:GEN_PERF_CTRS "/r:c:\ds2\drivers\Npgsql.dll"

Execution

We executed the benchmarks with the following parameters:

c:\ds2\pgsqlds2\ds2pgsqldriver.exe --target=127.0.0.1 --run_time=20 --db_size=10GB --n_threads=40 --ramp_rate=10 --pct_newcustomers=0 --warmup_time=0 --think_time=0 > c:\Temp\testresults.txt 2>&1
c:\ds2\sqlserverds2\ds2sqlserverdriver.exe --target=localhost --run_time=20 --db_size=10GB --n_threads=40 --ramp_rate=10 --pct_newcustomers=0 --warmup_time=0 --think_time=0 > c:\Temp\testresults_sqlserver.txt 2>&1

The benchmark result is at the end of the output file:

n_purchase_from_start= 206466 n_rollbacks_from_start= 4110

Results

Case                              SQL Server   PostgreSQL     Total   Performance decrease
Only SQL Server                      268,245                268,245
Only PostgreSQL                                    97,668    97,668   -63%
Both: SQL Server and PostgreSQL      206,466       17,378   223,844   -16%
Performance decrease                    -23%         -82%

The numbers are the benchmark results (n_purchase_from_start); higher is better. The percentage column compares each total against SQL Server running alone, and the last row compares each engine's shared-server result against its standalone result.

When running alone, SQL Server performs much better than PostgreSQL, but as I said before, comparing them is not our goal, and the gap can have many causes, for example a better DVD Store implementation or better drivers. Still, I'm impressed by the size of the difference. Testing PostgreSQL on Linux is beyond the scope of this analysis, but I'm sure it would improve a lot.

Conclusion

When working together, SQL Server is the winner: its result decreases only 23%, while PostgreSQL's decreases 82%.

Here we’ve to notice that SQL Server is a native Windows application, while PostgreSQL is multi platform and probably this compatibility across multiple OS is paid with the efficiency in Windows. So, installing SQL Server and PostgreSQL in the same server, when there’s some workload, the PostgreSQL have important performance degradation.

Learning in the Internet era

In the last post of the year I'll talk about knowledge and how we are continuously learning. The idea for the topic comes from a Packt Publishing offer: selling e-books for $5 for Christmas.

No matter what our job is, it's changing, so we have to evolve too. In fact, I don't know of any job where you don't have to improve your skills continuously, usually because of technological change. That's why I've always believed we have to keep learning throughout our lives. I also like to learn things that have nothing to do with my career.

But the way we learn has changed a lot. Before the Internet, people had to go to libraries and get books. With the Internet, a huge amount of information became available: first in personal blogs, later in online communities like SQLServerCentral or StackOverflow. I remember the first time I found SQLServerCentral; I was amazed by the high quality of its articles and comments. I learned a lot.

Last year a workmate showed me Coursera, and later I discovered other MOOCs (Massive Open Online Courses) like Khan Academy, Codecademy, edX and Udacity. That's a very big step. I took some really great courses on Coursera.

In general the courses are free, but at Udacity you can find some courses you have to pay for.
This is a very important point: if a course is free, what is really valuable is your time. If you're learning less than you expected, you can just stop. If you've paid, you'll probably keep going even if it's not as good as you'd like. Once I paid for a course and couldn't finish it: I was losing my time, but I had already lost my money. Probably the future is making the first two or three weeks of a course free, so you can decide whether it's worth paying for.

And now comes Packt Publishing with this offer: any e-book for $5. A few years ago I wouldn't have believed we'd see books at $5. Of course it's an e-book, not a physical book, but it's knowledge for very little money compared to a few years ago. Maybe other publishers will join in with this kind of offer: sell a lot at a low price, or sell much less at a higher price.

Anyway, knowledge and learning are very important nowadays. There are no barriers anymore: everybody can learn.

SETVAL for all sequences in a schema

In PostgreSQL, when you're working with sequences, if you manually insert a value the sequence hasn't generated yet, you will get an error when the sequence eventually reaches that value. I much prefer how SQL Server handles autoincrement columns with its IDENTITY property, which is roughly a sequence linked to a table, like SERIAL, but much more restrictive: by default you cannot INSERT a row specifying a value for that column, as you can in PostgreSQL.
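For comparison, inserting an explicit value into a SQL Server IDENTITY column only works if you enable it first (a sketch with a hypothetical table):

SET IDENTITY_INSERT dbo.mytable ON;
INSERT INTO dbo.mytable (id, info) VALUES (7, 'Third Value - Jumping');
SET IDENTITY_INSERT dbo.mytable OFF;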

The PostgreSQL setval() function, explained in Sequence Manipulation Functions, is PostgreSQL's way of changing the current value of a sequence, but it only accepts one sequence at a time.
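For a single sequence, a call looks like this (table and sequence names are hypothetical):

SELECT setval('public.mytable_id_seq', (SELECT MAX(id) FROM public.mytable));

So, if you need to set all the sequences in a schema to the max(id) of each table, you can use the following script, based on Updating sequence values from table select.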

CREATE OR REPLACE FUNCTION setval_schema(schema_name name, raise_notice boolean = false)
    RETURNS VOID AS
-- Sets all the sequences in the schema "schema_name" to the max(id) of every table
$BODY$

DECLARE
    row_data RECORD;
    sql_code TEXT;

BEGIN
    IF ((SELECT COUNT(*) FROM pg_namespace WHERE nspname = schema_name) = 0) THEN
        RAISE EXCEPTION 'The schema "%" does not exist', schema_name;
    END IF;

    FOR sql_code IN
        SELECT 'SELECT SETVAL(' ||quote_literal(N.nspname || '.' || S.relname)|| ', MAX(' ||quote_ident(C.attname)|| ') ) FROM ' || quote_ident(N.nspname) || '.' || quote_ident(T.relname)|| ';' AS sql_code
            FROM pg_class AS S
            INNER JOIN pg_depend AS D ON S.oid = D.objid
            INNER JOIN pg_class AS T ON D.refobjid = T.oid
            INNER JOIN pg_attribute AS C ON D.refobjid = C.attrelid AND D.refobjsubid = C.attnum
            INNER JOIN pg_namespace N ON N.oid = S.relnamespace
            WHERE S.relkind = 'S' AND N.nspname = schema_name
            ORDER BY S.relname
    LOOP
        IF (raise_notice) THEN
            RAISE NOTICE 'sql_code: %', sql_code;
        END IF;
        EXECUTE sql_code;
    END LOOP;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;

Examples

Example: run setval for all sequences in a schema:

SELECT setval_schema('public');

Example: run setval for all sequences in all user schemas in the database, showing the statements as they are executed:

SELECT setval_schema(nspname, true)
FROM pg_namespace
WHERE nspname !~ '^pg_.*' AND nspname <> 'information_schema';

Full example

In this example, a table is created and some rows are inserted. The third insert, 'Third Value - Jumping', is forced to id = 7 instead of using the sequence. But the sequence hasn't been modified, so when it eventually reaches 7 the insert will fail because the value already exists.

CREATE TABLE test_setval
(
    id serial NOT NULL,
    info text NOT NULL,
    CONSTRAINT test_setval_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
);

INSERT INTO test_setval (info) VALUES ('First Value');
INSERT INTO test_setval (info) VALUES ('Second Value');
INSERT INTO test_setval (id, info) VALUES (7,'Third Value - Jumping');
INSERT INTO test_setval (info) VALUES ('Fourth Value');

-- 'Fourth Value' is inserted with id = 3
SELECT * FROM test_setval;


SELECT setval_schema('public', true);

INSERT INTO test_setval (info) VALUES ('Fifth Value - after setval');

-- You can see how the data has been inserted in sequential order:
SELECT * FROM test_setval;


-- or sorting it:
SELECT * FROM test_setval ORDER BY id;


You can download all the code from here.

Review of Getting Started with OrientDB book

Disclaimer: I was asked to review the Getting Started with OrientDB book and I received a free ebook copy of it.

OrientDB is defined on its website as "an Open Source GraphDB with a mix of features taken from Document Databases and Object Orientation". It's a new and promising database, but it has little documentation, spread out over multiple places, and it's quite chaotic to find what you're looking for. Maybe I'm just used to the amazing PostgreSQL documentation!

Getting Started with OrientDB was written by Claudio Tesoriero and published in August 2013. It's the only OrientDB book out there; at least, it's the only one listed on the OrientDB website and the only one I have found.

My very first surprise is that it has 138 pages, below average for a technical book. So, as the title says, it's a getting-started guide, an introduction to OrientDB. That is probably the main reason why I would often have liked more information about a topic.

The book gives a broad, but not deep, introduction to OrientDB: installation, architecture, administration, querying, performance tuning and advanced features like clustering.

I would have changed the order of the book: it discusses administration first and programming later, while I would have done it the other way around, because you should already be familiar with classes, properties and querying data before starting on the administration part.

I also found some information in the wrong place, for example:

  • Classes and abstract classes are explained between OrientDB's data files and security roles, when the best place would have been the programming part.
  • The OrientDB console and OrientDB Studio (the web console) are explained in the same chapter, but with other material in between instead of being together.
  • How to install OrientDB as a daemon/service should be in the advanced features chapter and not in the first one.

OrientDB itself is confusing when it talks about its editions (Graph and Standard) and its database types (graph and document); you have to read for a while before understanding the difference, and the book could have helped more here, for example by explaining the differences between the graph and the document databases in depth.

On the other hand, I liked the Programming chapter, where the author explains the document database with good, complete examples; working through them is enough to feel confident with the document database. However, the graph database gets just 4 pages. In this chapter you also get an introduction to configuring JDBC, using the Java API and using the RESTful APIs.

The Performance Tuning and Advanced Features chapters, where clustering is explained, are also interesting, although at times I would have enjoyed more detail, for example when the author explains how OrientDB uses NIO to map data files in memory, or the network.socketBufferSize property.

Conclusion

As I said, there are two things that would have made it a much better book:

  • The order of the chapters should have been different and some information rearranged.
  • More information on some points, though 138 pages are not enough to explain OrientDB in depth.

There's always the question of whether the book is worth buying. Despite these two big complaints, I would still have bought it, because:

  • The book explains a lot of things that are not available in OrientDB's documentation.
  • It would have saved me a lot of time compared to digging through OrientDB's sparse and badly organized documentation.

Hidden cartesian product

Introduction

A project in my company has a dedicated PostgreSQL virtual server with 12 cores and 32 GB of RAM, used exclusively by one developer. He told me he was getting an error with a query, and checking the log file I found: “ERROR: could not write block XXXX of temporary file: No space left”. But there were 27 GB free.
I asked him to run the query again so I could see what was going on.

Analysis

The PostgreSQL directory /data/pg_stat_tmp started growing until it filled up the 27 GB, and PostgreSQL gave the out-of-space error again. Amazing.
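If you're on PostgreSQL 9.2 or later, temporary file usage can also be tracked per database, which helps catch this kind of query:

SELECT datname, temp_files, temp_bytes
FROM pg_stat_database;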

The query had a typical INNER JOIN between a Products table and an Orders table, with an aggregation on the Orders table. I was really surprised by this behaviour. Here is the size (MB and rows) of each table:

  • public.Orders: 314 MB and 4,242,561 rows.
  • public.Products: 189 MB and 1,108,514 rows.

The query plan was quite suspicious, because I wasn't expecting a Nested Loop.

I changed the query to do the aggregation in a Common Table Expression and joined against its result. It was just a quick test to see if I could find any clue about what was going on. It worked: 66 seconds. The query plan had the expected MERGE JOIN, but it was doing a TABLE SCAN instead of using the indexes!
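The rewrite looked roughly like this (a sketch with hypothetical column names, not the project's actual query):

WITH order_totals AS (
    SELECT product_id, SUM(amount) AS total_amount
    FROM public.Orders
    GROUP BY product_id
)
SELECT p.id, p.name, ot.total_amount
FROM public.Products AS p
INNER JOIN order_totals AS ot ON ot.product_id = p.id;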

But 66 seconds was still far too much. Diving into the problem, I found that both tables contained '1' (the number one) instead of NULL values, so we had a lot of '1's in the join columns! To be exact:

  • public.Orders: 318,884 rows, 7.5% of the total.
  • public.Products: 184,324 rows, 16.6% of the total.

So we have: 318,884 * 184,324 = 58,777,974,416 rows!

Yes, it's like a cartesian product!
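A quick way to spot this kind of join-key skew is to count the most repeated values in the join column (the column name here is hypothetical):

SELECT product_id, COUNT(*) AS repetitions
FROM public.Orders
GROUP BY product_id
ORDER BY COUNT(*) DESC
LIMIT 10;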

Solutions

Of course, the obvious one in this case was to change the ones to NULLs.

This is a development environment and far from critical. But in production, if users can run queries that are not controlled by an application, we can set statement_timeout. It can be set globally in postgresql.conf or on a per-connection basis.
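For example, to abort any statement running longer than one minute in the current session, or for a given role (the role name is made up):

SET statement_timeout = '60s';
ALTER ROLE reporting_user SET statement_timeout = '60s';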

Get-Hash, a PowerShell hash function

I needed a hash function to which I could pass a text string or a file to compute its hash (MD5, SHA1, etc.). Unfortunately PowerShell doesn't have one, so I started looking for alternatives:

  • There's a Get-Hash cmdlet (a function embedded in a DLL written in C#) in the PowerShell Community Extensions (PSCX) module.
  • I found two small command-line tools.
  • There's an interesting post called Using Powershell for MD5 Checksums on Brian Hartsock's blog that explains how to compute hashes directly in PowerShell using .NET Framework classes.

None of these options exactly matched my needs:

  • PSCX is an amazing module, but I didn't want to install it on production servers just to use one cmdlet.
  • The two small tools do their job flawlessly, although I prefer to have as few external applications as possible; I would also have to parse their output, and I prefer to keep everything in PowerShell.
  • Brian Hartsock's post is a good starting point. It only misses the ability to compute file hashes.

So, based on Brian Hartsock's post, I implemented a function that gets the hash value of a file or a text string. It requires PowerShell 2.0 (because I use parameter definitions and try…catch in it). It uses the HashAlgorithm class and the FileStream class, both supported since .NET Framework 1.0.

The Get-Hash function

The following code can also be downloaded from here.

<#
.SYNOPSIS
Gets the hash value of a file or string

.DESCRIPTION
Gets the hash value of a file or string
It uses System.Security.Cryptography.HashAlgorithm (http://msdn.microsoft.com/en-us/library/system.security.cryptography.hashalgorithm.aspx)
and FileStream Class (http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx)
Based on: http://blog.brianhartsock.com/2008/12/13/using-powershell-for-md5-checksums/ and some ideas on Microsoft Online Help

Be aware, to avoid confusion, that if you use the pipeline, the behaviour is the same as using -Text, not -File

.PARAMETER File
File to get the hash from.

.PARAMETER Text
Text string to get the hash from

.PARAMETER Algorithm
Type of hash algorithm to use. Default is SHA1

.EXAMPLE
C:\PS> Get-Hash "hello_world.txt"
Gets the SHA1 hash of the hello_world.txt file. When no parameter name is given, -File is used

.EXAMPLE
Get-Hash -File "C:\temp\hello_world.txt"
Gets the SHA1 hash of the C:\temp\hello_world.txt file

.EXAMPLE
C:\PS> Get-Hash -Algorithm "MD5" -Text "Hello Wold!"
Gets the MD5 hash of a string

.EXAMPLE
C:\PS> "Hello Wold!" | Get-Hash
We can pass a string through the pipeline

.EXAMPLE
Get-Content "c:\temp\hello_world.txt" | Get-Hash
It gets the string from Get-Content

.EXAMPLE
Get-ChildItem "C:\temp\*.txt" | %{ Write-Output "File: $($_)   has this hash: $(Get-Hash $_)" }
A more complex example that gets the hash of all "*.txt" files

.NOTES
DBA daily stuff (http://dbadailystuff.com) by Josep Martínez Vilà
Licensed under a Creative Commons Attribution 3.0 Unported License

.LINK
Original post: http://dbadailystuff.com/2013/03/11/get-hash-a-powershell-hash-function/
#>
function Get-Hash
{
     Param
     (
          [parameter(Mandatory=$true, ValueFromPipeline=$true, ParameterSetName="set1")]
          [String]
          $text,
          [parameter(Position=0, Mandatory=$true, ValueFromPipeline=$false, ParameterSetName="set2")]
          [String]
          $file = "",
          [parameter(Mandatory=$false, ValueFromPipeline=$false)]
          [ValidateSet("MD5", "SHA", "SHA1", "SHA-256", "SHA-384", "SHA-512")]
          [String]
          $algorithm = "SHA1"
     )
     Begin
     {
          $hashAlgorithm = [System.Security.Cryptography.HashAlgorithm]::Create($algorithm)
     }
     Process
     {
          $md5StringBuilder = New-Object System.Text.StringBuilder 50
          $ue = New-Object System.Text.UTF8Encoding

          if ($file){
               try {
                    if (!(Test-Path -literalpath $file)){
                         throw "Test-Path returned false."
                    }
               }
               catch {
                    throw "Get-Hash - File not found or without permisions: [$file]. $_"
               }
               try {
                    [System.IO.FileStream]$fileStream = [System.IO.File]::Open($file, [System.IO.FileMode]::Open);
                    $hashAlgorithm.ComputeHash($fileStream) | % { [void] $md5StringBuilder.Append($_.ToString("x2")) }
               }
               catch {
                    throw "Get-Hash - Error reading or hashing the file: [$file]"
               }
               finally {
                    $fileStream.Close()
                    $fileStream.Dispose()
               }
          }
          else {
               $hashAlgorithm.ComputeHash($ue.GetBytes($text)) | % { [void] $md5StringBuilder.Append($_.ToString("x2")) }
          }

          return $md5StringBuilder.ToString()
     }
}

Some examples of how to call it

Here are some examples of how to use Get-Hash. Note that node.exe is a binary file.

Write-Output "`nSome examples how to call it:"
Get-Hash "c:\temp\myScriptFile.sql"
Get-Hash "c:\temp\br[a{ets.txt"
Get-Hash "c:\temp\node.exe"
Get-Hash -Algorithm "MD5" -Text "A MD5 checksum!"

Write-Output "`nFour hello world examples that return the same hash value:"
Get-Hash "c:\temp\hello_world.txt"
"Hello Wold!" | Get-Hash
Get-Hash -Text "Hello Wold!"
Get-Content "c:\temp\hello_world.txt" | Get-Hash
Get-ChildItem "C:\temp\*world*.txt" | %{ Write-Output "File: $($_)   has this hash: $(Get-Hash $_)" }

