I administer a large number of PostgreSQL servers, and their logs reach me as zipped files. To analyze them, I wrote a Spark job that does the following:
- Unzip the files
- Parse the PostgreSQL logs
- Save (append) the data to a Parquet file
In a following post I will show how to query them to get useful information.
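To give a rough idea of the first step, here is a minimal sketch in plain Python rather than the Spark job itself. It assumes the logs arrive as `.zip` archives of UTF-8 text; the function name `iter_log_lines` is hypothetical and only used for illustration.

```python
import io
import zipfile
from typing import Iterator


def iter_log_lines(zip_bytes: bytes, encoding: str = "utf-8") -> Iterator[str]:
    """Yield every text line of every file stored in a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                # Directory entries carry no log data.
                continue
            with archive.open(name) as member:
                # Decode lazily so a large log is never loaded in one piece.
                text = io.TextIOWrapper(member, encoding=encoding, errors="replace")
                for line in text:
                    yield line.rstrip("\n")
```

Working on bytes rather than on a file path is a deliberate choice: in a Spark job, a function like this could probably be applied to the `(path, content)` pairs returned by `SparkContext.binaryFiles`, although that wiring is not shown here.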
PostgreSQL log format
The log format is set in the PostgreSQL configuration file as follows:
log_line_prefix = '%t %a %u %d %c '
- %t = timestamp without milliseconds
- %a = application name
- %u = user name
- %d = database name
- %c = session ID
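With this prefix, each log line starts with five space-separated fields followed by the message. The sketch below shows one way to split such a line in plain Python; it is not the code from the notebook. The names `LOG_LINE` and `parse_log_line` are hypothetical, and the sample line is made up. It assumes the timestamp is followed by a time zone token, that user and database names contain no spaces, and that the session ID looks like two hexadecimal numbers joined by a dot. Lines where one of these fields is empty do not match and yield `None`.

```python
import re
from datetime import datetime
from typing import Optional

# One named group per escape in log_line_prefix = '%t %a %u %d %c '.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<timezone>\S+) "  # %t
    r"(?P<application>.*?) "                 # %a (may contain spaces)
    r"(?P<user>\S+) "                        # %u
    r"(?P<database>\S+) "                    # %d
    r"(?P<session>[0-9a-f]+\.[0-9a-f]+) "    # %c
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[dict]:
    """Return the prefix fields and the message as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Keep the time zone token as text and parse only the date and time part.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    return record


# Made-up sample line, for illustration only.
sample = "2019-03-05 14:23:01 CET psql alice sales 5c7e8a1d.1f3c LOG:  statement: SELECT 1"
record = parse_log_line(sample)
```

The application name is matched lazily and the session ID pattern is strict, so an application name with spaces in it is still separated correctly from the user and database names that follow.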
The code is available as a Jupyter Notebook on my GitHub.