tail -f findings.out

Script to find longest running events in log files

While I have found the logging format described in this post to be useful in my scripts, once you get past a few lines, it gets hard to track what took the longest. I wanted to be able to see the top X events in duration, as recorded in my log, without any hand parsing or eyeballing. I have created a script that does just that. The latest version is here in my code repo.

An example log file to parse:

Sent to my script:

Now at a glance I can see what events took the longest to complete. This assumes pairs of logging messages. Line 1, for example, denotes an event starting a process and line 2 an event finishing that process. If the time between line 1 and line 2 is greater than that between line 2 and 3, line 1 would be given as the higher ranking event in duration. This mode wouldn’t therefore be of much use in looking at, say, web server logs (unless you needed to find periods of the least activity, perhaps to schedule a maintenance window…). But when looking at verbose output from scripts performing sequential tasks, it saves a lot of time. You can scan the longest events and select places to optimize if needed.

I tried to add a lot of commentary, as there is a level of particularity approaching obscurity in it currently, as well as provide ample examples and explanation. There are only a few options:

  • -h: Show main docstring, example uses, and options with help.
  • -f FILE: The log FILE to be parsed.
  • -n NUM: The top NUM of deltas to display. Defaults to 10.
  • -N: Show all available deltas.

With the caveat that it only currently works the logging format I have standardized on, I think it’s fairly solid, at least for a few-hour project :-) I’m sure there are a number of ways in which it could be optimized, but it’s fast enough for my needs. It works across multiple day deltas down to milliseconds. If you want to work with a different log format, you should just have to change the parse_log() function.

There were a few interesting problems to solve along the way, such as how to keep track of the line numbers associated with each line, datetime and message components, and timedeltas all the way through the processing, since there might be duplicates in several places. Another was figuring out how to pad each value per column so that things were vertically aligned, and so more readable. The pretty table printer recipes I found were too heavy for me, and I wanted to write my own.

If you have ideas for improvements or features (next on my list is working with multiple log files at once), feel free to email me or fill out my contact form.

Tags: , ,
April 5, 2009 - 12:15 AM No Comments

Simple and effective Python logging

I have been using the quite capable logging module for all of my logging in Python programs for some time. After trying a number of different options and formats, I have stabilized my setup into what I will describe below. I have found it effective in providing helpful, concise logging for a wide range of tasks.

First, up in the import section of your script, add:

1
import logging

This module has been part of the standard library since Python 2.3, so it should be found. If not, upgrade Python! Now you need to setup for your particular logging needs. I like to place the logging setup section after the imports, before the function definitions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# Setup logging
logger = logging.getLogger(sys.argv[0])
logger.setLevel(logging.DEBUG)
# Use file output for production logging:
logfilename = "/var/log/myapp/my_super_app.log"
filelog = logging.FileHandler(logfilename, 'a')
filelog.setLevel(logging.INFO)
# Use console for development logging:
conlog = logging.StreamHandler()
conlog.setLevel(logging.DEBUG)
# Specify log formatting:
formatter = logging.Formatter("%(asctime)s - %(name)s - %(lineno)s - \
%(levelname)s - %(message)s"
)
conlog.setFormatter(formatter)
filelog.setFormatter(formatter)
# Add console log to logger
logger.addHandler(conlog)
logger.addHandler(filelog)

That’s all the setup you’ll likely need. From the top, this first initializes a Logger object, providing it the name of the current script as the name of the log. This makes it easy to differentiate logs from similar scripts with different names. Next we define different types of handlers that will log various types of messages. The logging module provides a wide array of log handlers, listed here under “Logging Levels”. You can log to a file (which I use for production programs), to console (for debugging), or to more exotic places if you so desire, such as directly to a socket, syslog, and more.

So after we define a file log, and what handler we want to use for it, we set the level of log messages it will record, in this case of INFO level and up. (These various levels are not entirely standard. See the section entitled “Levels” in PEP 282 for more information, in case you are not familiar with the logging module’s particular levels). For lower level messages, we add a console handler (goes to STDOUT).

Then we define the format for our log messages, and assign that format to the console and file loggers. There is quite a range of variables we have access to in the format of the logs. They are listed and described here. I have found this format to be the most informative without overloading the display:

1
2009-03-15 02:00:12,077 - ./test.py - 24 - DEBUG - I am a test message!

The scale of the timestamp is small enough to differentiate between quick changes, we immediately know what file and what line in that file is causing the message to appear, what level it is, and finally its contents. The same format is applied to the file handler, which will be written to at the same time the console log displays, providing a permanent record.

Using the logging module removes the need for test print statements, providing an informative framework that is easily shifted into a production mode. You can add as much information as you want to see in development by handling DEBUG level messages, and sending them to console, then raise the level of message shown to INFO once ready to deploy. For additional information, see this O’Reilly article.

Tags: , , ,
March 15, 2009 - 1:10 AM Comments (4)

Handy Alias: Grab latest svn log

I keep having the occasion to view the log comments for the current revision in my SVN repository. This means:

  • Knowing the number of the current revision,
  • Passing this to
    1
    svn log -r

Instead of doing that by hand, I made an alias to do it for me:

alias svnlastlog="svn info | grep 'Last Changed Rev' | cut -d ' ' -f 4 | xargs -I mystr svn log -r mystr"

So if you are in a directory that is versioned, you just run that command, and (after being prompted to authenticate to your repo, if applicable), out comes the log entry for the current revision. Perhaps there is a built in command to do this already, but I could not find it via Google and the SVN Book.

Tags: , ,
October 7, 2007 - 3:33 PM Comments (2)

Twitter links powered by Tweet This v1.6.1, a WordPress plugin for Twitter.