tail -f findings.out

Easily sum matching file line counts

This command returns the total number of lines in files ending in .py in the current directory and all of its subdirectories:

find . -type f -iname '*.py' | xargs -I str wc -l str | \
awk '{SUM += $1} END {print SUM}'

Just change .py to whatever file extension you want to count.

I wanted to add this as a Bash alias, passing in the filetype as an argument. This would make the whole affair much easier. But when I tried it as an alias, or as a function, I always got 0 back. Running “type” on the alias name returned:

pylines is aliased to `find . -type f -iname *py | xargs -I str \
wc -l str | awk '{SUM += bell-style} END {print SUM}''

If anyone has any insight into this, please share.
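One possible explanation, offered as a guess since the alias definition itself is not shown: the "type" output has bell-style where $1 should be, which suggests the shell expanded $1 when the alias was defined (as happens inside double quotes). awk then evaluates bell-style as one unset variable minus another, so the sum stays at 0. Aliases also cannot take positional arguments. A small script sidesteps the quoting problem entirely. The sketch below is a rough Python equivalent; count_lines is a name made up for illustration.

```python
import os


def count_lines(root, extension):
    """Sum the line counts of files under root whose names end in extension.

    The match is case-insensitive, like find's -iname. Unlike wc -l, a final
    line without a trailing newline is also counted.
    """
    suffix = extension.lower()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix):
                path = os.path.join(dirpath, name)
                # Read in binary mode so odd encodings cannot raise errors.
                with open(path, "rb") as handle:
                    total += sum(1 for _ in handle)
    return total
```

Calling count_lines(".", ".py") from the directory of interest gives the same kind of total as the pipeline, with the extension passed as an ordinary argument.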

April 29, 2009 - 7:58 PM No Comments

Printing tabular data attractively in Python

Have you ever looked for a tool several times, not found what you wanted, then one fine day discovered exactly what you were looking for by accident? I had just such an experience…

Not too infrequently in command line Python scripts, I want to output some textual data that is tabular in nature. I wrestled for a while with how to print out this sort of data in a way that’s readable, while not having to manually add line starting and ending characters, deal with padding and column alignment, separators, escape characters and all the other baggage that comes with ASCII shapes on the command line (at least ones that are broadly useful and don’t look like garbage).

When interacting with MySQL, Postgres, and other DB systems, their command line utilities provide just this sort of output already:

But what about cases where you simply have some data and need to print it into a table? Luke Maurits’ PrettyTable to the rescue! This module allows you to easily feed in data and print it in an ASCII table on the command line. It was actually just uploaded to PyPI today, so installation is now as easy as

sudo easy_install prettytable

While full documentation is available, the example Luke gives on the project homepage is sufficient to get started. Here is a shorter and more automated example:

from prettytable import PrettyTable

# Method names below are those of PrettyTable 0.1.
foo = PrettyTable()
foo.set_field_names(["Num", "Sum", "Double", "Triple"])
total = 0  # running sum; named "total" to avoid shadowing the built-in sum()
for n in range(1, 6):
    total += n
    double = n * 2
    triple = n * 3
    foo.add_row([n, total, double, triple])
foo.printt()

Running this returns:

It’s only version 0.1, but it has worked flawlessly and been a real lifesaver thus far.

Some additional features I’d like to help add at some point:

  • Allow formatting of various datatypes, such as adding commas to long numbers
  • Split headers across lines
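
In the meantime, the comma formatting can be done before values are handed to add_row. The following is only a sketch using Python's built-in format specification; with_commas is a hypothetical helper, not part of PrettyTable.

```python
def with_commas(value):
    """Format ints and floats with thousands separators; stringify the rest."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return str(value)
    return "{:,}".format(value)


row = [1000, 1234567, "n/a", 2500.5]
formatted = [with_commas(value) for value in row]
print(formatted)  # ['1,000', '1,234,567', 'n/a', '2,500.5']
```

The formatted list can then be passed to add_row in place of the raw values.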
April 28, 2009 - 4:55 PM Comments (3)

mysql_secure_installation: A useful first step in securing a MySQL server

This won’t be an exhaustive post on how to secure a MySQL server. I just want to mention a useful utility packaged with MySQL server: mysql_secure_installation. Simply run it from the command line once MySQL server is installed and running. It will prompt you with a series of questions, resulting in a more secure setup!

It only covers the very basics, but at least you don’t need to remember them, and you save yourself some typing to boot. Here’s what it does:

  • Set a root password. If one is already set, you’ll need to enter it for the remaining steps.
  • Remove anonymous users
  • Disable non-local root access
  • Remove the test database and access rules related to it
  • Reload privilege tables so the above changes are in effect

A quick and easy way to follow good practices.

April 18, 2009 - 9:55 AM No Comments

Script to find longest running events in log files

While I have found the logging format described in this post to be useful in my scripts, once you get past a few lines, it gets hard to track what took the longest. I wanted to be able to see the top X events by duration, as recorded in my log, without any hand parsing or eyeballing. I have created a script that does just that. The latest version is here in my code repo.

An example log file to parse:

Sent to my script:

Now at a glance I can see what events took the longest to complete. This assumes pairs of logging messages. Line 1, for example, denotes an event starting a process and line 2 an event finishing that process. If the time between line 1 and line 2 is greater than that between line 2 and 3, line 1 would be given as the higher ranking event in duration. This mode wouldn’t therefore be of much use in looking at, say, web server logs (unless you needed to find periods of the least activity, perhaps to schedule a maintenance window…). But when looking at verbose output from scripts performing sequential tasks, it saves a lot of time. You can scan the longest events and select places to optimize if needed.
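
The ranking idea can be sketched in a few lines of Python. This is not the script itself, only an illustration under a loud assumption: every line starts with a timestamp shaped like "2009-04-05 00:15:02,123". The names parse_line and longest_events are invented for the example.

```python
from datetime import datetime

# Assumed line layout (hypothetical): "2009-04-05 00:15:02,123 message text"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23  # characters taken up by the timestamp


def parse_line(line):
    """Split one log line into (datetime, message)."""
    stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    return stamp, line[TIMESTAMP_LENGTH:].strip()


def longest_events(lines, top=10):
    """Return (delta, line_number, message) tuples, longest delta first."""
    parsed = [parse_line(line) for line in lines]
    deltas = []
    # Pair each line with its successor; the gap is credited to the earlier line.
    for number, (current, following) in enumerate(zip(parsed, parsed[1:]), start=1):
        stamp, message = current
        next_stamp, _ = following
        deltas.append((next_stamp - stamp, number, message))
    deltas.sort(key=lambda item: item[0], reverse=True)
    return deltas[:top]
```

Carrying the line number inside each tuple is one simple way to keep duplicate messages or identical deltas distinguishable after sorting.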

I tried to add a lot of commentary, along with ample examples and explanation, since the script is currently particular to the point of obscurity. There are only a few options:

  • -h: Show main docstring, example uses, and options with help.
  • -f FILE: The log FILE to be parsed.
  • -n NUM: The top NUM of deltas to display. Defaults to 10.
  • -N: Show all available deltas.
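
For anyone building a similar tool, the options above could be declared roughly like this. The sketch uses argparse, which may well differ from what the script actually does; build_parser and the dest names are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -f, -n and -N options listed above."""
    parser = argparse.ArgumentParser(
        description="Show the longest-running events in a log file."
    )
    # argparse adds -h/--help automatically.
    parser.add_argument("-f", dest="file", metavar="FILE",
                        help="the log FILE to be parsed")
    parser.add_argument("-n", dest="num", metavar="NUM", type=int, default=10,
                        help="the top NUM of deltas to display (default: 10)")
    parser.add_argument("-N", dest="show_all", action="store_true",
                        help="show all available deltas")
    return parser
```

Parsing is then a matter of calling build_parser().parse_args() and reading args.file, args.num and args.show_all.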

With the caveat that it currently only works with the logging format I have standardized on, I think it’s fairly solid, at least for a few-hour project :-) I’m sure there are a number of ways in which it could be optimized, but it’s fast enough for my needs. It works across multiple day deltas down to milliseconds. If you want to work with a different log format, you should just have to change the parse_log() function.

There were a few interesting problems to solve along the way, such as how to keep track of the line numbers associated with each line, datetime and message components, and timedeltas all the way through the processing, since there might be duplicates in several places. Another was figuring out how to pad each value per column so that things were vertically aligned, and so more readable. The pretty table printer recipes I found were too heavy for me, and I wanted to write my own.
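
The padding problem has a compact solution in Python. The version below is a sketch of one common approach, not the code from the script: measure the widest cell in each column, then left-justify every cell to that width. format_columns is a made-up name, and the function assumes every row has the same number of cells.

```python
def format_columns(rows, gap=2):
    """Left-justify every cell to the widest entry in its column."""
    cells = [[str(value) for value in row] for row in rows]
    # zip(*cells) transposes rows into columns so each width can be measured.
    widths = [max(len(cell) for cell in column) for column in zip(*cells)]
    separator = " " * gap
    return [
        separator.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in cells
    ]


for line in format_columns([("Line", "Message"), (3, "start save"), (12, "start load")]):
    print(line)
```

Returning a list of strings rather than printing inside the function keeps the padding logic easy to reuse and to test.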

If you have ideas for improvements or features (next on my list is working with multiple log files at once), feel free to email me or fill out my contact form.

April 5, 2009 - 12:15 AM No Comments

PyCon 2009: Thoughts and observations

I had hoped to post entries each day at PyCon 2009, but since the exquisite Hyatt Regency O’Hare lacked the basic human right of free WiFi, and since I didn’t feel like filling T-Mobile’s coffers for access, my connectivity to the Tubes was somewhat limited. In any case, my first PyCon was a great experience!

After arriving Tuesday, I walked about downtown Chicago to see the sights. It was my first visit to the city, and there is of course way too much to see in the few hours I had. I did get to walk along a good portion of Michigan Avenue and peer up at the forest of skyscrapers at least. The John Hancock Observatory has a spectacular view of all of downtown, and Lake Michigan stretches to the horizon. For the digital goodies, head over to Flickr.

On Wednesday I went to a tutorial on Kamaelia in the morning, and one on Python packaging and distribution methods in the afternoon. While Kamaelia was interesting, I don’t think I’ll need to use concurrency much in my current work, so that wasn’t the best opener. Jeff Rush did an admirable job of covering virtualenv, distutils, setuptools, and buildout, but in my opinion it was just too much content for the time period allotted. I left feeling mostly overwhelmed. But I have the slides, and some things stuck.

Thursday was a definite treat. I started with the introductory tutorial on py.test, created by Holger Krekel. The tutorial was given by Brian Dorsey and Krekel himself. Both were refreshingly motivated and excited about the topic, which made it easy to stay engaged. I have ignored testing in my Python programming thus far, mostly because I haven’t written things of sufficient complexity for it to matter very much. Actually it’s mostly because I am just lazy. But py.test turns out to be a perfect fit! It is very easy to use, and has quite an array of advanced features as well.

In fact, I enjoyed the topic so much, I decided to stay for the advanced py.test tutorial for the afternoon slot, and I am very glad I did! As it turns out, there is an older way of handling setup and teardown of resources for tests in py.test (setup_* methods on the test classes). This was employed in the introduction tutorial, mostly for simplicity’s sake. But there is a newer and much more elegant way of handling such things that was covered later. Once I get to practice with it a bit more, I hope to post on how to get started using py.test.
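
For readers who have not seen the older style, here is a minimal sketch of per-test setup and teardown using setup_method and teardown_method on a test class. The class and its temporary file are invented for illustration and are not taken from the tutorial.

```python
import os
import tempfile


class TestScratchFile:
    def setup_method(self, method):
        # Runs before each test method: create a fresh temporary file.
        handle, self.path = tempfile.mkstemp()
        os.close(handle)

    def teardown_method(self, method):
        # Runs after each test method: remove the temporary file.
        os.remove(self.path)

    def test_file_starts_empty(self):
        assert os.path.getsize(self.path) == 0
```

Saved in a file named like test_scratch.py, the class is collected by the test runner, which calls the setup and teardown methods around each test.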

The conference days themselves were packed with talks short and long, on an incredible variety of topics. As I digest my notes, I hope to post about different techniques, packages, and programs I found useful or interesting. I was quite surprised at the diversity of the content. I learned about detecting neutrinos at the South Pole, interacting with a wiiMote to navigate in a virtual world, simple AI techniques, web frameworks, and a lot more. I also got to see Guido van Rossum run across the main stage, steal the Django pony, and escape with great rapidity!

In sum, my first PyCon was simply splendid. I hope to go again, at least when it’s in Washington, D.C., if not at next year’s venue in Hotlanta.

April 4, 2009 - 11:22 PM No Comments
