When a server is running low on disk space, you need to find out what is taking up that space, and fast. I used to run this command in /:
    sudo du -ch --max-depth=1 .
This would show all the directories in / and their sizes. Then I would run it again on the largest directory it reported, and so on until I knew what was up. This is slow, and there is a better way! I created a script that looks for directories in your current directory (or in a directory you pass it). If it finds any, it measures how large each one is. It then repeats the process on the largest directory found, continuing until there are no more child directories. It also accepts an option to prompt you at each step, asking whether you want to continue. This makes it much faster to zero in on the culprit.
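The descent described above can be sketched as a short loop. This is only an illustration written for Python 3, not the script itself: the names `tree_size_mb` and `follow_largest` are made up for this sketch, and skipping symlinks and unreadable entries is an assumption added here to keep the loop from wandering or crashing.

```python
import os


def tree_size_mb(path):
    """Total size of the files under path, in MB (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Assumption of this sketch: silently skip files that
                # vanished or cannot be read.
                pass
    return round(total / (1024 * 1024.0), 2)


def follow_largest(start):
    """Return [(directory, size_mb), ...] following the largest child dirs."""
    chain = []
    current = start
    while True:
        try:
            names = os.listdir(current)
        except OSError:
            # Cannot list this directory, so stop descending here.
            return chain
        subdirs = []
        for name in names:
            full = os.path.join(current, name)
            # Assumption: ignore symlinked directories to avoid loops.
            if os.path.isdir(full) and not os.path.islink(full):
                subdirs.append(full)
        if not subdirs:
            # No child directories left: the descent is finished.
            return chain
        sizes = {d: tree_size_mb(d) for d in subdirs}
        largest = max(sizes, key=sizes.get)
        chain.append((largest, sizes[largest]))
        current = largest
```

Calling `follow_largest("/var")` would return the chain of largest directories from the top down, which a caller could then print one level per line.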
Example output:
    sudo ./follow_largest_dirs.py -t /var
    /var: 1033.21 MB (parent)
     |-> lib: 499.9 MB
     |--> defoma: 187.73 MB
     |---> gs.d: 95.1 MB
     |----> dirs: 94.94 MB
     |-----> fonts: 94.94 MB
With prompting:
    sudo ./follow_largest_dirs.py -t /var -p
    /var: 1034.6 MB (parent)
     |-> lib: 501.02 MB
    Continue? (y/n) y
     |--> defoma: 187.73 MB
    Continue? (y/n) n
Latest code can be found in my Trac. [EDIT, 2009-02-18: Added rounding of sizes and proper error checking in dir recursion based on suggestion by Dorantor. Thanks!] Current code:
    #!/usr/bin/env python
    #-----------------------------------------------------------------------------
    """This program finds the largest directory in current or passed directory,
    and does the same within that directory until it is in a directory with no
    directories. Allows for prompting to continue.
    """
    author = "Samuel Huckins"
    date_started = "2009-01-09"
    #-----------------------------------------------------------------------------

    # For making system calls
    import os
    # For getting params
    import sys
    # Option parsing
    from optparse import OptionParser
    # Dictionary sorting
    from operator import itemgetter

    def get_dir_size(dir):
        """
        Get the size of the directory passed.
        """
        dir_size = 0
        for (path, dirs, files) in os.walk(dir):
            for file in files:
                filename = os.path.join(path, file)
                if os.path.isfile(filename):
                    try:
                        dir_size += os.path.getsize(filename)
                    except OSError, e:
                        raise e
        return round((dir_size / (1024*1024.0)), 2)

    def recursively_find_largest(target, prompt=False, level=1):
        """
        Looks in target for dirs, prints largest, prompting if passed.
        """
        dir_sizes = {}
        for eachitem in [os.path.join(target, x) for x in os.listdir(target)]:
            if os.path.isdir(eachitem):
                dir_sizes[eachitem] = get_dir_size(eachitem)
        if len(dir_sizes) == 0:
            return
        largest_dir = sorted(dir_sizes.iteritems(), key=itemgetter(1), \
            reverse=True)[0]
        separator = "-" * level
        print " |%s> %s: %s MB" % (separator, os.path.split(largest_dir[0])[1],
            largest_dir[1])
        level += 1
        if prompt == True:
            cont = raw_input("Continue? (y/n) ")
            if cont == "y":
                recursively_find_largest(largest_dir[0], prompt, level)
            else:
                return
        else:
            recursively_find_largest(largest_dir[0], prompt, level)

    def main():
        """
        Control main program flow.
        """
        usage = "usage: %prog [options] arg"
        parser = OptionParser(usage)
        parser.add_option("-t", "--target", dest="target_dir",
            help="The directory to start in")
        parser.add_option("-p", "--prompt", dest="prompt", action="store_true", \
            default=False, help="Prompt to continue in child dirs")
        (options, args) = parser.parse_args()
        current_dir = os.getcwd()
        if options.target_dir:
            target = options.target_dir
        else:
            target = current_dir
        parent_size = get_dir_size(target)
        print "%s: %s MB (parent)" % (target, parent_size)
        recursively_find_largest(target, options.prompt)
        sys.exit(0)

    #-----------------------------------------------------------------------------
    if __name__ == "__main__":
        main()

When I tried to use this script, it didn't work without error handling. A small, quick-and-dirty hack solved my problem.
I left a commented-out example of how errors could be handled, for anyone who would like to write it.
You may also notice that I added rounding to the returned values, so the results are now more readable.
[code]
def get_dir_size(dir):
    """
    Get the size of the directory passed.
    """
    dir_size = 0
    for (path, dirs, files) in os.walk(dir):
        for file in files:
            filename = os.path.join(path, file)
            try:
                dir_size += os.path.getsize(filename)
            except OSError,e:
                dir_size += 0 # just do nothing. for now.
                # if errno.ENOENT == e.errno: # http://docs.python.org/library/errno.html
                #     print os.strerror(e.errno) + " (" + e.filename + ")"
    return round(dir_size / (1024*1024.0), 2)
[/code]
BTW! You will (possibly) need this:
[code]
import errno
[/code]
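The commented-out `errno` check in the patch above could be fleshed out roughly as follows. This is a hedged sketch for Python 3 (the code in this thread is Python 2), and the function name `get_dir_size_tolerant` and its `(size, skipped)` return value are assumptions made for illustration. It records files that are missing (for example, dangling symlinks) or unreadable instead of ignoring every error, and re-raises anything else.

```python
import errno
import os


def get_dir_size_tolerant(top):
    """Return (size_mb, skipped) for the tree rooted at top.

    skipped is a list of (path, message) pairs for files that could not
    be measured. Name and return shape are assumptions of this sketch.
    """
    total = 0
    skipped = []
    for path, _dirs, files in os.walk(top):
        for name in files:
            filename = os.path.join(path, name)
            try:
                total += os.path.getsize(filename)
            except OSError as e:
                if e.errno in (errno.ENOENT, errno.EACCES):
                    # Missing target or permission denied: note it, move on.
                    skipped.append((filename, os.strerror(e.errno)))
                else:
                    # Anything unexpected should still surface.
                    raise
    return round(total / (1024 * 1024.0), 2), skipped
```

Returning the skipped paths, rather than printing them inside the loop, leaves it to the caller to decide whether to report them.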
eeek! where is formatting?!!
You are right, it definitely needs some error handling. Thanks for the patch! I will give this script some more attention, adding error-handling and rounding, and make it available.
Sorry about the formatting, I am just using the Ajax Edit Comments plugin right now. I will try to get a better commenting system in place.
Updated to include size rounding and proper error checking. Worked fine in / on several *nix boxes.
Thanks Dorantor!
Just wanted to say thanks for sharing!