Tracking down used disk space: Follow largest directories

When a server is getting low on disk space, you need to find out what is taking up that space, and fast. I used to run this command in /:

sudo du -ch --max-depth=1 .

This shows every directory in / along with its size. I would then run it again on the largest directory it reported, and so on until I knew what was up. This is slow, and there is a better way! I wrote a script that looks for directories in your current directory (or in a directory you pass it) and measures how large each one is. It then repeats the process inside the largest directory found, continuing until there are no more child directories. It also accepts an option to prompt you at each step before continuing. This makes it much faster to zero in on the culprit.

Example output:

sudo ./follow_largest_dirs.py -t /var
/var: 1033.21 MB (parent)
 |-> lib: 499.9 MB
 |--> defoma: 187.73 MB
 |---> gs.d: 95.1 MB
 |----> dirs: 94.94 MB
 |-----> fonts: 94.94 MB

With prompting:

sudo ./follow_largest_dirs.py -t /var -p
/var: 1034.6 MB (parent)
 |-> lib: 501.02 MB
Continue? (y/n) y
 |--> defoma: 187.73 MB
Continue? (y/n) n

Latest code can be found in my Trac. [EDIT, 2009-02-18: Added rounding of sizes and proper error checking in dir recursion based on suggestion by Dorantor. Thanks!] Current code:

#!/usr/bin/env python
#-----------------------------------------------------------------------------
"""This program finds the largest directory in current or passed
directory, and does the same within that directory until it is in a
directory with no directories. Allows for prompting to continue.
"""

author = "Samuel Huckins"
date_started = "2009-01-09"
#-----------------------------------------------------------------------------
# For making system calls
import os
# For getting params
import sys
# Option parsing
from optparse import OptionParser
# Dictionary sorting
from operator import itemgetter

def get_dir_size(dir):
    """
    Get the size of the directory passed.
    """

    dir_size = 0
    for (path, dirs, files) in os.walk(dir):
        for file in files:
            filename = os.path.join(path, file)
            if os.path.isfile(filename):
                try:
                    dir_size += os.path.getsize(filename)
                except OSError:
                    # Propagate the error unchanged, keeping the original traceback
                    raise
    return round((dir_size / (1024*1024.0)), 2)

def recursively_find_largest(target, prompt=False, level=1):
    """
    Looks in target for dirs, prints largest, prompting
    if passed.
    """

    dir_sizes = {}
    for eachitem in [os.path.join(target, x) for x in os.listdir(target)]:
        if os.path.isdir(eachitem):
            dir_sizes[eachitem] = get_dir_size(eachitem)
    if len(dir_sizes) == 0:
        return
    largest_dir = sorted(dir_sizes.iteritems(), key=itemgetter(1), \
        reverse=True)[0]
    separator = "-" * level
    print " |%s> %s: %s MB" % (separator, os.path.split(largest_dir[0])[1], largest_dir[1])
    level += 1
    if prompt:
        cont = raw_input("Continue? (y/n) ")
        if cont == "y":
            recursively_find_largest(largest_dir[0], prompt, level)
        else:
            return
    else:
        recursively_find_largest(largest_dir[0], prompt, level)

def main():
    """
    Control main program flow.
    """

    usage = "usage: %prog [options] arg"
    parser = OptionParser(usage)
    parser.add_option("-t", "--target", dest="target_dir", help="The directory to start in")
    parser.add_option("-p", "--prompt", dest="prompt", action="store_true", \
        default=False, help="Prompt to continue in child dirs")
    (options, args) = parser.parse_args()
    current_dir = os.getcwd()
    if options.target_dir:
        target = options.target_dir
    else:
        target = current_dir
    parent_size = get_dir_size(target)
    print "%s: %s MB (parent)" % (target, parent_size)
    recursively_find_largest(target, options.prompt)
    sys.exit(0)
#-----------------------------------------------------------------------------
if __name__ == "__main__":
    main()
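
One thing to note about the script above: get_dir_size walks a whole subtree each time it is called, so the largest branch gets re-walked at every level. The sketch below is a hedged Python 3 variant, not the script from Trac. It walks the tree once from the bottom up, skips files that cannot be read (along the lines Dorantor suggests in the comments), and then follows the largest child. The names scan and follow_largest are made up for this illustration, and unlike the original it does not descend into symlinked directories.

```python
import os


def scan(root):
    """Walk root once, bottom-up.

    Returns (sizes, children): sizes maps each directory to the total bytes
    of the regular files beneath it; children maps it to its subdirectories.
    """
    sizes, children = {}, {}
    for path, dirs, files in os.walk(root, topdown=False):
        total = 0
        for name in files:
            filename = os.path.join(path, name)
            try:
                if os.path.isfile(filename):
                    total += os.path.getsize(filename)
            except OSError:
                pass  # assumption: skip files that vanish or cannot be read
        # Subdirectories were visited first, so their totals already exist.
        # Symlinked or unreadable directories are never walked, so they are
        # missing from sizes and get filtered out here.
        kids = [os.path.join(path, d) for d in dirs]
        kids = [k for k in kids if k in sizes]
        sizes[path] = total + sum(sizes[k] for k in kids)
        children[path] = kids
    return sizes, children


def follow_largest(root):
    """Return [(directory, size in bytes), ...] from root down the largest branch."""
    sizes, children = scan(root)
    chain = []
    current = root
    while current in sizes:
        chain.append((current, sizes[current]))
        kids = children[current]
        if not kids:
            break
        current = max(kids, key=sizes.get)
    return chain
```

Calling follow_largest("/var") returns the chain as (path, bytes) pairs, starting with the parent directory. Dividing each size by 1024*1024.0 gives MB, so the result can be printed in the same style as the example output above.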


This entry was posted in CLI, Programming.

6 Responses to Tracking down used disk space: Follow largest directories

  1. Pingback: Easy and informative: Call graphs in Python | tail -f findings.out

  2. Dorantor says:

    When I tried to use this script, it didn't work without error handling. A small, quick'n'dirty hack solved my problem :) I left a simple commented-out example of how errors can be handled, for anyone who would like to write it.
    You may also note that I added rounding to the returned values, so the results are now more readable.

    [code]
    def get_dir_size(dir):
        """
        Get the size of the directory passed.
        """
        dir_size = 0
        for (path, dirs, files) in os.walk(dir):
            for file in files:
                filename = os.path.join(path, file)
                try:
                    dir_size += os.path.getsize(filename)
                except OSError,e:
                    dir_size += 0 # just do nothing. for now.
                    # if errno.ENOENT == e.errno: # http://docs.python.org/library/errno.html
                    #     print os.strerror(e.errno) + " (" + e.filename + ")"
        return round(dir_size / (1024*1024.0), 2)
    [/code]

    BTW! You will (possibly) need this:

    [code]
    import errno
    [/code]

  3. Dorantor says:

    eeek! where is formatting?!! :(

  4. You are right, it definitely needs some error handling. Thanks for the patch! I will give this script some more attention, adding error-handling and rounding, and make it available.

    Sorry about the formatting, I am just using the Ajax Edit Comments plugin right now. I will try to get a better commenting system in place.

  5. Updated to include size rounding and proper error checking. Worked fine in / on several *nix boxes.

    Thanks Dorantor!

  6. Brad says:

    Just wanted to say thanks for sharing!
