Python 101: How to traverse a directory

Every so often you will find yourself needing to write code that traverse a directory. They tend to be one-off scripts or clean up scripts that run in cron in my experience. Anyway, Python provides a very useful method of walking a directory structure that is aptly called os.walk. I usually use this functionality to go through a set of folders and sub-folders where I need to remove old files or move them into an archive directory. Let's spend some time learning about how to traverse directories in Python!


Using os.walk

Using os.walk takes a bit of practice to get it right. Here's an example that just prints out all your file names in the path that you passed to it:

import os

def pywalker(path):
    for root, dirs, files in os.walk(path):
        for file_ in files:
            print( os.path.join(root, file_) )

if __name__ == '__main__':
    pywalker('/path/to/some/folder')

By joining the root and the file_ elements, you end up with the full path to the file. If you want to check the date of when the file was made, then you would use os.stat. I've used this in the past to create a cleanup script, for example.

If all you want to do is check out a listing of the folders and files in the specified path, then you're looking for os.listdir. Most of the time, I usually need drill down to the lowest sub-folder, so listdir isn't good enough and I need to use os.walk instead.


Using os.scandir() Directly

Python 3.5 recently added os.scandir(), which is a new directory iteration function. You can read about it in PEP 471. In Python 3.5, os.walk is implemented using os.scandir "which makes it 3 to 5 times faster on POSIX systems and 7 to 20 times faster on Windows systems" according to the Python 3.5 announcement.

Let's try out a simple example using os.scandir directly.

import os

folders = []
files = []

for entry in os.scandir('/'):
    if entry.is_dir():
        folders.append(entry.path)
    elif entry.is_file():
        files.append(entry.path)
        
print('Folders:')
print(folders)

Scandir returns an iterator of DirEntry objects which are lightweight and have handy methods that can tell you about the paths you're iterating over. In the example above, we check to see if the entry is a file or a directory and append the item to the appropriate list. You can also a stat object via the DirEntry's stat method, which is pretty neat!


Wrapping Up

Now you know how to traverse a directory structure in Python. If you'd like the speed improvements that os.scandir provide in a version of Python older than 3.5, you can get the scandir package on PyPI.


Related Reading

Copyright © 2024 Mouse Vs Python | Powered by Pythonlibrary