Get file names by extension from a directory


Whenever you work with directories and files, you will probably need a function that gets the files with a given extension from a particular directory. For instance, you may want to check and process all the Excel files in a folder, or do some housekeeping by removing all the old log files. In this article, I will explain a few ways of implementing such a function.

There are plenty of libraries and modules you can use for this, but let’s start with the most commonly used ones.

Since you will need to import the os module anyway to handle file operations, you can make use of the functions it provides.

For instance, you can list all the files and sub-directories under the current directory, and check whether each name ends with a certain file extension, as below:

import os

pyfiles = []
for file in os.listdir("."):
    if file.lower().endswith(".ipynb"):
        pyfiles.append(file)

You can then sort the files by last modified time, from the latest to the earliest:

pyfiles.sort(key=os.path.getmtime, reverse=True)
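One catch with the code above: os.listdir returns bare file names, so os.path.getmtime only finds them because the directory being listed is the current one. Here is a minimal sketch for any other directory, using a hypothetical helper named files_by_mtime that joins each name with its directory first and skips sub-directories:

```python
import os


def files_by_mtime(directory, extension):
    """Return full paths of files in `directory` ending with `extension`
    (given in lower case, e.g. ".ipynb"), newest first.
    Hypothetical helper for illustration."""
    matches = []
    for name in os.listdir(directory):
        # os.listdir returns bare names, so join them with the directory
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(extension):
            matches.append(path)
    # getmtime now receives a path that exists regardless of the working directory
    matches.sort(key=os.path.getmtime, reverse=True)
    return matches
```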

What if you want to check multiple file extensions? Don’t worry, you can still do it with a minor change to the if condition, since str.endswith also accepts a tuple of suffixes:

if file.lower().endswith((".ipynb", ".xlsx")):
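Because str.endswith takes a tuple, the whole check fits in one call. The sketch below, using a hypothetical has_extension helper, lower-cases both the file name and the wanted extensions so that a name like Report.XLSX still matches:

```python
def has_extension(name, extensions):
    """Case-insensitive suffix check; `extensions` is a tuple such as
    (".ipynb", ".xlsx"). Hypothetical helper for illustration."""
    # Lower-case the wanted extensions so callers can pass ".XLSX" or ".xlsx"
    wanted = tuple(ext.lower() for ext in extensions)
    # str.endswith returns True if the name ends with any suffix in the tuple
    return name.lower().endswith(wanted)


names = ["analysis.ipynb", "Report.XLSX", "notes.txt", "xlsx"]
print([n for n in names if has_extension(n, (".ipynb", ".xlsx"))])
# ['analysis.ipynb', 'Report.XLSX']
```

Note that a file literally named xlsx does not match, because the suffix includes the dot.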

The os module also has another function, scandir, which achieves the same thing and returns os.DirEntry objects that also carry the file type and file attribute info.

files = []
for file in os.scandir("."):
    if file.name.lower().endswith((".ipynb", ".xlsx")):
        files.append(file.name)
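Because os.scandir returns os.DirEntry objects, you can skip sub-directories with is_file() and sort by modification time without a separate os.path call. A rough sketch, with scan_files as a hypothetical name and the extensions assumed to be given in lower case:

```python
import os


def scan_files(directory, extensions):
    """Return names of regular files in `directory` whose name ends with one
    of `extensions` (lower case), newest first. Hypothetical helper."""
    found = []
    # Using scandir as a context manager closes the directory handle promptly
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() skips sub-directories, even one named "data.xlsx"
            if entry.is_file() and entry.name.lower().endswith(extensions):
                # The result of entry.stat() is cached on the DirEntry object
                found.append((entry.stat().st_mtime, entry.name))
    # Sort by modification time, newest first
    found.sort(reverse=True)
    return [name for _, name in found]
```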

If you don’t like the way the above code matches the file names, you can use the fnmatch module to do the job instead. For example:

import fnmatch

files = []
for file in os.listdir("."):
    if fnmatch.fnmatch(file, "*.ipynb") or fnmatch.fnmatch(file, "*.xlsx"):
        files.append(file)
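One thing to keep in mind is that fnmatch.fnmatch normalizes case with os.path.normcase, so it ignores case on Windows but is case-sensitive on Linux and macOS. If you want the same behaviour everywhere, one option, sketched below with a hypothetical match_any helper, is to lower-case the name yourself and use fnmatch.fnmatchcase:

```python
import fnmatch


def match_any(names, patterns):
    """Return the names matching any shell-style pattern, ignoring case on
    every platform. Hypothetical helper for illustration."""
    lowered = [p.lower() for p in patterns]
    # fnmatchcase never normalizes case, so lower-case the name ourselves
    return [n for n in names
            if any(fnmatch.fnmatchcase(n.lower(), p) for p in lowered)]


print(match_any(["a.ipynb", "B.XLSX", "c.txt"], ["*.ipynb", "*.xlsx"]))
# ['a.ipynb', 'B.XLSX']
```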

Python also has a glob module, which lets you match files using Unix shell-style patterns. To match the files with a certain extension, you can simply do the following:

import glob

files = glob.glob("*.ipynb")

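And then sort by file creation time, from the latest to the earliest (note that os.path.getctime returns the creation time on Windows, but the time of the last metadata change on Unix):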

files.sort(key=os.path.getctime, reverse=True)
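To search a directory other than the current one, you can join it onto the pattern; the returned paths then include the directory, so os.path.getctime works from anywhere. Below is a minimal sketch assuming a hypothetical glob_in helper. glob.escape keeps special characters such as [ in the directory name from being treated as wildcards, and * does not match hidden files whose names start with a dot:

```python
import glob
import os


def glob_in(directory, pattern):
    """Return paths in `directory` matching `pattern`, newest ctime first.
    Hypothetical helper for illustration."""
    # Escape the directory so only `pattern` is interpreted as a wildcard
    full_pattern = os.path.join(glob.escape(directory), pattern)
    files = glob.glob(full_pattern)
    # Returned paths include `directory`, so getctime works from any cwd
    files.sort(key=os.path.getctime, reverse=True)
    return files
```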

If you want to match multiple file extensions, you can do something like this:

files = []
file_types = ("*.ipynb", "*.xlsx")
for file_type in file_types:
    files.extend(glob.glob(file_type))
files.sort(key=os.path.getctime, reverse=True)
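If the patterns overlap, for example *.ipynb and report*, the same file can be added twice. Here is a sketch of one way to de-duplicate, using a hypothetical glob_many helper and dict.fromkeys to drop repeats while keeping the first-seen order:

```python
import glob
import os


def glob_many(directory, patterns):
    """Collect files in `directory` matching any pattern, without duplicates,
    newest ctime first. Hypothetical helper for illustration."""
    files = []
    for pattern in patterns:
        files.extend(glob.glob(os.path.join(glob.escape(directory), pattern)))
    # Overlapping patterns can return the same path more than once;
    # dict.fromkeys keeps only the first occurrence of each path
    unique = list(dict.fromkeys(files))
    unique.sort(key=os.path.getctime, reverse=True)
    return unique
```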

As I mentioned earlier, there are many more ways of doing this and it would not be possible to list all of them, so I will stop here. Please leave your comments if you have better ideas.

Originally published at https://www.codeforests.com on June 13, 2020.

