Introduction
In this blog post, we look at how to use Python for real-time folder monitoring. Whether you need to process new files automatically, track changes, or react to specific events, Python lets you automate these tasks quickly. We will use the watchdog library, a robust and user-friendly library for tracking file system events.
Monitoring a folder for changes can be a powerful tool in many applications, especially those involving file systems, data processing, or real-time updates. Here are a few typical use cases:
- Data Ingestion: Automatically process files (e.g., Excel, JSON) as they arrive in a designated folder, for example by importing their data into a database.
- Image Processing: Detect newly added photos and resize, compress, or convert them to another format.
- Document Management: Automatically tag, organize, or convert documents (for example, from Word to PDF) when they are uploaded.
- Version Control: Automatically create versions of files when they are updated, which is useful for tracking changes over time and restoring previous versions if needed.
- Content Distribution: Distribute content to various platforms or channels as soon as it becomes available.
- Job Scheduling: Start specific processes or tasks (such as code compilation, app deployment, or report generation) when new files arrive.
- Alert Systems: Send messages or alerts when files of specific types are created, modified, or deleted.
- Real-Time Backups: Automatically back up new or modified files to a remote server or cloud storage.
These are just a few use cases; there are many more. They show how folder monitoring can improve automation, security, and efficiency in a range of scenarios.
Environment Setup for a File Monitor Using Python:
- If Python is not already installed, download and install the version for your operating system from https://www.python.org/downloads/ and follow the steps shown by the installer.
- Python code can be written in many editors and IDEs, such as PyCharm, Visual Studio Code, and Eclipse. Use whichever you prefer.
- Install the watchdog library, plus openpyxl (which the sample script uses to work with Excel files), from the command line with the following command:
pip install watchdog openpyxl
For more information on the watchdog library, see its page on the Python Package Index (PyPI): https://pypi.org/project/watchdog/
With the environment set up, we can start developing the script. When a file arrives, a Python callback function performs whatever task the use case requires on that file.
What is a callback function?
A callback function is a function that is passed as an argument to another function and is called after an event occurs or a particular task finishes. Callbacks are also a common building block for asynchronous code: they let a program react to an event, or to the completion of a task, without waiting for it and blocking other work.
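To make the definition concrete, here is a minimal sketch in plain Python. The names `process_file` and `record_result` are hypothetical, and the string clean-up only stands in for real file processing. The function object `record_result` is passed as an argument and is called only after the work has finished.

```python
processed = []


def process_file(file_name, callback):
    # Stand-in for real work on the file (hypothetical)
    result = file_name.strip().lower()
    # The callback runs only after the work above has finished
    callback(result)


def record_result(result):
    processed.append(result)
    print(f"Finished processing: {result}")


# Pass the function object itself (no parentheses) as the argument
process_file("  Report.XLSX ", record_result)  # prints: Finished processing: report.xlsx
```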
Key Features of Callback Functions:
- Passed as an Argument: A callback function is passed to another function as a parameter.
- Executed Later: It is not called at the moment it is passed, but only once a specific event occurs or a condition is met.
- Asynchronous Operations: Callbacks are commonly used in asynchronous programming to handle operations such as file reading/writing, network requests, and timeouts without blocking the main program flow.
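For the asynchronous case, here is a minimal sketch using the standard library's `concurrent.futures` module. This is our choice for illustration; the folder monitor itself does not use it. `Future.add_done_callback()` registers a callback that runs once the background task finishes, while the main thread keeps going. `slow_task` and `on_done` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor

results = []


def slow_task(name):
    time.sleep(0.1)  # simulate slow I/O, such as reading a large file
    return f"{name} done"


def on_done(future):
    # Called with the finished Future once slow_task has returned
    results.append(future.result())


with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(slow_task, "input.xlsx")
    future.add_done_callback(on_done)
    print("Main thread keeps running while the task works")

# Leaving the with-block waits for the worker (and its callback) to finish
print(results)  # ['input.xlsx done']
```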
Advantages of Callback Functions:
- Modularity: Separates the main function's logic from the actions to take after it completes.
- Asynchronous Programming: Allows asynchronous operations to be handled without blocking the main thread.
- Reusability: The same callback function can be reused with different functions or in different contexts.
Callback functions are a fundamental concept in many programming languages and help make applications more flexible, efficient, and responsive.
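A small sketch of modularity and reusability, with hypothetical function names: one callback, `log_result`, is passed to two different functions, and neither function needs to know what the callback does with the result.

```python
log = []


def log_result(source, value):
    # One callback, reused by several producers
    log.append(f"{source}: {value}")


def count_lines(text, callback):
    callback("count_lines", len(text.splitlines()))


def count_words(text, callback):
    callback("count_words", len(text.split()))


sample = "first line\nsecond line here"
count_lines(sample, log_result)
count_words(sample, log_result)
print(log)  # ['count_lines: 2', 'count_words: 5']
```

Swapping `log_result` for a different callback (for example, one that sends an alert) changes what happens after counting without touching `count_lines` or `count_words`.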
Sample Code:
This sample script monitors a folder for new files and triggers an action when a file arrives, for example modifying the file (such as adding a column to an Excel file).
In this setup, a new file is saved to the monitored folder every time the user receives an email with an attachment; a Power Automate flow handles that step. For more information, see https://dasfascination.com/handling-email-attachments-with-power-automate/.
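import os
import shutil
import time

import openpyxl
from openpyxl.styles import Font
from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

# Folder paths: update these to match your setup
source_folder = os.path.join(".", "Input")   # folder being monitored
dest_folder = os.path.join(".", "Output")    # where the processed file is placed
temp_folder = os.path.join(".", "Hash_Tool", "temp")  # working folder for intermediate files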


class NewFileHandler(FileSystemEventHandler):
    """Reacts to new files in the monitored folder and hands them to a callback."""

    def __init__(self, callback):
        super().__init__()
        self.callback = callback

    def on_created(self, event):
        # Ignore newly created sub-directories; only react to files
        if not event.is_directory:
            file_path = event.src_path
            print(f'New file detected: {file_path}')
            file_name = self.process_new_file(file_path)
            if file_name:
                self.callback(file_name)

    def process_new_file(self, file_path):
        try:
            # Give the writer (e.g. the Power Automate flow) time to finish the file
            time.sleep(3)
            # Get the file name from the file path
            file_name = os.path.basename(file_path)
            temp_input_file = os.path.join(temp_folder, file_name)
            # Remove any leftover copy from a previous run before moving
            if os.path.exists(temp_input_file):
                print("Removing Temp input file")
                os.remove(temp_input_file)
            # Move the new file out of the monitored folder into the temp folder
            shutil.move(file_path, temp_input_file)
            print(f'File {file_name} has been moved to {temp_folder}.')
            return file_name
        except Exception as e:
            print(f'Failed to move {file_path}: {e}')
            return None


def monitor_folder(callback):
    event_handler = NewFileHandler(callback)
    observer = Observer()
    # Watch only the top level of source_folder (no sub-folders)
    observer.schedule(event_handler, path=source_folder, recursive=False)
    observer.start()  # events are handled on a background thread
    print(f'Started monitoring {source_folder}')
    try:
        # Keep the main thread alive until the user presses Ctrl+C
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
        print('Stopped monitoring')
    observer.join()
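

def hash_create(input_file, output_file):
    # Process the file and create a new file to send as an attachment in a mail.
    # Placeholder (assumption): the post does not show this logic. As an example,
    # it adds a bold header for a new column on the first sheet and saves the
    # workbook under output_file.
    workbook = openpyxl.load_workbook(input_file)
    sheet = workbook.active
    header = sheet.cell(row=1, column=sheet.max_column + 1, value="New Column")
    header.font = Font(bold=True)
    workbook.save(output_file)


def new_file_callback(file_name):
    # Handle the new file in the main script
    print(f'Callback received for new file: {file_name}')
    temp_input_file = os.path.join(temp_folder, file_name)
    temp_output_file = os.path.join(temp_folder, 'hash.xlsx')
    dest_output_file = os.path.join(dest_folder, 'hash.xlsx')
    try:
        # Remove the previous result so the new one can replace it
        if os.path.exists(dest_output_file):
            os.remove(dest_output_file)
        # Process the file (assumes it is an .xlsx workbook)
        hash_create(temp_input_file, temp_output_file)
        # Assumption: move the result to the output folder for the next step
        shutil.move(temp_output_file, dest_output_file)
    except Exception as e:
        # Report the error instead of letting it escape into watchdog's thread
        print(f'Failed to process {file_name}: {e}')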
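

if __name__ == "__main__":
    # Create the working folders if they do not exist yet
    if os.path.isdir(temp_folder):
        print("Temp folder already exists")
    else:
        os.makedirs(temp_folder)
        print("Created Temp folder")
    os.makedirs(dest_folder, exist_ok=True)
    # The observer cannot watch a folder that does not exist
    os.makedirs(source_folder, exist_ok=True)
    monitor_folder(new_file_callback)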
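The script's `process_new_file` waits a fixed three seconds before moving a file, on the assumption that the upload has finished by then; for large or slow uploads that guess may be too short. One possible alternative, sketched below with a hypothetical helper `wait_until_stable`, polls the file size until it stops changing. The interval, number of checks, and timeout are assumptions to tune for your setup.

```python
import os
import time


def wait_until_stable(path, interval=0.5, checks=3, timeout=30.0):
    """Return True once the size of `path` has stayed the same for `checks`
    consecutive polls, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            # File not visible yet (or briefly inaccessible): treat as unstable
            size = -1
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    return False
```

In `process_new_file`, the `time.sleep(3)` line could then be replaced with a check such as `if not wait_until_stable(file_path): return None`. A stable size is only a heuristic: a writer that pauses for longer than the polling window can still be caught mid-write.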
Key Points:
- FileSystemEventHandler: This watchdog class lets you specify what to do when an event (such as the creation of a file) occurs. NewFileHandler overrides its on_created method to react to new files only.
- Observer: Watches the monitored folder and passes its file system events to the handler.
- Callback Function: Here, new_file_callback is called each time a new file is added to the monitored folder and has been successfully moved to the temp folder.
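watchdog calls the handler's methods from the observer's background thread, so in the sample a slow `hash_create` run delays the handling of the next new file. One common pattern, sketched here with the standard library's `queue` and `threading` modules, is to let the callback only enqueue the file name and do the heavy work on a separate worker thread. `enqueue_callback` and `worker` are hypothetical names, and appending to `handled` stands in for calling `new_file_callback`.

```python
import queue
import threading

work_queue = queue.Queue()
handled = []


def enqueue_callback(file_name):
    # Hypothetical replacement for new_file_callback: returns immediately
    work_queue.put(file_name)


def worker():
    while True:
        file_name = work_queue.get()
        if file_name is None:  # sentinel value: stop the worker
            work_queue.task_done()
            break
        try:
            handled.append(file_name)  # stand-in for new_file_callback(file_name)
        finally:
            work_queue.task_done()


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

# Simulate the observer reporting two new files
enqueue_callback("a.xlsx")
enqueue_callback("b.xlsx")
work_queue.join()  # wait until both items have been processed
work_queue.put(None)
worker_thread.join()
print(handled)  # ['a.xlsx', 'b.xlsx']
```

In the real script, the worker would be started before calling `monitor_folder(enqueue_callback)`, and the sentinel would be queued after monitoring stops.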