Programmatically Manipulate Files: Renaming, Reading, Writing, Deleting, and Moving Files Between Folders

User avatar
Eli
Senior Expert Member
Reactions: 187
Posts: 5693
Joined: 10 years ago
Location: Tanzania
Has thanked: 75 times
Been thanked: 88 times
Contact:

#1

Suppose you have a folder with many files in different formats (.jpg, .JPG, .JPEG, .png, .PNG, .pdf, .h5, .fits, .dat, .txt, .odt, .docx, .doc, etc.) and you want to move only the files with the .h5 extension to a new folder. You could go through the files one by one and cut and paste each into the new folder, but that is tedious and slow, and in a cluttered folder it is easy to leave some files behind.

It is more convenient to move the files with a few lines of Python, for example from /home/tssfl/Desktop/Old_Folder/ to a new folder on the Desktop, /home/tssfl/Desktop/New_Folder/:

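    import glob
    import shutil

    # Collect every .h5 file in the source folder (on Linux the pattern is case-sensitive)
    filenames = sorted(glob.glob('/home/tssfl/Desktop/Old_Folder/*.h5'))

    # Move each file into the destination folder, which must already exist
    for f in filenames:
        shutil.move(f, "/home/tssfl/Desktop/New_Folder/")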


Note that the sorted function is not required; it only makes the files get processed in alphabetical order.
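
The glob pattern above only matches the lower-case .h5 spelling, and shutil.move is not guaranteed to refuse when a file of the same name already sits in the destination. The sketch below is one possible variant, not the original method: the function name move_by_extension and its parameters are made up for illustration. It matches the extension case-insensitively, creates the destination if needed, and skips files that would collide.

```python
from pathlib import Path
import shutil


def move_by_extension(src_dir, dst_dir, extension=".h5"):
    """Move files whose suffix matches `extension`, ignoring case.

    Returns the destination paths of the files that were moved.
    Files whose name already exists in dst_dir are left where they are.
    (Hypothetical helper, written for this example.)
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    moved = []
    for path in sorted(src.iterdir()):
        # Skip sub-folders and files with a different extension
        if not path.is_file() or path.suffix.lower() != extension.lower():
            continue
        target = dst / path.name
        if target.exists():
            print(f"Skipping {path.name}: already present in {dst}")
            continue
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

It could be called as move_by_extension("/home/tssfl/Desktop/Old_Folder", "/home/tssfl/Desktop/New_Folder"), using the same folders as in the example above.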


You can also use the mv command.

mv moves a folder together with everything inside it, so it has no -r flag and does not need one.

Use the -i option to be prompted before a file with the same name in the destination is overwritten:

mv -i /home/tssfl/Desktop/Old_Folder/*.h5 /home/tssfl/Desktop/New_Folder/


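To move a folder, use:

mv /home/tssfl/folder1 /home/tssfl/folder2/

If folder2 already exists, folder1 ends up inside it, and the trailing "/" makes that intent explicit. If folder2 does not exist and you leave the "/" off, mv renames folder1 to folder2 instead. Prefix the command with sudo only when you lack write permission on the folders involved.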

You can also move and rename a file or folder in one step, for example move file1.pdf from folder1 to folder2 and rename it file2.pdf:

mv /home/tssfl/Desktop/folder1/file1.pdf /home/tssfl/Desktop/folder2/file2.pdf


Do Even More Magic!

Now let us go a step further. Suppose we have n folders, folder1, folder2, folder3, ..., foldern, with paths /home/tssfl/Desktop/folder1, /home/tssfl/Desktop/folder2, /home/tssfl/Desktop/folder3, ..., /home/tssfl/Desktop/foldern. Suppose further that each of these folders contains a file and/or a folder named file and folder.

Each of these folders then looks like:

    /home/tssfl/Desktop/folderx/
        file
        folder


where x = 1, 2, 3, ..., n.

You can then rename the contents to file_n1.ext, folder_n1, file_n2.ext, folder_n2, file_n3.ext, folder_n3, ..., file_nn.ext, folder_nn, and move all of them into a single new folder, new_folder, by using the script below:

    import os

    n = 3  # example value: number of source folders folder1 ... foldern
    destination = "/home/tssfl/Desktop/new_folder"
    os.makedirs(destination, exist_ok=True)  # create the target folder if needed

    for i in range(1, n + 1):  # range replaces Python 2's xrange
        path = "/home/tssfl/Desktop/folder%s" % i
        for filename in os.listdir(path):
            # Split "file.ext" into ("file", ".ext"); folders get an empty extension
            filename_without_ext, extension = os.path.splitext(filename)
            new_file_name_with_ext = "%s_n%s%s" % (filename_without_ext, i, extension)
            print(new_file_name_with_ext)
            os.rename(os.path.join(path, filename),
                      os.path.join(destination, new_file_name_with_ext))


The script copes with whatever extension (.ext) each file has, because os.path.splitext separates the extension from the name before the suffix is added.
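
To see how the renaming treats different kinds of names, the naming step can be isolated in a small helper. This is only an illustration: suffixed_name is a hypothetical function, not part of the script above, and it touches no files.

```python
import os


def suffixed_name(name, index):
    """Insert "_n<index>" between the base name and its extension."""
    base, ext = os.path.splitext(name)
    return f"{base}_n{index}{ext}"


print(suffixed_name("file.pdf", 1))        # file_n1.pdf
print(suffixed_name("folder", 2))          # folder_n2
print(suffixed_name("archive.tar.gz", 3))  # archive.tar_n3.gz
print(suffixed_name(".bashrc", 4))         # .bashrc_n4
```

Note that os.path.splitext only treats the last dot as the start of the extension, and that a leading dot (as in hidden files) is not counted as an extension at all.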
TSSFL -- A Creative Journey Towards Infinite Possibilities!

#2

rsync is a tool for copying and synchronizing files (including incremental backups) from your computer to a remote machine, from a remote machine to your computer, from one directory to another on the same computer, from your computer to an external hard drive or network path, and so on. By default it copies; with the --remove-source-files option it deletes each source file after transferring it, which amounts to a move.

The basic format of the rsync command is:

    rsync options source destination


For example, to copy all the Desktop contents (files and folders) to a directory called DELL_Desktop_Backup on a hard drive named Seagate\ Expansion\ Drive mounted under /media/tssfl/, we would run rsync as follows (here, the user is tssfl):

    rsync -avzh /home/tssfl/Desktop/ /media/tssfl/Seagate\ Expansion\ Drive/DELL_Desktop_Backup


To copy the Desktop directory itself (together with its contents) into the backup folder, so that you end up with DELL_Desktop_Backup/Desktop, remove the trailing "/" from the source path.
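
If you prefer to drive rsync from Python, the trailing-slash rule can be made explicit in a small wrapper. The sketch below is an assumption-laden illustration: build_rsync_command, run_rsync and the contents_only parameter are hypothetical names, and the only rsync options used are -a, -v, -h and --dry-run.

```python
import subprocess


def build_rsync_command(source, destination, contents_only=True, dry_run=True):
    """Return the rsync argument list (hypothetical helper).

    contents_only=True  -> source ends with "/", so only its contents are copied.
    contents_only=False -> no trailing "/", so the directory itself is copied.
    """
    source = source.rstrip("/") + ("/" if contents_only else "")
    cmd = ["rsync", "-avh"]
    if dry_run:
        cmd.append("--dry-run")  # only list what would be transferred
    cmd += [source, destination]
    return cmd


def run_rsync(source, destination, **options):
    """Run rsync; raises CalledProcessError if rsync exits with an error."""
    return subprocess.run(build_rsync_command(source, destination, **options),
                          check=True)
```

Because the arguments are passed as a list rather than through a shell, a path such as "/media/tssfl/Seagate Expansion Drive/DELL_Desktop_Backup" is written with plain spaces and no backslashes. Starting with dry_run=True lets you inspect the planned transfer before copying anything.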

#3

I was once again confronted with the task of moving, renaming and reorganizing files, and came up with the Python function below, which did the job:

    import os

    def move_files_with_unique_names(parent_dir):
        #Provide the path where you want to move files to
        new_folder_path = "/home/tssfl/New_folder"
        count = 0

        for root, _, files in os.walk(parent_dir):
            if not files:
                print(f"Skipping empty directory: {root}")
                continue

            print(f"Current Directory: {root}")
            for file in files:
                filename_without_ext, extension = os.path.splitext(file)

                new_file_name = filename_without_ext + "_" + str(count) + extension
                new_file_path = os.path.join(new_folder_path, new_file_name)

                while os.path.exists(new_file_path):
                    count += 1
                    new_file_name = filename_without_ext + "_" + str(count) + extension
                    new_file_path = os.path.join(new_folder_path, new_file_name)

                print(f"Moving file number: {count}")
                print(f"Moving file {file} to {new_file_name}")
                os.rename(os.path.join(root, file), new_file_path)
                count += 1

    #Provide the path to parent directory where you want to move files from
    parent_dir_path = "/home/tssfl/Manipulate_files"
    move_files_with_unique_names(parent_dir_path)


The function move_files_with_unique_names is designed to move files from a specified parent directory (here Manipulate_files) and its subdirectories to a new folder (here New_folder), renaming each file with a unique index to prevent overwriting. Here is what the function does:
  1. Input: The function takes one parameter, parent_dir, the path of the parent directory containing the files to be processed.
  2. Walking the tree: The function iterates through parent_dir and each of its subdirectories, skipping directories that contain no files.
  3. File Renaming: For each file found, the function appends an index to the original file name, before the extension. The index is increased until the resulting name is not yet taken in the new folder, so each moved file gets a distinct name.
  4. Moving and Renaming: The files are moved into the single new folder under their new names, so the original subdirectory structure is flattened.
  5. Print Statements: Throughout the process, the function prints the directory currently being processed, the running index of the file being moved, and the original file name along with the new one.
  6. Skipping Empty Directories: Directories without files are skipped with a message rather than processed.
  7. Reusable Functionality: The function can be called with different parent directory paths to move and rename files within any directory and its subdirectories.
You can also restrict the function to files with certain extensions, for example only .xls or .xlsx files.

The modified version below does that, and it also checks for existing files in New_folder so that indexing starts after the highest index already present.

    import os

    def move_files_with_unique_names(parent_dir):
        #Provide the path where you want to move files to
        new_folder_path = "/home/tssfl/New_folder"
        extensions = (".xls", ".xlsx")
        count = 0

        #Collect the indices of files already in the new folder; names that
        #do not end in "_<number>" are ignored instead of raising ValueError
        existing_indices = []
        for file in os.listdir(new_folder_path):
            name, extension = os.path.splitext(file)
            index = name.split("_")[-1]
            if extension.lower() in extensions and "_" in name and index.isdecimal():
                existing_indices.append(int(index))

        if existing_indices:
            count = max(existing_indices) + 1

        for root, _, files in os.walk(parent_dir):
            if not files:
                print(f"Skipping empty directory: {root}")
                continue

            print(f"Current Directory: {root}")
            for file in files:
                filename_without_ext, extension = os.path.splitext(file)
                #Check if the file extension is either '.xls' or '.xlsx'
                if extension.lower() in extensions:
                    new_file_name = filename_without_ext + "_" + str(count) + extension
                    new_file_path = os.path.join(new_folder_path, new_file_name)

                    while os.path.exists(new_file_path):
                        count += 1
                        new_file_name = filename_without_ext + "_" + str(count) + extension
                        new_file_path = os.path.join(new_folder_path, new_file_name)

                    print(f"Moving file number: {count}")
                    print(f"Moving file {file} to {new_file_name}")
                    os.rename(os.path.join(root, file), new_file_path)
                    count += 1
                else:
                    print(f"Skipping file {file} with extension {extension}.")

    #Provide the path to parent directory where you want to move files from
    parent_dir_path = "/home/tssfl/Manipulate_files"
    move_files_with_unique_names(parent_dir_path)


#4

The Python function move_files_with_unique_names above is designed to move files from a specified parent directory (here Manipulate_files) and its subdirectories to a new folder (here New_folder), renaming each file with a unique index to prevent overwriting. The code only moves files with certain extensions, in this case .xls or .xlsx. If New_folder already contains indexed files, the files being moved are indexed starting after the maximum index. For example, if the maximum index is 10, indexing starts at 11.

Now, modify the code so that it first checks for existing files in New_folder, if any, and:

1. If none of them are indexed, it should index them and then start indexing the files being moved after the maximum index.

2. If New_folder holds a mix of indexed and non-indexed files, it should first index all of them correctly. Suppose there are files with indices 3 and 4 while others are not indexed: it should index the unindexed files starting from 0, 1, 2, and so on, depending on how many there are. It should then move files into New_folder, giving the incoming files any indices still missing below the maximum index first. Once all those gaps are filled, it should continue indexing after the maximum index. Files with the same name but different extensions must get different indices too. The code must not overwrite any file.
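
One possible way to meet these requirements is sketched below. It is a sketch under stated assumptions, not a definitive solution: a file counts as indexed when its base name ends in "_<digits>" (so a name like report_2021.xls is treated as index 2021), every regular file already in the new folder is indexed regardless of extension, the new folder is assumed not to lie inside the parent directory, and the names split_index, unique_target and index_and_move are made up for this example. Always handing out the lowest free index covers both cases: gaps below the maximum are filled first, and numbering then continues after the maximum.

```python
import os
import re
import shutil

# Assumption: a file is "indexed" when its base name ends in "_<digits>".
INDEX_RE = re.compile(r"^(?P<stem>.+)_(?P<index>\d+)$")


def split_index(filename):
    """Split "report_7.xls" into ("report", 7, ".xls"); index is None if absent."""
    base, ext = os.path.splitext(filename)
    match = INDEX_RE.match(base)
    if match:
        return match.group("stem"), int(match.group("index")), ext
    return base, None, ext


def unique_target(folder, stem, ext, used):
    """Reserve the lowest free index and return the matching, unused path."""
    index = 0
    while True:
        target = os.path.join(folder, f"{stem}_{index}{ext}")
        if index not in used and not os.path.exists(target):
            used.add(index)
            return target
        index += 1


def index_and_move(parent_dir, new_folder, extensions=(".xls", ".xlsx")):
    """Index the files already in new_folder, then move matching files into it."""
    os.makedirs(new_folder, exist_ok=True)
    used, to_index = set(), []
    for name in sorted(os.listdir(new_folder)):
        if not os.path.isfile(os.path.join(new_folder, name)):
            continue
        index = split_index(name)[1]
        if index is None or index in used:  # unindexed, or a duplicate index
            to_index.append(name)
        else:
            used.add(index)

    # Step 1: give unindexed files in new_folder the lowest free indices.
    for name in to_index:
        stem, _, ext = split_index(name)
        os.rename(os.path.join(new_folder, name),
                  unique_target(new_folder, stem, ext, used))

    # Step 2: move matching files, filling gaps first, then counting upwards.
    moved = []
    for root, _, files in os.walk(parent_dir):
        for name in sorted(files):
            base, ext = os.path.splitext(name)
            if ext.lower() not in extensions:
                continue
            target = unique_target(new_folder, base, ext, used)
            shutil.move(os.path.join(root, name), target)
            moved.append(os.path.basename(target))
    return moved
```

With the paths used in this thread, the call would be index_and_move("/home/tssfl/Manipulate_files", "/home/tssfl/New_folder"). The function returns the new names of the moved files, which makes it easy to check the result before relying on it.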

#5

This code changes the file extension from .csv to .xls for all .csv files located in the parent directory and its subdirectories. Note that it only renames the files; their contents remain CSV text and are not converted to the Excel format:

    import os

    def rename_files(path):
        for root, dirs, files in os.walk(path):
            for file in files:
                if file.endswith(".csv"):
                    old_name = os.path.join(root, file)
                    new_name = os.path.join(root, os.path.splitext(file)[0] + ".xls")
                    if os.path.exists(new_name):
                        #Do not overwrite an existing .xls file of the same name
                        print(f"Skipping {file}: {new_name} already exists")
                        continue
                    os.rename(old_name, new_name)
                    print(f"Renamed file: {file}")

    #Specify the parent directory path
    parent_directory = "path_to_parent_directory"
    rename_files(parent_directory)


#6

Here is another Python snippet that checks a folder for .xls files and determines whether they are really in Excel format. If a file is not in Excel format, it is deleted:

    import os
    import pandas as pd

    folder_path = 'path_to_parent_directory'

    for file in os.listdir(folder_path):
        if file.endswith('.xls'):
            file_path = os.path.join(folder_path, file)

            try:
                #Recent pandas versions raise ValueError when they cannot
                #recognise the file as any Excel format
                pd.read_excel(file_path)
            except ValueError:
                print(f"File {file} is not in Excel format. Deleting...")
                os.remove(file_path)
                print(f"File {file} deleted.")
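
Since deletion cannot be undone, it may be worth listing the suspect files first. The sketch below is a lighter, standard-library-only check that does not replace the pandas test: it looks only at the first bytes of each file. The assumptions are that legacy .xls files start with the OLE2 compound-file signature and that .xlsx files are ZIP containers; the function names are hypothetical. It is a heuristic, so any ZIP archive renamed to .xls would pass, and nothing is deleted.

```python
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx is a ZIP container


def looks_like_excel(path):
    """Return True if the file starts with an OLE2 or ZIP signature."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    return header.startswith(OLE2_MAGIC) or header.startswith(ZIP_MAGIC)


def find_suspect_xls(folder):
    """List .xls files in `folder` whose header does not look like Excel."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".xls" and not looks_like_excel(p)
    )
```

Printing the result of find_suspect_xls('path_to_parent_directory') gives a list you can review by hand before running the deleting version.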


#7

Here is a Python one-liner that renames everything in the current directory (files and folders alike, including the script itself if it is saved there) by prepending the label "special_case_2021_" to each name:

    import os; [os.rename(f, f"special_case_2021_{f}") for f in os.listdir()]


You can modify the code and run it to quickly label your files!
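
If the one-liner is too blunt, a slightly longer variant can skip folders, avoid labelling a file twice, and preview the changes first. This is a sketch only; add_prefix and its dry_run parameter are hypothetical names introduced for illustration.

```python
import os


def add_prefix(directory, prefix, dry_run=True):
    """Prefix regular files in `directory`; return (old, new) name pairs."""
    planned = []
    for name in sorted(os.listdir(directory)):
        old_path = os.path.join(directory, name)
        if not os.path.isfile(old_path) or name.startswith(prefix):
            continue  # skip folders and files that are already labelled
        new_path = os.path.join(directory, prefix + name)
        if os.path.exists(new_path):
            continue  # never overwrite an existing file
        planned.append((name, prefix + name))
        if not dry_run:
            os.rename(old_path, new_path)
    return planned
```

Calling add_prefix(".", "special_case_2021_") only reports what would be renamed; pass dry_run=False to actually rename the files.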