
How can I use part of a file name to search and delete all file names with the same text?



My hard drive has tens of thousands of files numbered sequentially, plus some additional identifying text. E.g. I may have a document called "1234568 Bank Statement November 2019.pdf". For historical reasons there are five or six copies of each file with slightly different names, but each containing the unique number. For example, in addition to the above file name I may have files with names like:

"1234568 Bank Statement November 2019-1.pdf"

"1234568 Bank Statement November 2019-2.pdf"

"1234568 Bank Statement November 2019-3.pdf"

"1234568 Bank Statement November 2019-4.pdf"

 

These duplicate files could be anywhere on the Mac, in random folders, somewhere on my main hard drive,  on external drives and cloud drives.

 

Often I find such files have been mis-named. For example, the file should have been named "1234568 Bank Statement Jan 2019.pdf".

 

Whenever I find one of these wrongly named files, I would like to manually drag and drop it into a folder called "files to delete", and then have Alfred examine all the filenames in the "files to delete" folder, find all files on my hard drive which have the same 7-digit file number, and either move them into the "files to delete" folder or simply delete them.

 

Is this possible with Alfred?  If so, could someone walk me through step-by-step?

 

For clarity, I am trying to get away from doing manual searches for the 7-digit number (the actual number is longer and more complex, making it too tedious to type for the thousands of wrongly named files that need deleting).

Link to comment
7 hours ago, trickyt57 said:

Is this possible with Alfred?

 

Not really, no. Alfred is fundamentally designed to do as little as possible until you explicitly tell it to do something. It doesn't watch folders, which appears to be what you're asking for.

 

You can ask Alfred to find files matching a serial number, but again, it's not designed to perform actions automatically without user interaction.

 

You're almost certainly going to have to write a script to perform the file system search and move the files, and you'll need some way to trigger it. You can use Hazel if you have it, create a Folder Action in Automator, or use macOS's FSEvents API to watch the folder yourself. You can use mdfind to find all files on the system whose names contain the serial number.

 

I would generally advise against straight-up deleting matching files unless you're 101% certain the serial number cannot appear in the name of any unrelated file. Or you can restrict the search in other ways (e.g. only PDF files) to ensure nothing gets deleted that shouldn't be deleted.
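
To make that a bit more concrete, here is a rough sketch of what such a script could look like in Python. It is only an illustration under some assumptions of mine: the serial number is the run of seven or more digits at the very start of the file name, only PDFs should be touched, and the function names (extract_serial, collect_serials, find_by_name, unique_target, sweep) are made up for this example. It relies on mdfind, so as far as I know it will only see volumes that Spotlight has indexed. By default it only prints what it would do.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumption: the serial is the run of 7 or more digits at the start of the name.
SERIAL_RE = re.compile(r"^(\d{7,})")


def extract_serial(filename):
    """Return the leading serial number of a file name, or None if there is none."""
    match = SERIAL_RE.match(filename)
    return match.group(1) if match else None


def collect_serials(folder):
    """Return the set of serial numbers found in the names of files in folder."""
    serials = set()
    for entry in Path(folder).iterdir():
        serial = extract_serial(entry.name)
        if entry.is_file() and serial:
            serials.add(serial)
    return serials


def find_by_name(serial):
    """Ask Spotlight (mdfind) for paths whose file names contain serial."""
    result = subprocess.run(
        ["mdfind", "-name", serial],
        capture_output=True, text=True, check=True,
    )
    return [Path(line) for line in result.stdout.splitlines() if line]


def unique_target(folder, name):
    """Return a path inside folder that does not collide with an existing file."""
    folder = Path(folder)
    target = folder / name
    counter = 1
    while target.exists():
        target = folder / f"{Path(name).stem} ({counter}){Path(name).suffix}"
        counter += 1
    return target


def sweep(folder, dry_run=True):
    """Move every PDF sharing a serial with a file in folder into folder."""
    folder = Path(folder).expanduser().resolve()
    matched = []
    for serial in sorted(collect_serials(folder)):
        for path in find_by_name(serial):
            # Skip directories and files that are already in the folder.
            if not path.is_file() or path.resolve().parent == folder:
                continue
            # Guard against the serial merely appearing inside another number,
            # and restrict the sweep to PDFs.
            if extract_serial(path.name) != serial or path.suffix.lower() != ".pdf":
                continue
            target = unique_target(folder, path.name)
            print("Would move" if dry_run else "Moving", path, "->", target)
            if not dry_run:
                shutil.move(str(path), str(target))
            matched.append(path)
    return matched
```

Calling sweep("~/files to delete") lists the candidates without touching anything; sweep("~/files to delete", dry_run=False) actually moves them. Moving rather than deleting means a false match can still be rescued from the folder afterwards.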

Link to comment

Looks like what you want is a duplicate file finder. Tools like that can search your hard drive and compare files down to the byte to make sure they’re indeed the same, and delete everything but one of each.


Gemini 2 is a GUI app for that. I have never used it, but from the screenshots it does look good and appears to be somewhat popular. I’ve seen it on sale many times, usually in bundles. You may be able to find some Christmas software bundle with it right now.


dupeGuru is another GUI app, but free and open-source. I have also never used it.


I can recommend rmlint as a free, open-source, and capable tool. I’ve always used it from the command line, but if I recall correctly they were working on a GUI version. Last time I used it was years ago, so that might be done already.

Link to comment
4 hours ago, vitor said:

Gemini 2 is a GUI app for that. I have never used it, but from the screenshots it does look good and appears to be somewhat popular.

 

It's great, but as you can imagine, it takes a long time to scan the whole system.

 

It works very well for a tidy-up-the-HDD session, but isn't really suitable for the if-I-delete-this-file-then-delete-all-copies scenario that OP describes.

Link to comment
3 minutes ago, deanishe said:

It works very well for a tidy-up-the-HDD session, but isn't really suitable for the if-I-delete-this-file-then-delete-all-copies scenario that OP describes.

 

The impression I get from the top post is not that they are tied to the suggested solution, only that it’s what they thought could work. They mention that the duplicates exist “for historical reasons”, and my understanding is that no more of these will be created.

 

So instead of an imperfect method that needs to be run every time a bad file is encountered by accident, making it a frustrating experience for what could be years (“tens of thousands of files”), a one-time sweep would fix everything in a day, with far greater accuracy.

Link to comment

Vitor and Deanishe, thank you both for some really great answers. It is really good to find people who can write intelligently and helpfully, instead of the all too common "Sorry, I don't know how to do that". May you both go to heaven.

Deanishe.  Thanks for explaining that Alfred can't do things like watching folders and performing actions.  At least I am not going to spend days looking for an answer that doesn't exist.  Also thanks for the guidance that I will need a script to do this.  I have never used scripts, and don't know what they are.  I will learn.  I have Hazel which can perform loads of useful actions, like naming files, and which is able to use scripts, so that looks like it will be a perfect combination for my solution.  So, now, I am off to learning scripting...

 

Vitor: Thanks also for your answers. Although my question wasn't about duplicate files as such, but about an automatic way to delete ALL the files which have a particular reference number in the name, you nevertheless correctly assessed that I have a parallel problem, which is lots of duplicate files. I am going to try all the deletion programs you mentioned. I have in fact tried many in the past, and can't say that I have ever found one which meets my needs. I am coming across two specific problems with deletion programs:

1) Completely identical files in different folders but with slightly different names are not found to be identical, either because the name is different or the creation date is different.

2) Virtually identical files (such as two PDF bank advices for withdrawals), are found to be "identical", when they are not.   Let's see.  I will report back here on the file duplicate finders you proposed.

Link to comment
1 hour ago, trickyt57 said:

Completely identical files in different folders but with slightly different names are not found to be identical

 

Yeah. This is difficult to do because you have to hash the files' contents to check whether they're really identical, and it would take many hours to hash every file on a large disk.
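
For what it's worth, a common way to cut that cost down is to group files by size first and only hash the ones that share a size with another file, since files of different sizes cannot be identical. A minimal sketch in Python, assuming SHA-256 is an acceptable stand-in for a byte-by-byte comparison (the function names file_digest and find_identical are just made up for this example):

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical(root):
    """Return groups of files under root whose contents match, whatever their names."""
    # Pass 1: cheap grouping by size; no file contents are read here.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files that share their size with at least one other file.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)

    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Names and creation dates play no part in the comparison, so "x.pdf" and "x-1.pdf" with the same bytes end up in the same group, while two bank advices that merely look alike do not.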

Link to comment
