PDF Metadata Editor?


10 hours ago, dfay said:

This is doable with script hooks in BibDesk.

 

https://github.com/derickfay/BibDesk-MavericksTags/issues/1

 

This could be a game changer! Have you done any testing with it? Frankly, for what I'd like to do, any citation management software (or file organization system, like DEVONthink) would work for me if it embedded its tags/collections/groups/whatever in the file as tags for the native OS to view. Then, tools like HoudahSpot could be a real tour de force (Side Note - I love the metadata window that's displayed in HoudahSpot; I wonder why they don't allow editing in this window?). Can BibDesk write its citation-related records to the PDF's metadata, too (e.g., author, title, etc., so that they follow the PDF)?

 

Relatedly, have you ever tinkered around with other citation managers? Frankly, since I'm not so reliant on the citation management side of things, any file organization application would work. Your approach with BibDesk reminds me somewhat of what is found in a lot of image-based organization systems, like Pixave (collections, tags, etc.). It'd be great if there were an all-purpose version of it ... or at least one that catered to PDFs (i.e., that embedded its metadata for Finder/Alfred to use)!

 

I tried goofing around with Zotero a little - since it works, somewhat, with the citation style that I usually have to use, and there are a lot of great integrations for it, including Alfred - but I can't find any tools for embedding its collections as tags for the PDFs or its other metadata (e.g., authors, title, etc.). Maybe there are some plugins/add-ons that do this?

 

I don't know why citation managers don't make this easier. Shouldn't they all (1) embed their internal organization scheme to the PDF as a tag or a series of tags, and (2) write the citation information for the source to the PDF? This would make everything portable, and easy to use on a Mac ... but maybe I'm dreaming?

Link to comment
11 hours ago, dfay said:

Save as PDF for research material, Pinboard (with the paid account which archives your bookmarked sites to PDF) for most everything else.

 

Do you add these web-clippings to BibDesk?

 

I use Evernote, at the moment, for web-clippings of news and other secondary sources (as well as for other personal things, etc.). And then I just highlight their records in the app as I go along. I usually don't save these as PDFs because they usually look like garbage, and I like to maintain their images, etc. Then, if I actually wind up using the source in an article, I'll add it to Perma for future researchers.

 

Do you save them as PDFs so that it's easier to extract your annotations, etc.? Do you use a tool when saving them as PDFs to avoid all the usual website clutter? Thanks!

Link to comment
1 hour ago, Jasondm007 said:

 

Do you add these web-clippings to BibDesk?

 

Generally not, only if I end up citing them.

 

Quote

I use Evernote, at the moment, for web-clippings of news and other secondary sources (as well as for other personal things, etc.). And then I just highlight their records in the app as I go along. I usually don't save these as PDFs because they usually look like garbage, and I like to maintain their images, etc. Then, if I actually wind up using the source in an article, I'll add it to Perma for future researchers.

 

Perma looks great.  Very US-focused at the moment.  

 

Quote

Do you save them as PDFs so that it's easier to extract your annotations, etc.? Do you use a tool when saving them as PDFs to avoid all the usual website clutter? Thanks!

 

Mostly it’s just because it’s the path of least resistance for me.  I can use existing annotation tools etc.  I have a Shortcut set up in iOS that lets me select pages like the print dialog on the Mac.

 

Nearly everything gets named YYYY-MM-DD source - title. For most of my purposes this is adequate.
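
For anyone who wants to generate names in that pattern automatically, here is a minimal Ruby sketch. The method name `clipping_filename`, the `.pdf` extension, and the choice to replace `:` and `/` with `-` are all illustrative assumptions, not part of the setup described above.

```ruby
require 'date'

# Builds a file name in the "YYYY-MM-DD source - title" pattern.
# The method name and the default ".pdf" extension are assumptions for illustration.
def clipping_filename(date, source, title, ext: '.pdf')
  # Replace characters that are awkward in macOS file names (":" and "/").
  clean = ->(text) { text.gsub(%r{[/:]}, '-').strip }
  "#{date.strftime('%Y-%m-%d')} #{clean.call(source)} - #{clean.call(title)}#{ext}"
end

puts clipping_filename(Date.new(2018, 10, 22), 'Example News', 'PDF Metadata: A Primer')
```

Because the date comes first in ISO order, sorting by name in Finder also sorts the files chronologically.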

Link to comment

@vitor Sorry to bug you, but I was wondering if you had a chance to take a look at the EXIF tool for editing metadata in PDFs? Thanks again for all your help!

 

On 10/22/2018 at 2:10 PM, Jasondm007 said:

@vitor In my opinion, the easiest way to create a metadata tool for PDFs would be to create a file action for each important field, which uses ExifTool to write the information. So, the user could simply use a different file action for each field. Once the file action is selected, it would trigger a keyword input where the user could type in the new value for the field. Relatedly, it would be great if the subtext of the keyword told the user the field's current value. However, I assume that’s not possible in Alfred.

 

A better approach, however, would be one where (1) a file action triggers (2) a script filter which provides the current values for the following metadata fields: (a) Authors, (b) Title, (c) Subject, and (d) keywords. From there, the user could view the fields, and, if necessary, select one that they might want to rewrite. So, if the user selected the Authors field, for example, it would trigger a keyword input where they would type in the authors’ names (separated by commas) and the EXIFTool could be used to update them accordingly. This tool would be great for people with lots of PDFs, regardless of whether they use tools like DEVONthink, etc. Thanks!
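
The first step of that design (reading the current values) could look roughly like the Ruby sketch below. It assumes ExifTool's `-json` output, which is an array with one object per file. The helper names `read_command` and `current_values` and the sample JSON are hypothetical and only illustrate the idea.

```ruby
require 'json'

FIELDS = %w[Title Author Subject Keywords].freeze

# Argument list that asks ExifTool for the chosen tags as JSON.
def read_command(file, fields = FIELDS)
  ['exiftool', '-json', *fields.map { |f| "-#{f}" }, file]
end

# Turns ExifTool's JSON output into a { field => text } hash.
# Missing tags become empty strings; list values are joined with commas.
def current_values(json_text, fields = FIELDS)
  record = JSON.parse(json_text).first || {}
  fields.map { |f| [f, Array(record[f]).join(', ')] }.to_h
end

# Hypothetical sample of what ExifTool might print; not output from a real file.
sample = '[{"SourceFile":"paper.pdf","Title":"A Title","Keywords":["one","two"]}]'
puts current_values(sample)
```

The resulting hash can then be fed to a Script Filter so that each field is shown together with its current value before the user picks one to rewrite.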

 

Link to comment

@Jasondm007 ExifTool might, after all, not be the best tool for the job. It says in its man page:

 

 Changes to PDF files by ExifTool are reversible (by deleting the update with "-PDF-update:all=") because the original information is never actually deleted from the file. So ExifTool alone may not be used to securely edit metadata in PDF files.

 

Is that acceptable?
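
To make the man page excerpt concrete: ExifTool writes PDF metadata as an incremental update appended to the file, so the new values travel with the PDF, while the old values remain recoverable. The sketch below only builds and prints the two relevant commands. The helper names are hypothetical, and only the `-TAG=VALUE` syntax and the `-PDF-update:all=` option come from ExifTool itself.

```ruby
# Argument list for writing one tag. Without -overwrite_original,
# ExifTool also keeps a backup copy named "<file>_original".
def write_command(file, field, value)
  ['exiftool', "-#{field}=#{value}", file]
end

# Argument list for the undo described in the man page excerpt above.
def revert_command(file)
  ['exiftool', '-PDF-update:all=', file]
end

# Printed rather than executed, so the sketch is safe to run anywhere.
puts write_command('paper.pdf', 'Title', 'New Title').inspect
puts revert_command('paper.pdf').inspect
# To run for real: system(*write_command(path, 'Title', 'New Title'))
```

In other words, the caveat matters for scrubbing sensitive metadata, not for whether the new values are visible elsewhere.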

Link to comment
40 minutes ago, vitor said:

@Jasondm007 ExifTool might, after all, not be the best tool for the job. It says in its man page:

 

 

 

 

Is that acceptable?

 

@vitor Thanks for getting back to me. Unfortunately, I'm not sure that I understand what this means. Is the new metadata only visible on the machine that makes the update? For example, if I viewed the PDF on a different computer, is the old content going to show up (do the new machines need ExifTool installed, etc.)? 

 

I guess that my preference would be to delete the information, but if that's not possible, and other computers can view the new metadata without special tools, then ExifTool may work just fine. If not, then maybe the PDFtk utility or the xattr command would work better? Thanks again!

Link to comment
54 minutes ago, vitor said:

@Jasondm007 Can you provide a typical PDF of yours?

 

@vitor There's nothing special about the PDFs that I use - same fields. However, I've uploaded a random classic chapter of something for you here: https://cl.ly/bc6972285a77

 

I've attached a screenshot of the usual fields from Adobe Acrobat below. However, these are the same fields that you'll see if you "Get Info" through Finder. 

 

Relatedly, I took @dfay's advice and purchased HoudahSpot, which is great for viewing and searching through this metadata, too. When you add the columns/fields to your search results, you can even sort by them. It's great!

 

[Screenshot: Adobe Acrobat document properties]

 

 

 

Link to comment
8 minutes ago, Jasondm007 said:

There's nothing special about the PDFs that I use

 

Yeah, but I need something to test on. That one doesn’t seem to have any Subject or Keywords. Please add them (more than one Keyword), so I can see their typical output.

Link to comment

@vitor This workflow is awesome! After tinkering around with it a little, I was wondering if I could bounce a few questions off you about the workflow?

 

 

Question 1 - After running a few tests on the fields, I noticed some discrepancies between what the workflow is reporting for a few fields and what Adobe Acrobat is reporting (i.e., the gold standard for readers - though, to be honest, I only ever use it to edit things).

 

Using the same PDF from last time, I updated each of the four fields in your workflow. The new information can be viewed by the workflow and Finder (Side Note: I made a small modification so that the field's name was in the subtitle and the value was in the title - just to make it easier to read in the workflow). The following images show the output from the workflow and Finder.

 

Alfred Workflow

[Screenshot: Alfred workflow output listing the four fields]

 

Finder - Get Info

[Screenshot: Finder Get Info panel]

 

However, if you view the same fields in Adobe Acrobat, only the "Title" field appears to be updating correctly. 

 

Adobe Acrobat - Properties w. Comments

[Screenshot: Adobe Acrobat properties, annotated with comments]

 

From what I can tell in Adobe Acrobat, it looks like the Author and Subject fields were untouched. Interestingly, the Keywords field appears to have absorbed the subject and keywords from the workflow (separated by a semicolon).

 

For testing purposes, I've uploaded the PDF displayed above here: https://cl.ly/1618a40a381d

 

 

Question 2 - I'd love to add icons to each field in the output. I have some problems with my eyes, so I tend to go heavy on icons. Out of curiosity, which parts of the workflow should I update?

 

During the first step, when the workflow is gathering the PDF's metadata, would I update the pdfedit.rb file to add icons for each of the four fields? For example, let's say that I placed four icons in the root folder for the workflow, and each is named after the fields as follows: Title.png, Author.png, Subject.png, and Keywords.png. Would updating the pdfedit.rb file as follows cause the icons to be displayed?

    script_filter_items.push(
      title: value,
      subtitle: field,
      arg: field
      icon: field.png

 

And, during the second step, when the script writes the new value for the selected field, I have no idea how to reference the different icons for the field being overwritten (i.e., beyond just placing a generic icon in the keyword - which would look the same, regardless of the field that's being updated).

 

 

Question 3 - I guess I never realized this before using your workflow, but is there a way to run a File Action in Alfred without having the file copied to the clipboard? 

 

The reason I ask is because I usually copy the metadata from another source before updating it. As a result, when I run your file action, that text automatically becomes second on the clipboard (i.e., because the file that's being actioned is now first). This makes it tough to access the text while running the file action (i.e., the user can't kick off the snippet viewer without aborting the workflow, which means the user has to type the field in manually).

 

Is it possible to turn this off for your file action (or all file actions, for that matter, as I can't imagine why anyone would need to access this from the clipboard's history)?

 

 

Thanks a ton! This workflow is awesome!!!

Link to comment
54 minutes ago, Jasondm007 said:

Question 1

 

As you noted, there’s some discrepancy depending on where the PDF is being viewed from. That means Acrobat and the other sources are reading different information. I did notice when testing that one of the fields (Keywords?) was repeated on another (Description). You might want to play with that (third line of the Script Filter).


But as I don’t have Acrobat (I’m a designer but shunned Adobe programs from my computer) I can’t test that properly. It might be a question for either the Adobe or exiftool forums.

 

57 minutes ago, Jasondm007 said:

Question 2

 

Don’t forget the comma after arg: field. As for the icon line, make it icon: field + '.png' and name each image after each field (Author.png, Title.png, …).

 

1 hour ago, Jasondm007 said:

Question 3

 

Yes. That’s a feature I also asked for before it existed. It’s global, though (so you either change it for every Workflow or none). Alfred Preferences → Advanced → Selection Hotkeys (change to Restore previous clipboard item (uses more memory)).

 

1 hour ago, Jasondm007 said:

Thanks a ton! This workflow is awesome!!!

 

Good to know it’s as desired.

Link to comment

Thanks for all of the helpful feedback, @vitor! I really appreciate it.

 

 

Question 1

 

On 11/2/2018 at 5:37 PM, vitor said:

That means Acrobat and the other sources are reading different information. I did notice when testing that one of the fields (Keywords?) was repeated on another (Description). You might want to play with that (third line of the Script Filter).


But as I don’t have Acrobat (I’m a designer but shunned Adobe programs from my computer) I can’t test that properly. It might be a question for either the Adobe or exiftool forums.

 

No problem. I'll keep goofing around with the EXIFtool to see if I can get the results to stick to the correct fields. I wonder if this problem is related to the one you mentioned at the outset, about how EXIFtool doesn't actually change the file?

 

PS - I can certainly understand your frustration with Adobe products!

 

 

Question 2

 

On 11/2/2018 at 5:37 PM, vitor said:

Don’t forget the comma after arg: field. As for the icon line, make it icon: field + '.png' and name each image after each field (Author.png, Title.png, …).

 

I've tried several variations of this, but I haven't had any luck yet. I was under the impression that Alfred would know to look in the workflow's folder for the icons (presently named Author.png, Keywords.png, Subject.png, and Title.png). Is there some additional code that has to be placed in the pdfedit.rb script? Based on your previous recommendation, I updated that element as follows:

 

def show_fields(file, *fields)
  require 'json'
  script_filter_items = []

  fields.each do |field|
    value = get_field(file, field)

    script_filter_items.push(
      title: value,
      subtitle: field,
      arg: field,
      icon: field + '.png'
    )
  end

 

Unfortunately, it's still not adding any of the icons for the fields. Do I need to add something else to the pdfedit.rb script?

 

If not, does the script filter object in Alfred have to be updated/changed, too? I haven't touched it, so it still reads as follows:

 

require_relative File.join(Dir.pwd, 'pdfedit.rb')

show_fields(ENV['pdf_file'], 'Title', 'Author', 'Subject', 'Keywords')

 

 

Question 3

 

On 11/2/2018 at 5:37 PM, vitor said:

Alfred Preferences → Advanced → Selection Hotkeys (change to Restore previous clipboard item (uses more memory)).

 

Thanks! I never understood what that feature did, but your suggestion has fixed it!

 

Oddly enough, it might just be my machine, but the file still shows up in my clipboard history. But now, at least when I paste from it, Alfred's smart enough to paste the previous text (i.e., almost as if the file was not there).

 

 

Thanks again for all of your help!!

Link to comment
4 hours ago, vitor said:

Ah, right. The correct line is icon: { path: field + '.png' }.

 

@vitor That was it! It works great now. Thanks a ton.
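
For anyone following along, here is a minimal standalone sketch of the corrected item. Alfred's Script Filter JSON expects `icon` to be an object with a `path` key, and that path is resolved relative to the workflow folder. The helper name `field_item` and the placeholder values are hypothetical. The real pdfedit.rb reads the values with its own `get_field`.

```ruby
require 'json'

# Builds one Script Filter item for a metadata field.
def field_item(field, value)
  {
    title: value,
    subtitle: field,
    arg: field,
    icon: { path: "#{field}.png" } # e.g. Author.png in the workflow folder
  }
end

# Placeholder values stand in for what the workflow would read from the PDF.
items = %w[Title Author Subject Keywords].map { |f| field_item(f, "(current #{f})") }
puts JSON.generate(items: items)
```

A plain string such as `icon: 'Author.png'` is not what Alfred expects for that key, which fits the earlier attempt showing no icons.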

 

 

If you're curious, after tinkering around with the EXIFtool today, it looks like what it calls the "description" is what corresponds to what Adobe calls the "subject." Since your workflow centers around field names, this was really easy to change. Everything looks like it's working great now.

 

 

To supplement this workflow, if I wanted to create a simple file action that uses the EXIFtool to delete all of this metadata at once, is there an easy way to reference the file in a shell script? For example, I've used the following command in Mac's Terminal app to accomplish this:

exiftool -all= -overwrite_original FILEPATH

Is this as easy as attaching a File Action object to a Run NSAppleScript object with the following code?

on alfred_script(q)
	do shell script "exiftool -all= -overwrite_original q"
end alfred_script

I tried several variations of this script (e.g., shifting around quotation marks, references to the file, and some with the arg/vars tool), but I've never really understood how to run shell scripts in Alfred.

 

Thanks again!

Link to comment
On 11/6/2018 at 9:15 PM, Jasondm007 said:

To supplement this workflow, if I wanted to create a simple file action that uses the EXIFtool to delete all of this metadata at once, is there an easy way to reference the file in a shell script? For example, I've used the following command in Mac's Terminal app to accomplish this:


exiftool -all= -overwrite_original FILEPATH

Is this as easy as attaching a File Action object to a Run NSAppleScript object with the following code?


on alfred_script(q)
	do shell script "exiftool -all= -overwrite_original q"
end alfred_script

I tried several variations of this script (e.g., shifting around quotation marks, references to the file, and some with the arg/vars tool), but I've never really understood how to run shell scripts in Alfred.

 

 

@vitor I finally figured out how to run the shell script in the appropriate manner through Alfred. For others that might be interested in doing the same, the following code will remove all of the metadata at once from a PDF (i.e., all the metadata that the EXIFtool can reach, anyways):

 

on alfred_script(q)
	set posixPath to quoted form of POSIX path of q
	do shell script "/usr/local/bin/exiftool -overwrite_original -all= " & posixPath
end alfred_script

Thanks!

Link to comment
2 hours ago, vitor said:

@Jasondm007 You don’t need all that. Make it a Run Script with /bin/bash and exiftool -all= -overwrite_original "${1}" (untested).

 

@vitor I'm not sure what's wrong with it, but I can't get your suggested script working (quoted above). The good news is that the old one works fine. And, I actually simplified it to the following:

 

on alfred_script(q)
	do shell script "/usr/local/bin/exiftool -overwrite_original -all= " & q
end alfred_script

It deletes everything, and it appears to work well with multiple files, too - which is great!
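
One caveat with the simplified version: without `quoted form of`, a path containing spaces is split by the shell. An alternative is a Run Script object that receives the files as separate arguments and skips the shell entirely. The Ruby sketch below assumes the Run Script is set to Ruby with input as argv, and that ExifTool lives at `/usr/local/bin/exiftool` as in the script above. The helper name `strip_command` is hypothetical. A possible reason the bash suggestion failed (an assumption, not confirmed here) is that `exiftool` was not on the PATH that Alfred gives to scripts.

```ruby
EXIFTOOL = '/usr/local/bin/exiftool' # path used in the script above; adjust if installed elsewhere

# Argument list that removes all writable metadata from the given files.
def strip_command(paths)
  [EXIFTOOL, '-overwrite_original', '-all=', *paths]
end

# With "input as argv", ARGV holds the selected files, one per element.
# Passing an array to system bypasses the shell, so spaces in paths need no quoting.
if ARGV.empty?
  puts strip_command(['/tmp/My Paper.pdf']).inspect # demo only; nothing is executed
else
  system(*strip_command(ARGV))
end
```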

 

 

Related Question: I've been trying to come up with an easy way to write the same input to multiple files at once (e.g., if you wanted to tag several files with the same author). Do you have any suggestions for how to approach it?

 

First, I tried feeding into your initial workflow, but I couldn't get anywhere (e.g., coming in at the second step - skipping the part where you'd read the current values and select a field).

 

Instead, I tried adjusting the shell script above, but I haven't had any luck with it either. More specifically, I created  

  1. a file action that accepts multiple pdfs, 
  2. that is linked to an Arg & Vars utility that captures the variable pdf_file (Name) with the selected files {query} (Value)
  3. that feeds to a keyword input for the user to input the author's name for all of the files, and
  4. that feeds to another Run NSAppleScript with the following code:
on alfred_script(q)
	do shell script "/usr/local/bin/exiftool -overwrite_original -Author=" & q pdf_file
end alfred_script

 

Here, I thought that q would pick up the value of the author's name from step 3, and that the variable pdf_file could be referenced from step 2. Any idea what's wrong with the shell script above? I'm sure that I probably screwed up the latter portion of the code (" & q pdf_file), but I also wasn't sure if I had to "set" the pdf_file variable in the script, too?

 

To make things easy, I've broken this workflow off and uploaded it here: https://cl.ly/b76255aa15d9 . And, to test, the same PDFs from before can be used. You'd just need more than one of them: https://cl.ly/1618a40a381d

 

Thanks for any help you can lend!!
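
One way this could be approached, sketched in Ruby rather than AppleScript: workflow variables such as `pdf_file` reach scripts as environment variables (the Script Filter above already reads `ENV['pdf_file']`), so the last step can read the file list from there and take the author from its argument. The sketch assumes that Alfred stores multiple selected files as one tab-separated string. The helper names and the fallback demo values are hypothetical.

```ruby
EXIFTOOL_PATH = '/usr/local/bin/exiftool'

# Splits the value stored when several files are selected.
# Assumption: multiple files arrive as one tab-separated string.
def selected_files(raw)
  raw.to_s.split("\t").reject(&:empty?)
end

# One ExifTool call can write the same tag to every listed file.
def author_command(author, files)
  [EXIFTOOL_PATH, '-overwrite_original', "-Author=#{author}", *files]
end

author = ARGV.first || 'Jane Doe' # text typed into the keyword input
files  = selected_files(ENV.fetch('pdf_file', "a.pdf\tb c.pdf"))
# Printed for inspection; use system(*author_command(author, files)) to run it.
puts author_command(author, files).inspect
```

In the AppleScript attempt, `& q pdf_file` is not valid syntax. A workflow variable would have to be read explicitly there (for example with `system attribute "pdf_file"`) and each path quoted before being appended to the command.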

Link to comment