PDF Metadata Editor?


By chance, does anyone have a workflow for editing the metadata of a PDF that they'd be willing to share (fields: Title, Author, Subject, Keywords)?

 

I have a huge library of PDFs for academic articles, and I am constantly stuck updating the metadata in new PDFs that I download. To edit these fields, at the moment, I have to open the entire file in an all-purpose PDF editor, like Acrobat Pro or PDF Expert. However, I was hoping to find an easier way of doing this within Alfred. Ideally, I'd love to be able to both view and edit those specific metadata fields within Alfred (i.e., without having to rely on Finder, Terminal, or some other tool). 

 

If Alfred's not good for these tasks, what tools do others use? I imagine there are lots of researchers, journalists, students, etc, with similar problems.

 

Thanks for any help you can lend!

Link to comment
3 minutes ago, vitor said:

Do you always edit the fields to the same values, or does it change each time? And do you need to see what the original fields were before editing?

 

No - the fields' values change each time (e.g., different authors, titles, etc.).

 

Ideally - it'd be great to see them first, but this isn't necessarily required.

 

Thanks!

Link to comment
On 10/19/2018 at 11:51 AM, vitor said:

Are they always the same fields? And what’s their order of importance/the order you’d edit them?

 

Yes - In order of importance, I always update the (1) Title and (2) Author fields, and sometimes the (3) Subject and (4) Keyword fields.

 

However, I don't always update them at the same time. That's why I was initially thinking of creating a separate file action for each field using the EXIFTool. However, I couldn't figure out how to attach it to an argument correctly (i.e., the user's input, requested once the file action is triggered, for the value of the field, like the authors' names) and still get the shell script's syntax to work.

 

For example, it might look something like the following:

  1. File action
  2. Arg/Vars Utility - save variable with Name "FilePath" and Value {query}
  3. Keyword w. space & Argument Required - this is where the user could input the authors' names, separating each with a comma. So, for an article with two authors, it might look something like this: "FN1 LN1, FN2 LN2"
  4. And this is where I get lost ... Do I save this new argument as a variable or can I dump it right into the shell script?
  5. Run NSAppleScript - Depending on the previous step, you could essentially run something like this:  exiftool -author="AUTHORS NAMES FROM PREVIOUS STEP" {var:FilePath}. However, this syntax would have to be changed.

I'm sure there are a ton of other ways to update a PDF's metadata, like using the PDFTK utility or the XATTR command. I don't really care if the solution uses the EXIFTool. I'm just throwing it out as an option here because I know it would work if I could figure out steps 4 and 5. I've used it through Terminal in the past (and have it installed). I'm just new at scripting, and Alfred for that matter.
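For steps 4 and 5 above, here is one possible sketch, written in Python rather than AppleScript. It assumes the text typed by the user arrives as the script's first argument and that the file path is available in an environment variable named FilePath (the variable name from step 2). The helper names are hypothetical. Passing the command as a list means the authors' names and the path need no shell quoting.

```python
import os
import subprocess
import sys

# The four fields from this thread, mapped to ExifTool tag names.
FIELDS = {"title": "Title", "author": "Author",
          "subject": "Subject", "keywords": "Keywords"}


def build_exiftool_args(field, value, pdf_path):
    """Return the exiftool command, as an argument list, that sets one field."""
    tag = FIELDS[field.lower()]
    # -overwrite_original stops exiftool from leaving a "_original" backup copy.
    return ["exiftool", "-overwrite_original", f"-{tag}={value}", pdf_path]


def write_field(field, value, pdf_path):
    """Run exiftool (assumed to be installed and on the PATH)."""
    subprocess.run(build_exiftool_args(field, value, pdf_path), check=True)


# Assumed wiring: typed text in argv[1], file path in the FilePath variable.
if __name__ == "__main__" and len(sys.argv) > 1 and "FilePath" in os.environ:
    write_field("author", sys.argv[1], os.environ["FilePath"])
```

Remove -overwrite_original from the list if you would rather keep ExifTool's backup copy of each PDF.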

 

Thanks again @vitor !

Edited by Jasondm007
typo
Link to comment
7 minutes ago, Jasondm007 said:

I'm afraid these examples are a little over my head.

 

I’m not suggesting you edit the Workflows to fit your need, but that you use them to get a feel for which behaviour you’d prefer.

Edited by vitor
Link to comment
3 minutes ago, vitor said:

 

I’m not suggesting you edit the Workflows to fit your need, but that you use them to get a feel for which behaviour you’d prefer.

 

I understood what you meant. And, at a high level of generality, I understand what's going on in the workflows you suggested (which are great, btw). I just don't understand if what I suggested is even the right tool, let alone the best approach to updating a PDF's metadata through Alfred. Thanks!

Link to comment

I don't.  In fact I prefer to leave it alone since it often has things like the "Where From:" field that I'd like to keep unchanged.

 

I use BibDesk to manage my academic PDFs (but any reference manager will work the same way).  The BibDesk record is where the author, date etc. goes, and this can typically be downloaded from the publisher and/or Google Scholar (most often for me this means using Andrew Ning's reference importer workflow).  I then add the PDF to the BibDesk record, and it's set to autofile and rename based on the info. in the BibDesk record (I use Author Date Title existing filename as my autofile template).

 

The BibDesk record for me is the canonical version of the pub -- I might have multiple versions of the PDF, various sets of reading notes, links to related pubs, etc., attached to the BibDesk record, but I don't worry about the details of the files.  I can't remember the last time I looked up an article by browsing in the PDF directory, actually -- I always begin with a search in BibDesk.

 

I don't see the payoff of editing PDF metadata directly, basically because I don't have a use for it.  Searching BibDesk (via Alfred usually) lets me find the files, as well as create linked annotations etc., and generate citations and bibliographic refs. for the articles.  To me it makes much more sense to create a record in BibDesk (or again whatever reference manager you're using) and let it manage the filing for you.

 

Having said that, if I wanted to copy reference info. from BibDesk into the metadata of linked PDFs, I would definitely use ExifTool.  (I use this with Hazel to automatically set a bunch of metadata for my photos).

 

Incidentally I use the same approach for primary sources but with a custom FileMaker database -- I was working on a couple databases earlier tonight to manage court cases (mostly saved as PDF but again I'm not going to save the details in the metadata of the file itself) and archival records.  Each entry has a link to the PDF of the court judgments, and I can add other docs. (Heads of Argument, annotations, my own notes, citing cases, etc).

 

 

Link to comment

Actually there is one piece of metadata that I do keep in sync with BibDesk. I have AppleScripts to copy BibDesk keywords to linked files as Tags and to Ulysses sheets as keywords.  I rarely search for academic PDFs by Tag but I do use keyword searching on my reading notes in Ulysses quite a bit.  For the file tagging I use https://github.com/jdberry/tag .  For Ulysses I use the Ulysses URL scheme.

 

Actually I just looked at my code, and it turns out I don't use the tag command line tool ... but I would if I were to do it again.  Here's the script I use, which gives an idea of how to use AppleScript and xattr to write file metadata.  It's ugly and awkward but it works so I can't be bothered to clean it up :)

set plistFront to "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\"><plist version=\"1.0\"><array>"
set plistEnd to "</array></plist>"

(* By Derick Fay, 2013-10-28 *)
(* Thanks to http://mosx.tumblr.com/post/54049528297/convert-openmeta-to-os-x-mavericks-tags-with-this for getting me started *)

tell application "BibDesk"
	
	-- without document, there is no selection, so nothing to do
	if (count of documents) = 0 then
		beep
		display dialog "No documents found." buttons {"•"} default button 1 giving up after 3
		return -- stop here; there is no document 1 to read a selection from
	end if
	set thePublications to the selection of document 1
	
	-- get the keywords
	repeat with thePub in thePublications
		set currentKeywords to get keywords of thePub
		
		
		-- convert the keywords to a plist for use with xattr
		set {myTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, {", "}}
		set tagList to text items of currentKeywords
		set AppleScript's text item delimiters to myTID
		set plistTagString to ""
		repeat with theTag in tagList
			set plistTagString to plistTagString & "<string>" & theTag & "</string>"
		end repeat
		set plistTagString to plistFront & plistTagString & plistEnd
		
		-- add the tags
		set theFiles to POSIX path of linked files of thePub
		
		repeat with f in theFiles
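			-- quoted form protects the plist and the path from the shell (e.g. an apostrophe in a keyword)
			do shell script "xattr -w com.apple.metadata:_kMDItemUserTags " & quoted form of plistTagString & " " & quoted form of f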
		end repeat
	end repeat
end tell
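For comparison, the plist-building step of the script above could be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, the keywords are taken to be one comma-separated string as in the AppleScript, and the attribute is written as XML text exactly as the script above does. The standard plistlib module produces the XML and escapes characters such as & in a keyword.

```python
import plistlib
import subprocess

TAG_ATTR = "com.apple.metadata:_kMDItemUserTags"


def tags_plist(keywords):
    """Turn 'a, b, c' into an XML plist whose root is an array of strings."""
    tags = [k.strip() for k in keywords.split(",") if k.strip()]
    return plistlib.dumps(tags, fmt=plistlib.FMT_XML).decode("utf-8")


def write_tags(keywords, path):
    """Write the tags with xattr (macOS only), mirroring the script above."""
    subprocess.run(["xattr", "-w", TAG_ATTR, tags_plist(keywords), path],
                   check=True)
```

Because the arguments go to xattr as a list, no shell quoting is involved at all.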

 

Edited by dfay
Link to comment
1 hour ago, Jasondm007 said:

I just don't understand if what I suggested is even the right tool, let alone the best approach to updating a PDFs metadata through Alfred.

 

EXIFTool should work just fine. But that’s the easy part. What I really need to know is the Alfred interface (hence why I linked the Workflows to give an idea of the possibilities) because that’s what takes longer to build.

Link to comment

@vitor & @dfay Thanks for your thoughtful responses! And, sorry for the slow response on my end - family emergency this weekend, but everyone’s in good shape now.

 

@dfay  Thanks for the helpful workflow details. To be honest, I’m one of those dinosaurs that still writes his citations out. Don’t get me wrong, I hate it. I just never bothered to adopt a citation manager because they were never well equipped to deal with the style that I most frequently use (Bluebook). However, that seems to have changed over the last few years, so perhaps I ought to consider taking the plunge. That would help with the metadata and filename issues.

 

For me, the metadata issue came to a head when considering whether to adopt DEVONthink Office Pro. While I have always maintained a pretty simple filename approach for my PDFs (journals, chapters, books, etc.) and corrected their metadata as needed, DEVONthink’s displays made me realize how many old PDFs I have on my computer that are still a mess. Unfortunately, DEVONthink’s tool for correcting PDF metadata is terrible, so I was hoping to use Alfred. Prior to testing out DEVONthink, I used to just correct PDFs’ metadata through Adobe Acrobat Pro (one at a time … or through PDF Expert, as I read them). Although I’m not sure if I will ultimately adopt DEVONthink Office Pro, I’d love to use Alfred to visualize and correct metadata in PDFs. I am still a little surprised that a workflow didn’t already exist for this issue.

 

Relatedly, have you guys considered using DEVONthink Office Pro to help organize your files? At the moment, I use a pretty intricate series of folders (and aliases/shortcuts where there is overlap). And, I also have a few tags that I use as visual cues in Finder for the more important files - and because my eyes are terrible - but I’ve never done much with Finder’s tagging system, in large part, because it’s not built for more complex tag hierarchies (you can’t visualize them easily, they’re a pain to update and reorganize, etc.). While Alfred is great for supplementing Finder, there’s only so much that it can do on this front (searching for and applying variations of tags, etc.). This is what pushed me to consider using DEVONthink Office Pro. Simply being able to see my tags and fluidly apply and reorganize them is super helpful. It gets around the complicated Finder folder structures (and the stupid alias/shortcut approach that I use to avoid duplicating PDFs - which comes up a lot because of all the interdisciplinary work that I do). In addition to organizing my PDFs, I also wanted to replace Evernote - which I use for web-clippings - and the Notes app - which I use for short notes on everything. That way, I could place everything in the same receptacle, and apply the same tagging structure to it (everything except OmniOutliner, anyways, which I use for outlining research projects, teaching outlines, etc.). Have you ever considered using DEVONthink for these purposes? I’ve only used the application for a few days, but I have mixed feelings about it. My knee-jerk reaction is that, if they hired a good UX designer, it could probably be the best app ever created for the Mac (no offense Alfred, I still love you). But there are so many stupid little things about it, that I’m not sure if I’m going to adopt it. 
DEVONthink feels like it’s capable of doing anything … but everything requires an AppleScript or some weird workaround to accomplish really basic stuff (I’m not talking about its aesthetics either … just basic usability issues). 

 

Is BibDesk’s tagging system comparable to DEVONthink’s? Can you create complex hierarchical tagging systems to find everything for your projects? I’m a Scrivener user, so that complicates things a little. I usually have to export things to Word at the end, anyway; so, it shouldn’t be too bad. If you were to start new, today, would you use BibDesk again? If not, which app would you go with?

Link to comment

@vitor In my opinion, the easiest way to create a metadata tool for PDFs would be to create a file action for each important field, which uses the EXIFTool to write the information. So, the user could simply use a different file action for each field. Once the file action is selected, it would trigger a keyword input where the user could type in the value they want the field to have. Relatedly, it would be great if the subtext of the keyword told the user its current value. However, I assume that’s not possible in Alfred.

 

A better approach, however, would be one where (1) a file action triggers (2) a script filter which provides the current values for the following metadata fields: (a) Authors, (b) Title, (c) Subject, and (d) keywords. From there, the user could view the fields, and, if necessary, select one that they might want to rewrite. So, if the user selected the Authors field, for example, it would trigger a keyword input where they would type in the authors’ names (separated by commas) and the EXIFTool could be used to update them accordingly. This tool would be great for people with lots of PDFs, regardless of whether they use tools like DEVONthink, etc. Thanks!
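The script filter step in that second approach might look roughly like the Python sketch below. It assumes exiftool is installed and that the PDF path is passed to the script as its first argument. The function names, titles and subtitles are hypothetical. It reads the four fields with exiftool's -json output and prints them in Alfred's Script Filter JSON format, one row per field, with the field name as the argument handed to the next step.

```python
import json
import subprocess
import sys

FIELDS = ["Title", "Author", "Subject", "Keywords"]


def read_metadata(pdf_path):
    """Ask exiftool for the four fields; -json prints one object per file."""
    cmd = ["exiftool", "-json"] + [f"-{f}" for f in FIELDS] + [pdf_path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)[0]


def script_filter_items(metadata):
    """Build the Script Filter structure: one row per metadata field."""
    items = []
    for field in FIELDS:
        value = metadata.get(field, "")
        if isinstance(value, list):  # some fields can come back as a list
            value = ", ".join(str(v) for v in value)
        shown = str(value) if value != "" else "(empty)"
        items.append({
            "title": f"{field}: {shown}",
            "subtitle": f"Press Return to edit {field}",
            "arg": field,  # passed on to the input step that follows
        })
    return {"items": items}


if __name__ == "__main__" and len(sys.argv) > 1:
    print(json.dumps(script_filter_items(read_metadata(sys.argv[1]))))
```

Showing the current value in each row's title covers the "view first, then edit" request. The selected field name can then drive a single exiftool write step instead of one file action per field.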

Link to comment
8 minutes ago, dfay said:

Re: tag - I use it almost exclusively through scripts run from Alfred or Hazel.  There are examples using bash and calling it from python in this thread:

 

 

Thanks - That screenshot is actually from a workflow that I created using your script - which works great! Thanks again!!

 

However, I only use it to apply existing Finder tags (not to search or apply more complicated hierarchical tags).

Link to comment
On 10/22/2018 at 2:19 PM, dfay said:

Very briefly (will reply re DevonThink and workflow later....) the file action you describe is totally feasible in Alfred...

 

@dfay Since it sounds like you have a great system for managing your research - and have tried out many of the same tools - I was wondering if you had an opportunity to think a little about my post last week (re workflows, DEVONthink, and tagging/citation managers)? I’m still struggling with these issues, and was hoping to learn more about your own workflow and thoughts on these tools. Thanks again!

Link to comment

Sure.  I've demo'ed Devonthink seriously three times in the past 12 years or so, and have never liked it.  The AI seems ok for filing invoices and that kind of largely standardised document but I never found it to work with academic lit. -- it would do things like try to file all my JSTOR downloads together because the cover pages were similar.  It predates native tagging and Spotlight (pre-Tiger was it? can't remember) so it probably solved some problems way back but it has never impressed me.  UI and menu systems desperately need a cleanup and reorganisation but the developers really don't seem to care (e.g. not fixing Zoom eight years after the original request: https://forum.devontechnologies.com/viewtopic.php?f=2&t=8131 ).  I was quite put off by some of their responses on the DT forums.  Likewise DevonAgent seems like a tool that has outlived its usefulness - see https://talk.macpowerusers.com/t/the-other-devons-agent-sphere/4913 . The other downside is that DT is really not oriented to creating custom metadata in a very usable way, as far as I can tell.  The only reason I could see using it now is to take advantage of document linking that works on MacOS and iOS, but sync doesn't sound that reliable.  Long discussion pro and con here: https://talk.macpowerusers.com/t/good-arguments-to-buy-devonthink-pro/3554/67

 

The core of my academic paper workflow is BibDesk as described above.  The underlying BibTeX format and existing tools are definitely not focused on legal citation.  But you can use your .bib file with citeproc & CSL which has been pretty aggressively pushed toward accommodating legal citation (not least by Frank Bennett of Nagoya U).

 

Having said that, I have big piles of PDFs that aren't in BibDesk yet -- stuck in various folders which serve as project inboxes, waiting for me to review and organise them.  For this stuff, HoudahSpot has been the most valuable addition to my toolkit in the last few years -- it can do everything I might have got from DT while relying on the native file system and tagging.  Especially with its Search Bar and some custom templates to limit the search scope.

 

I tend not to use hierarchical tags at the file system level - have never really felt the need.  For detailed paragraph-level coding, I do (using MaxQDA) but that's very project-specific.

 

I also rarely if ever use aliases -- BibDesk lets me stick publication records in multiple folders in the app, so I just have a folder or two per project and I can dump the record in every one where it belongs without worrying about the linked files themselves.

 

Besides tagging linked files with BibDesk keywords, my main use of tags is actually for more short-term purposes.  Sometimes I'll add tags to files in a project inbox folder as a form of preliminary triage.  Or I'll use tags to track status of things - so I have ridiculously long tags like checkedFor2017MedicalExpenses, checkedAgainstZCFolders, Gone0915 (for households that were no longer part of a panel sample in 2009 or 2015).  That kind of thing.  Combine these with HoudahSpot and Arrange By... and Smart Folders in the Finder and that provides pretty robust file management, for my purposes anyway.  Some of this tagging is combined with filing.

 

What else?  It's all a work in progress, perpetually.

 

On FileMaker this recent thread may be helpful:

https://community.filemaker.com/message/805921

 

 

 

 

 

 

Edited by dfay
Link to comment

@dfay Thanks for the response! This was really helpful, and you’ve given me a lot to think about.

 

As for your frustrations with DEVONthink, I definitely share them. In fact, I laughed about your reference to sticky zooming, as it was the first thing I contacted their support group about (i.e., because I just thought I was overlooking something - given that its UI is so clunky). 

 

I guess I’m just attracted to the nuanced tagging options that DEVONthink brings, as well as using it for web clippings (to replace Evernote). What do you use to capture websites?

 

Whether it’s BibDesk (or DEVONthink, for that matter), I wish that their tags could be dynamically added to the PDFs as native Finder tags. While I’m no DEVONthink expert, I think that they’re only added to files when they’re exported from a database. Like yourself, I’d prefer to stay in the native OS (and use other tools that can see this information, too). I just don’t see how to use more nuanced and dynamic tagging approaches in Finder (even with Alfred’s help) - though HoudahSpot sounds great for finding things! I wish there was an app that would work within Finder to quickly show the user’s tags (and their hierarchy) so that they could be visualized and quickly added to a file without having to remember each tag or create a unique Alfred workflow that displays them, etc. That way, Alfred/Finder/HoudahSpot could all use this information to search and display things.

 

Regardless of how hard I try, file organization feels like a fool’s errand.

Link to comment
