
Skimmer: PDF actions for Skim



Skimmer

Actions for PDF Viewer Skim

 

icon.png

To download, visit the Packal page

 

Version: 2.2.1

 

Description

This is a fairly simple workflow that works with the free Mac PDF app Skim. Skim is a fantastic app with great AppleScript support. This workflow provides quick, easy access to a few custom AppleScripts that I've written to deal with certain pesky problems I come across when dealing with PDFs. There are currently only 3 actions:

  • Crop and Split PDF
  • Extract Data and Search Google Scholar
  • Search your PDFs
First, Skimmer allows you to properly format those darned scanned PDFs. You know the ones I'm talking about: two book pages scanned into one landscape-oriented PDF page. I want all of my PDFs in pretty, proper format with one PDF page corresponding to one portrait-oriented book/article page. In the past, it was quite the ordeal to crop the PDF so that the right- and left-hand margins were equal, and then to split each individual page and finally reconstruct the entire PDF. Skimmer makes this whole process as simple as π. You can use either a Hotkey or the Keyword split to activate this feature. 

 

skimmer_split.png

 

Skimmer then does 3 things:

  • Crop the PDF using a user-inserted Line Annotation (if necessary) (see image below)
  • Split the two-page PDF into individual pages
  • Re-assemble everything and clean up
Let me walk you thru the process. To begin, you will need to ensure that the two scanned book pages have equal margins. Skimmer will split the PDF page right down the middle, so we want the middle of the PDF to be the middle of the two pages. If the margins are unequal, you only need to use Skim's Line Annotation to create a border for Skimmer. Here's an example:

 

skimmer_original.png

 

Note the small, vertical line at the bottom of the page. Skimmer will crop off everything to the left of this line. You could put the line anywhere on the page. If the right-hand margin were too big, you could put the line on the right, and Skimmer would automatically crop the excess to the right of it. If both margins are too big, you can put one line on each side and Skimmer will take care of the rest. Note that Skimmer crops every page at these lines, so find the farthest extremity on any page and use that as your guide. Skimmer can tell what page you are looking at, so the lines can go on any page (in the image above, one of the middle pages is being used as the cropping template). Skimmer does not crop top or bottom margins, so you will need to manually crop PDFs with wacky top and/or bottom margins.
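
For anyone curious how a cropping rule like this could work, here is a minimal Python sketch. It is not Skimmer's actual AppleScript. The function name `crop_bounds` is hypothetical, and the rule for deciding which side a line belongs to (left or right of the page's midpoint) is an assumption, since the post does not say how Skimmer decides.

```python
def crop_bounds(page_width, line_xs):
    """Return the (left, right) x-range to keep on every page.

    page_width: width of the landscape page, in points.
    line_xs:    x-positions of the vertical guide lines drawn in Skim.

    Assumption: a line left of the page midpoint marks the left crop
    edge, and a line right of the midpoint marks the right crop edge.
    """
    mid = page_width / 2.0
    left_lines = [x for x in line_xs if x < mid]
    right_lines = [x for x in line_xs if x >= mid]
    # With no guide line on a side, keep that side's page edge.
    left = max(left_lines) if left_lines else 0.0
    right = min(right_lines) if right_lines else float(page_width)
    return left, right
```

With a single line at x = 40 on a 792-point-wide page, this keeps everything from 40 to 792; with a second line at 760, it keeps 40 to 760.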

 

Once Skimmer has cropped the PDF, it will go thru and split each page into two separate pages. Depending on the length of the PDF, this can take a bit (approx. 0.67 seconds per original PDF page). This is all done invisibly tho, so that's a bonus. In order to ensure that Skimmer splits the PDF properly, regardless of orientation, the script will split the first page and ask you what portion of the page you are seeing (left-hand, right-hand, top-half, or bottom-half). Your choice will ensure that Skimmer does the splitting just so.
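
The geometry of the split itself can be sketched in a few lines of Python. This is an illustration only, not the workflow's code: `split_in_half` is a hypothetical name, boxes are `(left, bottom, right, top)` tuples in PDF points with the origin at the bottom-left, and the split direction is passed in as a parameter because the post does not describe how Skimmer maps your answer to the dialog onto the split.

```python
def split_in_half(box, direction="vertical"):
    """Split a page box into two half-page boxes, in reading order.

    box:       (left, bottom, right, top) in points, origin bottom-left.
    direction: "vertical" cuts down the middle (left page, right page);
               "horizontal" cuts across the middle (top half, bottom half).
    """
    left, bottom, right, top = box
    if direction == "vertical":
        mid = (left + right) / 2.0
        return (left, bottom, mid, top), (mid, bottom, right, top)
    if direction == "horizontal":
        mid = (bottom + top) / 2.0
        return (left, mid, right, top), (left, bottom, right, mid)
    raise ValueError("direction must be 'vertical' or 'horizontal'")
```

A landscape US Letter page, `(0, 0, 792, 612)`, split vertically gives two 396-point-wide portrait boxes.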

 

After it splits all the pages, Skimmer will save a copy of your original PDF and then close it as it opens the new, split PDF. This new PDF will be properly formatted and saved in the same folder as the original PDF. Here's an example of the PDF above after it was automatically cropped and split:

 

skimmer_final.png

 

For anyone who deals with lots of scanned PDFs, I can promise you, this is a godsend. 

 

 

The second feature will take OCR'd PDFs and try to extract relevant search information and then search Google Scholar (which will make it easy to then add citation information to your citation manager of choice. Users of ZotQuery will immediately see where I'm going with this...). This feature can be activated by a user-assigned Hotkey or by the Keyword extract when the desired PDF is open in Skim. 

 

skimmer_extract.png

 

This feature will look for three possible things in the currently viewed page:

  • a DOI (Digital Object Identifier)
  • an ISBN (for books)
  • JSTOR title page
If it cannot find any of these things, it will present the user with a list of Capitalized Words from the currently viewed page. You then select whichever words you want to be the Google Scholar query. Once the query is chosen (whether automatically as one of the 3 types above, or user-chosen keywords), Skimmer will automatically launch your default browser to Google Scholar using the query. What you do from there is up to you. 
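
As a rough illustration of that lookup order, here is a Python sketch. It is not the workflow's AppleScript, and several things are assumptions: the names `find_query` and `scholar_url` are made up, the regular expressions are simplified patterns rather than whatever Skimmer uses, and the JSTOR title-page check is left out because the post does not say how it is detected.

```python
import re
from urllib.parse import quote_plus

# Simplified patterns (assumptions, not Skimmer's own).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
ISBN_RE = re.compile(r'ISBN(?:-1[03])?:?\s*([0-9][0-9 -]{8,15}[0-9Xx])')
CAP_RE = re.compile(r'\b[A-Z][a-z]+\b')


def find_query(page_text):
    """Return (kind, value): a DOI, an ISBN, or a list of capitalized words."""
    match = DOI_RE.search(page_text)
    if match:
        # Drop trailing punctuation picked up from the sentence.
        return "doi", match.group(0).rstrip(".,;)")
    match = ISBN_RE.search(page_text)
    if match:
        return "isbn", re.sub(r"[ -]", "", match.group(1))
    words = []
    for word in CAP_RE.findall(page_text):
        if word not in words:  # keep first occurrence, preserve order
            words.append(word)
    return "keywords", words


def scholar_url(query):
    """Build a Google Scholar search URL for the chosen query string."""
    return "https://scholar.google.com/scholar?q=" + quote_plus(query)
```

In the keyword case, Skimmer lets you pick the words yourself; in this sketch you would join your picks with spaces and pass the result to `scholar_url`.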

 

Finally, you can also search through all of your PDFs and open any one of them right in Skim. Use either the keyword `skimmer` or the shorter `sk` to begin the query. Then enter your query term. The results will update as you type. You can hit `return` to open any item directly in Skim, or you can `right-arrow` to enter Alfred's file browser for that item.

 


 

As I said, these are the only three functions for Skimmer currently, but I will be adding at least one more (for exporting notes) soon enough. If you have any killer AppleScripts for Skim, let me know and maybe we can add them in. 

 

Here's to PDF management, 

stephen

Edited by smarg19
Link to comment

Hmm... Do you have version 2.2 of Alfred, with the debugger feature? If not, could you upgrade and use the debugger when you run the workflow? Also, could you post your system information: what OS, what version of Alfred, and what version of Skim? And if you could post a Dropbox link to the PDF you were trying to split, I can test on my machine and get a clearer idea of what might be going wrong. 

Link to comment

That happened to me when the page wasn't rotated. Make sure the orientation is right, and then it might work.

Undoubtedly, the blank page relates to rotation/orientation issues. The script tries to be smart tho, and if the PDF isn't oriented the way it wants, it tries to figure out how things are oriented. Basically, if the PDF isn't properly oriented (there can be many reasons for this), it will split the first page and show it to you, asking you whether it is showing you the right-hand page or the left-hand page. Once you answer, the script splits the document accordingly. 

 

When you are getting this blank page, is this the final product? Or is this a pop-up? If it is a pop-up (i.e. not the whole PDF split, just part of the first page), then could the blank page be the left-hand page of a book scan or something, where the left-hand page is blank before the right-hand page title? Let me know exactly what's happening, because the script should work with any two-page scanned PDF, regardless of orientation, even if it needs some user-input. 

 

Also, I will edit the original post to better describe this scenario. 

Link to comment

Stephen... I've been looking for a way to split pages easily like this for... um... years. This is amazing.

Wait till I finish the Note Exporting feature too. If you've read any about my Skim->Evernote system on my blog, I'm basically making this portable for everyone, *including the custom URLs that will automatically return you to that exact page in the PDF!* I'm only working on the last part: user-configuration of highlight color to text title. But the custom url and url handler work perfectly :)

Link to comment


 

Sounds pretty sweet. I generally don't use pdf->evernote functionality though. I prefer just to put the pdfs that I'm using into my dropbox and then mark them up on my iPad just like I would a book. For some reason it works better for me that way.

 

Quick question: say that you have a bad scan of a two-page book in which the pages don't line up with each other from pdf page to pdf page. If I do a regular split, then it cuts off some text. How do you get around this? Is there any pre-processing that you do?

Link to comment

1. Currently you can select either HTML or Markdown export format, and then Evernote or Clipboard for destination. So you could store MD notes in nvAlt or DEVONthink or anything. I just like the idea of having my thoughts and excerpts from PDFs I've read stored separately from the highlighted/annotated PDFs themselves. An extra layer of storage and searchable text. But it will obviously also be totally optional within Skimmer.

 
 

2. I would manually crop the odd-margin pages before running the script. In the end, I think this is the most error-free method. I confess, however, to having tinkered with a script that took OCR'd PDFs and tried to add consistent margins on either side of the text, regardless of the visible margins. It's been a while since I've looked at that code, and I don't distinctly remember the edge cases that made me deprecate it. I do know that at that time I hadn't figured out how to split and save pages without popping up the individual pages, so the process required two pop-ups per page, and I hated that. If that was the only thing, and my math was good, maybe I could refresh it. That code should be in my Gists; feel free to look it over and try it. I would recommend changing out the final stage of the script to save the PDF raw data to a file without using the "Create New from Clipboard..." command in Skim (which would create the individual projects). Conceptually, I like the idea and I'd def be willing to work on it, tho it's not at the top of my list currently (I'm still fiddling with some ZotQuery stuff, on top of the note exporting stuff for Skimmer).

 
 
PS. What are your thoughts on workflows sharing features? I personally use ZotQuery, SEND, and Skimmer all in the same workflow and I've thought about bundling them all together under ZotQuery, but I also think that may be a bit hefty and overbearing in defining the user's workflow. Should I keep them all separate, as I've done, but just describe how I use them together, for people who may want to copy my workflow? Or would there be value in having PDF management functionality (that's what I see Skimmer and SEND doing) baked into ZotQuery?
Edited by smarg19
Link to comment

I have Skim 1.4.8, and I just downloaded the new version of Alfred. I will try again and let you know. Thanks! 

 

Now I have the latest version of Alfred with the debugger. I ran split again and this is what I got:

 

Starting debug for 'Skimmer'

 

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make POSIX file "/Users/yoyontzin/Documents/Faisceaux Pervers_left_1.pdf" of application "Finder" into type alias. (-1700)

 

This is the link with the document and the result: https://www.evernote.com/shard/s43/sh/b7af16eb-8658-4286-8f43-ca3d4a25c8e7/01bcc9c27e58c5a388493e65dd700db2

 

Thank you! 

 

I am using OS X 10.9.2

Edited by Yoyontzin
Link to comment
Perfect. That's exactly what I needed to know. It was a simple bug to squash. Sorry for the mistake. I've also generally upgraded the Splitter. It now presents the user with an example split, and asks the user what part of the page is being shown (left-hand, right-hand, top-half, bottom-half). Depending on what you choose, the program will then properly split the rest of your PDF. This ensures a clean split, no matter the orientation. 

 

I'm still working on the OCR based splitting, so that's not here yet. But hopefully sooner rather than later. 

 

I tested the new script on your PDF @Yoyontzin (which was 90 pages originally) and everything worked properly. For those interested in benchmarks: on my 2013 (newest model) 13" MBA, it took basically 1 minute from start to finish to process that 90 page PDF (to create a 180 page PDF). So that's basically 2/3 second per original page. 
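
The arithmetic behind that figure, as a small Python sketch for estimating your own run time. The function names are hypothetical, and the rate comes from that single 90-page test, so treat any estimate as a ballpark for similar hardware only.

```python
def seconds_per_page(total_seconds, original_pages):
    """Average processing time per original (two-up) PDF page."""
    return total_seconds / original_pages


def estimate_seconds(original_pages, rate=60 / 90):
    """Rough run-time estimate, assuming the 60 s / 90 pages rate holds."""
    return original_pages * rate
```

Here `seconds_per_page(60, 90)` comes out at roughly 0.67, and a 69-page document would take around 46 seconds at the same rate.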

 

Also, a general tip: the built-in auto-cropping only works on left/right margins. You should crop top/bottom margins manually if necessary. Skim makes full PDF cropping pretty simple, so this shouldn't be a burden. 

 

Please head to Packal and upgrade to version 1.1. 

Link to comment

Downloaded your Skimmer today. Have loads of those double-sided PDFs. Tried your workflow with an English 69-page document (latest Skim, latest Alfred, latest Skimmer; the webpage says 1.1, the workflow says 1.0) and got this:

 

[INFO: alfred.workflow.input.keyword] Processing output 'alfred.workflow.action.script' with arg ''
[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Skim got an error: Can’t make 69 into type specifier. (-1700)

 

Very promising vision to tidy up my PDFs with this workflow!

Edited by kithairon
Link to comment

Argh. Had changed something in one part of the code (which I didn't use in my testing), and failed to change it everywhere. Dumb mistake. But easily fixed. Version 1.1.1 is now up on Packal (with proper version number in the workflow as well). 

 

Thanks for the catch @kithairon

Link to comment

Thanks for the update. Tried Skimmer 1.1.1 with the same pdf as before:

 

[INFO: alfred.workflow.input.keyword] Processing output 'alfred.workflow.action.script' with arg ''
[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t get item 1 of 69. (-1728)

Link to comment

Something's happening, however after about 10 seconds I get this error:

 

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: System Events got an error: Datei enthält kein Symbol (-50)

 

The German says that the file does not contain any symbol. However, that file *does* contain two clean blue lines. (I love the little bug-feature in the latest Alfred.)

 

On two of my samples I get the dialogue coming up and the trim actually takes place, however the split is not yet happening.

Edited by kithairon
Link to comment

It looks like the same error you had with ZotQuery. Your comp doesn't like icons in dialog boxes. I'm away right now, so I'll try to fix it later tonight. Hopefully it's that icon error in the dialog in the combinePDFPages handler.

Link to comment

Just tried with 1.2.1. The trims work smooth, the 2nd dialogue (about keeping the first page) comes up – however:

 

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make POSIX file "/Users/amw/Downloads/original_sarvastivada_literature_left_1.pdf" of application "Finder" into type alias. (-1700)

 

There is a new file called [filename]_split.pdf which is, however, just a single empty page.

Edited by kithairon
Link to comment

Thanks, but no luck. Result as before: nice trim, backup of original and new split.pdf are produced, albeit empty. The error remains:

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make POSIX file "/Users/amw/Downloads/temp/" of application "Finder" into type alias. (-1700). I'm still on 10.8.5.

Edited by kithairon
Link to comment

Ok. Well this is a difference, apparently, in our two versions of Applescript. That makes it impossible for me to test and fix. What this means is that you will need to do a little fiddling with Applescript to determine how your Applescript handles POSIX paths. Give me some time and I'll throw together the key code in the splitter that deals with paths, and have you fiddle with those. Unfortunately, there's not much else I can do...

Link to comment

Ran your latest script on a 10.9.2 machine. The error here is:

[INFO: alfred.workflow.input.keyword] Processing output 'alfred.workflow.action.script' with arg ''
[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: System Events got an error: osascript hat keine Berechtigung für den Hilfszugriff. (-1719)

i.e. "osascript has no permission for assistive access"

Link to comment

Seems like your machine has some different settings/preferences, which is fine. From that error, I would guess that the problem is in the helper scripts that `splitter.scpt` calls for various things. Could you do me a favor and confirm this by opening the workflow folder and manually running the `splitter.scpt` script? When the error hits, it should highlight a specific line. Let me know what line it is. This will help me fix the problem more efficiently.

Link to comment
