Skimmer: PDF actions for Skim


Hi,

 

Great workflow. Works perfectly for me. I'm very much looking forward to the notes feature. Will automatic URLs to the PDF pages also work without Evernote?

 

My actual question: I tried the split functionality with several PDFs. It works nicely. However, the first page always seems to be missing; it doesn't matter whether I choose "keep" or "discard" in the "keep the first left hand page" dialogue. Any idea how to fix that?

 

I'm still on 10.8.5 as well… The debugger thingie inside Alfred doesn't show _any_ errors or warnings.

 

//

 

Another bug, it seems: 

 

Starting debug for 'Skimmer'

 

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make text items -4 thru -1 of "a" into type string. (-1700)

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make text items -4 thru -1 of "ab" into type string. (-1700)

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make text items -4 thru -1 of "ab" into type string. (-1700)

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make text items -4 thru -1 of "abc" into type string. (-1700)

 

These files are all exactly the same PDF; only the names differ. It seems that for the workflow to function the file name needs to be at least 4 characters long; otherwise I get this error.
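The messages suggest that the script takes the last four characters of the name (text items -4 thru -1), presumably to look at a ".pdf" suffix. That is a guess from the error text, not something confirmed from splitter.scpt. A minimal shell sketch of a suffix strip that does not care how long the name is (strip_pdf_ext is a made-up name for illustration):

```shell
# Hypothetical helper, not part of Skimmer: drop a trailing ".pdf" if there is
# one, and leave any other name (including very short ones) untouched.
strip_pdf_ext() {
  printf '%s\n' "${1%.pdf}"
}
```

With this, strip_pdf_ext "a" prints a and strip_pdf_ext "abc.pdf" prints abc, so one-letter names are no longer a special case.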

Edited by mervin
Link to comment

Ran the splitter.scpt. The error kicks in on the following line:

tell application "System Events" to tell process "Skim" to tell menu bar 1 to tell menu bar item 3 to tell menu 1 to click menu item 1

Hope that helps.

 

OK. That part of the script uses GUI scripting to generate the section of the PDF. GUI scripting is infamously tricky, and in Mavericks it's even harder for users to allow GUI scripting. So I've rewritten that part of the script to save the page section to a temporary file, open that temporary file, and then delete it. This avoids GUI scripting altogether.

 

The issues with file types and path types are complex, but I've worked hard to clean everything up. I really hope that this new version will work.

 

The Note Exporter is coming along well. It has been temporarily put on hold, however, as I deal with these problems. The page-specific URLs will work anywhere; you just need to ensure that they are "clickable". Evernote displays HTML notes, so links there are clickable. You will also be able to export in Markdown, so you could export notes as Markdown and use Marked to get an HTML preview where the links are clickable, etc.

 

The "losing first page" error is odd. I can't replicate it, but let me make sure I fully understand what you mean. When you first open a PDF that you want to split, you have 1 PDF-page that has 2 scanned-pages on it. The goal of the splitter is to have 1 PDF-page for every 1 scanned-page. When you say that "the first page seems to be missing", do you mean the first PDF-page or the first scanned-page? 

 

Also, I've fixed the PDF title bug. Thanks for the find.

 

- - -

To both of you, please upgrade to version 1.3 on Packal. Please also test the new splitter function on both OSes (the more testing the better). As always, if an error does pop up, please use the debugger to let me know what it is. 

 

1.3 has a thorough re-thinking of the splitter script, so I'm hoping for the best. Let me know, 

 

stephen

Link to comment

Thanks for the latest update. I'm happy to say it works beautifully on both 10.8.5 and 10.9 – truly amazing. I tried about 10 PDFs (some of them north of 500 pages) and it worked fine – and faster than I expected according to your earlier description. This is so helpful for folks with PDFs of out-of-print books and old scanned double-page articles. Hat tip across the pond!

 

I noticed that the PDFs after cropping & splitting seem to triple, sometimes quadruple, in size; after saving them in Acrobat with the "reduce size" option they come down again without noticeable loss of reading quality. Still, it is more than worth it.

Link to comment

I'm so happy that version 1.3 cleared things up. And I'm glad that this workflow will prove helpful to you. It's def quite helpful to me. And yes, things have sped up some. I'll retest and try to get a sense for how fast it is now. 

 

I will also explore PDF reduction from within the workflow. Maybe there's something simple out there that I could add easily enough. 

Link to comment

Smarg19, if the scanned PDF starts with pages 1 and 2 (of the scanned document), the split pdf starts with page 2, regardless of what I choose in the dialogue. This doesn't happen with 1.3 on 10.9.2 anymore and I upgraded my machines a few days ago, so I can't test it on 10.8.5 anymore.

 

But there still seems to be a name-related problem. I have a file named "Adorno, Horkheimer/ Über ein neues, zu schreibendes  komm. Manifest?-OCR.pdf". If I try to split it, the workflow creates the file "original_Adorno, Horkheimer/ Über ein neues, zu schreibendes  komm. Manifest?-OCR", but the splitting process itself doesn't seem to work.

 

Alfred debug: [ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: Can’t make file "machine:Users:mervin:Desktop:Adorno, Horkheimer: Über ein neues, zu schreibendes  komm. Manifest?-OCR_split.pdf" into type alias. (-1700)

Edited by mervin
Link to comment

Well, as long as it works. 

 

 

There are likely some text-encoding issues. I didn't test against Unicode titles, so I'll need to test and refactor. Good catch tho. I will post when I push an update.

Link to comment

A few things I've found to reduce PDFs more dramatically if you're using Acrobat:

 

Make sure that Text Recognition and Document Processing are enabled in the sidebar (View -> Tools).

(1) Make sure that the pages are shifted, and use Preview for this rather than Acrobat.

(2) Make sure that it's in Black and White or Monochrome rather than any sort of color.

(3) Run Optimize Scanned PDF under Document Processing. I use Apply Adaptive Compression (JPEG2000 and JBIG2 (Lossy)) with the size slider about 1/4 of the way from Small Size, Deskew On, Background Removal Low, Descreen On, Text Sharpening Low. Don't turn on OCR here. Depending on the size, this might take a bit.

(4) Save the PDF (this is necessary).

(5) Run OCR (ClearScan).

I've found that separating the Optimize Scanned PDF and OCR steps makes the files smaller. Quite often ClearScan pulls the file size down even further. Sometimes it adds a bit, but it's worth it to make the file searchable. The OCR isn't always the best (ABBYY FineReader has a better OCR engine), but ClearScan makes sure that the integrity of the original image is intact (which I'm attached to).

If you're using Acrobat, you should also OCR them with ClearScan. You can actually do a more aggressive Document Processing, and ClearScan makes the text sharper. It works for me.

Link to comment

Try version 1.3.1 and let me know how that goes. Upgrade from Packal.

Link to comment

@Shawn Rice: Thanks for your tricks in PDF trimming. I felt confirmed in some of my choices – and was particularly happy to pick up your suggestion about using Preview (rather than Acrobat) for shifting and deleting pages – it does that more gracefully than Adobe's behemoth. Also, I plan to start using ClearScan.

 

@Stephen: I have used your latest Skimmer on 40+ PDFs. I report with some wistfulness that I foresee using it very little in the future – it's so good it has put itself out of business. I have literally no more frumpy PDFs left. I managed to throw one last error on a huge file (700 pages unsplit). Here it is:

 

[ERROR: alfred.workflow.action.script] Code 1: splitter.scpt: execution error: The command exited with a non-zero status. (255)

 

Thanks for a great tool.

Link to comment

Ah. Such a huge file is pushing the code to its limits. Here's what I've found from Apple itself:
 

Q: How long can my command be? What’s the maximum number of characters?

 
A: There is no precise answer to this question. (See Gory Details for the reasons why.) However, the approximate answer is that a single command can be up to about 262,000 characters long — technically, 262,000 bytes, assuming one byte per character. Non-ASCII characters will use at least two bytes per character — see Dealing with Text for more details.
 
Note: This limit used to be smaller; in Mac OS X 10.2 it was about 65,000 bytes. The shell command sysctl kern.argmax will give you the current limit in bytes.
 
Overrunning the limit will cause do shell script to return an error of type 255. Most people who hit the limit are trying to feed inline data to their command. Consider writing the data to a file and reading it from there instead.

 

 
This makes good sense. With over 700 original pages, you were trying to combine 1400+ individual pages in the shell command. With a long PDF title, this will easily create too long a command. My question, however, is how exactly I should "[c]onsider writing the data to a file and reading it from there instead." Clearly, if I do this and store the result in a variable before passing it to the shell, it will still be too long. This means I need the shell script itself to read the file. I'm new to such things; any suggestions from the crowd?
Edited by smarg19
Link to comment

Simplest:

 

User side solution: manually split the PDF into two or three separate PDFs and run it on each one of those. Then, combine them all at the end, again, manually.

 

Workflow solution: do a file-size check on the PDF (or a page count), and then, basically, do what is outlined above.
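The page-count variant of that idea boils down to cutting the page range into chunks. A rough shell sketch, assuming the total page count is already known and a chunk size has been picked by hand (page_chunks is a made-up name, and nothing here is taken from Skimmer itself):

```shell
# Hypothetical helper: print "start-end" page ranges that cover pages
# 1..total in chunks of at most `size` pages each.
# $1 = total number of pages, $2 = maximum pages per chunk
page_chunks() {
  total=$1
  size=$2
  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    # The last chunk may be shorter than the others.
    if [ "$end" -gt "$total" ]; then
      end=$total
    fi
    echo "$start-$end"
    start=$((end + 1))
  done
}
```

For the 700-page file mentioned above, page_chunks 700 300 prints 1-300, 301-600 and 601-700, one range per line. Each range could then be split on its own and the results combined at the end.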

Link to comment

Do you happen to know (as you know more bash than I do) how to read the contents of a .txt file as the parameters of a command?

 

What do you mean?

 

You could do this:

 

Contents of abc.txt

ls tmp

Contents of test.sh

t=`cat abc.txt`
echo `$t`

In that, abc.txt just has a command in it. In test.sh, the first line reads the contents of abc.txt into the variable "t" (so the contents of "t" are now "ls tmp"). Then the second line runs the contents of the variable as a command and echoes its output. So, if you run sh test.sh, it will print out the contents of the directory "tmp".

 

Is that the sort of thing you want to accomplish? In bash, '' denote a literal string, so '$test' will be, literally, the string $test. "" evaluates the variables, so "$test" will be the contents of test.  `` executes commands and evaluates variables, so `$test` would treat the contents of the variable $test as a command.

 

Does that get at what you want?
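On the original question of getting past the argument-length limit: one common shell approach is to keep the arguments in a file, one per line, and let xargs hand them to the command. A rough sketch, assuming the page files are listed one path per line (run_with_args_from_file is a made-up name, and this is not claimed to be how Skimmer does it):

```shell
# Hypothetical helper: run a command with its arguments read from a file.
# $1 = file holding one argument per line; remaining arguments = the command.
# Newlines are turned into NUL bytes so that paths containing spaces survive;
# paths that themselves contain newlines are not supported by this sketch.
run_with_args_from_file() {
  list_file=$1
  shift
  tr '\n' '\0' < "$list_file" | xargs -0 "$@"
}
```

One caveat: xargs stays under the system limit by running the command more than once when the list is very long, so the command has to tolerate being called in batches. getconf ARG_MAX (or sysctl kern.argmax, as in the Apple note quoted earlier) shows the limit in bytes.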

Link to comment
UPDATE: Version 1.4 is now live on Packal. 

 

I've found a way to split even the largest of PDFs, so that edge case bug is fixed. 

 

There is also now a new script filter, triggered by either sk or skimmer to search through all of your PDFs. Hitting return on any one of them will open that PDF in Skim. You can also right-arrow into Alfred's file browser from any result. 

 

I've also added a simple filter to help you get at the workflow's primary directories. Using the keyword sk:bug you can open either:


  • Skimmer's root workflow folder


  • the non-volatile storage folder


  • the cache (volatile) folder


  • or Skimmer's log file
Edited by smarg19
Link to comment
  • 1 month later...

Hi,

 

It gets better and better :-) So hugely useful! Thanks again for your work.

 

Again a little problem with odd file names. I do have a file called "Oskar / Adam, Subjektdiskurse-OCR". This leads to a conflict, the debugger tells me something about  not being able to make it into a nametype. I suppose my Mac saves the "/" as ":" and this somehow conflicts with the name handling. It seems as if there was still something fishy about handling special characters. 

 

Thanks :-)

 

mervin.

 

(The log doesn't show anything, I use the most recent version)
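For what it's worth, the "/" guess matches how OS X treats names: a "/" typed in Finder shows up as ":" in the POSIX path, and ":" is the separator in the HFS-style paths that AppleScript uses. Whether that is what trips the script here is not confirmed. A small shell sketch of the mapping (finder_to_posix_name is a hypothetical helper):

```shell
# Hypothetical helper: turn a file name as displayed in Finder into the name
# that appears in a POSIX path, where Finder's "/" is stored as ":".
finder_to_posix_name() {
  printf '%s\n' "$1" | tr '/' ':'
}
```

So finder_to_posix_name 'Oskar / Adam, Subjektdiskurse-OCR' prints Oskar : Adam, Subjektdiskurse-OCR, which is the form a shell command would have to use.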

Edited by mervin
Link to comment
  • 1 month later...

Stephen - do you have a GitHub repo for this? I have a bunch of related workflows, most of which depend on having that AppleScript URL handler you wrote installed (and then having the top 3 notes generated on the current PDF so that the page # conversion will work). I have been too lazy to post them here and wasn't sure about it, since they are largely adaptations of your underlying hard work. Here is what they do. Let me know and I can get them to you somehow; if you want to incorporate them, it might make more sense to consolidate them in a single workflow.

 

Generate Top 3 Notes in Skim

Export all Skim notes to Clipboard with MD links

Copy Markdown Reference Link for current Skim page

Copy Markdown Inline Link for current Skim page

Copy current Skim page # to clipboard

Go to Page in Skim

 

--Derick

Link to comment

Derick,

I'm always a fan of integration! And I do have a repo for this workflow: https://github.com/smargh/alfred_skimmer

I'd love to see this code and see how it might be integrated. My current code has evolved a bit since the code you were likely building off of, however. For instance, I no longer generate the top 3 notes; instead, I have the scripts compute all the needed stuff internally. Also, I've slightly modified the URL scheme (you can see my new handler under the dev section of the repo, where all of my current testing is going on).

I'd also love to get your feedback and thoughts on a few advancements I've been thinking about.

stephen

Link to comment

UPDATE to version 2.0!

Version 2.0 adds (finally) the Annotation Export feature. I know that this has been a long time coming, but I was working really hard to make this flexible for everyone. So, let me break down this new feature.

In short, this feature allows you to export all of your PDF annotations from your Skim PDF to Evernote (or the clipboard) while giving you live hyperlinks back to the exact PDF page for the annotation!! You heard me, your Evernote note will have all of your PDF annotations, and each annotation will have a hyperlink that will open up that PDF to the exact page where that annotation is. Trust me, it's super cool, amazingly helpful, and downright near magical.

Compatible annotations include Text notes, Anchor notes, Underlined text, Strike-Thru text, and Highlighted text. Skimmer will take all of your annotations, format them into some pretty HTML and send that to Evernote. I have been working on this code for quite some time, so it is FAST! It can handle a 100+ page book in a jiffy. But, since we all work slightly differently, I've also worked hard to make it FLEXIBLE. In order to use this function, simply use the export keyword. Alternatively, you can assign a keyboard shortcut to the command as well (I use cmd + shift + - myself).

Let me outline how you can make Annotation Export work exactly as you'd like.

First and foremost, I've added the ability for you to assign your own custom palette of Highlight Colors. One of the nicer touches to this feature is the ability to translate certain highlight colors into text headers. This can come in quite handy for really breaking down your text and your thoughts about the text into certain groupings. Now, I have a default set of 6 colors and their 6 corresponding text values, but you can change both the colors and the text to fit exactly your needs. But how, you might ask? Well, version 2.0 comes with a new Help PDF. Simply use the sk:help keyword and select Open PDF to view this document. On the second page, you will see these annotations:

[Image: skimmer_config1.pdf-(page-2-of-4).png]

The text of the PDF will lay this all out for you, but basically, you simply change the highlight colors and change the corresponding text to whatever you like. There is a (nearly) infinite number of possibilities. The only things to remember are: don't mess with the actual highlights, merely change their colors; and don't delete the prefixed numbers in the text notes, only the text. Otherwise, you can fiddle to your heart's content. Just so you can get a feel for how the process will work, here's what the Evernote note would look like if you ran the Annotation Export script on the Help PDF (well, this is only the highlights section; run the script to see how text notes are handled):

[Image: skimmer_config.pdf---Evernote-Premium.pn]

NOTE: If you change the highlight colors and/or the text meanings, you will have to run sk:help -> Set Highlights before Skimmer will apply your changes. So, to change the Highlights:

  • Open the Help PDF (sk:help -> Open PDF) and alter the highlights and text on the second page.
  • Run sk:help -> Set Highlights to save your changes.
  • Then, you can use export to actually send your Skim annotations to Evernote.

Now, the ability to alter your highlights palette goes a long way in making this script personalizable (is that even a word?), but I went a step further. You can also tweak the HTML formatting used to create the Evernote note. Unfortunately, however, this will require opening up some AppleScript and doing some code tweaking. But I've tried to make it not so scary. Essentially, each annotation type has a general formatting template used to create the HTML. I've abstracted this format and placed each variable element under your control. You can find all of the templates and some basic examples in the Help PDF (page 3), but here is one example, for the Highlight Notes:

--The alterable variables are wrapped in {curlies}, while the fixed elements are in <carets>.
{pre}{wrap}<title>{/wrap} {wrap}<note text>{/wrap} {wrap}<link>{p.} <#>{/wrap}

So, you can prefix anything you'd like to the front of a note type: a dash, a tab, a few tabs, a word, etc. You can then wrap the title of the highlight (this is the text given for whatever color that highlighted annotation was) in anything at all: make it bold, italics, wrap it in [brackets], whatever. You can also wrap the actual text highlighted: in "quotes", make it italics, etc. Finally, you have what you wrap the hyperlink in: it could be (parentheses) or {braces}, etc. And you can specify what page abbreviation you want: p., page, #. Now, the script defaults to settings that I think work pretty well, and you can use those to get a feel for what's possible. Just remember, it needs to be valid HTML. All of these properties are near the top of the action_export-notes.scpt found in the workflow directory (you can use sk:bug -> Root to open this folder easily). Feel free to ask me if you have something you'd like to format but can't quite figure it out.
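To make the template concrete, here is a rough shell sketch of how the pieces could combine into one line of HTML. The prefix, the wrapping tags, the page abbreviation, and the example.com URL are placeholder choices for illustration, not Skimmer's defaults, and the real properties live in the AppleScript file rather than in a shell function:

```shell
# Illustrative only: fills the template
#   {pre}{wrap}<title>{/wrap} {wrap}<note text>{/wrap} {wrap}<link>{p.} <#>{/wrap}
# with made-up wrappers (bold title, italic text, link in parentheses).
# $1 = highlight title, $2 = highlighted text, $3 = link URL, $4 = page number
format_highlight() {
  pre='- '   # {pre}: prefix placed before the whole note
  abbr='p.'  # {p.}: page abbreviation shown in the link
  printf '%s<strong>%s</strong> <em>%s</em> (<a href="%s">%s %s</a>)\n' \
    "$pre" "$1" "$2" "$3" "$abbr" "$4"
}
```

Calling format_highlight 'Summary' 'some highlighted text' 'https://example.com/placeholder' 12 yields a dash-prefixed line with a bold title, the italic text, and a parenthesised "p. 12" link. Swapping the tags or the prefix in the sketch mirrors what editing the corresponding properties would change.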

Since I've added the PDF hyperlinking functionality, I've also added the ability to copy a PDF page's custom URL to the clipboard, in case you want to hyperlink to that PDF page in any other context. Simply use the sk:copy keyword. This URL hyperlinking works because the workflow bundles a custom URL handler that I wrote; it interprets the custom URLs and opens the PDF in Skim at the appropriate page. It's pretty cool, but the URL uses the path to the PDF, so if you move the PDF, the URL will break until you alter all the old URLs to use your new path.

Anyways, there's lots to like in this new release. I'm sure I haven't covered everything. Let me know if you have any problems or questions.

stephen

Link to comment

The export notes works great!!! Many thanks,

 

I also liked the ability to adjust the page numbering of the Evernote links via the "Subtract printed page number from Skim's indexed page number." prompt.

Thanks Moses, glad to hear it!

Link to comment

Thanks for the new Skimmer, it's my biggest time saver.

 

If any students out there want to use this for PDFs generated from PowerPoint, I've taken snippets from smarg19's previous versions (nothing original from me) so that exports have indentation, bullets, and colors (colors come either from the user's favorite colors or from a hex value). Can be found here.

 

 

 

So if your PowerPoint slide looks like this after you highlight:

[Image: tnSVykd.png]

 

 

 

It looks like this in Evernote:

 

[Image: 253vrZi.png]

Edited by DrLulz
Link to comment
