xilopaint

Alfred PDF Tools – Optimize, encrypt and manipulate PDF files


6 minutes ago, BobRe said:

or (preferably) via available GitHub resources?

 

What exactly are "GitHub resources"? If you want a concrete answer, I think you'll have to ask a more concrete question.

19 hours ago, BobRe said:

Sorry - meant to say free OCR options 

 

 

Alfred PDF Tools bundles k2pdfopt, a CLI that has OCR capability and works with the Tesseract open source OCR engine. The problem is that using the Tesseract engine with k2pdfopt requires the user to download the language training files and do additional configuration (more info here).

 

Trying another open source OCR engine would be an option, but we have to consider whether increasing the already large workflow file size (12 MB) would be worth it. Maybe the OCR feature would be a job for another workflow.

 

In any case, I'm open to suggestions.

1 hour ago, xilopaint said:

we have to consider whether increasing the already large workflow file size (12 MB) would be worth it

 

Personally, I'd say "no". An OCR engine invariably needs large language library files.

 

Far better to define some kind of "API" for integrating the user's preferred OCR engine. Something like an OCR_COMMAND in the workflow's settings where a user can specify the command-line invocation for their preferred OCR engine with placeholders for the input and output files, e.g. OCR_COMMAND = /usr/local/bin/ocr-x --input $1 --output $2 --lang en.
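A minimal sketch of how such a setting could be expanded, assuming the template uses the $1/$2 placeholders from the example above. The function names are hypothetical, and /usr/local/bin/ocr-x is only the placeholder engine from the example, not a real tool. Splitting the template first and substituting afterwards means paths with spaces survive without any shell quoting:

```python
import re
import shlex
import subprocess


def build_ocr_argv(template, input_path, output_path):
    """Expand $1 (input) and $2 (output) in an OCR_COMMAND template.

    Returns an argv list suitable for subprocess.run(), so no shell is involved.
    """
    if "$1" not in template or "$2" not in template:
        raise ValueError("OCR_COMMAND must contain both $1 and $2")
    paths = {"1": input_path, "2": output_path}
    # Split into tokens first, then substitute inside each token in one pass,
    # so a path containing spaces or a literal "$2" is never re-expanded.
    return [
        re.sub(r"\$([12])", lambda m: paths[m.group(1)], token)
        for token in shlex.split(template)
    ]


def run_ocr(template, input_path, output_path):
    """Run the user's OCR engine (hypothetical helper); raises if it fails."""
    subprocess.run(build_ocr_argv(template, input_path, output_path), check=True)
```

In a workflow, the template would presumably be read from the environment (Alfred exposes workflow variables to scripts as environment variables), e.g. os.environ["OCR_COMMAND"].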

 

Edited by deanishe
punctuationalisation


Yeah free OCR in my experience has always been a source of frustration in the end.  Since I gave in and bought OCRKit a couple years ago I've been using OCR much more, with very few issues.  It has a CLI and decent AppleScript support, & thus integrates quite easily with Alfred....  All of which is to say investing in free OCR is probably not the best approach.

 

21 hours ago, dfay said:

Yeah free OCR in my experience has always been a source of frustration in the end.  Since I gave in and bought OCRKit a couple years ago I've been using OCR much more, with very few issues.  It has a CLI and decent AppleScript support, & thus integrates quite easily with Alfred....  All of which is to say investing in free OCR is probably not the best approach.

 

 

Do you know if OCRKit's recognition accuracy is comparable to that of renowned OCR engines like FineReader? I don't see any mention of the CLI in the App Store. Where can I read about it?

Edited by xilopaint


@dfay, I have Prizmo 3 ($50 in MAS) that works poorly for Portuguese. Considering that OCRKit is only $30 in MAS I fear that its recognition accuracy for Portuguese is not so good.

 

Could you process a single-page PDF file (sent by me) with the OCRKit Portuguese engine and send me the output?


I use PDFScanner for OCR. €17 and it works very well (with English and German). Better than Adobe Acrobat in my experience.

 

If you send me a page, too, @xilopaint, I'll run it through PDFScanner for you. It says it supports Spanish, so it should be able to handle your weird dialect, too.

14 minutes ago, deanishe said:

I use PDFScanner for OCR. €17 and it works very well (with English and German). Better than Adobe Acrobat in my experience.

 

Does it have a CLI? I'm searching for an OCR app with a CLI.

 

14 minutes ago, deanishe said:

If you send me a page, too, @xilopaint, I'll run it through PDFScanner for you. It says it supports Spanish, so it should be able to handle your weird dialect, too.

 

May I reply here saying thank you and f*** you? :lol:

Edited by xilopaint


Update (v2.13):

  • Added a Scale file action.
  • Improved performance of the Split by File Size action.
  • Allowed selecting multiple PDF files for the Crop file action.
  • Fixed a bug that prevented the user from splitting PDF files by file sizes smaller than 1 MB.
Edited by xilopaint


@xilopaint I love this workflow! Thanks a ton for sharing it.

 

Quick Question: Is there any way to quickly view a selected PDF's info (e.g., total page numbers - or other info)? 

 

For example, I love using your workflow to quickly move the first page of a PDF to the end of the document. While the Slice file action (single file) is great for this, its syntax requires knowing the total number of pages in the PDF (e.g., for a 15 page PDF, it would be 2-15, 1). Is there any way to get the page count through the workflow (kMDItemNumberOfPages - I think ...)?

 

I understand that the "Get Info" default file action in Alfred can do this, but it's a little clunky to manually open/close, etc. I was just wondering if the workflow displayed this info internally.

 

If not - and, of course, assuming that you're looking for suggestions for future updates - I was thinking that it might be cool to display the relevant file information in the file action's title. So, for example, in the slice file action for a single file, it might do something like this:

  • Current Title: Enter page numbers and/or page ranges (e.g. 2, 5-8).
  • New Title: Enter page numbers and/or page ranges (e.g. 2, 5-8) of X pages.

And, for the resolution or size file actions of a single file, it might also provide their respective details. 

 

Alternatively, adding generic reference points to the end of a document might also do the trick (e.g., last, blank for remaining pages, etc.). So, users could do the same thing without knowing the actual page count (e.g., last, 2-).

 

Thanks again for posting the workflow! It's awesome!!

Edited by Jasondm007
typo

On 8/6/2018 at 4:30 PM, Jasondm007 said:

@xilopaint I love this workflow! Thanks a ton for sharing it.

 

Quick Question: Is there any way to quickly view a selected PDF's info (e.g., total page numbers - or other info)? 


No, currently there's no such feature in the workflow.

 

On 8/6/2018 at 4:30 PM, Jasondm007 said:

For example, I love using your workflow to quickly move the first page of a PDF to the end of the document. While the Slice file action (single file) is great for this, its syntax requires knowing the total number of pages in the PDF (e.g., for a 15 page PDF, it would be 15, 1-14). Is there any way to get the page count through the workflow (kMDItemNumberOfPages - I think ...)?

 

The input in your example moves the last page to the beginning, not the first page to the end.

 

On 8/6/2018 at 4:30 PM, Jasondm007 said:

If not - and, of course, assuming that you're looking for suggestions for future updates - I was thinking that it might be cool to display the relevant file information in the file action's title. So, for example, in the slice file action for a single file, it might do something like this:

  • Current Title: Enter page numbers and/or page ranges (e.g. 2, 5-8).
  • New Title: Enter page numbers and/or page ranges (e.g. 2, 5-8) of X pages.

And, for the resolution or size file actions of a single file, it might also provide their respective details.

 

The file actions get user input through Keyword objects, and such objects can't be populated dynamically. Your suggestion would be a job for Script Filters, but I'm not yet convinced that the additional information would be worth it. The page count in particular can be handled by the syntax in a more straightforward manner.

 

On 8/6/2018 at 4:30 PM, Jasondm007 said:

Alternatively, adding generic reference points to the end of a document might also do the trick (e.g., last, blank for remaining pages, etc.). So, users could do the same thing without knowing the actual page count (e.g., last, 2-).

 

Leaving the end of a range blank to mean the end of the document is a well-known syntax for page ranges. It will be implemented soon.

Edited by xilopaint


@xilopaint - Oops - Let's chalk that one up to exhaustion. In the previous post, it should have said: 2-15, 1

 

As for the generic reference point, "blank" would be great, too. That would make the following syntax work: 2-blank, 1

 

In any event, I'm not sure if it's helpful (to you or anybody else), but I created a file action that counts the pages in the pdf and then copies the total to the clipboard. Since I wasn't sure if these things would be incorporated into your workflow, it made it easy for me to run this file action and then yours. I'm an Alfred/scripting neophyte, so the workflow comes with the usual caveats ... 
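For anyone who wants to script the same idea, here is a rough sketch (not the file action mentioned above, whose contents aren't shown in this thread). It assumes macOS, where mdls can print the kMDItemNumberOfPages Spotlight attribute and pbcopy writes to the clipboard; the function names are made up for illustration. Note that mdls prints "(null)" when Spotlight has no page count for the file:

```python
import subprocess


def parse_page_count(raw):
    """Turn raw `mdls -raw` output into an int, or None if it isn't a number."""
    raw = raw.strip()
    return int(raw) if raw.isdigit() else None


def pdf_page_count(path):
    """Ask Spotlight metadata (macOS only) for the page count of a PDF."""
    result = subprocess.run(
        ["mdls", "-raw", "-name", "kMDItemNumberOfPages", path],
        capture_output=True, text=True, check=True,
    )
    return parse_page_count(result.stdout)


def copy_to_clipboard(text):
    """Copy text to the macOS clipboard via pbcopy."""
    subprocess.run(["pbcopy"], input=text, text=True, check=True)
```

A file action could then call pdf_page_count(path) and pass str(count) to copy_to_clipboard when the count is not None.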

Cheers!

Edited by Jasondm007


Update (v2.14)

 

Improved the syntax of the Slice file action to allow the user to omit the second value of a page range to indicate that it ends on the last page of the PDF file (e.g. "8-").
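To illustrate the syntax, here is a hedged sketch of how such a page specification could be expanded into page numbers. This is not the workflow's actual code; parse_page_spec is a hypothetical name and the validation rules are assumptions:

```python
def parse_page_spec(spec, total_pages):
    """Expand a spec like "2-, 1" into a list of 1-based page numbers.

    An open-ended range such as "8-" runs to the last page of the document.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first = int(first_text)
            # A blank end of range means "through the last page".
            last = int(last_text) if last_text.strip() else total_pages
        else:
            first = last = int(part)
        if not 1 <= first <= last <= total_pages:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(first, last + 1))
    return pages
```

With this reading, "2-, 1" on a 15-page file moves the first page to the end without the user knowing the page count, while "15, 1-14" moves the last page to the beginning.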

Edited by xilopaint


Update (v2.16)

 

• Fixed a bug that prevented the Crop file action from working properly on PDF files with multiple pages.
• Replaced PyPDF2 with PyPDF4.

Edited by xilopaint

