deanishe Posted March 11, 2018
6 minutes ago, BobRe said: or (preferably) via available GitHub resources?
What exactly are "GitHub resources"? If you want a concrete answer, I think you'll have to ask a more concrete question.
brrrrrr Posted March 11, 2018
Sorry - meant to say free OCR options
xilopaint (Author) Posted March 11, 2018
19 hours ago, BobRe said: Sorry - meant to say free OCR options
Alfred PDF Tools bundles k2pdfopt, a CLI with OCR capability that works with the open-source Tesseract OCR engine. The problem is that using Tesseract with k2pdfopt requires the user to download the language training files and do additional configuration (more info here). Trying another open-source OCR engine would be an option, but we have to consider if increasing the already large workflow file size (12 MB) would be worth it. Maybe the OCR feature would be a job for another workflow. In any case, I'm open to suggestions.
deanishe Posted March 11, 2018 (edited)
1 hour ago, xilopaint said: we have to consider if increasing the already large workflow file size (12 MB) would be worth it
Personally, I'd say "no". An OCR engine invariably needs large language library files. Far better to define some kind of "API" for integrating the user's preferred OCR engine: something like an OCR_COMMAND in the workflow's settings, where a user can specify the command-line invocation for their preferred OCR engine with placeholders for the input and output files, e.g. OCR_COMMAND = /usr/local/bin/ocr-x --input $1 --output $2 --lang en.
Edited March 11, 2018 by deanishe (punctuationalisation)
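The OCR_COMMAND idea above could be sketched roughly as follows. This is only an illustration, not code from the workflow: the function names `build_ocr_argv` and `run_ocr` are hypothetical, `ocr-x` is the made-up engine from the post, and the sketch assumes `$1` and `$2` are the only placeholders.

```python
import re
import shlex
import subprocess


def build_ocr_argv(template, input_path, output_path):
    """Expand $1 (input) and $2 (output) in an OCR_COMMAND template.

    The template is split into arguments first and the placeholders are
    substituted afterwards, so paths containing spaces stay one argument.
    """
    values = {"1": input_path, "2": output_path}
    return [
        # Single-pass substitution: text inside a path is never re-expanded.
        re.sub(r"\$([12])", lambda match: values[match.group(1)], token)
        for token in shlex.split(template)
    ]


def run_ocr(template, input_path, output_path):
    """Run the user's OCR command; raises CalledProcessError if it fails."""
    subprocess.check_call(build_ocr_argv(template, input_path, output_path))
```

With the template from the post, `build_ocr_argv` would yield an argument list that can be passed to `subprocess` without going through a shell, which sidesteps quoting problems with unusual file names.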
dfay Posted March 12, 2018
Yeah free OCR in my experience has always been a source of frustration in the end. Since I gave in and bought OCRKit a couple years ago I've been using OCR much more, with very few issues. It has a CLI and decent AppleScript support, & thus integrates quite easily with Alfred.... All of which is to say investing in free OCR is probably not the best approach.
xilopaint (Author) Posted March 13, 2018 (edited)
21 hours ago, dfay said: Yeah free OCR in my experience has always been a source of frustration in the end. Since I gave in and bought OCRKit a couple years ago I've been using OCR much more, with very few issues. It has a CLI and decent AppleScript support, & thus integrates quite easily with Alfred.... All of which is to say investing in free OCR is probably not the best approach.
Do you know if OCRKit's recognition accuracy is comparable to that of renowned OCR engines like FineReader? I don't see any mention of the CLI in the App Store. Where can I read about it?
Edited March 13, 2018 by xilopaint
dfay Posted March 13, 2018 (edited)
http://ocrkit.com/help/
I don't know how its performance compares, but I've had very few issues with quality.
Edited March 13, 2018 by dfay
xilopaint (Author) Posted March 13, 2018
51 minutes ago, dfay said: http://ocrkit.com/help/ I don't know how its performance compares, but I've had very few issues with quality.
Nice. I'm buying the app. What's your native language? I hope it works well with Portuguese.
xilopaint (Author) Posted March 13, 2018
@dfay, I have Prizmo 3 ($50 in MAS), which works poorly for Portuguese. Considering that OCRKit is only $30 in MAS, I fear that its recognition accuracy for Portuguese may not be very good. Could you process a single-page PDF file (sent by me) with the OCRKit Portuguese engine and send me the output?
deanishe Posted March 14, 2018
I use PDFScanner for OCR. €17 and it works very well (with English and German). Better than Adobe Acrobat in my experience.
If you send me a page, too, @xilopaint, I'll run it through PDFScanner for you. It says it supports Spanish, so it should be able to handle your weird dialect, too.
xilopaint (Author) Posted March 14, 2018 (edited)
14 minutes ago, deanishe said: I use PDFScanner for OCR. €17 and it works very well (with English and German). Better than Adobe Acrobat in my experience.
Does it have a CLI? I'm searching for an OCR app with a CLI.
14 minutes ago, deanishe said: If you send me a page, too, @xilopaint, I'll run it through PDFScanner for you. It says it supports Spanish, so it should be able to handle your weird dialect, too.
May I reply here saying thank you and f*** you?
Edited March 14, 2018 by xilopaint
deanishe Posted March 14, 2018
Just now, xilopaint said: Does it have a CLI? I'm searching for an OCR app with a CLI.
Don't think so, no. Sorry.
1 minute ago, xilopaint said: May I reply here saying thank you and f*** you?
My apologies
dfay Posted March 14, 2018
Yes, sure. The company is based in Berlin fwiw.
xilopaint (Author) Posted March 14, 2018
6 hours ago, dfay said: Yes, sure. The company is based in Berlin fwiw.
Thank you! I've sent you a download link by private message.
xilopaint (Author) Posted March 20, 2018 (edited)
Update (v2.13):
• Added a Scale file action.
• Improved performance of the Split by File Size action.
• Allowed selection of multiple PDF files for the Crop file action.
• Fixed a bug that prevented the user from splitting PDF files by file sizes smaller than 1 MB.
Edited March 20, 2018 by xilopaint
Jasondm007 Posted August 6, 2018 (edited)
@xilopaint I love this workflow! Thanks a ton for sharing it.
Quick question: is there any way to quickly view a selected PDF's info (e.g., total number of pages, or other info)?
For example, I love using your workflow to quickly move the first page of a PDF to the end of the document. While the Slice file action (single file) is great for this, its syntax requires knowing the total number of pages in the PDF (e.g., for a 15-page PDF, it would be 2-15, 1). Is there any way to get the page count through the workflow (kMDItemNumberOfPages, I think)? I understand that the "Get Info" default file action in Alfred can do this, but it's a little clunky to manually open/close, etc. I was just wondering if the workflow displayed this info internally.
If not - and, of course, assuming that you're looking for suggestions for future updates - I was thinking that it might be cool to display the relevant file information in the file action's title. So, for example, in the Slice file action for a single file, it might do something like this:
Current Title: Enter page numbers and/or page ranges (e.g. 2, 5-8).
New Title: Enter page numbers and/or page ranges (e.g. 2, 5-8) of X pages.
And, for the resolution or size file actions of a single file, it might also provide their respective details.
Alternatively, adding generic reference points to the end of a document might also do the trick (e.g., last, blank for remaining pages, etc.). So, users could do the same thing without knowing the actual page count (e.g., last, 2-).
Thanks again for posting the workflow! It's awesome!!
Edited August 17, 2018 by Jasondm007 (typo)
xilopaint (Author) Posted August 7, 2018
Hi @Jasondm007! Thank you for the feedback. I'm a bit busy for the next few days but I'll look into your suggestions once I have some free time.
Jasondm007 Posted August 8, 2018
@xilopaint Of course! If there's anything I can do to help, just let me know. All the best
xilopaint (Author) Posted August 17, 2018 (edited)
On 8/6/2018 at 4:30 PM, Jasondm007 said: @xilopaint I love this workflow! Thanks a ton for sharing it. Quick Question: Is there any way to quickly view a selected PDF's info (e.g., total page numbers - or other info)?
No, currently there's no such feature in the workflow.
On 8/6/2018 at 4:30 PM, Jasondm007 said: For example, I love using your workflow to quickly move the first page of a PDF to the end of the document. While the Slice file action (single file) is great for this, its syntax requires knowing the total number of pages in the PDF (e.g., for a 15 page PDF, it would be 15, 1-14). Is there any way to get the page count through the workflow (kMDItemNumberOfPages - I think ...)?
The input in your example moves the last page to the beginning, not the first page to the end.
On 8/6/2018 at 4:30 PM, Jasondm007 said: If not - and, of course, assuming that you're looking for suggestions for future updates - I was thinking that it might be cool to display the relevant file information in the file action's title. So, for example, in the slice file action for a single file, it might do something like this: Current Title: Enter page numbers and/or page ranges (e.g. 2, 5-8). New Title: Enter page numbers and/or page ranges (e.g. 2, 5-8) of X pages. And, for the resolution or size file actions of a single file, it might also provide their respective details.
The file actions get user input through Keyword objects, which can't be dynamically populated. Your suggestion is a job for Script Filters, but I'm not yet convinced that the additional information would be worth it. The page count in particular can be handled more straightforwardly by the syntax.
On 8/6/2018 at 4:30 PM, Jasondm007 said: Alternatively, adding generic reference points to the end of a document might also do the trick (e.g., last, blank for remaining pages, etc.). So, users could do the same thing without knowing the actual page count (e.g., last, 2-).
Leaving the end of a range blank to mean the end of the document is a well-known syntax for page ranges. It will be implemented soon.
Edited September 5, 2018 by xilopaint
Jasondm007 Posted August 17, 2018 (edited)
@xilopaint - Oops - Let's chalk that one up to exhaustion. In the previous post, it should have said: 2-15, 1
As for the generic reference point, "blank" would be great, too. That would make the following syntax work: 2-blank, 1
In any event, I'm not sure if it's helpful (to you or anybody else), but I created a file action that counts the pages in the PDF and then copies the total to the clipboard. Since I wasn't sure if these things would be incorporated into your workflow, it made it easy for me to run this file action and then yours. I'm an Alfred/scripting neophyte, so the workflow comes with the usual caveats ...
File Action: https://cl.ly/2r3E2k3p243F
Cheers!
Edited August 17, 2018 by Jasondm007
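For readers who want the page count without opening Get Info, one possible approach is to read the kMDItemNumberOfPages Spotlight attribute mentioned earlier in the thread via the macOS `mdls` tool. The sketch below is an assumption about how that could look; it is not the linked file action, whose contents are not shown here, and the function names are hypothetical. It only works on macOS and only for files Spotlight has indexed.

```python
import subprocess


def parse_page_count(raw):
    """Convert raw mdls output to an int, or None if the attribute is missing."""
    raw = raw.strip()
    # mdls prints "(null)" when the attribute is not available for the file.
    return int(raw) if raw.isdigit() else None


def pdf_page_count(path):
    """Ask Spotlight metadata for a PDF's page count (macOS only)."""
    raw = subprocess.check_output(
        ["mdls", "-raw", "-name", "kMDItemNumberOfPages", path]
    )
    return parse_page_count(raw.decode("utf-8"))
```

Keeping the parsing separate from the `mdls` call makes the "(null)" case easy to handle before the number is used in a page range.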
xilopaint (Author) Posted August 17, 2018 (edited)
Update (v2.14)
• Improved the syntax of the Slice file action: the user can now omit the second value of a page range to indicate that it ends on the last page of the PDF file (e.g. "8-").
Edited August 17, 2018 by xilopaint
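To illustrate the open-ended range syntax, here is a minimal sketch of how an input such as "2-, 1" could be expanded into page numbers. It is not the workflow's actual implementation; the function name `parse_page_spec` and the error handling are assumptions made for the example.

```python
def parse_page_spec(spec, page_count):
    """Expand a spec such as "2-, 1" into a list of 1-based page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)
            # An omitted end ("8-") means "through the last page".
            end = int(end_text) if end_text.strip() else page_count
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError("page range out of bounds: %r" % part)
        pages.extend(range(start, end + 1))
    return pages
```

Under these assumptions, moving the first page of a 15-page PDF to the end becomes "2-, 1", with no need to know the page count in advance.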
Jasondm007 Posted August 17, 2018
@xilopaint Works like a charm! Thanks a ton!!
xilopaint (Author) Posted September 5, 2018 (edited)
Update (v2.15)
• Added a Keyword to change the value of the "suffix" environment variable.
Edited September 5, 2018 by xilopaint
xilopaint (Author) Posted May 4, 2019 (edited)
Update (v2.16)
• Fixed a bug that prevented the Crop file action from working properly on PDF files with multiple pages.
• Replaced PyPDF2 with PyPDF4.
Edited May 4, 2019 by xilopaint
evanfuchs Posted September 6, 2019
Merge Error: "Cannot merge a malformed PDF file"
I am suddenly getting this error when I attempt to merge PDF files (v2.16). Any ideas?