zeitlings Posted March 2, 2023 Posted March 2, 2023 (edited)

Alfred OCR

I noticed that Apple's Vision framework finally produces some usable results. This means: No external dependencies are required to perform the OCR.

Alfred OCR Light

The workflow allows you to copy text from images using optical character recognition. Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard.

Alfred OCR+

The workflow allows you to copy text from images, to convert PDF files into searchable PDF documents using optical character recognition, and to apply compression to PDF documents.

1 / Snapshot

Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard.
• Default shortcut: ⌘+⇧+6
• Default keyword: ocr

2 / PDF Document

• To convert a PDF into a searchable PDF document, pass it to the workflow’s Universal Action.
• To compress the resulting PDF, pass the source document on while pressing the ⌘+⇧ keys.
• To open the resulting PDF, pass the source document on while pressing the ⌥+⇧ keys.
• To force the replacement of a source document, pass it on while pressing the ⌥+⌘ keys.
• To compress a PDF without performing OCR, pass it to the Compress PDF Document File Action.
• To view the progress tracker, re-enter the workflow with the keyword (default: ocr).

Configuration

To open the OCR Workflow Configuration, type the keyword preceded by a colon (default: :ocr).

Languages

Specify the languages you want the OCR process to consider by adding the appropriate RFC-5646 language tag.

The following languages (and regions) are currently supported: en-US, fr-FR, it-IT, de-DE, es-ES, pt-BR, zh-Hans, zh-Hant, yue-Hans, yue-Hant, ko-KR, ja-JP, ru-RU, uk-UA

Explanations:
• en-US: English as used in the United States
• de-DE: German as used in Germany
• fr-FR: French as used in France
• it-IT: Italian as used in Italy
• es-ES: Spanish as used in Spain
• pt-BR: Portuguese as used in Brazil
• ko-KR: Korean as used in South Korea
• uk-UA: Ukrainian as used in Ukraine
• ja-JP: Japanese as used in Japan
• ru-RU: Russian as used in Russia
• yue-Hant: Traditional Cantonese
• yue-Hans: Simplified Cantonese
• zh-Hant: Traditional Chinese
• zh-Hans: Simplified Chinese

Change Log

v1.4.0 (OCR+)
• Added bitmap compression and compression facets
• Added embedding strategy options
  • "Word Granularity" attempts to embed the text word by word
  • "Line Oriented" is the strategy previously used (use if you encounter unexpected results)
• Improved OCR embedding granularity
• Fixed 'Recognizer init' error

v1.3.0 (OCR+)
• Added PDF compression
• Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
• Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
• Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
• Added a configuration option to open converted PDFs in the default application
• Added a configuration option to specify how text should be joined when taking a snapshot
• Added a File Action to compress PDF documents
• Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
• Changed the way an export strategy is specified by using a pop-up selection box
• Improved performance

v1.2.3 (OCR+)
• Fixed an error thrown due to missing workflow cache directory
• Fixed snapshot tasks queuing up if they are started before the previous task has finished
• Added explicit opt-out of Snapshot tasks while PDF conversions are running

v1.2.2 (OCR+)
• Fixed low contrast output images produced for some PDF documents
• Added progress tracker for the document recognition process
• Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
• Added new icons
• Improved output file size for PDF documents that do not already contain text

v1.2.0 (OCR Light)
• Added a File Action to extract text from images
• Fix for macOS Sonoma (compiles the script en passant to compensate for the failure to link objc symbols on macOS 14)

v1.1.1 (OCR Light)
• Added an if #available check to accommodate macOS 12.0

v1.1.0 (OCR Light)
• Updated configuration and documentation
• Added new icon

Edited November 10, 2023 by zeitlings v1.2.0 OCR Light
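The language list in the configuration section above lends itself to a quick sanity check before tags are typed into the workflow configuration. The following is only a sketch: the helper name `unsupported_tags` and the `SUPPORTED` variable are made up for illustration and are not part of the workflow, and the comparison is case-sensitive although language tags are formally case-insensitive. The list uses ja-JP for Japanese, since JP is the region subtag for Japan.

```shell
# Tags taken from the supported-languages list in the post.
# SUPPORTED and unsupported_tags are hypothetical names, not part of the workflow.
SUPPORTED="en-US fr-FR it-IT de-DE es-ES pt-BR zh-Hans zh-Hant yue-Hans yue-Hant ko-KR ja-JP ru-RU uk-UA"

# Print every tag from a comma-separated list that is not in SUPPORTED.
# The function body runs in a subshell, so its variables stay local.
unsupported_tags() (
  for tag in $(printf '%s' "$1" | tr ',' ' '); do
    case " $SUPPORTED " in
      *" $tag "*) ;;                 # known tag, nothing to report
      *) printf '%s\n' "$tag" ;;     # unknown tag
    esac
  done
)
```

Calling `unsupported_tags 'en-US, de-DE, xx-XX'` prints only the tag that is missing from the list, which makes typos in the configuration easy to spot.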
vitor Posted March 2, 2023 Posted March 2, 2023

Nice! This would be a quick and useful one to add to the Gallery. Just two notes:

• The icon is very low resolution; it even looks pixelated in the editor. When exporting from SF Symbols you can pick the size, and the recommended size for workflows is 256x256px.
• Your repo is quite organised, but has workflows which can go in the Gallery (like this one) and others which cannot (like the dictionary workflow, due to the unsigned binary). That is OK, but because they are shared as releases (as opposed to files in the repo) it becomes harder to check for updates, because GitHub only provides a unified releases feed. The more you release, the more difficult it’ll become to separate them. To be clear, posting to releases is the preferred method, just not when the repo has many unrelated workflows. Would you consider having them in their own repos, or having some files checked in which are modified when the corresponding workflow is updated, for example? Basically the idea is to provide something which can be checked for changes.

Also, may I recommend adding a Hotkey Trigger? I can see myself adding ⌘⇧6 as a natural shortcut to this one.
zeitlings Posted March 3, 2023 Author Posted March 3, 2023 (edited)

Sure, ⌘⇧6 feels like a natural extension. There's already an updated version that also bundles a higher resolution icon. As for the dictionary workflow, it works by calling some cryptic API endpoints that are only accessible via Objective-C. Unfortunately, there is no way to do this in plain Swift that I know of. I'll send you a message about the rest.

Edited March 3, 2023 by zeitlings
xilopaint Posted March 5, 2023 Posted March 5, 2023 On 3/2/2023 at 1:27 PM, zeitlings said: I noticed that Apple's Vision framework finally produces some usable results. This means: OCR without external dependencies! AlfredOCR Description: The workflow allows you to copy text from images using optical character recognition. Take a snapshot with your mouse or trackpad and the recognized text is copied to the clipboard. No external dependencies are required to perform the OCR. ‣ Download on Github Nice workflow. Would it be possible to make it work in PDF files via a File Action?
zeitlings Posted March 5, 2023 Author Posted March 5, 2023

4 hours ago, xilopaint said: Nice workflow. Would it be possible to make it work in PDF files via a File Action?

I guess so. My first experiments, from which the workflow is derived, were actually with PDF documents. I'll play around with that sometime.
zeitlings Posted March 5, 2023 Author Posted March 5, 2023 Ok, here's a follow-up. I was thinking about converting PDFs to searchable PDFs by embedding a hidden text layer. Turns out PDFKit doesn't provide any access to the underlying PDF content streams at all, and no alternative way to embed text layers. At best, the information can be inserted as annotations, which are not embedded statically, but as objects that you can change at will. This is rather annoying, because the Preview app shows that PDFKit is very much capable of embedding text layers. Example: When you open a PDF with no text or an image in the Preview app, the "Live Text" feature lets you select and copy recognized text as if OCR had been fully performed. When exporting the PDF you can even enable "Embed Text", which does exactly what we're trying to accomplish here. (And they do sell it as a feature of PDFKit). Anyway, as it stands now, it's a convoluted process I haven't made sense of yet. Pulling the plain text out of PDFs without an OCR layer isn't a problem, though. But I'm not convinced how useful that is ¯\_(ツ)_/¯
sepulchra Posted March 6, 2023 Posted March 6, 2023

This would be a great addition if it were possible, and thank you for the super useful workflow in the meantime. I've found this command line tool really useful for OCR on existing PDFs and have Alfred set up to trigger it with a workflow, but it would obviously be far more convenient if PDFKit were able to do the work instead.
zeitlings Posted March 7, 2023 Author Posted March 7, 2023

😱 Try this!

I managed to get some acceptable results. The internal font handling and bounding box scaling work with some heuristics for now, though. Also, since there's no progress tracking and the code is completely synchronous, it's best to test the workflow on small documents. Still, the debugger will log some landmarks that you can review after the fact.

@sepulchra You're welcome 🤗 "OCRmyPDF" will most likely give you better results, and should probably remain your go-to if it is already set up. But at least here are a few steps towards a native solution 😁.
sepulchra Posted March 7, 2023 Posted March 7, 2023 (edited)

Hey, this is great. Would it be possible to add a modifier that gives the option of overwriting the existing file instead of exporting to another location?

Edited March 7, 2023 by sepulchra
xilopaint Posted March 10, 2023 Posted March 10, 2023 (edited)

On 3/7/2023 at 2:34 PM, zeitlings said: 😱 Try this!

Wow! This is impressive and promising! Would you consider adding a suffix to the name of the OCR’d document? I think it could be an option in the User Configuration. Also, I think /tmp is not a good export location. I’d suggest using ~/Desktop instead. That’s what ~/Desktop is for, so the user can later decide where to put the file.

Edited March 10, 2023 by xilopaint
zeitlings Posted April 6, 2023 Author Posted April 6, 2023

@sepulchra yes! 😁
@xilopaint yes! 😁

I have given the project a little more attention, i.e. fixed some bugs, implemented some optimizations and added a proper progress tracking system. There is now a light version to keep the binary-free workflow alive, and a plus version that adds the PDF processing to it.

v1.2.2 (OCR+)
• Fixed low contrast output images produced for some PDF documents
• Added progress tracker for the document recognition process
• Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
• Added new icons
• Improved output file size for PDF documents that do not already contain text
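The priority rule "Replace > Copy > Export" from the changelog can be sketched as a small path resolver. This is an illustration under assumptions, not the workflow's actual code: the function name `resolve_output`, the 0/1 flags, and the suffix argument (the suffix idea comes from xilopaint's request earlier in the thread) are all hypothetical.

```shell
# Resolve where a converted PDF would be written, honoring Replace > Copy > Export.
# Hypothetical arguments: source path, replace flag (0/1), copy flag (0/1),
# export directory, filename suffix.
resolve_output() (
  src=$1 replace=$2 copy=$3 export_dir=$4 suffix=$5
  name=$(basename "$src" .pdf)
  if [ "$replace" = 1 ]; then
    # Replace wins over everything: overwrite the source document.
    printf '%s\n' "$src"
  elif [ "$copy" = 1 ]; then
    # Copy: write next to the source; the suffix avoids clobbering it.
    printf '%s/%s%s.pdf\n' "$(dirname "$src")" "$name" "$suffix"
  else
    # Export: fall back to the configured export location.
    printf '%s/%s%s.pdf\n' "${export_dir%/}" "$name" "$suffix"
  fi
)
```

With both flags set, as in `resolve_output /a/b/scan.pdf 1 1 /tmp/out _ocr`, the source path itself is returned, which mirrors the stated priority order.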
xilopaint Posted April 7, 2023 Posted April 7, 2023 (edited)

On 4/6/2023 at 8:13 PM, zeitlings said: @xilopaint yes! 😁

The new version doesn't work for me. It fails when checking for a path that does not exist:

[02:20:57.247] Optical Character Recognition[Universal Action] Processing complete
[02:20:57.254] Optical Character Recognition[Universal Action] Passing output '/Users/xxxx/Desktop/test.pdf' to Arg and Vars
[02:20:57.257] Optical Character Recognition[Arg and Vars] Processing complete
[02:20:57.258] Optical Character Recognition[Arg and Vars] Passing output '' to Run Script
[02:20:57.369] STDERR: Optical Character Recognition[Run Script] 2023-04-07 02:20:57.365 ocr[2995:28153] The folder “progress.txt” doesn’t exist.
[02:20:57.376] Optical Character Recognition[Run Script] Processing complete
[02:20:57.378] Optical Character Recognition[Run Script] Passing output '- Error Domain=NSCocoaErrorDomain Code=4 "The folder “progress.txt” doesn’t exist." UserInfo={NSFilePath=/Users/xxxx/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr/progress.txt, NSUserStringVariant=Folder, NSUnderlyingError=0x600000e7a760 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} #0 - super: NSObject ' to Conditional
[02:20:57.379] Optical Character Recognition[Conditional] Processing complete
[02:20:57.380] Optical Character Recognition[Conditional] Passing output '- Error Domain=NSCocoaErrorDomain Code=4 "The folder “progress.txt” doesn’t exist." UserInfo={NSFilePath=/Users/xxxx/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr/progress.txt, NSUserStringVariant=Folder, NSUnderlyingError=0x600000e7a760 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} #0 - super: NSObject ' to Post Notification

I’ve also tried to create progress.txt myself. The workflow ran with no errors, but the OCR’d PDF file was not created.

Edited April 14, 2023 by xilopaint
zeitlings Posted April 7, 2023 Author Posted April 7, 2023

Ah, thanks for reporting that! Looks like the cache folder (where the temporary progress file lives) has to be created manually first. If you'd like to run a quick test, you can replace the following line in both the Run Script and Script Filter objects:

    if [[ -f "${alfred_workflow_cache}/progress.txt" || -v pdf_path ]]; then

with

    [[ -d "${alfred_workflow_cache}" ]] || mkdir -p "${alfred_workflow_cache}"
    if [[ -f "${alfred_workflow_cache}/progress.txt" || -v pdf_path ]]; then

(And make sure to delete the manually created progress.txt again. If it exists, the workflow assumes that an OCR job is running.)
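The guard described above can be sketched as a standalone function that runs outside Alfred as well. This is a minimal sketch, not the workflow's script: `job_pending` is a hypothetical name, the temporary fallback directory is an assumption made so the snippet runs without Alfred, and the POSIX test `${pdf_path+set}` stands in for the `-v pdf_path` check from the post.

```shell
# alfred_workflow_cache is provided by Alfred at runtime. The fallback below is
# an assumption so the sketch also runs outside Alfred.
: "${alfred_workflow_cache:=${TMPDIR:-/tmp}/ocr-cache-demo}"

# Make sure the cache directory exists, then report (via exit status) whether
# an OCR job seems to be running (progress.txt present) or a PDF was passed in
# (pdf_path is set, even if empty).
job_pending() {
  mkdir -p "$alfred_workflow_cache" || return 2   # no-op if the directory exists
  [ -f "$alfred_workflow_cache/progress.txt" ] || [ -n "${pdf_path+set}" ]
}
```

Because `mkdir -p` succeeds when the directory is already there, the separate `-d` check from the post is optional; keeping it only saves one process spawn.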
zeitlings Posted April 7, 2023 Author Posted April 7, 2023 v1.2.3 • Fixed an error thrown due to missing workflow cache directory • Fixed snapshot tasks queuing up if they are started before the previous task has finished • Added explicit opt-out of Snapshot tasks while PDF conversions are running
TomBenz Posted April 8, 2023 Posted April 8, 2023

@zeitlings Thanks for sharing your workflow. The generated file is huge: for an 80 MB PDF created from video screenshots, the workflow produces an OCR'd PDF of 1.15 GB. By comparison, when I use PDF-XChange Editor on Windows, I get a file of similar size after OCR. For your information and review.
xilopaint Posted April 8, 2023 Posted April 8, 2023 (edited)

On 4/7/2023 at 7:36 AM, zeitlings said: v1.2.3 • Fixed an error thrown due to missing workflow cache directory • Fixed snapshot tasks queuing up if they are started before the previous task has finished • Added explicit opt-out of Snapshot tasks while PDF conversions are running

Thanks for the fix. I'm very impressed with the code simplicity given the quality of the text recognition. We can see Apple has done a great job with the Vision framework.

Edited April 14, 2023 by xilopaint
zeitlings Posted April 8, 2023 Author Posted April 8, 2023 (edited)

4 hours ago, TomBenz said: @zeitlings Thanks for sharing your workflow. The file size generated is huge. For 80 MB pdf file created from video screenshots, it is giving ocr pdf file with 1.15 GB size. Otherwise, I use the PDF exchange editor on windows and get a similar size after ocr. For your kind information and review

It's true, the increase in file size can be significant. I did some comparisons with DEVONthink's OCR, which uses ABBYY's engine in the background, on ordinary text-based PDF documents. In some cases, I was able to get better results in terms of file size with the workflow, so I guess the increase can hardly be avoided. ABBYY's in-house application can export PDFs by applying MRC (Mixed Raster Content) compression to the images, which is great and would be awesome to have access to. But that goes way beyond what I had in mind for the workflow 😂.

What I find interesting is that the screenshots most likely don't have any previous text embedded in them, which should allow for a more efficient way to create the new PDF pages, and yet the file size still escalates as it does. There is a difference between real-world image dimensions and device-specific metrics. Perhaps what looks like a rather modest screenshot could be a 4K ∞-DPI monstrosity?

Image compression is its own subgenre, I suppose, and another rabbit hole to get lost in for sure. Maybe I'll look into it sometime in a moment of leisure. You may want to take a look at @xilopaint's Alfred PDF Tools to reduce the pixel density of your document or to scale the images down before attempting the OCR. For really large documents, however, I would recommend using the professional tools, and I will continue to use them myself.

Edited April 8, 2023 by zeitlings
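To put the "4K monstrosity" remark into numbers, a back-of-the-envelope calculation of the raw bitmap size per page helps. The sketch below assumes uncompressed 8-bit RGBA pixels (4 bytes each) and a 3840 x 2160 frame; both are illustrative assumptions, and a real PDF normally stores compressed image streams, so this is an upper bound rather than a prediction. The helper name `raw_bitmap_bytes` is made up for this example.

```shell
# Uncompressed size in bytes of one bitmap: width x height x bytes per pixel.
# The third argument defaults to 4 (8-bit RGBA), which is an assumption.
raw_bitmap_bytes() {
  echo $(( $1 * $2 * ${3:-4} ))
}

raw_bitmap_bytes 3840 2160   # prints 33177600, roughly 33 MB for one 4K frame
```

At that rate a few dozen uncompressed frames already exceed a gigabyte, so how the page images are re-encoded matters far more than the added text layer.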
zeitlings Posted April 27, 2023 Author Posted April 27, 2023 (edited)

@TomBenz I just updated the workflow to include compression. If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!

I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out...

There are also some internal improvements, additions to the workflow and tweaks:

v1.3.0 (OCR+)
• Added PDF compression
• Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
• Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
• Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
• Added a configuration option to open converted PDFs in the default application
• Added a configuration option to specify how text should be joined when taking a snapshot
• Added a File Action to compress PDF documents
• Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
• Changed the way an export strategy is specified by using a pop-up selection box
• Improved performance

Edited April 27, 2023 by zeitlings
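The fallback described in the post, keeping the uncompressed PDF when compression does not shrink the file, boils down to a size comparison. The following is a sketch of that idea only, not the workflow's implementation (which lives in a compiled binary); `file_size` and `keep_smaller` are hypothetical helper names and the paths are placeholders.

```shell
# Size of a file in bytes. wc -c works on both macOS and Linux; tr strips the
# padding that some wc implementations add.
file_size() { wc -c < "$1" | tr -d ' '; }

# Keep the compressed candidate only if it is strictly smaller than the
# original; otherwise discard it. Prints which version was kept.
keep_smaller() (
  original=$1 candidate=$2
  if [ "$(file_size "$candidate")" -lt "$(file_size "$original")" ]; then
    mv -f "$candidate" "$original" && echo compressed
  else
    rm -f "$candidate" && echo uncompressed
  fi
)
```

A call such as `keep_smaller out.pdf out.compressed.pdf` leaves exactly one file behind, so a compression pass can never make the result larger than the uncompressed output.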
xilopaint Posted April 27, 2023 Posted April 27, 2023 (edited)

14 hours ago, zeitlings said: @TomBenz I just updated the workflow to include compression. If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document! I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out... There are also some internal improvements, additions to the workflow and tweaks: v1.3.0 (OCR+) Added PDF compression Added a keyword for quick access to the workflow configuration (Alfred 5.1+) Added Universal Action modifier option to apply compression to PDFs (⇧⌘) Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧) Added a configuration option to open converted PDFs in the default application Added a configuration option to specify how text should be joined when taking a snapshot Added a File Action to compress PDF documents Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘) Changed the way an export strategy is specified by using a pop-up selection box Improved performance

I'm absolutely in love with this workflow. I like everything; even the icon color tone is appealing to me. Haven't you committed the code for the compression feature, @zeitlings?

What kind of compression are you doing, btw? Is it similar to what my workflow’s optimize action, which you mentioned earlier, does?

Edited April 28, 2023 by xilopaint
TomBenz Posted April 28, 2023 Posted April 28, 2023

14 hours ago, zeitlings said: @TomBenz I just updated the workflow to include compression. If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document! I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out... There are also some internal improvements, additions to the workflow and tweaks: v1.3.0 (OCR+) Added PDF compression Added a keyword for quick access to the workflow configuration (Alfred 5.1+) Added Universal Action modifier option to apply compression to PDFs (⇧⌘) Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧) Added a configuration option to open converted PDFs in the default application Added a configuration option to specify how text should be joined when taking a snapshot Added a File Action to compress PDF documents Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘) Changed the way an export strategy is specified by using a pop-up selection box Improved performance

Thank you, @zeitlings. I will test it over this weekend and post an update.
zeitlings Posted April 29, 2023 Author Posted April 29, 2023 (edited)

On 4/27/2023 at 10:37 PM, xilopaint said: I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

Thanks! 😁 The project is becoming more and more interesting for me as well. What I mean when I say that the compression utility is decoupled is just that I've rewritten the code internally so that it can be called independently of the OCR routine, as opposed to being an ad hoc change to the output content stream that writes the PDF to disk. It's still baked into the program.

On 4/27/2023 at 10:37 PM, xilopaint said: What kind of compression are you doing, btw? Is it similar to the work of my workflow’s optimize action you mentioned earlier?

I've checked out what it is you're doing with your optimize action. It looks like you're just tweaking the DPI/PPI, right? (k2pdfopt <> -ui- -as -mode copy -dpi <>) Since I’ve also been checking out K2pdfopt, I came across the -bpc option, btw, which reduces the number of bits per color plane. Maybe you already know about it, but this looks like it could be interesting for your implementation 😄

I am not actually touching the DPI at the moment. I am playing around with CoreImage, Quartz Filters and Image Bitmaps. Quartz Filters are something the Preview app also uses for its "compression", btw. You can find the macOS presets under `/System/Library/Filters`. I'm sure there is a way to apply them via the command line.

Edited April 29, 2023 by zeitlings
xilopaint Posted April 29, 2023 Posted April 29, 2023 13 minutes ago, zeitlings said: On 4/27/2023 at 5:37 PM, xilopaint said: I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings? Thanks! 😁 The project is becoming more and more interesting for me as well. What I mean when I say that the compression utility is decoupled is just that I've rewritten the code internally so that it can be called independently of the OCR routine - as opposed to being an ad hoc change to the output content stream that writes the PDF to disk. It's still baked into the program. I actually asked why you didn't add that piece of code to the repository.
xilopaint Posted April 29, 2023 Posted April 29, 2023 18 minutes ago, zeitlings said: I've checked out what it is you're doing with your optimize action. It looks like you're just tweaking the DPI/ PPI, right? (k2pdfopt <> -ui- -as -mode copy -dpi <>) Since I’ve also been checking out K2pdfopt, I came across the -bpc option btw., which reduces the number of bits per color plane. Maybe you already know about it, but this looks like it could be interesting for your implementation 😄 While it can often be used this way, the optimize action is not really intended to compress PDF files, but rather to improve the readability of low-quality scanned PDFs.
zeitlings Posted April 29, 2023 Author Posted April 29, 2023 9 minutes ago, xilopaint said: I actually asked why you didn't add that piece of code to the repository. The piece of code you're referring to isn't very useful on its own, and I haven't uploaded any of my more complex Xcode projects yet. There are various reasons why I haven't shared them at the moment, but I appreciate your interest.
xilopaint Posted April 29, 2023 Posted April 29, 2023

2 minutes ago, zeitlings said: The piece of code you're referring to isn't very useful on its own, and I haven't uploaded any of my more complex Xcode projects yet. There are various reasons why I haven't shared them at the moment, but I appreciate your interest.

It's up to you to decide what to do with your code, as long as you're transparent. The problem here is that you use an MIT license. When it comes to Alfred workflows, transparency is important for security reasons, of course.