AlfredOCR - Optical Character Recognition

zeitlings · March 2, 2023

Alfred OCR

I noticed that Apple's Vision framework finally produces some usable results.

This means: No external dependencies are required to perform the OCR.

Alfred OCR Light

The workflow allows you to copy text from images using optical character recognition.

Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard.

Alfred OCR+

The workflow allows you to copy text from images, or to convert PDF files into searchable PDF documents
using optical character recognition, and to apply compression to PDF documents.

1 / Snapshot
Take a snapshot with your mouse or trackpad to automatically copy the recognized text to the clipboard.

Default shortcut: ⌘+⇧+6
Default keyword: ocr

2 / PDF Document

To convert a PDF into a searchable PDF document, pass it to the workflow’s Universal Action.
- To compress the resulting PDF, pass the source document on while pressing the ⌘+⇧ keys.
- To open the resulting PDF, pass the source document on while pressing the ⌥+⇧ keys.
- To force the replacement of a source document, pass it on while pressing the ⌥+⌘ keys.
To compress a PDF without performing OCR, pass it to the Compress PDF Document File Action.
To view the progress tracker, invoke the workflow again with the keyword (default: ocr).

Configuration

To open the OCR Workflow Configuration, type the keyword preceded by a colon (default: ocr).

Languages

Specify the languages you want the OCR process to consider by adding the appropriate RFC 5646 language tags. The following languages (and regions) are currently supported: en-US, fr-FR, it-IT, de-DE, es-ES, pt-BR, zh-Hans, zh-Hant, yue-Hans, yue-Hant, ko-KR, ja-JP, ru-RU, uk-UA

Explanations:

en-US: (English as used in the United States)
de-DE: (German as used in Germany)
fr-FR: (French as used in France)
it-IT: (Italian as used in Italy)
es-ES: (Spanish as used in Spain)
pt-BR: (Portuguese as used in Brazil)
ko-KR: (Korean as used in South Korea)
uk-UA: (Ukrainian as used in Ukraine)
ja-JP: (Japanese as used in Japan)
ru-RU: (Russian as used in Russia)
yue-Hant: (Traditional Cantonese)
yue-Hans: (Simplified Cantonese)
zh-Hant: (Traditional Chinese)
zh-Hans: (Simplified Chinese)
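
As a rough illustration of how a configured language list could be checked against the supported tags, here is a minimal bash sketch. It is not the workflow's actual code: the function names `is_supported_tag` and `filter_tags` are hypothetical, and the comma-separated input format is an assumption. Japanese is written as ja-JP, the standard language-region form.

```shell
#!/bin/bash
# Supported tags, mirroring the list in the configuration notes.
SUPPORTED_TAGS="en-US fr-FR it-IT de-DE es-ES pt-BR zh-Hans zh-Hant yue-Hans yue-Hant ko-KR ja-JP ru-RU uk-UA"

# Return 0 if the given tag is in the supported list, 1 otherwise.
is_supported_tag() {
  local tag
  for tag in $SUPPORTED_TAGS; do
    [ "$tag" = "$1" ] && return 0
  done
  return 1
}

# Print the supported tags from a comma-separated list, one per line.
# Unknown tags are silently dropped. (The input format is an assumption.)
filter_tags() {
  printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | while IFS= read -r tag; do
    if [ -n "$tag" ] && is_supported_tag "$tag"; then
      printf '%s\n' "$tag"
    fi
  done
}
```

Calling `filter_tags 'en-US, xx-XX, de-DE'` would print en-US and de-DE on separate lines and skip the unknown tag.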

Change Log

v1.4.0 (OCR+)

Added bitmap compression and compression facets
Added embedding strategy options
- "Word Granularity" attempts to embed the text word by word
- "Line Oriented" is the strategy previously used (use if you encounter unexpected results)
Improved OCR embedding granularity
Fixed 'Recognizer init' error

v1.3.0 (OCR+)

Added PDF compression
Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
Added a configuration option to open converted PDFs in the default application
Added a configuration option to specify how text should be joined when taking a snapshot
Added a File Action to compress PDF documents
Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
Changed the way an export strategy is specified by using a pop-up selection box
Improved performance

v1.2.3 (OCR+)

Fixed an error thrown due to missing workflow cache directory
Fixed snapshot tasks queuing up if they are started before the previous task has finished
Added explicit opt-out of Snapshot tasks while PDF conversions are running

v1.2.2 (OCR+)

Fixed low contrast output images produced for some PDF documents
Added progress tracker for the document recognition process
Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
Added new icons
Improved output file size for PDF documents that do not already contain text

v1.2.0 (OCR Light)

Add File Action to extract text from images
Fix for macOS Sonoma (Compiles the script en passant to compensate for the failure to link objc symbols on macOS 14).

v1.1.1 (OCR Light)

Adds if #available check to accommodate macOS 12.0

v1.1.0 (OCR Light)

Updated configuration and documentation
Added new icon

Edited November 10, 2023 by zeitlings
v1.2.0 OCR Light

vitor · March 2, 2023

Nice! This would be a quick and useful one to add to the Gallery. Just two notes:

The icon is very low resolution; it even looks pixelated in the editor. When exporting from SF Symbols you can pick the size, and the recommended size for workflows is 256x256px.

Your repo is quite organised, but has workflows which can go in the Gallery (like this one) and others which cannot (like the dictionary workflow, due to the unsigned binary). That is OK, but because they are shared as releases (as opposed to files in the repo) it becomes harder to check for updates because GitHub only provides a unified releases feed. The more you release, the more difficult it’ll become to separate them. To be clear, posting to releases is the preferred method, just not when the repo has many unrelated workflows.

Would you consider having them on their own repo, or having some files checked-in which are modified when the corresponding workflow is updated, for example? Basically the idea is to provide something which can be checked for changes.

Also, may I recommend adding a Hotkey Trigger? I can see myself adding ⌘⇧6 as a natural shortcut to this one.

zeitlings · March 3, 2023

Sure, ⌘⇧6 feels like a natural extension. There's already an updated version that also bundles a higher resolution icon. As for the dictionary workflow, it works by calling some cryptic API endpoints that are only accessible via Objective-C. Unfortunately, there is no way to do this in plain Swift that I know of.

I'll send you a message about the rest.

Edited March 3, 2023 by zeitlings

xilopaint · March 5, 2023

On 3/2/2023 at 1:27 PM, zeitlings said:

I noticed that Apple's Vision framework finally produces some usable results. This means: OCR without external dependencies!

AlfredOCR

Description: The workflow allows you to copy text from images using optical character recognition. Take a snapshot with your mouse or trackpad and the recognized text is copied to the clipboard. No external dependencies are required to perform the OCR.

‣ Download on Github

Nice workflow. Would it be possible to make it work in PDF files via a File Action?

zeitlings · March 5, 2023

4 hours ago, xilopaint said:

Nice workflow. Would it be possible to make it work in PDF files via a File Action?

I guess so. My first experiments, from which the workflow is derived, were actually with PDF documents. I'll play around with that sometime.

zeitlings · March 5, 2023

Ok, here's a follow-up. I was thinking about converting PDFs to searchable PDFs by embedding a hidden text layer. Turns out PDFKit doesn't provide any access to the underlying PDF content streams at all, and no alternative way to embed text layers. At best, the information can be inserted as annotations, which are not embedded statically, but as objects that you can change at will.

This is rather annoying, because the Preview app shows that PDFKit is very much capable of embedding text layers. Example: When you open a PDF with no text or an image in the Preview app, the "Live Text" feature lets you select and copy recognized text as if OCR had been fully performed. When exporting the PDF you can even enable "Embed Text", which does exactly what we're trying to accomplish here. (And they do sell it as a feature of PDFKit).

Anyway, as it stands now, it's a convoluted process I haven't made sense of yet.

Pulling the plain text out of PDFs without an OCR layer isn't a problem, though. But I'm not convinced how useful that is ¯\_(ツ)_/¯

sepulchra · March 6, 2023

This would be a great addition if it was possible and thank you for the super useful workflow in the meantime.

I've found this command-line tool really useful for OCR on existing PDFs and have Alfred set up to trigger it with a workflow, but it would obviously be far more convenient if PDFKit were able to do the work instead.

zeitlings · March 7, 2023

😱 Try this!

I managed to get some acceptable results. The internal font handling and bounding box scaling work with some heuristics for now, though.

Also, since there's no progress tracking and the code is completely synchronous, it's best to test the workflow on small documents. Still, the debugger will log some landmarks that you can review after the fact.

@sepulchra You're welcome 🤗 "OCRmyPDF" will most likely give you better results, and should probably remain your go-to if it is already set up. But at least here's a few steps towards a native solution 😁.

sepulchra · March 7, 2023

Hey, this is great. Would it be possible to add a modifier that gives the option of overwriting the existing file instead of exporting to another location?

Edited March 7, 2023 by sepulchra

xilopaint · March 10, 2023

On 3/7/2023 at 2:34 PM, zeitlings said:

😱 Try this!

Wow! This is impressive and promising!

Would you consider adding a suffix to the name of the OCR’d document? I think it could be an option in the User Configuration. Also, I think /tmp is not a good export location. I’d suggest using ~/Desktop. That’s what ~/Desktop is for, so the user can later decide where to put the file.

Edited March 10, 2023 by xilopaint

zeitlings · April 6, 2023

@sepulchra yes! 😁

@xilopaint yes! 😁

I have given the project a little more attention, i.e. fixed some bugs, implemented some optimizations and added a proper progress tracking system.

There is now a light version to keep the binary-free workflow alive, and a plus version that adds the PDF processing to it.

v1.2.2 (OCR+)

Fixed low contrast output images produced for some PDF documents
Added progress tracker for the document recognition process
Added three options to handle document output: export to location, copy to same location and replace. Priority behavior: Replace > Copy > Export.
Added new icons
Improved output file size for PDF documents that do not already contain text
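
The priority rule for the three output options (Replace > Copy > Export) can be pictured with a small bash sketch. This is only an illustration under stated assumptions, not the code the workflow ships: the function name `resolve_output_path`, the 1/0 flags, and the "_ocr" suffix used for the copy are all hypothetical.

```shell
#!/bin/bash
# Decide where the converted PDF would be written.
# Arguments: source path, export directory, replace flag (1/0), copy flag (1/0).
# Priority: Replace > Copy > Export.
resolve_output_path() {
  local src=$1 export_dir=$2 replace=$3 copy=$4
  local name
  name=$(basename "$src")
  if [ "$replace" = "1" ]; then
    # Replace: overwrite the source document in place.
    printf '%s\n' "$src"
  elif [ "$copy" = "1" ]; then
    # Copy: write next to the source. The "_ocr" suffix is a made-up
    # convention so the copy does not collide with the source.
    printf '%s/%s_ocr.pdf\n' "$(dirname "$src")" "${name%.pdf}"
  else
    # Export: write to the configured export location.
    printf '%s/%s\n' "$export_dir" "$name"
  fi
}
```

With both flags set, `resolve_output_path /docs/a.pdf /out 1 1` resolves to the source path itself, because Replace wins over Copy.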

xilopaint · April 7, 2023

On 4/6/2023 at 8:13 PM, zeitlings said:

@xilopaint yes! 😁

The new version doesn't work for me. It fails when checking for a path that does not exist:

[02:20:57.247] Optical Character Recognition[Universal Action] Processing complete
[02:20:57.254] Optical Character Recognition[Universal Action] Passing output '/Users/xxxx/Desktop/test.pdf' to Arg and Vars
[02:20:57.257] Optical Character Recognition[Arg and Vars] Processing complete
[02:20:57.258] Optical Character Recognition[Arg and Vars] Passing output '' to Run Script
[02:20:57.369] STDERR: Optical Character Recognition[Run Script] 2023-04-07 02:20:57.365 ocr[2995:28153] The folder “progress.txt” doesn’t exist.
[02:20:57.376] Optical Character Recognition[Run Script] Processing complete
[02:20:57.378] Optical Character Recognition[Run Script] Passing output '- Error Domain=NSCocoaErrorDomain Code=4 "The folder “progress.txt” doesn’t exist." UserInfo={NSFilePath=/Users/xxxx/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr/progress.txt, NSUserStringVariant=Folder, NSUnderlyingError=0x600000e7a760 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} #0
  - super: NSObject
' to Conditional
[02:20:57.379] Optical Character Recognition[Conditional] Processing complete
[02:20:57.380] Optical Character Recognition[Conditional] Passing output '- Error Domain=NSCocoaErrorDomain Code=4 "The folder “progress.txt” doesn’t exist." UserInfo={NSFilePath=/Users/xxxx/Library/Caches/com.runningwithcrayons.Alfred/Workflow Data/com.zeitlings.ocr/progress.txt, NSUserStringVariant=Folder, NSUnderlyingError=0x600000e7a760 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}} #0
  - super: NSObject
' to Post Notification

I’ve also tried to create progress.txt myself. The workflow ran with no errors but the OCR’d PDF file has not been created.

Edited April 14, 2023 by xilopaint

zeitlings · April 7, 2023

Ah, thanks for reporting that! Looks like the cache folder (where the temporary progress file lives) has to be manually created first.

If you'd like to run a quick test, you can replace the following line in both the Run Script and Script Filter objects.

if [[ -f "${alfred_workflow_cache}/progress.txt" || -v pdf_path ]]; then

With

[[ -d "${alfred_workflow_cache}" ]] || mkdir -p "${alfred_workflow_cache}"
if [[ -f "${alfred_workflow_cache}/progress.txt" || -v pdf_path ]]; then

(And make sure to delete the manually created progress.txt again. If it exists, the workflow assumes that an OCR job is running.)
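
To make the role of progress.txt a bit more explicit, here is a minimal bash sketch of the two checks involved: creating the cache directory if it is missing, and treating the presence of the progress file as "a job is running". The function names `ensure_cache_dir` and `ocr_job_running` are hypothetical and only serve the illustration.

```shell
#!/bin/bash
# Create the cache directory if it does not exist yet.
ensure_cache_dir() {
  [ -d "$1" ] || mkdir -p "$1"
}

# Exit status 0 if a progress file is present in the cache directory,
# which is taken to mean that an OCR job is currently running.
ocr_job_running() {
  [ -f "$1/progress.txt" ]
}
```

In a script object this could be used as `ensure_cache_dir "${alfred_workflow_cache}"` followed by `if ocr_job_running "${alfred_workflow_cache}"; then ...`. A leftover progress.txt makes the second check succeed even though nothing is running, which is why the manually created file needs to go.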

zeitlings · April 7, 2023

v1.2.3
• Fixed an error thrown due to missing workflow cache directory
• Fixed snapshot tasks queuing up if they are started before the previous task has finished
• Added explicit opt-out of Snapshot tasks while PDF conversions are running

TomBenz · April 8, 2023

@zeitlings Thanks for sharing your workflow. The generated file is huge: for an 80 MB PDF file created from video screenshots, it produces an OCR'd PDF file of 1.15 GB. By contrast, when I use PDF-XChange Editor on Windows, I get a similar size after OCR. For your information and review.

xilopaint · April 8, 2023

On 4/7/2023 at 7:36 AM, zeitlings said:

v1.2.3
• Fixed an error thrown due to missing workflow cache directory
• Fixed snapshot tasks queuing up if they are started before the previous task has finished
• Added explicit opt-out of Snapshot tasks while PDF conversions are running

Thanks for the fix. I'm very impressed with the code simplicity given the quality of the text recognition. We can see Apple has done a great job with the Vision framework.

Edited April 14, 2023 by xilopaint

zeitlings · April 8, 2023

4 hours ago, TomBenz said:

@zeitlings Thanks for sharing your workflow. The generated file is huge: for an 80 MB PDF file created from video screenshots, it produces an OCR'd PDF file of 1.15 GB. By contrast, when I use PDF-XChange Editor on Windows, I get a similar size after OCR. For your information and review.

It's true, the increase in file size can be significant. I did some comparisons on typical text-based PDF documents with DEVONthink's OCR, which uses ABBYY's engine in the background. In some cases, I was able to get better results in terms of file size with the workflow, so I guess the increase can hardly be avoided. ABBYY's in-house application can export PDFs by applying MRC compression to the images, which is great and would be awesome to have access to. But that goes way beyond what I had in mind for the workflow 😂.

What I find interesting is that the screenshots most likely don't have any previous text embedded in them, which should allow for a more efficient way to create the new PDF pages, and that the file size still escalates as it does. There is a difference between real-world image dimensions and device-specific metrics. Perhaps what looks like a rather modest screenshot could be a 4K ∞-DPI monstrosity?

Image compression is its own subgenre, I suppose, and another rabbit hole to get lost in for sure. Maybe I'll look into it sometime in an idle moment.

You may want to take a look at @xilopaint's Alfred PDF Tools to reduce the pixel density of your document or to scale the images down before attempting the OCR. For really large documents, however, I would recommend using, and I myself will continue to use, the professional tools.

Edited April 8, 2023 by zeitlings

zeitlings · April 27, 2023

@TomBenz I just updated the workflow to include compression.

If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!
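
The "keep the uncompressed document if compression does not help" rule boils down to a size comparison. Here is a minimal bash sketch of that idea. It is not the workflow's internal code, and the function names `file_size` and `pick_smaller` are hypothetical.

```shell
#!/bin/bash
# Print the size of a file in bytes.
# (wc -c pads its output on macOS, so the spaces are stripped.)
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Print the path of the file worth keeping: the compressed one only if
# it is strictly smaller than the original, otherwise the original.
pick_smaller() {
  local original=$1 compressed=$2
  if [ "$(file_size "$compressed")" -lt "$(file_size "$original")" ]; then
    printf '%s\n' "$compressed"
  else
    printf '%s\n' "$original"
  fi
}
```

For example, `keep=$(pick_smaller ocr.pdf ocr-compressed.pdf)` would hold the uncompressed path whenever the compression step ended up inflating the file.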

I've decoupled the compression utility, which means you can now compress any PDF without having to run OCR on it. For some documents this works really well, for others it may inflate the file size even more. Which factors play a role here remains a task for future experiments to find out... There are also some internal improvements, additions to the workflow and tweaks:

v1.3.0 (OCR+)

Added PDF compression
Added a keyword for quick access to the workflow configuration (Alfred 5.1+)
Added Universal Action modifier option to apply compression to PDFs (⇧⌘)
Added Universal Action modifier option to open converted PDFs in the default application (⌥⇧)
Added a configuration option to open converted PDFs in the default application
Added a configuration option to specify how text should be joined when taking a snapshot
Added a File Action to compress PDF documents
Changed the modifier keys to replace a PDF and added noticeable visual cues (⌥⌘)
Changed the way an export strategy is specified by using a pop-up selection box
Improved performance

Edited April 27, 2023 by zeitlings

xilopaint · April 27, 2023

14 hours ago, zeitlings said:

@TomBenz I just updated the workflow to include compression.

If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!

I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

What kind of compression are you doing, btw? Is it similar to the work of my workflow’s optimize action you mentioned earlier?

Edited April 28, 2023 by xilopaint

TomBenz · April 28, 2023

14 hours ago, zeitlings said:

@TomBenz I just updated the workflow to include compression.

If you remember the 1.2 MB sample you sent me, it blew up to 7.9 MB uncompressed. With compression, we level off at 589 KB on my machine without butchering the file 😁. However, compressing a PDF does not always result in a smaller file size, so I've decided to keep the uncompressed PDF document around if compression does not actually produce the desired result. Let me know what happens to the 1.15 GB document!

thank you @zeitlings I will test it over this weekend and update

zeitlings · April 29, 2023

On 4/27/2023 at 10:37 PM, xilopaint said:

I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

Thanks! 😁 The project is becoming more and more interesting for me as well. What I mean when I say that the compression utility is decoupled is just that I've rewritten the code internally so that it can be called independently of the OCR routine - as opposed to being an ad hoc change to the output content stream that writes the PDF to disk. It's still baked into the program.

On 4/27/2023 at 10:37 PM, xilopaint said:

What kind of compression are you doing, btw? Is it similar to the work of my workflow’s optimize action you mentioned earlier?

I've checked out what it is you're doing with your optimize action. It looks like you're just tweaking the DPI/PPI, right? (k2pdfopt <> -ui- -as -mode copy -dpi <>) Since I’ve also been checking out K2pdfopt, I came across the -bpc option, btw, which reduces the number of bits per color plane. Maybe you already know about it, but this looks like it could be interesting for your implementation 😄

I am not actually touching the DPI at the moment. I am playing around with CoreImage, Quartz Filters and Image Bitmaps.

Quartz Filters are something the Preview app also uses for its "compression", btw. You can find the macOS presets under `/System/Library/Filters`. I'm sure there is a way to apply them via the command line.

Edited April 29, 2023 by zeitlings

xilopaint · April 29, 2023

13 minutes ago, zeitlings said:

On 4/27/2023 at 5:37 PM, xilopaint said:

I'm absolutely in love with this workflow. I like everything, even the icon color tone is appealing to me. Haven't you committed the code for the compression feature @zeitlings?

Thanks! 😁 The project is becoming more and more interesting for me as well. What I mean when I say that the compression utility is decoupled is just that I've rewritten the code internally so that it can be called independently of the OCR routine - as opposed to being an ad hoc change to the output content stream that writes the PDF to disk. It's still baked into the program.

I actually asked why you didn't add that piece of code to the repository.

xilopaint · April 29, 2023

18 minutes ago, zeitlings said:

I've checked out what it is you're doing with your optimize action. It looks like you're just tweaking the DPI/ PPI, right? (k2pdfopt <> -ui- -as -mode copy -dpi <>) Since I’ve also been checking out K2pdfopt, I came across the -bpc option btw., which reduces the number of bits per color plane. Maybe you already know about it, but this looks like it could be interesting for your implementation 😄

While it can often be used this way, the optimize action is not really intended to compress PDF files, but rather to improve the readability of low-quality scanned PDFs.

zeitlings · April 29, 2023

9 minutes ago, xilopaint said:

I actually asked why you didn't add that piece of code to the repository.

The piece of code you're referring to isn't very useful on its own, and I haven't uploaded any of my more complex Xcode projects yet. There are various reasons why I haven't shared them at the moment, but I appreciate your interest.

xilopaint · April 29, 2023

2 minutes ago, zeitlings said:

The piece of code you're referring to isn't very useful on its own, and I haven't uploaded any of my more complex Xcode projects yet. There are various reasons why I haven't shared them at the moment, but I appreciate your interest.

It's up to you to decide what to do with your code, as long as you're transparent. The problem here is that you use an MIT license. When it comes to Alfred's workflows, transparency is important for security reasons, of course.
