jmm28260 Posted January 1, 2019
I am trying to automate language recognition of OCR'd PDFs so that each file can be flagged accordingly. Would Alfred be able to handle this with a workflow? Any help is welcome. Thanks.
deanishe Posted January 1, 2019 (edited)
Do you want to manually select the files to process, or otherwise manually trigger whatever program you're using for language recognition?
jmm28260 (Author) Posted January 1, 2019
The file is handled through an Automator workflow. After OCR, I would like its language to be identified and the file flagged accordingly, so that it can later be dispatched to the appropriate folder.
deanishe Posted January 1, 2019
Then I don't really see how Alfred can help you in any way. What were you expecting Alfred to do?
jmm28260 (Author) Posted January 1, 2019
I was hoping to automate the PDF text language identification. Would AppleScript be able to do it? Thanks for your time and efforts, anyway.
deanishe Posted January 1, 2019 (edited)
17 minutes ago, jmm28260 said: "I was hoping to automate the pdf text language identification"
I understand that. I don't understand which part of that you thought Alfred might be able to do.
17 minutes ago, jmm28260 said: "Would Applescript be able to do it ?"
No. You're trying to process PDFs, so you need to start with something that can actually understand PDF files. If the file has metadata, you can extract the language from that with exiftool, for example. In a shell, that would look like:

    exiftool -t -Language /path/to/file.pdf | cut -d$'\t' -f2
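A minimal shell sketch of the exiftool approach above, assuming exiftool is installed and that the PDF actually carries a Language tag (a PDF produced by OCR may not have one). The function name `language_from_exiftool_output` and the `unknown` fallback are illustrative choices, not part of exiftool. The function reads exiftool's tab-separated output on standard input, so the parsing can be tried without a PDF at hand:

```shell
# Read `exiftool -t` output ("Tag<TAB>Value") on stdin and print the value.
# Falls back to "unknown" when exiftool printed nothing for the tag.
language_from_exiftool_output() {
    # cut splits on tabs by default, so no -d option is needed here.
    value=$(cut -f2 | head -n 1)
    printf '%s\n' "${value:-unknown}"
}

# Example with a canned line instead of a real exiftool call:
printf 'Language\ten-US\n' | language_from_exiftool_output
```

With a real file, the call would be `exiftool -t -Language /path/to/file.pdf | language_from_exiftool_output`.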
jmm28260 (Author) Posted January 1, 2019
Here is the AppleScript I use:

    try
        tell application "FineReader"
            activate
            open theFile
            tell application "System Events"
                tell process "FineReader"
                    tell menu bar 1
                        tell menu "Document"
                            tell menu item "OCR Text Recognition"
                                tell menu 1
                                    click menu item "Recognize Text Using OCR..."
                                end tell
                            end tell
                        end tell
                    end tell
                    keystroke return
                end tell
            end tell
            save the front document
            close the front document
            close application "FineReader"
        end tell
    end try

Can I add a few lines to extract the metadata and identify the language?
deanishe Posted January 1, 2019
I have no idea. You're asking questions about FineReader, so you should ask on a FineReader-related forum, where there are people who know the software.
CJK Posted January 3, 2019
On 1/1/2019 at 4:21 PM, jmm28260 said: "Here is the Applescript I use:" [script quoted from the post above]

Please don't nest tell application blocks inside each other when you don't need to (which is basically never). It's bad in many ways. You should group things together so that the stuff FineReader does goes only inside a tell app "FineReader" block. The System Events stuff goes separately inside its own block, etc.

Anyway, you might not need to use that script. You said you were doing this in Automator. Automator has an action specifically for extracting text from a PDF file: Extract PDF Text.

Once you have plain text, detecting the language with AppleScript is pretty easy. Say you store your string in an AppleScript variable called input_string; the following code will then return the language code for what it believes to be the most likely dominant language of the string (e.g. "en" for English, "de" for German, etc.):

    use framework "Foundation"

    property this : a reference to current application
    property NSLinguisticTagger : a reference to NSLinguisticTagger of this

    NSLinguisticTagger's dominantLanguageForString:input_string
    result as text
jmm28260 (Author) Posted January 3, 2019
Thank you so much for your time and efforts. I tried your AppleScript on a PDF file, and it didn't work. I guess I probably messed it up. What I am trying to do is have the workflow read the page, get the title, and append the result of your AppleScript to it, i.e. Title +EN if the text is in English, or +FR if it is in French, for instance. At the following link you will find the messed up Automator workflow. If you have a minute, can you help? Many thanks.
https://www.dropbox.com/sh/uno2vet0doe7t0b/AAAlwDqqlzkeOMT0b7dNIEOTa?dl=0
deanishe Posted January 3, 2019
13 hours ago, CJK said:

    use framework "Foundation"

    property this : a reference to current application
    property NSLinguisticTagger : a reference to NSLinguisticTagger of this

    NSLinguisticTagger's dominantLanguageForString:input_string
    result as text

This is extremely cool. I had no idea it's so easy to use macOS's language identification.
deanishe Posted January 3, 2019
6 hours ago, jmm28260 said: "I guess I probably messed it up"
Yeah, it's a bit of a mess. You keep setting Titre to various filepaths, not the PDF's title, and you're using CJK's AppleScript incorrectly. You can't just copy-and-paste it (the variable input_string doesn't exist). You have to read the contents of the text file you create and pass that to the script. I tried to fix it myself, but I couldn't, because I'm only able to use programming languages that aren't completely stupid, while this requires AppleScript. Perhaps CJK can fix it.
CJK Posted January 4, 2019
21 hours ago, jmm28260 said: "At the following link you will find the messed up Automator workflow. If you have a minute, can you help ?"

Well, that took much more than just a minute. Partly because your workflow was just kinda ugh, partly because Objective-C is really bloody annoying on occasion, and partly because the AppleScript needed to be rewritten to cope with multiple file inputs, since Automator can't do repeat loops by itself.

Here's a screenshot of the Automator workflow, which now only has four actions:

The modified AppleScript for use in the Run AppleScript action is below. The screenshot and script are largely for the benefit of anyone viewing this post at a later date, so that they can piece together the workflow themselves: I still haven't set up a base for storing permanent links to fileshares, so this one will be temporary:

Append Language To Name of PDF File.workflow.zip

    use framework "Foundation"

    property this : a reference to current application
    property NSFileManager : a reference to NSFileManager of this
    property NSLinguisticTagger : a reference to NSLinguisticTagger of this
    property NSString : a reference to NSString of this
    property nil : a reference to missing value

    on run [fs, null]
        script fileURLs
            property list : fs
        end script

        set FileManager to NSFileManager's defaultManager()

        repeat with f in the list of fileURLs
            try
                set lang to "_" & ((NSLinguisticTagger's ¬
                    dominantLanguageForString:(NSString's ¬
                        stringWithContentsOfURL:f)) as text)

                set basename to (NSString's stringWithString:(f's ¬
                    POSIX path))'s stringByDeletingPathExtension()

                set oldname to (basename's ¬
                    stringByAppendingPathExtension:"pdf")

                set newname to ((basename's ¬
                    stringByAppendingString:lang)'s ¬
                    stringByAppendingPathExtension:"pdf")

                tell the FileManager to moveItemAtPath:oldname ¬
                    toPath:newname |error|:nil -- Rename PDF file

                tell the FileManager to trashItemAtURL:f ¬
                    resultingItemURL:nil |error|:nil -- Delete text file
            end try
        end repeat
    end run
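The renaming step in the script above comes down to path arithmetic: drop the extension from the text file's path, then build the old and new PDF names from what remains. A small shell sketch of the same arithmetic can help when checking which names to expect. `pdf_names_for` is a hypothetical helper, the example path is made up, the path is assumed to end in an extension such as .txt, and the language code is passed in rather than detected:

```shell
# Print the PDF path the script expects to find, then the path it would
# rename that PDF to, for a given text file path and language code.
pdf_names_for() {
    txt_path=$1
    lang=$2
    base=${txt_path%.*}                  # strip the final extension (".txt")
    printf '%s\n' "$base.pdf"            # existing PDF (oldname in the script)
    printf '%s\n' "${base}_${lang}.pdf"  # renamed PDF (newname in the script)
}

# Hypothetical example path:
pdf_names_for "/Users/me/OCR/invoice.txt" fr
```

For the example call this prints `/Users/me/OCR/invoice.pdf` followed by `/Users/me/OCR/invoice_fr.pdf`, which shows that the PDF is expected to sit in the same folder as the text file.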
jmm28260 (Author) Posted January 4, 2019
Wow! You've done a terrific job! Thanks so much; I really appreciate your help. I've run the workflow, it works fine, but unfortunately, the name comes out unchanged, without the language specification. I'm sure you've checked it, so is there something I missed?
jmm28260 (Author) Posted January 4, 2019
Here is the message I get when I try to run it:
deanishe Posted January 4, 2019
Ah crap. I was hoping to learn how to read files in AppleScript (which was the bit that had me tearing my hair out), but you used ObjC instead. Still, that's also a perfectly fine solution.
CJK Posted January 5, 2019
19 hours ago, jmm28260 said: "it works fine, but unfortunately, the name comes out unchanged, without the language specification. I'm sure you've checked it, so is there something I missed ?"

As a tip to help get your problems solved faster, it's typically not very useful when you state that something doesn't work, then ask if I know what's wrong. I need a lot more information than that to be able to diagnose the potential issues.

The dialog, whilst a sensible thing to screenshot and share, is sadly not especially helpful in this instance, but that's Automator's fault, not yours. It gives the vaguest errors, by informing you that there's a problem, and then assuming we all love needles in haystacks.

What you should do first is edit the AppleScript and remove the line that says try, and remove the line that says end try. Then don't forget to press the hammer icon button, which recompiles the AppleScript code (do this whenever you make an edit to the code and before you run the workflow again).

Then, what I need to know are the following:
- What version of macOS are you running?
- What happens inside Automator itself when you run the workflow?
- Which actions are completed successfully, with green ticks by them?
- Which actions fail to complete, particularly which one fails first?
- Of the actions that complete with green ticks, do any of them have an empty output in the results section?
- Did you set the value of the directory variable to an appropriate value?
- Did you change the location where the text files generated from the PDFs get saved?
jmm28260 (Author) Posted January 5, 2019
My macOS version is 10.14.2. I removed the lines with try and end try and followed all your instructions. All actions get results, but they all show the .txt file. No PDF file appears in any action's results except for the first one. And when the workflow is completed, I don't get any PDF file in the directory with the language in its name. The last result is the .txt file without mention of the language. I did not change anything, except for the directory and they keep the same location. Hope this helps.
CJK Posted January 7, 2019
@jmm28260 OK, thanks for this. Would you mind uploading your workflow as it is, so I can take a look at it on my system? Otherwise we could spend days going back and forth here. I think that'll be easiest for us both.
jmm28260 (Author) Posted January 7, 2019
Here is the link: https://www.dropbox.com/sh/t0kw204fgnc5948/AACZn0__tuEwjFfHXvWzyzmHa?dl=0
And again, many thanks for your time and efforts. I really appreciate your assistance.
CJK Posted January 7, 2019
On 1/5/2019 at 10:39 AM, jmm28260 said: "I did not change anything, except for the directory and they keep the same location."

So that was a lie: in the first action, you have the search folder set to "1. OCR". In the second action, you changed the output folder to Desktop.

Now compare it with the workflow that I sent to you: both the search directory and the output directory are the same, and I created a dedicated variable that holds the path. It's called directory, and it's stored with the Automator variables that are accessible at the bottom of the window by clicking one of the buttons down there. Double-click the directory variable and you can set the path there, which sets both the search path and the output path.

[ PS. You can actually delete the Set Variable action and fileList variable. That variable doesn't end up being used, so it and the action that creates it are superfluous. ]
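Going by the script posted earlier in the thread, the two folders need to match because each PDF is looked up right next to its extracted text file, under the same name with a .pdf extension. A quick shell sketch, under that assumption, lists text files that have no matching PDF beside them; `missing_pdfs` is a hypothetical helper name, not something from the workflow:

```shell
# Print every .txt file in the given folder that has no same-named .pdf
# beside it. For those files the rename step would find nothing to rename.
missing_pdfs() {
    for txt in "$1"/*.txt; do
        [ -e "$txt" ] || continue          # the folder holds no .txt files
        [ -f "${txt%.*}.pdf" ] || printf '%s\n' "$txt"
    done
}

# Usage: missing_pdfs "/path/to/folder"
```

If the text files are written to the Desktop while the PDFs stay in another folder, every text file would be listed.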
jmm28260 (Author) Posted January 7, 2019
Sorry for that. My mistake, I sent you the wrong workflow. But the problem remains, as you can see:
CJK Posted January 7, 2019
5 hours ago, jmm28260 said: "Sorry for that. My mistake, I sent you the wrong workflow. But the problem remains as you can see:"
No, I can't see, because you cut off the top of the workflow, so only the output folder was visible. Anyway, let's try again. Please upload your correct workflow, and I'll have another look.