CJK

gr8 reacted to a post in a topic: Can Alfred search the Mac Keychain December 2, 2020

aleone89 reacted to a post in a topic: Mapping arguments in Workflows May 15, 2019

CJK changed their profile photo February 25, 2019

manavortex reacted to a post in a topic: Edit clipboard before pasting January 23, 2019

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

That's good to hear. However, it was still an error that occurred in one file, so it does require fixing, as there may be other PDFs that have blank pages. I've updated the file on GitHub with a fix for that, and a couple of other minor adjustments that I also realised were potential weak spots in those sorts of one-off situations.

No idea, I'm afraid. That one will be down to the language tagging function, which isn't something I'm able to change. Mojave does actually have a newer language detection class called NLLanguageRecognizer, but I'm uncertain what the differences are between that and the one my script uses. I am running High Sierra, so I don't have access to the newer one in order to try it out.

Without knowing the content of the PDFs that you were expecting to be tagged as being Italian, I can't judge what would be worth investigating. But you could always try increasing the number of pages that are sampled in each PDF. There's a line in the first section of the script where all the properties are declared:

    property samples : 4

If you increase this number, the script will use more pages to identify the language.

Just in case you're interested, the current method of decision making for the language is done by detecting the language on each sampled page. If different pages produce different results, then the script chooses the language that was tagged in the greatest number of pages from the sample. So if your PDF had a mixture of Italian and English in it, and the sample pages happened to feature more densely occurring English than Italian, it would tag the file overall as English.

The other way I contemplated doing it was combining the sample pages together into one long piece of text, and detecting the language of that as if it were a single page. That would always produce one affirmative result.

Then, of course, the last possibility is to remove the sampling altogether and just assess the entire PDF file in one go. However, I avoided that because some of my PDFs are quite large, and if the script had to convert many PDFs each with hundreds of pages, that's obviously going to slow it down compared to just sampling 4 pages from each PDF, without necessarily producing better results.

Anyway, if you have a particular preference over which method should be used, just let me know and I can change the implementation quite easily.
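For anyone who wants to see the per-page voting in isolation, here is a minimal sketch of it. It is written in Python rather than AppleScript purely so it can be run anywhere, and it is not the script's actual code: the function name `majority_language` and the list of per-page language codes are made up for illustration, and pages with no detected language are represented as `None`.

```python
from collections import Counter

def majority_language(page_langs):
    """Return the language code detected on the most sampled pages.

    `page_langs` holds one detected code per sampled page; None stands for
    a page where no language was detected (for example, a blank page).
    """
    counts = Counter(lang for lang in page_langs if lang is not None)
    if not counts:
        return None  # nothing was detected on any sampled page
    return counts.most_common(1)[0][0]

# Hypothetical sample: three pages tagged Italian, two tagged English
print(majority_language(["en", "it", "en", "it", "it"]))
```

With a mixed document, the result depends entirely on which pages end up in the sample, which is the weakness described above.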

jmm28260 reacted to a post in a topic: Workflow to identify language in pdf January 22, 2019

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

Would you mind doing me a favour and just sending me the PDFs that you are using for testing purposes? That way, I can see what sorts of content you're dealing with. This latest error is very similar in nature to the one affecting the Title of the PDF document, but it seems odd that it would arise when dealing with a page from the PDF. It implies that there was no language detected, which in turn implies the page had no text. I can perhaps believe this might be the case for one random PDF file, but if you're presumably testing different files, then the others ought to work.

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

Why did you post the same problem from earlier? That one has been dealt with. You've posted a screenshot that shows you're using an older version of the script.

CJK reacted to a post in a topic: [Solved] Output of Run Script (osascript) action terminates with line break January 22, 2019

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

Change these lines:

    set PDFTitleLang to (NSLinguisticTagger's ¬
        dominantLanguageForString:PDFTitle)
    langs's setValue:1 forKey:PDFTitleLang

to this:

    tell PDFTitle to if missing value ≠ it then tell ¬
        (NSLinguisticTagger's dominantLanguageForString:it) to if ¬
        missing value ≠ it then langs's setValue:1 forKey:it

File updated on GitHub.

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

*Sigh* My guess here is that the property names of the PDF's attributes are specific to one's locale, and so your metadata item that holds the document's title is likely not called Title, but possibly Titre. To confirm my hunch, you could change this line:

    set PDFTitle to the PDF's documentAttributes()'s |Title| as text

to this:

    error PDF's documentAttributes() as record

which will show you what the attribute names and values are in the error dialog that pops up.

It's also possible that the specific metadata item for the document title wasn't filled in for the particular document you're testing the script on, as they are optional. In both of these scenarios, the fix should be to change the line I just highlighted above to this:

    set PDFTitle to PDF's documentAttributes()'s ¬
        objectForKey:(PDFDocumentTitleAttribute of this)

Using the PDFDocumentTitleAttribute constant should return a value appropriate for your system; and using objectForKey to access the property as an NSDictionary, rather than as a property that is accessed through a method, will handle properties that don't exist a lot more gracefully by returning missing value.

I've already made the change and updated the script file on GitHub.
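The general pattern here (look the key up in a dictionary so that an absent entry comes back as an empty value, then only run detection when there is something to detect) can be sketched outside AppleScript too. This is a rough Python analogy, not the script's code: `attributes` is a plain dictionary standing in for the PDF's document attributes, and `detect` is a hypothetical stand-in for the language tagger.

```python
def title_language(attributes, detect):
    """Detect the title's language, or return None if there is no title."""
    title = attributes.get("Title")  # None when the key is absent; no error raised
    if title is None:
        return None
    return detect(title)

# Hypothetical detector that labels everything as French, for demonstration only
always_french = lambda text: "fr"

print(title_language({"Title": "Rapport annuel"}, always_french))
print(title_language({"Author": "someone"}, always_french))
```

The second call has no title to work with, so it yields None instead of stopping the whole run with an error.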

[Solved] Output of Run Script (osascript) action terminates with line break

CJK posted a topic in Workflow Help & Questions

The Run Script action, particularly relevant to osascript, appears to return the result of the script as text that is terminated with a line break character. This causes problems when the script returns a file path, which gets fed into the subsequent node of the workflow that is expecting to receive a file path that points to an existing file, but instead receives a file path that points to a non-existent file because of the extra character in its name.

https://transfer.sh/LEmG3/Bug Report.alfredworkflow

The workflow demonstrates the phenomenon. It simply displays an osascript output surrounded by quotes to visualise the string in its entirety. The code used in the `osascript` is:

    on run
        return "~/Downloads"
    end run

This might not technically be a bug, per se, as I'm guessing the Run Script actions all execute a script by way of a shell command, and therefore it may be that the osascript command itself is returning this output rather than something Alfred is doing. That said, when I run the following command in Terminal:

    osascript -e 'return "~/Downloads"' | wc -l

it reports that the output does only contain a single line.
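One detail worth keeping in mind when reading that Terminal test: `wc -l` counts line break characters, so a count of 1 is what you would get from a path followed by a single trailing line break. A minimal sketch of that, and of trimming the break before using the path, is below. It is in Python so it runs anywhere; the string `raw` is a made-up stand-in for captured output, and `strip_trailing_newline` is a hypothetical helper, not part of Alfred or osascript.

```python
def strip_trailing_newline(output: str) -> str:
    """Remove a single trailing line break, if there is one."""
    return output[:-1] if output.endswith("\n") else output

raw = "~/Downloads\n"  # hypothetical captured output, ending in a line break

# Counting line break characters is what `wc -l` does
print(raw.count("\n"))

# The trimmed value is the one that is safe to treat as a path
print(repr(strip_trailing_newline(raw)))
```

Only one trailing break is removed, so a value that legitimately ends in several blank lines is not flattened entirely.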

January 22, 2019
2 replies
Tagged with:
- run script
- osascript
- (and 1 more)

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

That's ok. You didn't tag me correctly. The correctly-tagged name will appear in purple. When you type @CJK, you should obtain a list of users that contain these letters in their username, and you can click on the appropriate one.

Erm... Yeah, that's because you double-spaced the whole script. You've got a blank line in between every non-blank line. I have no idea how you accomplished that. I did a copy-n-paste test from my previous post and it pasted correctly.

This is really helpful actually, because it finally pinpoints the nature of the error and where it's coming from. This is being generated by the sub-routine (handler) called filterContents. It uses an Objective-C data class called NSFileManager to access and manipulate filesystem objects. Basically, NSFileManager needs certain permissions/authorisation to be allowed to access your filesystem, and your system hasn't granted those access rights.

I could play around with getting an authorisation request from the script side of things; or you could play around with physically granting those privileges on a permanent basis. Neither of these is something I know how to do off the top of my head, so it's actually easier to get rid of that handler and get Finder to do the job that NSFileManager was doing.

To summarise the changes I am making to the script:

① The filterContents handler is now defunct and will be deleted completely.

② This is where the line of code crops up that makes use of the now-defunct handler. I will be changing that block to this:

    if class of filepaths ≠ list then
        set directory to POSIX file (POSIX path of filepaths) as alias
        tell application "Finder" to set filepaths to (every file ¬
            in the directory whose name extension = "PDF") ¬
            as alias list
    end if

This will most likely fix the error. It now clarifies why the first workflow wasn't working for you, because that one relied on NSFileManager to do all of its filesystem calls. Following these edits, the script and workflow will now completely rely on Finder and System Events for their filesystem calls.

To avoid getting these posts even more bogged down with long script texts, I'll make the changes to the script in the previous post, so you can copy and paste it from there. Alternatively, since you had problems copying-and-pasting that code, you can download a copy of the AppleScript file from here:

https://github.com/ChristoferK/AppleScriptive/blob/master/scripts/Append PDF Language To Filename.applescript

The workflow with the updated script can be obtained here:

https://transfer.sh/6CGwf/Append Language To Name of PDF File.workflow.zip

I will update the link from the previous post too.
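For readers who want to check what a "collect every PDF in this folder" step should return before involving Finder at all, here is a small sketch of the same filtering. It is in Python so it can run anywhere, and it is only an illustration: the function name `pdf_files` and the filenames in the demonstration are made up. The extension is compared without regard to case, on the assumption that files named `.pdf` and `.PDF` should both be picked up.

```python
from pathlib import Path
import tempfile

def pdf_files(directory):
    """Return the PDF files directly inside `directory`, sorted by path."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )

# Demonstration in a throwaway folder with made-up filenames
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.PDF", "notes.txt"):
        (Path(tmp) / name).touch()
    print([p.name for p in pdf_files(tmp)])
```

Subfolders are not searched, and a folder that merely has a name ending in .pdf is skipped because it is not a file.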

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

Great. Always happy to hear thoughts/comments/etc.

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

@jmm28260 Sorry for the delay; I've been unwell. (Also, it's a good idea to tag the person to whom you're replying in your post using the @ symbol followed by their username, otherwise they won't necessarily be notified of your reply; or, at least, I'm not, despite having the option selected.)

OK, well done for finding those bits and pieces. I'll leave those for you to attempt to implement if you really feel any of them are the problem. I don't have Mojave and I don't intend to, so those recommendations aren't really something I can play around with.

In the meantime, I rewrote the Automator workflow from scratch. The workflow itself has been dramatically simplified because the AppleScript portion has taken over all of the functionality. I did this for a couple of reasons:

1) I wanted to reduce the number of workflow actions in which problems might potentially arise. There are now only 3 actions, and none of them need to be touched or edited, because there are no directory selections to be made for input and output file sources, which were a source of contention for me. The only part of the workflow that requires customising is the path stored in the directory variable, and this can only be edited in the bottom pane where variables are stored in Automator.

2) Being now entirely script based, it means one can also just copy and paste the AppleScript into Script Editor and run it from in there. This will be a useful thing to do particularly if the Automator workflow fails on you. Script Editor will be a lot more helpful with its error messages, which will make pinpointing any scripting issues a bit easier. Also, if it works in one environment, but not the other, then that's an entirely different problem that I doubt I'll be able to help with.

Using this new method, there are no new files created at any point, so no text files will appear in the PDF folder or on the desktop. The only effect the script will (should) have when run from either environment is to rename the PDF files by appending the language code to the end of the name, then revealing those files in Finder. The script doesn't return any value, so don't worry if the workflow results appear empty at the end.

Here's the AppleScript:

    use framework "Foundation"
    use framework "Quartz"
    use scripting additions

    property this : a reference to current application
    property NSArray : a reference to NSArray of this
    property NSLinguisticTagger : a reference to NSLinguisticTagger of this
    property NSMutableDictionary : a reference to NSMutableDictionary of this
    property NSString : a reference to NSString of this
    property PDFDocument : a reference to PDFDocument of this

    property samples : 4 -- The (maximum) number of pages to sample for text

    on run filepaths
        set [filepaths] to filepaths & {null}
        if class of filepaths = script or filepaths = {} then set ¬
            filepaths to [(choose file of type ["com.adobe.pdf"] ¬
            with multiple selections allowed), null]

        set [filepaths, null] to filepaths
        if class of filepaths ≠ list then
            set directory to POSIX file (POSIX path of filepaths) as alias
            tell application "Finder" to set filepaths to (every file ¬
                in the directory whose name extension = "PDF") ¬
                as alias list
        end if

        set PDFs to {}

        repeat with PDFPath in filepaths
            set lang to probableLanguageForPDF at PDFPath
            set end of PDFs to my (stick on "_" & lang to PDFPath)
        end repeat

        tell application "Finder"
            reveal the PDFs
            activate
        end tell
    end run

    # stick
    # Appends a suffix to the filename (without extension) of the file at the
    # specified path, without altering the file extension
    to stick on suffix to fp as text
        local fp, suffix

        set filename to null

        tell (NSString's stringWithString:(fp's POSIX path)) to if ¬
            false = ((the lastPathComponent()'s ¬
            stringByDeletingPathExtension()'s hasSuffix:suffix)) ¬
            as boolean then set filename to ¬
            (((the lastPathComponent()'s ¬
                stringByDeletingPathExtension()'s ¬
                stringByAppendingString:suffix))'s ¬
                stringByAppendingPathExtension:(the ¬
                    pathExtension())) as text

        tell application "System Events" to tell the item named fp
            if filename = null then return it as alias
            set dir to its container
            set its name to filename
            return the item named filename in dir as alias
        end tell
    end stick

    # probableLanguageForPDF
    # Obtain the most likely language of a PDF file based on sampling a small
    # number of its pages and returning the most commonly detected language code
    on probableLanguageForPDF at PDFPath as text
        local PDFPath

        set PDFFileURL to POSIX file (PDFPath's POSIX path) as alias
        set PDF to PDFDocument's alloc()'s initWithURL:PDFFileURL
        set PDFTitle to the PDF's documentAttributes()'s |Title| as text
        set N to the PDF's pageCount() as integer

        -- Ignore first and last page unless they are the only pages
        set PDFPageNumbers to array(0, N - 1)
        set a to item 2 of (PDFPageNumbers & {0})
        set b to item -2 of ({N - 1} & PDFPageNumbers)
        if a > b then set [a, b] to [b, a]

        set langs to NSMutableDictionary's dictionary()

        -- The language of the PDF's title
        set PDFTitleLang to (NSLinguisticTagger's ¬
            dominantLanguageForString:PDFTitle)
        langs's setValue:1 forKey:PDFTitleLang

        -- Select only a small sample of pages
        -- to obtain a language for each
        repeat with i from a to b by N div samples + 1
            set PDFPage to the (PDF's pageAtIndex:i)
            set PDFPageText to PDFPage's |string|()
            set PDFPageLang to (NSLinguisticTagger's ¬
                dominantLanguageForString:PDFPageText)

            set [x] to references in {langs's valueForKey:PDFPageLang} & {0}
            (langs's setValue:((x as integer) + 1) forKey:PDFPageLang)
        end repeat

        -- The most common language identified
        set lang to the last item of (langs's ¬
            keysSortedByValueUsingSelector:"compare:")

        lang as text
    end probableLanguageForPDF

    # array()
    # Generate a list of consecutive (ascending) integers between +a and +b
    on array(a as integer, b as integer)
        local a, b

        if a > b then set [a, b] to [b, a]

        script |integers|
            property list : {}
        end script

        repeat with i from a to b
            set the end of the list of |integers| to i
        end repeat

        return the list of |integers|
    end array

Finally, here's the Automator workflow file, which any future readers for whom this link will be broken will have no problems creating themselves by copy-n-pasting the above script. It will work as a single AppleScript action inside Automator, or you can copy the action workflow depicted in the image above that simply sets a directory.

https://transfer.sh/6CGwf/Append Language To Name of PDF File.workflow.zip

@jmm28260, report how things turn out with this iteration, either from Automator, or from Script Editor, or from both. If Automator fails, report how and why, but then definitely run from within Script Editor to confirm the error is script-generated rather than app-generated, and to get a more focused error report.
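To make the page sampling in probableLanguageForPDF easier to follow, here is the index arithmetic on its own, as I read it from the script: skip the first and last page unless there is nothing else, then step through the remaining pages by the page count divided by samples, plus one. The sketch is in Python so it can be run anywhere; the function name `sample_page_indices` is made up, and it is an illustration of the arithmetic rather than a replacement for the handler.

```python
def sample_page_indices(page_count, samples=4):
    """Return the zero-based page indices that would be sampled."""
    if page_count < 1:
        return []
    # Second page and second-to-last page, falling back to page 0 for tiny files
    first = 1 if page_count >= 2 else 0
    last = page_count - 2 if page_count >= 2 else 0
    if first > last:
        first, last = last, first  # a two-page file ends up using both pages
    step = page_count // samples + 1
    return list(range(first, last + 1, step))

for n in (1, 2, 3, 10, 100):
    print(n, sample_page_indices(n))
```

Note that the step size means fewer than `samples` pages can be picked for some page counts, which is why the property is described as a maximum.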

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

It's still showing that you've set the input folder to one thing, and the output folder to another. I can even see you edited the path of the directory variable to lead to a second folder called 2. Renommer.

When I assigned the variable directory back to both the input and the output folders, then changed the path of directory to point to my test folder where I copied in a bunch of PDFs, the workflow ran as expected.

My suggestion is that you first try and use the workflow exactly as I've laid out, and then you can play around with it a bit more when you are more familiar with what does what, etc. If you still have problems, you need to be very specific, and pedantic, about where the very, very first unexpected occurrence takes place. You're using Mojave (I'm on High Sierra), so it's possible your Automator has more bugs in it (it sounds to me from the things I've heard that Mojave has been a nightmare for AppleScript and Automator). So, look at each action individually. For example, the first action is a Finder action, and they are notoriously buggy in some situations. Therefore, make sure it is actually finding your PDF files successfully (all of them).

The script portion of the workflow appears to be working fine, even on your end. I know it's not producing the output you want yet, but there's no error being thrown and it completes its run. To me, it suggests that the problem is in the actions preceding it, which are not set correctly and are not producing the output necessary.

If the worst comes to the worst, I can do a re-write of the whole thing. What I would do is replace the Finder action with another AppleScript action instead, and get that to retrieve the PDF files reliably. Then I'd recode the main AppleScript action to use less Objective-C and more AppleScript, as Objective-C can be quite strict with security protocols, and it's possible (but unlikely) that it's not permitting Automator to make changes to your filesystem.

But you do your bit first. Then give me a couple of days and I'll see what's going on.

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

No, I can't see, because you cut off the top of the workflow, so only the output folder was visible. Anyway, let's try again. Please upload your correct workflow, and I'll have another look.

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

So that was a lie: In the first action, you have the search folder set to 1. OCR. In the second action, you changed the output folder to Desktop. Now compare it with the workflow that I sent to you: Both the search directory and the output directory are the same, and I created a dedicated variable for it that holds the path. It's called directory and it's stored with the Automator variables that are accessible at the bottom of the window by clicking one of the buttons down there. Double-click the directory variable, and you can set the path through this, which will set both the search path, and the output path. [ PS. You can actually delete the Set Variable action and fileList variable. That variable doesn't end up being used, so it and the action that creates it are superfluous. ]

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

@jmm28260 OK, thanks for this. Would you mind uploading your workflow as it is and I'll take a look at it on my system, otherwise we could spend days going back and forth here. I think that'll be easiest for us both.

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

As a tip to help get your problems solved faster, it's typically not very useful when you state that something doesn't work, then ask if I know what's wrong. I need a lot more information than that to be able to diagnose the potential issues. The dialog, whilst a sensible thing to screenshot and share, is sadly not especially helpful in this instance, but that's Automator's fault, not yours. It gives the vaguest errors, by informing you that there's a problem, and then assuming we all love needles in haystacks.

What you should do first is edit the AppleScript and remove the line that says try, and remove the line that says end try. Then don't forget to press the hammer icon button, which recompiles the AppleScript code (do this whenever you make an edit to the code and before you run the workflow again).

Then, what I need to know are the following:

- What version of macOS are you running?
- What happens inside Automator itself when you run the workflow?
- Which actions are completed successfully, with green ticks by them?
- Which actions fail to complete, particularly which one fails first?
- Of the actions that complete with green ticks, do any of them in the results section have an empty output?
- Did you set the value of the directory variable to an appropriate value?
- Did you change the location where the text files generated from the PDFs get saved?

jmm28260 reacted to a post in a topic: Workflow to identify language in pdf January 4, 2019

Workflow to identify language in pdf

CJK replied to jmm28260's topic in Workflow Help & Questions

Well, that took much more than just a minute. Partly because your workflow was just kinda ugh, and partly because Objective-C is really bloody annoying on occasion, and the AppleScript needed to be re-written to cope with multiple file inputs, and because Automator can't do repeat loops by itself.

Here's a screenshot of the Automator workflow, which now only has four actions:

The modified AppleScript for use in the Run AppleScript action is below. Largely, the screenshot and script are for the benefit of anyone viewing this post at a later date, from which they can piece together the workflow themselves because I still haven't set up a base for storing permanent links to fileshares, so this one will be temporary:

Append Language To Name of PDF File.workflow.zip

    use framework "Foundation"

    property this : a reference to current application
    property NSFileManager : a reference to NSFileManager of this
    property NSLinguisticTagger : a reference to NSLinguisticTagger of this
    property NSString : a reference to NSString of this
    property nil : a reference to missing value

    on run [fs, null]
        script fileURLs
            property list : fs
        end script

        set FileManager to NSFileManager's defaultManager()

        repeat with f in the list of fileURLs
            try
                set lang to "_" & ((NSLinguisticTagger's ¬
                    dominantLanguageForString:(NSString's ¬
                        stringWithContentsOfURL:f)) as text)

                set basename to (NSString's stringWithString:(f's ¬
                    POSIX path))'s stringByDeletingPathExtension()
                set oldname to (basename's ¬
                    stringByAppendingPathExtension:"pdf")
                set newname to ((basename's ¬
                    stringByAppendingString:lang)'s ¬
                    stringByAppendingPathExtension:"pdf")

                tell the FileManager to moveItemAtPath:oldname ¬
                    toPath:newname |error|:nil -- Rename PDF file
                tell the FileManager to trashItemAtURL:f ¬
                    resultingItemURL:nil |error|:nil -- Delete text file
            end try
        end repeat
    end run
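The renaming step boils down to inserting "_" plus the language code between the base name and the extension. A minimal sketch of just that string handling is below, in Python so it can be tried without touching any files. The function name `with_language_suffix` and the example path are made up, and the check that skips names already carrying the suffix is an addition of mine that the script above does not perform.

```python
from pathlib import PurePosixPath

def with_language_suffix(path, lang):
    """Return `path` with "_<lang>" inserted before the file extension."""
    p = PurePosixPath(path)
    suffix = "_" + lang
    if p.stem.endswith(suffix):
        return p  # already tagged, so leave the name alone (my addition)
    return p.with_name(p.stem + suffix + p.suffix)

# Hypothetical path, used only to show the resulting name
print(with_language_suffix("/tmp/report.pdf", "fr"))
```

Because the function only builds the new path, it can be run repeatedly over the same folder listing to preview the names before anything is actually renamed.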
