Workflow to identify language in pdf in Workflow Help & Questions
Posted January 22, 2019

1 hour ago, jmm28260 said:
"I changed file, and it seems to be working now"

That's good to hear. However, it was still an error that occurred in one file, so it does require fixing, as there may be other PDFs that have blank pages. I've updated the file on GitHub with a fix for that, along with a couple of other minor adjustments to spots I realised could be weak in those sorts of one-off situations.

2 hours ago, jmm28260 said:
"One last thing that is strange: when I feed the script with italian text, I may get .en with some files or rightfully so, .it. Apparently its detection of italian is erratic"

No idea, I'm afraid. That one will be down to the language tagging function, which isn't something I'm able to change. Mojave does actually have a newer language detection class called NLLanguageRecognizer, but I'm not sure how it differs from the one my script uses. I'm running High Sierra, so I don't have access to the newer one to try it out.

Without knowing the content of the PDFs that you were expecting to be tagged as Italian, I can't suggest anything specific to investigate. But you could always try increasing the number of pages that are sampled in each PDF. There's a line in the first section of the script where all the properties are declared:

    property samples : 4

If you increase this number, the script will use more pages to identify the language.

Just in case you're interested, the script currently decides on the language by detecting the language of each sampled page. If different pages produce different results, the script chooses the language that was tagged on the greatest number of sampled pages. So if your PDF had a mixture of Italian and English in it, and the sample pages happened to feature more English than Italian, it would tag the file overall as English.
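To make the majority vote concrete, here is a rough sketch of the decision step. It is written in Python rather than the script's AppleScript and is not the code from the workflow; the function name `pick_language`, the skipping of empty results (as a blank page might produce), and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def pick_language(page_languages):
    """Return the language code tagged on the most sampled pages.

    `page_languages` holds one detected code per sampled page, e.g. "en"
    or "it". Empty or None entries (assumed here to come from pages with
    no detectable text) are skipped rather than counted as votes.
    """
    votes = Counter(code for code in page_languages if code)
    if not votes:
        return None  # no sampled page produced a usable result
    # On a tie, the code seen first wins. That is a choice made for this
    # sketch only; the original script may resolve ties differently.
    return votes.most_common(1)[0][0]


# Four sampled pages, in line with `property samples : 4`.
print(pick_language(["en", "it", "en", "en"]))  # en
```

With only four samples, a single mis-detected page can swing a close vote, which is why raising `samples` may make the result more stable on mixed-language PDFs.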
The other way I contemplated doing it was combining the sample pages together into one long piece of text, and detecting the language of that as if it were a single page. That would always produce one definite result.

Then, of course, the last possibility is to remove the sampling altogether and just assess the entire PDF file in one go. However, I avoided that because some of my PDFs are quite large, and if the script had to convert many PDFs each with hundreds of pages, that's obviously going to slow it down compared to just sampling 4 pages from each PDF, without necessarily producing better results.

Anyway, if you have a particular preference over which method should be used, just let me know and I can change the implementation quite easily.
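For anyone weighing up the per-page vote against the combined-text approach, the difference can be sketched as follows. This is Python rather than AppleScript, and `toy_detect` is a deliberately crude, hypothetical stand-in for the real language tagging function (it only counts a handful of common words), so treat it as an illustration of the two strategies and not as a working detector.

```python
from collections import Counter

# Tiny illustrative word lists; a real tagger is far more sophisticated.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to"},
    "it": {"il", "la", "di", "che", "e"},
}


def toy_detect(text):
    """Hypothetical detector: pick the language with the most stopword hits."""
    words = text.lower().split()
    scores = {lang: sum(w in vocab for w in words)
              for lang, vocab in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def detect_by_vote(pages, detect=toy_detect):
    """Detect each sampled page separately, then take the most common result."""
    votes = Counter(code for code in map(detect, pages) if code)
    return votes.most_common(1)[0][0] if votes else None


def detect_combined(pages, detect=toy_detect):
    """Join the sampled pages into one text and detect the language once."""
    return detect("\n".join(pages))


pages = [
    "the cat",
    "the end of the story",
    "il gatto e la casa di Maria che dorme",
]
print(detect_by_vote(pages))   # two short English pages outvote one Italian page
print(detect_combined(pages))  # the longer Italian page carries more weight
```

The design trade-off is that voting gives every page equal say regardless of how much text it holds, whereas combining the pages weights the decision by the amount of text, so a few sparse pages (title pages, captions) have less influence.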