Web Search without HTTP links



I'm just starting out making workflows, and so far I've only managed to do web searches on a specific website using Alfred's built-in function with the {query} option. This has worked perfectly with sites that display an HTTP address for a search, but how can I tackle pages that do not do this?
At the moment I'm trying to make a search for this page: http://www.magnusli.no/ifirooms/. I inspected the getRoom() function and took a link from there:

include/get_room.php?id=

and tried to use {query} on that (with e.g. 'Caml' as a search). That worked to the point where the function returned the data for that search, but that was not what I intended to do. The request returned this:
[Screenshot of the raw response]
Wanted outcome:
[Screenshot of the desired result on the page]

Any tips or input as to where to start or how to tackle an issue like this would be much appreciated. 
Thank you!


You can't use a regular Alfred web search for this because the page is a JavaScript "app". The URL you've found is called via JavaScript and injects the result into the current page.

 

Consequently, to achieve the result you want, you need to inject JavaScript into the page and execute it.

 

Here's a demo workflow.

 

It's smart enough to inject the JS into the correct tab, but it's hard-coded to Safari because I know how to do all this in Safari.

 

You appear to be using some kind of Chrome browser, so I'll leave adapting it to Chrome as an exercise for the reader :) 

 

Mostly, you'll just need to change the code that injects/executes the JavaScript and point the workflow at the correct app.


Welcome @FreeeG,

 

On 7/21/2018 at 1:35 PM, FreeeG said:

This has worked out perfectly with sites that displays a HTTP address for a search, but how can i tackle pages that does not do this?

 

The answer is “it depends on the website”. In this case what you type in the box is being sent to a PHP script that parses your request and sends the information back, at which point the webpage fits it in. When you’re sending your request directly to the PHP script, as in your example, you’re getting the raw information. There seems to be no simple way to do this by giving the browser a URL.


A way to go about it is to tell the browser to open the page and then execute what needs to be done. Using JavaScript to write what you need on the search box isn’t enough because the page takes a while to detect it. What we can do is ignore the site’s search and do it ourselves, essentially loading the page, figuring out every element that doesn’t match what we want and hiding it.


Like so (edited). I’ve made it work just in Chrome, as it seems to be the browser in your screenshots.
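
The "hide everything that doesn't match" idea described above might look roughly like the following. This is only a sketch, not the code from the linked workflow: `hideNonMatching` is a hypothetical helper, and the `.room` selector mentioned in the comment is an assumption about the page's markup.

```javascript
// Hide every element whose text does not contain the query.
// `elements` is any iterable of objects with `textContent` and `style`,
// e.g. the result of document.querySelectorAll('.room') inside the page.
// The '.room' selector is an assumption; the real page may differ.
function hideNonMatching(elements, query) {
  const needle = query.toLowerCase();
  let visible = 0;
  for (const el of elements) {
    const matches = el.textContent.toLowerCase().includes(needle);
    // An empty string restores the element's default display value.
    el.style.display = matches ? '' : 'none';
    if (matches) visible += 1;
  }
  return visible; // number of elements left visible
}

// Stand-in objects so the sketch also runs outside a browser.
const rooms = [
  { textContent: 'Caml', style: {} },
  { textContent: 'Java', style: {} },
];
const visibleCount = hideNonMatching(rooms, 'cam');
console.log(visibleCount);
```

In a real page the same function would be injected together with a call such as `hideNonMatching(document.querySelectorAll('.room'), query)`.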


Well, there's the code to run JS in Chrome. I'd use that to alter the workflow I posted, as it makes sure the page is open in the browser before trying to run JS in it.

 

11 minutes ago, vitor said:

Using JavaScript to write what you need on the search box isn’t enough because the page takes a while to detect it

 

var js = 'document.getElementById("search_input").value="' + query + '";getRooms();'

:) 
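
One caveat with building the string by plain concatenation: a query containing a double quote would break the injected code. A minimal sketch of a more defensive variant, assuming the same `search_input` ID and `getRooms()` function as in the line above (`buildSearchJs` is a hypothetical helper name):

```javascript
// Hypothetical helper: builds the JavaScript to inject into the page.
// Assumes the page has an input with id "search_input" and a global
// getRooms() function, as in the one-liner above.
function buildSearchJs(query) {
  // JSON.stringify adds the surrounding double quotes and escapes any
  // quotes or backslashes inside the query.
  return 'document.getElementById("search_input").value=' +
    JSON.stringify(query) + ';getRooms();';
}

console.log(buildSearchJs('Caml'));
```

For a plain query like 'Caml' this produces the same string as the concatenated version.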

2 minutes ago, deanishe said:

as it makes sure the page is open in the browser before trying to run JS in it.

 

My version opens the page and then waits for it to finish loading before running the JS. I could have an extra check in place to guarantee the active tab is really that one, but in practice I doubt it’ll make a difference.

 

5 minutes ago, deanishe said:

var js = 'document.getElementById("search_input").value="' + query + '";getRooms();'

 

 

I looked at the script tag and thought “nah, I don’t want to wade through javascript files to get the exact function”. It’s literally the first line! I’ve updated the previous link to use that solution. Thank you.


Small technical note you may be interested in. To use that solution we have to change the way the JS is sent to Chrome. If we use execute({javascript: js_code}) it will execute our code but ignore the page’s, meaning it will fail to run getRooms(). The workaround is to instead url = 'javascript:' + js_code.
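
A rough sketch of that workaround. `toJavascriptUrl` is a hypothetical helper, and the commented-out JXA lines assume Chrome's scripting interface exposes the active tab's URL as `activeTab.url`; treat them as an illustration rather than the exact code from the workflow.

```javascript
// Hypothetical helper: turn a snippet of JavaScript into a javascript: URL.
function toJavascriptUrl(jsCode) {
  return 'javascript:' + jsCode;
}

const jsCode = 'document.getElementById("search_input").value="Caml";getRooms();';
const url = toJavascriptUrl(jsCode);
console.log(url);

// In a JXA script (run with osascript -l JavaScript), the URL would then be
// assigned to the tab. Assumption, not copied from the workflow:
//   const chrome = Application('Google Chrome');
//   chrome.windows[0].activeTab.url = url;
```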

23 minutes ago, vitor said:

My version opens the page and then waits for it to finish loading before running the JS

 

Right. See what you mean. Does Chrome simply reload an existing tab if the URL is already open? (That's what Safari does.)

 

23 minutes ago, vitor said:

it will execute our code but ignore the page’s

 

That sucks.

 

23 minutes ago, vitor said:

The workaround is to instead url = 'javascript:' + js_code.

 

Presumably, you also have to URL-encode the JS in that case, too?

 

52 minutes ago, deanishe said:

Does Chrome simply reload an existing tab if the URL is already open?

 

Not with the openLocation method I’m using (which is enabled by includeStandardAdditions, not a Chrome feature). Unsure if it does with other methods (I typically want a new tab). Would have to try it out.

 

52 minutes ago, deanishe said:

Presumably, you also have to URL-encode the JS in that case, too?

 

Nope. Just checked and AlfredBookmarklet’s sanitising feature even reverses the encoding if there’s any (so the same code works with both Chrome and Safari). It also seems to work when encoded, so it appears Chrome is smart enough to figure it out when it sees javascript: at the start.


@vitor and @deanishe, thank you guys so much for the helping hand! Appreciated! Now I just have to see if I can retrace your steps and learn from your solutions. I've been a regular Alfred user for quite some time, but now I'm looking forward to getting better at tinkering with custom solutions!


I have a new issue that I was hoping one of you more experienced users could answer. The workflows from @vitor and @deanishe worked great on my mid-2015 MacBook Pro, but I just got the newest 15" MacBook Pro in the door today and the workflows no longer seem to function. Any ideas? When I use e.g. @vitor's workflow, it opens the page but does not search it. Still using Chrome.


EDIT: Found out that Chrome was just updated. I believe that to be the reason behind it, but I don't know why.

  • 8 months later...

@FreeeG I think I have your solution and hope it is useful to you or others:

 

Chrome > View > Developer > Allow JavaScript from Apple Events

 

I am working on a similar problem with an internal site that just stopped using URL-based search strings after an upgrade, and this thread was really useful. I'm NOT a coder but was able to adapt the code to my site and get it working. Prior to making the above change in Chrome, I was getting the same issue as you where the page would open but not run the code. Pasting my code into the Console worked properly. After making the above change it's running properly and I can start building a more complex workflow around it.


@deanishe Very good to know! I'm sure it's come up elsewhere in these forums, which I'll be digging through as I try to make more powerful workflows with this new site.

 

Any other discussions you'd recommend to followers of this thread? I'm very interested in retrieving/caching search results directly in Alfred, which may be a challenge since it's behind a login. I learn a lot from reading how others have met similar needs.

26 minutes ago, Matt LE said:

which may be a challenge since it's behind a login.

 

That's just a bit of web automation. Where it becomes tricky is if the site actually requires JavaScript to be executed. If not, you can usually get what you need with an HTTP library that supports cookies and an HTML parsing library. If JS support is required, then you'll have to use a headless browser instead (i.e. run a full copy of Chrome/Firefox and automate that).

 

If you can get away without the full browser, JS apps are often easier to pull data out of because the data is typically in easily-parsed JSON, not HTML. Sometimes the data are actually stored in a nice, machine-readable form in data- attributes on HTML elements, too, which is also easy to extract.

 


@deanishe I love your confidence in my skills! Thanks for the few keywords to help me get started on this journey. I really want to make sure whatever I do can be deployed to my team, and fortunately we've standardized on Macs. So we'll be purchasing more Alfred licenses once I make some progress.

 

Any recommended reading?

  • 1 year later...

Can I ask you guys a very basic question about how to run searches on these types of websites (i.e., ones whose URLs don't change after being run via javascript)?

 

Admittedly, whenever I come across one of these websites that I’d like to put in Alfred’s web searches, I always wind up throwing in the towel out of frustration. I can never figure out how to even find the name of the appropriate function. For example, in @FreeeG's room website, @deanishe where did you find “getRooms()”? I’m ashamed to admit that I tried looking through the inspector using Safari, Chrome, and Firefox  - and I still can’t find it. As @vitor points out, I’m sure it’s at the top line of something I’m overlooking. Where did you find that in the inspector? 

 

In any case, the reason I ask is that I’ve been trying to do something similar on the US Patent Office's website. While they've recently developed several APIs for building your own tools, I don't mind just using their GUIs to run the queries. Unfortunately, however, I can’t figure out what function to call from their website. I’d just like to kick off the query with Alfred, like with normal websites whose URL parameters change.

 

While these websites are all pretty similar, here’s one that allows you to search for Board decisions. 

And, here’s a few screenshots from different browsers in case I’m overlooking something that’s directly in front of my face (Safari - Chrome - Firefox):


[Screenshots of the inspector in Safari, Chrome, and Firefox]

 

 

Can anyone help me find the appropriate function to call here? Depending on the panel, this website looks like it has millions. Or, if anyone is aware of another forum post or something with a working example, then I could certainly try to work my way backwards. That's what I was trying to do with the rooms website posted here anyways. In fact, after updating the ID (which I can actually find - searchText) in @deanishe's workflow, I can get the text in the search box, but just can't figure out how to run it for obvious reasons.

 

Side Note: although I’m not terribly interested in building my own thing, it’s pretty easy to run a search with the patent office’s API (which just spits out its JSON results). In case it’s helpful to others, here’s an example:

Of course, my preference would be to just see everything visualized in their GUI.

 

Thanks for any help you can lend!

4 hours ago, Jasondm007 said:

I’m ashamed to admit that I tried looking through the inspector using Safari, Chrome, and Firefox  - and I still can’t find it.

 

That's because it's gone. This thread is nearly 3 years old, and they've changed the code since then. Now it's controlled by a bunch of anonymous functions, so manipulating it from outside is much more difficult.

 

4 hours ago, Jasondm007 said:

While these websites are all pretty similar, here’s one that allows you to search for Board decisions. 

 

You don’t necessarily need to call a function. If there’s an actual search form (i.e. with a <form> element), you can often just submit() that instead.

 

With a site like this (Angular/React/Vue SPA), though, it's often easier to use the same API endpoints the website uses and build a Script Filter based on that. Injecting JavaScript is problematic because these frameworks often don't notice the changes your script makes.

 

10 hours ago, Jasondm007 said:

And, here’s a few screenshots from different browsers

 

You’re looking in the wrong place. As @deanishe said:

 

5 hours ago, deanishe said:

it's often easier to use the same API endpoints the website uses and build a Script Filter based on that.

 

Go into the Network tab of the Developer Tools. Leave it open and clear it. Do a search on the website. You’ll notice (in this case) that a JSON file was retrieved. Right-click it and choose Copy → Copy as cURL. That’s the network command used to retrieve the data, which the website then parsed. If you paste and run the command as-is in a terminal, you’ll see the JSON contents spew out. Look at the command carefully and you’ll see where your search query is. Then start trimming the curl command until you figure out which parts are essential; you seldom need everything that was copied. You’ll end up with something you can generalise by substituting your desired query in the right place. Use that command.


@deanishe & @vitor Thanks a ton for getting back to me! I'm sorry that I missed your responses earlier this week. I accidentally set up an email rule that was being a little too aggressive with messages from the forum! Ha

 

On 3/17/2021 at 12:16 AM, deanishe said:

You don’t necessarily need to call a function. If there’s an actual search form (i.e. with a <form> element), you can often just submit() that instead.

 

I tried several different ways. But was never able to get the website to submit its query properly. For example, I can get search terms into the form by using some version of the following:

 

searchSafari("Yahoo", "searchText") of me

on searchSafari(theQuery, theFormId)
	tell application "Safari"
		activate
		open location "https://developer.uspto.gov/ptab-web/#/search/decisions"
		delay 3.0
		do JavaScript "document.getElementById('" & theFormId & "').value ='" & theQuery & "';" in document 1
	end tell
end searchSafari

 

I even tried separating out the form's submission, like the following example, but most of the time it wouldn't actually work. Interestingly, the website will display the search graphic, but it doesn't actually submit the query appropriately (e.g., if you clear it out and submit the search manually, you'll get different results).

 

searchSafari("Apple", "searchText", "btn btn-primary")

on searchSafari(theQuery, theFormId, theButtonClass)
	tell application "Safari"
		activate
		open location "https://developer.uspto.gov/ptab-web/#/search/decisions"
		delay 3.0
		do JavaScript "document.getElementById('" & theFormId & "').value ='" & theQuery & "';" in document 1
		delay 2.0
		do JavaScript "document.getElementsByClassName('" & theButtonClass & "')[0].click()" in document 1
	end tell
end searchSafari

 

Any ideas how to get this working? Or is there a more generalizable solution, like what you mentioned above? I assume by "API Endpoints" you're referring to actually viewing the results in the script filter, right? While that sounds cool, honestly, I was just looking for something a little more simple, like running the search on their page.

 

Here's the most relevant portion of the website (highlighted portion is the search box):

 

[Screenshot of the page's HTML with the search box highlighted]

 

On 3/17/2021 at 5:57 AM, vitor said:

Network tab of the Developer Tools

 

This was a really cool suggestion. While I looked at that portion of the inspector prior to my post, I had no idea that you could actually see what was going on this way, too. 

 

I'm sure this is a stupid question, but how do I take the curl and open it back up in Safari through the website? For example, here's the full code for a search for "test":

 

curl 'https://developer.uspto.gov/ptab-api/decisions/json' \
-X 'POST' \
-H 'Accept: application/json, text/plain, */*' \
-H 'Content-Type: application/json' \
-H 'Origin: https://developer.uspto.gov' \
-H 'Cookie: _4c_=XVLBjpswEP2VyueQ2MY4kNtW6mGl1SrSVpV6QsYeghUCyDhQNsq%2Fd5wE0q6RrPF7z2%2FMzFzIWEFDdkzil25TGrOEr8gRpp7sLsQ0YR%2FIjhgo1bn2ZEX%2B3NQZTWjKGUvj64rYzj90yDC%2BFciINPmiRSRoZ0f11evOu%2FFpdSdiueX%2FSwMSpNbMZkUpaFEoFhmR8UiUEEdFRnWUFiKRQhRlVvI5H2eUU55xEYd8unt4XIhuDaAXy9aMrRnK%2FSceI8EpxtCEPL0%2FYHx2NcaV912%2F22zGcVyf%2B86360M7bGwXdW1t9RQi5b3SFUSdaw9Onf6FtHIArkc3JM1Z%2B9xPXUg%2FQvGtN0ckDAxWQz5a46vwLiHoE63AHiosO8noDe1ckGA02sa043IrE%2BkTXC7JWCK6Vx4aj50m73j66ZSBk3LHGXjd529qzPe338lfm7JF4jcSvwDd3IwE6cugbJ3%2FqEF71zZW59%2FtZ%2F4xLU6Nr%2FMX7e1gvYUlIfTentpmyj86ACzKnbjOzZY4SYzhUxm2wmPJUyloWKgYljliUgsoCx3JmGPrsb1RQRMWJSpmW1XKbRyzZdRwyLMELRP5sGTp3fF6%2FQs%3D; _ga_CD30TTEK1F=GS1.1.1616187700.10.1.1616187803.0; _ga=GA1.1.1333116393.1612822856' \
-H 'Content-Length: 138' \
-H 'Accept-Language: en-us' \
-H 'Host: developer.uspto.gov' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15' \
-H 'Referer: https://developer.uspto.gov/ptab-web/' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Connection: keep-alive' \
--data-binary '{"dateRangeData":{},"facetData":{},"parameterData":{},"recordTotalQuantity":25,"searchText":"test","sortDataBag":[],"recordStartNumber":0}'

 

And, I noticed that I could still get it to work when trimmed down to the following:

 

curl 'https://developer.uspto.gov/ptab-api/decisions/json' \
-H 'Content-Type: application/json' \
--data-binary '{"dateRangeData":{},"facetData":{},"parameterData":{},"recordTotalQuantity":25,"searchText":"test","sortDataBag":[],"recordStartNumber":0}'
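
The trimmed command above could be generalised along these lines. This is a sketch only: the endpoint and body fields are copied from the curl command, while `buildRequest` is a hypothetical helper name and the commented-out `fetch` call assumes a runtime that provides `fetch` (a browser or Node 18+).

```javascript
// Endpoint and body fields are copied from the trimmed curl command above.
const ENDPOINT = 'https://developer.uspto.gov/ptab-api/decisions/json';

// Hypothetical helper: build the request for an arbitrary search query.
function buildRequest(query) {
  return {
    url: ENDPOINT,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // JSON.stringify also takes care of escaping quotes in the query.
      body: JSON.stringify({
        dateRangeData: {},
        facetData: {},
        parameterData: {},
        recordTotalQuantity: 25,
        searchText: query,
        sortDataBag: [],
        recordStartNumber: 0,
      }),
    },
  };
}

const request = buildRequest('test');
console.log(request.options.body);

// To actually send it (not run here, since it needs network access):
//   fetch(request.url, request.options).then((r) => r.json()).then(console.log);
```

Only `searchText` changes between searches; the other fields are passed through exactly as the website sent them.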

 

Or, did you mention using curl as a possible substitute for using their website?

 

Thank you both for all of your help!! I really appreciate it.

21 hours ago, Jasondm007 said:

Any ideas how to get this working?

 

I'm not sure if it's possible, at least not easily. It's an SPA (Angular, I think), which means it's managing its own state, not just putting things in form fields. As such, it doesn't notice when you use different JS to manipulate the form. You might have to call some global Angular function (I don't know Angular), or it might not be possible for your JS to interact with the application.
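
One thing sometimes worth trying with framework-driven forms is to fire an `input` event after setting the value, so that any listeners the framework attached to the field get a chance to run. Whether that is enough for this particular site is untested, so the following is a sketch under that assumption. `buildSetValueJs` is a hypothetical helper; `searchText` is the field ID mentioned earlier in the thread.

```javascript
// Hypothetical helper: builds JavaScript to inject that sets a field's
// value and then dispatches an "input" event on it.
// Untested against the site discussed here.
function buildSetValueJs(fieldId, query) {
  const id = JSON.stringify(fieldId);   // quoted and escaped element ID
  const value = JSON.stringify(query);  // quoted and escaped search text
  return 'var el=document.getElementById(' + id + ');' +
    'el.value=' + value + ';' +
    'el.dispatchEvent(new Event("input",{bubbles:true}));';
}

console.log(buildSetValueJs('searchText', 'Apple'));
```

The resulting string can be passed to Safari's `do JavaScript` or to Chrome in the same way as the earlier snippets.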

 

21 hours ago, Jasondm007 said:

I assume by "API Endpoints" you're referring to actually viewing the results in the script filter, right?

 

Exactly.

 

21 hours ago, Jasondm007 said:

I was just looking for something a little more simple, like running the search on their page.

 

When dealing with a website like this one, building a Script Filter is the simpler solution…
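
As a starting point, a Script Filter only has to print JSON with an `items` array for Alfred to display. A minimal sketch of that conversion step follows; the input shape (objects with `title`, `summary` and `url`) is hypothetical and would need adapting to whatever the API actually returns, and `toAlfredItems` is an illustrative name.

```javascript
// Convert search results into Alfred's Script Filter JSON.
// The input shape (title, summary, url) is hypothetical; adapt it to
// the fields found in the real API response.
function toAlfredItems(results) {
  return {
    items: results.map((r) => ({
      title: r.title,
      subtitle: r.summary,
      arg: r.url, // passed on to the next workflow object, e.g. Open URL
    })),
  };
}

// Made-up sample entry, purely for illustration.
const sample = [
  { title: 'Example decision', summary: 'Made-up sample entry', url: 'https://example.com/1' },
];

// A Script Filter prints this JSON to stdout for Alfred to display.
console.log(JSON.stringify(toAlfredItems(sample)));
```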

 
