
What is the easiest way to scrape websites to put into a script filter?



I wanted to make a list filter for Project Euler (https://projecteuler.net/archives) that would give me a list of every problem that exists on the site.

 

I can make a list filter and manually add each link with an appropriate title, kind of like I have done here (https://github.com/nikitavoloboev/alfred-ask-create-share). However, this is very time consuming, and Project Euler doesn't offer any API.

 

Can someone recommend a way to get all of these problem titles and their respective links under one script filter?

[screenshot of the Project Euler archives problem list]

 

Thank you for any help. 


There is no “easiest way”. Each site is different and as such will require a different approach. The easiest way is the one easiest for you.


I’m a fan of Nokogiri, as I tend to use ruby for this. This workflow should do what you want. It currently only retrieves the first page of results.


For when the link expires, this is the code:

require 'json'
require 'nokogiri'

base_url = 'https://projecteuler.net'
archives_url = "#{base_url}/archives"
problem_url = "#{base_url}/problem="

# Grab the problems table from the archives page and drop its header row
table = Nokogiri::HTML(%x(curl "#{archives_url}")).at('#problems_table')
table.at('tr').remove

script_filter_items = []

# Each remaining row is a problem: number, title, and solver count
table.css('tr').each do |exercise|
  number = exercise.at('td').text
  name = exercise.css('td')[1].text
  solvers = exercise.css('td')[2].text

  url = "#{problem_url}#{number}"

  script_filter_items.push(title: name, subtitle: "#{solvers} people solved this", arg: url)
end

# Print the Script Filter JSON for Alfred
puts({ items: script_filter_items }.to_json)

I do realise %x(curl "#{archives_url}") is a tad ridiculous, but the system Ruby was giving me an SSL error for this site (the latest Ruby does not), so I did that as a quick patch.
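If a newer Ruby is available (via Homebrew, for example), one way around the shell-out might be open-uri. A minimal sketch of that swap, assuming that Ruby's OpenSSL can talk to the site:

require 'nokogiri'
require 'open-uri'

archives_url = 'https://projecteuler.net/archives'

# Fetch and parse the archives page without shelling out to curl
# (URI.open needs Ruby 2.5+; older versions use open() from open-uri)
table = Nokogiri::HTML(URI.open(archives_url).read).at('#problems_table')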


Thank you a lot, @vitor. I will try to extend it to give a list of all the problems from all the pages. I am also really a fan of the approach of downloading these links to a local SQL database and then just reading from the database, like this workflow does (https://github.com/lox/alfred-github-jump).
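A rough, untested sketch of what looping over every archive page could look like, building on vitor's script. The "#{archives_url};page=N" URL format and the hard-coded page count are assumptions, so check the site's pagination links and adjust:

require 'json'
require 'nokogiri'

base_url = 'https://projecteuler.net'
archives_url = "#{base_url}/archives"
problem_url = "#{base_url}/problem="
last_page = 17 # Placeholder: set this to (or scrape) the number of archive pages

script_filter_items = []

(1..last_page).each do |page|
  # Assumed pagination scheme; the first page is just /archives
  page_url = page == 1 ? archives_url : "#{archives_url};page=#{page}"

  table = Nokogiri::HTML(%x(curl "#{page_url}")).at('#problems_table')
  next if table.nil?
  table.at('tr').remove # Drop the header row

  table.css('tr').each do |exercise|
    cells = exercise.css('td')
    script_filter_items.push(
      title: cells[1].text,
      subtitle: "#{cells[2].text} people solved this",
      arg: "#{problem_url}#{cells[0].text}"
    )
  end
end

puts({ items: script_filter_items }.to_json)

Caching the result locally (a plain JSON file or the SQLite approach mentioned above) would then just be a matter of writing script_filter_items out once and reading it back on later runs.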

 

Also I have a question. You use transfer.sh to share files and workflows. Do you have any handy workflows built for it? :)

 

For example, a file filter that takes a folder or a file and gives back a transfer.sh link.
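For reference, the upload step such a file filter would need is small. A minimal sketch in Ruby, assuming transfer.sh still accepts a plain HTTP PUT of the file contents and returns the download link in the response body (a folder would need to be zipped first):

require 'net/http'
require 'uri'

# Upload a single file to transfer.sh and return the shareable link
def transfer_upload(path)
  uri = URI("https://transfer.sh/#{File.basename(path)}")
  request = Net::HTTP::Put.new(uri)
  request.body = File.binread(path)

  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request).body.strip # The response body is the download URL
  end
end

puts transfer_upload(ARGV[0]) # e.g. the path Alfred's file filter passes in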

