
[PHP] Parsing a big JSON file



Hi,

 

For my Spotify Mini Player workflow, I need to parse a library.json file which contains all of the user's Spotify playlists.

 

The JSON data looks like this:

 

[
  {
    "data" : {
      "discNumber" : 1,
      "album" : {
        "artist" : {
          "portrait" : "",
          "type" : "artist",
          "name" : "J-Boogie's Dubtronic Science",
          "uri" : "spotify:artist:76MRWSA9TZZg6s3Ijy7mxW"
        },
        "availableForPlayback" : true,
        "availability" : 0,
        "cover" : "spotify:image:b9346fb9030ede38617e42eafd553d17fa7cf307",
        "year" : 2009,
        "type" : "album",
        "numTracks" : 0,
        "name" : "Soul Vibrations Dub Remixes",
        "uri" : "spotify:album:1QMgSZ88TOWU6ctJRAW4T6"
      },
      "trackNumber" : 12,
      "availableForPlayback" : true,
      "uri" : "spotify:track:7AAR8NjH6RY4Dk5Ny5xQcb",
      "starred" : false,
      "artists" : [
        {
          "portrait" : "",
          "type" : "artist",
          "name" : "J-Boogie's Dubtronic Science",
          "uri" : "spotify:artist:76MRWSA9TZZg6s3Ijy7mxW"
        }
      ],
      "popularity" : 23,
      "type" : "track",
      "isAd" : false,
      "availability" : 0,
      "isLocal" : false,
      "isPlaceholder" : false,
      "isExplicit" : false,
      "duration" : 290000,
      "isInvalid" : false,
      "isLoaded" : true,
      "name" : "1.4.u Dub - Feat. Ohmega Watts & Ragen Fykes"
    }
  }
]

 

 

I am new to PHP (this is the first time I have used it), so to parse this JSON file, I loop over all items and use strpos() to find matches:

		// Load and decode the whole library into a PHP array
		$json  = file_get_contents($w->data() . "/library.json");
		$items = json_decode($json, true);

		// Lowercase the query once instead of on every iteration
		$needle = strtolower($query);

		foreach ($items as $item) {
			// Match against artist name, album name, or track name
			if (strpos(strtolower($item['data']['album']['artist']['name']), $needle) !== false ||
				strpos(strtolower($item['data']['album']['name']), $needle) !== false ||
				strpos(strtolower($item['data']['name']), $needle) !== false) {
				// do something
			}
		}

 

 

This works fine and I don't have any performance issues (I was surprised by the speed, actually) with my own Spotify library (a 2.2 MB JSON file with around 2,200 tracks).

But some users have more than 100 playlists, so their JSON file is about 15 MB.

 

To decode a JSON file that big, I had to add:

 

ini_set('memory_limit', '128M' );
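
To see how much memory the decode actually needs, I can check the peak usage afterwards (just a sketch, with the file path assumed):

		// Sketch: measure peak memory used by json_decode() to pick a
		// sensible memory_limit; the "library.json" path is an assumption.
		ini_set('memory_limit', '128M');

		$items = json_decode(file_get_contents('library.json'), true);

		// Decoding into PHP arrays can need several times the raw file size.
		printf("Peak memory: %.1f MB\n", memory_get_peak_usage(true) / 1048576);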

 

 

Questions:

 

  • Is it a problem to set memory_limit like this?
  • Is there a cleverer way to read/parse a JSON file?
  • What about transforming the JSON data into an SQL database? Would that give better performance?

 

I'm open to any suggestions :-)

 

Cheers


On a large dataset you may start seeing a LITTLE bit of a slowdown, but parsing the JSON itself shouldn't be too much of an issue; the cost is more in having to loop through every record and perform the search.

 

Are you serious, who has over 100 playlists!?

 

Anyway, yeah, an SQLite database would be one way to speed this up. That's how I did the Chrome Bookmarks Search workflow (which I need to finish updating). I initially used a library I was building to try to simplify the interaction with the database, but it now seems that just executing the query from the command line is a lot faster. To do this you would need to set an "updated" flag in the database somewhere so that you could check the last time it was updated (you could maybe even just check the file modification time of the database). Each time you execute, check the mod time of the JSON file against the last updated time; if they differ, truncate the database cache and rebuild it. Truncating and rebuilding (even with a lot of records) is surprisingly fast. A rough sketch of the idea is below.
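
Something along these lines would do it (just a sketch; the "tracks" table, its columns, and $dataDir are placeholders, not the actual workflow's schema):

		// Rebuild the SQLite cache only when library.json has changed.
		$jsonFile = $dataDir . '/library.json';
		$dbFile   = $dataDir . '/library.db';

		// Decide before opening: new SQLite3() creates the file if missing.
		$rebuild = !file_exists($dbFile) || filemtime($jsonFile) > filemtime($dbFile);

		$db = new SQLite3($dbFile);
		$db->exec('CREATE TABLE IF NOT EXISTS tracks (artist TEXT, album TEXT, track TEXT, uri TEXT)');

		if ($rebuild) {
			$items = json_decode(file_get_contents($jsonFile), true);

			// Truncate, then rebuild inside one transaction for speed.
			$db->exec('DELETE FROM tracks');
			$db->exec('BEGIN');
			$stmt = $db->prepare('INSERT INTO tracks VALUES (:artist, :album, :track, :uri)');
			foreach ($items as $item) {
				$d = $item['data'];
				$stmt->bindValue(':artist', $d['album']['artist']['name']);
				$stmt->bindValue(':album', $d['album']['name']);
				$stmt->bindValue(':track', $d['name']);
				$stmt->bindValue(':uri', $d['uri']);
				$stmt->execute();
				$stmt->reset();
			}
			$db->exec('COMMIT');
		}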


Why don't you split the playlists (maybe one file per playlist) into multiple JSON files and loop over them?

 

Yes, I would like to do that, but unfortunately the Spotify Apps API does not allow it right now, if I'm correct (http://developer.spotify.com/technologies/apps/docs/).

This is what I am doing with the starred playlist: because there is a "starred" flag in the JSON data, I generate a separate JSON file for the starred playlist only, roughly as in the sketch below.
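
Roughly like this (a sketch; paths and file names are placeholders):

		// Filter the decoded library on the "starred" flag and write a
		// much smaller JSON file that can be parsed on its own.
		$items = json_decode(file_get_contents($dataDir . '/library.json'), true);

		$starred = array_filter($items, function ($item) {
			return $item['data']['starred'] === true;
		});

		// array_values() re-indexes the array so it encodes as a JSON array.
		file_put_contents($dataDir . '/starred.json', json_encode(array_values($starred)));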

 

But yes, if performance turns out to be very bad with 100+ playlists, I will have to find a way to split the JSON data into multiple files at some point.

 

Thanks!


On a large dataset you may start seeing a LITTLE bit of a slowdown, but parsing the JSON itself shouldn't be too much of an issue; the cost is more in having to loop through every record and perform the search.

 

So I was finally able to use a library.json with 100+ playlists (15 MB of data; PHP memory was increased to 256M!), and it's actually pretty fast!! With 8 results and all artworks cached, it takes about 2 seconds to display results. This is quite acceptable, I believe, given the number of tracks :-)

 

 

Are you serious, who has over 100 playlists!?

 

I know :-)

 

Anyway, yeah, an SQLite database would be one way to speed this up. [...] Truncating and rebuilding (even with a lot of records) is surprisingly fast.

 

I'll keep this in mind if I want to improve performance at some point.

 

Thanks for your advice!!

2 months later...

 

Anyway, yeah, an SQLite database would be one way to speed this up. [...] Truncating and rebuilding (even with a lot of records) is surprisingly fast.

 

Hi David,

 

I finally implemented this SQL solution in version 3.0 of my workflow, and yeah, it's amazing how much it has improved performance!!! I get 100 results from an 18K-track library, with artworks displayed, within 150 ms!! The search itself boils down to a single query, as in the sketch below.
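
(A sketch of such a query; table and column names as in the cache sketch above, not my actual code:)

		// Case-insensitive search over the cached tracks; SQLite's LIKE
		// is case-insensitive for ASCII by default.
		$stmt = $db->prepare(
			'SELECT artist, album, track, uri FROM tracks
			  WHERE artist LIKE :q OR album LIKE :q OR track LIKE :q
			  LIMIT 100'
		);
		$stmt->bindValue(':q', '%' . $query . '%');
		$results = $stmt->execute();

		while ($row = $results->fetchArray(SQLITE3_ASSOC)) {
			// add $row to the Alfred results
		}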

 

See it in action here https://vimeo.com/70175318

 

FYI, I was using your methods from workflows.php to store configuration in a plist file, but I had some performance issues with it, so I decided to store settings in the SQLite db as well :-)

 

Thanks for your help!


Awesome dude, congrats, it looks great!
