IGDB Data Switch Discussion

So here’s a page that I’ve merged, just kind of as a test and to have a page for people to look at: No More Heroes | Grouvee. I’ll get all the reviews, status, and playthrough data on there very soon. All the developer/publisher/genre data on this page is IGDB stuff.

I bet you can play 7 degrees of No More Heroes and get to any of the IGDB games in our database just by clicking through the developer/publisher/ports/bundles/etc. links.

I’m only at about 65k games downloaded. I knew there were going to be errors downloading the data because that’s just the way things go when you’re trying to work with a huge dataset. I had one that took me a minute to figure out, but it’s back and going strong again.

4 Likes

Gonna open that link on desktop because on mobile it is wonky.

2 Likes

That’s good to know. As a web developer, I should probably test things on mobile before anything else. I’m very good at that, as you can tell :slight_smile:

2 Likes

You have a hoard of minions to test things for you. It looks beautiful on desktop. :slight_smile:

EDIT: If the data is edited on IGDB, does it update / reflect on Grouvee?

1 Like

It will. Once this initial big download is done, I’ll have a task that downloads data that’s been updated every 6 - 8 hours or so. I actually think they have ways to register for game updates so I might be able to get data updates much quicker than that, but I haven’t looked into that too hard yet.
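For anyone curious what a periodic "what changed since last time" task could look like, here is a minimal sketch. It is not how Grouvee actually does it. `fetch_page` is a hypothetical callable standing in for the real API request, and the page size of 500 is an assumption.

```python
from typing import Callable, Iterator


def fetch_updated_since(
    since: int,
    fetch_page: Callable[[int, int, int], list[dict]],
    page_size: int = 500,
) -> Iterator[dict]:
    """Yield every game changed after `since` (a Unix timestamp), page by page.

    `fetch_page(since, limit, offset)` is a hypothetical wrapper around the
    real API call; it should return a list of game dicts.
    """
    offset = 0
    while True:
        page = fetch_page(since, page_size, offset)
        yield from page
        if len(page) < page_size:
            return  # a short page means nothing is left
        offset += page_size
```

The scheduled task would store the time of its last successful run and pass it in as `since` on the next run.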

3 Likes

Yep, looks great on the latest Firefox on Desktop!

2 Likes

This is slightly terrifying. I finished downloading the database, and I decided to run the “update” script to see what’s changed in the last few days. About 20,000 games have been updated or added. They churn a lot of data over there.

4 Likes

Giant Bomb was, in the grand scheme of things, still niche, especially when it comes to the wiki. I get the feeling that igdb operates on a scale that is at least an order of magnitude bigger.

It’s also directly attached to twitch, so it shares accounts with one of the biggest gaming communities on the planet. That incentivises the community, and in turn the developers, to keep the database up to date because that directly feeds back into the quality of twitch itself.

I created a game page for a pretty niche game earlier this week because I wanted to see how the process works, and by today someone already added screenshots, artwork, and fixed some minor metadata issues. The sheer scale of the community is mindboggling.

6 Likes

Just curious, what was the game?

2 Likes

https://www.igdb.com/games/buggos-2

2 Likes

Look at that! It got downloaded properly!

5 Likes

It’s super exciting to see that the end-to-end workflow of creating a game on igdb and it then showing up on grouvee already works this well!

5 Likes

whoa that’s wild seeing new stuff… how does one interact with it? Or is it not ready yet?

4 Likes

It is so exciting. I was thinking about adding higher res box art for some of the game entries on IGDB so it’d look better here on Grouvee lmao.

3 Likes

We have already imported the bug(go)s? :face_with_hand_over_mouth: Perfection!

Excited for everything coming to the site.

3 Likes

I feel like this number is highly inflated. When I query their API for games that have changed in the last 24 hours, these really big 20k+ numbers come back. They have fields in their API for ratings, avg_ratings, hypes, etc., so if someone goes and clicks a 5 star rating on their site, it’s going to “update” the data. I obviously don’t care about that data, but there’s no way to filter it out. The main problem is that when I’m looping through all these “changed” games, there are so many chances for a server to time out or throw an error that the likelihood of the loop failing at some point is pretty high.
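One common way to keep a long loop from dying on a single timeout is to retry each game a few times and set aside the ones that keep failing. A rough sketch, assuming a hypothetical `update_one(game_id)` function that does the actual download and save:

```python
import time
from typing import Callable, Iterable


def process_changed_games(
    game_ids: Iterable[int],
    update_one: Callable[[int], None],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Update each game independently; return the ids that still failed."""
    failed = []
    for game_id in game_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                update_one(game_id)
                break  # success, move on to the next game
            except Exception:
                if attempt == max_attempts:
                    failed.append(game_id)  # give up on this one, keep looping
                else:
                    # Exponential backoff: base_delay, then 2x, then 4x, ...
                    sleep(base_delay * 2 ** (attempt - 1))
    return failed
```

The returned list can be retried on the next run, so one bad response does not throw away the rest of the batch.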

They have ways to register webhooks where they’ll actually ping my servers when data changes, but I certainly don’t want them pinging every single time a rating occurs. I’m not too sure what’s the best course of action here.

3 Likes

This sounds like a difficult problem.

I have skimmed through their FAQs and the API site, but nowhere is it mentioned how to separate “updated” by specific fields, so that star ratings and other changes can be left out.

Kind of same for webhooks.

Webhooks allow us to push data to you when it is added, updated, or deleted. Instead of polling the API for changes, you can listen on your own HTTP endpoint (Webhook) and we will deliver the data to you.

“Updated” here also seems to be a catch-all for any change, so I guess it makes no difference whether you query them or they deliver to you.

My google fu has also brought up nothing on the problem. I am not a huge friend of Discord but asking the question there seems to be at least an option?

How many changes came up on average when you did a query on GiantBomb? How many do you think would be stable enough to not have too many errors occurring?

2 Likes

As an armchair architect with zero knowledge of how grouvee works internally, the first thing that comes to mind is that even if it’s not directly possible to reduce the downloaded data, it could be an idea to use preprocessing to filter out updates that aren’t relevant.

The idea would be to calculate a hash for each game in the grouvee db that consists of all relevant fields that should be updated from igdb. Whenever a new batch of game updates is downloaded, the same hash is calculated for each igdb game, and the hashes are compared for each game. If they match, nothing was changed and the whole game update can be discarded.
Depending on just how noisy the igdb data is, this could greatly reduce update and index churn, especially since comparing hashes should be massively faster than doing comparisons on actual game data.
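To make the hashing idea concrete, here is a minimal sketch. The field list in `RELEVANT_FIELDS` is only a guess at what the site might care about, and `stored_hashes` stands in for wherever the hashes would be kept on the grouvee side.

```python
import hashlib
import json

# Hypothetical list of fields worth syncing; ratings and hype are left out.
RELEVANT_FIELDS = ("name", "summary", "genres", "platforms", "first_release_date")


def relevant_hash(game: dict) -> str:
    """Hash only the relevant fields, so rating-only changes give the same digest."""
    subset = {field: game.get(field) for field in RELEVANT_FIELDS}
    # sort_keys gives a stable serialization regardless of dict ordering.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_games(updates: list[dict], stored_hashes: dict[int, str]) -> list[dict]:
    """Keep only updates whose relevant-field hash differs from the stored one."""
    return [g for g in updates if stored_hashes.get(g["id"]) != relevant_hash(g)]
```

A game whose only change is a new rating hashes to the same value and is dropped, while a renamed or brand-new game passes through. Note that a reordered list (say, genres) would count as a change in this sketch.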

In addition it could also be a consideration to dump the whole igdb update batch into a redis or some other lightweight queue, then use worker processes to pop a couple of messages, do the update check above and batch commit all changed data at once. This would break the big update loop down into smaller chunks to prevent timeouts.
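Roughly what that worker loop could look like, with Python’s in-memory `queue.Queue` standing in for redis so the sketch runs on its own. `is_relevant` and `commit_batch` are hypothetical hooks for the hash check and the database write.

```python
import queue
from typing import Callable


def drain_in_batches(
    pending: "queue.Queue[dict]",
    is_relevant: Callable[[dict], bool],
    commit_batch: Callable[[list[dict]], None],
    batch_size: int = 100,
) -> int:
    """Pop updates in small batches, drop irrelevant ones, commit the rest together."""
    committed = 0
    while True:
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(pending.get_nowait())
            except queue.Empty:
                break
        if not batch:
            return committed  # queue is drained
        changed = [game for game in batch if is_relevant(game)]
        if changed:
            commit_batch(changed)  # one write per batch instead of one per game
            committed += len(changed)
```

Because each batch is committed on its own, a failure only costs that batch and the remaining messages stay in the queue for another worker or a later run.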

3 Likes

Peter, we use this as one of our tools for modifying metadata in library records (MARC etc). https://marcedit.reeset.net/

For instance, we can export a large batch of records to marc edit, add a note in each record in a certain field like “books by AAPI authors,” and port it all back into our database.

While likely not directly relevant, their site might be informative on how some process like this could modify Grouvee records in batch.

4 Likes

I’m still going to switch over to IGDB like I’m doing, but I’m really happy to see that Giant Bomb still lives. The current staff has bought the site and they’re going to be independent now.

3 Likes