ZORA API Hackathon In Review
ZORA recently released a new API (application programming interface), which enables anybody to query their indexer for NFT market data, and held a hackathon to give it a good shakedown run. By encouraging a bunch of people to hammer away at their API server, ZORA was able to monitor and stress-test their systems under high load, which is the type of real-world testing you need to perform when shipping any product at scale.
The hackathon ran for four days, and offered prizes of 6.9 ETH, 4.20 ETH, and 1.01 ETH to the top three projects, as voted on by Zorbs holders.
By offering a generous prize pool just as the market was melting down and people were scrambling for funding, ZORA all but ensured a strong turnout, and the hackathon drew a large pool of participants.
As one of those participants, I spent four days straight running on lots of caffeine and not enough sleep, jamming on code, and more or less having a great time (aside from some annoying slowdowns caused by Objective-C’s limitations as a programming language). I’d like to share some thoughts on my hackathon experience, what I learned, and what I’d do differently next time around.
I’ll also share some thoughts on the ZORA API itself, and how I see it playing a critical role in surfacing on-chain data and metrics that might otherwise escape our attention. This type of analysis provides insight into overlapping communities within the NFT ecosystem, and by extension offers an entirely new data set to explore for those seeking deeper market understanding.
Finally, I’ll share my wishlist for future API endpoints, and some thoughts on improvements that could be made to the hackathon voting process.
Creative Goals and Unexpected Detours
I had initially planned on creating a more full-fledged NFT analytics demo, showcasing not only collection data such as number of holders and sales volume, but also things like an individual token’s image, attributes, rarity, and sales data. Instead, I was struck by inspiration partway through the hackathon…which turned into more of a fun puzzle to follow than anything else. You know how it goes.
In the end, I made the decision that what I had accomplished in getting native Objective-C code to work with GraphQL without any third party frameworks or libraries was enough of a success in its own right, and left my attempt at some rough on-chain data analysis in as a secondary feature of sorts. So what strange puzzle distracted me from my primary objective?
I was pondering one of the goals of the hackathon (bringing utility to Zorbs holders), combined with a question surfaced by xuannu.eth, and decided I would like to know the top shared communities across Zorbs holders, where “shared communities” would be represented by the tokens held across the most wallets, ranked in descending order.
I knew in principle the steps I would need to take to get that data. First, I’d need to get a list of all token holders for a given project. That part is easy enough, since ZORA offers it as one of the API endpoints.
From there, I would need to go through each wallet address returned by the query, and get a list of the tokens it contains. In this particular instance, I didn’t care as much about the number of tokens a wallet held for any given collection, so instead of keeping track of each individual token, I merely kept an overall list of unique contracts for tokens in a wallet, which is the minimum amount of data I would need to make the shared community calculations.
After repeating this process for every token holder for a given smart contract, it’s possible to calculate which tokens are held across the widest number of wallets, and by extension calculate the largest communities of token holders (for other projects) that exist within a collection. How do we go about doing so?
For each unique collectionAddress held by a given wallet address, add it to a larger data collection (in this case, I used NSCountedSet, an Objective-C data structure that allows you to keep track of the number of times a given piece of data has been added to a unique set). From there, it’s easy to see which collectionAddresses are found amongst the largest number of wallets, and thus see which “communities” exist within the original collection that we’re analyzing.
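As a rough illustration of the counting step, here is a minimal sketch in Python rather than the Objective-C used for the project. `collections.Counter` plays the role that NSCountedSet plays above; the function name, the wallet addresses, and the contract addresses are all made up for the example, and the holdings are assumed to have already been fetched.

```python
from collections import Counter


def top_shared_communities(holdings_by_wallet, limit=10):
    """Rank collections by how many wallets hold at least one of their tokens."""
    counts = Counter()
    for collection_addresses in holdings_by_wallet.values():
        # set() makes each collection count once per wallet,
        # no matter how many of its tokens the wallet holds.
        counts.update(set(collection_addresses))
    return counts.most_common(limit)


# Hypothetical sample data: wallet address -> collection addresses it holds.
sample_holdings = {
    "0xwallet1": ["0xcoven", "0xzorbs", "0xcoven"],
    "0xwallet2": ["0xcoven", "0xspam"],
    "0xwallet3": ["0xcoven", "0xzorbs"],
}

for address, wallet_count in top_shared_communities(sample_holdings):
    print(address, wallet_count)
```

In this toy data set the collection being analyzed ("0xcoven") comes out on top with a count equal to the number of wallets, which is the sanity check described in the testing that follows.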
While I wanted to calculate the top shared communities for Zorbs holders, I knew that calculating that data would take a significant amount of time (assuming the bare minimum of one query needed per Zorb holder, that would be ~33,700 GraphQL requests), and wanted to make sure my algorithm worked correctly on a smaller collection first, so I chose one of my long-time favorites, crypto_coven, to act as a testing ground.
I could immediately see that there were some flaws with my calculations. First, the count was off. The top community for crypto_coven holders should be crypto_coven itself, which was correct, but the count didn’t match the number of crypto_coven holders. There were issues with the API server returning bad gateway errors from time to time throughout the hackathon, so I wrote this off as a mix of that plus potential bugs in my own code. The numbers were still good enough to make some rough calculations, in any event.
I encountered a different issue while looking up info on the top communities (something which I would have liked to automate, but ran out of time before being able to implement) — spam tokens were taking up a majority of the top slots. For example, here are the top shared communities for crypto_coven holders:
Some platforms such as Etherscan and OpenSea flag or hide spam tokens in order to protect users from potential phishing attempts, so a lot of the time it’s easy to forget they are even in our wallets. As far as I could tell, there was no easy indicator within ZORA’s API to see if a token was flagged as spam or not. From a decentralization point of view, this lack of censorship could be seen in a positive light. At the same time, it’s incredibly helpful to be able to easily and quickly establish guardrails for those who need them, and making it easy for product creators to filter out or flag spam tokens would go a long way towards making the space safer for everyone.
If I had more time during the hackathon, I would have liked to implement some sort of spam filtering MVP, likely built off of cross-referencing transfers and mint/sales events, and filtering out tokens that were airdropped to users. This would essentially be the first step in triaging potential candidates for further spam/phishing analysis. If nothing else, it would remove some noise from the program output, even if it caught some innocent airdrops in the process. Alas, this exercise in programming logic had to remain nothing more than the seed of an idea for lack of time.
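For what it’s worth, the triage idea could look something like the following Python sketch. It was never built during the hackathon, so everything here is an assumption: the function name, the shape of the input data (simple wallet/collection pairs), and the sample addresses are hypothetical, and the events are assumed to have been fetched already.

```python
def flag_airdrop_candidates(incoming_transfers, acquisitions):
    """Return (wallet, collection) pairs received without a matching mint or sale.

    incoming_transfers: iterable of (recipient_wallet, collection_address) pairs.
    acquisitions: set of (wallet, collection_address) pairs where the wallet
        itself minted or purchased a token from that collection.
    """
    return {
        (wallet, collection)
        for wallet, collection in incoming_transfers
        if (wallet, collection) not in acquisitions
    }


# Hypothetical sample events.
incoming_transfers = [
    ("0xalice", "0xcoven"),
    ("0xalice", "0xspam"),
    ("0xbob", "0xspam"),
    ("0xbob", "0xzorbs"),
]
acquisitions = {("0xalice", "0xcoven"), ("0xbob", "0xzorbs")}

candidates = flag_airdrop_candidates(incoming_transfers, acquisitions)
# Collections worth a closer look; gifts and legitimate airdrops land here too.
suspect_collections = {collection for _, collection in candidates}
print(sorted(suspect_collections))
```

This only narrows the field. Anything it flags would still need a second pass before being labeled as spam.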
Mistakes Made and Lessons Learned
Coming from the cold barren wastelands of web2 and thus being unfamiliar with GraphQL prior to the release of ZORA’s API, I was at a bit of a disadvantage and had to learn as I went, distilling the bare minimum needed to accomplish my goals in a short time period.
Objective-C isn’t the best language to use for things like this, and has its own caveats to be aware of — for example, there’s no built-in way to rate-limit URL connection requests, so I built a really basic rate limiter myself — only, I got the timing wrong.
I realized after the fact that this code absolutely hammers on ZORA’s API server; in my haste to start building, I misread 30 requests per minute as 30 requests per second and never looked back. I did have to keep throttling back the rate limit on my requests during testing, but I never managed to make the connection to double-check the project specs until after the hackathon was over — I just figured that ZORA’s servers were under some strain from all the other hackathon participants.
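For comparison, here is a minimal Python sketch of a limiter pinned to the documented 30 requests per minute. It is not the Objective-C code from the project; the class and function names are invented for illustration, and it simply spaces requests evenly instead of allowing bursts.

```python
import time


class RateLimiter:
    """Space calls so that, on average, at most max_calls start per period."""

    def __init__(self, max_calls, period_seconds,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period_seconds / max_calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been sent yet

    def wait(self):
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


def estimated_hours(total_requests, requests_per_minute):
    """Lower bound on wall-clock time when requests are sent at the limit."""
    return total_requests / requests_per_minute / 60


limiter = RateLimiter(max_calls=30, period_seconds=60)
print(limiter.min_interval)                   # 2.0 seconds between requests
print(round(estimated_hours(33_700, 30), 1))  # 18.7 hours for ~33,700 requests
```

Calling `limiter.wait()` before each request keeps the client at one request every two seconds. At that pace, one query per Zorb holder works out to roughly 18.7 hours of requests, before counting retries or pagination.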
My apologies for any headaches this caused the ZORA engineering team during the hackathon; it was completely unintentional on my part! At least you know your API servers can handle heavy traffic? 😅😅😅
Another mistake was waiting until the last minute to slap together a UI. I should have taken the time to get a simple, clean UI implemented first, rather than assuming I’d have time to tackle it before the end of the hackathon. Instead, I ended up with the Kevin of hackathon UIs.
I didn’t get a chance to optimize my code, and it’s incredibly inefficient from a memory usage standpoint. Bottlenecks in the code cause surges in the GraphQL queries being sent to the API server, resulting in unnecessary 502s and eventually 429s as it hammers the ZORA servers. I had originally planned on commenting and documenting my code more extensively, but much like a decent UI, that plan also fell by the wayside. Programming always takes longer than you think it’s going to, no matter how confident you are in your assumptions.
Thoughts on the ZORA API
Coming from the world of web2 where so much still runs on REST, ZORA’s API using GraphQL is a breath of fresh air.
I’ve tried out a bunch of different web3 APIs on the backend side of things over the past 9 months, and the ZORA API is my favorite by far, for the simple reason that after this hackathon, GraphQL wins out over traditional RESTful APIs in my book any day of the week.
ZORA did a great job providing easy access to commonly used data points and aggregate data such as ownersByCount and floor prices, with the ability to filter down to just the data you need, and nothing more (the lookbackHours feature for sales volume is an especially nice touch).
Doing the same type of collection analysis using the OpenSea API is much more complicated, requiring multiple RESTful queries (and all of the associated juggling and parsing of data structures) to get the same information that can be had with a single GraphQL query on the ZORA API.
That being said, I think the ZORA API could be even better.
Here’s what I’d like to see:
First and foremost, I’d like to see parity across marketplaces as far as the data provided by the ZORA API goes. While sales data from markets like OpenSea is accessible through the ZORA API, OpenSea listings are not currently included when calculating the floorPrice aggregate stat. Likewise, data sources other than mainnet are not yet included in results.
I would like to see some sort of status flag for tokens that have been marked as spam on platforms like Etherscan, in order to make it easier to filter them out of results. This would save a lot of people headaches in the long run.
After realizing my mistake regarding the API rate limit, it quickly became apparent that my original idea for calculating the top shared communities among Zorb holders would become infeasible at scale. Instead, I’d like to see ZORA offer some of these things that are inefficient to calculate on the client end of things as data points themselves.
For example, I’d like to see an easier way to get a list of holdings per wallet on a unique-contract-only basis. For some analytics purposes, I care less about the exact number of holdings someone has for a specific project, but rather just want to know if they are holding the project at all.
Further, I’d like to see a convenience accessor for this type of data on a per-project basis — something that returns the data you’d get if you crunched the numbers for every individual holder — so you’d be able to easily identify the top 10 shared communities among tokenholders.
This data serves a broad need in the space for community-building, marketing, and identifying emerging trends, and is all sitting there on chain just waiting to be surfaced.
On a broader level, this serves as a form of optimization on the backend infrastructure: providing convenience accessors for these types of queries would be much more efficient than tons of third-party clients hitting the ZORA API server every day to grab the data necessary to make these calculations on their own.
Unfortunately the hackathon was marred by some issues when it came to the Zorbs holder vote on the top three projects, which provides an opportunity to examine what went wrong, and brainstorm how similar situations might be avoided in the future.
The initial snapshot vote had to be scrapped due to some hackathon submissions showing up multiple times under slightly different names, causing confusion. This was more of a QA issue than anything else, but I wonder if there were any Zorbs holders who participated in the initial (scrapped) vote and missed out on the second one.
Snapshot doesn’t allow randomization of candidates on ballots, and projects were listed in the order they were submitted. Ballot position can have an effect on vote outcome, as pointed out by @valcoholics1 during the hackathon wrap-up on Twitter Spaces, so projects that were submitted earlier in the hackathon had an advantage. This particular issue has been raised previously in Snapshot discussions, but so far no progress has been made to address it.
It wasn’t initially clear that Zorbs holders only got one vote, no matter the number of Zorbs held. That was supposed to be the intent, as shown by the erc20-with-balance voting strategy adopted for this vote.
Except… you could select multiple projects when voting, and Snapshot would then assign a vote to each one. Some people realized they could select more than one candidate, but others (myself included) clearly did not. This was due to setting the voting system to Approval voting rather than Single choice voting. Clearly the winning strategy here would have been to have everyone vote for every candidate, thus creating a 20-way tie and splitting the entire prize pool of 12.11 ETH amongst all hackathon participants. Ah well.
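To make the difference concrete, here is a small Python sketch of an approval-style tally, along with the arithmetic for the hypothetical 20-way split. The ballots and project names are invented, and this is not Snapshot’s actual implementation, only the counting rule as described above.

```python
from collections import Counter
from decimal import Decimal


def tally_approval(ballots):
    """Approval voting: every project selected on a ballot receives one vote."""
    counts = Counter()
    for selected in ballots:
        counts.update(set(selected))
    return counts


# Hypothetical ballots: one voter picks a single project, another approves three.
ballots = [["project_a"], ["project_a", "project_b", "project_c"]]
print(tally_approval(ballots))

# Prize pool from the three announced prizes, split evenly across 20 projects.
prize_pool = Decimal("6.9") + Decimal("4.20") + Decimal("1.01")
print(prize_pool)       # 12.11
print(prize_pool / 20)  # 0.6055
```

The second voter ends up casting three votes to the first voter’s one, which is exactly the imbalance that went unnoticed by those of us who assumed it was one pick per ballot.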
Overall, I had a net positive experience with the ZORA API hackathon, and found the API easy to use, even as a complete newcomer to GraphQL. I’d like to see the API continue to grow, with more indexer and market data being made available to developers in the future. More clarity during the voting process would have been appreciated.