

Neume Crawler

Crawl all music NFTs; supersedes neume-network/core. (Work in Progress)

The vision of neume is to index all web3 music activity and make it public. You can run your own instance of neume to index all data or you can use someone else's instance to just consume the indexed data using JSON-RPC calls. musicOS is a proud consumer of neume.

Supported platforms

The following platforms are currently crawled by neume:


How to consume the indexed data?

The neume daemon exposes JSON-RPC methods to consume data.

Mental Model

neume saves each track (as an NFT), including information such as its tokenURI, metadata and current owner, and it records changes to that data. Using the getIdsChanged_fill method, we can ask neume what's new.

Note

  • Due to editions, multiple NFTs can point to the same song. The consumer can merge these NFTs while consuming.
  • The terms track and NFT are sometimes used interchangeably.
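
One way a consumer might merge editions is to group NFTs by the audio file they point to. The sketch below assumes the { id, value } items returned by getIdsChanged_fill and uses the audio manifestation URI as the grouping key. Both the key choice and the name groupByAudio are illustrative assumptions, not part of neume.

```javascript
// Hypothetical helper: groups NFTs that share the same audio file into one track.
// Assumes each item looks like { id, value } with an optional value.manifestations array.
function groupByAudio(results) {
  const tracks = new Map();
  for (const { id, value } of results) {
    const audio = (value.manifestations ?? []).find((m) => m.mimetype === "audio");
    // Fall back to contract address + token ID when there is no audio manifestation.
    const key = audio ? audio.uri : `${id.address}/${id.tokenId}`;
    if (!tracks.has(key)) {
      tracks.set(key, { title: value.title, artist: value.artist.name, nfts: [] });
    }
    tracks.get(key).nfts.push(id);
  }
  return tracks;
}
```

A real consumer may prefer a different key (for example the contract address for edition contracts); the point is only that the merge happens on the consumer's side.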

Schema

The NFTs provided by neume follow a strict schema, which can be found at neume-network/schema.

getIdsChanged_fill(from, to)

getIdsChanged_fill is a method that returns inserts/updates to the database in the given block range.
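
As a rough sketch, a client could build the JSON-RPC 2.0 request body like this. The function names, the client-side range check and the use of the global fetch (available in Node 18+) are assumptions made for illustration; only the method name and the [from, to] parameters come from this document.

```javascript
// Builds the JSON-RPC 2.0 request body for getIdsChanged_fill.
// The range check is a client-side assumption, not documented server behaviour.
function buildGetIdsChangedRequest(from, to, requestId = "1") {
  if (!Number.isInteger(from) || !Number.isInteger(to) || from > to) {
    throw new RangeError(`invalid block range: ${from}..${to}`);
  }
  return { jsonrpc: "2.0", method: "getIdsChanged_fill", id: requestId, params: [from, to] };
}

// Sends the request to a neume daemon and returns the `result` array.
async function getIdsChanged(endpoint, from, to) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGetIdsChangedRequest(from, to)),
  });
  const { result, error } = await response.json();
  // JSON-RPC 2.0 reports failures in an `error` member instead of `result`.
  if (error) throw new Error(`JSON-RPC error ${error.code}: ${error.message}`);
  return result;
}
```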

Example

Let's suppose a new NFT was minted at block number 16572314.

Curl

curl -X POST https://sync1.neume.network/ -H 'Content-Type: application/json' --data '
{
  "jsonrpc": "2.0",
  "method": "getIdsChanged_fill",
  "id": "1",
  "params": [
    16572314,
    16572314
  ]
}
'

The result is a list of NFTs that were inserted or updated at block 16572314. In our case, the result contains only one NFT. If other NFTs had also been minted or updated in this block, they would appear in the result as well.

id is the ID given to the NFT and value is the NFT itself.

[
  {
    id: {
      chainId: "1",
      address: "0xa6c4df945dbb1d71fe9a8d71ae93b8d5c2bbebe4",
      tokenId: "6975",
      blockNumber: 16572314,
    },
    value: {
      version: "2.0.0",
      title: "XYZ",
      artist: {
        version: "2.0.0",
        name: "Snoop Dogg",
        address: "0xE0036fb4B5A3B232aCfC01fEc3bD1D787a93da75",
      },
      platform: {
        version: "2.0.0",
        name: "Sound Protocol",
        uri: "https://sound.xyz",
      },
      erc721: {
        version: "2.0.0",
        createdAt: 16563504,
        transaction: {
          from: "0x59975dFE25845bF9C0eFf1102Ac650599c3f491a",
          to: "0x7Dd7fd8ACd39e557A6c570965eeA2b4008c4Dd1c",
          blockNumber: 16572314,
          transactionHash: "0xa80214bad12482943020cd539c099b45c2acc86373bd28d0173303d6042049c0",
        },
        address: "0xa6c4df945dbb1d71fe9a8d71ae93b8d5c2bbebe4",
        tokenId: "6975",
        tokenURI: "ar://T5aZ_6FYBIRnvMX7O6qbE4ZSEnysMlZbHhlnFmUvAMk/0",
        metadata: {
          animation_url: "ar://_VUsZCeJQWVD4hFcgb5w39GqekARiN8ff-ptw63lH28",
          artist: "Snoop Dogg",
          artwork: {
            mimeType: "image/png",
            uri: "ar://gB04RsiCpgEbwxKs-tRh5gsycsUo-tGDdWQIkM51f6U",
            nft: null,
          },
          attributes: [
            {
              trait_type: "XYZ",
              value: "Song Edition",
            },
          ],
          bpm: null,
          credits: null,
          description:
            "Music OEs. Buy as many as U want\n\n.00420Ξ (8$)\u0003--> Open 72 Hours \n\nGonna give 1 of my vintage cars from tha compound 2 tha golden egg winner... Only If we hit 42,000 mints\n\nClaimable only in Inglewood California",
          duration: 129,
          external_url: "https://www.sound.xyz/snoopdogg/xyz",
          genre: "Hip-hop & Rap",
          image: "ar://gB04RsiCpgEbwxKs-tRh5gsycsUo-tGDdWQIkM51f6U",
          isrc: null,
          key: null,
          license: null,
          locationCreated: null,
          losslessAudio: "ar://_VUsZCeJQWVD4hFcgb5w39GqekARiN8ff-ptw63lH28",
          lyrics: null,
          mimeType: "audio/mpeg",
          nftSerialNumber: null,
          name: "XYZ",
          originalReleaseDate: null,
          project: null,
          publisher: null,
          recordLabel: null,
          tags: null,
          title: "XYZ",
          trackNumber: 1,
          version: "sound-edition-20220930",
          visualizer: null,
        },
      },
      manifestations: [
        {
          version: "2.0.0",
          uri: "ar://_VUsZCeJQWVD4hFcgb5w39GqekARiN8ff-ptw63lH28",
          mimetype: "audio",
        },
        {
          version: "2.0.0",
          uri: "ar://gB04RsiCpgEbwxKs-tRh5gsycsUo-tGDdWQIkM51f6U",
          mimetype: "image",
        },
      ],
    },
  },
];

Let's suppose the previously minted NFT was transferred at block number 16572315 from 0x7Dd7fd8ACd39e557A6c570965eeA2b4008c4Dd1c to 0x076D520333b2163C51897FAC8939a3606e5b4a95. We will get the following updated NFT as the result.

Curl

curl -X POST https://sync1.neume.network/ -H 'Content-Type: application/json' --data '
{
  "jsonrpc": "2.0",
  "method": "getIdsChanged_fill",
  "id": "1",
  "params": [
    16572315,
    16572315
  ]
}
'

Notice how the NFT is exactly the same, with the exception of value.erc721.transaction, because the NFT was transferred.

[
  {
    id: {
      chainId: "1",
      address: "0xa6c4df945dbb1d71fe9a8d71ae93b8d5c2bbebe4",
      tokenId: "6975",
      blockNumber: 16572314,
    },
    value: {
      version: "2.0.0",
      title: "XYZ",
      artist: {
        version: "2.0.0",
        name: "Snoop Dogg",
        address: "0xE0036fb4B5A3B232aCfC01fEc3bD1D787a93da75",
      },
      platform: {
        version: "2.0.0",
        name: "Sound Protocol",
        uri: "https://sound.xyz",
      },
      erc721: {
        version: "2.0.0",
        createdAt: 16563504,
        /* Note: only the transaction object has changed */
        transaction: {
          from: "0x7Dd7fd8ACd39e557A6c570965eeA2b4008c4Dd1c",
          to: "0x076D520333b2163C51897FAC8939a3606e5b4a95",
          blockNumber: 16572315,
          transactionHash: "0xA3B232aCfC01fE943020cd539c099b45c2acc86373bd28d0173303d6042049c0",
        },
        address: "0xa6c4df945dbb1d71fe9a8d71ae93b8d5c2bbebe4",
        tokenId: "6975",
        tokenURI: "ar://T5aZ_6FYBIRnvMX7O6qbE4ZSEnysMlZbHhlnFmUvAMk/0",
        metadata: {
          animation_url: "ar://_VUsZCeJQWVD4hFcgb5w39GqekARiN8ff-ptw63lH28",
          artist: "Snoop Dogg",
          artwork: {
            mimeType: "image/png",
            uri: "ar://gB04RsiCpgEbwxKs-tRh5gsycsUo-tGDdWQIkM51f6U",
            nft: null,
          },
          attributes: [
            {
              trait_type: "XYZ",
              value: "Song Edition",
            },
          ],
          bpm: null,
          credits: null,
          description:
            "Music OEs. Buy as many as U want\n\n.00420Ξ (8$)\u0003--> Open 72 Hours \n\nGonna give 1 of my vintage cars from tha compound 2 tha golden egg winner... Only If we hit 42,000 mints\n\nClaimable only in Inglewood California",
          duration: 129,
          external_url: "https://www.sound.xyz/snoopdogg/xyz",
          genre: "Hip-hop & Rap",
          image: "ar://gB04RsiCpgEbwxKs-tRh5gsycsUo-tGDdWQIkM51f6U",
          isrc: null,
          key: null,
          license: null,
          locationCreated: null,
          losslessAudio: "ar://_VUsZCeJQWVD4hFcgb5w39GqekARiN8ff-ptw63lH28",
          lyrics: null,
          mimeType: "audio/mpeg",
          nftSerialNumber: null,
          name: "XYZ",
          originalReleaseDate: null,
          project: null,
          publisher: null,
          recordLabel: null,
          tags: null,
          title: "XYZ",
          trackNumber: 1,
          version: "sound-edition-20220930",
          visualizer: null,
        },
      },
      manifestations: [
        {
          version: "2.0.0",
          uri: "ar://_VUsZCeJQWVD4hFcgb5w39GqekARiN8ff-ptw63lH28",
          mimetype: "audio",
        },
        {
          version: "2.0.0",
          uri: "ar://gB04RsiCpgEbwxKs-tRh5gsycsUo-tGDdWQIkM51f6U",
          mimetype: "image",
        },
      ],
    },
  },
];
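
A consumer that has already stored the earlier version could compare the two to see what happened. The sketch below is a hypothetical helper (describeChange is not part of neume). It assumes the value shape shown in the two results above and treats a differing transactionHash as a transfer.

```javascript
// Hypothetical helper: describes what changed between two stored versions of one NFT.
// `previous` and `next` are the `value` objects from two getIdsChanged_fill results.
function describeChange(previous, next) {
  const before = previous.erc721.transaction;
  const after = next.erc721.transaction;
  if (before.transactionHash === after.transactionHash) {
    return { type: "unchanged" };
  }
  // In the example above, `to` of the latest transaction is the new holder.
  return { type: "transfer", from: after.from, to: after.to, blockNumber: after.blockNumber };
}
```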

Pseudocode

// `db` and `latestBlockNumber` are placeholders supplied by your application.
async function consume(from, to) {
  const {
    data: { result },
  } = await axios.post("https://sync1.neume.network/", {
    jsonrpc: "2.0",
    method: "getIdsChanged_fill",
    id: "1",
    params: [from, to],
  });

  for (const r of result) {
    const id = r.id;
    const nft = r.value;

    // Upsert: update the NFT if we already have it, otherwise create it.
    if (await db.exists(id)) {
      await db.update(id, nft);
    } else {
      await db.create(id, nft);
    }
  }
}

for (let i = 0; i < latestBlockNumber; i += 5000) {
  await consume(i, i + 5000);
}
  • The db depends on the database for your application.
  • https://sync1.neume.network/ is an endpoint where the neume daemon is running.
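
For experimenting without a real database, an in-memory stand-in for db might look like the sketch below. MemoryDb and blockRanges are hypothetical names. The key format assumes that chainId, address and tokenId together identify an NFT, and the range helper assumes both ends of a block range are inclusive, as the single-block examples above suggest.

```javascript
// In-memory stand-in for the `db` object used in the pseudocode (hypothetical).
class MemoryDb {
  constructor() {
    this.rows = new Map();
  }
  // Serialize the id object into a stable string key (assumed format).
  static key(id) {
    return `${id.chainId}/${id.address.toLowerCase()}/${id.tokenId}`;
  }
  async exists(id) {
    return this.rows.has(MemoryDb.key(id));
  }
  async create(id, value) {
    this.rows.set(MemoryDb.key(id), value);
  }
  async update(id, value) {
    this.rows.set(MemoryDb.key(id), value);
  }
}

// Yields non-overlapping [from, to] block ranges covering start..end inclusive.
function* blockRanges(start, end, size) {
  for (let from = start; from <= end; from += size) {
    yield [from, Math.min(from + size - 1, end)];
  }
}
```

With these two pieces, a consumer loop could iterate over blockRanges(0, latestBlockNumber, 5000) and upsert every returned NFT into the store.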

The complete JSON-RPC reference can be found here.

Run your own instance of Neume

The neume tool is exposed as a CLI with the following commands.

Commands

General help is available via npx neume --help; command-specific help is available via npx neume <command> --help.

crawl

The crawl command crawls the blockchain for data.

filter-contracts

The filter-contracts command finds contracts that will be crawled by the crawl command.

A few platforms use the factory pattern to deploy a new contract for each artist. Therefore, we need to find such contracts and later crawl them for data.

daemon

The daemon command starts the JSON-RPC server and periodically runs the crawl and filter-contracts commands. The JSON-RPC server is used to consume the data.

Most people will only need this one command.

init

The init command creates the files and folders required to start a neume instance for the first time.

Requirements

  • Unix-like operating system
  • Access to an archive node
  • Access to an IPFS gateway
  • Access to an Arweave gateway
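
The gateways matter because token URIs such as the ar:// ones in the example results cannot be fetched over plain HTTP as they are. The following sketch shows one way such URIs could be mapped onto gateway URLs. The function name and the two public gateway base URLs are assumptions; this is not how neume itself is configured.

```javascript
// Gateway base URLs are assumptions; substitute the gateways you have access to.
const GATEWAYS = {
  "ar:": "https://arweave.net/",
  "ipfs:": "https://ipfs.io/ipfs/",
};

// Rewrites ar:// and ipfs:// URIs to HTTP gateway URLs; other URIs pass through.
function toGatewayUrl(uri, gateways = GATEWAYS) {
  const match = /^(ar:|ipfs:)\/\/(.+)$/.exec(uri);
  if (!match) return uri; // already an http(s) URL or an unknown scheme
  const [, scheme, rest] = match;
  return gateways[scheme] + rest;
}
```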

Setup

Create a new npm project

npm init -y

Install crawler as a dependency.

npm install github:neume-network/crawler

Initialize the project

npx neume init

The init command will create files and folders in the current working directory. neume requires these files and folders for its configuration and data storage.

Dissecting the init command

config.js

neume stores the configuration common for all commands in a file config.js. This file is read from the current working directory.

It contains configuration such as RPC hosts, IPFS gateway, and concurrency. A sample file can be found at assets/config.sample.js.

The init command will create the config.js file with sample values. Please populate it according to your needs. The comments inside the file should guide you.

data directory

The data directory is the database for neume. It should also be present in the current working directory.

Running your first crawl

Once the project is initialized you can use the daemon command to start crawling.

npx neume daemon --port 8080

Check the man page for the daemon command to see what else can be done with it.

Development

Developers of neume can clone the repo and run the init command inside the repo and start crawling.

node --loader ts-node/esm neume.ts init

In fact, all commands can be run using ts-node.

node --loader ts-node/esm neume.ts daemon

Architecture Overview

Neume depends heavily on an RPC node and avoids centralized servers.

Pseudocode for crawl command

from = process.argv[2]
to = process.argv[4]

contracts = import('./data/contracts.json')

for contract in contracts:
  // use eth_getLogs to find all Transfer events emitted by
  // this contract in the given block range
  logs = getLogs(contract, Transfer, from, to)
  for log in logs:
    nft = {}
    // extract tokenId from log
    nft.tokenId = decodeLog(log)
    // use eth_call to get the tokenURI
    nft.tokenUri = callTokenUri(contract, nft.tokenId)
    // get the data behind the tokenURI
    nft.tokenUriContent = getTokenUri(nft.tokenUri)
    // transform all the collected data according
    // to neume schema
    nft = transform(nft)

    saveToDB(nft)
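
The "extract tokenId from log" step can be sketched for a standard ERC-721 Transfer event, where the sender, the recipient and the token ID are all indexed and therefore arrive as log topics. decodeTransferLog is a hypothetical name, the input is assumed to be a raw log object as returned by eth_getLogs, and neume's own decoding may differ.

```javascript
// keccak256("Transfer(address,address,uint256)"), the ERC-721 Transfer event signature.
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

// Decodes a raw eth_getLogs entry; returns null if it is not an ERC-721 Transfer.
function decodeTransferLog(log) {
  if (log.topics.length !== 4 || log.topics[0] !== TRANSFER_TOPIC) return null;
  const [, from, to, tokenId] = log.topics;
  return {
    // Addresses are left-padded to 32 bytes: skip "0x" plus 24 zero characters.
    from: "0x" + from.slice(26),
    to: "0x" + to.slice(26),
    // Token IDs can exceed Number.MAX_SAFE_INTEGER, so go through BigInt.
    tokenId: BigInt(tokenId).toString(),
    // eth_getLogs returns the block number as a hex string.
    blockNumber: Number(log.blockNumber),
  };
}
```

ERC-20 uses the same event signature but does not index the amount, so its logs carry only three topics; the length check filters those out.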


crawler's Issues

neume 2.1 ?

Our current roadmap for neume is to support Decent and Lens and to make the crawler more generic. Below are a few technical changes that I propose for this roadmap.

Save Tracks instead of NFTs

Our schema currently represents an NFT. However, multiple NFTs can represent the same song (track). This leads to duplication of data, and the consumer of neume has to merge NFTs into tracks.

We stuck with NFTs because it was simpler and levelDB isn't suitable for tracks.

Pros of moving to Tracks

  • It will make the crawler more generic, because not every protocol publishes audio as NFTs. For example, Lens.
  • We will save space since multiple NFTs can point to the same track.

Problem with saving tracks in levelDB

LevelDB is a key-value database. Imagine we have the following track in our database. owners is the list of owners for this track.

{
  ...
  "owners": [],
  ...
}

If two threads update the owners field simultaneously, each has to read the whole track and write the whole track back, so one write overwrites the other.

// Thread 1
const track = getTrack(id)
track.owners.push('0x123') // push returns the new length, so keep using track
updateTrack(track)

// Thread 2
const track = getTrack(id)
track.owners.push('0xabc')
updateTrack(track)

Let's suppose both threads read the track before either one writes, and thread 2 finishes last. The owner added by thread 1 is lost, and we have the following value in our database.

{
  ...
  "owners": ["0xabc"],
  ...
}

Databases like MongoDB allow inserting values into a nested field, but unfortunately levelDB doesn't. We could write code to add this functionality on top of levelDB, but it wouldn't be flexible: if we add another field like owners in the future, we will have to write more code. Not ideal.
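
The lost update, and the kind of extra code a key-value store would need to avoid it, can be reproduced with a small in-memory model. Everything here is a hypothetical stand-in: createStore mimics a store that only reads and writes whole JSON documents, asynchronous tasks play the role of the two threads, and createSerializer is one possible per-key queue.

```javascript
// Tiny async key-value store standing in for levelDB: values are whole JSON documents.
function createStore(initial) {
  const data = new Map(Object.entries(initial).map(([k, v]) => [k, JSON.stringify(v)]));
  return {
    get: async (key) => JSON.parse(data.get(key)),
    put: async (key, value) => {
      data.set(key, JSON.stringify(value));
    },
  };
}

// Read-modify-write of the whole document, as in the pseudocode above.
async function addOwner(store, id, owner) {
  const track = await store.get(id);
  track.owners.push(owner);
  await store.put(id, track);
}

// Per-key queue: each update for a key starts only after the previous one settles.
function createSerializer() {
  const tails = new Map();
  return (key, task) => {
    const run = (tails.get(key) ?? Promise.resolve()).then(task, task);
    tails.set(key, run);
    return run;
  };
}

async function demo() {
  // Unsynchronized: both tasks read the empty list before either writes.
  const racy = createStore({ t1: { owners: [] } });
  await Promise.all([addOwner(racy, "t1", "0x123"), addOwner(racy, "t1", "0xabc")]);

  // Serialized: the second update waits for the first to finish.
  const safe = createStore({ t1: { owners: [] } });
  const serialize = createSerializer();
  await Promise.all([
    serialize("t1", () => addOwner(safe, "t1", "0x123")),
    serialize("t1", () => addOwner(safe, "t1", "0xabc")),
  ]);
  return { racy: await racy.get("t1"), safe: await safe.get("t1") };
}
```

Running demo() leaves the unsynchronized store with owners ["0xabc"] and the serialized one with ["0x123", "0xabc"]. The queue only works within a single process and has to wrap every read-modify-write, which illustrates the inflexibility described above.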

Using sqlite to solve the above LevelDB problem

I propose giving sqlite a try. To save effort, we can use an ORM such as Sequelize.

We dismissed sqlite before because it was pointed out that it has slow write speeds. I argue that speed isn't our top priority; besides, how slow can sqlite really be?

Make strategies more generic

To be written...

Discussion about this alternative crawler

I have created a proof of concept crawler to explore a better architecture for neume. Currently it only crawls sound-protocol.

Note: This crawler is not perfect and cuts corners, but I hope we can incorporate some of these things into neume.

What are the changes?

Use an imperative programming paradigm

This means calling functions like callTokenuri directly: instead of posting extraction-worker messages, we now use async/await to send them. We still use extraction-worker and get the same benefits, such as rate limiting and timeouts.

We won't have to deal with worker threads or memory leaks in lifecycle, and onboarding developers will be easier. The existing architecture has proven a little difficult to explain.

We still have concurrency by using p-map.

https://github.com/il3ven/music-nft-crawler-poc/blob/3e35181da889387733d26b1c918f3eb726e67d6e/index.mjs#L85-L95
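
To make the idea concrete without depending on p-map, here is a hand-rolled sketch of a concurrency-limited map. mapWithConcurrency is a hypothetical stand-in and is not the code used in the POC; it only illustrates how plain async/await can keep a bounded number of calls in flight.

```javascript
// Runs `mapper` over `items` with at most `concurrency` calls in flight.
// Results keep the order of `items`. A stand-in for what p-map provides.
async function mapWithConcurrency(items, mapper, concurrency) {
  const results = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await mapper(items[index], index);
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A crawler step could then be written as, for example, await mapWithConcurrency(logs, fetchTokenUri, 10), where fetchTokenUri is whatever async function performs the call.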

Use sqlite as the database

Flat files fail us when we need to find or update data. A database of some kind is necessary for neume. For this POC I have used sqlite as a key-value database where the key is an ID and the value is JSON data. If we need to share our data using IPFS, we still can, because it is essentially just JSON.

Miscellaneous

  • We don't have to read from disk every time
  • It is easier to adapt this architecture to parallelise each step, since things are in memory or are easy to load from sqlite. neume-network/strategies#244
