tudataset's Issues

Feature description for ENZYMES is missing

I have been playing with the ENZYMES dataset and found that the feature names for the node features are not provided in the README or anywhere else (or I need help finding them). The referenced paper (Borgwardt et al. 2005) lists features such as AA length, Total Waals, etc., but I do not know which feature name corresponds to which feature dimension in tudataset.

Please provide a feature name for each feature dimension in the dataset. It would be a lot easier to do anything with the dataset, starting from exploratory data analysis, if the features were stated in the README, as they are for the MUTAG dataset.

How to represent a molecule with the same format as TUDataset from a molecular SMILES?

Hi,

I'm trying to train a QSAR model with some GNNs. But all I have is a list of molecular SMILES and the corresponding labels, for example:

x = ['CCCC', 'CCCO', 'CCCN' ...]
y = [1,1,0, ...]

Many GNN implementations use your datasets as examples, which are preprocessed graphs described by a set of files. But I did not find any way to convert molecular SMILES to this format. Is there a function that converts molecular SMILES to the same format?

Thanks
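
One possible approach, sketched under assumptions rather than taken from the tudataset repository: parse each SMILES string into atoms and bonds with a cheminformatics toolkit, then write the graphs out in the TUDataset text layout (1-based node and graph ids, each undirected edge stored in both directions). The helper names `smiles_to_graph` and `write_tu_files` are hypothetical. The parser assumes RDKit is installed and uses atomic numbers as node labels, which is an illustrative choice.

```python
from pathlib import Path


def smiles_to_graph(smiles):
    """Parse a SMILES string into (atomic_numbers, bonds). Assumes RDKit is installed."""
    from rdkit import Chem  # third-party dependency, imported lazily

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    atoms = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
    bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]
    return atoms, bonds


def write_tu_files(graphs, labels, out_dir, name):
    """Write graphs in TUDataset text format.

    graphs: list of (node_labels, edges), edges as 0-based (u, v) pairs per graph.
    labels: one class label per graph.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    adjacency, indicator, node_labels = [], [], []
    offset = 0  # number of nodes written so far, used to make node ids global
    for graph_id, (nodes, edges) in enumerate(graphs, start=1):
        for u, v in edges:
            # global 1-based node ids, both directions of every undirected edge
            adjacency.append(f"{u + offset + 1}, {v + offset + 1}")
            adjacency.append(f"{v + offset + 1}, {u + offset + 1}")
        indicator.extend(str(graph_id) for _ in nodes)
        node_labels.extend(str(n) for n in nodes)
        offset += len(nodes)
    files = {
        "A": adjacency,
        "graph_indicator": indicator,
        "graph_labels": [str(y) for y in labels],
        "node_labels": node_labels,
    }
    for suffix, lines in files.items():
        (out / f"{name}_{suffix}.txt").write_text("\n".join(lines) + "\n")
```

With the example above, a call could look like `write_tu_files([smiles_to_graph(s) for s in x], y, "data/MYSET/raw", "MYSET")`, where the directory and dataset name are placeholders.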

How should I write the class code to process the dataset so that it can be used with pytorch_geometric?

I have understood the TUDataset format and made my original data consistent with it. How should I write the class code that processes the dataset so that it can be used with pytorch_geometric?
I have prepared the raw data like this:

(1) XX_A.txt (m lines)
sparse (block diagonal) adjacency matrix for all graphs,
each line corresponds to (row, col) resp. (node_id, node_id)

(2) XX_graph_indicator.txt (n lines)
column vector of graph identifiers for all nodes of all graphs,
the value in the i-th line is the graph_id of the node with node_id i

(3) XX_graph_labels.txt (N lines)
class labels for all graphs in the dataset,
the value in the i-th line is the class label of the graph with graph_id i

(4) XX_node_attributes.txt (n lines)
matrix of node attributes,
the comma-separated values in the i-th line are the attribute vector of the node with node_id i
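
As a starting point, here is a minimal pure-Python sketch of a reader for the four files listed above. It is not the pytorch_geometric implementation. The function name `read_tu_dataset` and the per-graph dictionary layout are hypothetical, and integer class labels are assumed.

```python
from pathlib import Path


def read_tu_dataset(folder, name):
    """Read NAME_A / _graph_indicator / _graph_labels / _node_attributes into per-graph dicts."""
    folder = Path(folder)

    def rows(suffix):
        text = (folder / f"{name}_{suffix}.txt").read_text()
        return [line for line in text.splitlines() if line.strip()]

    graph_of_node = [int(r) for r in rows("graph_indicator")]  # 1-based graph ids
    labels = [int(r) for r in rows("graph_labels")]  # assumes integer class labels
    attrs = [[float(v) for v in r.split(",")] for r in rows("node_attributes")]

    graphs = [{"x": [], "edges": [], "y": y} for y in labels]

    # Map every global node to its position inside its own graph.
    local_index = []
    for node, graph_id in enumerate(graph_of_node):
        graph = graphs[graph_id - 1]
        local_index.append(len(graph["x"]))
        graph["x"].append(attrs[node])

    # Each line of the adjacency file is "row, col" with 1-based global node ids.
    for r in rows("A"):
        u, v = (int(t) for t in r.split(","))
        graph = graphs[graph_of_node[u - 1] - 1]
        graph["edges"].append((local_index[u - 1], local_index[v - 1]))
    return graphs
```

Each returned dictionary could then be turned into a `torch_geometric.data.Data(x=..., edge_index=..., y=...)` object inside a custom dataset class. Alternatively, the existing `torch_geometric.datasets.TUDataset` class should be able to read such files directly when they are placed under `<root>/XX/raw/` and the dataset is created with `name="XX"` and `use_node_attr=True`; this is worth verifying against the installed pytorch_geometric version.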
