How to reach me: [email protected]
Pronouns: he/him
I'm currently working on helping cybersecurity education grow at UCLA!
I'm interested in software development, cybersecurity, and educational technology.
A nice feature to have would be image downloads on datasets. This can probably go with the Image component; a reference link is below. An enhancement would be to somehow zip and download all of the images within a dataset. If there are any ideas on tackling multiple image downloads with one button, I am open to suggestions!
Like many dataset contribution websites such as Kaggle, ours needs an interface that allows users to explore and search existing datasets. Our dashboard will be the main page where users can browse all available datasets, search, and select which datasets they would like to view.
Proposed Solution
Action Items
make a request to /api/datasets to retrieve all of the information about all deployed datasets and set a state that contains a list of dataset objects
create a Dataset.jsx component that displays a card showing information corresponding to a particular dataset object
create a map between Dataset objects from the state and Dataset.jsx components. Create a grid displaying the cards for the Dashboard. Ensure that each card is also a link pointing to the corresponding DatasetPage to display the specific information about that dataset.
style the page appropriately
(optional) add a search bar that filters the results on the page based on the query. Reference #25.
update the implementation of DatasetPage.jsx to pass the relevant information as a prop or a URL parameter (most likely prop)
Optional. Currently, we can only retrieve information about a dataset by id, which is inefficient and makes it difficult to look up the relevant machine learning dataset. This is where this endpoint comes in. It allows the backend to be queried for datasets matching the user's search.
Proposed Solution
There are a few ways to implement search depending on preference. The simplest is to determine whether or not the queried string appears as a substring in the relevant field of the database. In SQL, there is the LIKE keyword that allows you to do this.
Action Items
POST is the only method supported for this endpoint. Set up the corresponding router and endpoint in routes/ and app.py.
Support the "options" field in the request. We can search by either the name of the dataset or the description of the dataset. If it isn't one of these options, then simply return an error status and 400.
Support the "query" field in the request. This is what you will actually use to query the database. Set up the corresponding SQLAlchemy query against the Dataset table to return a list of potential results.
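The substring search described above can be sketched independently of Flask. This uses the stdlib sqlite3 module (the table name, columns, and sample rows are illustrative assumptions) to show the LIKE query shape that SQLAlchemy's `.like()` would generate, including the "options" whitelist check:

```python
import sqlite3

# In-memory table standing in for the Dataset table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO dataset (name, description) VALUES (?, ?)",
    [("MNIST digits", "handwritten digits"),
     ("CIFAR-10", "small color images"),
     ("Digits extended", "more handwritten digits")],
)

def search_datasets(field, query):
    """Substring search over one whitelisted column, mirroring the
    'options' check described above (a bad field would return 400)."""
    if field not in ("name", "description"):
        raise ValueError("unsupported search option")
    # The column name is interpolated only after the whitelist check above;
    # the user-supplied query itself is always bound as a parameter.
    rows = conn.execute(f"SELECT name FROM dataset WHERE {field} LIKE ?",
                        (f"%{query}%",))
    return [r[0] for r in rows]

print(search_datasets("name", "digit"))        # substring match on name
print(search_datasets("description", "color")) # substring match on description
```

Note that SQLite's LIKE is case-insensitive for ASCII by default, which is usually what users expect from a search box; SQLAlchemy exposes `ilike()` when the backend's LIKE is case-sensitive.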
Recommend adjusting the naming of components to be uniform. I believe "Login" and "Registration" are standard but "Sign Up" and "Sign In" are acceptable as well. Currently we have a component titled LoginInForm.jsx which conflicts with the naming convention used by pages.
Remove unused variables such as formEntries and unused dependencies such as import * as ReactDOM from "react-dom/client".
Add a call to handleEnter. Currently this is unused. Recommend adding an onKeyPress DOM event handler to the "submit" <input>.
For SignIn.jsx, I don't recommend using spacers such as {" "}. If you are interested in adding space between elements, I recommend adding a class or id to specific elements and handling spacing using margin and padding for each element. HTML and CSS use something called "the Box Model" to describe how width, height, margin, and padding fit together.
This is a somewhat complicated feature, so we will need a few people to work on it. For this feature we will be using this example from the ART. In order to perform the Activation Defence, we will need to do a few things. ActivationDefence requires a model and a dataset (ActivationDefence(classifier, x_train, y_train); x_train is the images and y_train is the labels). It returns a list identifying which training examples are poisoned.
(frontend) add a UI for model uploads so users can upload a saved copy of their model. We will also need a field on the Image component that will be a boolean for whether or not the image is poisoned.
(backend) The backend will also receive a list of image objects from the frontend so that they can be passed into the model. There may be ways to optimize this in the future, such as saving a copy of the images on the blockchain locally. However, for now we will assume that we need to convert base64 strings to image files. You will also need to construct a custom dataloader for the images you download from the frontend. https://keras.io/api/data_loading/
(backend) pass the loaded model into ActivationDefence and return the list of results to the frontend
(frontend) taking the list of results from the backend, change the boolean of the image object for the poisoned data to true so that the UI clearly indicates that an example is poisoned.
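The base64-to-bytes step on the backend can be sketched with the stdlib alone; the data-URL prefix format is an assumption about what the frontend will send, and a real handler would write the bytes to disk before handing them to the dataloader:

```python
import base64

def decode_upload(data_url: str) -> bytes:
    """Strip an assumed 'data:image/png;base64,' prefix if present and
    return the raw image bytes, ready to be written to a file."""
    _, sep, payload = data_url.partition(",")
    return base64.b64decode(payload if sep else data_url)

# Placeholder bytes with a PNG magic number, standing in for a real upload.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
data_url = "data:image/png;base64," + base64.b64encode(fake_png).decode()
assert decode_upload(data_url) == fake_png
```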
This issue can have a few people working on it. Many of the API handlers in the src/api directory are not implemented and are needed by some of the other pages. Tackle these implementations corresponding to the API design from the backend.
This is the most critical smart contract for the overall application. Dataset is a decentralized version of a machine learning dataset. It is responsible for keeping track of images contributed by users and additional metadata about a dataset.
Proposed Solution
This ticket will involve creating one file, contracts/Dataset.sol, and writing this smart contract in Solidity. For the initial implementation, this contract will be relatively simple and involve creating Image structs and adding data to these structs.
Action Items
rename State.sol to Dataset.sol. The structure will be similar to the initial implementation of this contract
create an Image struct. This will be used to keep track of individual datapoints contributed to the machine learning dataset. Add the appropriate fields to this struct. Add an additional boolean field called approved to keep track of whether or not the image is approved to be within the dataset. Also add a corresponding mapping called data between an image id (uint) and Image structs.
create the DataCreated and DataVerified events. These will be emitted from their corresponding functions.
implement the createData function. This will take in the parameters needed to create an Image struct, which is then added to the data mapping.
implement the approveData function. This will take in the parameter of an image ID and it will change the default value of the approved boolean on an Image struct from false to true.
add an interface to this smart contract using truffle, web3.js, and Metamask. This task should be done together with #7 and #18.
(optional) The more sophisticated part of this smart contract involves handling voting from multiple people to approve the dataset, along with some authentication to verify that users are not approving their own Images. A simple implementation can initially check whether the number of votes for approval surpasses a certain threshold before allowing an Image to be approved.
Testing smart contracts is a bit difficult. For now, make sure it simply compiles with truffle and solc. We will test these implementations through the application itself.
The final task of every software project is to ensure that there is excellent documentation for future developers who want to interact with your code or understand your project. The goal here is to update README.md with relevant information about our project and to ensure that access to relevant links is available. Below are a few other READMEs to reference when deciding what to add.
Make sure to include:
paper abstract
authors
how to run the application locally / install relevant dependencies and frameworks
photo from the paper or application (shows a rough overview of how the application works)
star the repo! Helps show some love for your own project!
To allow for user authentication, we must use a session cookie to ensure that we stay logged in. This requires a few parts, including state management in React.js as well as react-cookie in order to keep track of whether or not a user is logged in.
Proposed Solution
This requires #20. This will require state management and navigation manipulation.
Action Items
add a state to App.js that is a boolean which keeps track of whether or not a user is authenticated. This will cause certain routes to be restricted or not depending on what it's set to. This value will be manipulated throughout the application. One thing to also consider is whether or not to use useContext if you find that you are beginning to pass down your authentication state a lot.
interface with react-cookie on LoginForm.jsx to cause it to set the cookie based on the returned value from /api/session. This should also manipulate the global state for authentication.
While we have a page for users to contribute images to datasets, we don't have a way yet for users to contribute datasets. This page solves this problem.
Proposed Solution
Create a page with a form that allows users to add a dataset. This will require interfacing with the backend and the blockchain to ensure that all data is saved properly.
Action Items
implement a form for information such as name and description. This would require a couple of states and <input> fields.
once the input fields are filled, add a submit button that will make a request to the blockchain first to deploy a Dataset contract. This will require interfacing with the DatasetManager smart contract. This should provide a contract ABI for the new Dataset contract
Once the new contract ABI is received, first make a request to POST /api/abi to save the ABI, then take the resulting ABI id and make a request to POST /api/dataset to save information about the new dataset
navigate back to the dashboard once this is completed using useNavigate
Resources
Reference some existing PRs and code for how to implement this.
For our application, we need to be able to easily track metadata about the contributed datasets so that we know where our datasets are. This is because our initial implementation of our blockchain will have the Dataset smart contract for each dataset point to a unique address. We will also need to include some additional information about our datasets, such as a description and title, so that the dataset is easily searchable. We may extend this feature later on to include more information about datasets.
Proposed Solution
Below is a set of resources and a list of action items that will help with implementing this feature. There are some modifications that need to be made to the examples on the resources below in order to complete this feature. There are three files you will need to create/modify to complete this ticket: app.py, routes/dataset.py, and models/dataset.py. For similar instructions or references, check out the implementation of /api/user in #8.
Action Items
implement the defined schema in the Flask-SQLAlchemy ORM. Create the Dataset class in models/dataset.py to include an id column (this can be a db.Integer and be the primary_key of the table, unique and not nullable). Add additional columns for the dataset name, description, and address (all strings of an arbitrary size; 80 is an acceptable value). Additionally, add a json() function which returns a Python dictionary of itself where the keys and values are all of the fields in the table. For creating database ids, you can create a count static variable that keeps track of the number of datasets created and increments every time a new dataset is created, to be used as the value of the newly created dataset id.
update app.py to include an endpoint for /api/dataset/<id> and make sure it uses the dataset(id) handler implemented in routes/dataset.py
implement the GET handler in routes/dataset.py to return all of the information about a dataset based on the given dataset ID. get(id) should query the Dataset table based on the id and get the first() dataset object of that query. If the dataset object is None (no dataset by that id), then return an empty JSON and 404. If the dataset object is found, then return the value of the dataset.json() method and 200.
implement the POST handler in routes/dataset.py to create a new dataset and return all of the information about it. Check the incoming request's JSON body using data = request.json. If name, description, or address don't exist, then return an empty JSON and 400. If a dataset exists in the database with the same name, return an empty dictionary and 409. If this is a unique new dataset, then create a new Dataset object with the corresponding name, description, and address and use the corresponding dataset.save_to_db() function. Return the value of dataset.json() and 200.
implement the PUT handler in routes/dataset.py to edit information about an existing dataset. Check the incoming request's JSON body using data = request.json. If name, description, or address don't exist, then return an empty JSON and 400. Check if the dataset is in the database. If the dataset is not in the database, then return an empty dictionary and 404. If the dataset is found, then update the values of the dataset object and save to the database using the corresponding dataset.save_to_db() function. Return the value of dataset.json() and 200.
implement the DELETE handler in routes/dataset.py to delete an existing dataset. Check if the dataset is in the database. If the dataset is not in the database, then return an empty dictionary and 404. If the dataset is found then use the corresponding dataset.remove_from_db() function and return an empty JSON and 200.
Here are some tests that can be done to verify the implementation of POST, GET, PUT, DELETE.
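Separate from the ORM wiring, the json() helper in the first action item amounts to mapping every column to a dictionary entry. A plain-Python sketch (the field names are taken from this ticket; the real model would declare db.Column fields in Flask-SQLAlchemy instead of dataclass attributes):

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Plain-Python stand-in for the Flask-SQLAlchemy Dataset model."""
    id: int
    name: str
    description: str
    address: str

    def json(self):
        # Unlike User.json(), every column here is safe to expose.
        return {"id": self.id, "name": self.name,
                "description": self.description, "address": self.address}

ds = Dataset(1, "MNIST", "handwritten digits", "0xCONTRACT")
print(ds.json())
```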
For our contribution application, similar to websites like Kaggle, we will need to allow users to create accounts and log in (authentication is a more complicated task we will split between this issue and #20). This endpoint handles users so that when an account is created or when information about a user is requested, this handles those requests. Additionally, when the user wishes to modify information about the account, this endpoint is able to update and delete that information.
Proposed Solution
Below is a set of resources and a list of action items that will help with implementing this feature. There are some modifications that need to be made to the examples on the resources below in order to complete this feature. There are three files you will need to modify to complete this ticket: app.py, routes/user.py, and models/user.py.
Action Items
implement the defined schema in the Flask-SQLAlchemy ORM. Modify the User class in models/user.py to include an id column (this can be a db.Integer and be the primary_key of the table, unique and not nullable; make sure to remove the primary_key field from the username column in the starter code). Additionally, add a json() function which returns a Python dictionary of itself where the keys and values are all of the fields in the table except password. For creating user ids, you can create a count static variable that keeps track of the number of users created and increments every time a new user is created, to be used as the value of the newly created user id.
update app.py to include the user(id) endpoint, using the router(id) handler implemented in routes/user.py
implement the GET handler in routes/user.py to return all of the information about a user based on the given user ID. get(id) should query the User table based on the id and get the first() user object of that query. If the user object is None (no user by that id), then return an empty JSON and 404. If the user object is found, then return the value of the user.json() method and 200.
implement the POST handler in routes/user.py to create a new user and return all of the information about that user. Check the incoming request's JSON body using data = request.json. If username or password don't exist, then return an empty JSON and 400. If a user exists in the database with the same username, return an empty dictionary and 409. If this is a unique new user, then create a new User object with the corresponding username and password and use the corresponding user.save_to_db() function. Return the value of user.json() and 201.
implement the PUT handler in routes/user.py to edit information about an existing user. Check the incoming request's JSON body using data = request.json. If username or password don't exist then return an empty JSON and 400. Check if the user is in the database. If the user is not in the database, then return an empty dictionary and 404. If the user is found then update the values of the user object and save to the database using the corresponding user.save_to_db() function. Return the value of user.json() and 200.
implement the DELETE handler in routes/user.py to delete an existing user. Check if the user is in the database. If the user is not in the database, then return an empty dictionary and 404. If the user is found then use the corresponding user.remove_from_db() function and return an empty JSON and 200.
Here are some tests that can be done to verify the implementation of POST, GET, PUT, DELETE.
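The branching in the POST item can be isolated into a pure function for clarity. This is a sketch with the Flask and SQLAlchemy wiring omitted (the save step and id counter are stand-in comments, not real API calls):

```python
def create_user_response(data, existing_usernames):
    """Mirror the status codes described above: 400 on missing fields,
    409 on a duplicate username, 201 with the new user's JSON on success."""
    if not data or "username" not in data or "password" not in data:
        return {}, 400
    if data["username"] in existing_usernames:
        return {}, 409
    # Real handler: user = User(...); user.save_to_db()
    new_id = len(existing_usernames) + 1   # stand-in for the id counter
    return {"id": new_id, "username": data["username"]}, 201

print(create_user_response({"username": "alice", "password": "pw"}, set()))
```

Separating the status logic this way also makes the tests mentioned below trivial to write without spinning up the server.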
This differs slightly from #8. We need a separate endpoint and database to handle sessions. The distinction is that while a session corresponds to one cookie and one user, a user may be able to have multiple sessions. In other words, the session is what determines whether or not a client is logged in (i.e. they have a session cookie).
Proposed Solution
Below is a set of resources and a list of action items that will help with implementing this feature. There are some modifications that need to be made to the examples in the resources below in order to complete this feature. There are four files you will need to create/modify to complete this ticket: app.py, utils.py, routes/session.py, and models/session.py.
Action Items
create a Session table using the SQLAlchemy ORM in models/session.py. Make the fields be a session id (db.Integer and primary key), a session cookie value (a randomly generated db.String of length 24), and a user id (foreign key corresponding to which user this session belongs to in the User database). If you are ambitious, you can think about how to implement a TTL or how to make session tokens expire but for the initial implementation, this is not necessary.
update app.py to include the session() endpoint using the router() handler that you will implement in routes/session.py. Make sure to also update the corresponding endpoints with the @authenticated decorator once that is implemented.
GET will be used to return the username and user id given a session token. The point of this endpoint is to help users easily access their user id to access data on the /api/user endpoint. If the session token is not within the database, then return just a status and 401. Otherwise, return a status and 200.
POST will create a new session. It will require the username and password of a user. Check the User database to see if there is an entry that corresponds to both values. Generate a new session token and set the cookie of the user to be the new session. Return a status with a success message and 200. Otherwise, return a status with an error and 401.
DELETE will delete an existing session. Check if the cookie in the request corresponds to an existing session. If a session is found, delete the session and return a status with a success and 200. Otherwise, return a status with an error and 401.
You will also need to implement a special middleware function to check whether other endpoints require authentication. This will require implementing a Python decorator called @authenticated. Place this function inside utils.py. All this function does is grab the session cookie from the Flask request and check whether it is in the Session database (i.e. whether the SQLAlchemy query returns something other than None). If it is not in the database, then return a function that returns a not-authenticated error and 401. If it is in the database, then return the function that was passed as an argument into the decorator (this will be the endpoint function). Verify this implementation by adding the decorator to endpoints that require authentication.
Note: Unlike other endpoints, there will be no PUT endpoint since the client will not interact with the endpoint in this manner.
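A minimal sketch of the @authenticated decorator described above. The cookie lookup and session store here are stand-ins; the real version would read the cookie off flask.request and query the Session table:

```python
from functools import wraps

SESSIONS = {"s3cr3t-token"}  # stand-in for the Session table

def current_session_cookie():
    """Stand-in for reading the session cookie off the Flask request."""
    return "s3cr3t-token"

def authenticated(endpoint):
    """Reject the request with a 401 unless the session cookie is known."""
    @wraps(endpoint)
    def wrapper(*args, **kwargs):
        if current_session_cookie() not in SESSIONS:
            return {"status": "not authenticated"}, 401
        return endpoint(*args, **kwargs)
    return wrapper

@authenticated
def profile():
    # Stand-in for any endpoint that requires a logged-in user.
    return {"status": "ok"}, 200

print(profile())
```

Using functools.wraps matters here: Flask routes are registered by function name, and without it every decorated endpoint would appear to be named "wrapper".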
This issue is most likely one of the more sophisticated parts of this project. It comes from an idea recommended by @Herefersomepennys involving implementing an automatic defense for data poisoning attacks. The goal is to introduce adversarial perturbations that are "friendly", i.e. that augment the features of the existing data distribution rather than "poison" the dataset. This is quite difficult but was introduced by this paper along with the following code.
Proposed Solution
Implementing data poisoning attacks is quite difficult. It requires setting up an MLOps pipeline to effectively train on data and also perform adversarial perturbations. Below are a few potential approaches to this problem.
we utilize the attacks and defenses from https://github.com/Trusted-AI/adversarial-robustness-toolbox. This library can save us some time with the attack implementation but may require some additional work to preprocess the data to use this library.
we implement a similar attack. Implementing FGSM is relatively simple, but it is an evasion-based attack. Implementing data augmentation as a defense against adversarial examples solves a different problem than data poisoning, but it is another useful problem to solve. An example of adversarial data augmentation is shown in a project I worked on here.
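For reference, the core FGSM step mentioned in the second option is small enough to sketch in plain Python. In practice the gradient comes from backpropagating the model's loss; the values below are illustrative only:

```python
def fgsm_step(x, grad, eps):
    """One FGSM perturbation: nudge each pixel by eps in the direction of
    the loss gradient's sign, then clip back to the valid [0, 1] range."""
    sign = lambda g: (g > 0) - (g < 0)
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]

pixels = [0.2, 0.5, 0.99]
loss_grad = [1.3, -0.7, 0.4]   # illustrative values, not from a real model
print(fgsm_step(pixels, loss_grad, 0.05))  # approximately [0.25, 0.45, 1.0]
```

The "friendly" variant in the paper flips the objective: instead of ascending the loss to fool the model, the perturbation descends it so the example reinforces the data distribution.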
For the client-side application, we have a few pages implemented that are a bit difficult to access if the source code is not known. Adding a navigation bar helps users easily see which pages are available for exploration and makes them accessible.
Proposed Solution
In this ticket, you will be implementing a component called NavBar.jsx.
Action Items
create this file in src/components/NavBar.jsx. You will need to import Link from react-router-dom
insert the implemented component into App.jsx so that it is consistent on every page no matter what BrowserRouter renders
add elements in a <ul> so that each <li> has a <Link> pointing to a page on the application
add a prop for authentication (boolean). Have this conditionally render a Sign In/Sign Out link and a Register link. For now, this will be purely interactive, but later, once sessions are implemented, we will make this depend on cookies. You may find the JavaScript ternary operator helpful for this implementation
style the navigation bar appropriately so it remains fixed in one location on the page. There are two options, a vertical or a horizontal navigation bar. Either is acceptable.
OPTIONAL Ideally, each contributed datapoint would not be automatically added to a dataset and instead there will be a voting mechanism in place to allow a consensus to be formed on whether or not the contributed Image is a reliable datapoint (i.e. data is accurate, complete, and trustworthy - check out this article for more details). This is a bit difficult since it requires interfacing through the Dataset and DatasetManager contracts.
Proposed Solution
The Image smart contract should have the ability not only to store the data of individual images within the dataset but also to provide a simple ballot design to determine whether or not a particular data point should be accepted into a dataset.
Action Items
refactor implementation in Dataset to keep track of enums containing Image addresses
implement fields managing data from image metadata
implement a basic ballot voting algorithm in Solidity (the initial implementation should just have a simple threshold of 10 votes in order to allow an Image to become part of a Dataset). This might be similar to the "Crowd Fund" shown below, where the owner is the Dataset and the Image cannot be claimed unless, say, 10 or more people contribute their tokens or votes to the Image. There are also some potential benefits to implementing a token similar to ERC20.
We currently have a /api/user endpoint that is available for getting information about a user. We will use this sort of information to build a user profile page.
Proposed Solution
The following is needed to coordinate the implementation of the profile page.
Action Items
coordinate with the backend to add support for a special /api/user/me id that returns information about the current user using the corresponding session token. This depends on a few PRs, so we can hold off on implementing this specific part if the other parts are not ready.
write components to display information in <input> fields that belong to a user (i.e. username, password, etc). Write a fetch to the backend to get the default values for this page for a user (based on /api/user/me endpoint)
allow these fields to be modified and add a "Save" button that submits a PUT request to the backend with the updated values. Make sure the values sent in the PUT request are either the modified values or the current defaults; otherwise this may overwrite the database data for this user
(optional) add a default profile image. We may even be able to support profile images by saving the base64 encoded version of an image but not necessary for now.
Resources
Reference some other PRs and code for how this might be implemented.
For handling contributions of data points, we need to allow users to be able to upload images on the frontend. This requires not only setting up file uploads but also designing a serialization method for the files to be sent to the server and blockchain. For our use cases, images can be serialized into base64 strings which will be good enough for our purposes.
Proposed Solution
Below is a set of resources and a list of action items that will help with implementing this feature. There are some modifications that need to be made to the examples on the resources below in order to complete this feature.
Action Items
create initial file upload via <input> field for JPGs & PNGs
convert the uploaded image file to a base64 string
create a dropdown menu to allow users to select a dataset to contribute to
extract metadata from the uploaded blob (there are a few libraries out there so we will need to see what works well)
send file upload to smart contract via web3.js, truffle, and Metamask. Tackle this together with #15 and #18.
set up requests to the backend using fetch to keep track of uploads corresponding to datasets. These will need to be made to the /api/dataset endpoint defined in #10.
Ethereum smart contracts require a configuration known as an ABI. While we have an existing endpoint, /api/dataset, with information about the deployed smart contract, an ABI is large and difficult to manage on an endpoint with other functionality. For these reasons, we will provide a separate endpoint that purely serves ABI JSON files to the client to allow it to connect to the blockchain. This will also allow changes to be made to the smart contract without significant changes to the client.
Proposed Solution
This depends on #10 and is required for #18. This endpoint will simply handle saving JSON files for the ABI.
Action Items
implement the ABI table using Flask-SQLAlchemy. This table will have two columns: an ABI id column (this can be implemented the same way as in other tables with db.Integer, primary key, and not nullable) and a file name column (db.String(80), not nullable). You should also add a new column to the Dataset table, ABI id (db.Integer), which acts as a foreign key to the ABI table and holds the id of the ABI corresponding to a particular dataset.
GET will have similar error and status messages to other endpoints. This will query the database based on the id and return a status and abi containing the JSON for the ABI.
POST will have similar error and status messages to other endpoints. This will expect a name field, a dataset id, and an abi field. The name and dataset id will be stored in the database, while the contents of the abi field will be saved to a file named after the file name + .json. It will also query the Dataset table to make sure that there is not a duplicate ABI for a particular dataset.
PUT will have similar error and status messages to other endpoints. This will expect a name field, a dataset id, and an abi field containing the JSON for the new ABI, along with the abi id. All of these fields are required and will be used to check that this actually corresponds to an existing Dataset and ABI.
DELETE will have similar error and status messages to the other endpoints. This will expect an abi id and delete the corresponding ABI in the database as well as the file on the local file system.
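The file handling in the POST and GET items can be sketched with the stdlib alone. The directory, field names, and the id-to-name resolution are assumptions, and the duplicate and error checks are omitted:

```python
import json
import os
import tempfile

ABI_DIR = tempfile.mkdtemp()  # real app: a fixed abi/ directory

def save_abi(name, abi):
    """Persist the ABI JSON to <name>.json and return the path, as the
    POST handler described above would (duplicate checks omitted)."""
    path = os.path.join(ABI_DIR, name + ".json")
    with open(path, "w") as f:
        json.dump(abi, f)
    return path

def load_abi(name):
    """What GET /api/abi/<id> would read back after resolving id -> name."""
    with open(os.path.join(ABI_DIR, name + ".json")) as f:
        return json.load(f)

path = save_abi("Dataset", [{"type": "constructor", "inputs": []}])
print(load_abi("Dataset"))
```

Storing only the file name in the table and the JSON on disk, as this ticket proposes, keeps the large ABI payloads out of the database rows.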
Currently, we need a way to interface on the client side with a dataset on the blockchain. DatasetPage.jsx will act as a display for a particular dataset. It will display all of the data belonging to a dataset, queried from the Dataset smart contract. This issue should be tackled with #7 and #15. This will also depend on #26.
Proposed Solution
DatasetPage.jsx will involve interfacing mostly with web3.js and React.js to display the items belonging to a particular Dataset smart contract.
Action Items
create an Image.jsx component which renders a single image based on props passed from the smart contract. Make sure to include a button that toggles between Verified and Not Verified for each Image
construct a state that is a list filled with image objects obtained from the smart contract. Create a map between objects in this list and Image.jsx components to render all of the images in the dataset.
style this page appropriately to create a grid of images in the dataset
get the contract address and dataset metadata. This information can be hard coded to start but will eventually be retrieved from the backend.
Use the ABI to instantiate a new contract Dataset object and await until the contract is deployed() and hydrated with values from the blockchain. This should most likely be done in a useEffect.
get smart contract ABI. This is a JSON that can be retrieved from /api/abi/<id> that is the main interface defined for web3.eth.Contract.
(optional) add a dropdown menu that allows filtering by whether or not the data is approved and by whether or not an image belongs to a certain class. We may also add filtering by whether or not the data is augmented once we add that feature later on. This may require using the HTML <option> tag.
(optional) another interesting feature would be to allow easy downloading of the entire dataset as a zip file. This may become increasingly difficult as the dataset grows, but we can start by adding a download button and looking into how zipping in the browser might work.
This smart contract is a bit more complicated than Dataset. The goal of DatasetManager is to act as the main interface between the client and the blockchain. Not only will it be used to retrieve information about datasets and manage contributions, it will also be used to create new datasets and deploy Dataset contracts.
Proposed Solution
DatasetManager is a unique contract since it will not only be a contract but also a contract owner. It will also handle deploying contracts which require making payable functions in order to cover the gas of creating these datasets. This will probably follow something similar to the factory design pattern. It will also involve writing interfaces and calling functions from the Dataset contract so this will require interfacing with the Ethereum ABI.
Action items
implement a struct for Dataset. This will contain information such as metadata about datasets and addresses pointing to datasets
add relevant events
create a mapping between a dataset id and a Dataset struct/contract.
implement a function called createDataset which takes in the relevant information about a dataset and deploys a Dataset contract based on this.
implement a function called addData which calls the Dataset implementation of addData via the Ethereum ABI.
implement an approve function for handling confirmation and approval of Dataset data.