VAST Challenge 2020 - Mini-Challenge 1: Graph Analysis
The objective of this challenge is to use visual analytics to analyze a large dataset provided by the Center for Global Cyber Strategy (CGCS). The dataset consists of anonymized profiles built from data donated by white hat groups. These profiles capture the behavioral and structural characteristics of various groups, and CGCS sociopsychologists hypothesize that one of them closely resembles the organization inadvertently responsible for a major internet outage. Our task is to compare the CGCS subgraph template, which represents the suspect group's structure, against several candidate subgraphs and determine which candidate matches the template most closely. That candidate identifies the group most likely associated with the outage.
- Clone the project to your local machine using
git clone <project_url>
- Install the "Live Server" extension for Visual Studio Code
- Click the "Go Live" button to start a server on port 5500
- Alternatively, run the project with Python's built-in HTTP server (it serves on port 8000 by default):
python3 -m http.server
- Open localhost:<port> in your browser to access the project
- D3.js - Graph visualizations
- Bootstrap - Styling and layout
- virtual-select.js - Dynamic dropdowns
arc-diagram
- Used to identify seed-node structures that potentially match the template.
bar-graph
- Used to identify which subgraph's eType activity correlates most strongly with the template.
heat-map
- Used to identify which potential seed graph's spending activity correlates most strongly with the template.
lollipop
- Used to identify which potential seed graph's communication activity correlates most strongly with the template.
multiline
- Used to identify which subgraph's activity over time correlates most strongly with the template.
node-link
- A holistic view of how the template matches the subgraphs; it highlights the subregion most similar to the template.
scatter-activities-plot
- Used to explore the activity history of each node with a scatter plot.
scatter-travel-history
- Used to explore the travel history of each node with a scatter plot and flag glyphs.
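Several of the views above (the bar graph in particular) compare how often each eType occurs in the template versus a candidate subgraph. The sketch below shows one way such a summary could be computed; it is a minimal illustration, not the project's code. It assumes each edge is a plain object with an eType field (the field name comes from the bar-graph description above), while the record shape, the helper names countByEType and toProportions, and the sample edges are hypothetical.

```js
// Hypothetical edge record shape: { source, target, eType, time }.
// Only the eType field is used in this sketch.

/**
 * Count how many edges of each eType appear in a graph.
 * @param {Array<{eType: (number|string)}>} edges
 * @returns {Map<(number|string), number>} eType -> edge count
 */
function countByEType(edges) {
  const counts = new Map();
  for (const edge of edges) {
    counts.set(edge.eType, (counts.get(edge.eType) || 0) + 1);
  }
  return counts;
}

/**
 * Convert raw counts to proportions so graphs of different sizes
 * (e.g. the template vs. a larger candidate subgraph) can be compared.
 * @param {Map<(number|string), number>} counts
 * @returns {Map<(number|string), number>} eType -> share of all edges
 */
function toProportions(counts) {
  let total = 0;
  for (const c of counts.values()) total += c;
  const props = new Map();
  for (const [eType, c] of counts) {
    props.set(eType, total === 0 ? 0 : c / total);
  }
  return props;
}

// Usage with made-up sample edges (not real challenge data):
const template = [{ eType: 0 }, { eType: 0 }, { eType: 1 }, { eType: 6 }];
const templateProps = toProportions(countByEType(template));
console.log(templateProps);
```

Normalizing to proportions is a design choice for this sketch: it keeps a large candidate subgraph from looking "different" merely because it has more edges than the template.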
- Memory and computation overhead: Loading the 6 GB dataset was challenging, so we had to do extensive preprocessing beforehand. Having access to a larger dataset would be
- In-depth analysis: Cosine similarity was a useful metric and gave near-accurate results, but on its own it offers little contextual knowledge about the dataset we were working with.
- Seed similarity analysis: Comparing the subgraphs with the template was manageable because they contain only about 2,000-3,000 records each. The seed graph, however, has about 2,000 child nodes per node, which means evaluating about 5 million node links in the main graph. Those links would have been useful, but we could not evaluate them.
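The cosine similarity comparison mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes each graph has already been summarized as a vector of counts over the same features (for example, edge counts per eType), and the function names cosineSimilarity and rankCandidates, the candidate names, and the sample numbers are all hypothetical.

```js
/**
 * Cosine similarity between two count vectors stored as plain objects,
 * e.g. { 0: 120, 1: 45 } (feature -> count). Missing keys count as 0.
 * For non-negative counts the result lies in [0, 1];
 * returns 0 if either vector is all zeros.
 */
function cosineSimilarity(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const k of keys) {
    const x = a[k] || 0;
    const y = b[k] || 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/**
 * Rank candidate subgraphs by similarity to the template, highest first.
 * @param {Object<string, number>} template feature -> count
 * @param {Object<string, Object<string, number>>} candidates name -> vector
 */
function rankCandidates(template, candidates) {
  return Object.entries(candidates)
    .map(([name, vec]) => ({ name, score: cosineSimilarity(template, vec) }))
    .sort((p, q) => q.score - p.score);
}

// Made-up vectors for illustration only.
const templateVec = { 0: 10, 1: 5, 6: 2 };
const ranking = rankCandidates(templateVec, {
  candidateA: { 0: 20, 1: 10, 6: 4 }, // same proportions as the template
  candidateB: { 0: 2, 1: 15, 6: 9 },
});
console.log(ranking[0].name); // candidateA
```

Because cosine similarity ignores vector length, candidateA scores 1 even though it has twice as many edges as the template. This also illustrates the limitation noted above: the score says how closely the activity mix matches, but nothing about which specific nodes or links are involved.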
- Create new components in the components directory, each with its own JS and CSS files
- Use index.js and style.css for global JS and CSS
- Data can be found in the data directory
- The assest directory holds images, fonts, etc.
- Darshan Vipresh
- Deep Rodge
- Jayati Goyal
- Prasad Mahalpure
- Kaushal Yadav
- Sravya Thummeti