I'm a software engineer with multiple years of experience developing full-stack data-driven solutions and a PhD in computer science focusing on visual analytics, network science and information retrieval.
I'm currently working on multiple projects, exploring a variety of techniques and technologies and aiming to solve real-world problems.
Is your feature request related to a problem? Please describe.
Users should be provided with hints when searching through text fields and with dropdowns when searching through categorical values.
Describe the solution you'd like
This will also require that we extract information about categorical values before populating the dataset.
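A minimal sketch of how the categorical values could be collected before the dataset is populated. The function name, the row format (a list of dicts) and the `max_unique` cut-off are assumptions made for illustration, not part of CSX.

```python
from collections import defaultdict


def extract_categorical_values(rows, max_unique=20):
    """Return {column: sorted unique values} for columns that look categorical.

    A column is treated as categorical when it has at most `max_unique`
    distinct non-null values (hypothetical heuristic). Values within a
    column are assumed to be mutually comparable, e.g. all strings.
    """
    seen = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            if value is not None:
                seen[column].add(value)
    return {
        column: sorted(values)
        for column, values in seen.items()
        if len(values) <= max_unique
    }


rows = [
    {"country": "AT", "title": "Paper A"},
    {"country": "DE", "title": "Paper B"},
    {"country": "AT", "title": "Paper C"},
]
# "country" has two distinct values and qualifies; "title" has three and does not.
dropdown_values = extract_categorical_values(rows, max_unique=2)
```

The resulting mapping could feed the dropdowns directly, while columns that are left out would fall back to text hints.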
Is your feature request related to a problem? Please describe.
Current user graphs should be stored on backend in a redis instance and accessible for further processing.
Describe the solution you'd like
The graphs should be stored by providing a unique user id (should be generated when the user first opens the frontend) and additional metadata about the dataset, query and the full graph structure together with node and edge properties. This should be accessible from the entire backend.
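One possible shape for this storage layer is sketched below. The key layout (`graph:<user id>`), the payload fields and the `InMemoryStore` stand-in are all assumptions; the stand-in only mimics the string `set`/`get` calls that a Redis client also offers, so that the example runs without a Redis instance.

```python
import json
import uuid


class InMemoryStore:
    """Stand-in for a Redis client; only supports string set/get."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def new_user_id():
    """Generate a unique id, e.g. when the user first opens the frontend."""
    return uuid.uuid4().hex


def save_graph(store, user_id, dataset, query, nodes, edges):
    """Serialise metadata plus the full graph structure under the user's key."""
    payload = {
        "meta": {"dataset": dataset, "query": query},
        "nodes": nodes,  # each node carries its own properties
        "edges": edges,  # each edge carries its own properties
    }
    store.set(f"graph:{user_id}", json.dumps(payload))


def load_graph(store, user_id):
    """Return the stored graph, or None if the user has no graph yet."""
    raw = store.get(f"graph:{user_id}")
    return json.loads(raw) if raw is not None else None


store = InMemoryStore()
user_id = new_user_id()
save_graph(
    store,
    user_id,
    dataset="papers",
    query="visual analytics",
    nodes=[{"id": "n1", "label": "Ada", "properties": {"h_index": 12}}],
    edges=[],
)
graph = load_graph(store, user_id)
```

Keeping the read and write paths behind two small functions means any part of the backend can reach the graph with nothing more than the user id.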
Is your feature request related to a problem? Please describe.
Users should be discouraged from performing questionable actions in CSX.
Describe the solution you'd like
1. If a user tries to connect an array feature with a single value feature they should not be able to form a 1:1 connection. (similar logic for other connections)
2. Users should only be able to add new properties to single value features in the overview graph.
3. Users should only be able to use appropriate search/filter nodes with appropriate values.
Is your feature request related to a problem? Please describe.
If a user selects a list feature as an anchor and another list feature as its property then the mapping should be 1:1. If the two lists do not have the same number of values (e.g. there are more author names than h-indexes) then a missing prop value should be added.
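The padding could look roughly like the sketch below, which covers the case from the example (more anchor values than property values). The function name and the `"missing"` placeholder are assumptions; surplus property values are simply ignored here, which is also an assumption.

```python
def map_property_to_anchor(anchor_values, prop_values, missing="missing"):
    """Pair each anchor value with the property value at the same position.

    When the property list is shorter, the remaining anchor values are
    paired with the `missing` placeholder.
    """
    padding = [missing] * (len(anchor_values) - len(prop_values))
    padded = list(prop_values) + padding
    # zip stops at the shorter input, so surplus property values are dropped.
    return list(zip(anchor_values, padded))


pairs = map_property_to_anchor(["Ada", "Bob", "Eve"], [12, 7])
```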
Is your feature request related to a problem? Please describe.
Users should be able to export their graphs in a format which can be imported directly into Neo4j while preserving all of the graph properties.
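As one possible export format, the sketch below writes nodes and edges as CSV text with the `:ID`, `:LABEL`, `:START_ID`, `:END_ID` and `:TYPE` header fields used by Neo4j's bulk CSV import. The node and edge dictionaries are a hypothetical in-memory shape, and the exact header conventions should be checked against the Neo4j documentation for the targeted version.

```python
import csv
import io


def nodes_to_csv(nodes, property_names):
    """Write one row per node: id, one column per property, then the label."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id:ID", *property_names, ":LABEL"])
    for node in nodes:
        props = [node["properties"].get(name, "") for name in property_names]
        writer.writerow([node["id"], *props, node["label"]])
    return buffer.getvalue()


def edges_to_csv(edges, property_names):
    """Write one row per edge: endpoints, one column per property, then the type."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", *property_names, ":TYPE"])
    for edge in edges:
        props = [edge["properties"].get(name, "") for name in property_names]
        writer.writerow([edge["source"], edge["target"], *props, edge["type"]])
    return buffer.getvalue()


nodes = [{"id": "n1", "label": "author", "properties": {"name": "Ada"}}]
edges = [{"source": "n1", "target": "n2", "type": "CO_AUTHOR", "properties": {"weight": 3}}]
node_csv = nodes_to_csv(nodes, ["name"])
edge_csv = edges_to_csv(edges, ["weight"])
```

Writing every property as its own column is what keeps the graph properties intact after the import.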
Is your feature request related to a problem? Please describe.
Simplify default settings by removing default visible nodes and automatically identifying list features
Describe the solution you'd like
Users should only define the anchor, links and default searchable nodes. Features that are lists should be automatically assigned the list type.
Additional context
Should be implemented after #27 and #24
Describe the solution you'd like
The links do not have any influence on the detail schema so they should be removed.
Since the detail schema is a DAG, consider removing the requirement of an anchor and directly identifying the shortest paths from nodes to other nodes. Paths that overlap should be merged together.
After this is implemented the users should not have to define any anchors or links in the detail schema and should be able to just define connections in the schema.
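The idea of finding shortest paths and merging the overlapping ones can be sketched as follows. The schema is represented as a plain adjacency dict and the function names are hypothetical; breadth-first search is used because the schema edges are unweighted.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search along edge direction; returns a node list or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def merged_path_edges(adjacency, pairs):
    """Union of the edges on the shortest path of every (source, target) pair.

    Storing edges in a set merges the overlapping parts of the paths.
    """
    edges = set()
    for source, target in pairs:
        path = shortest_path(adjacency, source, target)
        if path:
            edges.update(zip(path, path[1:]))
    return edges


schema = {"paper": ["author", "keyword"], "author": ["country"]}
edges = merged_path_edges(schema, [("paper", "country"), ("paper", "author")])
```

Both requested paths share the `paper -> author` edge, so it appears only once in the merged result.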
Is your feature request related to a problem? Please describe.
Users should have access to advanced statistics in the selection menu
Describe the solution you'd like
Users should be able to add their own custom graphs to the statistic menu. When a user clicks a plus button they can choose between multiple types of visualisations (pie, line, bar chart) and features/properties to visualise. They should also be able to select the width of the chart (e.g. half width or full width).
Additionally there should be an option to show the comparison to the full graph for some of the features (e.g. number of nodes in selection vs number of nodes in entire graph)
Is your feature request related to a problem? Please describe.
Developers (as well as CI/CD pipelines) should be able to run frontend unit tests to make sure the entire application works as expected.
Describe the solution you'd like
This issue should serve as a first step towards a full test coverage of the frontend.
As part of this issue, Jest should be introduced together with tests for critical parts of the CSX frontend (TBD which components exactly need testing).
Is your feature request related to a problem? Please describe.
When uploading a dataset the users should be provided with an option to fill the null values automatically.
Describe the solution you'd like
When a dataset is provided to CSX it should automatically detect columns with null values. Based on the type of the column it should provide a few suggestions for null value alternatives as well as a text field asking the user to enter a specific value they would like to use instead of a suggestion.
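A rough sketch of the detection and suggestion step is shown below. The function names and the concrete suggestions (zero, mean and median for numbers, an empty list for list columns, an empty string or `"unknown"` otherwise) are assumptions chosen for illustration.

```python
from statistics import mean, median


def columns_with_nulls(rows):
    """Return the names of all columns that contain at least one null value."""
    return sorted({column for row in rows for column, value in row.items() if value is None})


def suggest_fill_values(values):
    """Propose replacement values based on the type of the non-null entries."""
    present = [value for value in values if value is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(value) for value in present):
        return [0, mean(present), median(present)]
    if present and all(isinstance(value, list) for value in present):
        return [[]]
    return ["", "unknown"]


rows = [
    {"name": "Ada", "age": 30},
    {"name": "Bob", "age": None},
    {"name": "Eve", "age": 50},
]
null_columns = columns_with_nulls(rows)
age_suggestions = suggest_fill_values([row["age"] for row in rows])
```

The user-entered value from the text field would simply take precedence over whichever suggestion is preselected.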
Is your feature request related to a problem? Please describe.
Users should be able to run community detection algorithms on graphs and view/select the resulting communities the same way they can with components.
Describe the solution you'd like
Users should have the option to run Louvain and potentially other reasonably fast algorithms for community detection on their graphs. The community detection should be executed in the background without interfering with the user's interactions with the app and should only enrich the existing data once the calculation is done. Once the data is enriched the users should also be presented with a list of communities and with a community color schema when exploring the graph.
The community detection algorithm should run on the internal representation of a graph stored in redis (implemented in #40).
Additional context
This feature should be implemented only once #40 is done.
Is your feature request related to a problem? Please describe.
Users should be able to filter the tabular data through direct connection exploration (e.g. looking at direct connections of a keyword node connected to multiple papers should only leave those paper entries in the table on the right side)
Is your feature request related to a problem? Please describe.
Charts with properties such as age should be sortable also by values (i.e. values of age).
Additional context
For properties which cannot be sorted by value we need to disable this option or sort by something else (e.g. sort by alphabet vs by size).
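A small sketch of sorting with a fallback is given below. It assumes chart data arrives as a mapping from property value to frequency, and it treats values that cannot be compared with each other as "not sortable"; both are assumptions, as is the function name.

```python
def sort_chart_entries(counts, by="value"):
    """Sort (value, frequency) pairs by value, falling back to frequency.

    `counts` maps each property value to how often it occurs.
    """
    items = list(counts.items())
    if by == "value":
        try:
            return sorted(items, key=lambda item: item[0])
        except TypeError:
            # Values of mixed types cannot be ordered; sort by size instead.
            pass
    return sorted(items, key=lambda item: item[1], reverse=True)


by_age = sort_chart_entries({30: 2, 25: 5, 41: 1})
mixed = sort_chart_entries({30: 2, "n/a": 5})
```

String values sort alphabetically under the same code path, which matches the "sort by alphabet" alternative mentioned above.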
Is your feature request related to a problem? Please describe.
Developers (as well as CI/CD pipelines) should be able to run backend unit tests to make sure the entire application works as expected.
Describe the solution you'd like
This issue should serve as a first step towards a full test coverage of the backend.
As part of this issue, pytest (which FastAPI's own testing documentation already uses) should be introduced together with tests for critical parts of the CSX backend (TBD which components exactly need testing).
Is your feature request related to a problem? Please describe.
CSX users shouldn't have to run Python scripts or write config files to get their custom dataset into the tool. There should be a way for users to just drag and drop their dataset and be guided through a set of steps which will result in the automatic generation of a configuration for the dataset.
Describe the solution you'd like
Users should be able to perform the following steps:
Drag and drop a .csv / .xlsx file
CSX should ask the user for the dataset name as well as propose the same name as the filename
CSX should automatically detect column data types (e.g. is it a string, list, number, date, etc.) and provide the suggestions to the user
The user should have an option to define a different type for each column or continue
Users should be asked a set of questions for creating the settings file (e.g. default search features etc.)
Users should be asked to define a default schema for the overview graph
Users should be warned if the suggested overview schema crosses a threshold of nodes/edges by calculating the ratio between the unique values in the anchor and unique values in the link and comparing it to a threshold
Users should be asked to define a default schema for the overview graph and should be warned if they try connecting features with invalid cardinalities
The dataset should show up in the list of existing datasets
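The automatic type detection step from the list above could start from something like the sketch below. It assumes the cell values arrive as strings (as they would from a .csv file) and that list cells are written in square brackets; the type names and function names are hypothetical.

```python
from datetime import date


def detect_value_type(value):
    """Classify a single non-empty cell as list, number, date or string."""
    text = value.strip()
    if text.startswith("[") and text.endswith("]"):
        return "list"
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(text)
        return "date"
    except ValueError:
        return "string"


def detect_column_type(values):
    """Suggest one type per column; mixed or empty columns fall back to string."""
    types = {detect_value_type(value) for value in values if value and value.strip()}
    return types.pop() if len(types) == 1 else "string"


suggestions = {
    "citations": detect_column_type(["1", "2.5", ""]),
    "published": detect_column_type(["2021-03-04", "2020-11-30"]),
    "authors": detect_column_type(['["Ada", "Bob"]']),
    "title": detect_column_type(["Paper A", "42"]),
}
```

These are only suggestions; the user can still override the type of each column in the following step.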
Expand graph by selecting a few nodes and searching for new results based on these node values (either narrow expansion where all of the properties have to be satisfied or broad expansion where any of the properties can be satisfied) Essentially running a search with AND or OR and expanding existing graph with new nodes and connections. This means the tabular data has to be merged and the graph has to be recalculated.
Also in this case you have to add search nodes to the advanced search that will represent this search.
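The narrow (AND) and broad (OR) expansion can be sketched on the tabular data as follows. The sketch assumes every row has a unique `id` field used for merging, which is an assumption about the data model, and it leaves the graph recalculation out.

```python
def expand_rows(all_rows, current_rows, selected, mode="narrow"):
    """Merge rows matching the selected node values into the current table.

    mode="narrow": every selected property must match (AND).
    mode="broad":  at least one selected property must match (OR).
    """
    match = all if mode == "narrow" else any
    hits = [
        row
        for row in all_rows
        if match(row.get(feature) == value for feature, value in selected.items())
    ]
    merged = {row["id"]: row for row in current_rows}
    for row in hits:
        merged.setdefault(row["id"], row)  # keep existing rows, add new ones
    return list(merged.values())


all_rows = [
    {"id": 1, "author": "Ada", "country": "UK"},
    {"id": 2, "author": "Ada", "country": "AT"},
    {"id": 3, "author": "Bob", "country": "UK"},
]
current_rows = [all_rows[1]]
selected = {"author": "Ada", "country": "UK"}
narrow = expand_rows(all_rows, current_rows, selected, mode="narrow")
broad = expand_rows(all_rows, current_rows, selected, mode="broad")
```

After the merge, the graph would be recalculated from the returned rows.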
Is your feature request related to a problem? Please describe.
Users should be able to delete particular nodes from the workflow and save workflows by giving them a name
Is your feature request related to a problem? Please describe.
Users should not have to define anchor and links properties in the detail graph. Explore if there is a possibility of integrating the detail graph generation without the need for a defined anchor node.
Is your feature request related to a problem? Please describe.
When uploading a dataset users should be able to define a default schema for the dataset they are uploading.
Describe the solution you'd like
Users should be asked to define a default schema for the overview graph
Users should be warned if the suggested overview schema crosses a threshold of nodes/edges by calculating the ratio between the unique values in the anchor and unique values in the link and comparing it to a threshold
Users should be asked to define a default schema for the overview graph and should be warned if they try connecting features with invalid cardinalities
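The ratio check described above might look like the sketch below. The function name and the default threshold value are placeholders; an appropriate threshold would have to be determined empirically.

```python
def overview_schema_warning(rows, anchor, link, threshold=100.0):
    """Return True when the anchor/link unique-value ratio exceeds the threshold.

    Many unique anchor values sharing few unique link values tends to
    produce very dense overview graphs.
    """
    unique_anchor = len({row[anchor] for row in rows})
    unique_link = len({row[link] for row in rows})
    if unique_link == 0:
        return False  # nothing to connect, so nothing to warn about
    return unique_anchor / unique_link > threshold


rows = [
    {"title": "Paper A", "country": "AT"},
    {"title": "Paper B", "country": "AT"},
    {"title": "Paper C", "country": "AT"},
    {"title": "Paper D", "country": "AT"},
]
# Four unique titles share a single country, so the ratio is 4.0.
should_warn = overview_schema_warning(rows, anchor="title", link="country", threshold=3.0)
```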
Is your feature request related to a problem? Please describe.
The table view should be a first class citizen just like the network view. Therefore, users should be able to load large amounts of data in the table quickly and perform various filtering operations on them.
Describe the solution you'd like
Explore using glide-data-grid for the table view. If it performs better than the existing one, use glide-data-grid and enable users to sort and search through the data entries in the table. If the table is filtered down through search, the network should be filtered as well.
Is your feature request related to a problem? Please describe.
Users should be able to demand graphs with connections where all link features defined in the overview schema are present in each edge.
Describe the solution you'd like
If a user defines authors and countries as links of the overview schema, the overview graph view should present the user with a graph where only edges with both link features are present.
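A minimal sketch of this filter, assuming each edge records the values its two endpoints share per link feature under a hypothetical `shared` key:

```python
def edges_with_all_links(edges, link_features):
    """Keep only edges that have at least one shared value for every link feature."""
    return [
        edge
        for edge in edges
        if all(edge["shared"].get(feature) for feature in link_features)
    ]


edges = [
    {"source": "p1", "target": "p2", "shared": {"authors": ["Ada"], "countries": ["AT"]}},
    {"source": "p1", "target": "p3", "shared": {"authors": ["Ada"], "countries": []}},
    {"source": "p2", "target": "p3", "shared": {"countries": ["AT"]}},
]
strict_edges = edges_with_all_links(edges, ["authors", "countries"])
```

Only the first edge survives, since the other two each lack one of the two link features.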
Is your feature request related to a problem? Please describe.
Users should be able to "cut out" parts of graphs and explore only those in either overview or detail view.
Describe the solution you'd like
When a user selects a part of the graph there should be an option to "cut" that part of the graph. This should enable users to narrow down their information need.
Is your feature request related to a problem? Please describe.
Users should be warned if the overview graph has potential to be extremely large if they select a feature that has many unique values as an anchor and features with low number of unique values as links.
Describe the solution you'd like
Users should see a popup / button to visualise the same properties as a detail graph in case they select a combination of link and anchor features which might lead to very large graphs (e.g. more than 3k links).
Additional context
The unique values should be pre-calculated and integrated as part of the drag and drop feature described in #1 and this issue should not be done before it.
Is your feature request related to a problem? Please describe.
Users should be able to explore networks of self-referencing nodes. E.g. twitter networks where users follow other users or citation networks where papers reference other papers.
Describe the solution you'd like
Users should choose the dataset feature representing the "relationship" between the current entry and the related entries. This introduces multiple complications:
Which feature is the source and which feature is the destination of this connection? This property should be defined in the overview and detail schema (e.g. if a user selects the feature column "followers" they should also select its type "user_id").
To represent this in the overview schema we need to introduce a new property on the link nodes which expresses that the link is a "same type reference" link.
To represent this in the detail schema we either have to convert our DAG into just a DG (directed graph) or introduce a new property which can be toggled on the nodes
Processing this will most likely be time-consuming since we're dealing with a column of links. Therefore, when users add their data we should ask them to specify whether any of the columns express "explicit connections to other nodes".
Additional context
#1 should be done first since we need to ask users in advance for the data types.
Is your feature request related to a problem? Please describe.
Users should be able to leave the numeric values in their dataset as they are and CSX should be able to process them.
Describe the solution you'd like
Currently CSX does not digest numeric values which limits the visualisations and operations which can be performed (e.g. we can't do binning or scale-based color schemas based on values of nodes)
Is your feature request related to a problem? Please describe.
Users should be able to understand what went wrong on the backend (or frontend) by seeing an error message popup on the frontend (currently that is not the case).
Describe the solution you'd like
For example: if a connection to elastic cannot be established the user should get that as an error message or at least a particular code indicating there is an issue with elastic.
Is your feature request related to a problem? Please describe.
Users should be able to toggle a simple view in which they can only explore predefined schemas, view stats for the entire graph and selection, view the list view and have no advanced toggling and search options. This could serve as a sort of a presentation mode. If possible as part of this issue explore how to introduce a flag in the docker build process to generate a CSX which only enables users to run a simple text search and view networks in a presentation mode.
Is your feature request related to a problem? Please describe.
Users should be able to overwrite datasets by dragging and dropping new datasets with the same name while still keeping the configuration of the old dataset, with the possibility of changing it if the dataset changed.
Is your feature request related to a problem? Please describe.
Users should be able to store a particular state (e.g. particular search, combined with a particular view and settings) of CSX as a finding which must be given a name and can be given a description. Users should be able to click on these findings from the homepage and immediately navigate to the finding.
Additional context
This feature requires user accounts and storage of existing graphs.
Is your feature request related to a problem? Please describe.
When users provide default dataset settings they should also see a detailed graph connected based on these settings.
Describe the solution you'd like
If a user defines feature X as the default anchor and feature Y and Z as the default links then they should see this reflected not only on the overview network but also on the detail network. E.g. the default schema in that case should be feature X connected directly to feature Y and feature Z with appropriate relationships.
Additional context
#24 should be done before this since it is needed to infer appropriate relationships.
Is your feature request related to a problem? Please describe.
In addition to searching through the provided datasets users should be able to view the entire dataset.
Describe the solution you'd like
In addition to the three buttons next to each of the datasets there should be a fourth one offering the option to view the entire dataset.
Is your feature request related to a problem? Please describe.
Users should be able to view selected node property distributions as node stats.
Describe the solution you'd like
When users select multiple nodes with properties on them these should show up in a parallel coordinates plot in the "selected" stats on the right sidebar.
Is your feature request related to a problem? Please describe.
Users should see all available datasets as a scrollable list or a scrollable grid.
They should have the option to explore the entire dataset or to delete the dataset from the index.
Is your feature request related to a problem? Please describe.
Users should be able to select different color schemas and not just the default for each feature.
Describe the solution you'd like
Users should see the possible properties which can be used as color schema properties in a dropdown.
Additionally they should be presented with a dropdown / toggle for choosing between different color schemas for a certain color schema property.
Subtasks:
Create a branch called issue-6 from develop and use it for the development of this task.
Add another dropdown with a label called "color schema" and add 3-5 color schemas (from libraries). Note that color schemas for numeric variables and categorical variables are most likely different.
Add an option to define a color for each categorical feature (users should be provided with a color picker).
Is your feature request related to a problem? Please describe.
Users should be able to define properties to features with a 1:M mapping in which case each unique property + feature combination should be a new node.
Describe the solution you'd like
For example if a user defines citation count as a property of a paper title where paper title is not unique there should be a node created for each new combination of paper title + citation count.
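The node creation for such a 1:M mapping can be sketched as below. The row format and the node dictionary layout are assumptions made for illustration.

```python
def nodes_for_property(rows, feature, prop):
    """Create one node per unique (feature value, property value) combination."""
    nodes = {}
    for row in rows:
        key = (row[feature], row[prop])
        nodes.setdefault(key, {"label": row[feature], prop: row[prop]})
    return list(nodes.values())


rows = [
    {"title": "Graphs", "citations": 10},
    {"title": "Graphs", "citations": 25},
    {"title": "Graphs", "citations": 10},
]
nodes = nodes_for_property(rows, feature="title", prop="citations")
```

The three rows share one title but only two distinct title + citation count combinations, so two nodes are created.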
Describe the bug
Currently the computed features don't have an assigned type, so connections between such features and other non-computed features end up as 1:1 connections instead of M:1, 1:M or M:N.
Additionally, as part of this bug make sure that charts also show computed features.