floratos-lab / geworkbench-web

geWorkbench web application - the evolution of geWorkbench project into the age of cloud computing.

geworkbench-web's Introduction

geworkbench-web

geWorkbench platform on the Web

This is a standard Java web application with a backend database. To build, you need Java and Maven. To deploy, you need a Java application container, e.g. Tomcat (tested), and a database system that supports JDBC, e.g. MySQL (tested).

developers' notes

Java version

The project needs to be built using Java 8. Although the application may be deployed and work under a newer Java, e.g. Java 11, the build itself must be done using Java 8 for now.

typical testing procedure

build: mvn clean && mvn package
- before building, the config files src/main/resources/application.properties and src/main/resources/META-INF/persistence.xml need to be created
- the database specified needs to be available
deploy: sudo cp target/geworkbench.war ${CATALINA_HOME}/webapps
- assuming the tomcat server is running at ${CATALINA_HOME}/
test: sudo tail -f ${CATALINA_HOME}/logs/catalina.out, then check from the browser at http://localhost:8080/geworkbench/

production deployment plan

back up the current deployment
- back up the webapps subdirectory and the war file (this will help diagnose any issues following the deployment, or support an emergency rollback if anything goes wrong during the deployment).
- back up the backend database as a sql dump
check out the code to be deployed, at a certain tag (or a branch, if we expect active development of the deployed code to continue separately from the master branch)
update the properties files that are not in the GitHub repository, typically by copying them from a specified location. This should include src/main/resources/application.properties and src/main/resources/META-INF/persistence.xml
build the war file by executing mvn package
copy the war file to the destination machine, at $CATALINA_HOME/webapps/
restart tomcat if necessary

functionality dependency

a few web services deployed separately. They might be on other servers; their URLs are specified in application.properties.
- ANOVA analysis, t-test analysis, hierarchical clustering analysis, ARACNe analysis, MS-Viper analysis, PBQDI analysis
Gmail email support for the purpose of account registration communication
LINCS web application. For our production deployment, this is on the same tomcat.

Other dependencies, e.g. CNKB query servlet, are indirect via those listed above.

technology dependency

cytoscape.js

geworkbench-web's Issues

slow response on "Set View"

When "Set View" is clicked, there is no visible response for large datasets. This delay should be avoidable.

affymetrix annotation needs to be re-written

The current implementation of annotation management is based on the desktop version and is fundamentally improper (if barely correct).

Tabular view - "Preference setting" and "Reset" don't work

1. Open "Display Preference" and click on "Annotations"; the view does not change to reflect the setting.
2. Click on "gene symbol/marker id"; the view also does not change to reflect the setting.
3. Click on "Reset"; the tabular view is not reset to the original.

When changing code in the tabular view component, be careful: the change may break the menu selector functions. After a change, test not only the tabular view but also the menu selector functions.

retire the older flavor of AnalysisSubmissionEvent

There are two constructors of AnalysisSubmissionEvent: one takes the dataset ID only; the other takes a bison DSDataSet. The latter is supposed to be replaced, but at this time it is used more, in 5 places. This is related to issue #10.

status log is incorrect for markus analysis

org.geworkbenchweb.plugins.markus.MarkusAnalysis.execute(75) reports FINISHED when it is in fact pending.

re-engineering data flow

Currently, the dataset, mainly just the microarray dataset, is first read and parsed to create a bison object and serialized; when the data is needed, the bison object is deserialized. That does not add value compared with parsing the original file again, but adds a major layer of complexity and inefficiency.

The implementation consists of the serializeDataSet and deserializeDataSet methods in UserDirUtils. serializeDataSet is referenced in one place and deserializeDataSet is referenced in many places (10), so it is better to replace the deserializeDataSet references one by one first.

The new data flow is basically to replace the serialized object with two mechanisms: part of the data is persisted with JPA to make it more efficient to query and cleaner to manage; at the same time, the original data file is explicitly retained so we can parse it again when we need to access data not included in the current persistence data schema.

Unreliable logic in CNKB component

This issue is in the core code, but directly affects the annotation revamp #3 in this project.
InteractionsConnectionImpl.getInteractionsByEntrezIdOrGeneSymbol_1 and getInteractionsByEntrezIdOrGeneSymbol_2 depend on the behaviors of DSGeneMarker.getGeneName() and DSGeneMarker.getGeneName(), which are decided by the 'current dataset' because that is how annotation is managed. The 'current dataset' is open to change outside of the annotation mechanism and CNKB.

Instead, this could be made reliable just by using gene symbol and gene ID directly, not the DSGeneMarker objects.

catch-all exception clause

A catch-all exception clause has never turned out to be the proper solution to a problem. It is a very bad idea. When it swallows the exception and does not output or log anything, it is worse. In one case in AnalysisListener, it does not swallow the exception but allows the execution to continue; that is not the right thing to do either.
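
As a minimal sketch of the alternative, the following catches only the exception it knows how to handle, logs it, and stops instead of continuing. The names AnalysisRunner, Step and the "FAILED" return value are hypothetical and are not taken from AnalysisListener or any other class in this project.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; none of these names come from the geworkbench-web code base.
class AnalysisRunner {
    private static final Logger LOG = Logger.getLogger(AnalysisRunner.class.getName());

    /** Stand-in for an analysis step that can fail in a known way. */
    interface Step {
        String run() throws IOException;
    }

    /**
     * Runs the step and returns its result, or "FAILED" if the expected
     * IOException occurs. Unexpected runtime exceptions are not caught here,
     * so they propagate to the caller instead of being silently swallowed.
     */
    static String execute(Step step) {
        try {
            return step.run();
        } catch (IOException e) {
            // Catch only what we know how to handle, and always log it.
            LOG.log(Level.WARNING, "analysis step failed", e);
            // Stop here rather than continuing as if the step had succeeded.
            return "FAILED";
        }
    }
}
```

With this shape, a programming error such as an IllegalStateException surfaces immediately at the call site, while the anticipated I/O failure is recorded and reported.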

upgrade to vaadin 7

Vaadin 6 is out of date. It is getting harder to get support in documentation and tools.

right-side menu item "Account" does not do anything

CNKB - network is created incorrectly when interactome "HGi" is selected

result nodes not showing up

The recent implementation of the node renaming feature caused a strange problem in Vaadin: the LAST data node's child result nodes do not show up; consequently, a lot of odd behavior occurs after the result nodes are supposed to be created for the last data node. Without adding the action handler (for renaming) to the node, everything works fine.

we should have some basic test cases and tutorials for all the components

slow tabular view for expression data

When you click on "tabular view" from the menu page for a microarray dataset, it takes a few seconds to bring up the first page (25 rows) of the tabular view.

Tabular view - filter menu does not work properly

Open the filter window from the menu. When array sets are selected and submitted, the data displayed in the tabular view is wrong.

improve upload dialog

The combo box to choose the file type is used in the desktop version to filter the files by the desired type. In the web version, the filtering is not implemented, so this combo box serves no purpose and adds an unnecessary step. It should be removed until we implement a filtering mechanism.

CNKB - retrieval problem when the selected gene is not in the annotation list

When the selected gene is not in the annotation list, CNKB gets server response code 500.

consolidate geworkbench-web user account and genspace user account

tabular view - search not working

Typing a character in the search box throws the exception below. (I switched to a much earlier commit, 3242017, and it has the same issue, so it is not caused by the recent data flow change.)

Caused by: java.lang.IllegalArgumentException: PagedTable can only use containers that implement Container.Indexed
at org.geworkbenchweb.utils.PagedTable.setContainerDataSource(PagedTable.java:236)
at org.geworkbenchweb.plugins.TableMenuSelector$3$1.textChange(TableMenuSelector.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.vaadin.event.ListenerMethod.receiveEvent(ListenerMethod.java:510)
... 28 more

set up a maven project for better building and deployment

So the building and deployment process will be:

more consistent
better documented, especially for the library dependency
more efficient

implement a simple user activity log

Although the requested feature overlaps some of the functionality of genspace component, the purpose is different from that of genspace, and it is better to design this as a separate element in the system.

The activities to be logged include: logging-in, logging-out, loading a dataset, starting an analysis, and a result set having been created.
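
One possible shape for such a log is sketched below, assuming an in-memory list for illustration only; in the application the entries would more likely be persisted, for example through JPA. The class, enum and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Hypothetical sketch of a user activity log; names are illustrative only.
class ActivityLog {
    /** The activity kinds listed in this issue. */
    enum Activity { LOGIN, LOGOUT, DATASET_LOADED, ANALYSIS_STARTED, RESULT_CREATED }

    /** One immutable log record. */
    static final class Entry {
        final String username;
        final Activity activity;
        final String detail;   // e.g. a dataset or analysis name; may be empty
        final Date timestamp;

        Entry(String username, Activity activity, String detail, Date timestamp) {
            this.username = username;
            this.activity = activity;
            this.detail = detail;
            this.timestamp = timestamp;
        }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    /** Appends a record stamped with the current time. */
    synchronized void record(String username, Activity activity, String detail) {
        entries.add(new Entry(username, activity, detail, new Date()));
    }

    /** Returns the records of one user, in the order they were logged. */
    synchronized List<Entry> entriesFor(String username) {
        List<Entry> result = new ArrayList<Entry>();
        for (Entry e : entries) {
            if (e.username.equals(username)) {
                result.add(e);
            }
        }
        return Collections.unmodifiableList(result);
    }
}
```

Keeping the log as its own small class, with no reference to genspace types, matches the intent of treating it as a separate element in the system.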

problem in CNKBUI handling HttpSession

CNKBUI.java needs to get the HttpSession directly to support its "CNKB user" authorization functionality. In the past the implementation ran into a null pointer when doing so. The problem later disappeared and its cause was never found. I saw the same issue again (CNKBUI.java line 303, in method getInteractions) while working on other changes.

incorrect warning message

After I register a new user, it shows a warning (cannot re-create user). After dismissing the message, it works fine.

ANOVA does not show the correct result in the tabular view when a marker set is selected

loading public/private annotation file does not work

The annotation files under public/private should not use the file name as item id. The application should read from the database and use the annotation file id as item id. Also, the application should not rely on files on disk to determine whether a private or public annotation file exists, especially since we deploy geworkbench-web to production on a distributed server. A separate program will be developed to insert a public annotation file into the database.

analysis-complete event requested to support genspace feature

freezing the application during data loading

The current implementation chose to freeze the application during potentially long data loading. This is problematic. Instead, it should disable certain parts of the functionality (disabling more than strictly necessary is OK) and leave the application responsive.

UploadDataUI.java
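
A minimal sketch of the non-blocking approach follows, assuming the parsing work can be wrapped in a Callable. BackgroundLoader and its methods are hypothetical names, and how the Vaadin UI is refreshed once loading finishes is deliberately left out because it depends on the Vaadin version in use.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run data loading off the request thread and expose
// a flag the UI can use to disable features while a load is in progress.
class BackgroundLoader {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final AtomicInteger pending = new AtomicInteger();

    /** True while at least one load has been submitted and not yet finished. */
    boolean isLoading() {
        return pending.get() > 0;
    }

    /** Submits the parsing work and returns immediately with a Future. */
    <T> Future<T> load(final Callable<T> parser) {
        pending.incrementAndGet();
        return executor.submit(new Callable<T>() {
            @Override
            public T call() throws Exception {
                try {
                    return parser.call();
                } finally {
                    // Always clear the flag, even if parsing fails.
                    pending.decrementAndGet();
                }
            }
        });
    }

    /** Stops accepting new work; already submitted loads still complete. */
    void shutdown() {
        executor.shutdown();
    }
}
```

The caller would check isLoading() to decide which menu items or buttons to disable, so the rest of the application stays responsive during the load.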

CNKB - the values of gene type and annotation are not populated

The values of gene type and annotation are not populated when a query is performed.

add export option of SIF format to network viewer

This is a requirement of CPTAC enhancement.

The main change is to add interaction type to the network data model.

For the specification of the SIF format, see http://wiki.c2b2.columbia.edu/workbench/index.php/Cellular_Networks_KnowledgeBase#SIF_format
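
As a rough sketch of what the export could look like once the interaction type is part of the network data model, the code below writes one tab-separated line per edge in the order source, interaction type, target. This assumes the common Cytoscape-style SIF layout; the wiki page above remains the authoritative specification. The SifExporter and Edge names, and the sample type value "pp", are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a SIF writer; not the project's network data model.
class SifExporter {
    /** A directed edge that carries its interaction type. */
    static final class Edge {
        final String source;
        final String interactionType;
        final String target;

        Edge(String source, String interactionType, String target) {
            this.source = source;
            this.interactionType = interactionType;
            this.target = target;
        }
    }

    /** Renders each edge as "source<TAB>interactionType<TAB>target" on its own line. */
    static String toSif(List<Edge> edges) {
        StringBuilder sb = new StringBuilder();
        for (Edge e : edges) {
            sb.append(e.source).append('\t')
              .append(e.interactionType).append('\t')
              .append(e.target).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<Edge>();
        edges.add(new Edge("TP53", "pp", "MDM2")); // "pp" is only an example type label
        System.out.print(toSif(edges));
    }
}
```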

clean up the serialization implementation of result data.

This is similar to, or arguably part of, issue #10, but is used more at this time and does not necessarily have anything to do with the bison data model.

vulnerable log4j libs

According to the admin group's report, there are 12 vulnerable files in this application.

Tabular view - Preference "precision" setting does not work properly

The "precision" preference setting of the tabular view does not work properly. When you change the precision setting, only one page is returned to the tabular view.

develop new volcano chart and CNKB throttle plot

These two components depend on invient chart, which is no longer available for free in Vaadin 7.
Two promising candidates to replace invient chart are:

JFreeChart will be perfect if the interactive behavior (tooltips etc.) can be worked out easily, because this is the same library we are using in the Swing version.

new parsing code of .exp file

The current parsing code (from the desktop application) is tightly coupled with bison data types and causes unnecessary inefficiency. It also includes some undocumented and likely unintended logic. The new code will handle parsing in a self-contained way and provide a conversion mechanism to bison data types for backward compatibility.

remove the redundant mechanism of loading set from marina analysis parameter panel

It was used before the current set loading existed in the set view. It causes unnecessary complexity and is not consistent with other analysis tools.

streamline how the gene ontology obo file is managed

Currently, the latest obo file is downloaded when the application is loaded the first time. The download happens in a background thread started from the static block of class GeneOntologyTree.

Pro: the latest file at the time of GeneOntologyTree being class-loaded is used.
Con: 1) The multi-thread model of coordinating the downloading thread and the code using GeneOntologyTree, including the fallback when the download fails, is complicated. 2) The server providing the obo file occasionally (though not very often) does not respond normally; instead of falling back to the existing copy quickly, the application may end up with a corrupted copy of the obo file that appears ready to be used.

In summary, the benefit of the current model is minimal, and it is not worth the resulting complexity, which has not been fully taken care of. I think we should use a simple process that is separated from the web application to update the obo file when necessary (manually or automatically).
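
A minimal sketch of such a separate update step is shown below, assuming the new copy arrives as an InputStream. It writes to a temporary file, applies a very light sanity check, and only then replaces the existing copy, so a failed or odd download never overwrites a usable file. OboUpdater and its methods are hypothetical names, and the check (the first line starts with the OBO header tag "format-version:") is an assumption that a real updater would need to strengthen, for example with a size or checksum comparison.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of an updater that runs outside the web application.
class OboUpdater {
    /**
     * Replaces target with the downloaded content if it passes the check.
     * Returns false, leaving the existing copy untouched, otherwise.
     */
    static boolean update(InputStream source, Path target) throws IOException {
        // Create the temp file next to the target so the final move stays on one file system.
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "obo", ".tmp");
        try {
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            if (!looksLikeObo(tmp)) {
                return false;
            }
            Files.move(tmp, target,
                    StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } finally {
            // No-op after a successful move; cleans up after a rejected download.
            Files.deleteIfExists(tmp);
        }
    }

    /** Light sanity check only: the first line must be the OBO format-version header. */
    static boolean looksLikeObo(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String first = reader.readLine();
            return first != null && first.startsWith("format-version:");
        }
    }
}
```

With this in place, the web application would only ever read the file at the target path, and the coordination between a download thread and GeneOntologyTree would no longer be needed.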