
graphify's People

Contributors

bryant1410, gautamjeyaraman, kbastani, rbramley

graphify's Issues

[ALERT] Starting Neo4j Server failed: org/neo4j/nlp/ext/PatternRecognitionResource : Unsupported major.minor version 52.0

I completed step 3 (Configure Neo4j by adding a line to conf/neo4j-server.properties) and added the directive to the bottom of neo4j-server.properties.

But when I start the server with neo4j-community.exe (Neo4j Community\bin), it fails with the alert "Starting Neo4j Server failed: org/neo4j/nlp/ext/PatternRecognitionResource : Unsupported major.minor version 52.0".

And when I use neo4j-desktop-2.1.5.jar instead, the alert changes to "Starting Neo4j Server failed: javax.servlet.ServletException: org.neo4j.server.web.NeoServletContainer-7254304@2bf60129==org.neo4j.server.web.NeoServletContainer,-1,false".

Could you fix this problem?
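For context, an "Unsupported major.minor version 52.0" error means the extension's class files were compiled for Java 8 but are being loaded by an older JRE. The mapping between class-file major versions and Java releases can be sketched as a quick reference (this table is standard JVM behavior, not Graphify-specific):

```python
# Class-file major version -> minimum Java release able to load it.
CLASS_FILE_VERSIONS = {
    50: "Java 6",
    51: "Java 7",
    52: "Java 8",
}

def required_java(major: int) -> str:
    """Return the Java release needed to load a class file with this major version."""
    return CLASS_FILE_VERSIONS.get(major, "unknown")

print(required_java(52))  # Java 8 -- the release the failing JRE is missing
```

So the likely fix on the reporter's side is to run the Neo4j server under a Java 8 (or newer) runtime, or to rebuild the extension targeting the installed JRE.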

Training step - Meaning of Arrays

Hi,
this is more of a question than an "issue": I noticed that during the training step I need to pass a payload like:

{
  "text": [
    "Interoperability is the ability of making systems and organizations work together."
  ],
  "label": [
    "Interoperability"
  ]
}

to the endpoint, but in all of your examples the array contains only one element. I am wondering what it would mean for the classifier if I passed several elements in the "text" array. Would they be considered different parts of the same document, or would the classifier treat them as separate documents sharing the same label?

Related to this, as some input: it would be great if it were possible to pass several documents with the same label in one go during training. That would drastically reduce the number of HTTP requests in my case and probably speed up training with hundreds of thousands of small documents.

Just an idea :)
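As a sketch of the batched payload proposed above (hypothetical: the examples in the project show one document per request, and this batched shape is only a suggestion, not a documented API):

```python
import json

# Hypothetical batched training payload: several documents sharing one label,
# sent in a single request instead of one HTTP call per document.
payload = {
    "label": ["Interoperability"],
    "text": [
        "Interoperability is the ability of making systems and organizations work together.",
        "Systems that interoperate exchange data through agreed-upon interfaces.",
    ],
}

body = json.dumps(payload)
print(len(payload["text"]))  # 2 documents in one request
```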

Build Failure

Hi Kenny,

I posted this as a comment on a previously closed issue. Basically, I am still having a build failure on the 1.0.0-M01 branch. Is there a solution to this? My output is below.

mvn assembly:assembly -DdescriptorId=jar-with-dependencies --debug

cutor - Finished task 0.0 in stage 106.0 (TID 105). 1000 bytes result sent to driver
16:58:07.296 [Result resolver thread-1] INFO o.a.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 106.0 (TID 105) in 87 ms on localhost (1/1)
16:58:07.296 [Result resolver thread-1] INFO o.a.s.scheduler.TaskSchedulerImpl - Removed TaskSet 106.0, whose tasks have all completed, from pool
16:58:07.297 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Stage 106 (map at ClassifierUtil.java:42) finished in 0.088 s
16:58:07.297 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - looking for newly runnable stages
16:58:07.297 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - running: Set()
16:58:07.297 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - waiting: Set(Stage 103, Stage 107)
16:58:07.297 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - failed: Set()
16:58:07.297 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Missing parents for Stage 103: List(Stage 107)
16:58:07.298 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Missing parents for Stage 107: List()
16:58:07.298 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting Stage 107 (ShuffledRDD[211] at combineByKey at BinaryClassificationMetrics.scala:101), which is now runnable
16:58:07.298 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(2752) called with curMem=5161632, maxMem=997699092
16:58:07.298 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - Block broadcast_207 stored as values in memory (estimated size 2.7 KB, free 946.6 MB)
16:58:07.299 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting 1 missing tasks from Stage 107 (ShuffledRDD[211] at combineByKey at BinaryClassificationMetrics.scala:101)
16:58:07.299 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.s.scheduler.TaskSchedulerImpl - Adding task set 107.0 with 1 tasks
16:58:07.299 [sparkDriver-akka.actor.default-dispatcher-2] INFO o.a.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 107.0 (TID 106, localhost, PROCESS_LOCAL, 937 bytes)
16:58:07.299 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 107.0 (TID 106)
16:58:07.300 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - maxBytesInFlight: 50331648, targetRequestSize: 10066329
16:58:07.300 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - Getting 1 non-empty blocks out of 1 blocks
16:58:07.300 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - Started 0 remote fetches in 0 ms
16:58:07.305 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 107.0 (TID 106). 1000 bytes result sent to driver
16:58:07.305 [Result resolver thread-2] INFO o.a.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 107.0 (TID 106) in 6 ms on localhost (1/1)
16:58:07.305 [Result resolver thread-2] INFO o.a.s.scheduler.TaskSchedulerImpl - Removed TaskSet 107.0, whose tasks have all completed, from pool
16:58:07.305 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Stage 107 (combineByKey at BinaryClassificationMetrics.scala:101) finished in 0.006 s
16:58:07.305 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - looking for newly runnable stages
16:58:07.305 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - running: Set()
16:58:07.305 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - waiting: Set(Stage 103)
16:58:07.305 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - failed: Set()
16:58:07.306 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Missing parents for Stage 103: List()
16:58:07.306 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting Stage 103 (MapPartitionsRDD[214] at mapPartitions at BinaryClassificationMetrics.scala:106), which is now runnable
16:58:07.306 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(3152) called with curMem=5164384, maxMem=997699092
16:58:07.306 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - Block broadcast_208 stored as values in memory (estimated size 3.1 KB, free 946.6 MB)
16:58:07.306 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting 1 missing tasks from Stage 103 (MapPartitionsRDD[214] at mapPartitions at BinaryClassificationMetrics.scala:106)
16:58:07.306 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.s.scheduler.TaskSchedulerImpl - Adding task set 103.0 with 1 tasks
16:58:07.307 [sparkDriver-akka.actor.default-dispatcher-20] INFO o.a.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 103.0 (TID 107, localhost, PROCESS_LOCAL, 948 bytes)
16:58:07.307 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 103.0 (TID 107)
16:58:07.307 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - maxBytesInFlight: 50331648, targetRequestSize: 10066329
16:58:07.307 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - Getting 1 non-empty blocks out of 1 blocks
16:58:07.307 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - Started 0 remote fetches in 0 ms
16:58:07.320 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 103.0 (TID 107). 991 bytes result sent to driver
16:58:07.321 [Result resolver thread-3] INFO o.a.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 103.0 (TID 107) in 15 ms on localhost (1/1)
16:58:07.321 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Stage 103 (collect at BinaryClassificationMetrics.scala:110) finished in 0.015 s
16:58:07.321 [Result resolver thread-3] INFO o.a.s.scheduler.TaskSchedulerImpl - Removed TaskSet 103.0, whose tasks have all completed, from pool
16:58:07.321 [main] INFO org.apache.spark.SparkContext - Job finished: collect at BinaryClassificationMetrics.scala:110, took 0.284472139 s
16:58:07.323 [main] INFO o.a.s.m.e.BinaryClassificationMetrics - Total counts: {numPos: 31, numNeg: 43}
16:58:07.339 [main] INFO org.apache.spark.SparkContext - Starting job: runJob at SlidingRDD.scala:74
16:58:07.341 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.MapOutputTrackerMaster - Size of output statuses for shuffle 1 is 137 bytes
16:58:07.343 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.MapOutputTrackerMaster - Size of output statuses for shuffle 0 is 137 bytes
16:58:07.344 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.MapOutputTrackerMaster - Size of output statuses for shuffle 3 is 137 bytes
16:58:07.344 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.MapOutputTrackerMaster - Size of output statuses for shuffle 2 is 137 bytes
16:58:07.344 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Got job 104 (runJob at SlidingRDD.scala:74) with 2 output partitions (allowLocal=true)
16:58:07.344 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Final stage: Stage 108(runJob at SlidingRDD.scala:74)
16:58:07.344 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Parents of final stage: List(Stage 112)
16:58:07.345 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Missing parents: List()
16:58:07.345 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting Stage 108 (UnionRDD[220] at UnionRDD at BinaryClassificationMetrics.scala:54), which has no missing parents
16:58:07.351 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(4968) called with curMem=5167536, maxMem=997699092
16:58:07.351 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - Block broadcast_209 stored as values in memory (estimated size 4.9 KB, free 946.5 MB)
16:58:07.352 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting 2 missing tasks from Stage 108 (UnionRDD[220] at UnionRDD at BinaryClassificationMetrics.scala:54)
16:58:07.352 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.s.scheduler.TaskSchedulerImpl - Adding task set 108.0 with 2 tasks
16:58:07.352 [sparkDriver-akka.actor.default-dispatcher-20] INFO o.a.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 108.0 (TID 108, localhost, PROCESS_LOCAL, 1057 bytes)
16:58:07.354 [sparkDriver-akka.actor.default-dispatcher-20] INFO o.a.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 108.0 (TID 109, localhost, PROCESS_LOCAL, 1401 bytes)
16:58:07.354 [Executor task launch worker-0] INFO org.apache.spark.executor.Executor - Running task 1.0 in stage 108.0 (TID 109)
16:58:07.355 [Executor task launch worker-0] INFO org.apache.spark.executor.Executor - Finished task 1.0 in stage 108.0 (TID 109). 723 bytes result sent to driver
16:58:07.356 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 108.0 (TID 108)
16:58:07.356 [Result resolver thread-0] INFO o.a.spark.scheduler.TaskSetManager - Finished task 1.0 in stage 108.0 (TID 109) in 4 ms on localhost (1/2)
16:58:07.356 [Executor task launch worker-1] INFO org.apache.spark.CacheManager - Partition rdd_215_0 not found, computing it
16:58:07.357 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - maxBytesInFlight: 50331648, targetRequestSize: 10066329
16:58:07.357 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - Getting 1 non-empty blocks out of 1 blocks
16:58:07.357 [Executor task launch worker-1] INFO o.a.s.s.BlockFetcherIterator$BasicBlockFetcherIterator - Started 0 remote fetches in 0 ms
16:58:07.361 [Executor task launch worker-1] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(6232) called with curMem=5172504, maxMem=997699092
16:58:07.362 [Executor task launch worker-1] INFO org.apache.spark.storage.MemoryStore - Block rdd_215_0 stored as values in memory (estimated size 6.1 KB, free 946.5 MB)
16:58:07.363 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.storage.BlockManagerInfo - Added rdd_215_0 in memory on 172.29.85.226:33829 (size: 6.1 KB, free: 946.8 MB)
16:58:07.363 [Executor task launch worker-1] INFO o.a.spark.storage.BlockManagerMaster - Updated info of block rdd_215_0
16:58:07.364 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 108.0 (TID 108). 1533 bytes result sent to driver
16:58:07.364 [Result resolver thread-1] INFO o.a.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 108.0 (TID 108) in 12 ms on localhost (2/2)
16:58:07.364 [Result resolver thread-1] INFO o.a.s.scheduler.TaskSchedulerImpl - Removed TaskSet 108.0, whose tasks have all completed, from pool
16:58:07.365 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Stage 108 (runJob at SlidingRDD.scala:74) finished in 0.011 s
16:58:07.365 [main] INFO org.apache.spark.SparkContext - Job finished: runJob at SlidingRDD.scala:74, took 0.025972716 s
16:58:07.366 [main] INFO org.apache.spark.SparkContext - Starting job: aggregate at AreaUnderCurve.scala:45
16:58:07.367 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Got job 105 (aggregate at AreaUnderCurve.scala:45) with 3 output partitions (allowLocal=false)
16:58:07.367 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Final stage: Stage 113(aggregate at AreaUnderCurve.scala:45)
16:58:07.367 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Parents of final stage: List(Stage 117)
16:58:07.368 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Missing parents: List()
16:58:07.369 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting Stage 113 (SlidingRDD[221] at RDD at SlidingRDD.scala:47), which has no missing parents
16:58:07.369 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - ensureFreeSpace(5216) called with curMem=5178736, maxMem=997699092
16:58:07.369 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - Block broadcast_210 stored as values in memory (estimated size 5.1 KB, free 946.5 MB)
16:58:07.370 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.spark.scheduler.DAGScheduler - Submitting 3 missing tasks from Stage 113 (SlidingRDD[221] at RDD at SlidingRDD.scala:47)
16:58:07.371 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.a.s.scheduler.TaskSchedulerImpl - Adding task set 113.0 with 3 tasks
16:58:07.371 [sparkDriver-akka.actor.default-dispatcher-19] INFO o.a.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 113.0 (TID 110, localhost, PROCESS_LOCAL, 1805 bytes)
16:58:07.371 [sparkDriver-akka.actor.default-dispatcher-19] INFO o.a.spark.scheduler.TaskSetManager - Starting task 2.0 in stage 113.0 (TID 111, localhost, PROCESS_LOCAL, 1558 bytes)
16:58:07.371 [sparkDriver-akka.actor.default-dispatcher-19] INFO o.a.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 113.0 (TID 112, localhost, ANY, 1599 bytes)
16:58:07.371 [Executor task launch worker-0] INFO org.apache.spark.executor.Executor - Running task 2.0 in stage 113.0 (TID 111)
16:58:07.371 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 113.0 (TID 110)
16:58:07.374 [Executor task launch worker-1] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 113.0 (TID 110). 625 bytes result sent to driver
16:58:07.376 [Executor task launch worker-0] INFO org.apache.spark.executor.Executor - Finished task 2.0 in stage 113.0 (TID 111). 625 bytes result sent to driver
16:58:07.377 [Executor task launch worker-2] INFO org.apache.spark.executor.Executor - Running task 1.0 in stage 113.0 (TID 112)
16:58:07.378 [Result resolver thread-2] INFO o.a.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 113.0 (TID 110) in 7 ms on localhost (1/3)
16:58:07.379 [Result resolver thread-2] INFO o.a.spark.scheduler.TaskSetManager - Finished task 2.0 in stage 113.0 (TID 111) in 8 ms on localhost (2/3)
16:58:07.379 [Executor task launch worker-2] INFO o.apache.spark.storage.BlockManager - Found block rdd_215_0 locally
16:58:07.380 [Executor task launch worker-2] INFO org.apache.spark.executor.Executor - Finished task 1.0 in stage 113.0 (TID 112). 1733 bytes result sent to driver
16:58:07.380 [Result resolver thread-0] INFO o.a.spark.scheduler.TaskSetManager - Finished task 1.0 in stage 113.0 (TID 112) in 9 ms on localhost (3/3)
16:58:07.380 [Result resolver thread-0] INFO o.a.s.scheduler.TaskSchedulerImpl - Removed TaskSet 113.0, whose tasks have all completed, from pool
16:58:07.380 [sparkDriver-akka.actor.default-dispatcher-19] INFO o.a.spark.scheduler.DAGScheduler - Stage 113 (aggregate at AreaUnderCurve.scala:45) finished in 0.009 s
16:58:07.380 [main] INFO org.apache.spark.SparkContext - Job finished: aggregate at AreaUnderCurve.scala:45, took 0.013970632 s
Area under ROC = 0.6766691672918231
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 84.882 sec
Running org.graphify.core.api.extraction.FeaturesTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.008 sec <<< FAILURE!
Running org.graphify.core.api.selection.FeatureSelectorTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.034 sec <<< FAILURE!
Running org.graphify.core.kernel.impl.manager.NodeManagerTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.065 sec
Running org.graphify.core.kernel.impl.util.GraphManagerTest
{0} is known {1}
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0 sec
Running org.graphify.core.kernel.impl.util.VectorUtilTest
16:58:07.672 [sparkDriver-akka.actor.default-dispatcher-20] INFO o.apache.spark.storage.BlockManager - Removing broadcast 210
16:58:07.672 [sparkDriver-akka.actor.default-dispatcher-20] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_210
16:58:07.672 [sparkDriver-akka.actor.default-dispatcher-20] INFO org.apache.spark.storage.MemoryStore - Block broadcast_210 of size 5216 dropped from memory (free 992520356)
16:58:07.672 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 210
16:58:07.673 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.apache.spark.storage.BlockManager - Removing broadcast 209
16:58:07.673 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_209
16:58:07.673 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.storage.MemoryStore - Block broadcast_209 of size 4968 dropped from memory (free 992525324)
16:58:07.673 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 209
16:58:07.676 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing RDD 215
16:58:07.679 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block rdd_215_0
16:58:07.680 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block rdd_215_0 of size 6232 dropped from memory (free 992531556)
16:58:07.681 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned RDD 215
16:58:07.681 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing broadcast 208
16:58:07.681 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_208
16:58:07.681 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block broadcast_208 of size 3152 dropped from memory (free 992534708)
16:58:07.681 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 208
16:58:07.683 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.apache.spark.storage.BlockManager - Removing broadcast 207
16:58:07.683 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_207
16:58:07.683 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.storage.MemoryStore - Block broadcast_207 of size 2752 dropped from memory (free 992537460)
16:58:07.683 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 207
16:58:07.683 [sparkDriver-akka.actor.default-dispatcher-15] INFO o.apache.spark.storage.BlockManager - Removing broadcast 206
16:58:07.683 [sparkDriver-akka.actor.default-dispatcher-15] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_206
16:58:07.683 [sparkDriver-akka.actor.default-dispatcher-15] INFO org.apache.spark.storage.MemoryStore - Block broadcast_206 of size 43128 dropped from memory (free 992580588)
16:58:07.683 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 206
16:58:07.684 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.apache.spark.storage.BlockManager - Removing broadcast 205
16:58:07.684 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_205
16:58:07.684 [sparkDriver-akka.actor.default-dispatcher-5] INFO org.apache.spark.storage.MemoryStore - Block broadcast_205 of size 4080 dropped from memory (free 992584668)
16:58:07.684 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 205
16:58:07.694 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned shuffle 3
16:58:07.695 [sparkDriver-akka.actor.default-dispatcher-20] INFO o.a.s.storage.ShuffleBlockManager - Deleted all files for shuffle 2
16:58:07.695 [sparkDriver-akka.actor.default-dispatcher-15] INFO o.a.s.storage.ShuffleBlockManager - Deleted all files for shuffle 3
16:58:07.695 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned shuffle 2
16:58:07.695 [sparkDriver-akka.actor.default-dispatcher-5] INFO o.a.s.storage.ShuffleBlockManager - Deleted all files for shuffle 1
16:58:07.695 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned shuffle 1
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-15] INFO o.a.s.storage.ShuffleBlockManager - Deleted all files for shuffle 0
16:58:07.696 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned shuffle 0
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing broadcast 161
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_161
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block broadcast_161 of size 45056 dropped from memory (free 992629724)
16:58:07.696 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 161
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing broadcast 160
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_160
16:58:07.696 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block broadcast_160 of size 38960 dropped from memory (free 992668684)
16:58:07.696 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 160
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing broadcast 61
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_61
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block broadcast_61 of size 45056 dropped from memory (free 992713740)
16:58:07.697 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 61
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing broadcast 60
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_60
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block broadcast_60 of size 38960 dropped from memory (free 992752700)
16:58:07.697 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 60
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.apache.spark.storage.BlockManager - Removing broadcast 1
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-17] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_1
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-17] INFO org.apache.spark.storage.MemoryStore - Block broadcast_1 of size 4184 dropped from memory (free 992756884)
16:58:07.697 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 1
16:58:07.697 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing RDD 3
16:58:07.698 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block rdd_3_0
16:58:07.698 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block rdd_3_0 of size 4909480 dropped from memory (free 997666364)
16:58:07.698 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned RDD 3
16:58:07.698 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing broadcast 0
16:58:07.698 [sparkDriver-akka.actor.default-dispatcher-14] INFO o.apache.spark.storage.BlockManager - Removing block broadcast_0
16:58:07.698 [sparkDriver-akka.actor.default-dispatcher-14] INFO org.apache.spark.storage.MemoryStore - Block broadcast_0 of size 32728 dropped from memory (free 997699092)
16:58:07.698 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned broadcast 0
[labels=[positive, negative], type=binary]
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.42 sec <<< FAILURE!

Results :

Tests in error:
testExtractFeatures(org.graphify.core.api.extraction.FeaturesTest): Node 0 not found
testCreateFeatureTarget(org.graphify.core.api.selection.FeatureSelectorTest): Node 0 not found
binaryClassificationTest(org.graphify.core.kernel.impl.util.VectorUtilTest)

Tests run: 13, Failures: 0, Errors: 3, Skipped: 3

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:26.963s
[INFO] Finished at: Tue Jun 16 16:58:08 EDT 2015
[INFO] Final Memory: 13M/215M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project graphify: There are test failures.
[ERROR]
[ERROR] Please refer to /home/ciceromar/Downloads/graphify-1.0.0-M01/graphify-1.0.0-M01/src/extension/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project graphify: There are test failures.

Please refer to /home/ciceromar/Downloads/graphify-1.0.0-M01/graphify-1.0.0-M01/src/extension/target/surefire-reports for the individual test results.
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.MojoExecutor.executeForkedExecutions(MojoExecutor.java:365)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:199)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoFailureException: There are test failures.

Please refer to /home/ciceromar/Downloads/graphify-1.0.0-M01/graphify-1.0.0-M01/src/extension/target/surefire-reports for the individual test results.
at org.apache.maven.plugin.surefire.SurefireHelper.reportExecution(SurefireHelper.java:87)
at org.apache.maven.plugin.surefire.SurefirePlugin.writeSummary(SurefirePlugin.java:641)
at org.apache.maven.plugin.surefire.SurefirePlugin.handleSummary(SurefirePlugin.java:615)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:137)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:98)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 23 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

1.0.0-M01 Functional Specification


The Graphify alpha release implements its own classifier based on cosine similarity. This classifier will be pulled out and replaced by an assortment of classifiers from Apache Spark MLlib.
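For reference, the cosine similarity metric underlying the alpha classifier can be sketched in a few lines (a generic illustration, not the project's actual VectorUtil implementation):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the two vectors divided by the
    # product of their Euclidean norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Two identical feature vectors score 1.0; orthogonal vectors score 0.0.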

Decouple Machine Learning from Feature Extraction

There will be two types of classifiers available: binary classifiers and multi-class classifiers. Binary classification will use logistic regression; multi-class classification will use Naive Bayes.
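To illustrate the multi-class case, here is a minimal multinomial Naive Bayes with Laplace smoothing. This is a self-contained sketch of the algorithm's idea, not the Spark MLlib API the release will actually use:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    # docs: iterable of (label, tokens). Returns per-label token counts
    # and per-label document counts (the priors).
    counts = defaultdict(Counter)
    labels = Counter()
    for label, tokens in docs:
        labels[label] += 1
        counts[label].update(tokens)
    return counts, labels

def classify_nb(counts, labels, tokens):
    total = sum(labels.values())
    vocab = len({t for c in counts.values() for t in c})
    best, best_lp = None, float("-inf")
    for label in labels:
        # Laplace smoothing: +1 per token, +vocab in the denominator.
        denom = sum(counts[label].values()) + vocab
        lp = math.log(labels[label] / total)
        for t in tokens:
            lp += math.log((counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```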

Graphify's core will become a multi-dimensional feature extraction and selection library. "Training" Graphify's graph model will be disambiguated and replaced with the term "Feature Extraction".

The training of learning models will now be done exclusively in the Apache Spark module of Graphify.

Feature Extraction Module

The feature extraction module learns features using hierarchical pattern recognition, described here: http://www.kennybastani.com/2014/07/using-3d-visualization-to-debug-graph.html

Users will be able to extract features by ingesting text of any length through a REST API endpoint.

Example:

Extract features

URL: http://localhost:7474/services/graphify/features/extract

POST:

{
   "label":[
      "Document classification"
   ],
   "text":[
      "Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach."
   ]
}
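A request body like the one above could be assembled programmatically; the helper below is hypothetical and simply mirrors the body's shape (parallel arrays of labels and texts):

```python
import json

def extract_payload(labels, texts):
    # Hypothetical helper: build the JSON body for the feature
    # extraction endpoint from parallel label/text lists.
    return json.dumps({"label": list(labels), "text": list(texts)})
```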

The result of feature extraction builds upon previously extracted features, stored in Neo4j. Running this on an empty database yields the following model:

Graphify Data Model 1

Feature Selection Module

The feature selection module is used to aggregate features and build feature vectors that will be used to create machine learning models. In the alpha release this was relatively easy to do: you passed in your data, and the learning model was automatically updated. While this was easy, it didn't provide much configuration. The feature selection module lets you specify which feature vectors you'd like to prepare for training a classifier.

Select features

Once features have been extracted, the next step is to select those features into targets in order to build either a binary classifier or a multi-class classifier.

The result of feature selection produces either a binary target or a multi-class target. Targets are used to select features and to build feature vectors that will be used for building learning models.

Create binary feature target

URL: http://localhost:7474/services/graphify/targets/create

POST:

{
  "labels":[
    "positive",
    "negative"
  ],
  "type": "binary"
}

Response:

{
  "targetId": 1
}

Create multi-class feature target

URL: http://localhost:7474/services/graphify/targets/create

POST:

{
  "labels":[
    "invoice",
    "purchase order",
    "credit memo"
  ],
  "type": "multi"
}

Response:

{
  "targetId": 2
}

Training Module

Once feature targets have been built, those targets are used to generate machine learning classifiers.

Build learning models

There are two types of learning models that ingest a feature target and generate a model that is used to classify text. Those types are specified when creating a target. The targetId preserves information on the type of classification algorithm to use. For binary classification, logistic regression is used. For multi-class classification, Naive Bayes is used.

Train learning model

URL: http://localhost:7474/services/graphify/models/train/{targetId}

Example POST to http://localhost:7474/services/graphify/models/train/1:

{
  "trainingRatio": .5
}
  • trainingRatio: The fraction of the data from feature extraction to train on; the remaining data will be used to score the model's accuracy.

Response:

{
  "modelId": 1,
  "accuracy": 0.9652241686460808
}
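The trainingRatio split and the accuracy score can be sketched as follows (an illustration of the intended semantics, assuming a simple random hold-out split):

```python
import random

def split_by_ratio(samples, training_ratio, seed=42):
    # Deterministically shuffle, then take `training_ratio` of the
    # samples for training; the remainder is held out for scoring.
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * training_ratio)
    return shuffled[:cut], shuffled[cut:]

def accuracy(predictions, truths):
    # Fraction of held-out examples the model classified correctly.
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)
```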

Update learning model

URL: http://localhost:7474/services/graphify/models/update/{modelId}

POST:

{
  "trainingRatio": .6
}

Response:

{
  "modelId": 1,
  "accuracy": 0.981237734532345
}

Classification Module

The classification module is used to predict a class from a machine learning model on some unlabeled input.

Classify text

URL: http://localhost:7474/services/graphify/classify/{modelId}

Example POST to http://localhost:7474/services/graphify/classify/1:

{
  "text":"it is movies like these that make a jaded movie viewer thankful for the invention of the timex indiglo watch"
}

Response:

[
  {
  "label":"positive",
  "confidence":0.76
  },
  {
    "label":"negative",
    "confidence":0.24
  }
]
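Consuming the response is straightforward: the predicted class is the entry with the highest confidence.

```python
def top_label(results):
    # The classify endpoint returns one entry per label; pick the
    # label whose confidence is highest.
    return max(results, key=lambda r: r["confidence"])["label"]
```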

Training Step - Internal Server Error (500)

Attempting to train the network using cURL requests. The first few worked, but then I started getting status code 500 and cannot add any more; I get the response below or a timeout.

HTTP/1.1 500 Server Error
Content-Type: text/html; charset=ISO-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 0
Server: Jetty(9.0.z-SNAPSHOT)

The Neo4j server is running fine; I can execute Cypher queries and view data.

Has this happened to anybody? Do you think it has anything to do with the data already added? Here is what it looks like:
(screenshot: untitled-1)

Multiple graphs

Hi Kenny,

I'm wondering if there is a way to use a single instance of neo4j + graphify to train multiple graphs. For example, movie reviews and book reviews in a single database but unrelated. During classification you would need to specify which graph you want to use to classify. The current labels look like "class", "pattern" and "data". Presumably these would need to be duplicated for this case. Does this make sense? Any way to do it currently?
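One hypothetical workaround (not a built-in feature, just a naming convention) is to namespace the class labels per domain, so unrelated training sets can coexist in one database:

```python
def prefixed_payload(domain, label, text):
    # Hypothetical convention: prefix each class label with its domain
    # so that movie-review and book-review classes never collide.
    return {"label": [f"{domain}/{label}"], "text": [text]}
```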

Thanks!

"java.lang.IllegalArgumentException: Comparison method violates its general contract!" on classification

This happens for me now when I send a simple query to http://localhost:7474/service/graphify/classify.

java.lang.IllegalArgumentException: Comparison method violates its general contract! [java.util.TimSort.mergeLo(TimSort.java:773), java.util.TimSort.mergeAt(TimSort.java:510), java.util.TimSort.mergeCollapse(TimSort.java:437), java.util.TimSort.sort(TimSort.java:241), java.util.Arrays.sort(Arrays.java:1507), java.util.ArrayList.sort(ArrayList.java:1439), org.neo4j.nlp.impl.util.VectorUtil.similarDocumentMapForVector(VectorUtil.java:210), org.neo4j.nlp.ext.PatternRecognitionResource.classify(PatternRecognitionResource.java:138), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:483), com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60), com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205), com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75), org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:139), com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288), com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147), com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108), com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147), com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84), com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469), com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400), 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349), com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339), com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416), com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537), com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699), javax.servlet.http.HttpServlet.service(HttpServlet.java:848), org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:698), org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:505), org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:211), org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1096), org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432), org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175), org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1030), org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136), org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52), org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97), org.eclipse.jetty.server.Server.handle(Server.java:445), org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268), org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229), org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358), org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601), org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532), java.lang.Thread.run(Thread.java:744)

There also seems to be some sort of scaling issue: I have 2.69 GB of data in my Neo4j database, which is approximately 11,000 training docs, and the response, up to the exception, takes 6.5 minutes.
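One plausible cause (an assumption, not a confirmed diagnosis) is NaN similarity scores, e.g. from zero-length feature vectors: NaN compares false against everything, which breaks the total order TimSort requires. The effect, and a filter that restores a valid ordering, can be shown in a few lines:

```python
import math

def cosine(a, b):
    # A zero vector makes the denominator 0, and the division then
    # yields NaN. NaN keys make any comparison-based sort inconsistent,
    # which is what Java's TimSort detects as a violated contract.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else float("nan")

scores = [0.9, cosine([0, 0], [1, 2]), 0.1]
# Dropping NaN scores before sorting restores a total order.
clean = sorted((s for s in scores if not math.isnan(s)), reverse=True)
```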

Dataset

Hi. Graphify is very interesting.
Where can I find a dataset of documents?

Vincenzo

"java.lang.IllegalArgumentException: Vectors must be of equal length. " when sending a classification request

Hey, I toyed around with graphify a little bit today and I broke it. I have no actual experience when it comes to Neo4j so I don't even know how to reset my "index".

I can't really tell what happened. I trained a couple of thousand documents having multiple labels (the exact number can vary from document to document) and tried to send a classification request:

curl -H "Content-Type: application/json" -d '{"text": "A document is a written or drawn representation of thoughts. Originating from the Latin Documentum meaning lesson - the verb means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence."}' http://localhost:7474/service/graphify/classify
{"error":"java.lang.IllegalArgumentException: Vectors must be of equal length. [org.neo4j.nlp.impl.util.VectorUtil.dotProduct(VectorUtil.java:25), org.neo4j.nlp.impl.util.VectorUtil.cosineSimilarity(VectorUtil.java:49), org.neo4j.nlp.impl.util.VectorUtil.lambda$similarDocumentMapForVector$13(VectorUtil.java:199), org.neo4j.nlp.impl.util.VectorUtil$$Lambda$23/799655682.accept(Unknown Source), java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183), java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1540), java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512), java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290), java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731), java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289), java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:902), java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1689), java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1644), java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)]"}

I know that the example string has no relation to my documents whatsoever, but it happens with real requests as well. I had a look at the code, but since the last time I did vector space word comparison was ten years ago, I have no actual clue what is wrong.
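The exception suggests the two vectors were built from different feature sets. A common remedy (an assumption about the cause, not a confirmed fix for Graphify) is to project every document onto one shared vocabulary before comparing:

```python
def to_vector(feature_counts, vocabulary):
    # Project a sparse {feature: count} map onto a fixed, shared
    # vocabulary so every document vector has the same length before
    # any dot product is taken.
    return [feature_counts.get(term, 0) for term in vocabulary]

vocabulary = ["a", "b", "c"]
v1 = to_vector({"a": 2, "c": 1}, vocabulary)
v2 = to_vector({"b": 3}, vocabulary)
```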

Can I help somehow to debug the problem?

Arabic support

Hello
thanks for sharing your work on deep learning with us.

I tested Graphify on a classification problem and it works fine (I'll share the results and the use case later).

Now I'm trying to test it with Arabic text, and it doesn't work as I expected. The Data nodes are created and I can read the Arabic text in Neo4j, but no Pattern nodes are created!

Before I start digging deeper into your code, I'd prefer to ask whether you have any idea.
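One plausible cause (not verified against Graphify's source) is tokenization or pattern regexes that assume ASCII word characters; the difference is easy to demonstrate:

```python
import re

# A pattern restricted to ASCII letters matches nothing in Arabic text,
# while Python 3's Unicode-aware \w captures the words as expected.
text = "مرحبا بالعالم"
ascii_words = re.findall(r"[a-zA-Z]+", text)
unicode_words = re.findall(r"\w+", text)
```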

Thanks again.

Build Failure - Test failures

Hi,
I tried to build the plugin, but got stuck with a build error because of failed tests. Any ideas what could be wrong?

root@neo:/home/graphify/src/extension# mvn assembly:assembly -DdescriptorId=jar-with-dependencies
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.neo4j.nlp:graphify:jar:1.0.0
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 113, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building graphify 1.0.0
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.2-beta-5:assembly (default-cli) @ graphify >>>
[WARNING] The POM for org.neo4j:neo4j-cypher-commons:jar:2.1.3 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.neo4j:neo4j-cypher-compiler-1.9:jar:2.0.3 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.neo4j:neo4j-cypher-compiler-2.0:jar:2.0.3 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.neo4j:neo4j-cypher-compiler-2.1:jar:2.1.3 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for com.googlecode.concurrentlinkedhashmap:concurrentlinkedhashmap-lru:jar:1.3.1 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.scala-lang:scala-library:jar:2.10.4 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.eclipse.jetty:jetty-server:jar:9.0.5.v20130815 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] The POM for org.eclipse.jetty:jetty-webapp:jar:9.0.5.v20130815 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[INFO]
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ graphify ---
[WARNING] Using platform encoding (ANSI_X3.4-1968 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/graphify/src/extension/src/main/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.0.2:compile (default-compile) @ graphify ---
[INFO] Compiling 16 source files to /home/graphify/src/extension/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.3:testResources (default-testResources) @ graphify ---
[WARNING] Using platform encoding (ANSI_X3.4-1968 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/graphify/src/extension/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.0.2:testCompile (default-testCompile) @ graphify ---
[INFO] Compiling 2 source files to /home/graphify/src/extension/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ graphify ---
[INFO] Surefire report directory: /home/graphify/src/extension/target/surefire-reports


T E S T S

Running org.neo4j.nlp.impl.manager.NodeManagerTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.809 sec
Running org.neo4j.nlp.impl.util.GraphManagerTest
{0} is known {1}
Tests run: 6, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3 sec <<< FAILURE!

Results :

Tests in error:
testBackwardsPropagation(org.neo4j.nlp.impl.util.GraphManagerTest): org/parboiled/scala/Parser
testLearningManager(org.neo4j.nlp.impl.util.GraphManagerTest): org/neo4j/cypher/internal/compiler/v2_1/parser/CypherParser
testCypherJsonResult(org.neo4j.nlp.impl.util.GraphManagerTest): org/neo4j/cypher/internal/compiler/v2_1/parser/CypherParser

Tests run: 8, Failures: 0, Errors: 3, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21.142s
[INFO] Finished at: Thu Sep 11 15:51:27 EDT 2014
[INFO] Final Memory: 30M/72M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.10:test (default-test) on project graphify: There are test failures.
[ERROR]
[ERROR] Please refer to /home/graphify/src/extension/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException

Multiple bindings in extension error

Hi,

I seem to have an issue with the build, causing the extension not to load correctly.
The log file is below. When I try to access /service through cURL, nothing is returned, and when I try through the Neo4j admin console with :GET /service/... (as per the examples),
I get an error 400.

I am running OS X Mavericks with both the Java 1.7u55 (I think) and 1.8 JDKs installed.

Do I need to modify the pom.xml to specifically use one of the JDKs, and if so, how?

I also noticed the following during the build; however, the tests ran OK and the .jar was built.

maven build extract

[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.neo4j.nlp:graphify:jar:1.0.0
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 113, column 21
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 

log file

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/system/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/plugins/graphify-1.0.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13:52:54,888 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback.groovy]
13:52:54,889 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
13:52:54,889 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/system/lib/neo4j-server-2.1.3.jar!/logback.xml]
13:52:54,890 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
13:52:54,891 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/plugins/graphify-1.0.0-jar-with-dependencies.jar!/logback.xml]
13:52:54,891 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/system/lib/neo4j-server-2.1.3.jar!/logback.xml]
13:52:54,939 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@5afa04c - URL [jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/system/lib/neo4j-server-2.1.3.jar!/logback.xml] is not of type file
13:52:55,164 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not set
13:52:55,179 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
13:52:55,196 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - Naming appender as [STDOUT]
13:52:55,270 |-INFO in ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
13:52:55,410 |-INFO in ch.qos.logback.classic.joran.action.RootLoggerAction - Setting level of ROOT logger to INFO
13:52:55,410 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
13:52:55,411 |-INFO in ch.qos.logback.classic.joran.action.ConfigurationAction - End of configuration.
13:52:55,417 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@6ea12c19 - Registering current configuration as safe fallback point

2014-08-27 03:52:56.456+0000 INFO  [API] Setting startup timeout to: 120000ms based on -1
2014-08-27 03:52:59.540+0000 INFO  [API] Successfully started database
2014-08-27 03:52:59.764+0000 INFO  [API] Starting HTTP on port :7474 with 40 threads available
2014-08-27 03:53:00.255+0000 INFO  [API] Enabling HTTPS on port :7473
2014-08-27 03:53:00.729+0000 INFO  [API] Mounted discovery module at [/]
2014-08-27 03:53:00.751+0000 INFO  [API] Mounted REST API at [/db/data/]
2014-08-27 03:53:00.759+0000 INFO  [API] Mounted management API at [/db/manage/]
2014-08-27 03:53:00.761+0000 INFO  [API] Mounted third-party JAX-RS package [org.neo4j.nlp.ext] at [/service]
2014-08-27 03:53:00.761+0000 INFO  [API] Mounted webadmin at [/webadmin]
2014-08-27 03:53:00.762+0000 INFO  [API] Mounted Neo4j Browser at [/browser]
2014-08-27 03:53:00.978+0000 INFO  [API] Mounting static content at [/webadmin] from [webadmin-html]
2014-08-27 03:53:01.120+0000 INFO  [API] Mounting static content at [/browser] from [browser]
13:53:01.125 [main] WARN  o.e.j.server.handler.ContextHandler - o.e.j.s.ServletContextHandler@4d0b0fd4{/,null,null} contextPath ends with /
13:53:01.126 [main] WARN  o.e.j.server.handler.ContextHandler - Empty contextPath
13:53:01.131 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.0.5.v20130815
13:53:01.211 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.h.MovedContextHandler@393881f0{/,null,AVAILABLE}
13:53:01.419 [main] INFO  o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /webadmin, did not find org.apache.jasper.servlet.JspServlet
13:53:01.459 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@51dbd6e4{/webadmin,jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/system/lib/neo4j-server-2.1.3-static-web.jar!/webadmin-html,AVAILABLE}
13:53:06.648 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@12abdfb{/service,null,AVAILABLE}
13:53:07.016 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@53b8afea{/db/manage,null,AVAILABLE}
13:53:07.484 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@4743a322{/db/data,null,AVAILABLE}
13:53:07.537 [main] INFO  o.e.j.w.StandardDescriptorProcessor - NO JSP Support for /browser, did not find org.apache.jasper.servlet.JspServlet
13:53:07.540 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@13fed1ec{/browser,jar:file:/usr/local/Cellar/neo4j/2.1.3/libexec/system/lib/neo4j-browser-2.1.3.jar!/browser,AVAILABLE}
13:53:07.725 [main] INFO  o.e.j.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@4d0b0fd4{/,null,AVAILABLE}
13:53:07.776 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@17690e14{HTTP/1.1}{localhost:7474}
13:53:08.232 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@317a118b{SSL-HTTP/1.1}{localhost:7473}
2014-08-27 03:53:08.245+0000 INFO  [API] Server started on: http://localhost:7474/
2014-08-27 03:53:08.246+0000 INFO  [API] Remote interface ready and available at [http://localhost:7474/]

Increase Classification Accuracy

The classification accuracy in the 1.0.0 build maxes out at 70% for sentiment analysis on movie reviews in the Cornell dataset.

The following feature enhancement is proposed for increasing the accuracy to over 75%.

Add a HAS_AFFINITY relationship to the Neo4j property graph between Pattern nodes.

HAS_AFFINITY

The weight property is incremented each time two patterns are matched within the same input.

Using this new data model, it is possible to run a PageRank calculation on the subgraph of features/patterns matched on an input.

Pattern Affinity Subgraph

When extracting features from the following input:

The last word in a sentence is interesting

The following JSON map describes the frequency (number of matches on the input), variance (statistical variance of distribution to all training labels), and affinity (the result of PageRank on affinity relationships in the subgraph).

[
    {
        "feature": "{0} {1}",
        "frequency": 4,
        "variance": 0.08652870591125471,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} word {1}",
        "frequency": 1,
        "variance": 0.12858201014657272,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} a sentence is {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.17241379310344815
    },
    {
        "feature": "{0} word in a sentence {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.17241379310344815
    },
    {
        "feature": "{0} a sentence {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} in a sentence is {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.17241379310344815
    },
    {
        "feature": "{0} in {1}",
        "frequency": 1,
        "variance": 0.08652870591125471,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} in a sentence {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} sentence is {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} word in a sentence is {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} a {1}",
        "frequency": 1,
        "variance": 0.08652870591125471,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} is {1}",
        "frequency": 1,
        "variance": 0.08652870591125471,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} word in {1}",
        "frequency": 1,
        "variance": 0.12858201014657272,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} in a {1}",
        "frequency": 1,
        "variance": 0.08652870591125471,
        "affinity": 0.025862068965517244
    },
    {
        "feature": "{0} word in a {1}",
        "frequency": 1,
        "variance": 0.12858201014657272,
        "affinity": 0.17241379310344815
    },
    {
        "feature": "{0} sentence {1}",
        "frequency": 1,
        "variance": 1,
        "affinity": 0.025862068965517244
    }
]
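The affinity values above come from PageRank over the HAS_AFFINITY subgraph. A minimal power-iteration sketch (run here on a toy three-node graph, not the actual implementation) looks like this:

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    # adjacency: {node: [outgoing neighbors]}; every node must have at
    # least one outgoing edge (no dangling-node handling in this sketch).
    nodes = list(adjacency)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
            + damping * sum(rank[m] / len(adjacency[m])
                            for m in nodes if n in adjacency[m])
            for n in nodes
        }
    return rank

# Toy affinity subgraph: p1 gets links from both p2 and p3, so it
# should end up with the highest rank.
rank = pagerank({"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p1"]})
```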

Neo4j server startup not running

Hi,

Can anyone help me

The Neo4j server, which was previously working, now fails to start with the error below once I copy the Graphify jar into the plugins directory. Also, where are the server properties in the conf directory?

This is the log that was generated:

2018-03-21 16:38:01.431+0000 INFO ======== Neo4j 3.3.4 ========
2018-03-21 16:38:01.494+0000 INFO Starting...
2018-03-21 16:38:03.431+0000 INFO Bolt enabled on 127.0.0.1:7687.
2018-03-21 16:38:03.469+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e21e98f' was successfully initialized, but failed to start. Please see the attached cause exception "null". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e21e98f' was successfully initialized, but failed to start. Please see the attached cause exception "null".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e21e98f' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:68)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:220)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:111)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:79)
at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:32)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e21e98f' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:466)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:107)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:212)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory, /home/jarvis/Documents/neo4j-community-3.3.4/data/databases/graph.db
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.newFacade(GraphDatabaseFacadeFactory.java:126)
at org.neo4j.server.CommunityNeoServer.lambda$static$0(CommunityNeoServer.java:58)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:88)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:445)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.extension.KernelExtensions@2663e964' failed to initialize. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:427)
at org.neo4j.kernel.lifecycle.LifeSupport.init(LifeSupport.java:62)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:98)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:207)
... 9 more
Caused by: java.lang.AbstractMethodError
at org.neo4j.kernel.extension.KernelExtensions.newInstance(KernelExtensions.java:78)
at org.neo4j.kernel.extension.KernelExtensions.init(KernelExtensions.java:61)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.init(LifeSupport.java:406)
... 12 more
2018-03-21 16:38:03.479+0000 INFO Neo4j Server shutdown initiated by request

JUnit test failed - sentimentAnalysisTest

There are some hard-coded paths for the input files. Once corrected (pointed to my local resource, which I assumed was the correct input), the test ran but failed with:

Training: 598
Training: 599
{all=0.51, negative=0.65, positive=0.36}
junit.framework.AssertionFailedError
at org.neo4j.nlp.impl.util.VectorUtilTest.sentimentAnalysisTest(VectorUtilTest.java:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:78)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:212)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)

Process finished with exit code 255
{all=0.51, negative=0.65, positive=0.37}

Expected result: > 0.50 for both positive and negative.

Starting Neo4j Server failed

Getting the following error when adding the plugin to neo4j-community 2.2.1 32-bit.
Cloned from the master branch.
Java version 7u45, 32-bit
Windows 64-bit.

Starting Neo4j Server failed: javax.servlet.ServletException: org.neo4j.server.web.NeoServletContainer-259749@3b1f74ba==org.neo4j.server.web.NeoServletContainer,-1,false

Empty classification array

Hello,

I set up Graphify on an Amazon EC2 server and all seems to be working, except that I get an empty array when attempting to classify the example text. I have been playing around with it for hours now and can't seem to get it to work. What am I doing wrong?

curl --user neo4j:PASSWORD -H "Content-Type: application/json" -d '{"label": ["Document classification"], "text": ["Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach."]}' http://localhost:7474/service/graphify/training
{"success":"true"}

Then:
sudo curl --user neo4j:PASSWORD -H "Content-Type: application/json" -d '{"text": "A document is a written or drawn representation of thoughts. Originating from the Latin Documentum meaning lesson - the verb means to teach, and is pronounced similarly, in the past it was usually used as a term for a written proof used as evidence."}' http://localhost:7474/service/graphify/classify
{"classes":[]}

Thank you!

-Dan
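For anyone hitting the same empty array: one guess (not a confirmed fix) is that classifying after training only a single label leaves the classifier with nothing to rank against. The sketch below assumes the stock `/service/graphify/training` and `/service/graphify/classify` endpoints shown in the curl commands above; the helper names (`build_payload`, `post`, `demo`) are illustrative, not part of Graphify's API.

```python
import json
import urllib.request

BASE = "http://localhost:7474/service/graphify"  # default Neo4j host/port

def build_payload(text, labels=None):
    """Build the JSON body the training/classify endpoints expect.

    /training takes "text" as a list plus a "label" list;
    /classify takes "text" as a plain string.
    """
    body = {"text": text}
    if labels is not None:
        body["label"] = labels
    return json.dumps(body).encode("utf-8")

def post(endpoint, payload):
    """POST a JSON payload; requires a running Neo4j server with the plugin.

    Add an Authorization header here if server auth is enabled.
    """
    req = urllib.request.Request(
        BASE + endpoint, data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def demo():
    """Train two contrasting labels, then classify (needs a live server)."""
    post("/training", build_payload(
        ["Documents may be classified according to their subjects."],
        labels=["document"]))
    post("/training", build_payload(
        ["A note in a musical ensemble is played by an instrument."],
        labels=["music"]))
    return post("/classify", build_payload(
        "A document is a written representation of thoughts."))
```

With two contrasting classes trained, `demo()` should return a non-empty `classes` ranking instead of `{"classes":[]}` — assuming the empty result really was caused by single-label training.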

Build fails in test stage

Here is the log:

[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.neo4j.nlp:graphify:jar:1.0.0
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-compiler-plugin is missing. @ line 189, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] graphify
[INFO] myproject
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building myproject 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> maven-assembly-plugin:2.2-beta-5:assembly (default-cli) @ graphify >>>
[INFO]
[INFO] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[INFO] Forking graphify 1.0.0
[INFO] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ graphify ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/mohan/graphify-master/src/extension/src/main/resources
[INFO]
[INFO] --- maven-scala-plugin:2.15.2:compile (scala-compile-first) @ graphify ---
[INFO] Checking for multiple versions of scala
[WARNING] Expected all dependencies to require Scala version: 2.10.4
[WARNING] org.neo4j:neo4j-cypher:2.2.1 requires scala version: 2.10.5
[WARNING] Multiple versions of scala libraries detected!
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ graphify ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ graphify ---
[WARNING] Using platform encoding (UTF-8 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /home/mohan/graphify-master/src/extension/src/test/resources
[INFO]
[INFO] --- maven-scala-plugin:2.15.2:testCompile (scala-test-compile) @ graphify ---
[INFO] Checking for multiple versions of scala
[WARNING] Expected all dependencies to require Scala version: 2.10.4
[WARNING] org.neo4j:neo4j-cypher:2.2.1 requires scala version: 2.10.5
[WARNING] Multiple versions of scala libraries detected!
[INFO] includes = [**/*.java,**/*.scala,]
[INFO] excludes = []
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ graphify ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ graphify ---
[INFO] Surefire report directory: /home/mohan/graphify-master/src/extension/target/surefire-reports


T E S T S

Running org.neo4j.nlp.impl.util.GraphManagerTest
{0} is known {1}
[{"label":"document","rating":4.1},{"label":"ensemble","rating":2.6},{"label":"sentence","rating":2.6},{"label":"paragraph","rating":2.6}]
[{"label":"ensemble","rating":4.500000000000001},{"label":"sentence","rating":4.500000000000001},{"label":"paragraph","rating":4.500000000000001},{"label":"document","rating":4.5}]
[{"label":"paragraph","rating":7.000000000000001},{"label":"ensemble","rating":5.500000000000001},{"label":"document","rating":5.500000000000001},{"label":"sentence","rating":5.500000000000001}]
FEATURE VECTOR for 'The last word in a sentence is interesting'
[0.21052631578947367, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.0, 0.0, 0.0, 0.0, 0.0, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.05263157894736842, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
COSINE SIMILARITY
0.7419354838709676
COSINE SIMILARITY for v1 and v3
0.7419354838709676
COSINE SIMILARITY for v1 and v4
0.7332355751067664
{"classes":[{"class":"paragraph","similarity":0.7201690119262738},{"class":"sentence","similarity":0.19104118698371517},{"class":"ensemble","similarity":0.180837591224175}]}
{"classes":[{"class":"paragraph","similarity":0.7201690119262738},{"class":"sentence","similarity":0.19104118698371517},{"class":"ensemble","similarity":0.180837591224175}]}
[{"feature":"{0} {1}","frequency":4,"variance":0.08652870591125471,"affinity":0.027027027027026983},{"feature":"{0} in {1}","frequency":1,"variance":0.08652870591125471,"affinity":0.027027027027026983},{"feature":"{0} a {1}","frequency":1,"variance":0.08652870591125471,"affinity":0.027027027027026983},{"feature":"{0} in a sentence {1}","frequency":1,"variance":0.30932024237944566,"affinity":0.027027027027026983},{"feature":"{0} sentence {1}","frequency":1,"variance":0.1398492871960801,"affinity":0.027027027027026983},{"feature":"{0} sentence is {1}","frequency":1,"variance":0.30932024237944566,"affinity":0.027027027027026983},{"feature":"{0} word in a {1}","frequency":1,"variance":0.0817755635771097,"affinity":0.027027027027026983},{"feature":"{0} word {1}","frequency":1,"variance":0.07969562544474496,"affinity":0.027027027027026983},{"feature":"{0} in a {1}","frequency":1,"variance":0.08652870591125471,"affinity":0.027027027027026983},{"feature":"{0} word in a sentence is {1}","frequency":1,"variance":0.30932024237944566,"affinity":0.027027027027026983},{"feature":"{0} word in a sentence {1}","frequency":1,"variance":0.30932024237944566,"affinity":0.027027027027026983},{"feature":"{0} a sentence {1}","frequency":1,"variance":0.1398492871960801,"affinity":0.027027027027026983},{"feature":"{0} in a sentence is {1}","frequency":1,"variance":0.30932024237944566,"affinity":0.027027027027026983},{"feature":"{0} word in {1}","frequency":1,"variance":0.0817755635771097,"affinity":0.027027027027026983},{"feature":"{0} is {1}","frequency":1,"variance":0.08652870591125471,"affinity":0.027027027027026983},{"feature":"{0} a sentence is {1}","frequency":1,"variance":0.30932024237944566,"affinity":0.027027027027026983},{"feature":"{0} a ensemble {1}","frequency":0,"variance":0.21114206209178935,"affinity":0.027027027027026983},{"feature":"{0} word in a paragraph is {1}","frequency":0,"variance":1.0,"affinity":0.027027027027026983},{"feature":"{0} in a document 
{1}","frequency":0,"variance":0.1414213562373095,"affinity":0.027027027027026983},{"feature":"{0} note in a ensemble {1}","frequency":0,"variance":0.40551750201988135,"affinity":0.027027027027026983},{"feature":"{0} note in a ensemble is {1}","frequency":0,"variance":0.40551750201988135,"affinity":0.027027027027026983},{"feature":"{0} a paragraph is {1}","frequency":0,"variance":1.0,"affinity":0.027027027027026983},{"feature":"{0} in a ensemble {1}","frequency":0,"variance":0.3344847472398428,"affinity":0.027027027027026983},{"feature":"{0} in a paragraph {1}","frequency":0,"variance":0.1384187178777149,"affinity":0.027027027027026983},{"feature":"{0} a ensemble is {1}","frequency":0,"variance":0.3344847472398428,"affinity":0.027027027027026983},{"feature":"{0} word in a document {1}","frequency":0,"variance":0.1414213562373095,"affinity":0.027027027027026983},{"feature":"{0} in a paragraph is {1}","frequency":0,"variance":1.0,"affinity":0.027027027027026983},{"feature":"{0} a paragraph {1}","frequency":0,"variance":0.09701703390049769,"affinity":0.027027027027026983},{"feature":"{0} a document {1}","frequency":0,"variance":0.1414213562373095,"affinity":0.027027027027026983},{"feature":"{0} paragraph {1}","frequency":0,"variance":0.09701703390049769,"affinity":0.027027027027026983},{"feature":"{0} ensemble is {1}","frequency":0,"variance":0.21114206209178935,"affinity":0.027027027027026983},{"feature":"{0} ensemble {1}","frequency":0,"variance":0.14304481760810453,"affinity":0.027027027027026983},{"feature":"{0} word in a paragraph {1}","frequency":0,"variance":0.1384187178777149,"affinity":0.027027027027026983},{"feature":"{0} paragraph is {1}","frequency":0,"variance":0.1384187178777149,"affinity":0.027027027027026983},{"feature":"{0} document is {1}","frequency":0,"variance":0.1414213562373095,"affinity":0.027027027027026983},{"feature":"{0} in a ensemble is {1}","frequency":0,"variance":0.40551750201988135,"affinity":0.027027027027026983},{"feature":"{0} 
document {1}","frequency":0,"variance":0.0823030195043518,"affinity":0.027027027027026983}]
[{feature={0} paragraph is interesting {1}, affinity=0.5263157894736842}, {feature={0} word in a paragraph is {1}, affinity=0.5263157894736842}, {feature={0} a paragraph is {1}, affinity=0.5263157894736842}, {feature={0} in a paragraph is {1}, affinity=0.5263157894736842}, {feature={0} in a ensemble is {1}, affinity=0.22907454048362486}, {feature={0} note in a ensemble is {1}, affinity=0.22907454048362486}, {feature={0} note in a ensemble {1}, affinity=0.22907454048362486}, {feature={0} a ensemble is {1}, affinity=0.19355816309360557}, {feature={0} in a ensemble {1}, affinity=0.19355816309360557}, {feature={0} a sentence is {1}, affinity=0.180975910663407}, {feature={0} in a sentence is {1}, affinity=0.180975910663407}, {feature={0} word in a sentence {1}, affinity=0.180975910663407}, {feature={0} word in a sentence is {1}, affinity=0.180975910663407}, {feature={0} sentence is {1}, affinity=0.180975910663407}, {feature={0} in a sentence {1}, affinity=0.180975910663407}, {feature={0} interesting {1}, affinity=0.1441669196714421}, {feature={0} ensemble is {1}, affinity=0.13188682051957887}, {feature={0} a ensemble {1}, affinity=0.13188682051957887}, {feature={0} is interesting {1}, affinity=0.10964912280701752}]
[{feature={0} document is interesting {1}, affinity=0.5555555555555556}, {feature={0} a document is {1}, affinity=0.5555555555555556}, {feature={0} word in a document is {1}, affinity=0.5555555555555556}, {feature={0} in a document is {1}, affinity=0.5555555555555556}, {feature={0} the {1}, affinity=0.5555555555555556}, {feature={0} interesting {1}, affinity=0.17340668575331347}, {feature={0} ensemble is {1}, affinity=0.16112658660145024}, {feature={0} a ensemble {1}, affinity=0.16112658660145024}, {feature={0} is interesting {1}, affinity=0.1388888888888889}]
[{feature={0} word in a sentence is interesting {1}, affinity=0.5277777777777778}, {feature={0} a sentence is interesting {1}, affinity=0.5277777777777778}, {feature={0} in a sentence is interesting {1}, affinity=0.5277777777777778}, {feature={0} sentence is interesting {1}, affinity=0.5277777777777778}, {feature={0} in a ensemble is {1}, affinity=0.23053652878771846}, {feature={0} note in a ensemble is {1}, affinity=0.23053652878771846}, {feature={0} note in a ensemble {1}, affinity=0.23053652878771846}, {feature={0} a ensemble is {1}, affinity=0.19502015139769918}, {feature={0} in a ensemble {1}, affinity=0.19502015139769918}, {feature={0} word in a sentence is {1}, affinity=0.18243789896750062}, {feature={0} a sentence is {1}, affinity=0.18243789896750062}, {feature={0} in a sentence is {1}, affinity=0.18243789896750062}, {feature={0} word in a sentence {1}, affinity=0.18243789896750062}, {feature={0} sentence is {1}, affinity=0.18243789896750062}, {feature={0} in a sentence {1}, affinity=0.18243789896750062}, {feature={0} ensemble is {1}, affinity=0.13334880882367245}, {feature={0} a ensemble {1}, affinity=0.13334880882367245}, {feature={0} is interesting {1}, affinity=0.11111111111111112}]
[{feature={0} note in a ensemble is musical {1}, affinity=0.5277777777777778}, {feature={0} in a ensemble is musical {1}, affinity=0.5277777777777778}, {feature={0} a ensemble is musical {1}, affinity=0.5277777777777778}, {feature={0} ensemble is musical {1}, affinity=0.5277777777777778}, {feature={0} musical {1}, affinity=0.5277777777777778}, {feature={0} note in a ensemble is {1}, affinity=0.23053652878771846}, {feature={0} in a ensemble is {1}, affinity=0.23053652878771846}, {feature={0} note in a ensemble {1}, affinity=0.23053652878771846}, {feature={0} a ensemble is {1}, affinity=0.19502015139769918}, {feature={0} in a ensemble {1}, affinity=0.19502015139769918}, {feature={0} a sentence is {1}, affinity=0.18243789896750062}, {feature={0} in a sentence is {1}, affinity=0.18243789896750062}, {feature={0} word in a sentence {1}, affinity=0.18243789896750062}, {feature={0} word in a sentence is {1}, affinity=0.18243789896750062}, {feature={0} sentence is {1}, affinity=0.18243789896750062}, {feature={0} in a sentence {1}, affinity=0.18243789896750062}, {feature={0} ensemble is {1}, affinity=0.13334880882367245}, {feature={0} a ensemble {1}, affinity=0.13334880882367245}]
Tests run: 6, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 6.33 sec
Running org.neo4j.nlp.impl.util.VectorUtilTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.165 sec <<< FAILURE!
sentimentAnalysisTest(org.neo4j.nlp.impl.util.VectorUtilTest) Time elapsed: 0.142 sec <<< ERROR!
java.lang.NullPointerException
at org.neo4j.nlp.impl.util.VectorUtilTest.readLargerTextFile(VectorUtilTest.java:240)
at org.neo4j.nlp.impl.util.VectorUtilTest.train(VectorUtilTest.java:172)
at org.neo4j.nlp.impl.util.VectorUtilTest.sentimentAnalysisTest(VectorUtilTest.java:69)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Running org.neo4j.nlp.impl.manager.NodeManagerTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.211 sec
Running org.neo4j.nlp.impl.traversal.DecisionTreeTest
{0=1, 1=1, 9=1, 2=1, 7=1, 3=1}

┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌─┐
│ │ │ │ │ │ │ │ │ │ │ │
│0├──>│1├──>│2├──>│3├──>│7├──>│9│
│ │ │ │ │ │ │ │ │ │ │ │
└─┘ └─┘ └─┘ └─┘ └─┘ └─┘
May 20, 2015 7:53:24 PM org.neo4j.nlp.helpers.GraphManager getOrCreateNode
INFO: Node 0 not found
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.688 sec <<< FAILURE!
testVertexPath(org.neo4j.nlp.impl.traversal.DecisionTreeTest) Time elapsed: 0.086 sec <<< ERROR!
org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:126)
at org.neo4j.nlp.impl.cache.RelationshipCache.getLongs(RelationshipCache.java:46)
at org.neo4j.nlp.impl.cache.RelationshipCache.getRelationships(RelationshipCache.java:33)
at org.neo4j.nlp.helpers.GraphManager.getRelationships(GraphManager.java:112)
at traversal.DecisionTree.loadBranches(DecisionTree.scala:175)
at traversal.DecisionTree.traverseTo(DecisionTree.scala:240)
at traversal.DecisionTree.traverseTo(DecisionTree.scala:229)
at org.neo4j.nlp.impl.traversal.DecisionTreeTest.testVertexPath(DecisionTreeTest.java:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Caused by: org.neo4j.kernel.api.exceptions.TransactionFailureException: Transaction rolled back even if marked as successful
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.close(KernelTransactionImplementation.java:411)
at org.neo4j.kernel.TopLevelTransaction.close(TopLevelTransaction.java:112)
... 36 more

Results :

Tests in error:
sentimentAnalysisTest(org.neo4j.nlp.impl.util.VectorUtilTest)
testVertexPath(org.neo4j.nlp.impl.traversal.DecisionTreeTest): Transaction was marked as successful, but unable to commit transaction so rolled back.

Tests run: 11, Failures: 0, Errors: 2, Skipped: 2

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] graphify .......................................... SKIPPED
[INFO] myproject ......................................... FAILURE [ 10.342 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.716 s
[INFO] Finished at: 2015-05-20T19:53:24+05:30
[INFO] Final Memory: 17M/256M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project graphify: There are test failures.
[ERROR]
[ERROR] Please refer to /home/mohan/graphify-master/src/extension/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
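As an aside, the GraphManagerTest output in the log above prints normalized feature vectors (entries summing to 1) and cosine similarities between them. A minimal sketch of how those numbers arise — the helper names are illustrative, not Graphify's internals:

```python
import math

def normalize(counts):
    """Turn raw feature counts into a frequency vector summing to 1,
    like the FEATURE VECTOR line printed by GraphManagerTest."""
    total = sum(counts)
    return [c / total for c in counts]

def cosine_similarity(v1, v2):
    """Cosine of the angle between two feature vectors.

    1.0 means identical direction; 0.0 means no shared features.
    """
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)
```

For example, a vector whose first feature occurred 4 times out of 19 total counts normalizes to 4/19 ≈ 0.2105, matching the leading entry of the logged feature vector.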

Updating to 3.x

I can see that there hasn't been much activity lately, but are there any plans to update Graphify to Neo4j 3.x?

Classification time proportional to ?

I have observed that as we train Graphify more and more, the size of the Neo4j database on disk keeps increasing, and beyond a point each classification request takes several minutes, making it almost unusable.

Is there a way to train Graphify for more accuracy while keeping the classification time within usable limits (say, under 30 seconds or a minute)?

To understand the slowdown, could you tell me which of the following parameters affect the classification time for a given text, and how?

  • The number of labels/classes already known to graphify from previous training requests
  • The total volume of text that has been given to graphify for training.
  • The amount of text given to graphify for classification

Insufficient example documentation

The examples and how-tos given in README.md are very brief. For example, I wonder how to get the same results as you achieve, e.g.:

curl -X POST -d '{ "text": "Interoperability is the ability of making systems work together."}' http://localhost:7474/service/graphify/classify

gives me:

{"classes":[]}

Is some data missing? Can you explain the prerequisites in more detail? Thank you.

Build Failure in Test Stage

Clean GitHub clone and build:
Java jdk1.8.0_20

[INFO] ------------------------------------------------------------------------
[INFO] Building graphify 1.0.0
[INFO] ------------------------------------------------------------------------

(...)


T E S T S

Running org.neo4j.nlp.impl.manager.NodeManagerTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.549 sec
Running org.neo4j.nlp.impl.util.GraphManagerTest
{0} is known {1}
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.005 sec
Running org.neo4j.nlp.impl.util.VectorUtilTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 77.715 sec <<< FAILURE!
sentimentAnalysisTest(org.neo4j.nlp.impl.util.VectorUtilTest) Time elapsed: 77.713 sec <<< ERROR!
java.lang.NullPointerException
at org.neo4j.nlp.impl.util.LearningManager.trainInput(LearningManager.java:82)
at org.neo4j.nlp.impl.util.VectorUtilTest.trainOnText(VectorUtilTest.java:112)
at org.neo4j.nlp.impl.util.VectorUtilTest.train(VectorUtilTest.java:102)
at org.neo4j.nlp.impl.util.VectorUtilTest.sentimentAnalysisTest(VectorUtilTest.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)

Results :

Tests in error:
sentimentAnalysisTest(org.neo4j.nlp.impl.util.VectorUtilTest)

Tests run: 9, Failures: 0, Errors: 1, Skipped: 3
