spark-redis-ml's Introduction


Notice: RedisML is planned to be replaced by RedisAI, which adds support for deep learning.


Spark-Redis-ML

A Spark package for loading Spark ML models into Redis-ML

Requirements:

Apache Spark 2.0 or later

Redis built from the unstable branch

Jedis

Jedis-ml

Installation:

#get and build redis-ml
git clone https://github.com/RedisLabsModules/redis-ml.git
cd redis-ml/src
make 

#get and build jedis
git clone https://github.com/xetorthio/jedis.git
cd jedis
mvn package -Dmaven.test.skip=true

#get and build jedis-ml
cd ..
git clone https://github.com/RedisLabs/jedis-ml.git
cd jedis-ml
mkdir lib
cp ../jedis/target/jedis-3.0.0-SNAPSHOT.jar lib/
mvn install 

#get and build spark-redis-ml
cd ..
git clone https://github.com/RedisLabs/spark-redis-ml.git
cd spark-redis-ml
mkdir -p lib
cp ../jedis/target/jedis-3.0.0-SNAPSHOT.jar lib/
cp ../jedis-ml/target/jedis-ml-1.0-SNAPSHOT.jar lib/
sbt assembly
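
The build steps above rely on git, make, mvn and sbt being on the PATH. A minimal pre-flight sketch follows; the function name `missing_tools` is hypothetical and not part of this package:

```shell
# Hypothetical helper: print the name of every given command that is not on
# PATH, one per line. Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Calling `missing_tools git make mvn sbt` before starting should print nothing if everything is installed; any name it prints needs to be installed first.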

Usage:

Run Redis server with redis-ml module:

/path/to/redis-server --loadmodule ./redis-ml.so

From the Spark root directory, run the Spark shell with the required jars:

./bin/spark-shell --jars ../spark-redis-ml/target/scala-2.11/spark-redis-ml-assembly-0.1.0.jar,../spark-redis-ml/lib/jedis-3.0.0-SNAPSHOT.jar,../spark-redis-ml/lib/jedis-ml-1.0-SNAPSHOT.jar

In the Spark shell:

scala> :load "../spark-redis-ml/scripts/forest-example.scala"
scala> benchmark(10)

spark-redis-ml's People

Contributors

gkorland, mrksmb, shaynativ

spark-redis-ml's Issues

SIGSEGV on SAVE or DUMP when using spark-redis-ml

Fresh test install of Redis and the software required for spark-redis-ml, with all default configuration options. The random forest example application works exactly as documented, both within spark-shell and in redis-cli. However, any attempt to persist the key it creates crashes the Redis server with SIGSEGV. I tried this on multiple machines and kernels. Thanks for any thoughts.

(Ubuntu with 4.2.0-16-generic or CentOS 4.6.3, Spark 2.0)
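
A minimal reproduction sketch, assuming the example has already created the key `forest-test` (the name that appears in the DUMP crash log) and that `redis-cli` is on the PATH. The function `survives` and the `REDIS_CLI` override are hypothetical names used only for illustration:

```shell
# Hypothetical helper: run one Redis command, then PING to see whether the
# server is still up. REDIS_CLI is an assumed override hook for the client.
survives() {
  cli="${REDIS_CLI:-redis-cli}"
  # The command's own output and exit status are ignored; the PING below
  # is what decides whether the server survived.
  "$cli" "$@" >/dev/null 2>&1 || true
  if [ "$("$cli" PING 2>/dev/null)" = "PONG" ]; then
    echo "server alive after: $*"
  else
    echo "server down after: $*"
    return 1
  fi
}
```

Running `survives DUMP forest-test` or `survives SAVE` against the affected server would be expected to report it as down, unless a supervisor restarts it before the PING.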

Here is the output from the SAVE failure:

GDB Output

Program received signal SIGSEGV, Segmentation fault.
0x000000000044997a in rdbSaveObject (rdb=rdb@entry=0x7fffe3c7ed20, o=o@entry=0x7f98cca28690) at rdb.c:779
779 mt->rdb_save(&io,mv->value);
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
getDecodedObject (o=0x10000) at object.c:458
458 if (sdsEncodedObject(o)) {
(gdb)

Code Section

This is the section of code (rdb.c, around line 779) that writes module-specific representations:
/* Then write the module-specific representation. */
mt->rdb_save(&io,mv->value);
if (io.ctx) {
    moduleFreeContext(io.ctx);
    zfree(io.ctx);
}

=== REDIS BUG REPORT START: Cut & paste starting from here ===
9467:M 25 Apr 07:47:00.606 # Redis 999.999.999 crashed by signal: 11
9467:M 25 Apr 07:47:00.606 # Crashed running the instuction at: 0x44997a
9467:M 25 Apr 07:47:00.606 # Accessing address: (nil)
9467:M 25 Apr 07:47:00.606 # Failed assertion: (:0)

------ STACK TRACE ------
EIP:
./redis-server 127.0.0.1:6379(rdbSaveObject+0xaa)[0x44997a]

Backtrace:
./redis-server 127.0.0.1:6379(logStackTrace+0x45)[0x46a3b5]
./redis-server 127.0.0.1:6379(sigsegvHandler+0xb9)[0x46ab79]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10d10)[0x7f98cd75ed10]
./redis-server 127.0.0.1:6379(rdbSaveObject+0xaa)[0x44997a]
./redis-server 127.0.0.1:6379(rdbSaveKeyValuePair+0xc3)[0x449f83]
./redis-server 127.0.0.1:6379(rdbSaveRio+0x32b)[0x44a5cb]
./redis-server 127.0.0.1:6379(rdbSave+0x97)[0x44a9c7]
./redis-server 127.0.0.1:6379(saveCommand+0x2a)[0x44cd6a]
./redis-server 127.0.0.1:6379(call+0xa6)[0x42b4b6]
./redis-server 127.0.0.1:6379(processCommand+0x3a7)[0x42bbb7]
./redis-server 127.0.0.1:6379(processInputBuffer+0x105)[0x43b885]
./redis-server 127.0.0.1:6379(aeProcessEvents+0x128)[0x425418]
./redis-server 127.0.0.1:6379(aeMain+0x2b)[0x4257bb]
./redis-server 127.0.0.1:6379(main+0x495)[0x4224d5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f98cd3a4ac0]
./redis-server 127.0.0.1:6379(_start+0x29)[0x4227b9]

------ INFO OUTPUT ------

Server

redis_version:999.999.999
redis_git_sha1:c861e1e1
redis_git_dirty:0
redis_build_id:bdfe951cb252ef96
redis_mode:standalone
os:Linux 4.2.0-16-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:5.2.1
process_id:9467
run_id:0ad57117e97bac71c0734ef16afbf941865c4862
tcp_port:6379
uptime_in_seconds:346
uptime_in_days:0
hz:10
lru_clock:16711189
executable:/opt/redis/src/./redis-server
config_file:/opt/redis/redis.conf

Clients

connected_clients:3
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:872352
used_memory_human:851.91K
used_memory_rss:11317248
used_memory_rss_human:10.79M
used_memory_peak:872352
used_memory_peak_human:851.91K
used_memory_peak_perc:100.12%
used_memory_overhead:849162
used_memory_startup:765744
used_memory_dataset:23190
used_memory_dataset_perc:21.75%
total_system_memory:270252163072
total_system_memory_human:251.69G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:12.97
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0

Persistence

loading:0
rdb_changes_since_last_save:1
rdb_bgsave_in_progress:0
rdb_last_save_time:1493105851
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

Stats

total_connections_received:3
total_commands_processed:33
instantaneous_ops_per_sec:0
total_net_input_bytes:29619
total_net_output_bytes:1041334
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:19
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

Replication

role:master
connected_slaves:0
master_replid:71120b6f0fac83e15adc072556ea43bfd30e8e0f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:0.16
used_cpu_user:0.09
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

Commandstats

cmdstat_keys:calls=1,usec=18,usec_per_call=18.00
cmdstat_multi:calls=1,usec=1,usec_per_call=1.00
cmdstat_exec:calls=11,usec=221,usec_per_call=20.09
cmdstat_command:calls=1,usec=672,usec_per_call=672.00

Cluster

cluster_enabled:0

Keyspace

db0:keys=1,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=2 addr=127.0.0.1:36254 fd=7 name= age=102 idle=102 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=exec
id=3 addr=127.0.0.1:36256 fd=8 name= age=67 idle=67 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ml.forest.run
id=4 addr=127.0.0.1:36258 fd=9 name= age=17 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=save

------ CURRENT CLIENT INFO ------
id=4 addr=127.0.0.1:36258 fd=9 name= age=17 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=save
argv[0]: 'save'

Here is the bug report from the DUMP failure:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
23321:M 25 Apr 15:18:01.975 # Redis 999.999.999 crashed by signal: 11
23321:M 25 Apr 15:18:01.975 # Crashed running the instuction at: 0x44997a
23321:M 25 Apr 15:18:01.975 # Accessing address: (nil)
23321:M 25 Apr 15:18:01.975 # Failed assertion: (:0)

------ STACK TRACE ------
EIP:
./redis-server 127.0.0.1:6379(rdbSaveObject+0xaa)[0x44997a]

Backtrace:
./redis-server 127.0.0.1:6379(logStackTrace+0x45)[0x46a3b5]
./redis-server 127.0.0.1:6379(sigsegvHandler+0xb9)[0x46ab79]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10d10)[0x7f791b021d10]
./redis-server 127.0.0.1:6379(rdbSaveObject+0xaa)[0x44997a]
./redis-server 127.0.0.1:6379(createDumpPayload+0x4a)[0x474cfa]
./redis-server 127.0.0.1:6379(dumpCommand+0x3a)[0x474eca]
./redis-server 127.0.0.1:6379(call+0xa6)[0x42b4b6]
./redis-server 127.0.0.1:6379(processCommand+0x3a7)[0x42bbb7]
./redis-server 127.0.0.1:6379(processInputBuffer+0x105)[0x43b885]
./redis-server 127.0.0.1:6379(aeProcessEvents+0x128)[0x425418]
./redis-server 127.0.0.1:6379(aeMain+0x2b)[0x4257bb]
./redis-server 127.0.0.1:6379(main+0x495)[0x4224d5]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f791ac67ac0]
./redis-server 127.0.0.1:6379(_start+0x29)[0x4227b9]

------ INFO OUTPUT ------

Server

redis_version:999.999.999
redis_git_sha1:c861e1e1
redis_git_dirty:0
redis_build_id:6f3d0b6587bdb364
redis_mode:standalone
os:Linux 4.2.0-16-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:5.2.1
process_id:23321
run_id:e69862945e785d3d0d5fe1585b8c4cfb0a0682c5
tcp_port:6379
uptime_in_seconds:725
uptime_in_days:0
hz:10
lru_clock:16738473
executable:/opt/redis/src/./redis-server
config_file:/opt/redis/redis.conf

Clients

connected_clients:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:830624
used_memory_human:811.16K
used_memory_rss:11157504
used_memory_rss_human:10.64M
used_memory_peak:871264
used_memory_peak_human:850.84K
used_memory_peak_perc:95.34%
used_memory_overhead:815446
used_memory_startup:765744
used_memory_dataset:15178
used_memory_dataset_perc:23.39%
total_system_memory:270252163072
total_system_memory_human:251.69G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:13.43
mem_allocator:jemalloc-4.0.3
active_defrag_running:0
lazyfree_pending_objects:0

Persistence

loading:0
rdb_changes_since_last_save:1
rdb_bgsave_in_progress:0
rdb_last_save_time:1493132756
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

Stats

total_connections_received:4
total_commands_processed:20
instantaneous_ops_per_sec:0
total_net_input_bytes:15754
total_net_output_bytes:1046010
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:8
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

Replication

role:master
connected_slaves:0
master_replid:057c752b353f7347083ea9e7fa7398571eb8ce48
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:0.35
used_cpu_user:0.16
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

Commandstats

cmdstat_mget:calls=1,usec=4,usec_per_call=4.00
cmdstat_multi:calls=1,usec=1,usec_per_call=1.00
cmdstat_exec:calls=11,usec=228,usec_per_call=20.73
cmdstat_command:calls=1,usec=608,usec_per_call=608.00

Cluster

cluster_enabled:0

Keyspace

db0:keys=1,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=5 addr=127.0.0.1:36398 fd=7 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=dump

------ CURRENT CLIENT INFO ------
id=5 addr=127.0.0.1:36398 fd=7 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=dump
argv[0]: 'dump'
argv[1]: 'forest-test'
23321:M 25 Apr 15:18:01.978 # key 'forest-test' found in DB containing the following object:
23321:M 25 Apr 15:18:01.978 # Object type: 5
23321:M 25 Apr 15:18:01.978 # Object encoding: 0
23321:M 25 Apr 15:18:01.978 # Object refcount: 1

------ REGISTERS ------
23321:M 25 Apr 15:18:01.978 #
RAX:0000000000000009 RBX:00007f791a4340f0
RCX:0000000000000000 RDX:0000000000000008
RDI:00007ffcb3906960 RSI:0000000001ca8810
RBP:00007ffcb3906a10 RSP:00007ffcb3906950
R8 :000000000000002d R9 :00007f791aa000c0
R10:0000000000000000 R11:00007f791a40d5a0
R12:00007f791a428670 R13:0000000000000000
R14:00054dff3acb9f41 R15:000000000000000f
RIP:000000000044997a EFL:0000000000010206
CSGSFS:0000000000000033
23321:M 25 Apr 15:18:01.978 # (00007ffcb390695f) -> 0000000000474cfa
23321:M 25 Apr 15:18:01.978 # (00007ffcb390695e) -> 000000000000000f
23321:M 25 Apr 15:18:01.978 # (00007ffcb390695d) -> 00054dff3acb9f41
23321:M 25 Apr 15:18:01.978 # (00007ffcb390695c) -> 0000000000000000
23321:M 25 Apr 15:18:01.978 # (00007ffcb390695b) -> 0000000000000000
23321:M 25 Apr 15:18:01.978 # (00007ffcb390695a) -> 00007f791a428680
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906959) -> 00007ffcb3906a10
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906958) -> 0000000000001000
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906957) -> bda66c851946f900
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906956) -> 0000000000000000
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906955) -> 00007ffc00000000
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906954) -> 00007f791a4340f0
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906953) -> 00007ffcb3906a10
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906952) -> 0000000000000009
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906951) -> 0000000000000000
23321:M 25 Apr 15:18:01.978 # (00007ffcb3906950) -> 0000000000000001

------ FAST MEMORY TEST ------
23321:M 25 Apr 15:18:01.979 # Bio thread for job type #0 terminated
23321:M 25 Apr 15:18:01.979 # Bio thread for job type #1 terminated
23321:M 25 Apr 15:18:01.979 # Bio thread for job type #2 terminated
*** Preparing to test memory region 755000 (98304 bytes)
*** Preparing to test memory region 1c9b000 (135168 bytes)
*** Preparing to test memory region 7f7918bfe000 (8388608 bytes)
*** Preparing to test memory region 7f79193ff000 (8388608 bytes)
*** Preparing to test memory region 7f7919c00000 (10485760 bytes)
*** Preparing to test memory region 7f791aa00000 (2097152 bytes)
*** Preparing to test memory region 7f791b00d000 (16384 bytes)
*** Preparing to test memory region 7f791b22b000 (16384 bytes)
*** Preparing to test memory region 7f791b939000 (16384 bytes)
*** Preparing to test memory region 7f791b95a000 (16384 bytes)
*** Preparing to test memory region 7f791b960000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: rdbSaveObject (base: 0x4498d0)
Module: ./redis-server 127.0.0.1:6379 (base 0x400000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x4498d0 -D -b binary -m i386:x86-64 /tmp/dump.bin

23321:M 25 Apr 15:18:02.128 # dump of function (hexdump of 298 bytes):
415741564155415455534889fd4883ec4864488b042528000000488944243831c00fb60689c283e20f0f84b100000080fa010f848802000080fa020f84cf00000080fa030f840e03000080fa040f84fd00000080fa050f85ef0400004c8b6608498b1c2448c74424100000000048897c2418c74424280000000048c744243000000000488b3348895c2420e8f0e2ffff83f8ff0f841702000048984801442410488d7c2410498b742408ff5318488b7c24304885ff740fe854900400488b7c2430e89a8bfeff8b44242885c00f85de010000488b442410eb0e0f1f8000000000e84bf3ffff4898488b4c24386448330c25280000000f85400400004883c4485b5d415c415d415e415fc3660f1f44000083e0f03c200f84910300003c600f856c040000488b7e08488974
Function at 0x447c50 is rdbSaveLen
Function at 0x4929e0 is moduleFreeContext
Function at 0x432530 is zfree
Function at 0x448d00 is rdbSaveStringObject

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

Kmeans on Redis

Sorry for asking this here, but I can't find any way to create a Redis k-means model. What I want to achieve is to build a Spark KMeans model and then store the model in Redis for near-real-time prediction. The predictions will be made on real-time data streams.

Thanks for the help!

Does not build with original Spark ML library

I get an error about a missing getImpurityStats() method in the Spark ML package. The build ran fine in Docker, so I checked the Dockerfile and found that it uses a modified Spark distribution to which this method has been added.
The pull request and the corresponding issue are still in progress on the Spark JIRA. This is not mentioned anywhere in the steps. Also, RedisLabs has written a lot about Spark-Redis integration on its website, and Databricks gave a presentation and tutorial. So I am wondering why nobody has discussed this, or is it breaking only for me?

Some minor issues

The last part of the installation needs its own "mkdir lib". Also, at least on Ubuntu, Maven is part of the standard system but sbt is not, and I went on a wild goose chase for a while to find a good way to install it. The first two methods I tried were defunct; I finally settled on the script https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt. No doubt that script may disappear or become undesirable with time, but it would be a courtesy to users of this module to point to a currently working method.

Also, the Spark script invocation needs to be corrected to use the actual name of the example, "ml-forest-example.scala". I am not really a Spark user; I just downloaded it for this purpose and got errors:

scala> :load "../redis/spark-redis-ml/scripts/ml-forest-example.scala"
Loading ../redis/spark-redis-ml/scripts/ml-forest-example.scala...
import scala.collection.mutable
import scala.language.reflectiveCalls
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.regression.{RandomForestRegressionModel, RandomForestRegressor}
import org.apache.spark.ml.tree.{CategoricalSplit, ContinuousSplit, Split}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.{SparkSession, _}
import redis.clients.jedis.Protocol.Command
import redis.clients.jedis.{Jedis, _}
import com.redislabs.client.redisml.MLClient
import com.redislabs.provider.redis.ml.Forest
loadData: (spark: org.apache.spark.sql.SparkSession, path: String, format: String, expectedNumFeatures: Option[Int])org.apache.spark.sql.DataFrame
loadDatasets: (input: String, dataFormat: String, testInput: String, algo: String, fracTest: Double)(org.apache.spark.sql.DataFrame, org.apache.spark.sql.DataFrame)
defined class Params
params: Params = Params(file:///root/spark/data/mllib/sample_libsvm_data.txt,,libsvm,classification,5,32,1,0.0,10,auto,0.2,false,None,10)
algo: String = classification
RandomForestExample with parameters:
Params(file:///root/spark/data/mllib/sample_libsvm_data.txt,,libsvm,classification,5,32,1,0.0,10,auto,0.2,false,None,10)
org.apache.spark.sql.AnalysisException: Path does not exist: file:/root/spark/data/mllib/sample_libsvm_data.txt;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at loadData(:54)
at loadDatasets(:54)
... 76 elided
stages: scala.collection.mutable.ArrayBuffer[org.apache.spark.ml.PipelineStage] = ArrayBuffer()
labelColName: String = indexedLabel
res4: Any = ArrayBuffer(strIdx_348bb105c92b)
featuresIndexer: org.apache.spark.ml.feature.VectorIndexer = vecIdx_8176e0e50d19
res5: stages.type = ArrayBuffer(strIdx_348bb105c92b, vecIdx_8176e0e50d19)
dt: org.apache.spark.ml.classification.RandomForestClassifier = rfc_4bee75d8596f
res6: stages.type = ArrayBuffer(strIdx_348bb105c92b, vecIdx_8176e0e50d19, rfc_4bee75d8596f)
pipeline: org.apache.spark.ml.Pipeline = pipeline_b4782508197c
startTime: Long = 22579135088679
:45: error: not found: value training
val pipelineModel = pipeline.fit(training)
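
The first failure in the trace above is `Path does not exist` for `data/mllib/sample_libsvm_data.txt` under `/root/spark`. The later `not found: value training` error appears to follow from that failed load. A small check that could be run before `:load` is sketched below; the function name `sample_data_path` is hypothetical, and the fallback to `SPARK_HOME` is an assumption about the local setup:

```shell
# Hypothetical helper: print the sample data path if it exists under the
# given Spark root (default: $SPARK_HOME, else the current directory).
sample_data_path() {
  spark_root="${1:-${SPARK_HOME:-.}}"
  f="$spark_root/data/mllib/sample_libsvm_data.txt"
  if [ -f "$f" ]; then
    echo "$f"
  else
    echo "missing: $f" >&2
    return 1
  fi
}
```

For the session above, `sample_data_path /root/spark` would report the file as missing, which suggests Spark was unpacked somewhere other than the path the example script expects.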
