shorrockin / cascal

a high-level scala based cassandra library

Home Page: http://wiki.github.com/shorrockin/cascal/

License: Apache License 2.0

Java 3.33% Scala 96.67%

cascal's People

Contributors

imownbey chirino ticktock azinman lannyripple karthik001 tobias-kilian

cascal's Issues

separate out the embedded Cassandra utilities into their own project

Separate out the embedded Cassandra utilities into their own project. This would help isolate dependencies and allow better reuse.

extract a trait for the public Session API so it's easier to tweak pooling and compose Cascal code

It would be great if we could extract the Session API into a trait so that it can be implemented in various ways. Here are a few possible implementations:

1. use a direct session implementation (like the current Session class)
2. have each operation use SessionPool.borrow to grab a session from the pool, do the work, then return the session to the pool
3. lazily associate the calling thread with a session from the pool using a ThreadLocal (which is perhaps returned to the pool after a period of inactivity). This is like the above, but lets methods running in the same thread use the same session
4. share a single session implementation across a number of threads (somewhat like option 2)

In some Cascal code I've found that the number of sessions used can greatly affect performance (lots of sessions seem to slow Cassandra down), so it would be nice to make it easy to tinker with how physical sessions are pooled and shared across threads.

Another issue this helps solve is it soon gets tricky to compose lots of code passing around the pool and/or the current session we've grabbed from the pool (in case another thread accidentally reuses a session from another thread). So hiding the pooling bit really helps keep code clean and correct.

e.g. in the Spring Framework there are lots of 'pool' type objects; JmsTemplate, JdbcTemplate, HibernateTemplate, JpaTemplate and many others. All of them hide how the pool works (usually grabbing one from the pool, associating it with a thread until the operation/transaction completes etc).

For example, suppose we had a trait called SessionTemplate which mirrored the current public API of Session, with different implementations that either use a pool or use a single session under the covers with locking. User Cascal code could then pass the SessionTemplate around everywhere, across threads, and the implementation could be chosen to best suit the application's threading model (after performance tuning and experimentation).

It would also let folks implement their own pools if they want.
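To make the proposal concrete, here is a minimal sketch of what such a trait and two of the implementations listed above might look like. Everything in it is hypothetical: SessionTemplate, DirectSession, PooledTemplate and ThreadBoundTemplate are illustrative names, the single count method stands in for the whole Session API, and the path is a plain string rather than a Cascal path.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object SessionTemplateSketch {
  // Hypothetical stand-in for the operations Session exposes today.
  trait SessionTemplate {
    def count(path: String): Int
  }

  // A "physical" session. The body is a placeholder for a real Thrift call.
  final class DirectSession(val id: Int) extends SessionTemplate {
    def count(path: String): Int = path.length
  }

  // Option 2: borrow a session per operation and return it afterwards.
  final class PooledTemplate(size: Int) extends SessionTemplate {
    private val idle = new ConcurrentLinkedQueue[DirectSession]()
    (1 to size).foreach(i => idle.add(new DirectSession(i)))

    private def borrow[A](f: DirectSession => A): A = {
      val s = Option(idle.poll()).getOrElse(sys.error("pool exhausted"))
      try f(s) finally idle.add(s)
    }

    def count(path: String): Int = borrow(_.count(path))
    def idleCount: Int = idle.size
  }

  // Option 3: lazily bind one session to each calling thread.
  final class ThreadBoundTemplate(create: () => DirectSession) extends SessionTemplate {
    private val local = new ThreadLocal[DirectSession] {
      override def initialValue(): DirectSession = create()
    }
    def count(path: String): Int = local.get().count(path)
  }

  // Caller code depends only on the trait, never on the pooling strategy.
  def report(t: SessionTemplate): String = "Count Value: " + t.count("KS/CF/K")
}
```

Code such as report can then be handed either implementation, so the pooling strategy becomes a wiring decision rather than something threaded through every call site.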

How to re-run a main using cascal without hitting JMX issues

When running tests and sample programs in sbt (simple-build-tool), I get the exception below despite calling close and clear on the SessionPool.

Is there a way we can make the CascalStatistics deal with this exception silently? Or some way to close down the pool which results in this mbean being deleted?

Caused by: javax.management.InstanceAlreadyExistsException: com.shorrockin.cascal:name=CascalStatistics
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:453)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1484)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:963)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at com.shorrockin.cascal.jmx.CascalStatistics$.<init>(CascalStatistics.scala:15)
at com.shorrockin.cascal.jmx.CascalStatistics$.<clinit>(CascalStatistics.scala)
at com.shorrockin.cascal.session.SessionPool.<init>(SessionPool.scala:20)
at com.shorrockin.cascal.session.SessionPool.<init>(SessionPool.scala:17)
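
One possible way to avoid this, sketched below, is to unregister any bean left over from a previous run before registering a new one, and to unregister again on shutdown. This is not Cascal's actual code: StatsMBean, its single attribute and the JmxRegistrationSketch object are hypothetical, and only the object name is taken from the stack trace above. The javax.management calls themselves are standard JDK API.

```scala
import java.lang.management.ManagementFactory
import javax.management.{ObjectName, StandardMBean}

object JmxRegistrationSketch {
  // Hypothetical management interface; the real statistics bean will differ.
  trait StatsMBean {
    def getSessionsCreated: Long
  }

  private val server = ManagementFactory.getPlatformMBeanServer

  // Same name as in the stack trace.
  val name = new ObjectName("com.shorrockin.cascal:name=CascalStatistics")

  private def newStats(): StandardMBean =
    new StandardMBean(classOf[StatsMBean]) with StatsMBean {
      def getSessionsCreated: Long = 0L
    }

  // Replace a stale bean instead of failing with InstanceAlreadyExistsException.
  def register(): Unit = {
    if (server.isRegistered(name)) server.unregisterMBean(name)
    server.registerMBean(newStats(), name)
  }

  // Call on pool shutdown so a later run in the same JVM starts clean.
  def unregister(): Unit =
    if (server.isRegistered(name)) server.unregisterMBean(name)

  def isRegistered: Boolean = server.isRegistered(name)
}
```

Because sbt reuses one JVM across runs, the second call to register would otherwise hit the bean left behind by the first.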

any way to close pool?

Hello

I'm using the following code:

val hosts  = Host("127.0.0.1", 9160, 250) :: Nil  
val params = new PoolParams(10, ExhaustionPolicy.Fail, 500L, 6, 2)
var pool   = new SessionPool(hosts, params, Consistency.One)

pool.borrow { session => 
  println("Count Value: " + session.count("KS" \ "CF" \ "K"))
}

Output:

Apr 6, 2010 6:43:53 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Count Value: 308
Apr 6, 2010 6:43:56 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Apr 6, 2010 6:43:56 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Apr 6, 2010 6:43:59 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Apr 6, 2010 6:44:02 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)

Stack:

CassandraTestSuite [Scala Application]
org.scalatest.tools.Runner at localhost:46530
Thread main
Thread AWT-Shutdown
Daemon Thread AWT-XAWT
Thread AWT-EventQueue-0
Daemon Thread Timer-0
Daemon Thread Timer-1
/usr/lib/jvm/java-6-sun-1.6.0.15/bin/java (Apr 6, 2010 6:45:16 PM)

The program never ends. Is there any way to shut down the pool's threads and stop the process?

This library looks the most promising at least by documentation :)

Thanks!

sessions with timeout should not be returned to the pool

Currently (1.1), if a session times out it is returned to the pool, and the next time it is used it will often throw "unknown result" errors. This generally only occurs under high load, but regardless, the session should be removed from circulation under these circumstances.
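
A minimal sketch of the intended behaviour, using a toy pool rather than Cascal's SessionPool: a connection goes back into circulation only when the borrowed block completes normally, and is closed and dropped otherwise. Conn and Pool are hypothetical names, and treating every exception as fatal to the connection is a simplifying assumption.

```scala
import scala.collection.mutable

object DiscardOnFailureSketch {
  // Hypothetical minimal session; a real one would wrap a Thrift connection.
  final class Conn(val id: Int) {
    var open = true
    def close(): Unit = open = false
  }

  final class Pool(factory: () => Conn) {
    private val idle = mutable.Queue.empty[Conn]

    def borrow[A](f: Conn => A): A = {
      val conn = synchronized { if (idle.nonEmpty) idle.dequeue() else factory() }
      try {
        val result = f(conn)
        synchronized { idle.enqueue(conn) } // healthy: back into circulation
        result
      } catch {
        case e: Throwable =>
          conn.close() // failed or timed out: never hand this one out again
          throw e
      }
    }

    def idleCount: Int = synchronized(idle.size)
  }
}
```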

provide means to access session from pool outside of borrow method

Currently the only way to retrieve a session object from the pool is the borrow method. This is nice in that it automatically cleans up after itself, but sometimes it may be more convenient to manage the sessions yourself.

the list(Seq[Key]) api should take in a column container, and return a sequence.

The list(Seq[Key]) API should take in a column container and return a sequence. It is currently far too restrictive and prevents you from listing a collection of super columns using the Thrift multiget_slice API. Additionally, it should return a Seq of tuples instead of a Map so that ordering is retained.

create 2.8 branch taking advantage of unique aspects

create 2.8 branch taking advantage of unique aspects - in particular default param values on case class would help a lot with pool creation.
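
As a rough illustration of how Scala 2.8 default and named arguments could simplify pool creation, consider the sketch below. PoolSettings and its field names are hypothetical guesses and do not reflect the real PoolParams signature; the default values are simply borrowed from the sample code elsewhere on this page.

```scala
object PoolDefaultsSketch {
  // Hypothetical settings class; field names are guesses, not Cascal's PoolParams.
  final case class PoolSettings(
    maxActive: Int = 10,
    maxWaitMillis: Long = 500L,
    maxIdle: Int = 6,
    minIdle: Int = 2
  )

  val defaults = PoolSettings()               // every value defaulted
  val bigger   = PoolSettings(maxActive = 50) // override a single value by name
}
```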

modify build to work using sbt.

Currently the build uses Maven. While Maven is great for deployment and dependency management, the build/test cycle is slow. Switching to sbt for standard development tasks, while keeping Maven for deployment and dependency management, should give the best of both worlds.

implement login functionality to cassandra

Cassandra supports basic authentication, which can be accessed through the Thrift login API. This API should be exposed through the session template interface.

Timestamps should be microseconds rather than milliseconds

Currently, timestamps are set in milliseconds when I use the overloaded com.shorrockin.cascal.model.Column constructors. (In the code, they are generated by System.currentTimeMillis or new Date.getTime.)

But the Cassandra wiki recommends microseconds, so I think they should be microseconds.

http://wiki.apache.org/cassandra/DataModel#Columns
written:
Timestamps can be anything you like, but microseconds since 1970 is a convention.

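The bundled client "cassandra-cli" sets microsecond timestamps. So if I set or delete a column value with cassandra-cli, I can't update that column with cascal: its millisecond timestamp is always lower than the stored microsecond one, so the write loses.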
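
The mismatch can be shown with plain arithmetic. The sketch below is not Cascal code: TimestampSketch, nowMicros and wins are hypothetical helpers, and wins only models the rule that the write with the higher timestamp prevails.

```scala
import java.util.concurrent.TimeUnit

object TimestampSketch {
  // Microseconds since the epoch, derived from the millisecond clock.
  def nowMicros(): Long = TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis())

  // The write with the higher timestamp prevails.
  def wins(newTs: Long, storedTs: Long): Boolean = newTs > storedTs

  val storedMicros = 1267413962580791L // a microsecond timestamp, as cassandra-cli would write
  val laterMillis  = 1267413962581L    // slightly later in time, but expressed in milliseconds
}
```

Here wins(laterMillis, storedMicros) is false even though the millisecond value represents a later instant, while wins(laterMillis * 1000, storedMicros) is true.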

upgrade cascal to use 0.6 release of cassandra

upgrade cascal to use 0.6 release of cassandra - currently using beta3

Upgrade to Cassandra 0.7.0-beta3

Cassandra 0.6.1 is quite old. It would be great if Cascal supported the Cassandra 0.7.0 beta.

Untitled

create object mapping layer to marshal/unmarshal Seq[Column]

Currently, when you list from a column container you almost always get a Seq[Column] and a key, which you then manually marshal into some sort of object that your application layer works with.

It would be very nice to introduce a serialization layer which took in the results of a list (seq[Column]) and marshal that into a class handling common type conversions etc.

I would think you could annotate a scala case class such that you designate which constructor parameters mapped to which columns, define default values, optional values, conversion mechanisms then pass off a seq[columns] to automatically parse and create an instance of this object. For example:

case class Foo(@column("Bar") bar:String, @key id:Long, @column("Fu") fu:Date)

Which would then extract the Bar and Fu columns from the Seq[Columns] along with their key and use these values as the input to create the case class via reflection.

I'm not sure whether this belongs in the session or in a separate serialization layer, so it is probably best to build it outside first and move it inside later if that turns out to be appropriate.

long values are not inserted as appropriate byte values

Long values are currently converted to strings and then stored. When using long ordering, you need to store the byte value instead.
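
A small sketch of the difference, assuming the byte value meant here is the 8-byte big-endian encoding that java.nio.ByteBuffer produces by default. LongBytesSketch and its two helpers are hypothetical names, not part of Cascal.

```scala
import java.nio.ByteBuffer

object LongBytesSketch {
  // Encode a long as 8 bytes, most significant byte first (ByteBuffer's default order).
  def toBytes(v: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(v).array()

  // Decode the same 8-byte representation back into a long.
  def fromBytes(b: Array[Byte]): Long = ByteBuffer.wrap(b).getLong()

  // Stored as strings, 10 sorts before 9, which is why string storage breaks long ordering.
  val stringOrderIsWrong: Boolean = "10" < "9"
}
```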

cascal syntax and use of "\" to delimit paths

The following conversation was extracted from the Akka mailing list (from ikester). Since we didn't want to derail the conversation away from Akka it was taken here:

Thanks for your insights Chris. I haven't looked at your code yet but I'm just curious: do you absolutely need a different delimiter for supercolumns? Isn't there any other way to determine the kind of Column Family we're dealing with? Like number of arguments, context, etc. (i.e. if 1 token after the Key then StandardColumn , if 2 tokens then SuperColumn)? Couldn't that be achieved with a simple overloaded method signature or am I missing something?

For example:
session.get("Test" \ "Standard" \ "Key" \ "ColumnName") // standard column family
session.get("Test" \ "Super" \ "Key" \ "SuperColumn" \ "ColumnName") // super column family

By the way, I think some of the stuff you're doing with the object mapping is really practical. Perhaps we should continue this discussion on the cascal list?
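
The single-delimiter idea from the quoted message can be sketched as follows. This is not Cascal's implementation: Path, PathStart and kind are hypothetical, and the depth rule (four tokens for a standard column, five for a column inside a super column) is taken directly from the two examples above. One limitation of this approach is that a four-token path into a super column family, addressing a whole super column, would look the same as a standard column.

```scala
object PathDslSketch {
  // Hypothetical path model: depth alone decides which kind of column is meant.
  final case class Path(parts: Vector[String]) {
    def \(next: String): Path = Path(parts :+ next)

    def kind: String = parts.length match {
      case 4 => "standard column"
      case 5 => "super column"
      case _ => "incomplete path"
    }
  }

  // Lets a plain string start a path, as in "Test" \ "Standard".
  implicit final class PathStart(head: String) {
    def \(next: String): Path = Path(Vector(head, next))
  }
}
```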

add facilities to use framed transport option.

add facilities to use framed transport option via the thrift interface.

Make annotations optional for object mapping (use smart defaults via reflection)

How about making the annotations optional and use reflection to pick up the field names as default column names? I believe that's how some of the newer frameworks do it (e.g. Play).

For example:
class Foo(id: Long, name: String, email: String, dob: Date)

The ColumnFamily would be "Foo", the key would be the value of "id", "name" and "email" would be StandardColumn names with string values, and "dob" would be a StandardColumn name with its value correctly serialized from a Date object.
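
A rough sketch of these defaults is below. It swaps in a different technique from the one proposed: instead of runtime reflection on a plain class, it uses a case class and the Product methods productPrefix and productElementNames, which requires Scala 2.13 or later. DefaultMappingSketch and its helpers are hypothetical, the "id" convention for the key is taken from the example above, and the dob field is left out to keep the sketch short.

```scala
object DefaultMappingSketch {
  final case class Foo(id: Long, name: String, email: String)

  // All fields as (name, value) pairs, in declaration order.
  def fields(p: Product): List[(String, Any)] =
    p.productElementNames.zip(p.productIterator).toList

  // Column family defaults to the class name.
  def columnFamily(p: Product): String = p.productPrefix

  // By convention the field called "id" supplies the key.
  def key(p: Product): Option[Any] = fields(p).collectFirst { case ("id", v) => v }

  // Every other field becomes a column named after the field.
  def columns(p: Product): List[(String, Any)] = fields(p).filterNot(_._1 == "id")
}
```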

when mapping object keys should be able to be non-string values

When mapping objects, keys should be allowed to be non-string values. The serializer would then be used to convert the value into a string.

should be able to map columns where the name is dynamic

should be able to map columns where the name is dynamic, for example in the structure presented like:

Friends = {
    'a4a70900-24e1-11df-8924-001ff3591711': {
        # friend id: timestamp of when the friendship was added
        '10cf667c-24e2-11df-8924-001ff3591711': '1267413962580791',
        '343d5db2-24e2-11df-8924-001ff3591711': '1267413990076949',
        '3f22b5f6-24e2-11df-8924-001ff3591711': '1267414008133277',
    },
}

Upgrade Thrift to 0.5

Or at least 959516.

Maven Repo

Dear all,

seems like the maven repo is offline.

Can somebody double check on that topic?

Many greets and kind regards,

Christoph

might want to make jcl-over-slf4j an optional or provided or test dependency

If you try to use cascal in a project/VM that already has slf4j-jcl on the classpath (like trying to write a Cassandra persistence adapter for ActiveMQ (ticktock/qsandra ; ) ), you get something like this:

Caused by: java.lang.StackOverflowError
at java.util.HashMap.get(HashMap.java:298)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:153)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:289)
at org.slf4j.impl.JCLLoggerFactory.getLogger(JCLLoggerFactory.java:69)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:243)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:289)
at org.slf4j.impl.JCLLoggerFactory.getLogger(JCLLoggerFactory.java:69)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:243)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)

It is probably best to leave the decision of whether to route JCL over slf4j or slf4j over JCL to the end user of cascal.