shorrockin / cascal
a high-level scala based cassandra library
Home Page: http://wiki.github.com/shorrockin/cascal/
License: Apache License 2.0
Separate out the embedded Cassandra utilities into their own project. This would help isolate dependencies and allow better reuse.
It would be great if we could separate the Session API out into a trait so that it can be implemented in various ways. e.g. here are a few possible implementations...
In some Cascal code I've found that the number of sessions used can greatly affect performance (lots of sessions seem to slow Cassandra down), so it would be nice to make it easy to tinker with how physical sessions are pooled and shared across threads.
Another issue this helps solve: it soon gets tricky to compose lots of code that passes around the pool and/or the session currently borrowed from it (in case one thread accidentally reuses another thread's session). Hiding the pooling really helps keep code clean and correct.
e.g. in the Spring Framework there are lots of 'pool' type objects: JmsTemplate, JdbcTemplate, HibernateTemplate, JpaTemplate and many others. All of them hide how the pool works (usually grabbing one from the pool, associating it with a thread until the operation/transaction completes, etc).
For example, suppose we had a trait called SessionTemplate that mirrored the current public API of Session, with different implementations behind it: one backed by a pool, one using a single session under the covers with locking, or whatever. User code could then pass the SessionTemplate around everywhere, across threads, and the implementation could be chosen to best suit the application's threading model (after performance tuning and experimentation).
It would also let folks implement their own pools if they want.
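As a rough illustration of the SessionTemplate idea above, here is a minimal sketch. It is not Cascal's API: the one-method `Session` trait, the `withSession` helper and both implementations are hypothetical stand-ins, and a real template would mirror every public method of Session rather than just `count`.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical stand-in for Cascal's Session; only one illustrative method.
trait Session {
  def count(path: String): Int
}

// Callers depend on this trait only; how sessions are obtained stays hidden.
trait SessionTemplate {
  // Each implementation decides how a physical session is acquired and released.
  protected def withSession[A](f: Session => A): A

  // Mirrors the (assumed) Session API by delegating to a managed session.
  def count(path: String): Int = withSession(_.count(path))
}

// Variant 1: a single shared session guarded by a lock.
final class SingleSessionTemplate(session: Session) extends SessionTemplate {
  protected def withSession[A](f: Session => A): A = synchronized { f(session) }
}

// Variant 2: a fixed set of sessions handed out one at a time.
final class PooledSessionTemplate(sessions: Seq[Session]) extends SessionTemplate {
  private val idle = new LinkedBlockingQueue[Session]()
  sessions.foreach(s => idle.put(s))

  protected def withSession[A](f: Session => A): A = {
    val s = idle.take()            // blocks until a session is free
    try f(s) finally idle.put(s)   // always hand the session back
  }
}
```

Application code would accept a `SessionTemplate` and never see the pool, so swapping one implementation for the other after performance tuning would not touch the call sites.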
When running tests and sample programs in sbt (simple-build-tool), I get the exception below despite calling close and clear on the SessionPool.
Is there a way we can make the CascalStatistics deal with this exception silently? Or some way to close down the pool which results in this mbean being deleted?
Caused by: javax.management.InstanceAlreadyExistsException: com.shorrockin.cascal:name=CascalStatistics
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:453)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1484)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:963)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at com.shorrockin.cascal.jmx.CascalStatistics$.(CascalStatistics.scala:15)
at com.shorrockin.cascal.jmx.CascalStatistics$.(CascalStatistics.scala)
at com.shorrockin.cascal.session.SessionPool.(SessionPool.scala:20)
at com.shorrockin.cascal.session.SessionPool.(SessionPool.scala:17)
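One possible way to avoid the InstanceAlreadyExistsException is to check whether the name is already registered before registering, and to unregister when the pool shuts down. The sketch below uses only the standard `javax.management` API; `DemoStatistics`, `StatisticsRegistry` and the object name are hypothetical and are not Cascal's classes. A plausible cause, stated here as an assumption, is that sbt reuses one JVM across runs, so the platform MBean server still holds the name from the previous run.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// Standard MBean convention: class X exposes a management interface named XMBean.
trait DemoStatisticsMBean { def getBorrowCount: Int }
class DemoStatistics extends DemoStatisticsMBean { def getBorrowCount: Int = 0 }

object StatisticsRegistry {
  private val server = ManagementFactory.getPlatformMBeanServer

  // Hypothetical name, chosen for this sketch only.
  val name = new ObjectName("com.example.demo:name=DemoStatistics")

  // Safe to call repeatedly: registers only if the name is still free.
  def register(): Unit = synchronized {
    if (!server.isRegistered(name)) {
      server.registerMBean(new DemoStatistics, name)
      ()
    }
  }

  // Intended to be called when the pool is closed, so a later run starts clean.
  def unregister(): Unit = synchronized {
    if (server.isRegistered(name)) server.unregisterMBean(name)
  }
}
```

Calling `register()` twice is then harmless, and calling `unregister()` from the pool's close path would remove the MBean again.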
Hello
I'm using the following code:
val hosts = Host("127.0.0.1", 9160, 250) :: Nil
val params = new PoolParams(10, ExhaustionPolicy.Fail, 500L, 6, 2)
var pool = new SessionPool(hosts, params, Consistency.One)
pool.borrow { session =>
println("Count Value: " + session.count("KS" \ "CF" \ "K"))
}
Output:
Apr 6, 2010 6:43:53 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Count Value: 308
Apr 6, 2010 6:43:56 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Apr 6, 2010 6:43:56 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Apr 6, 2010 6:43:59 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Apr 6, 2010 6:44:02 PM com.shorrockin.cascal.session.SessionPool$SessionFactory$ makeSession
FINE: attempting to create connection to: Host(127.0.0.1,9160,250)
Stack:
CassandraTestSuite [Scala Application]
org.scalatest.tools.Runner at localhost:46530
Thread main
Thread AWT-Shutdown
Daemon Thread AWT-XAWT
Thread AWT-EventQueue-0
Daemon Thread Timer-0
Daemon Thread Timer-1
/usr/lib/jvm/java-6-sun-1.6.0.15/bin/java (Apr 6, 2010 6:45:16 PM)
The program never ends. Is there any way to shut down the pool threads and stop the process?
This library looks the most promising at least by documentation :)
Thanks!
If a session times out, it is currently (as of 1.1) returned to the pool, after which the next use will often throw "unknown result" errors. This generally occurs only under high load, but regardless, the session should be removed from circulation in these circumstances.
Currently the only way to retrieve a session object from the pool is the borrow method. While this is nice in that it automatically cleans up after itself, sometimes it may be more convenient to self-manage the sessions.
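The behaviour requested here, dropping a session that failed instead of returning it to the pool, could look roughly like the sketch below. `DemoSession` and `EvictingPool` are hypothetical; this is not how Cascal's SessionPool is implemented, and it treats any exception as a reason to evict.

```scala
import scala.collection.mutable

// Hypothetical minimal session; close() stands in for releasing the connection.
final class DemoSession(val id: Int) {
  def close(): Unit = ()
}

final class EvictingPool(create: () => DemoSession) {
  private val idle = mutable.Queue.empty[DemoSession]

  def borrow[A](f: DemoSession => A): A = {
    // Reuse an idle session if there is one, otherwise create a new one.
    val session = synchronized { if (idle.nonEmpty) idle.dequeue() else create() }
    try {
      val result = f(session)
      synchronized { idle.enqueue(session) } // succeeded: back into circulation
      result
    } catch {
      case e: Exception =>
        session.close()                      // failed: never returned to the pool
        throw e
    }
  }

  def idleCount: Int = synchronized(idle.size)
}
```

After a failed call the pool simply holds one session fewer, and the next `borrow` creates a fresh one.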
The list(Seq[Key]) API should take in a column container and return a sequence. It is currently far too restrictive and prevents you from listing a collection of super columns using the thrift multiget_slice API. Additionally, it should return a Seq of tuples instead of a Map so that ordering is retained.
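To illustrate the ordering point: a Seq of tuples keeps the order in which rows arrived, and a Map can still be derived from it when lookup by key is needed. The sketch below uses a hypothetical, simplified `DemoColumn` type with string keys rather than Cascal's model classes.

```scala
// Hypothetical simplified column type; Cascal's own model classes are richer.
final case class DemoColumn(name: String, value: String)

object OrderingDemo {
  // Rows in the order they were (hypothetically) returned by the server.
  val rows: Seq[(String, Seq[DemoColumn])] = Seq(
    "key-c" -> Seq(DemoColumn("n", "3")),
    "key-a" -> Seq(DemoColumn("n", "1")),
    "key-b" -> Seq(DemoColumn("n", "2"))
  )

  // The sequence preserves arrival order.
  val keysInOrder: Seq[String] = rows.map(_._1)

  // Keyed lookup is still available on demand; a general Map makes no
  // promise about iteration order, which is why it is derived, not returned.
  val byKey: Map[String, Seq[DemoColumn]] = rows.toMap
}
```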
Create a 2.8 branch that takes advantage of the features unique to Scala 2.8; in particular, default parameter values on case classes would help a lot with pool creation.
Currently the build uses Maven. While Maven is great for deployment and dependency management, the build/test cycle is slow. Switching to sbt for everyday development, while keeping Maven for deployment and dependency management, should give the best of both worlds.
Cassandra supports some basic authentication, which can be accessed through the thrift login API. This API should be exposed through the session template interface.
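A sketch of what default parameter values could do for pool creation. The field names and the `Block` policy are hypothetical; only the five default values are taken from the `PoolParams(10, ExhaustionPolicy.Fail, 500L, 6, 2)` call that appears elsewhere in this tracker, and their meaning is assumed.

```scala
// Hypothetical stand-in for Cascal's ExhaustionPolicy.
sealed trait DemoExhaustionPolicy
object DemoExhaustionPolicy {
  case object Fail extends DemoExhaustionPolicy
  case object Block extends DemoExhaustionPolicy
}

// Field names are assumptions made for this sketch.
final case class DemoPoolParams(
  maxActive: Int = 10,
  exhaustionPolicy: DemoExhaustionPolicy = DemoExhaustionPolicy.Fail,
  maxWaitMillis: Long = 500L,
  maxIdle: Int = 6,
  minIdle: Int = 2
)

object PoolParamsDemo {
  // All defaults.
  val defaults = DemoPoolParams()

  // Override only what differs, by name; everything else keeps its default.
  val tuned = DemoPoolParams(maxActive = 50, maxWaitMillis = 1000L)
}
```

Named arguments also make call sites self-describing, which a bare positional list of five values is not.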
Timestamps are currently set in milliseconds when I use the overloaded com.shorrockin.cascal.model.Column constructors. (In the code, they are generated by System.currentTimeMillis or new Date.getTime.)
But the Cassandra wiki recommends using microseconds.
I think they should be microseconds.
http://wiki.apache.org/cassandra/DataModel#Columns
written:
Timestamps can be anything you like, but microseconds since 1970 is a convention.
The bundled client, cassandra-cli, sets timestamps in microseconds. So if I set or delete a column value with cassandra-cli, I can't update that column with Cascal!
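The conflict follows from how Cassandra picks a winner: the write with the higher timestamp wins, and a millisecond value is about a thousand times smaller than a microsecond value for the same instant. A minimal sketch of producing microsecond timestamps follows; the `Timestamps` helper is hypothetical and not part of Cascal, and the resolution is still only milliseconds, scaled up.

```scala
object Timestamps {
  // Scale a millisecond epoch value to microseconds.
  def millisToMicros(millis: Long): Long = millis * 1000L

  // Current time in microseconds since 1970 (millisecond precision only).
  def nowMicros(): Long = millisToMicros(System.currentTimeMillis())
}
```

With this, a column written at millisecond 1267413962580 would carry the timestamp 1267413962580000, which sorts alongside values written by cassandra-cli.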
upgrade cascal to use 0.6 release of cassandra - currently using beta3
Cassandra 0.6.1 is quite old; it would be great if Cascal supported the Cassandra 0.7.0 beta.
At present, when you list from a column you almost always get a Seq[Column] and a key, which you then manually marshal into some sort of object that your application layer works with.
It would be very nice to introduce a serialization layer which took in the results of a list (seq[Column]) and marshal that into a class handling common type conversions etc.
I would think you could annotate a scala case class such that you designate which constructor parameters mapped to which columns, define default values, optional values, conversion mechanisms then pass off a seq[columns] to automatically parse and create an instance of this object. For example:
case class Foo(@column("Bar") bar:String, @key id:Long, @column("Fu") fu:Date)
Which would then extract the Bar and Fu columns from the Seq[Columns] along with their key and use these values as the input to create the case class via reflection.
I'm not sure whether this belongs in the session or in a separate serialization layer, so it is probably best to build it outside first and move it inside later if that turns out to be the right place.
Long values are currently converted to strings and then stored. When using long ordering, you need to store the byte value instead.
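A minimal sketch of the byte encoding, assuming the comparator expects a long as 8 big-endian bytes. The `LongBytes` helper is hypothetical and not part of Cascal.

```scala
import java.nio.ByteBuffer

object LongBytes {
  // Encode as 8 bytes, big-endian (ByteBuffer's default byte order).
  def toBytes(value: Long): Array[Byte] =
    ByteBuffer.allocate(8).putLong(value).array()

  // Decode the same 8-byte representation.
  def fromBytes(bytes: Array[Byte]): Long =
    ByteBuffer.wrap(bytes).getLong()
}
```

The string form cannot be used for numeric ordering because strings compare character by character: "10" sorts before "9".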
The following conversation was extracted from the Akka mailing list (from ikester). Since we didn't want to derail the conversation away from Akka it was taken here:
Thanks for your insights Chris. I haven't looked at your code yet but I'm just curious: do you absolutely need a different delimiter for supercolumns? Isn't there any other way to determine the kind of Column Family we're dealing with? Like number of arguments, context, etc. (i.e. if 1 token after the Key then StandardColumn , if 2 tokens then SuperColumn)? Couldn't that be achieved with a simple overloaded method signature or am I missing something?
For example:
session.get("Test" \ "Standard" \ "Key" \ "ColumnName") // standard column family
session.get("Test" \ "Super" \ "Key" \ "SuperColumn" \ "ColumnName") // super column family
By the way, I think some of the stuff you're doing with the object mapping is really practical. Perhaps we should continue this discussion on the cascal list?
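The token-counting idea from the quoted question can be sketched as follows. This is a hypothetical mini-DSL, not Cascal's path implementation: it only records the tokens chained with `\` and classifies a path by how many there are.

```scala
object PathDsl {
  // A path is just the ordered list of tokens chained with `\`.
  final case class Path(tokens: Vector[String]) {
    def \(next: String): Path = Path(tokens :+ next)
  }

  // Lets a plain String start a path: "Keyspace" \ "Family" ...
  implicit class PathStart(first: String) {
    def \(next: String): Path = Path(Vector(first, next))
  }

  sealed trait Target
  case object StandardColumn extends Target
  case object ColumnInSuperColumn extends Target

  // keyspace \ family \ key \ column               -> standard column
  // keyspace \ family \ key \ superColumn \ column -> column in a super column
  def classify(path: Path): Option[Target] = path.tokens.length match {
    case 4 => Some(StandardColumn)
    case 5 => Some(ColumnInSuperColumn)
    case _ => None
  }
}
```

One ambiguity a pure count leaves open: a four-token path could equally name a whole super column in a super column family, so the count alone may not be enough without knowing the family's type.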
add facilities to use framed transport option via the thrift interface.
How about making the annotations optional and use reflection to pick up the field names as default column names? I believe that's how some of the newer frameworks do it (e.g. Play).
For example:
class Foo(id: Long, name: String, email: String, dob: Date)
The ColumnFamily would be "Foo", the key would be the value of "id", "name" and "email" would be StandardColumn names with string values, and "dob" would be a StandardColumn name with its value correctly serialized from a Date object.
When mapping objects, keys should be allowed to be non-string values; the serializer would then convert the value into a string.
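A rough sketch of the convention-over-annotation idea, under two assumptions: Foo is a case class rather than the plain class shown above, and the Scala version is 2.13 or later, where `productElementNames` is available. `ConventionMapping` is hypothetical, and value serialization (for example the Date) is left out.

```scala
import java.util.Date

final case class Foo(id: Long, name: String, email: String, dob: Date)

object ConventionMapping {
  // Column family name defaults to the class name.
  def columnFamily(p: Product): String = p.productPrefix

  // The key is the value of the field named `keyField`, if present.
  def key(p: Product, keyField: String = "id"): Option[Any] =
    p.productElementNames.zip(p.productIterator)
      .collectFirst { case (n, v) if n == keyField => v }

  // Every other field becomes a (column name, raw value) pair, in declaration order.
  def columns(p: Product, keyField: String = "id"): Seq[(String, Any)] =
    p.productElementNames.zip(p.productIterator)
      .filter(_._1 != keyField)
      .toSeq
}
```

For a `Foo` this yields the column family "Foo", the key taken from `id`, and the columns `name`, `email` and `dob`; annotations would then only be needed to override these defaults.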
should be able to map columns where the name is dynamic, for example in the structure presented like:
Friends = {
'a4a70900-24e1-11df-8924-001ff3591711': {
# friend id: timestamp of when the friendship was added
'10cf667c-24e2-11df-8924-001ff3591711': '1267413962580791',
'343d5db2-24e2-11df-8924-001ff3591711': '1267413990076949',
'3f22b5f6-24e2-11df-8924-001ff3591711': '1267414008133277',
},
}
Or at least 959516.
Dear all,
seems like the maven repo is offline.
Can somebody double check on that topic?
Many greets and kind regards,
Christoph
if you try and use cascal on a project/vm that already has slf4j-jcl on the classpath (like trying to write a cassandra persistence adapter for activemq (ticktock/qsandra ; ) ) you get something like this
Caused by: java.lang.StackOverflowError
at java.util.HashMap.get(HashMap.java:298)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:153)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:289)
at org.slf4j.impl.JCLLoggerFactory.getLogger(JCLLoggerFactory.java:69)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:243)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:289)
at org.slf4j.impl.JCLLoggerFactory.getLogger(JCLLoggerFactory.java:69)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:243)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
It is probably best to leave the decision of whether to route JCL over slf4j or slf4j over JCL to the end user of cascal.
I was wondering how to do a slice of keys which are not strings? KeyRange seems to only use Strings. What if I wanted to do Session.list for a range of byte keys?
Is this a time you've gotta just use the Cassandra client directly?
When listing a supercolumn you lose ordering, as a Map[SuperColumn, Seq[Column]] is returned. This should be modified to return a List[(SuperColumn, Seq[Column])].
Currently, if there are errors in the object mapping, they can be difficult to debug; the error messages need to better represent the problem.