alanmarazzi / panthera
Data-frames & arrays on Clojure
License: Eclipse Public License 2.0
The following piped transformation shows a gap that we might think about filling in this library:
(defn properties-per-host [df]
  (-> df
      (pt/melt {:id-vars :host_id :value-vars [:listing_id]})
      (pt/groupby :host_id)
      (pt/subset-cols :value)
      (pt/n-unique)
      (pt/data-frame)
      (py. sort_values :by :value :ascending false)
      (pt/rename {:columns {:value :num_unique_listings}})))
In order to sort the n-unique count on the :value column, it was necessary, as things stand, to first cast the result as a data frame (it was a series after the group-by and aggregation fn), and then to call the sort_values method on the :pyobject.
It would be nice to set things up such that we don't need to do these extra steps.
panthera/src/panthera/pandas/generics.clj
Lines 33 to 69 in a5d13f0
In https://github.com/alanmarazzi/panthera/blob/master/examples/panthera-intro.ipynb the code cell 32 reads:
(require '[panthera.numpy :refer [npy]])
However, no such namespace exists.
panthera/src/panthera/pandas/math.clj
Lines 2 to 3 in a5d13f0
It's required in panthera.panthera, but I don't actually see anything bringing it in or existing within the repo.
I'm getting errors trying to use panthera and I suspect this is the cause.
Sorry for such a basic question, but how do I drop columns?
I've been trying things like these:
(-> dataset (pt/drop (pt/subset-cols :columnKeyWord)))
(-> dataset (pt/drop (pt/subset-cols [1 2 3 4])))
etc., but I get plenty of errors...
In fact, what I'm missing is a kind of tutorial mapping the pandas methods to the Clojure syntax. Does such a thing exist?
I noticed when playing around with the data-frame function that the following works, where the input to data-frame is a vector of maps:
(data-frame (mapv #(zipmap [:a :b] %) (partition 2 (range 4))))
;;    a  b
;; 0  0  1
;; 1  2  3
But where the input is a list (here, the lazy sequence returned by map), things seem to break down:
(data-frame (map #(zipmap [:a :b] %) (partition 2 (range 4))))
;; getting a
;; getting b
;; getting a
;; getting b
;;       a     b
;; 0  None  None
;; 1  None  None
Off hand it seems to me that both should work.
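A minimal sketch of the difference between the two inputs, using only core Clojure: mapv returns a vector, while map returns a lazy sequence. Realizing the sequence with vec before passing it to data-frame is a possible workaround, but that is an assumption based on the working vector case above, not something confirmed against the library. The names rows-vec, rows-lazy and rows-realized are made up for illustration.

```clojure
;; Pure Clojure: no panthera calls are executed here.
(def rows-vec  (mapv #(zipmap [:a :b] %) (partition 2 (range 4))))
(def rows-lazy (map  #(zipmap [:a :b] %) (partition 2 (range 4))))

(vector? rows-vec)      ; the mapv result is a vector
(vector? rows-lazy)     ; the map result is not a vector
(seq? rows-lazy)        ; it is a (lazy) sequence instead
(= rows-vec rows-lazy)  ; both hold the same maps in the same order

;; Assumed workaround: realize the lazy seq into a vector first,
;; then call (data-frame rows-realized) as in the working example.
(def rows-realized (vec rows-lazy))
```

If the vector path is the only one that converts maps correctly, wrapping the input in vec keeps calling code unchanged until the lazy case is handled by the library itself.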
Keep bools-or-func, it is much clearer
panthera/src/panthera/pandas/generics.clj
Lines 177 to 181 in a5d13f0
Hi Alan!
Thanks a lot for your nice library!
I was working on tech.ml and libpython integration with pinkgorilla notebook.
This is where I am currently:
https://github.com/pink-gorilla/python-gorilla
https://github.com/pink-gorilla/python-gorilla/blob/master/README.md
I ported a matplotlib renderer (stolen from @gigasquid) (alpha). This is not
relevant to your 3 demo notebooks; it affects the libpythonclj demo notebooks.
I ported your html and vega render functions.
Note I used a dev snapshot version for the notebook dependency; I will switch this to the Clojars
version tomorrow.
https://github.com/pink-gorilla/python-gorilla/blob/master/resources/notebooks/panthera-basic-concepts.cljg
https://github.com/pink-gorilla/python-gorilla/blob/master/resources/notebooks/panthera-intro.cljg
https://github.com/pink-gorilla/python-gorilla/blob/master/resources/notebooks/panthera-objects.cljg
I added the pokemon data.
Pinkgorilla can load public notebook indices via a central database, so my plan
would be to move these notebooks back to your repo when everything works fine,
and then add your GitHub user to the index of public notebooks.
FYI: Pinkgorilla has 3 ways of triggering renderers:
1. ^:R means render as reagent, using already loaded renderers that have the :p/xxx
schema; so typically ^:R [:p/vega ...] or ^:R [:p/phtml ...] or ^:R [:p/text ...]
You can do arbitrary hiccup, so say ^:R [:div [:h1 "pokemon distribution"] [:p/vega ...]]
2. You can implement Renderable for a type. This is needed, say, for Images or other stuff
that does not have a representation in cljs. It is being used for all Clojure core datatypes.
3. You can do ^{:p/render-as :p/vega} so you don't need to wrap the payload in another wrapper;
this is experimental.
On the html output: perhaps we can fine-tune the css for it? Do you know anything about that?
Any other visualizers that would make sense for panthera?
In terms of libpythonclj init: this is very important. I think we will be able to extend the
pinkgorilla secret management, so we can allow custom environments. In the notebook
context I also think we need shutdown routines, so that an old session from another
notebook will not affect the eval in a different notebook.
In terms of tech.ml and libpythonclj: I think I solved the issues we had with the notebook
after chatting with Chris Nuernberger: we now require:
[net.java.dev.jna/jna "5.2.0"]
[org.ow2.asm/asm "7.0"]
These two dependencies have broken core.async and hawk (filesystem change notifications).
For whatever reason libpython only works with these very recent dependencies.
Any other ideas / wishes from your side?
Best Regards
@awb99
panthera/src/panthera/pandas/generics.clj
Lines 167 to 169 in fe81a91