alanmarazzi / panthera

Stars: 187 · Watchers: 187 · Forks: 15 · Size: 424 KB

Data-frames & arrays on Clojure

License: Eclipse Public License 2.0

Clojure 100.00%

array clojure dataframe numpy pandas python

panthera's People

Forkers

cnuernber snehaaganesan daslu qicst23 pink-junkjard clojure-pirates victorinacio ezmiller anthony-khong ozialien randomactsofsoftware learnuidev jrsk23 deadghost nihilee

panthera's Issues

Implementation probably needed for `sort-values`

The following piped transformation shows a gap that we might think about filling in this library:

(defn properties-per-host [df]
  (-> df
      (pt/melt {:id-vars :host_id :value-vars [:listing_id]})
      (pt/groupby :host_id)
      (pt/subset-cols :value)
      (pt/n-unique)
      (pt/data-frame)
      (py. sort_values :by :value :ascending false)
      (pt/rename {:columns {:value :num_unique_listings}})))

To sort the n-unique count on the :value column, it was necessary, as things stand, to first convert the result to a data frame (it was a series after the group-by and aggregation fn), and then to call the sort_values method on the :pyobject.

It would be nice to set things up such that we don't need to do these extra steps.

Remove `clj?` option

panthera/src/panthera/pandas/generics.clj

Lines 33 to 69 in a5d13f0

    
    (defn index
      [df-or-srs & clj?]
      (if clj?
        (vec (py/get-attr df-or-srs "index"))
        (py/get-attr df-or-srs "index")))

    (defn values
      [df-or-srs & clj?]
      (if clj?
        (vec (py/get-attr df-or-srs "values"))
        (py/get-attr df-or-srs "values")))

    (defn dtype
      [df-or-srs]
      (py/get-attr df-or-srs "dtypes"))

    (defn ftype
      [srs]
      (py/get-attr srs "ftypes"))

    (defn shape
      "Returns the shape of the given object. If a
      [[dataframe]] the first value is the count of rows
      and the second one the count of columns. If a
      [[series]] there are no columns.

      ```
      (shape df)
      ;; [800 12]

      (shape sr)
      ;; 800
      ```"
      [df-or-srs & clj?]
      (if clj?
        (vec (py/get-attr df-or-srs "shape"))
        (py/get-attr df-or-srs "shape")))
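One plausible motivation for this issue (an assumption; the issue gives no rationale) is that `& clj?` binds a rest-argument sequence rather than a boolean, so passing `false` still takes the Clojure branch. A minimal sketch in plain Clojure, where `rest-flag` is a hypothetical stand-in with the same argument shape as the functions above:

```clojure
;; Hypothetical stand-in; it mirrors the `[df-or-srs & clj?]` signature only.
(defn rest-flag
  [x & clj?]
  ;; `clj?` is nil when omitted, otherwise a seq holding the extra args.
  (if clj? :clojure :python))

(rest-flag 1)        ;; => :python
(rest-flag 1 true)   ;; => :clojure
(rest-flag 1 false)  ;; => :clojure, because clj? is (false), a truthy seq
```

If the option were kept, a plain positional argument or an options map would avoid this surprise.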

panthera.numpy ns missing

In https://github.com/alanmarazzi/panthera/blob/master/examples/panthera-intro.ipynb, code cell 32 reads:

(require '[panthera.numpy :refer [npy]])

However, no such namespace exists.

Add `mod` to exclusions

panthera/src/panthera/pandas/math.clj

Lines 2 to 3 in a5d13f0

    
    (:refer-clojure
     :exclude [any?])
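As a rough illustration of what the requested change could look like, the sketch below uses a hypothetical namespace `example.math` (not panthera's actual file) and assumes the namespace defines its own `mod`. Excluding the name from `clojure.core` avoids the "already refers to" warning when the var is defined:

```clojure
;; Hypothetical namespace used only for illustration.
(ns example.math
  (:refer-clojure :exclude [any? mod]))

;; With `mod` excluded, this definition does not replace a referred
;; clojure.core var, so no warning is printed at load time.
(defn mod
  [a b]
  (clojure.core/mod a b))

(mod 7 3) ;; => 1
```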

What is tech.parallel.utils?

It's required in panthera.panthera but I don't actually see anything bringing it in or existing within the repo.

I'm getting errors trying to use panthera and I suspect this is the cause.

How do i drop columns?

Sorry for such a basic question, but how do I drop columns?

I've been trying things similar to these, but I get plenty of errors:

(-> dataset (pt/drop (pt/subset-cols :columnKeyWord)))
(-> dataset (pt/drop (pt/subset-cols [1 2 3 4])))

In fact, what I miss is a kind of tutorial mapping the pandas methods to the Clojure syntax. Does such a thing exist?

`data-frame` should work with lists and vectors?

I noticed when playing around with the `data-frame` function that the following works, where the input is a vector of maps:

(data-frame (mapv #(zipmap [:a :b] %) (partition 2 (range 4))))
;;    a  b
;; 0  0  1
;; 1  2  3

But where the input is a list (here, the lazy sequence returned by `map`), things seem to break down:

(data-frame (map #(zipmap [:a :b] %) (partition 2 (range 4))))
;; getting a
;; getting b
;; getting a
;; getting b
;;       a     b
;; 0  None  None
;; 1  None  None

Offhand, it seems to me that both should work.
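Until sequences are handled, one possible workaround (a sketch, not a confirmed fix) is to realize the sequence into a vector before passing it on. The snippet below uses only `clojure.core`; the `data-frame` call is left as a comment because it needs panthera and a Python environment:

```clojure
;; `map` returns a lazy sequence, not a vector.
(def rows-lazy (map #(zipmap [:a :b] %) (partition 2 (range 4))))

;; `vec` realizes it into the same shape that `mapv` produces.
(def rows-vec (vec rows-lazy))

(class rows-lazy)   ;; => clojure.lang.LazySeq
(vector? rows-vec)  ;; => true
rows-vec            ;; => [{:a 0, :b 1} {:a 2, :b 3}]

;; (data-frame rows-vec) ; same input as the working example in the issue
```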

Wrong arg referenced in `filter-rows` body

Keep `bools-or-func`; it is much clearer.

panthera/src/panthera/pandas/generics.clj

Lines 177 to 181 in a5d13f0

    
    (defn filter-rows
      [df-or-srs bools-or-func]
      (if (fn? fltr-or-func)
        (py/get-item df-or-srs (fltr-or-func df-or-srs))
        (py/get-item df-or-srs fltr-or-func)))
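The body above refers to `fltr-or-func`, which is not bound anywhere in the function, so it cannot compile as written. A sketch of the corrected shape follows, using `bools-or-func` throughout. To keep it runnable without Python, `get-item` here is a hypothetical stand-in for `py/get-item` that applies a boolean mask to a Clojure vector; it is an assumption for illustration, not the library's behaviour:

```clojure
;; Hypothetical stand-in for py/get-item: keeps the elements of `coll`
;; whose position in `mask` is truthy. Assumes `coll` holds no nil/false values.
(defn get-item
  [coll mask]
  (vec (keep-indexed (fn [i x] (when (nth mask i) x)) coll)))

;; Same logic as the quoted function, with the parameter name used consistently.
(defn filter-rows
  [df-or-srs bools-or-func]
  (if (fn? bools-or-func)
    (get-item df-or-srs (bools-or-func df-or-srs))
    (get-item df-or-srs bools-or-func)))

(filter-rows [1 2 3 4] [true false true false]) ;; => [1 3]
(filter-rows [1 2 3 4] #(mapv even? %))         ;; => [2 4]
```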

pinkgorilla notebook integration

Hi Alan!
Thanks a lot for your nice library!

I have been working on tech.ml and libpython integration with the pinkgorilla notebook.
This is where I am currently:
https://github.com/pink-gorilla/python-gorilla
https://github.com/pink-gorilla/python-gorilla/blob/master/README.md

I ported a matplotlib renderer (stolen from @gigasquid) (alpha). This is not
relevant to your 3 demo notebooks; it affects the libpythonclj demo notebooks.

I ported your html and vega render functions.

Note: I used a dev snapshot version for the notebook dependency; I will switch this to the Clojars
version tomorrow.

https://github.com/pink-gorilla/python-gorilla/blob/master/resources/notebooks/panthera-basic-concepts.cljg
https://github.com/pink-gorilla/python-gorilla/blob/master/resources/notebooks/panthera-intro.cljg
https://github.com/pink-gorilla/python-gorilla/blob/master/resources/notebooks/panthera-objects.cljg

I added the pokemon data.

Pinkgorilla can load public notebook indices via a central database, so my plan
would be to move these notebooks back to your repo when everything works fine,
and then add your GitHub user to the index of public notebooks.

FYI: Pinkgorilla has 3 ways of triggering renderers:

^:R means render as reagent, using already loaded renderers that have a :p/xxx
schema; so typically ^:R [:p/vega ...] or ^:R [:p/phtml ...] or ^:R [:p/text ...]
You can do arbitrary hiccup, so say ^:R [:div [:h1 "pokemon distribution"] [:p/vega ...]]

You can implement Renderable for a type. This is needed, say, for images or other things
that do not have a representation in cljs. It is being used for all Clojure core datatypes.

You can do ^{:p/render-as :p/vega} so you don't need to wrap the payload in another wrapper;
this is experimental.

On the HTML output: perhaps we can fine-tune the CSS for it? Do you know anything about that?

Any other visualizers that would make sense for panthera?

In terms of libpythonclj init: this is very important. I think we will be able to extend the
pinkgorilla secret management so we can allow custom environments. In the notebook
context I also think we need shutdown routines, so that an old session from another
notebook will not affect the eval in a different notebook.

In terms of tech.ml and libpythonclj: I think I solved the issues we had with the notebook
after chatting with Chris Nuernberger. We now require:
[net.java.dev.jna/jna "5.2.0"]
[org.ow2.asm/asm "7.0"]

These two dependencies have broken core.async and hawk (filesystem change notifications).
For whatever reason, libpython only works with these very recent dependencies.

Any other ideas / wishes from your side?

Best Regards
@awb99

Refactor to support serieses as well and rename as names

panthera/src/panthera/pandas/generics.clj

Lines 167 to 169 in fe81a91

    
    (defn col-names
      [df]
      (py/get-attr df "columns"))

