cmmorrow / sci-analysis
An easy to use and powerful python-based data exploration and analysis tool
Home Page: http://sci-analysis.readthedocs.io/en/latest/
License: MIT License
Code coverage is currently 98%. We need to check which lines are still uncovered in order to reach 100%.
As a user, I want to be able to change the color used for graphing, and to choose which statistical tests are run. I would also like to choose which graphs are drawn when comparing variables. For documentation, I would like an example of applying the analyze function. I am not sure what the vertical parameter in the analyze function refers to.
With the release of pandas 0.25, unit tests that check the order of Series with NaN are now failing.
Using the order arg in analyze() doesn't seem to do anything.
The standard error doesn't appear to be factoring into the calculation. Only variance seems to be driving the circle radius.
This is a known issue that will be addressed in 1.4.5.
It would be nice if a pandas Series or numpy array, passed as one of the items in the list given to the highlight argument for scatter plots, were flattened into that list.
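The flattening requested above could look roughly like the following. This is only a sketch: `flatten_highlight` is a hypothetical helper name, not part of sci-analysis, and it assumes that nested Series, arrays, lists, and tuples should all be expanded in place while scalar labels are kept as they are.

```python
import numpy as np
import pandas as pd


def flatten_highlight(values):
    """Expand any Series/array/list/tuple items into one flat list of labels.

    Hypothetical helper for illustration; not part of the sci-analysis API.
    """
    flat = []
    for item in values:
        if isinstance(item, (pd.Series, np.ndarray, list, tuple)):
            # ravel() handles multi-dimensional arrays; tolist() yields
            # plain Python objects instead of numpy scalars.
            flat.extend(np.asarray(item).ravel().tolist())
        else:
            flat.append(item)
    return flat


labels = flatten_highlight(['A', pd.Series(['B', 'C']), np.array(['D'])])
```

The resulting flat list could then be passed as the highlight argument in place of the mixed list.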
analyze(df['One'], name='Column One', title='Distribution from pandas')
/Users/chrismorrow/sci_analysis_qa_env/lib/python3.6/site-packages/matplotlib/axes/_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg.
warnings.warn("The 'normed' kwarg is deprecated, and has been "
If a data point is in the highlighted group and is also individually selected, its label isn't shown on the scatter plot.
Currently, the sci-analysis documentation on readthedocs.io is generated from a single jupyter notebook and converted by sphinx. It would be better to break up the documentation by analysis type on individual notebooks that can be linked together by sphinx. That way, each page size is smaller, and can go into more detail on each analysis type.
Occurs when using the analyze method.
Since several of the unit tests involve generating graphs, and matplotlib is changing rapidly, it would be better to have tests assert that a generated image matches an existing reference image.
Matplotlib currently has a way to do this, but it's meant for testing matplotlib itself. It can serve as a guide for writing the new test function.
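One possible shape for such a test is sketched below, using `compare_images` from `matplotlib.testing.compare`, which returns `None` when two image files match within a tolerance. The helper names `draw_histogram` and `images_match` are hypothetical, and the reference image is generated on the fly here only to keep the sketch self-contained; a real test would presumably compare against a reference image stored in the repository.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.testing.compare import compare_images


def draw_histogram(path, seed=0):
    """Render a small histogram to `path` (stand-in for a sci-analysis graph)."""
    rng = np.random.RandomState(seed)
    fig, ax = plt.subplots(figsize=(4, 3), dpi=80)
    ax.hist(rng.randn(200), bins=20)
    fig.savefig(path)
    plt.close(fig)


def images_match(reference_path, actual_path, tol=0.0):
    """Return True when compare_images reports no difference within `tol`."""
    return compare_images(reference_path, actual_path, tol=tol) is None


with tempfile.TemporaryDirectory() as tmp:
    reference = os.path.join(tmp, "reference.png")
    same = os.path.join(tmp, "same.png")
    different = os.path.join(tmp, "different.png")
    draw_histogram(reference, seed=0)
    draw_histogram(same, seed=0)
    draw_histogram(different, seed=1)
    same_matches = images_match(reference, same)
    different_matches = images_match(reference, different)
```

A small non-zero `tol` may be needed in practice, since rendering can differ slightly between matplotlib versions and platforms.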
import numpy as np
import pandas as pd
from sci_analysis import analyze

df = pd.DataFrame(np.random.randn(100, 2), columns=list('xy'))
df['groups'] = np.random.choice(list('ABC'), len(df)).tolist()
df.at[24, 'groups'] = "D"
analyze(df['x'], df['y'], df['groups'])
This raises MinimumSizeError: length of D is less than the minimum size 1, instead of running linear regression or the Spearman test.
This might be intentional, but it could be useful to still perform the analysis with the size 1 group excluded.
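Until the library handles this itself, one workaround is to drop undersized groups before calling analyze. The sketch below assumes a threshold of two observations per group (`min_size` is an illustrative name, not a sci-analysis parameter) and leaves the analyze call commented out so the snippet runs without sci-analysis installed.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(100, 2), columns=list('xy'))
df['groups'] = np.random.choice(list('ABC'), len(df)).tolist()
df.at[24, 'groups'] = 'D'  # group D ends up with a single observation

min_size = 2  # assumed threshold; adjust to whatever the analysis requires

# Map each row to the size of its group, then keep rows in large-enough groups.
group_sizes = df['groups'].map(df['groups'].value_counts())
filtered = df[group_sizes >= min_size]

# analyze(filtered['x'], filtered['y'], filtered['groups'])
```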
The R^2 value displayed in Linear Regression is actually the correlation coefficient, the r-value returned by the linregress function. It needs to display r-value ** 2.
Not sure where the bug is occurring or how the original Series is able to be mutated. The symptom is that passing a bool dtype pandas Series to analyze() initially works, but the original Series is then converted to int, where True = 2 and False = 3.
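The distinction can be illustrated with `scipy.stats.linregress` directly. The data below is synthetic and only for demonstration; the point is that the returned `rvalue` is the correlation coefficient and must be squared to obtain the coefficient of determination.

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.randn(100)
y = 2.0 * x + rng.randn(100)  # synthetic linear relationship with noise

result = stats.linregress(x, y)
r_value = result.rvalue    # Pearson correlation coefficient, in [-1, 1]
r_squared = r_value ** 2   # coefficient of determination, in [0, 1]
```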
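A regression test for this kind of bug could snapshot the input before the call and compare afterwards. The sketch below uses a hypothetical helper, `assert_not_mutated`, and a deliberately misbehaving stand-in function instead of analyze() so that it runs on its own.

```python
import pandas as pd


def assert_not_mutated(func, series):
    """Call func(series) and raise AssertionError if the Series was changed.

    Hypothetical test helper; not part of sci-analysis.
    """
    snapshot = series.copy(deep=True)
    func(series)
    # Checks values, dtype, and index against the pre-call snapshot.
    pd.testing.assert_series_equal(series, snapshot)


def well_behaved(series):
    # astype returns a new Series and leaves the input untouched.
    return series.astype(int)


def misbehaving(series):
    # Flips the first value in place, simulating an unwanted mutation.
    series.iloc[0] = not series.iloc[0]


flags = pd.Series([True, False, True])
assert_not_mutated(well_behaved, flags)

try:
    assert_not_mutated(misbehaving, flags)
    mutation_detected = False
except AssertionError:
    mutation_detected = True
```

In an actual test, `func` would wrap the call to analyze() with a bool Series.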
import numpy as np
import pandas as pd
import scipy.stats as st
from sci_analysis import analyze

np.random.seed(987654321)
df = pd.DataFrame({'One': st.norm.rvs(0.0, 1, size=60),
                   'Two': st.norm.rvs(0.0, 3, size=60),
                   'Three': st.weibull_max.rvs(1.2, size=60),
                   'Four': st.norm.rvs(0.0, 1, size=60),
                   'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] * 5,
                   'Condition': ['Group A', 'Group B'] * 30})
df
analyze(df['Three'], groups=df['Condition'])