
autoviz's Introduction

AutoViz: The One-Line Automatic Data Visualization Library


Unlock the power of AutoViz to visualize any dataset, any size, with just a single line of code! Plus, you can now get a quick assessment of your dataset's quality and fix data quality (DQ) issues with the FixDQ() function.


With AutoViz, you can easily and quickly generate insightful visualizations for your data. Whether you're a beginner or an expert in data analysis, AutoViz can help you explore your data and uncover valuable insights. Try it out and see the power of automated visualization for yourself!


Latest

The latest updates about the autoviz library can be found on the Updates page.

Important Announcement

Starting with version 0.1.807, AutoViz includes an important update to Python version management.

  • We're excited to announce we've made significant updates to our `setup.py` script to dynamically manage dependencies based on your Python version. This means that when you install AutoViz, the installation process will automatically select versions of dependencies such as HoloViews, Bokeh, and hvPlot that are best suited to your specific Python environment.

    For Python versions below 3.10, AutoViz will use versions of its dependencies known to be stable and compatible with older Python versions.

    For Python 3.10, the script has been configured to use updated dependencies that address specific fixes and enhancements relevant to this version.

    For Python 3.11 and newer versions, setup.py ensures compatibility by selecting library versions that support the latest Python features and fixes, including critical updates made in HoloViews for Python 3.11 support.

  • What Does This Mean for You?

  • Easier Installation: This approach allows AutoViz to leverage the latest advancements in our dependencies while maintaining robust support for older Python versions. The installation process is seamless: simply run pip install . in the AutoViz directory, and the script takes care of the rest, tailoring the installation to your environment.
  • Tailored Usage: Choose the visualization backend that works best for your environment. AutoViz will handle the rest, importing necessary libraries as needed.
  • Seamless Compatibility: Users on the latest Python versions (like 3.11 and 3.12) can now enjoy a hassle-free AutoViz experience.
  • How to Update?

    Upgrade to the latest version of AutoViz (0.1.801 and higher) from PyPI with pip install --upgrade autoviz. The modular dependency system will be applied automatically.
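If you are unsure which version you have, the sketch below checks the installed AutoViz version against the minimum mentioned above using only the Python standard library (importlib.metadata). The helper names are hypothetical, and the parser assumes plain numeric versions such as 0.1.807.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a dotted version like '0.1.807' into a tuple of ints.

    Assumes purely numeric parts; tags such as 'rc1' or 'dev0' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum="0.1.801"):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_autoviz_version():
    """Return the installed autoviz version string, or None if it is not installed."""
    try:
        return version("autoviz")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_autoviz_version()
    if current is None:
        print("autoviz is not installed; run: pip install --upgrade autoviz")
    elif meets_minimum(current):
        print(f"autoviz {current} includes the modular dependency handling")
    else:
        print(f"autoviz {current} is older than 0.1.801; run: pip install --upgrade autoviz")
```

Comparing tuples of integers (rather than strings) keeps 0.1.9 correctly below 0.1.801.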

    Feedback

    Your feedback is crucial! If you encounter any issues or have suggestions, please let us know through GitHub Issues.

    Thank you for your continued support and happy visualizing!

    Citation

    If you use AutoViz in your research project or paper, please use the following format for citations:

    "Seshadri, Ram (2020). GitHub - AutoViML/AutoViz: Automatically Visualize any dataset, any size with a single line of code. source code: https://github.com/AutoViML/AutoViz"

    Current citations for AutoViz

    Google Scholar

    Motivation

    The motivation behind AutoViz is to provide a more efficient, user-friendly, and automated approach to exploratory data analysis (EDA) through quick and easy data visualization plus data quality assessment. The library is designed to help users understand patterns, trends, and relationships in the data by creating insightful visualizations with minimal effort. AutoViz is particularly useful for beginners in data analysis, as it abstracts away the complexities of various plotting libraries and techniques. For experts, it offers another tool for uncovering insights they may have missed.

    AutoViz is a powerful tool for generating insightful visualizations with minimal effort. Here are some of its key selling points compared to other automated EDA tools:

    1. Ease of use: AutoViz is designed to be user-friendly and accessible to beginners in data analysis, abstracting away the complexities of various plotting libraries
    2. Speed: AutoViz is optimized for speed and can generate multiple insightful plots with just a single line of code
    3. Scalability: AutoViz is designed to work with datasets of any size and can handle large datasets efficiently
    4. Automation: AutoViz automates the visualization process, requiring just a single line of code to generate multiple insightful plots
    5. Customization: AutoViz provides several options for customizing the visualizations, such as changing the chart type, color palette, etc.
    6. Data Quality: AutoViz now provides data quality assessment by default and helps you fix DQ issues with a single line of code using the FixDQ() function
    Installation

    Prerequisites

    Create a new environment, clone AutoViz, and then install the required dependencies:

    From source:

    cd <AutoViz_Destination>
    git clone git@github.com:AutoViML/AutoViz.git
    # or download and unzip https://github.com/AutoViML/AutoViz/archive/master.zip
    conda create -n <your_env_name> python=3.7 anaconda
    conda activate <your_env_name> # on older conda versions: `source activate <your_env_name>` (Linux/macOS) or `activate <your_env_name>` (Windows)
    cd AutoViz

    For Python versions below 3.10, install dependencies as follows:

    pip install -r requirements.txt
    

    For Python 3.10, please use:

    pip install -r requirements-py310.txt
    

    For Python 3.11 and above, it's recommended to use:

    pip install -r requirements-py311.txt
    

    These requirement files ensure that AutoViz works seamlessly with your Python environment by installing compatible versions of libraries like HoloViews, Bokeh, and hvPlot. Please select the requirement file that corresponds to your Python version to enjoy a smooth experience with AutoViz.
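As a minimal sketch of the rule above, the hypothetical helper below picks the requirements file from the running interpreter's version. The file names are taken from this section; the helper itself is not part of AutoViz.

```python
import sys


def requirements_file(version_info=None):
    """Return the requirements file name for a (major, minor) Python version."""
    major, minor = (version_info or sys.version_info)[:2]
    if (major, minor) < (3, 10):
        return "requirements.txt"
    if (major, minor) == (3, 10):
        return "requirements-py310.txt"
    return "requirements-py311.txt"  # Python 3.11 and above


if __name__ == "__main__":
    # Prints the command to run from the AutoViz directory.
    print(f"pip install -r {requirements_file()}")
```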

    Usage

    Discover how to use AutoViz in this Medium article.

    In the AutoViz directory, open a Jupyter Notebook or a Python session in a terminal, and run the following code step by step to instantiate AutoViz_Class:

    from autoviz import AutoViz_Class
    
    filename = "your_file.csv"  # path to a CSV, txt, or JSON file
    AV = AutoViz_Class()
    dft = AV.AutoViz(filename)

    AutoViz accepts either a filename (in CSV, txt, or JSON format) or a pandas DataFrame as input. If you have a large dataset, set the max_rows_analyzed and max_cols_analyzed arguments to speed up visualization: AutoViz will then sample rows and limit the number of columns it analyzes.
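If you prefer to control sampling yourself, a sketch like the one below down-samples a DataFrame with pandas before passing it through dfte. It uses a plain random sample, which is an assumption: it is not the statistically valid sampling that AutoViz applies internally.

```python
import numpy as np
import pandas as pd


def sample_rows(df, max_rows=150_000, seed=42):
    """Return df unchanged if it is small enough, else a random sample of max_rows rows."""
    if len(df) <= max_rows:
        return df
    # Simple random sampling without replacement (not AutoViz's internal method).
    return df.sample(n=max_rows, random_state=seed)


big = pd.DataFrame({"x": np.arange(1_000), "y": np.arange(1_000) * 2.0})
small = sample_rows(big, max_rows=100)
print(small.shape)  # (100, 2)

# With autoviz installed, pass the sample as a DataFrame:
# from autoviz import AutoViz_Class
# dft = AutoViz_Class().AutoViz("", dfte=small)
```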

    AutoViz can also create charts in multiple formats using the chart_format setting:

    • If chart_format='png', 'svg', or 'jpg': Matplotlib charts are plotted inline.
      • Can be saved locally (using verbose=2 setting) or displayed (verbose=1) in Jupyter Notebooks.
      • This is the default behavior for AutoViz.
    • If chart_format='bokeh': Interactive Bokeh charts are plotted in Jupyter Notebooks.
    • If chart_format='server', dashboards will pop up for each kind of chart on your browser.
    • If chart_format='html', interactive Bokeh charts will be created and silently saved as HTML files under the AutoViz_Plots directory (under working folder) or any other directory that you specify using the save_plot_dir setting (during input).
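To summarize the options above, here is a hypothetical pre-flight helper (not part of AutoViz) that maps a chart_format/verbose pair to the documented output and rejects unknown formats before a long run.

```python
STATIC_FORMATS = ("png", "svg", "jpg")
OUTPUT_BY_FORMAT = {
    "png": "Matplotlib charts",
    "svg": "Matplotlib charts",
    "jpg": "Matplotlib charts",
    "bokeh": "interactive Bokeh charts in Jupyter",
    "server": "one browser dashboard per kind of chart",
    "html": "interactive Bokeh charts saved as HTML files under save_plot_dir",
}


def describe_output(chart_format="svg", verbose=1):
    """Summarize the documented output for a chart_format/verbose combination."""
    fmt = chart_format.lower()
    if fmt not in OUTPUT_BY_FORMAT:
        raise ValueError(
            f"unsupported chart_format {chart_format!r}; expected one of {sorted(OUTPUT_BY_FORMAT)}"
        )
    summary = OUTPUT_BY_FORMAT[fmt]
    if fmt in STATIC_FORMATS:
        # For static formats, verbose=2 saves charts instead of displaying them.
        summary += ", saved locally" if verbose == 2 else ", displayed inline"
    return summary


print(describe_output("svg", verbose=2))  # Matplotlib charts, saved locally
```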

    API

    Arguments for AV.AutoViz() method:

    • filename: Use an empty string ("") if there's no associated file and you want to use a DataFrame; in that case, pass the DataFrame through the dfte argument. Otherwise, provide a filename and leave dfte as None. Only one of the two can be used.
    • sep: File separator (comma, semi-colon, tab, or any column-separating value) if you use a filename above.
    • depVar: Target variable in your dataset; set it as an empty string if not applicable.
    • dfte: the pandas DataFrame to plot; leave it as None if using a filename.
    • header: the row number of the header row in your file (0 for the first row, which is the default).
    • verbose: 0 for minimal info and charts, 1 for more info and charts, or 2 for saving charts locally without display.
    • lowess: If True, draws regression (LOWESS) lines for each pair of continuous variables against the target variable. Useful for small datasets; avoid it for large datasets (>100,000 rows).
    • chart_format: 'svg', 'png', 'jpg', 'bokeh', 'server', or 'html' for displaying or saving charts in various formats, depending on the verbose option.
    • max_rows_analyzed: Limit the max number of rows to use for visualization when dealing with very large datasets (millions of rows). A statistically valid sample will be used by autoviz. Default is 150000 rows.
    • max_cols_analyzed: Limit the number of continuous variables to be analyzed. Default is 30 columns.
    • save_plot_dir: Directory for saving plots. Default is None, which saves plots under the current directory in a subfolder named AutoViz_Plots. If the save_plot_dir doesn't exist, it will be created.
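The filename/dfte rule above can be enforced with a small hypothetical wrapper (not part of AutoViz) that builds the keyword arguments for AV.AutoViz(). The AutoViz call itself is left commented so the sketch runs without the library installed.

```python
import pandas as pd


def autoviz_kwargs(filename="", dfte=None, depVar="", **options):
    """Build AV.AutoViz() keyword arguments from exactly one input source."""
    has_file = bool(filename)
    has_frame = dfte is not None
    if has_file == has_frame:  # both given, or neither given
        raise ValueError("pass exactly one of filename or dfte")
    kwargs = {"filename": filename if has_file else "", "dfte": dfte, "depVar": depVar}
    kwargs.update(options)  # e.g. sep, header, verbose, chart_format, save_plot_dir
    return kwargs


df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [5, 4, 3, 2, 1]})
kwargs = autoviz_kwargs(dfte=df, chart_format="svg", verbose=1)

# With autoviz installed, the call would then be:
# from autoviz import AutoViz_Class
# dft = AutoViz_Class().AutoViz(**kwargs)
```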

    Examples

    Here are some examples to help you get started with AutoViz. Full Jupyter notebooks with code samples can be found in the examples folder.

    Example 1: Visualize a CSV file with a target variable

    from autoviz import AutoViz_Class
    AV = AutoViz_Class()
    
    filename = "your_file.csv"
    target_variable = "your_target_variable"
    
    dft = AV.AutoViz(
        filename,
        sep=",",
        depVar=target_variable,
        dfte=None,
        header=0,
        verbose=1,
        lowess=False,
        chart_format="svg",
        max_rows_analyzed=150000,
        max_cols_analyzed=30,
        save_plot_dir=None
    )


    Example 2: Visualize a Pandas DataFrame without a target variable:

    import pandas as pd
    from autoviz import AutoViz_Class
    
    AV = AutoViz_Class()
    
    data = {'col1': [1, 2, 3, 4, 5], 'col2': [5, 4, 3, 2, 1]}
    df = pd.DataFrame(data)
    
    dft = AV.AutoViz(
        "",
        sep=",",
        depVar="",
        dfte=df,
        header=0,
        verbose=1,
        lowess=False,
        chart_format="server",
        max_rows_analyzed=150000,
        max_cols_analyzed=30,
        save_plot_dir=None
    )


    Example 3: Generate interactive Bokeh charts and save them as HTML files in a custom directory

    from autoviz import AutoViz_Class
    AV = AutoViz_Class()
    
    filename = "your_file.csv"
    target_variable = "your_target_variable"
    custom_plot_dir = "your_custom_plot_directory"
    
    dft = AV.AutoViz(
        filename,
        sep=",",
        depVar=target_variable,
        dfte=None,
        header=0,
        verbose=2,
        lowess=False,
        chart_format="bokeh",
        max_rows_analyzed=150000,
        max_cols_analyzed=30,
        save_plot_dir=custom_plot_dir
    )


    These examples should give you an idea of how to use AutoViz with different scenarios and settings. By tailoring the options and settings, you can generate visualizations that best suit your needs, whether you're working with large datasets, interactive charts, or simply exploring the relationships between variables.

    Maintainers

    AutoViz is actively maintained and improved by a team of dedicated developers. If you have any questions, suggestions, or issues, feel free to reach out to the maintainers:

    Contributing

    We welcome contributions from the community! If you're interested in contributing to AutoViz, please follow these steps:

    • Fork the repository on GitHub.
    • Clone your fork and create a new branch for your feature or bugfix.
    • Commit your changes to the new branch, ensuring that you follow coding standards and write appropriate tests.
    • Push your changes to your fork on GitHub.
    • Submit a pull request to the main repository, detailing your changes and referencing any related issues.

    See the contributing file!

    License

    AutoViz is released under the Apache License, Version 2.0. By using AutoViz, you agree to the terms and conditions specified in the license.

    Tips

    Here are some additional tips and reminders to help you make the most of the library:

    • Make sure to regularly upgrade AutoViz to benefit from the latest features, bug fixes, and improvements. You can update it using pip install --upgrade autoviz.
    • AutoViz is highly customizable, so don't hesitate to explore and experiment with various settings, such as chart_format, verbose, and max_rows_analyzed. This will allow you to create visualizations that better suit your specific needs and preferences.
    • Remember to delete the AutoViz_Plots directory (or any custom directory you specified) periodically if you used the verbose=2 option, as it can accumulate a large number of saved charts over time.
    • For further guidance or inspiration, check out the Medium article on AutoViz, as well as other online resources and tutorials.
    • AutoViz will visualize any sized file using a statistically valid sample.
    • Comma is the default separator (sep), but you can change it.
    • AutoViz assumes the first row of the file is the header (header=0), but this can be changed.
    • By leveraging AutoViz's powerful and flexible features, you can streamline your data visualization process and gain valuable insights more efficiently. Happy visualizing!

    DISCLAIMER

    This project is not an official Google project. It is not supported by Google, and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.

    autoviz's People

    Contributors

    autoviml, claretnnamocha, eltphd, emekaborisama, hironroy, jbednar, morenoh149, risenw, rsesha, satishjasthi, yard1


    autoviz's Issues

    incorrect categorical variable assignment

    I have a dataframe with 32 columns of type object and float32.
    One of the float32 columns is treated as categorical by AutoViz.
    How can I exclude that column from being treated as categorical?

    Misplaced graph x/y labels

    Hi Ram,
    I have tried this package and found out a potential bug.
    When I ran AV.AutoViz('', ',', 'target', df), the x and y labels of each graph were swapped (the x label should be on the y axis and vice versa). I tried two datasets and it still happened. Please look into this and see whether this is a bug or I did something wrong. Thanks!
    Jeff

    AutoViz Crashes with the Error

    When I try to apply AutoViz to analyzing the data of one of the Kaggle competitions (namely, https://www.kaggle.com/c/lish-moa/data), it crashes.

    Below is the output I get:

    Imported AutoViz_Class version: 0.0.68. Call using: 
        from autoviz.AutoViz_Class import AutoViz_Class
        AV = AutoViz_Class()
        AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0,
                                lowess=False,chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30)
                
    To remove previous versions, perform 'pip uninstall autoviz'
    Shape of your Data Set: (21948, 876)
    Classifying variables in data set...
        875 Predictors classified...
            This does not include the Target column(s)
        2 variables removed since they were ID or low-information variables
        List of variables removed: ['sig_id', 'cp_type']
    Since Number of Rows in data 21948 exceeds maximum, randomly sampling 2500 rows for EDA...
    872 numeric variables in data exceeds limit, taking top 40 variables
    Number of numeric variables = 872
        Number of variables removed due to high correlation = 227 
        Adding 1 categorical variables to reduced numeric variables  of 645
    Selected No. of variables = 646 
    Finding Important Features...
    Not able to read or load file. Please check your inputs and try again...
    

    My code to reproduce the problem is provided in https://gist.github.com/gvyshnya/7644fd77567051203ad96d95fbc7ef2a

    I run that code on my local machine (not in a Kaggle kernel). The code expects the competition data files to be placed in a data subfolder (relative to the folder containing the Python script).

    Below are the key details about my OS and Python Environment

    • Windows 10
    • Python 3.7 in Anaconda
    • AutoViz_Class version: 0.0.68

    The output of pd.show_versions(as_json=False) on my machine is provided below, just in case:

    INSTALLED VERSIONS
    ------------------
    commit           : 2a7d3326dee660824a8433ffd01065f8ac37f7d6
    python           : 3.7.0.final.0
    python-bits      : 64
    OS               : Windows
    OS-release       : 10
    Version          : 10.0.18362
    machine          : AMD64
    processor        : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
    byteorder        : little
    LC_ALL           : None
    LANG             : None
    LOCALE           : None.None
    
    pandas           : 1.1.2
    numpy            : 1.19.2
    pytz             : 2018.5
    dateutil         : 2.7.3
    pip              : 20.1
    setuptools       : 49.2.0
    Cython           : 0.28.5
    pytest           : 5.3.2
    hypothesis       : None
    sphinx           : 1.7.9
    blosc            : None
    feather          : None
    xlsxwriter       : 1.1.0
    lxml.etree       : 4.2.5
    html5lib         : 1.0.1
    pymysql          : None
    psycopg2         : None
    jinja2           : 2.11.1
    IPython          : 6.5.0
    pandas_datareader: None
    bs4              : 4.6.3
    bottleneck       : 1.2.1
    fsspec           : None
    fastparquet      : None
    gcsfs            : None
    matplotlib       : 3.1.2
    numexpr          : 2.6.8
    odfpy            : None
    openpyxl         : 2.5.6
    pandas_gbq       : 0.12.0
    pyarrow          : None
    pytables         : None
    pyxlsb           : None
    s3fs             : None
    scipy            : 1.4.1
    sqlalchemy       : 1.2.11
    tables           : 3.4.4
    tabulate         : 0.8.2
    xarray           : None
    xlrd             : 1.1.0
    xlwt             : 1.3.0
    numba            : 0.48.0
    

    HTML and Bokeh output fails for some datasets

    AutoViz works well with chart_format='svg', but with 'bokeh' and 'html' it does not work for every dataset. When running, I encounter:

    ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
    3361 return self._engine.get_loc(casted_key)
    3362 except KeyError as err:
    -> 3363 raise KeyError(key) from err
    3364
    3365 if is_scalar(key) and isna(key) and not self.hasnans:

    KeyError: ''

    After searching the internet, it seems the problem is with pandas!
    Sorry, but I do not know how to fix that!
    Kind regards,

    autoviz_test.zip

    JSON file example

    Is there an example of using a JSON file with AutoViz? I seem to be running into errors. It would be good to have a sample file that works.
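One workaround, sketched under the assumption that your JSON holds a list of records, is to load it with pandas and pass the DataFrame through dfte. The sample records below are made up for illustration.

```python
import io

import pandas as pd

SAMPLE_JSON = """[
  {"age": 34, "income": 52000.0, "segment": "A"},
  {"age": 51, "income": 61000.5, "segment": "B"},
  {"age": 27, "income": 38000.0, "segment": "A"}
]"""

# Wrapping the string in StringIO avoids the pandas deprecation of literal JSON strings.
df = pd.read_json(io.StringIO(SAMPLE_JSON))
print(df.shape)  # (3, 3)

# For a JSON Lines file (one object per line), use pd.read_json(path, lines=True).
# With autoviz installed:
# from autoviz import AutoViz_Class
# dft = AutoViz_Class().AutoViz("", dfte=df)
```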

    Not able to read or load file. Please check your inputs and try again...

    pandas ascii encoder does not work for this file. Continuing...
    pandas utf-8 encoder does not work for this file. Continuing...
    pandas iso-8859-1 encoder does not work for this file. Continuing...
    pandas cp1252 encoder does not work for this file. Continuing...
    pandas latin1 encoder does not work for this file. Continuing...
    Not able to read or load file. Please check your inputs and try again...

    Hello everyone, this is my first time using AutoViz. After reading the guide, I tried to read a CSV dataset and I am getting this error. Is there any way to fix this, or should I make any changes to the CSV file before using AutoViz? Thanks in advance.
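A possible workaround, sketched here as a hypothetical helper (not an AutoViz function), is to try a few encodings with pandas yourself and pass the resulting DataFrame via dfte. The encoding list is similar to the one in the log. Because latin1 can decode any byte sequence, if latin1 also fails in this sketch, the cause is a parsing problem (for example, a wrong sep) rather than the encoding.

```python
import io

import pandas as pd

CANDIDATE_ENCODINGS = ("utf-8", "utf-8-sig", "cp1252", "latin1")


def read_with_fallback(source, sep=",", encodings=CANDIDATE_ENCODINGS):
    """Read a CSV from a file path or raw bytes; return (DataFrame, encoding used)."""
    if isinstance(source, (bytes, bytearray)):
        raw = bytes(source)
    else:
        with open(source, "rb") as fh:
            raw = fh.read()
    errors = []
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw), sep=sep, encoding=enc), enc
        except (UnicodeDecodeError, pd.errors.ParserError) as exc:
            errors.append(f"{enc}: {exc}")
    raise ValueError("could not parse the file with any candidate encoding:\n" + "\n".join(errors))


# Usage (with autoviz installed):
# df, used = read_with_fallback("your_file.csv")
# from autoviz import AutoViz_Class
# dft = AutoViz_Class().AutoViz("", dfte=df)
```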

    JupyterLab/Pandas Dataframe/Bokeh leads to: KeyError: "[''] not in index"

    First off- thank you for creating this repository and for the latest Jupyter integration. I'm incredibly excited to use it, thank you for all the hard work!

    Brief Description

    Given .csv file test.csv:

    name,some_string,some_boolean,some_number,some_amt
    Kerry Bullock,RFH63GSB6XC,Yes,7,$92.87
    Anika Stokes,BYU27VYT1LW,No,65,$48.20
    Constance Jensen,KBF13GYN3FV,No,5,$14.28
    Malcolm Alvarez,UUK28QNF8BG,No,90,$27.33
    Clarke Hanson,KKT63JHC7KC,No,9,$28.52
    David Ford,EDC73WSO8PU,No,2,$94.31
    Abbot Combs,RRN89HGS1RT,Yes,71,$89.90
    

    (test.csv)

    When this code is run in JupyterLab:

    import pandas as pd
    from autoviz.AutoViz_Class import AutoViz_Class
    
    AV = AutoViz_Class()
    df = pd.read_csv('test.csv')
    
    AV.AutoViz(
        filename="",
        dfte=df,
        depVar='',
        verbose=0,
        lowess=False,
        chart_format="bokeh",
    )

    One bokeh chart is generated and two stack traces are displayed. The first one says:

    ...
    
    ~/anaconda3/envs/reporting/lib/python3.9/site-packages/autoviz/AutoViz_Holo.py in select_widget(each_cat)
        531                 width_size=15
        532                 #######  This is where you plot the histogram of categorical variable input as each_cat ####
    --> 533                 conti_df = dft[[dep,each_cat]].groupby(each_cat).mean().reset_index()
        534                 row_ticks = dft[dep].unique().tolist()
        535                 color_list = next(colors)
    
    ...
    
    KeyError: "[''] not in index"
    

    AutoViz_holo.py, line 533

    Then a chart is displayed, followed by the second stack trace:

    ...
    
    ~/anaconda3/envs/reporting/lib/python3.9/site-packages/autoviz/AutoViz_Holo.py in AutoViz_Holo(filename, sep, depVar, dfte, header, verbose, lowess, chart_format, max_rows_analyzed, max_cols_analyzed)
        192         ls_objects.append(drawobj42)
        193     else:
    --> 194         drawobj41 = dfin[dep].hvplot(kind='bar', color='r', title='Histogram of Target variable').opts(
        195                         height=height_size,width=width_size,color='lightgreen', xrotation=70)
        196         drawobj42 = dfin[dep].hvplot(kind='kde', color='g', title='KDE Plot of Target variable').opts(
    
    ...
    
    KeyError: ''
    

    AutoViz_holo.py, line 194

    In both cases it looks like the code is expecting dep to not be an empty string, and is failing when trying to use the empty string to select a column in the DataFrame.

    Detail of the expected change(s) in behaviour

    At first glance it looks like some additional checks of dep would help, but it also looks like the cats variable may have an empty string in it which may be the cause of the first stack trace. I'd need to do a deeper dive to get a clearer idea.
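The extra check suggested above could look like the sketch below: a hypothetical guard (not AutoViz's actual code) that returns None when depVar is empty instead of indexing the DataFrame with ''.

```python
import pandas as pd


def category_means(df, dep, cat):
    """Mean of the target per category, or None when there is no usable target."""
    if not dep:  # '' or None means "no target variable"
        return None
    if dep not in df.columns:
        raise KeyError(f"target column {dep!r} not found in DataFrame")
    if not cat or cat not in df.columns:
        raise KeyError(f"categorical column {cat!r} not found in DataFrame")
    return df[[dep, cat]].groupby(cat).mean().reset_index()


# Using columns from test.csv above:
# category_means(df, "", "some_boolean")            -> None
# category_means(df, "some_number", "some_boolean") -> mean of some_number per Yes/No
```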

    Rows limit

    Is there a way to overcome this ? I want autoviz to go through the whole data frame regardless of large number of rows

    AutoViz misidentifies my dependent variable as a categorical variable, which is in fact a continuous variable.


    My dependent variable is the Loneliness scale score, within the range 1 to 4.

    When I run the basic code of Autoviz() below, I do not get any results regarding my dependent variable.
    report_AV = AV.AutoViz('', dfte=data)

    When I run the code with the depVar argument below, I get results that appear to treat my dependent variable as categorical. This makes the results useless for my research.
    report_AV = AV.AutoViz('', dfte=data, depVar='Loneliness')

    Here are some examples that I get from the above code.

    I've checked with the datatypes of my dataframe, and my dependent variable column's datatype is float64.

    Is there any way to solve this issue?

    chart_format="server" is not working!

    Dear,
    I set chart_format="server" and the following new error occurs:

    ~\anaconda3\lib\site-packages\autoviz\AutoViz_Holo.py in draw_scatters_hv(dfin, nums, chart_format, problem_type, dep, classes, lowess, mk_dir, verbose)
    410 if chart_format in ['server', 'bokeh_server', 'bokeh-server']:
    411 #server = pn.serve(hv_all, start=True, show=True)
    --> 412 hv_all.show()
    413 elif chart_format == 'html':
    414 save_html_data(hv_all, chart_format, plot_name, mk_dir)

    AttributeError: 'str' object has no attribute 'show'

    In addition, with HTML output the scatterplot seems unnecessary, because the package already outputs pair_scatters! Furthermore, the scatterplot in the HTML output file is blank; I am not sure whether this is a package error or a problem with my PC.
    autoviz_test_server.zip

    Bokeh Option Unavailable

    Per the documentation, when setting chart_format to "bokeh," an interactive dashboard should be created in the Jupyter Notebook.

    However, I am receiving the following error:

    ValueError: Format 'bokeh' is not supported (supported formats: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff)

    Dependency Installation Versioning

    At a workshop, we encountered a couple of issues with dependency versions. The following versions and procedure allow the demo notebook to run.

    # in a brand new conda env
    conda install jupyter pandas=0.23 matplotlib=3.0.2 seaborn=0.9 xlrd=1.2.0 scikit-learn
    pip install xgboost
    

    maximum recursion depth exceeded in comparison

    I have a CSV file with 40,000 records and I was trying to run AutoViz on this data:
    from autoviz.AutoViz_Class import AutoViz_Class
    AV = AutoViz_Class()
    filename = 'Cleaned_InnerJoinedDataframe.csv'
    df1 = AV.AutoViz(filename)

    But this fails with the following error:
    ~\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\font_manager.py in findfont(self, prop, fontext, directory, fallback_to_default, rebuild_if_missing)
    1236 return self._findfont_cached(
    1237 prop, fontext, directory, fallback_to_default, rebuild_if_missing,
    -> 1238 rc_params)
    1239
    1240 @lru_cache()

    RecursionError: maximum recursion depth exceeded while calling a Python object

    encoding issue

    I am trying to use AutoViz on a large data set with a shape of (1362132, 83)
    THIS READS THE DATA SET

    df = pd.read_csv("./Desktop/mgh_multi_gifts_desc_joined_tb_CSV.csv", error_bad_lines=False, engine='python', sep=",", encoding='cp1252'

    THIS IS MY NEXT STEP

    encoding = "cp1252 error_bad_lines = False engine ='python' sep = ',' target = 'gift_amount' datapath = './Desktop/' filename = 'mgh_multi_gifts_desc_joined_tb_CSV.csv' df = pd.read_csv(datapath+filename,sep=sep,index_col=None, error_bad_lines = error_bad_lines,engine = engine,encoding=encoding)

    WHEN TRYING TO EXECUTE THIS NEXT STEP
    dft = AV.AutoViz(datapath+filename, sep=sep, depVar=target, dfte=None, header=0, verbose=0, lowess=False,chart_format='svg',max_rows_analyzed=1500,max_cols_analyzed=30)

    GETTING THE FOLLOWING MESSAGE

    File encoding decoder utf-8 does not work for this file
    File encoding decoder iso-8859-11 does not work for this file
    File encoding decoder cpl252 does not work for this file
    File encoding decoder latin1 does not work for this file
    None of the decoders work...
    Not able to read or load file. Please check your inputs and try again...

    NOT SURE WHAT TO DO NEXT, ANY HELP WOULD BE MUCH APPRECIATED.
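Since pandas can already read this file with cp1252, one option (an assumption, not a confirmed fix) is to pass the loaded DataFrame through dfte so AutoViz does not re-read and re-decode the file. The sketch also uses on_bad_lines="skip", which replaces error_bad_lines=False in pandas 1.3 and later; load_for_autoviz is a hypothetical helper.

```python
import io

import pandas as pd


def load_for_autoviz(path_or_buffer, encoding="cp1252", sep=","):
    """Read the CSV with pandas, skipping malformed rows.

    on_bad_lines="skip" (pandas >= 1.3) replaces error_bad_lines=False,
    which was removed in pandas 2.0.
    """
    return pd.read_csv(path_or_buffer, sep=sep, encoding=encoding, on_bad_lines="skip")


# Self-contained demo: the middle data row has an extra field and is skipped.
demo = load_for_autoviz(io.StringIO("gift_amount,donor\n10.0,A\n25.5,B,extra\n5.0,C\n"))
print(demo.shape)

# For the file from the issue (requires the file and autoviz to be available):
# df = load_for_autoviz("./Desktop/mgh_multi_gifts_desc_joined_tb_CSV.csv")
# from autoviz.AutoViz_Class import AutoViz_Class
# dft = AutoViz_Class().AutoViz("", sep=",", depVar="gift_amount", dfte=df, header=0,
#                               verbose=0, lowess=False, chart_format="svg",
#                               max_rows_analyzed=1500, max_cols_analyzed=30)
```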

    some variables in data removed automatically

    Hi,
    I gave it an input CSV containing 20 variables, and preprocessing removed all the important columns. May I know the reason?
    Note: the removed columns are fully populated, with no null values.
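To see which columns might look like ID or low-information variables, a rough diagnostic like the one below can help. The heuristic (constant columns, and non-float columns where every value is unique) is an assumption for illustration; it is not AutoViz's documented classification rule.

```python
import pandas as pd


def suspicious_columns(df):
    """Flag columns that a simple heuristic would call constant or ID-like."""
    report = {}
    n_rows = len(df)
    for col in df.columns:
        n_unique = df[col].nunique(dropna=False)
        if n_unique <= 1:
            report[col] = "constant (single value)"
        elif n_unique == n_rows and not pd.api.types.is_float_dtype(df[col]):
            # Every value distinct, e.g. row numbers, keys, or date strings.
            report[col] = "every value unique (ID-like)"
    return report


sample = pd.DataFrame({
    "id": [1, 2, 3],
    "const": ["x", "x", "x"],
    "score": [0.1, 0.2, 0.3],
    "grp": ["a", "b", "a"],
})
print(suspicious_columns(sample))
```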

    No plot visible in local Jupyter

    from autoviz.AutoViz_Class import AutoViz_Class
    %matplotlib
    AV = AutoViz_Class()
    df = AV.AutoViz(filename='',dfte=train,depVar='Species',verbose=1)

    Using matplotlib backend: Qt5Agg
    Shape of your Data Set loaded: (150, 5)
    ############## C L A S S I F Y I N G V A R I A B L E S ####################
    Classifying variables in data set...
    Number of Numeric Columns = 4
    Number of Integer-Categorical Columns = 0
    Number of String-Categorical Columns = 0
    Number of Factor-Categorical Columns = 0
    Number of String-Boolean Columns = 0
    Number of Numeric-Boolean Columns = 0
    Number of Discrete String Columns = 0
    Number of NLP String Columns = 0
    Number of Date Time Columns = 0
    Number of ID Columns = 0
    Number of Columns to Delete = 0
    4 Predictors classified...
    No variables removed since no ID or low-information variables found in data set

    ################ Multi_Classification VISUALIZATION Started #####################
    Data Set Shape: 150 rows, 5 cols
    Data Set columns info:

    • SepalLengthCm: 0 nulls, 35 unique vals, most common: {5.0: 10, 5.1: 9}
    • SepalWidthCm: 0 nulls, 23 unique vals, most common: {3.0: 26, 2.8: 14}
    • PetalLengthCm: 0 nulls, 43 unique vals, most common: {1.5: 14, 1.4: 12}
    • PetalWidthCm: 0 nulls, 22 unique vals, most common: {0.2: 28, 1.3: 13}
    • Species: 0 nulls, 3 unique vals, most common: {'Iris-setosa': 50, 'Iris-versicolor': 50}

    Columns to delete:
    ' []'
    Boolean variables %s
    ' []'
    Categorical variables %s
    ' []'
    Continuous variables %s
    " ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']"
    Discrete string variables %s
    ' []'
    Date and time variables %s
    ' []'
    ID variables %s
    ' []'
    Target variable %s
    ' Species'
    Total Number of Scatter Plots = 10
    No categorical or boolean vars in data set. Hence no pivot plots...
    No categorical or numeric vars in data set. Hence no bar charts.
    Time to run AutoViz = 2 seconds

    ###################### AUTO VISUALIZATION Completed ########################

    but no plot.

    In Kaggle it was working fine:

    https://www.kaggle.com/gauravduttakiit/multi-classification-problem-iris/notebook
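A likely cause, judging from the log line "Using matplotlib backend: Qt5Agg", is that a bare %matplotlib selects a GUI backend, so figures open in separate windows rather than in the notebook. Running %matplotlib inline switches to inline output. The hypothetical helper below checks whether a backend name refers to the inline backend.

```python
def is_inline_backend(backend_name):
    """Return True if the backend name refers to Jupyter's inline backend.

    Newer setups report 'module://matplotlib_inline.backend_inline'; older
    IPython kernels reported 'module://ipykernel.pylab.backend_inline'.
    """
    name = backend_name.lower()
    return name == "inline" or name.endswith("backend_inline")


# In a notebook cell (requires matplotlib):
# import matplotlib
# backend = matplotlib.get_backend()
# if not is_inline_backend(backend):
#     print(f"Active backend is {backend}; run %matplotlib inline to show plots in the notebook")
```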

    exporting the report

    Projects similar to AutoViz include Sweetviz and Pandas Profiling.

    They can export the report as an HTML file.
    Does this library also have this feature?
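For reference, the documentation above says chart_format="html" saves interactive charts as HTML files under save_plot_dir (AutoViz_Plots by default). Assuming that layout, a hypothetical helper like this one collects the saved files so they can be shared together; searching subfolders recursively is a guess about how the files are organized.

```python
from pathlib import Path


def saved_html_charts(plot_dir="AutoViz_Plots"):
    """Return the HTML files found under plot_dir (searched recursively), sorted."""
    root = Path(plot_dir)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.html"))


# After running AutoViz with chart_format="html":
# for path in saved_html_charts():
#     print(path)
```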

    AutoViz is recognizing int numbers as categorical vars, not numerical

    When plotting charts, my team and I saw that numerical variables were being plotted as categorical variables. For example, an "age" column was plotted as categories, as if it were the same type as the "card category" column.
    We dealt with that by converting the "age" column from int64 to float, which seemed to work: it was then treated correctly as a numerical variable.
    We used this dataset: https://www.kaggle.com/sakshigoyal7/credit-card-customers
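The workaround described above can be sketched with pandas as follows; the DataFrame and column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [45, 49, 51, 40],
    "card_category": ["Blue", "Gold", "Blue", "Silver"],
})

# Integer columns that are really continuous measurements.
continuous_ints = ["age"]
df[continuous_ints] = df[continuous_ints].astype("float64")
print(df["age"].dtype)  # float64

# With autoviz installed:
# from autoviz import AutoViz_Class
# dft = AutoViz_Class().AutoViz("", dfte=df)
```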
    

    [bug] problem with time series charts

    Here is a minimal reproducible example with Google Colab:

    1. The date time column is not recognized when the input is a file:
    !pip install autoviz
    from autoviz.AutoViz_Class import AutoViz_Class
    
    import pandas as pd
    
    AV = AutoViz_Class()
    
    df = pd.DataFrame({'time': ['2020-01-15', '2020-02-15', '2020-03-15', '2020-04-15', '2020-05-15'], 'values': [1.0,2.5,3.2,4.2,5.6]})
    df['time'] = pd.to_datetime(df['time'])
    df.to_csv('ts.csv', index=False)
    
    dft = AV.AutoViz("ts.csv", verbose=2)
    
    Shape of your Data Set loaded: (5, 2)
    ############## C L A S S I F Y I N G  V A R I A B L E S  ####################
    Classifying variables in data set...
    Data Set Shape: 5 rows, 2 cols
    Data Set columns info:
    * time: 0 nulls, 5 unique vals, most common: {'2020-05-15': 1, '2020-03-15': 1}
    * values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
    --------------------------------------------------------------------
        Numeric Columns: ['values']
        Integer-Categorical Columns: []
        String-Categorical Columns: []
        Factor-Categorical Columns: []
        String-Boolean Columns: []
        Numeric-Boolean Columns: []
        Discrete String Columns: []
        NLP text Columns: []
        Date Time Columns: []
        ID Columns: ['time']
        Columns that will not be considered in modeling: []
        2 Predictors classified...
            This does not include the Target column(s)
            1 variables removed since they were ID or low-information variables
        List of variables removed: ['time']
    No categorical or numeric vars in data set. Hence no bar charts.
    Time to run AutoViz (in seconds) = 0.562
    
    
    2. When the input is a DataFrame, the date time column is recognized, but no chart is generated:
    dft = AV.AutoViz("", dfte=df, verbose=2)
    Shape of your Data Set loaded: (5, 2)
    ############## C L A S S I F Y I N G  V A R I A B L E S  ####################
    Classifying variables in data set...
    Data Set Shape: 5 rows, 2 cols
    Data Set columns info:
    * time: 0 nulls, 5 unique vals, most common: {Timestamp('2020-05-15 00:00:00'): 1, Timestamp('2020-04-15 00:00:00'): 1}
    * values: 0 nulls, 5 unique vals, most common: {3.2: 1, 5.6: 1}
    --------------------------------------------------------------------
        Numeric Columns: ['values']
        Integer-Categorical Columns: []
        String-Categorical Columns: []
        Factor-Categorical Columns: []
        String-Boolean Columns: []
        Numeric-Boolean Columns: []
        Discrete String Columns: []
        NLP text Columns: []
        Date Time Columns: ['time']
        ID Columns: []
        Columns that will not be considered in modeling: []
        2 Predictors classified...
            This does not include the Target column(s)
            No variables removed since no ID or low-information variables found in data set
    Could not draw Date Vars
    No categorical or numeric vars in data set. Hence no bar charts.
    Time to run AutoViz (in seconds) = 0.408
    
    

    Expected result: chart with date on x-axis, and value on y-axis.
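    The difference between the two cases is consistent with how a CSV round trip works: the file stores dates as plain text, so pd.read_csv returns them as strings unless it is told to parse them. A minimal pandas sketch, assuming one wants to restore the datetime dtype before passing the frame via dfte (an in-memory buffer replaces ts.csv to keep it self-contained):

```python
import io
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2020-01-15", "2020-02-15", "2020-03-15",
                            "2020-04-15", "2020-05-15"]),
    "values": [1.0, 2.5, 3.2, 4.2, 5.6],
})

# Write to an in-memory CSV instead of ts.csv
buf = io.StringIO()
df.to_csv(buf, index=False)

# Without parse_dates the column comes back as plain strings
buf.seek(0)
raw = pd.read_csv(buf)

# With parse_dates the datetime dtype is restored
buf.seek(0)
parsed = pd.read_csv(buf, parse_dates=["time"])

print(raw["time"].dtype, parsed["time"].dtype)
```

    The parsed frame could then be passed as AV.AutoViz("", dfte=parsed, verbose=2); per the report, that path still printed "Could not draw Date Vars", so this only addresses the detection step.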

    `html` format file output

    Most of the other EDA tools support html format file output.
    Do you have any plans to support html format file output function, taking the output path of the file as an argument?
    Thank you.

    Error thrown while running Autoviz in python file

    I have a file called Autoviz.py with the following lines of code

    from autoviz.AutoViz_Class import AutoViz_Class
    
    AV = AutoViz_Class()
    
    filename = "Iris.csv"
    sep = ","
    dft = AV.AutoViz(
        filename,
        sep=",",
        depVar="",
        dfte=None,
        header=0,
        verbose=2,
        lowess=False,
        chart_format="png",
    )
    

    When I run this file from the terminal with python Autoviz.py, I get the following error:

    (saana) E:\on_nine_ai\testing data>python Autoviz.py
    Traceback (most recent call last):
      File "Autoviz.py", line 1, in <module>
        from autoviz.AutoViz_Class import AutoViz_Class
      File "C:\Users\LENOVO\anaconda3\envs\saana\lib\site-packages\autoviz\__init__.py", line 2, in <module>
        from autoviz.AutoViz_Class import AutoViz_Class
      File "C:\Users\LENOVO\anaconda3\envs\saana\lib\site-packages\autoviz\AutoViz_Class.py", line 61, in <module>
        from autoviz.AutoViz_Holo import AutoViz_Holo
      File "C:\Users\LENOVO\anaconda3\envs\saana\lib\site-packages\autoviz\AutoViz_Holo.py", line 31, in <module>
        get_ipython().magic('matplotlib inline')
    NameError: name 'get_ipython' is not defined
    

    This does not happen if I run the same lines of code in a Python notebook. I understand that when verbose is set to 0 or 1, the plots are displayed interactively in notebooks. But when I set verbose to 2 and run a Python file, I expect a folder to be created with all the result images stored inside it. Please help me out with this.
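    The traceback shows AutoViz_Holo.py calling get_ipython() at import time, and that name only exists inside IPython/Jupyter. A minimal sketch of a guard, assuming the goal is to enable the inline backend only in notebooks; enable_inline_plots is a hypothetical helper, not part of AutoViz. It uses run_line_magic because the magic() method seen in the traceback is deprecated in IPython.

```python
def enable_inline_plots():
    """Apply '%matplotlib inline' only when running under IPython/Jupyter.

    Returns True if the magic was applied, False in a plain Python interpreter.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed at all
    shell = get_ipython()
    if shell is None:
        return False  # IPython is installed, but this is a plain `python script.py` run
    shell.run_line_magic("matplotlib", "inline")
    return True


if __name__ == "__main__":
    print("inline plots enabled:", enable_inline_plots())
```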

    Save plots as png images.

    Can we save all visualizations generated by Autoviz as png image files in the current working directory?
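    AutoViz's own chart_format and save_plot_dir parameters (both visible on this page) may already cover this. As a generic alternative that is not AutoViz's API, here is a minimal matplotlib sketch, assuming the charts are ordinary matplotlib figures: it saves every open figure as a PNG. save_open_figures and the file-name prefix are hypothetical.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed, works from a terminal
import matplotlib.pyplot as plt


def save_open_figures(out_dir=".", prefix="autoviz_plot"):
    """Save every open matplotlib figure as a PNG in out_dir and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for num in plt.get_fignums():
        path = os.path.join(out_dir, f"{prefix}_{num}.png")
        plt.figure(num).savefig(path)  # plt.figure(num) returns the existing figure
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Two dummy figures stand in for charts produced by a plotting call
    plt.figure()
    plt.plot([1, 2, 3])
    plt.figure()
    plt.bar(["a", "b"], [3, 5])
    print(save_open_figures())
```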

    Could not draw ...

    I'm trying to run AutoViz on my pandas DataFrame and, oddly, sometimes it works and sometimes it does not, displaying the following:

    Shape of your Data Set loaded: (68, 7)
    ############## C L A S S I F Y I N G  V A R I A B L E S  ####################
    Classifying variables in data set...
        6 Predictors classified...
            No variables removed since no ID or low-information variables found in data set
    
    ################ Multi_Classification VISUALIZATION Started #####################
    Total Number of Scatter Plots = 10
    Could not draw Distribution Plots
    Could not draw Pivot Charts against Dependent Variable
    Time to run AutoViz = 3 seconds 
    
     ###################### AUTO VISUALIZATION Completed ########################
    

    As far as I can tell, nothing changes between the times when it works and when it does not. What can trigger this kind of error?

    DataFrame as input

    Hey, just wondering whether you're considering adding the ability to pass a DataFrame to AutoViz instead of a file.

    I can help by creating a PR for it

    IndexError: list index out of range

    Hi. We have recently integrated AutoViz into PyCaret, and I think I found an edge case that needs to be fixed in AutoViz. The problem only happens when the dataset has only one numeric feature. My guess is that the scatter plot needs at least two variables. The fix will probably involve some kind of exception handling.

    To reproduce the error:

    pip install pycaret
    
    from pycaret.datasets import get_data
    data = get_data('cancer')
    
    from pycaret.classification import *
    s = setup(data, target = 'Class', session_id = 123, silent = True)
    
    eda()
    

    IndexError Traceback (most recent call last)
    in
    ----> 1 eda()

    ~\pycaret\pycaret\classification.py in eda(data, target, display_format, **kwargs)
    2946 None
    2947 """
    -> 2948 return pycaret.internal.tabular.eda(
    2949 data=data, target=target, display_format=display_format, **kwargs
    2950 )

    ~\pycaret\pycaret\internal\tabular.py in eda(data, target, display_format, **kwargs)
    10397
    10398 AV = AutoViz_Class()

    10399 AV.AutoViz(
    10400 filename="", dfte=data, depVar=target, chart_format=display_format, **kwargs
    10401 )

    ~\anaconda3\envs\pycaret-dev\lib\site-packages\autoviz\AutoViz_Class.py in AutoViz(self, filename, sep, depVar, dfte, header, verbose, lowess, chart_format, max_rows_analyzed, max_cols_analyzed, save_plot_dir)
    236 ####################################################################################
    237 if chart_format.lower() in ['bokeh','server','bokeh_server','bokeh-server', 'html']:
    --> 238 dft = AutoViz_Holo(filename, sep, depVar, dfte, header, verbose,
    239 lowess,chart_format,max_rows_analyzed,
    240 max_cols_analyzed, save_plot_dir)

    ~\anaconda3\envs\pycaret-dev\lib\site-packages\autoviz\AutoViz_Holo.py in AutoViz_Holo(filename, sep, depVar, dfte, header, verbose, lowess, chart_format, max_rows_analyzed, max_cols_analyzed, save_plot_dir)
    175 ### You can draw pair scatters only if there are 2 or more numeric variables ####
    176 if len(nums) >= 2:
    --> 177 drawobj2 = draw_pair_scatters_hv(dfin, nums, problem_type, chart_format, dep,
    178 classes, lowess, mk_dir, verbose)
    179 ls_objects.append(drawobj2)

    ~\anaconda3\envs\pycaret-dev\lib\site-packages\autoviz\AutoViz_Holo.py in draw_pair_scatters_hv(dfin, nums, problem_type, chart_format, dep, classes, lowess, mk_dir, verbose)
    521 quantileable = [x for x in nums if len(dft[x].unique()) > 20]
    522
    --> 523 x = pnw.Select(name='X-Axis', value=quantileable[0], options=quantileable)
    524 y = pnw.Select(name='Y-Axis', value=quantileable[1], options=quantileable)
    525 size = pnw.Select(name='Size', value='None', options=['None'] + quantileable)

    IndexError: list index out of range

    The code inside PyCaret that integrates AutoViz is as follows:

        from autoviz.AutoViz_Class import AutoViz_Class
    
        AV = AutoViz_Class()
        AV.AutoViz(
            filename="", dfte=data, depVar=target, chart_format=display_format, **kwargs
        )
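    One plausible reading of the traceback: the len(nums) >= 2 check at line 176 passes, but line 521 then keeps only columns with more than 20 unique values, so quantileable can end up with fewer than two entries and quantileable[1] fails. A minimal sketch of a guard on the filtered list; pick_scatter_axes and its arguments are hypothetical, mirroring the quoted lines rather than AutoViz's actual code.

```python
def pick_scatter_axes(nums, n_unique, min_unique=20):
    """Choose default X/Y columns for a pair-scatter widget.

    nums: list of numeric column names.
    n_unique: mapping of column name -> number of unique values.
    Returns (x, y), or None when fewer than two columns qualify,
    instead of raising IndexError on quantileable[1].
    """
    quantileable = [c for c in nums if n_unique[c] > min_unique]
    if len(quantileable) < 2:
        return None  # caller should skip the pair-scatter chart
    return quantileable[0], quantileable[1]


print(pick_scatter_axes(["radius", "grade"], {"radius": 300, "grade": 4}))
```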
    

    Unable to Hide Plots

    Setting verbose=2 does not stop the plots from being shown, in either a Python script or a Python notebook.

    Data Viz for training data after making the split

    We should explore data after making a train-test split to avoid data leakage.
    How can I supply a DataFrame (training data only) to the AutoViz() function? I tried supplying a DataFrame and leaving filename as an empty string, but it does not give me any charts.

    My Code:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    
    df = pd.read_csv('https://raw.githubusercontent.com/arora123/Data/master/WA_Fn-UseC_-Telco-Customer-Churn.csv')
    
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state =1)
    
    !pip install autoviz
    # To import AutoViz_Class from autoviz-AutoViz_Class
    
    from autoviz.AutoViz_Class import AutoViz_Class 
    
    #To initialize class
    av = AutoViz_Class()
    
    av.AutoViz('', sep=',', depVar='Churn', dfte=pd.DataFrame(x, y), 
               header=1, verbose=1, lowess=False, chart_format='svg',)
    

    Output

    Shape of your Data Set: (7043, 20)
    ############## C L A S S I F Y I N G V A R I A B L E S ####################
    Classifying variables in data set...
    Not able to read or load file. Please check your inputs and try again...
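    In the reported call, pd.DataFrame(x, y) treats y as the row index (the second positional argument of the DataFrame constructor) instead of adding a Churn column, so depVar='Churn' has nothing to refer to. A minimal sketch, assuming AutoViz expects the target as a column of the frame passed via dfte; the data is synthetic, and the split uses pandas alone instead of train_test_split.

```python
import pandas as pd

# Synthetic stand-in for the Telco churn data
df = pd.DataFrame({
    "tenure": list(range(1, 11)),
    "MonthlyCharges": [20.0 + 5 * i for i in range(10)],
    "Churn": ["Yes" if i % 3 == 0 else "No" for i in range(10)],
})

# 80/20 split that keeps the target column inside the training frame
train_df = df.sample(frac=0.8, random_state=1)
test_df = df.drop(train_df.index)

# train_df (features + target together) is what would go to AutoViz, e.g.:
# AV.AutoViz("", depVar="Churn", dfte=train_df, header=0, verbose=1, chart_format="svg")
print(train_df.shape, test_df.shape)
```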

    AutoViz not working with scikit-learn >= 0.24 on large datasets

    Starting from version 0.24, scikit-learn raises an error (instead of a warning) when a random_state is passed to KFold or StratifiedKFold without setting shuffle=True.
    When using AutoViz with a large dataset, the function find_top_features_xgb defines kf = KFold(n_splits=n_splits, random_state=33), which raises a ValueError; the whole auto visualization then terminates with the message Not able to read or load file. Please check your inputs and try again....
    If the intent is to shuffle the data in the KFold, shuffle=True should be added explicitly, because otherwise the data is not shuffled; if the intent is not to shuffle the data, the random_state parameter should be removed.

    A simple dataset to use to reproduce the issue can be found on Kaggle at this URL.
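    The two fixes suggested above can be sketched with scikit-learn directly; the toy array is an assumption, not AutoViz's data. Either constructor is accepted by scikit-learn 0.24 and later.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)

# Option 1: shuffle explicitly, so random_state actually has an effect
kf_shuffled = KFold(n_splits=5, shuffle=True, random_state=33)

# Option 2: no shuffling, so random_state is dropped entirely
kf_ordered = KFold(n_splits=5)

for name, kf in [("shuffled", kf_shuffled), ("ordered", kf_ordered)]:
    test_folds = [test.tolist() for _, test in kf.split(X)]
    print(name, len(test_folds), test_folds[:2])
```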

    Suggested Updates for Wordcloud

    1. Updating Stopwords List

    Currently, stop words are defined as a hard-coded list, and the list is missing a few stop words such as "themselves".

    def return_stop_words():
        STOP_WORDS = ['it', "this", "that", "to", 'its', 'am', 'is', 'are', 'was', 'were', 'a',
                    'an', 'the', 'and', 'or', 'of', 'at', 'by', 'for', 'with', 'about', 'between',
                     'into','above', 'below', 'from', 'up', 'down', 'in', 'out', 'on', 'over',
                      'under', 'again', 'further', 'then', 'once', 'all', 'any', 'both', 'each',
                       'few', 'more', 'most', 'other', 'some', 'such', 'only', 'own', 'same', 'so',
                        'than', 'too', 'very', 's', 't', 'can', 'just', 'd', 'll', 'm', 'o', 're',
                        've', 'y', 'ain', 'ma']
        add_words = ["s", "m",'you', 'not',  'get', 'no', 'via', 'one', 'still', 'us', 'u','hey','hi','oh','jeez',
                    'the', 'a', 'in', 'to', 'of', 'i', 'and', 'is', 'for', 'on', 'it', 'got','aww','awww',
                    'not', 'my', 'that', 'by', 'with', 'are', 'at', 'this', 'from', 'be', 'have', 'was',
                    '', ' ', 'say', 's', 'u', 'ap', 'afp', '...', 'n', '\\']
        stop_words = list(set(STOP_WORDS+add_words))
        return sorted(stop_words)

    Wouldn't it be better to use the NLTK stop words list?

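    from nltk.corpus import stopwords  # run nltk.download('stopwords') once beforehand
    
    langs = ['english']  # languages whose stop words should be included
    stop_words = set()
    for lang in langs:
        stop_words.update(stopwords.words(lang))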

    Copied from: https://gist.github.com/sebleier/554280

    2. Lemmatization before plotting

    I think it would be better to lemmatize the data before plotting, so that words like "reads" and "reading" count as the same word, which would give us a better word cloud.
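    A minimal sketch of lemmatizing before counting word frequencies for a word cloud. The hand-written lemma table is a hypothetical placeholder so the example runs offline; a real pipeline would use a proper lemmatizer such as NLTK's WordNetLemmatizer or spaCy. word_frequencies is also hypothetical.

```python
import re
from collections import Counter

# Hypothetical lemma table for illustration only
LEMMAS = {"reads": "read", "reading": "read", "readers": "reader", "books": "book"}


def word_frequencies(text, stop_words=frozenset(), lemmas=LEMMAS):
    """Count word frequencies, mapping each token to its lemma before counting."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lemmas.get(t, t) for t in tokens if t not in stop_words)


freqs = word_frequencies("She reads books. Reading books is fun.", stop_words={"she", "is"})
print(freqs.most_common(3))
```

    The resulting counts could be fed to a word-cloud generator's frequency-based entry point instead of raw text, so "reads" and "reading" contribute to a single entry.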

    How do we see the output when running a script file from a terminal?

    Hi AutoViML,

    Firstly, congratulations and thanks for this wonderful package.
    This works perfectly with Jupyter notebooks, but how do I use it from an IDE such as Spyder?

    Thanks in advance.
    Mohit

    Image size is too large error. Autoviz creating enormous image sizes

    I tried to use Autoviz on the following dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

    Using the following code to call Autoviz:
    dftc = AV.AutoViz('../input/house-prices-advanced-regression-techniques/train.csv', depVar='SalePrice', verbose=0, chart_format='bokeh')

    It was unable to display the charts and raised the following errors:


    KeyError Traceback (most recent call last)
    /tmp/ipykernel_133/4057966485.py in
    ----> 1 dftc = AV.AutoViz('../input/house-prices-advanced-regression-techniques/train.csv', verbose=0, chart_format='bokeh')

    /opt/conda/lib/python3.7/site-packages/autoviz/AutoViz_Class.py in AutoViz(self, filename, sep, depVar, dfte, header, verbose, lowess, chart_format, max_rows_analyzed, max_cols_analyzed, save_plot_dir)
    238 dft = AutoViz_Holo(filename, sep, depVar, dfte, header, verbose,
    239 lowess,chart_format,max_rows_analyzed,
    --> 240 max_cols_analyzed, save_plot_dir)
    241 else:
    242 dft = self.AutoViz_Main(filename, sep, depVar, dfte, header, verbose,

    /opt/conda/lib/python3.7/site-packages/autoviz/AutoViz_Holo.py in AutoViz_Holo(filename, sep, depVar, dfte, header, verbose, lowess, chart_format, max_rows_analyzed, max_cols_analyzed, save_plot_dir)
    193 ls_objects.append(drawobj6)
    194 if len(date_vars) > 0:
    --> 195 drawobj7 = draw_date_vars_hv(dfin,dep,date_vars, nums, chart_format, problem_type, mk_dir, verbose)
    196 ls_objects.append(drawobj7)
    197 if len(nums) > 0 and len(cats) > 0:

    /opt/conda/lib/python3.7/site-packages/autoviz/AutoViz_Holo.py in draw_date_vars_hv(df, dep, datevars, num_vars, chart_format, modeltype, mk_dir, verbose)
    940 if modeltype == 'Regression' or dep == None or dep == '':
    941 kind = 'line'
    --> 942 hv_plot = dft[num_vars+[dep]].hvplot( height=400, width=600,kind=kind,
    943 title='Time Series Plot of all Numeric variables and Target').opts(legend_position='top_left')
    944 hv_panel = pn.Row(pn.WidgetBox( kind), hv_plot)

    /opt/conda/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
    3462 if is_iterator(key):
    3463 key = list(key)
    -> 3464 indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
    3465
    3466 # take() does not accept boolean indexers

    /opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis)
    1312 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
    1313
    -> 1314 self._validate_read_indexer(keyarr, indexer, axis)
    1315
    1316 if needs_i8_conversion(ax.dtype) or isinstance(

    /opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis)
    1375
    1376 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
    -> 1377 raise KeyError(f"{not_found} not in index")
    1378
    1379

    KeyError: "[''] not in index"

    Error in callback <function install_repl_displayhook.<locals>.post_execute at 0x7f01919304d0> (for post_execute):


    ValueError Traceback (most recent call last)
    /opt/conda/lib/python3.7/site-packages/matplotlib/pyplot.py in post_execute()
    136 def post_execute():
    137 if matplotlib.is_interactive():
    --> 138 draw_all()
    139
    140 try: # IPython >= 2

    /opt/conda/lib/python3.7/site-packages/matplotlib/_pylab_helpers.py in draw_all(cls, force)
    135 for manager in cls.get_all_fig_managers():
    136 if force or manager.canvas.figure.stale:
    --> 137 manager.canvas.draw_idle()
    138
    139

    /opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py in draw_idle(self, *args, **kwargs)
    2058 if not self._is_idle_drawing:
    2059 with self._idle_draw_cntx():
    -> 2060 self.draw(*args, **kwargs)
    2061
    2062 @property

    /opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py in draw(self)
    429 def draw(self):
    430 # docstring inherited
    --> 431 self.renderer = self.get_renderer(cleared=True)
    432 # Acquire a lock on the shared font cache.
    433 with RendererAgg.lock, \

    /opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py in get_renderer(self, cleared)
    445 and getattr(self, "_lastKey", None) == key)
    446 if not reuse_renderer:
    --> 447 self.renderer = RendererAgg(w, h, self.figure.dpi)
    448 self._lastKey = key
    449 elif cleared:

    /opt/conda/lib/python3.7/site-packages/matplotlib/backends/backend_agg.py in __init__(self, width, height, dpi)
    91 self.width = width
    92 self.height = height
    ---> 93 self._renderer = _RendererAgg(int(width), int(height), dpi)
    94 self._filter_renderers = []
    95

    ValueError: Image size of 2000x81750 pixels is too large. It must be less than 2^16 in each direction.
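    The output above contains two separate failures. The KeyError comes from dft[num_vars+[dep]] when dep is an empty string (the call in the traceback omits depVar), and the ValueError comes from a figure 81750 pixels tall, which suggests many subplots stacked into one figure. A minimal sketch of two guards, assuming those readings; all helper names are hypothetical, not AutoViz's code.

```python
MAX_PIXELS = 2 ** 16  # Agg rejects images of this size or larger in either direction


def columns_to_plot(num_vars, dep):
    """Numeric columns plus the target, skipping an empty or missing target name.

    With dep == "", num_vars + [dep] asks pandas for a column named "",
    which raises KeyError: "[''] not in index".
    """
    return list(num_vars) + ([dep] if dep else [])


def rows_per_figure(row_height_in, dpi=100, max_pixels=MAX_PIXELS):
    """Largest number of stacked subplot rows that keeps figure height below max_pixels."""
    max_height_in = (max_pixels - 1) / dpi
    return max(1, int(max_height_in // row_height_in))


def chunk(items, size):
    """Split items into consecutive chunks of at most `size`, one chunk per figure."""
    return [items[i:i + size] for i in range(0, len(items), size)]


cols = columns_to_plot(["LotArea", "GrLivArea"], "")
print(cols, rows_per_figure(5, dpi=100))
```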

    [Minor] AutoViz Crashes on the analysis of a dataset without any significant variables

    If AV is fed a dataset in which it does not find any significant variable to analyze (with respect to the specified target variable), it crashes.

    The code to reproduce the issue is provided in https://gist.github.com/gvyshnya/c53321dbe947cc55fec91ccf6ae07294

    The environment to reproduce the problem is the same as indicated in #26

    The expected behaviour would be to finish the analysis session gracefully, with an informative message to the user and without a crash.

    Installation instructions and sample code not working

    You need from autoviz import ... in the sample code. Preferably, you should give a sample that can simply be copied, pasted, and run, and provide pictures of how the output looks, so that one could evaluate whether to install this instead of the many other plotting libraries.

    The dependencies are extremely heavy. Is it absolutely necessary to install Jupyter? Something inside also depends on sklearn, which was not included in pip deps.

    As for CSV reading; if you are not able to autodetect/guess separators and date formats, do not bother "including" it in your library. It is just two lines of code to first load the data with pandas and then use another library for plotting, and in most cases one needs to do something in between anyway (data preprocessing).

    An ideal plotting library would have an API like this:

    from fictionalplot import Figure  # if possible, keep it to just one simple import
    
    fig = Figure()   # Internally holds graphics context, Qt window, websocket to browser or whatever
    fig.plot(df)  # display the graph and return instantly, try to auto-guess suitable format based on df
    

    If using a Qt window, spawn a new process that does not terminate when the Python program ends, and that is automatically shared by all figures of all running programs (don't block execution of the program like Matplotlib does). If using Notebook/browser, you don't need separate process because browser already does that.

    For true interactive plots (e.g. receive user input on scaling changes to recalculate new data in Python), use async/await to avoid blocking Python from executing while waiting for user input (but stay away from import asyncio which is utter crap -- instead use trio if you must).

    Good luck with your plotting library. We could certainly use some good options (I am not entirely happy with either Matplotlib or Plotly, and everything else is just bad).

    DataFrame

    Dear all,
    I couldn't figure out how to pass a dataframe (instead of a csv file) to AV.AutoViz.
    Could somebody please give me a short hint?

    thanks in advance.

    Normed Histogram plot with negative y value?

    Hi, all the plots I have show negative y values. How should I interpret this?
    [Screenshot, 2021-09-08]

    I think the following code generates the plots.
    sns.distplot(dft.loc[dft[dep]==target_var][each_conti], bins=binsize, ax=ax1,
                 label=target_var, hist=False, kde=True, color=color2)
    legend_flag += 1

    Read CSV file with different encodings

    Hi. I'm trying to use the library with a CSV file that uses "ISO-8859-1" encoding, and the log says:

    pandas ascii encoder does not work for this file. Continuing...
    pandas utf-8 encoder does not work for this file. Continuing...
    pandas iso-8859-1 encoder does not work for this file. Continuing...

    After checking the source code, I found that there is a bug in the AutoViz_Utils.py file:

    [Screenshot of the relevant code in AutoViz_Utils.py]

    Here, a for loop tries different encodings but, as can be seen, the encoding parameter of the pd.read_csv call is always set to None.

    Please, check this, maybe I'm missing something.
    Thanks in advance.
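    A minimal sketch of the fix the report points at, assuming the loop should pass its loop variable as encoding= to pd.read_csv. read_csv_any_encoding is a hypothetical helper, not AutoViz's code; it accepts a file path or raw bytes, and the sample bytes are synthetic.

```python
import io

import pandas as pd


def read_csv_any_encoding(source, encodings=("ascii", "utf-8", "iso-8859-1")):
    """Try each encoding in turn, actually passing it to pd.read_csv.

    source: a file path, or raw bytes (wrapped in a fresh buffer per attempt).
    Returns (DataFrame, encoding_used).
    """
    last_error = None
    for enc in encodings:
        handle = io.BytesIO(source) if isinstance(source, bytes) else source
        try:
            return pd.read_csv(handle, encoding=enc), enc
        except UnicodeDecodeError as err:
            print(f"pandas {enc} encoder does not work for this file. Continuing...")
            last_error = err
    raise ValueError(f"none of the encodings {encodings} could decode the file") from last_error


# Non-ASCII text encoded as ISO-8859-1: invalid as both ASCII and UTF-8
raw_bytes = "name,city\nJosé,Bogotá\n".encode("iso-8859-1")
df, used = read_csv_any_encoding(raw_bytes)
print(used, df.loc[0, "name"])
```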
