Comments (6)
🤖
Thank you for bringing this to our attention. The error message suggests that the test is failing due to an InvalidEncodingError when trying to aggregate the 'temp_diff' column with the 'Avg' function. This could be due to changes in the way Pandas 2.0.0 handles aggregation functions.
The test_graphbase_get_x_y_values test is designed to verify the functionality of the _get_x_y_values method of the GraphBase class. This method is used to extract x and y values from a DataFrame based on the provided encoding. If this test is failing, it means that the _get_x_y_values method is not correctly extracting x and y values from the DataFrame based on the provided encoding, or it is not correctly raising an InvalidEncodingError when it should.
To debug this issue, we need to understand the specific changes in Pandas 2.0.0 that could be causing this behavior. Unfortunately, I couldn't find specific details about these changes within the repository. However, I would recommend checking the release notes or changelog for Pandas 2.0.0 to see if there are any changes related to aggregation functions that could be causing this issue.
Once we have more information about the specific changes in Pandas 2.0.0, we can update the _get_x_y_values method or the test itself to ensure compatibility with the new version of Pandas.
In the meantime, pinning the version of Pandas to <2.0.0 is a good workaround to prevent the CI from failing. I understand that this is not a long-term solution, and I appreciate your patience as we work to resolve this issue.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
from sparkmagic.
🤖
Hello,
Thank you for providing the detailed error logs and context. The test_graphbase_get_x_y_values test in the sparkmagic codebase is designed to test the functionality of the _get_x_y_values method in the GraphBase class. This method is responsible for aggregating data based on the provided aggregation function and column names.
The error you're encountering is due to the upgrade to Pandas 2.0.0. The 'temp_diff' column, which is of integer data type, is being aggregated using the 'Avg' function. However, the error logs indicate an InvalidEncodingError, which suggests that there might be an issue with the data type compatibility with the aggregation function.
In the Pandas 2.0.0 release, there were several changes that could potentially affect this operation. One thing to rule out is missing-value handling. The pandas.NA value, which is used to represent scalar missing values, predates 2.0.0 (it was introduced in pandas 1.0), but it could still affect the aggregation operation if the 'temp_diff' column contains any missing values.
Another change that could potentially affect this operation is in the groupby operation. In the new release, the default of numeric_only changed to False in DataFrameGroupBy methods, and nuisance columns are no longer silently dropped. As a result, an aggregation such as a mean over a non-numeric column could now raise an error instead of skipping that column.
To resolve this issue, you could try the following:
- Check if the 'temp_diff' column contains any missing values. If it does, you might need to handle these missing values before performing the aggregation operation.
- Try using a different aggregation function to see if the issue persists. This could help determine if the issue is specific to the 'Avg' function or if it's a more general issue with the groupby operation.
- If the issue persists, you might need to downgrade to a previous version of Pandas until a fix is available.
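The first two checks could be sketched roughly as follows. The DataFrame here is hypothetical sample data, not the fixture used by the sparkmagic test, and the column names are only borrowed from the error message.

```python
import pandas as pd

# Hypothetical sample data; the real test fixture may differ.
df = pd.DataFrame(
    {
        "date": ["6/1/13", "6/1/13", "6/2/13"],
        "temp_diff": [12, 0, 11],
    }
)

# Check 1: does the column contain missing values, and is it numeric?
missing_count = int(df["temp_diff"].isna().sum())
is_numeric = pd.api.types.is_numeric_dtype(df["temp_diff"])

# Check 2: try several aggregation functions on the same grouped column
# to see whether a failure is specific to the mean.
grouped = df.groupby("date")["temp_diff"]
results = {name: grouped.agg(name) for name in ("mean", "sum", "min", "max")}

print("missing:", missing_count, "numeric:", is_numeric)
print(results["mean"])
```

If every aggregation succeeds on the selected column, the column itself is probably fine and the failure is more likely to come from other columns being included in the aggregation.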
I hope this helps! If you have any further questions or need more assistance, please don't hesitate to ask.
Best regards,
Dosu
@dosu-bot Can you help debug? Here are the error logs
=================================== FAILURES ===================================
________________________ test_graphbase_get_x_y_values _________________________
self = <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fedee223e50>
how = 'mean'
values = array([['str', 'str', 'str', 'str', 'str', 'str']], dtype=object)
ndim = 2, alt = <function GroupBy.mean.<locals>.<lambda> at 0x7fedee290ea0>
def _agg_py_fallback(
    self, how: str, values: ArrayLike, ndim: int, alt: Callable
) -> ArrayLike:
    """
    Fallback to pure-python aggregation if _cython_operation raises
    NotImplementedError.
    """
    # We get here with a) EADtypes and b) object dtype
    assert alt is not None
    if values.ndim == 1:
        # For DataFrameGroupBy we only get here with ExtensionArray
        ser = Series(values, copy=False)
    else:
        # We only get here with values.dtype == object
        # TODO: special case not needed with ArrayManager
        df = DataFrame(values.T)
        # bc we split object blocks in grouped_reduce, we have only 1 col
autovizwidget/autovizwidget/plotlygraphs/graphbase.py:138: InvalidEncodingError
=========================== short test summary info ============================
FAILED autovizwidget/autovizwidget/tests/test_plotlygraphs.py::test_graphbase_get_x_y_values - autovizwidget.widget.invalidencodingerror.InvalidEncodingError: Cannot aggregate column 'temp_diff' with aggregation function 'Avg' because the type cannot be aggregated over.
@dosu-bot Here are changes from Pandas 2.0.0
Removal of prior version deprecations/changes
Removed Int64Index, UInt64Index and Float64Index. See also [here] for more information
Removed deprecated Timestamp.freq, Timestamp.freqstr and argument freq from the [Timestamp] constructor and [Timestamp.fromordinal]
Removed deprecated CategoricalBlock, Block.is_categorical, require datetime64 and timedelta64 values to be wrapped in DatetimeArray or TimedeltaArray before passing to Block.make_block_same_class, require DatetimeTZBlock.values to have the correct ndim when passing to the BlockManager constructor, and removed the “fastpath” keyword from the SingleBlockManager constructor
Removed deprecated global option use_inf_as_null in favor of use_inf_as_na
Removed deprecated module pandas.core.index
Removed deprecated alias pandas.core.tools.datetimes.to_time, import the function directly from pandas.core.tools.times instead
Removed deprecated alias pandas.io.json.json_normalize, import the function directly from pandas.json_normalize instead
Removed deprecated Categorical.to_dense, use np.asarray instead
Removed deprecated Categorical.take_nd
Removed deprecated Categorical.mode, use Series.mode instead
Removed deprecated Categorical.is_dtype_equal and CategoricalIndex.is_dtype_equal
Removed deprecated CategoricalIndex.take_nd
Removed deprecated Index.is_type_compatible
Removed deprecated Index.is_mixed, check index.inferred_type directly instead
Removed deprecated pandas.api.types.is_categorical; use [pandas.api.types.is_categorical_dtype] instead
Removed deprecated Index.asi8
Enforced deprecation changing behavior when passing datetime64[ns] dtype data and timezone-aware dtype to [Series], interpreting the values as wall-times instead of UTC times, matching [DatetimeIndex] behavior
Enforced deprecation changing behavior when applying a numpy ufunc on multiple non-aligned [DataFrame] that will now align the inputs first
Removed deprecated DataFrame._AXIS_NUMBERS, DataFrame._AXIS_NAMES, Series._AXIS_NUMBERS, Series._AXIS_NAMES
Removed deprecated Index.to_native_types, use obj.astype instead
Removed deprecated Series.iteritems, DataFrame.iteritems, use obj.items instead
Removed deprecated DataFrame.lookup
Removed deprecated Series.append, DataFrame.append, use [concat] instead
Removed deprecated Series.iteritems, DataFrame.iteritems and HDFStore.iteritems use obj.items instead
Removed deprecated DatetimeIndex.union_many
Removed deprecated weekofyear and week attributes of DatetimeArray, [DatetimeIndex] and dt accessor in favor of isocalendar.week
Removed deprecated RangeIndex._start, RangeIndex._stop, RangeIndex._step, use start, stop, step instead
Removed deprecated DatetimeIndex.to_perioddelta, Use dtindex - dtindex.to_period.to_timestamp instead
Removed deprecated Styler.hide_index and Styler.hide_columns
Removed deprecated Styler.set_na_rep and Styler.set_precision
Removed deprecated Styler.where
Removed deprecated Styler.render
Removed deprecated argument col_space in [DataFrame.to_latex]
Removed deprecated argument null_color in [Styler.highlight_null]
Removed deprecated argument check_less_precise in testing.assert_frame_equal, testing.assert_extension_array_equal, testing.assert_series_equal, testing.assert_index_equal
Removed deprecated null_counts argument in [DataFrame.info]. Use show_counts instead
Removed deprecated Index.is_monotonic, and Series.is_monotonic; use obj.is_monotonic_increasing instead
Removed deprecated Index.is_all_dates
Enforced deprecation disallowing passing a timezone-aware [Timestamp] and dtype="datetime64[ns]" to [Series] or [DataFrame] constructors
Enforced deprecation disallowing passing a sequence of timezone-aware values and dtype="datetime64[ns]" to [Series] or [DataFrame] constructors
Enforced deprecation disallowing numpy.ma.mrecords.MaskedRecords in the [DataFrame] constructor; pass {name: data[name] for name in data.dtype.names} instead
Enforced deprecation disallowing unit-less “datetime64” dtype in [Series.astype] and [DataFrame.astype]
Enforced deprecation disallowing using .astype to convert a datetime64[ns] [Series], [DataFrame], or [DatetimeIndex] to timezone-aware dtype, use obj.tz_localize or ser.dt.tz_localize instead
Enforced deprecation disallowing using .astype to convert a timezone-aware [Series], [DataFrame], or [DatetimeIndex] to timezone-naive datetime64[ns] dtype, use obj.tz_localize or obj.tz_convert.tz_localize instead
Enforced deprecation disallowing passing non boolean argument to sort in [concat]
Removed Date parser functions parse_date_time, parse_date_fields, parse_all_fields and generic_parser
Removed argument index from the core.arrays.SparseArray constructor
Remove argument squeeze from [DataFrame.groupby] and [Series.groupby]
Removed deprecated apply, apply_index, __call__, onOffset, and isAnchored attributes from DateOffset
Removed keep_tz argument in [DatetimeIndex.to_series]
Remove arguments names and dtype from [Index.copy] and levels and codes from [MultiIndex.copy]
Remove argument inplace from [MultiIndex.set_levels] and [MultiIndex.set_codes]
Removed arguments verbose and encoding from [DataFrame.to_excel] and [Series.to_excel]
Removed argument line_terminator from [DataFrame.to_csv] and [Series.to_csv], use lineterminator instead
Removed argument inplace from [DataFrame.set_axis] and [Series.set_axis], use obj = obj.set_axis instead
Disallow passing positional arguments to [MultiIndex.set_levels] and [MultiIndex.set_codes]
Disallow parsing to Timedelta strings with components with units “Y”, “y”, or “M”, as these do not represent unambiguous durations
Removed MultiIndex.is_lexsorted and MultiIndex.lexsort_depth
Removed argument how from PeriodIndex.astype, use [PeriodIndex.to_timestamp] instead
Removed argument try_cast from [DataFrame.mask], [DataFrame.where], [Series.mask] and [Series.where]
Removed argument tz from [Period.to_timestamp], use obj.to_timestamp.tz_localize instead
Removed argument sort_columns in [DataFrame.plot] and [Series.plot]
Removed argument is_copy from [DataFrame.take] and [Series.take]
Removed argument kind from [Index.get_slice_bound], [Index.slice_indexer] and [Index.slice_locs]
Removed arguments prefix, squeeze, error_bad_lines and warn_bad_lines from [read_csv]
Removed argument squeeze from [read_excel]
Removed argument datetime_is_numeric from [DataFrame.describe] and [Series.describe] as datetime data will always be summarized as numeric data
Disallow passing list key to [Series.xs] and [DataFrame.xs], pass a tuple instead
Disallow subclass-specific keywords in the [Index] constructor
Removed argument inplace from Categorical.remove_unused_categories
Disallow passing non-round floats to [Timestamp] with unit="M" or unit="Y"
Remove keywords convert_float and mangle_dupe_cols from [read_excel]
Remove keyword mangle_dupe_cols from [read_csv] and [read_table]
Removed errors keyword from [DataFrame.where], [Series.where], [DataFrame.mask] and [Series.mask]
Disallow passing non-keyword arguments to [read_excel] except io and sheet_name
Disallow passing non-keyword arguments to [DataFrame.drop] and [Series.drop] except labels
Disallow passing non-keyword arguments to [DataFrame.fillna] and [Series.fillna] except value
Disallow passing non-keyword arguments to StringMethods.split and StringMethods.rsplit except for pat
Disallow passing non-keyword arguments to [DataFrame.set_index] except keys
Disallow passing non-keyword arguments to Resampler.interpolate except method
Disallow passing non-keyword arguments to [DataFrame.reset_index] and [Series.reset_index] except level
Disallow passing non-keyword arguments to [DataFrame.dropna] and [Series.dropna]
Disallow passing non-keyword arguments to ExtensionArray.argsort
Disallow passing non-keyword arguments to Categorical.sort_values
Disallow passing non-keyword arguments to [Index.drop_duplicates] and [Series.drop_duplicates]
Disallow passing non-keyword arguments to [DataFrame.drop_duplicates] except for subset
Disallow passing non-keyword arguments to [DataFrame.sort_index] and [Series.sort_index]
Disallow passing non-keyword arguments to [DataFrame.interpolate] and [Series.interpolate] except for method
Disallow passing non-keyword arguments to [DataFrame.any] and [Series.any]
Disallow passing non-keyword arguments to [Index.set_names] except for names
Disallow passing non-keyword arguments to [Index.join] except for other
Disallow passing non-keyword arguments to [concat] except for objs
Disallow passing non-keyword arguments to [pivot] except for data
Disallow passing non-keyword arguments to [DataFrame.pivot]
Disallow passing non-keyword arguments to [read_html] except for io
Disallow passing non-keyword arguments to [read_json] except for path_or_buf
Disallow passing non-keyword arguments to [read_sas] except for filepath_or_buffer
Disallow passing non-keyword arguments to [read_stata] except for filepath_or_buffer
Disallow passing non-keyword arguments to [read_csv] except filepath_or_buffer
Disallow passing non-keyword arguments to [read_table] except filepath_or_buffer
Disallow passing non-keyword arguments to [read_fwf] except filepath_or_buffer
Disallow passing non-keyword arguments to [read_xml] except for path_or_buffer
Disallow passing non-keyword arguments to [Series.mask] and [DataFrame.mask] except cond and other
Disallow passing non-keyword arguments to [DataFrame.to_stata] except for path
Disallow passing non-keyword arguments to [DataFrame.where] and [Series.where] except for cond and other
Disallow passing non-keyword arguments to [Series.set_axis] and [DataFrame.set_axis] except for labels
Disallow passing non-keyword arguments to [Series.rename_axis] and [DataFrame.rename_axis] except for mapper
Disallow passing non-keyword arguments to [Series.clip] and [DataFrame.clip] except lower and upper
Disallow passing non-keyword arguments to [Series.bfill], [Series.ffill], [DataFrame.bfill] and [DataFrame.ffill]
Disallow passing non-keyword arguments to [DataFrame.replace], [Series.replace] except for to_replace and value
Disallow passing non-keyword arguments to [DataFrame.sort_values] except for by
Disallow passing non-keyword arguments to [Series.sort_values]
Disallow passing non-keyword arguments to [DataFrame.reindex] except for labels
Disallow [Index.reindex] with non-unique [Index] objects
Disallowed constructing [Categorical] with scalar data
Disallowed constructing [CategoricalIndex] without passing data
Removed Rolling.validate, Expanding.validate, and ExponentialMovingWindow.validate
Removed Rolling.win_type returning "freq"
Removed Rolling.is_datetimelike
Removed the level keyword in [DataFrame] and [Series] aggregations; use groupby instead
Removed deprecated Timedelta.delta, Timedelta.is_populated, and Timedelta.freq
Removed deprecated NaT.freq
Removed deprecated Categorical.replace, use [Series.replace] instead
Removed the numeric_only keyword from Categorical.min and Categorical.max in favor of skipna
Changed behavior of [DataFrame.median] and [DataFrame.mean] with numeric_only=None to not exclude datetime-like columns THIS NOTE WILL BE IRRELEVANT ONCE numeric_only=None DEPRECATION IS ENFORCED
Removed is_extension_type in favor of is_extension_array_dtype
Removed ExponentialMovingWindow.vol
Removed Index.get_value and Index.set_value
Removed Series.slice_shift and DataFrame.slice_shift
Remove DataFrameGroupBy.pad and DataFrameGroupBy.backfill
Remove numpy argument from [read_json]
Disallow passing abbreviations for orient in [DataFrame.to_dict]
Disallow partial slicing on a non-monotonic [DatetimeIndex] with keys which are not in Index. This now raises a KeyError
Removed get_offset in favor of to_offset
Removed the warn keyword in [infer_freq]
Removed the include_start and include_end arguments in [DataFrame.between_time] in favor of inclusive
Removed the closed argument in [date_range] and [bdate_range] in favor of inclusive argument
Removed the center keyword in [DataFrame.expanding]
Removed the truediv keyword from [eval]
Removed the method and tolerance arguments in [Index.get_loc]. Use index.get_indexer instead
Removed the pandas.datetime submodule
Removed the pandas.np submodule
Removed pandas.util.testing in favor of pandas.testing
Removed Series.str.__iter__
Removed pandas.SparseArray in favor of [arrays.SparseArray]
Removed pandas.SparseSeries and pandas.SparseDataFrame, including pickle support.
Enforced disallowing passing an integer fill_value to [DataFrame.shift] and Series.shift with datetime64, timedelta64, or period dtypes
Enforced disallowing a string column label into times in [DataFrame.ewm]
Enforced disallowing passing True and False into inclusive in [Series.between] in favor of "both" and "neither" respectively
Enforced disallowing using usecols with out of bounds indices for read_csv with engine="c"
Enforced disallowing the use of **kwargs in [ExcelWriter]; use the keyword argument engine_kwargs instead
Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__
Enforced disallowing missing labels when indexing with a sequence of labels on a level of a [MultiIndex]. This now raises a KeyError
Enforced disallowing setting values with .loc using a positional slice. Use .loc with labels or .iloc with positions instead
Enforced disallowing positional indexing with a float key even if that key is a round number, manually cast to integer instead
Enforced disallowing using a [DataFrame] indexer with .iloc, use .loc instead for automatic alignment
Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods
Enforced disallowing indexing on a [Index] or positional indexing on a [Series] producing multi-dimensional objects e.g. obj[:, None], convert to numpy before indexing instead
Enforced disallowing dict or set objects in suffixes in [merge]
Enforced disallowing [merge] to produce duplicated columns through the suffixes keyword and already existing columns
Enforced disallowing using [merge] or join on a different number of levels
Enforced disallowing value_name argument in [DataFrame.melt] to match an element in the [DataFrame] columns
Enforced disallowing passing showindex into **kwargs in [DataFrame.to_markdown] and [Series.to_markdown] in favor of index
Removed setting Categorical._codes directly
Removed setting Categorical.categories directly
Removed argument inplace from Categorical.add_categories, Categorical.remove_categories, Categorical.set_categories, Categorical.rename_categories, Categorical.reorder_categories, Categorical.set_ordered, Categorical.as_ordered, Categorical.as_unordered
Enforced Rolling.count with min_periods=None to default to the size of the window
Renamed fname to path in [DataFrame.to_parquet], [DataFrame.to_stata] and [DataFrame.to_feather]
Enforced disallowing indexing a [Series] with a single item list with a slice. Either convert the list to tuple, or pass the slice directly instead
Changed behavior indexing on a [DataFrame] with a [DatetimeIndex] index using a string indexer, previously this operated as a slice on rows, now it operates like any other column key; use frame.loc[key] for the old behavior
Enforced the display.max_colwidth option to not accept negative integers
Removed the display.column_space option in favor of df.to_string
Removed the deprecated method mad from pandas classes
Removed the deprecated method tshift from pandas classes
Changed behavior of empty data passed into [Series]; the default dtype will be object instead of float64
Changed the behavior of DatetimeIndex.union, DatetimeIndex.intersection, and DatetimeIndex.symmetric_difference with mismatched timezones to convert to UTC instead of casting to object dtype
Changed the behavior of [to_datetime] with argument “now” with utc=False to match Timestamp
Changed the behavior of indexing on a timezone-aware [DatetimeIndex] with a timezone-naive datetime object or vice-versa; these now behave like any other non-comparable type by raising KeyError
Changed the behavior of [Index.reindex], [Series.reindex], and [DataFrame.reindex] with a datetime64 dtype and a datetime.date object for fill_value; these are no longer considered equivalent to datetime.datetime objects so the reindex casts to object dtype
Changed behavior of SparseArray.astype when given a dtype that is not explicitly SparseDtype, cast to the exact requested dtype rather than silently using a SparseDtype instead
Changed behavior of [Index.ravel] to return a view on the original [Index] instead of a np.ndarray
Changed behavior of [Series.to_frame] and [Index.to_frame] with explicit name=None to use None for the column name instead of the index’s name or default 0
Changed behavior of [concat] with one array of bool-dtype and another of integer dtype, this now returns object dtype instead of integer dtype; explicitly cast the bool object to integer before concatenating to get the old behavior
Changed behavior of [DataFrame] constructor given floating-point data and an integer dtype, when the data cannot be cast losslessly, the floating point dtype is retained, matching [Series] behavior
Changed behavior of [Index] constructor when given a np.ndarray with object-dtype containing numeric entries; this now retains object dtype rather than inferring a numeric dtype, consistent with [Series] behavior
Changed behavior of Index.__and__, Index.__or__ and Index.__xor__ to behave as logical operations instead of aliases for set operations
Changed behavior of [DataFrame] constructor when passed a list whose first element is a [Categorical], this now treats the elements as rows casting to object dtype, consistent with behavior for other types
Changed behavior of [DataFrame] constructor when passed a dtype that the data cannot be cast to; it now raises instead of silently ignoring the dtype
Changed the behavior of [Series] constructor, it will no longer infer a datetime64 or timedelta64 dtype from string entries
Changed behavior of [Timestamp] constructor with a np.datetime64 object and a tz passed to interpret the input as a wall-time as opposed to a UTC time
Changed behavior of [Timestamp.utcfromtimestamp] to return a timezone-aware object satisfying Timestamp.utcfromtimestamp.timestamp == val
Changed behavior of [Index] constructor when passed a SparseArray or SparseDtype to retain that dtype instead of casting to numpy.ndarray
Changed behavior of setitem-like operations on an object with [DatetimeTZDtype] when using a value with a non-matching timezone, the value will be cast to the object’s timezone instead of casting both to object-dtype
Changed behavior of [Index], [Series], [DataFrame] constructors with floating-dtype data and a [DatetimeTZDtype], the data are now interpreted as UTC-times instead of wall-times, consistent with how integer-dtype data are treated
Changed behavior of [Series] and [DataFrame] constructors with integer dtype and floating-point data containing NaN, this now raises IntCastingNaNError
Changed behavior of [Series] and [DataFrame] constructors with an integer dtype and values that are too large to losslessly cast to this dtype, this now raises ValueError
Changed behavior of [Series] and [DataFrame] constructors with an integer dtype and values having either datetime64 or timedelta64 dtypes, this now raises TypeError, use values.view instead
Removed the deprecated base and loffset arguments from [pandas.DataFrame.resample], [pandas.Series.resample] and [pandas.Grouper]. Use offset or origin instead
Changed behavior of [Series.fillna] and [DataFrame.fillna] with timedelta64[ns] dtype and an incompatible fill_value; this now casts to object dtype instead of raising, consistent with the behavior with other dtypes
Change the default argument of regex for [Series.str.replace] from True to False. Additionally, a single character pat with regex=True is now treated as a regular expression instead of a string literal.
Changed behavior of [DataFrame.any] and [DataFrame.all] with bool_only=True; object-dtype columns with all-bool values will no longer be included, manually cast to bool dtype first
Changed behavior of [DataFrame.max], [DataFrame.min], [DataFrame.mean], [DataFrame.median], [DataFrame.skew], [DataFrame.kurt] with axis=None to return a scalar applying the aggregation across both axes
Changed behavior of comparison of a [Timestamp] with a datetime.date object; these now compare as un-equal and raise on inequality comparisons, matching the datetime.datetime behavior
Changed behavior of comparison of NaT with a datetime.date object; these now raise on inequality comparisons
Enforced deprecation of silently dropping columns that raised a TypeError in [Series.transform] and [DataFrame.transform] when used with a list or dictionary
Changed behavior of [DataFrame.apply] with list-like so that any partial failure will raise an error
Changed behaviour of [DataFrame.to_latex] to now use the Styler implementation via [Styler.to_latex]
Changed behavior of Series.__setitem__ with an integer key and a Float64Index when the key is not present in the index; previously we treated the key as positional, now we treat it as a label, consistent with Series.__getitem__ behavior
Removed na_sentinel argument from [factorize], [Index.factorize], and [ExtensionArray.factorize]
Changed behavior of [Series.diff] and [DataFrame.diff] with ExtensionDtype dtypes whose arrays do not implement diff, these now raise TypeError rather than casting to numpy
Enforced deprecation of calling numpy “ufunc”s on [DataFrame] with method="outer"; this now raises NotImplementedError
Enforced deprecation disallowing passing numeric_only=True to [Series] reductions with non-numeric dtype
Changed behavior of [DataFrameGroupBy.apply] and [SeriesGroupBy.apply] so that group_keys is respected even if a transformer is detected
Comparisons between a [DataFrame] and a [Series] where the frame’s columns do not match the series’s index raise ValueError instead of automatically aligning, do left, right = left.align before comparing
Enforced deprecation numeric_only=None in DataFrame reductions that would silently drop columns that raised; numeric_only now defaults to False
Changed default of numeric_only to False in all DataFrame methods with that argument
Changed default of numeric_only to False in [Series.rank]
Enforced deprecation of silently dropping nuisance columns in groupby and resample operations when numeric_only=False
Enforced deprecation of silently dropping nuisance columns in Rolling, Expanding, and ExponentialMovingWindow ops. This will now raise a [errors.DataError]
Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting
Changed default of numeric_only in various DataFrameGroupBy methods; all methods now default to numeric_only=False
Changed default of numeric_only to False in Resampler methods
Using the method [DataFrameGroupBy.transform] with a callable that returns DataFrames will align to the input’s index
When providing a list of columns of length one to [DataFrame.groupby], the keys that are returned by iterating over the resulting DataFrameGroupBy object will now be tuples of length one
Removed deprecated methods ExcelWriter.write_cells, ExcelWriter.save, ExcelWriter.cur_sheet, ExcelWriter.handles, ExcelWriter.path
The [ExcelWriter] attribute book can no longer be set; it is still available to be accessed and mutated
Removed unused *args and **kwargs in Rolling, Expanding, and ExponentialMovingWindow ops
Removed the deprecated argument line_terminator from [DataFrame.to_csv]
Removed the deprecated argument label from [lreshape]
Arguments after expr in [DataFrame.eval] and [DataFrame.query] are keyword-only
Removed Index._get_attributes_dict
Removed Series.__array_wrap__
Changed behavior of [DataFrame.value_counts] to return a [Series] with [MultiIndex] for any list-like but an [Index] for a single label
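Of the changes listed above, the numeric_only and nuisance-column entries for groupby look like the most plausible match for the traceback, which reaches the object-dtype fallback while computing a mean. This is a minimal sketch of that behavior, assuming a hypothetical frame with one string column. It is not the actual sparkmagic fixture or the code in graphbase.py.

```python
import pandas as pd

# Hypothetical data: a string column ("city") next to the numeric one.
df = pd.DataFrame(
    {
        "date": ["6/1/13", "6/1/13", "6/2/13"],
        "city": ["a", "b", "a"],
        "temp_diff": [12, 0, 11],
    }
)

# On pandas >= 2.0 the bare mean is expected to raise a TypeError because
# "city" is no longer silently dropped. Older versions drop it instead.
try:
    df.groupby("date").mean()
    bare_mean_raised = False
except TypeError:
    bare_mean_raised = True

# Option 1: select the column to aggregate before calling mean().
by_column = df.groupby("date")["temp_diff"].mean()

# Option 2: ask pandas explicitly to skip non-numeric columns.
numeric_only = df.groupby("date").mean(numeric_only=True)

print("bare mean raised:", bare_mean_raised)
print(by_column)
print(numeric_only)
```

Selecting the column first tends to be the more portable choice, since it behaves the same way before and after the numeric_only default changed.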
From here #861 (comment), @devstein @ljubon I'm wondering if there's any progress for supporting pandas 2.0 or if an ETA is available for this?
pandas 2.0 has been out for more than a year now.