xorbits's People

Contributors

aa452948257, aresnow1, bojun-feng, chengjieli28, codingl2k1, dependabot[bot], fengsxy, flying-tom, hank0626, hoarjour, jiayaobo, jiayini1119, lipengsh, luweizheng, matrixji, onesuper, pangyoki, qianduoduo0904, qinxuye, randomy-2, rayji01, shark-21, sighingnow, traderbxy, uranusseven, xprobebot, yibinliu666, yifeis7, zhou1213cn

xorbits's Issues

BUG: K8s starts with confusing warning information

Describe the bug

When an xorbits cluster starts on Kubernetes, it reports a confusing warning: `Readiness probe failed: dial tcp 172.31.32.96:15031: connect: connection refused`.

Consider increasing `initialDelaySeconds` or using a `startupProbe`.
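A minimal sketch of the `startupProbe` option, written as a Python dict that mirrors the container-spec fields of a Kubernetes manifest. Port 15031 is taken from the warning above; that it is the port being probed, and all timing values, are assumptions for illustration rather than xorbits's real deployment settings. While a startup probe is configured, Kubernetes disables the other probes until it succeeds, and a slow service is given up to `periodSeconds * failureThreshold` seconds to come up.

```python
def _tcp_check(port):
    # Fresh dict per probe so the two probes never share nested objects.
    return {"tcpSocket": {"port": port}}


def probe_spec(port=15031, period_seconds=5, failure_threshold=24):
    """Build readiness/startup probe fields for a container spec.

    The defaults are illustrative assumptions, not xorbits settings.
    """
    return {
        # Readiness checks are held off until this probe first succeeds,
        # so no readiness failures are reported while the service boots.
        "startupProbe": {
            **_tcp_check(port),
            "periodSeconds": period_seconds,
            "failureThreshold": failure_threshold,
        },
        "readinessProbe": {**_tcp_check(port), "periodSeconds": period_seconds},
    }


def max_startup_seconds(spec):
    """Upper bound on the boot time tolerated by the startup probe."""
    startup = spec["startupProbe"]
    return startup["periodSeconds"] * startup["failureThreshold"]


spec = probe_spec()
print(max_startup_seconds(spec))  # 120
```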

ENH: execute named variables eagerly in interactive environment

Is your feature request related to a problem? Please describe

In an interactive environment, users tend to do exploratory analysis and there is no so-called "final result". Deferred execution may therefore cause duplicated execution: in the following example, lines 0 and 1 are executed twice, once for line 2 and again for line 3:

[0]: df = pd.DataFrame({"foo": (1, 2, 3), "bar": (4, 5, 6)})
[1]: df["baz"] = df["bar"] + 3
[2]: df.sum(axis=0)
[3]: df.describe()

Describe the solution you'd like

Execute named variables eagerly in interactive environments, so that each assigned result is computed once and reused.
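A toy model of the duplication and of the proposed fix; it is not xorbits's execution engine. Lists stand in for the `bar` column, and `Deferred`, `materialize` and `session` are hypothetical names. In deferred mode every displayed result re-runs the whole chain; in eager mode the named variables are computed at assignment and later results reuse them.

```python
class Deferred:
    """Toy deferred node: `fn` runs only when a result is needed."""

    def __init__(self, fn, *deps):
        self.fn = fn
        self.deps = deps
        self.value = None
        self.materialized = False
        self.runs = 0  # how many times `fn` actually executed

    def execute(self):
        if self.materialized:
            return self.value  # cached result: no re-execution
        self.runs += 1
        return self.fn(*(dep.execute() for dep in self.deps))

    def materialize(self):
        """Execute once and keep the result (the proposed eager mode)."""
        self.value = self.execute()
        self.materialized = True
        return self


def session(eager):
    # [0] df = ...   and   [1] df["baz"] = df["bar"] + 3
    df = Deferred(lambda: [4, 5, 6])
    baz = Deferred(lambda bar: [x + 3 for x in bar], df)
    if eager:
        # Named variables are executed when they are assigned.
        df.materialize()
        baz.materialize()
    # [2] and [3]: two displayed results that both depend on `baz`.
    Deferred(sum, baz).execute()
    Deferred(max, baz).execute()
    return df.runs, baz.runs


print(session(eager=False))  # (2, 2)
print(session(eager=True))   # (1, 1)
```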

ENH: implement plot method for groupby types

Is your feature request related to a problem? Please describe

In pandas, calling `plot` on a groupby object plots each group and returns the axes in a Series indexed by group key:

>>> import pandas as pd
>>> import numpy as np

>>> df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
...                          'foo', 'bar', 'foo', 'foo'],
...                    'B': ['one', 'one', 'two', 'three',
...                          'two', 'two', 'one', 'three'],
...                    'C': np.random.randn(8),
...                    'D': np.random.randn(8)})
>>> df.groupby('A').plot()
A
bar    AxesSubplot(0.125,0.11;0.775x0.77)
foo    AxesSubplot(0.125,0.11;0.775x0.77)
dtype: object
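A minimal sketch of that behavior in plain pandas, assuming xorbits can iterate groups the same way: call a plotting function on every group and collect the results in an object Series keyed by group. `groupby_plot` and its `plot_fn` hook are hypothetical names; the default hook calls each group's `DataFrame.plot`, which needs matplotlib, so the usage below passes `len` to stay runnable without it.

```python
import numpy as np
import pandas as pd


def _default_plot(frame):
    return frame.plot()  # requires matplotlib


def groupby_plot(df, by, plot_fn=None):
    """Call `plot_fn` on every group and collect the return values.

    The result has the same shape as the pandas output above: an
    object-dtype Series whose index is named after the grouping column.
    """
    if plot_fn is None:
        plot_fn = _default_plot
    keys, results = [], []
    for key, group in df.groupby(by):
        keys.append(key)
        results.append(plot_fn(group))
    return pd.Series(results, index=pd.Index(keys, name=by), dtype=object)


df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": np.random.randn(8),
})
print(groupby_plot(df, "A", plot_fn=len))  # rows per group, keyed by "A"
```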

ENH: Make it more friendly when iterating `DataFrame.columns`

I'm thinking of adding a property `own_data` to indicate whether an xorbits object holds its data directly. Objects created from pandas or numpy objects don't need execution to be iterated or printed: we can simply iterate over or print the underlying pandas DataFrame or numpy ndarray.

The following methods can skip execution for an "own-data" entity:

  • items
  • iterrows
  • itertuples
  • __str__
  • __repr__

Since Mars hasn't implemented `__iter__`, neither has xorbits.
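A minimal sketch of the proposed flag on a hypothetical wrapper class (`Entity`, `_resolve` and `executions` are illustrative names, not xorbits APIs). An object built from in-memory data answers `items` and `repr` directly; a deferred one runs its computation first. A plain dict stands in for the pandas DataFrame so the sketch has no dependencies.

```python
class Entity:
    """Hypothetical xorbits-like wrapper illustrating `own_data`."""

    def __init__(self, data=None, build=None):
        if (data is None) == (build is None):
            raise ValueError("pass exactly one of `data` or `build`")
        self._data = data     # eager object, e.g. a pandas DataFrame
        self._build = build   # zero-arg callable standing in for execution
        self.executions = 0

    @property
    def own_data(self):
        # True only for objects created directly from in-memory data.
        return self._build is None

    def _resolve(self):
        if self.own_data:
            return self._data  # no execution needed
        self.executions += 1   # a real implementation would execute here
        return self._build()

    def items(self):
        return self._resolve().items()

    def __repr__(self):
        return repr(self._resolve())


eager = Entity(data={"foo": [1, 2, 3]})
print(list(eager.items()), eager.executions)  # [('foo', [1, 2, 3])] 0
```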

[BUG] Incorrect dependency specifiers for scipy

Describe the bug

I installed xorbits with pip and tried to import xorbits.pandas, which raised the error below:

Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xorbits.pandas as pd 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/__init__.py", line 17, in <module>
    from .core import run
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/core/__init__.py", line 16, in <module>
    from .execution import run
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/core/execution.py", line 17, in <module>
    from .adapter import MarsEntity, mars_execute
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/core/adapter.py", line 26, in <module>
    from .._mars import dataframe as mars_dataframe
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/_mars/dataframe/__init__.py", line 17, in <module>
    from .initializer import DataFrame, Series, Index
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/_mars/dataframe/initializer.py", line 20, in <module>
    from ..tensor import tensor as astensor, stack
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/_mars/tensor/__init__.py", line 297, in <module>
    from . import special
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/_mars/tensor/special/__init__.py", line 18, in <module>
    from .err_fresnel import (
  File "/home/bji/.local/lib/python3.8/site-packages/xorbits/_mars/tensor/special/err_fresnel.py", line 221, in <module>
    @implement_scipy(spspecial.voigt_profile)
AttributeError: module 'scipy.special' has no attribute 'voigt_profile'

I've checked the installed packages in my Python environment. The root cause seems to be that I had previously installed scipy==1.3.3.
After I manually updated scipy to 1.4.1, the import worked, so loading xorbits.pandas seems to require at least scipy 1.4.0.
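The packaging-level fix is a specifier such as `scipy>=1.4.0` in xorbits's install requirements, since `scipy.special.voigt_profile` first appeared in SciPy 1.4.0. As an extra safeguard, an import-time check could replace the `AttributeError` with a clear message. A minimal sketch follows; `check_scipy` and `parse_release` are hypothetical helpers, and the simple tuple parser is used to avoid depending on `packaging`.

```python
import itertools
from importlib.metadata import PackageNotFoundError, version

# scipy.special.voigt_profile first appeared in SciPy 1.4.0.
MIN_SCIPY = (1, 4, 0)


def parse_release(text):
    """Turn '1.3.3' or '1.4.0rc1' into a comparable (major, minor, micro)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        parts.append(int(digits or 0))
    return tuple(parts + [0] * (3 - len(parts)))


def check_scipy(installed=None):
    """Raise ImportError with a clear message if SciPy is too old."""
    if installed is None:
        try:
            installed = version("scipy")
        except PackageNotFoundError:
            raise ImportError("xorbits requires scipy>=1.4.0") from None
    if parse_release(installed) < MIN_SCIPY:
        raise ImportError(
            f"xorbits requires scipy>=1.4.0, but scipy {installed} is installed"
        )
    return installed


print(parse_release("1.3.3") < MIN_SCIPY)  # True
```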

To Reproduce

It can be reproduced with:

pip3 install 'scipy<1.4.0'
pip3 install xorbits
python3 -c 'import xorbits.pandas as pd'

Expected behavior

Additional context

I'd like to submit a PR to fix this later.

BUG: `DataFrameLoc` does not support item assignment

Describe the bug

Item assignment through `.loc` fails for both `Series` and `DataFrame` with a `TypeError`:

import xorbits.pandas as pd
s = pd.Series([1, 2, 3])
s.loc[0] = 111
TypeError                                 Traceback (most recent call last)
Cell In [3], line 1
----> 1 s.loc[0] = 111

TypeError: 'DataFrameLoc' object does not support item assignment
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df.loc[0] = [11, 22]
TypeError                                 Traceback (most recent call last)
Cell In [5], line 1
----> 1 df.loc[0] = [11, 22]

TypeError: 'DataFrameLoc' object does not support item assignment

Other indexers such as `iloc`, `at` and `iat` have the same problem.
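The message is Python's standard error for an object that defines `__getitem__` but not `__setitem__`, which suggests the xorbits indexer wrappers only forward reads. A minimal illustration with hypothetical classes; a real fix would record a deferred set-item operation rather than write into a dict.

```python
class ReadOnlyLoc:
    """Indexer with __getitem__ only, like the failing DataFrameLoc."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]


class Loc(ReadOnlyLoc):
    """Adds __setitem__ so that `obj.loc[key] = value` works."""

    def __setitem__(self, key, value):
        # Illustration only: a plain dict stands in for the wrapped object.
        self._data[key] = value


data = {0: 1, 1: 2, 2: 3}
try:
    ReadOnlyLoc(data)[0] = 111
except TypeError as exc:
    print(exc)  # 'ReadOnlyLoc' object does not support item assignment

Loc(data)[0] = 111
print(data[0])  # 111
```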


BUG: DataFrame.iloc is not properly handled

Describe the bug

`DataFrame.iloc` returns a Mars object (`Series(op=DataFrameIlocGetItem)`) instead of the selected row.

To Reproduce

>>> import numpy as np
>>> import xorbits.pandas as pd
>>> dates = pd.date_range("20130101", periods=6)
>>> df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
>>> df.iloc[3]
Series(op=DataFrameIlocGetItem)

Expected behavior

>>> import numpy as np
>>> import xorbits.pandas as pd
>>> dates = pd.date_range("20130101", periods=6)
>>> df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
>>> df.iloc[3]
A    0.721555
B   -0.706771
C   -1.039575
D    0.271860
Name: 2013-01-04 00:00:00, dtype: float64
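One way to fix this, sketched under assumptions: the xorbits indexer forwards the key to the underlying Mars indexer and wraps any Mars result before returning it, so printing it triggers execution. `RawEntity`, `Wrapped` and `IndexerProxy` are hypothetical stand-ins for the Mars entity, xorbits's `DataRef` and the indexer wrapper.

```python
class RawEntity:
    """Stands in for an unexecuted Mars object."""

    def __init__(self, compute):
        self._compute = compute

    def execute(self):
        return self._compute()

    def __repr__(self):
        return "Series(op=DataFrameIlocGetItem)"


class Wrapped:
    """Stands in for xorbits's DataRef: printing triggers execution."""

    def __init__(self, raw):
        self._raw = raw

    def __repr__(self):
        return repr(self._raw.execute())


class IndexerProxy:
    """Forward indexing to the raw indexer, then wrap Mars results."""

    def __init__(self, raw_indexer):
        self._raw = raw_indexer

    def __getitem__(self, key):
        result = self._raw[key]
        return Wrapped(result) if isinstance(result, RawEntity) else result


rows = [RawEntity(lambda i=i: [i, i * 10]) for i in range(4)]
print(repr(rows[3]))                # Series(op=DataFrameIlocGetItem)
print(repr(IndexerProxy(rows)[3]))  # [3, 30]
```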

ENH: Make progress bar more flexible

Is your feature request related to a problem? Please describe

Currently xorbits uses tqdm to show execution progress. When there are multiple tasks, this leaves too many progress bars in the console, so a more flexible approach should be supported in the future.
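tqdm's own `leave=False` option removes a finished bar from the console, which may already help. Another option is a single aggregated line for all tasks. A minimal dependency-free sketch; `AggregateProgress` is a hypothetical helper, not part of xorbits or tqdm.

```python
import io
import sys


class AggregateProgress:
    """Render one progress line for several concurrent tasks.

    Each task reports a fraction in [0, 1]; the line shows the mean
    fraction plus a count of finished tasks.
    """

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stderr
        self.fractions = {}  # task id -> completed fraction

    def update(self, task_id, fraction):
        self.fractions[task_id] = min(max(fraction, 0.0), 1.0)
        self._render()

    def overall(self):
        if not self.fractions:
            return 0.0
        return sum(self.fractions.values()) / len(self.fractions)

    def _render(self):
        done = sum(1 for f in self.fractions.values() if f >= 1.0)
        # "\r" rewrites the same console line instead of adding a new bar.
        self.stream.write(
            f"\r{self.overall():6.1%} ({done}/{len(self.fractions)} tasks done)"
        )
        self.stream.flush()


buf = io.StringIO()
progress = AggregateProgress(stream=buf)
progress.update("task-1", 1.0)
progress.update("task-2", 0.5)
print(progress.overall())  # 0.75
```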

[BUG] np.sort failed for GPU

Describe the bug

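`np.sort` fails on GPU: the subtask errors with an `AssertionError` raised by `assert xp is cp` in `_sort` (`xorbits/_mars/tensor/base/psrs.py`, line 425).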

To Reproduce

In [10]: %time print(np.sort(np.random.rand(100_000_000, gpu=True)))
  0%|          |   0.00/100 [00:01<?, ?it/s]
2023-01-30 09:27:22,974 xorbits._mars.services.scheduling.worker.execution 2867073 ERROR    Failed to run subtask FvEe9WQxwQ3wEr5zzgnGcQe8 on band gpu-0
Traceback (most recent call last):
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/services/subtask/worker/processor.py", line 203, in _execute_operand
    return execute(ctx, op)
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/core/operand/core.py", line 491, in execute
    result = executor(results, op)
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/tensor/base/psrs.py", line 525, in execute
    res = ctx[op.outputs[0].key] = _sort(a, op, xp)
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/tensor/base/psrs.py", line 425, in _sort
    assert xp is cp
AssertionError
2023-01-30 09:27:22,976 xorbits._mars.services.task.execution.mars.stage 2867073 ERROR    Subtask FvEe9WQxwQ3wEr5zzgnGcQe8 errored
Traceback (most recent call last):
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/services/subtask/worker/processor.py", line 203, in _execute_operand
    return execute(ctx, op)
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/core/operand/core.py", line 491, in execute
    result = executor(results, op)
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/tensor/base/psrs.py", line 525, in execute
    res = ctx[op.outputs[0].key] = _sort(a, op, xp)
  File "/home/xuyeqin/projects/xorbits/python/xorbits/_mars/tensor/base/psrs.py", line 425, in _sort
    assert xp is cp
AssertionError
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.00/100 [00:01<00:00, 73.21it/s]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
File <timed eval>:1

File ~/projects/xorbits/python/xorbits/utils.py:33, in safe_repr_str.<locals>.inn(self, *args, **kwargs)
     31     return getattr(object, f.__name__)(self)
     32 else:
---> 33     return f(self, *args, **kwargs)

File ~/projects/xorbits/python/xorbits/core/data.py:223, in DataRef.__str__(self)
    221     return self.data._mars_entity.op.data.__str__()
    222 else:
--> 223     run(self)
    224     return self.data.__str__()

File ~/projects/xorbits/python/xorbits/core/execution.py:42, in run(obj)
     40 if isinstance(obj, DataRef):
     41     if need_to_execute(obj):
---> 42         mars_execute(_get_mars_entity(obj))
     43 else:
     44     refs_to_execute = [_get_mars_entity(ref) for ref in obj if need_to_execute(ref)]

File ~/projects/xorbits/python/xorbits/_mars/deploy/oscar/session.py:1875, in execute(tileable, session, wait, new_session_kwargs, show_progress, progress_update_interval, *tileables, **kwargs)
   1873     session = get_default_or_create(**(new_session_kwargs or dict()))
   1874 session = _ensure_sync(session)
-> 1875 return session.execute(
   1876     tileable,
   1877     *tileables,
   1878     wait=wait,
   1879     show_progress=show_progress,
   1880     progress_update_interval=progress_update_interval,
   1881     **kwargs,
   1882 )

File ~/projects/xorbits/python/xorbits/_mars/deploy/oscar/session.py:1669, in SyncSession.execute(self, tileable, show_progress, warn_duplicated_execution, *tileables, **kwargs)
   1667 fut = asyncio.run_coroutine_threadsafe(coro, self._loop)
   1668 try:
-> 1669     execution_info: ExecutionInfo = fut.result(
   1670         timeout=self._isolated_session.timeout
   1671     )
   1672 except KeyboardInterrupt:  # pragma: no cover
   1673     logger.warning("Cancelling running task")

File ~/miniconda3/envs/mars/lib/python3.9/concurrent/futures/_base.py:446, in Future.result(self, timeout)
    444     raise CancelledError()
    445 elif self._state == FINISHED:
--> 446     return self.__get_result()
    447 else:
    448     raise TimeoutError()

File ~/miniconda3/envs/mars/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File ~/projects/xorbits/python/xorbits/_mars/deploy/oscar/session.py:1855, in _execute(session, wait, show_progress, progress_update_interval, cancelled, *tileables, **kwargs)
   1852     else:
   1853         # set cancelled to avoid wait task leak
   1854         cancelled.set()
-> 1855     await execution_info
   1856 else:
   1857     return execution_info

File ~/projects/xorbits/python/xorbits/_mars/deploy/oscar/session.py:106, in ExecutionInfo._ensure_future.<locals>.wait()
    105 async def wait():
--> 106     return await self._aio_task

File ~/projects/xorbits/python/xorbits/_mars/deploy/oscar/session.py:954, in _IsolatedSession._run_in_background(self, tileables, task_id, progress, profiling)
    948         logger.warning(
    949             "Profile task %s execution result:\n%s",
    950             task_id,
    951             json.dumps(task_result.profiling, indent=4),
    952         )
    953     if task_result.error:
--> 954         raise task_result.error.with_traceback(task_result.traceback)
    955 if cancelled:
    956     return

File ~/projects/xorbits/python/xorbits/_mars/services/task/supervisor/processor.py:373, in TaskProcessor.run(self)
    371     async with self._executor:
    372         async for stage_args in self._iter_stage_chunk_graph():
--> 373             await self._process_stage_chunk_graph(*stage_args)
    374 except Exception as ex:
    375     self.result.error = ex

File ~/projects/xorbits/python/xorbits/_mars/services/task/supervisor/processor.py:250, in TaskProcessor._process_stage_chunk_graph(self, stage_id, stage_profiler, chunk_graph)
    244 tile_context = await asyncio.to_thread(
    245     self._get_stage_tile_context,
    246     {c for c in chunk_graph.result_chunks if not isinstance(c.op, Fetch)},
    247 )
    249 with Timer() as timer:
--> 250     chunk_to_result = await self._executor.execute_subtask_graph(
    251         stage_id, subtask_graph, chunk_graph, tile_context
    252     )
    253 stage_profiler.set("run", timer.duration)
    255 self._preprocessor.post_chunk_graph_execution()

File ~/projects/xorbits/python/xorbits/_mars/services/task/execution/mars/executor.py:208, in MarsTaskExecutor.execute_subtask_graph(self, stage_id, subtask_graph, chunk_graph, tile_context, context)
    206 curr_tile_progress = self._tile_context.get_all_progress() - prev_progress
    207 self._stage_tile_progresses.append(curr_tile_progress)
--> 208 return await stage_processor.run()

File ~/projects/xorbits/python/xorbits/_mars/services/task/execution/mars/stage.py:231, in TaskStageProcessor.run(self)
    227     if self.subtask_graph.num_shuffles() > 0:
    228         # disable scale-in when shuffle is executing so that we can skip
    229         # store shuffle meta in supervisor.
    230         await self._scheduling_api.disable_autoscale_in()
--> 231     return await self._run()
    232 finally:
    233     if self.subtask_graph.num_shuffles() > 0:

File ~/projects/xorbits/python/xorbits/_mars/services/task/execution/mars/stage.py:251, in TaskStageProcessor._run(self)
    249 if self.error_or_cancelled():
    250     if self.result.error is not None:
--> 251         raise self.result.error.with_traceback(self.result.traceback)
    252     else:
    253         raise asyncio.CancelledError()

File ~/projects/xorbits/python/xorbits/_mars/services/subtask/worker/processor.py:203, in _execute_operand()
    198 @enter_mode(build=False, kernel=True)
    199 def _execute_operand(
    200     self, ctx: Dict[str, Any], op: OperandType
    201 ):  # noqa: R0201  # pylint: disable=no-self-use
    202     try:
--> 203         return execute(ctx, op)
    204     except BaseException as ex:
    205         # wrap exception in execution to avoid side effects
    206         raise ExecutionError(ex).with_traceback(ex.__traceback__) from None

File ~/projects/xorbits/python/xorbits/_mars/core/operand/core.py:491, in execute()
    487 else:
    488     # Cast `UFuncTypeError` to `TypeError` since subclasses of the former is unpickleable.
    489     # The `UFuncTypeError` was introduced by numpy#12593 since v1.17.0.
    490     try:
--> 491         result = executor(results, op)
    492         succeeded = True
    493         if op.stage is not None:

File ~/projects/xorbits/python/xorbits/_mars/tensor/base/psrs.py:525, in execute()
    522 if not op.return_indices:
    523     if op.kind is not None:
    524         # sort
--> 525         res = ctx[op.outputs[0].key] = _sort(a, op, xp)
    526     else:
    527         # do not sort, prepare for sample by `xp.partition`
    528         kth = xp.linspace(
    529             max(w - 1, 0), a.shape[op.axis] - 1, num=n, endpoint=False
    530         ).astype(int)

File ~/projects/xorbits/python/xorbits/_mars/tensor/base/psrs.py:425, in _sort()
    422     return method(axis=axis, kind=kind, order=order)
    423 else:  # pragma: no cover
    424     # cupy does not support structure type
--> 425     assert xp is cp
    426     assert order is not None
    427     method = a.sort if inplace else partial(cp.sort, a)

AssertionError: 
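The failing check sits in the branch whose comment ties it to CuPy. A minimal sketch of dispatching on the library that actually owns the chunk, assuming each chunk is either a NumPy or a CuPy array; `array_module` and `sort_chunk` are hypothetical and this is not the `psrs.py` code. CuPy also ships `cupy.get_array_module` for the same purpose, and the CuPy path below only runs where CuPy is installed.

```python
import numpy as np

try:  # CuPy is optional; without it only the NumPy path runs.
    import cupy as cp
except ImportError:
    cp = None


def array_module(a):
    """Return cupy for CuPy arrays, numpy otherwise."""
    if cp is not None and isinstance(a, cp.ndarray):
        return cp
    return np


def sort_chunk(a, axis=-1, kind=None, order=None):
    """Sort one chunk with the library that owns it.

    Unsupported combinations raise a descriptive error instead of
    failing a bare `assert`.
    """
    xp = array_module(a)
    if xp is np:
        return np.sort(a, axis=axis, kind=kind, order=order)
    if order is not None:
        # CuPy does not support structured dtypes, hence no `order`.
        raise NotImplementedError("sorting by `order` is not supported on GPU")
    return cp.sort(a, axis=axis)


print(sort_chunk(np.array([3, 1, 2])))  # [1 2 3]
```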

[BUG] xorbits.pandas skiprows keyword argument ignored

Describe the bug

Using xorbits.pandas.read_csv() does not honor the skiprows keyword argument.

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Your Python version: 3.8.10
  2. The version of Xorbits you use: 0.1.2
  3. Versions of crucial packages, such as numpy, scipy and pandas: numpy: 1.23.1, scipy: 1.8.1, pandas: 1.4.3
  4. Full stack of the error: see stack_trace.txt
  5. Minimized code to reproduce the error.
import xorbits.pandas as pd
df = pd.read_csv('file.txt',
                 sep=' ',
                 names=('val1', 'val2'),
                 dtype={'val1': 'int', 'val2': 'float'},
                 skiprows=1)

Contents of file.txt:

# This is a comment line
1 2.2 
2 4.4 
3 6.6 
4 8.8 
5 11.0
6 13.2
7 15.4
8 17.6
9 19.8
10 22.0

Expected behavior

A DataFrame should be created from the data without errors, as it is with vanilla pandas.

Additional context

I checked whether the number of fields in the comment line matters, and it doesn't, but the error appears to occur on the second-to-last field. Changing the data separator doesn't affect this behavior either. Switching the import to vanilla pandas makes the code run fine.
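A self-contained version of the reproduction that writes the sample file to a temporary directory, so vanilla pandas and xorbits.pandas can be compared on identical input. As an assumption, it drops the trailing spaces that some lines of file.txt show above. `write_sample` and `read_sample` are hypothetical helpers; pandas' `comment="#"` parameter is another way to skip the header line.

```python
import os
import tempfile

import pandas as pd

# Rows of file.txt; trailing spaces are dropped here (an assumption).
TEXT = """\
# This is a comment line
1 2.2
2 4.4
3 6.6
4 8.8
5 11.0
6 13.2
7 15.4
8 17.6
9 19.8
10 22.0
"""


def write_sample(directory):
    path = os.path.join(directory, "file.txt")
    with open(path, "w") as f:
        f.write(TEXT)
    return path


def read_sample(path, reader=pd.read_csv):
    # Same keyword arguments as the report; pass xorbits.pandas.read_csv
    # as `reader` to compare the two libraries.
    return reader(
        path,
        sep=" ",
        names=("val1", "val2"),
        dtype={"val1": "int", "val2": "float"},
        skiprows=1,
    )


with tempfile.TemporaryDirectory() as tmp:
    df = read_sample(write_sample(tmp))
print(df.shape)  # (10, 2)
```

Because xorbits may defer execution, print or otherwise execute the xorbits result inside the `with` block, before the temporary file is deleted.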

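ENH: Don't trigger execution in exception traceback

Calling `repr()` on an xorbits object while building an exception message triggers execution: in the example below, a new session is created and a progress bar runs just to format the error message.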

In [9]: def raise_repr(arg):
   ...:     raise TypeError(f'Unknown arg {repr(arg)}')
   ...:

In [10]: raise_repr(DataFrame([1,2,3]))
/Users/hekaisheng/Documents/projects/xorbits/python/xorbits/_mars/deploy/oscar/session.py:2049: UserWarning: No existing session found, creating a new local session now.
  warnings.warn(warning_msg)
2022-12-12 15:21:07,638 xorbits._mars.deploy.oscar.local 22244 WARNING  Web service started at http://0.0.0.0:21856
100%|██████████████████████████████████████████████████████████████████████████████| 100.00/100 [00:00<00:00, 288.28it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [10], line 1
----> 1 raise_repr(DataFrame([1,2,3]))

Cell In [9], line 2, in raise_repr(arg)
      1 def raise_repr(arg):
----> 2     raise TypeError(f'Unknown arg {repr(arg)}')

TypeError: Unknown arg    0
0  1
1  2
2  3
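A minimal sketch of one approach, modeled loosely on the `safe_repr_str` wrapper that appears in the `np.sort` traceback earlier on this page (its real condition is not shown there). Here the condition is a hypothetical context flag: while it is set, `__repr__` falls back to `object.__repr__` and nothing executes. `execution_disabled` and `Frame` are illustrative names.

```python
import contextlib
import contextvars
import functools

_NO_EXECUTE = contextvars.ContextVar("no_execute", default=False)


@contextlib.contextmanager
def execution_disabled():
    """Hypothetical switch: reprs inside this block never execute."""
    token = _NO_EXECUTE.set(True)
    try:
        yield
    finally:
        _NO_EXECUTE.reset(token)


def safe_repr_str(f):
    @functools.wraps(f)
    def inner(self, *args, **kwargs):
        if _NO_EXECUTE.get():
            # Fall back to object's default __repr__/__str__: no execution.
            return getattr(object, f.__name__)(self)
        return f(self, *args, **kwargs)
    return inner


class Frame:
    """Toy lazy object whose repr would normally trigger execution."""

    def __init__(self, build):
        self._build = build
        self.executions = 0

    @safe_repr_str
    def __repr__(self):
        self.executions += 1
        return repr(self._build())


frame = Frame(lambda: [1, 2, 3])
try:
    with execution_disabled():
        raise TypeError(f"Unknown arg {frame!r}")
except TypeError as exc:
    print(exc)  # default object repr, no execution
print(frame.executions)  # 0
```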

BUG: DataFrame.at is not properly handled

Describe the bug

`DataFrame.at` returns a Mars tensor (`Tensor <op=DataFrameLocGetItem, shape=()>`) instead of the scalar value.

To Reproduce

>>> import xorbits.pandas as pd
>>> df = pd.DataFrame((1, 2, 3))
>>> df.at[0, 0]
Tensor <op=DataFrameLocGetItem, shape=(), key=f5caa0e174b3142b0aa648f305532703_0>

Expected behavior

>>> import xorbits.pandas as pd
>>> df = pd.DataFrame((1, 2, 3))
>>> df.at[0, 0]
1
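The `shape=()` in the output marks a 0-d result. One possible approach, assuming the tensor has already been executed to a NumPy array, is to unwrap 0-d arrays before returning them; `as_scalar` is a hypothetical helper.

```python
import numpy as np


def as_scalar(result):
    """Return a plain Python scalar for 0-d array results.

    Assumes `result` is already executed; arrays with shape () are
    unwrapped with `.item()`, anything else is returned unchanged.
    """
    if isinstance(result, np.ndarray) and result.shape == ():
        return result.item()
    return result


executed = np.array(1)  # what a shape=() tensor holds after execution
print(repr(executed))             # array(1)
print(repr(as_scalar(executed)))  # 1
```

pandas itself returns a NumPy scalar from `.at`; indexing the 0-d array with `result[()]` instead of calling `.item()` would match that more closely.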

ENH: Support installing third-party libraries when creating cluster on kubernetes

When creating a Kubernetes cluster, xorbits uses the latest xorbits image, which has the required packages installed; users who want third-party packages currently need a customized image. It would be useful to let users specify packages that are installed automatically before the cluster is created, for example: `new_cluster(pip_list=["tensorflow"], conda_list=["numba", "lightgbm"])`.
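A minimal sketch of turning the proposed `pip_list` and `conda_list` arguments into one shell command that a pod could run before starting xorbits. `build_install_command` is hypothetical; conda packages are installed first and pip packages second, the commonly recommended order when mixing the two, and `shlex.join` quotes specifiers such as `numpy>=1.20` safely.

```python
import shlex


def build_install_command(pip_list=None, conda_list=None):
    """Build a shell command installing the requested packages.

    Returns an empty string when nothing needs to be installed.
    """
    steps = []
    if conda_list:
        steps.append(shlex.join(["conda", "install", "-y", *conda_list]))
    if pip_list:
        steps.append(shlex.join(["pip", "install", *pip_list]))
    return " && ".join(steps)


print(build_install_command(pip_list=["tensorflow"],
                            conda_list=["numba", "lightgbm"]))
# conda install -y numba lightgbm && pip install tensorflow
```

The command could be prepended to the container's start command (for example via `/bin/sh -c`), so each pod installs the packages before xorbits starts; how xorbits builds its pod specs is not shown here.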
