Comments (9)

orlp commented on May 27, 2024

I'm marking this as an enhancement rather than as a bug since the docs clearly state "Series only support the vertical strategy." for pl.concat.

That said, I personally would be okay with allowing named series on pl.concat for horizontal.

@stinodego What do you think?

mcrumiller commented on May 27, 2024

I personally would be okay with allowing named series on pl.concat for horizontal.

Is a named series a series whose name is not ""? How do we know that the series was not explicitly named this?

I would expect concat to fail if we end up with multiple columns with the same name (i.e. the user attempts to concatenate multiple Series named ""), but otherwise succeed, as in:

>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> s = pl.Series([4, 5, 6])
>>> pl.concat((df, s), how="horizontal")
shape: (3, 2)
┌─────┬─────┐
│ a   ┆     │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘

but:

>>> pl.concat((s, s), how="horizontal")
polars.exceptions.DuplicateError: unable to hstack, column with name "" already exists

orlp commented on May 27, 2024

Is a named series a series whose name is not ""?

Yes.

How do we know that the series was not explicitly named this?

I don't really care to discuss the scenario where someone intentionally names their columns the empty string. The vast majority of unnamed series are due to people simply not naming them, which is not something I'd like to support in concat.

mickvangelderen commented on May 27, 2024

There is Series.to_frame which makes the conversion explicit and allows for naming the series. I'm not sure an implicit conversion is good for the API because in some cases, concatenating a DataFrame with a Series or vice versa may be accidental.

mcrumiller commented on May 27, 2024

@mickvangelderen I'm not sure I follow; Series in principle should be interchangeable with eager DataFrame columns, and hstacking a (named) Series onto an existing df makes perfect logical sense:

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
s = pl.Series("c", [7, 8, 9])

# currently, we must call df.hstack(s.to_frame()); the proposal is for the line below to work as-is
print(df.hstack(s))
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 7   │
│ 2   ┆ 5   ┆ 8   │
│ 3   ┆ 6   ┆ 9   │
└─────┴─────┴─────┘

mcrumiller commented on May 27, 2024

I'm fine closing this if we want to strictly allow only DataFrames in concat operations, as calling to_frame() is fairly easy. But it's something pandas supports, and it feels logical since there is no ambiguity in what should happen.

mickvangelderen commented on May 27, 2024

It is not immediately clear to me what "Combine multiple DataFrames, LazyFrames, or Series into a single object." means exactly in the concat docs. The type of items is items: Iterable[PolarsType], where PolarsType = TypeVar("PolarsType", "DataFrame", "LazyFrame", "Series", "Expr"). Does that mean that each item in the iterable has to be of the same concrete type? That would mean that you can concat a DataFrame with a DataFrame, and a Series with a Series, but not necessarily a DataFrame with a Series.

stinodego commented on May 27, 2024

I don't see why we couldn't support horizontal concatenation of Series. The user must make sure the Series names are unique, otherwise we raise an error.

mcrumiller commented on May 27, 2024

@mickvangelderen I agree that it's ambiguous; we should rework the wording, although that will depend on the decision made in this issue.

I am not sure about mixing eager and lazy frames. We do allow mixing LazyFrames with Series, since the Series is simply treated as an input to the lazy query plan:

>>> pl.LazyFrame().with_columns(
...     pl.Series("a", [1, 2, 3])
... ).collect()
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘

However, join, concat, hstack, etc. do not work with eager and lazy frame combinations. I feel that allowing joint operations on lazy and eager dataframes opens a Pandora's box that we should leave closed.
