Comments (5)
import databricks.koalas as ks
from pyspark import SparkConf
from pyspark.sql import SparkSession

if __name__ == '__main__':
    conf = SparkConf().setAppName("test")
    spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
    sdf = spark.sql("select uid,vr_id, gender, follow_count, byfollow_count, is_click "
                    "from database.table where data_date=20220726 "
                    "and uid=249462081764458496 limit 5")
    sdf.show(n=20)
    print("=======================to_koalas===============================")
    df = sdf.to_koalas()
    # Explicitly cast columns to the desired data types
    df["uid"] = df["uid"].astype("int64")
    df["vr_id"] = df["vr_id"].astype("int64")
    df["gender"] = df["gender"].astype("int32")
    df["follow_count"] = df["follow_count"].astype("int32")
    df["byfollow_count"] = df["byfollow_count"].astype("int32")
    df["is_click"] = df["is_click"].astype("int32")
    category_features_df = df[["uid", "vr_id", "gender"]].fillna(0)
    dense_features_df = df[["follow_count", "byfollow_count"]].fillna(0)
    y = df["is_click"].values
    print("category_features_df: {}".format(category_features_df))
    print("dense_features_df: {}".format(dense_features_df))
    total_uids = category_features_df["uid"].unique().tolist()
    total_vids = category_features_df["vr_id"].unique().tolist()
    uid_id2index = {uid: i for i, uid in enumerate(total_uids)}
    uid_index2id = {i: uid for uid, i in uid_id2index.items()}
    vid_id2index = {vid: i for i, vid in enumerate(total_vids)}
    vid_index2id = {i: vid for vid, i in vid_id2index.items()}
    print(f"uid_id2index: {uid_id2index}")
    print(f"vid_id2index: {vid_id2index}")
Can I take it?
@tsafacjo could you open a ticket on the Spark JIRA and make the fix in pyspark.pandas instead of Koalas? This repository is no longer maintained, since Koalas has been migrated into Apache Spark.
ok, thanks @itholic
No problem! Please feel free to ping me if you want any help contributing to Apache Spark.
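For anyone following the suggestion above to move from Koalas to pyspark.pandas, here is a rough sketch of how the snippet from the first comment might look on the pandas API on Spark. It is not a fix confirmed in this thread. It assumes a Spark release that ships pyspark.pandas (3.2 or later) and provides `DataFrame.pandas_api()`. The helper names `build_index_maps` and `load_features` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def build_index_maps(
    ids: Iterable[Hashable],
) -> Tuple[Dict[Hashable, int], Dict[int, Hashable]]:
    """Map each distinct id to a dense index (first-seen order) and back.

    Hypothetical helper: it replaces the four dict comprehensions in the
    original snippet and also tolerates duplicate ids in the input.
    """
    id2index: Dict[Hashable, int] = {}
    for value in ids:
        if value not in id2index:
            id2index[value] = len(id2index)
    index2id = {index: value for value, index in id2index.items()}
    return id2index, index2id


def load_features(spark, query: str):
    """Run `query` and return the frame plus uid and vr_id index maps.

    Hypothetical helper. `spark` is an existing SparkSession; the column
    names are the ones used in the original snippet.
    """
    sdf = spark.sql(query)
    # Assumption: pandas_api() is the pyspark.pandas counterpart of to_koalas().
    psdf = sdf.pandas_api()

    # Cast all columns in one call instead of one assignment per column.
    psdf = psdf.astype({
        "uid": "int64",
        "vr_id": "int64",
        "gender": "int32",
        "follow_count": "int32",
        "byfollow_count": "int32",
        "is_click": "int32",
    })

    # unique() stays distributed; to_numpy() collects the distinct ids
    # to the driver, so this is only sensible for small id sets.
    uids = psdf["uid"].unique().to_numpy().tolist()
    vids = psdf["vr_id"].unique().to_numpy().tolist()
    return psdf, build_index_maps(uids), build_index_maps(vids)
```

`build_index_maps` is plain Python and can be tried without a cluster. `load_features` needs a live SparkSession and a table with the columns listed above.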