databrickslabs / pylint-plugin
Databricks Plugin for PyLint
Home Page: https://pypi.org/project/databricks-labs-pylint/
License: Other
In the current version of Spark Connect, the Spark Connect implementations of the Column and DataFrame classes don't inherit from the corresponding pyspark.sql classes, so the following checks will fail:
from pyspark.sql import DataFrame, Column

if isinstance(cl, Column):
    ...
if isinstance(df, DataFrame):
    ...
This makes it harder to port code to UC Shared clusters and serverless notebooks/workflows.
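One possible workaround is to test against both class hierarchies. The sketch below is an assumption, not something the plugin provides: it assumes PySpark 3.4 or 3.5, where the Spark Connect classes live in pyspark.sql.connect.dataframe and pyspark.sql.connect.column, and the helper names _load, is_dataframe and is_column are hypothetical. Newer PySpark releases may organize these classes differently.

```python
"""Type checks that accept both classic and Spark Connect objects (sketch)."""
import importlib


def _load(module_name: str, class_name: str) -> tuple:
    """Return a 1-tuple with the class, or an empty tuple if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # PySpark, or the optional Spark Connect dependencies, are not installed.
        return ()
    cls = getattr(module, class_name, None)
    return (cls,) if cls is not None else ()


# Assumed module paths for PySpark 3.4/3.5; adjust for your version.
DATAFRAME_TYPES = (
    _load("pyspark.sql", "DataFrame")
    + _load("pyspark.sql.connect.dataframe", "DataFrame")
)
COLUMN_TYPES = (
    _load("pyspark.sql", "Column")
    + _load("pyspark.sql.connect.column", "Column")
)


def is_dataframe(obj) -> bool:
    """True for a classic or a Spark Connect DataFrame."""
    return isinstance(obj, DATAFRAME_TYPES)


def is_column(obj) -> bool:
    """True for a classic or a Spark Connect Column."""
    return isinstance(obj, COLUMN_TYPES)
```

With these helpers, isinstance(df, DataFrame) becomes is_dataframe(df). When neither class can be imported the tuples are empty and the helpers simply return False.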
Is there a way to customize what is marked as incompatible with UC and/or a way to mark a line to be skipped during the check?
One of our tasks imports boto3 to remove files from S3, but it uses secrets set on the cluster to create the session, so it's fully compatible. Come to think of it, I'm not sure what legacy behavior this check is looking for. Was boto3 used for Hive metastore operations?
PS: Love the idea of this project. I was recently bitten several times by inadvertent global spark variables and have been checking for them manually since then, but I'll be adding this to my CI pipelines going forward.
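Pylint's standard inline "# pylint: disable=..." comment is one way to skip a single line. The sketch below assumes the plugin reports the boto3 import under the message name incompatible-with-uc. That name is an assumption; running pylint with --list-msgs and the plugin loaded shows the actual names. The function delete_s3_object and its injectable client parameter are hypothetical.

```python
def delete_s3_object(bucket: str, key: str, client=None) -> None:
    """Delete one S3 object.

    `client` is injectable so the function can be exercised without AWS access.
    """
    if client is None:
        # Assumed message name; the import is silenced for this line only.
        import boto3  # pylint: disable=incompatible-with-uc,import-outside-toplevel

        client = boto3.client("s3")
    client.delete_object(Bucket=bucket, Key=key)
```

Scoping the disable comment to one line keeps the check active for the rest of the file.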
Suggest rewrites with df.transform(x)
Iterators cannot be consumed twice, which may lead to bugs when an iterator is passed as an argument and iterated more than once
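A small illustration of the failure mode, using hypothetical function names (summarize, summarize_safe) that are not part of the plugin:

```python
def summarize(values):
    """Buggy: iterates `values` twice, which only works for re-iterable inputs."""
    total = sum(values)        # first pass consumes an iterator completely
    count = len(list(values))  # second pass sees nothing if `values` was an iterator
    return total, count


def summarize_safe(values):
    """Materialize the input once so both computations see the same data."""
    items = list(values)
    return sum(items), len(items)


numbers = [1, 2, 3]
from_list = summarize(numbers)              # (6, 3): a list can be iterated again
from_iterator = summarize(iter(numbers))    # (6, 0): the iterator was already exhausted
from_safe = summarize_safe(iter(numbers))   # (6, 3)
```

The buggy version raises no error; it silently returns a wrong count, which is why a lint check is useful here.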
Strings like 'foo ' 'bar' resolve to 'foo bar', but are very confusing to read:
'The use of default dbfs: references is deprecated: ' '/mnt/things/e/f/g',
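A short sketch of why implicit concatenation is risky. The list contents are made up for illustration:

```python
# Adjacent string literals are joined by the compiler, with no operator in between.
joined = 'foo ' 'bar'

# A forgotten comma silently merges two list items into one.
paths_buggy = [
    '/mnt/a',
    '/mnt/b'  # comma forgotten here
    '/mnt/c',
]

# An explicit operator makes the intent visible to the reader.
message = 'The use of default dbfs: references is deprecated: ' + '/mnt/things/e/f/g'
```

Here paths_buggy has two elements, not three, and nothing fails at import time.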
Passing an empty collection is better than passing an optional collection
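A sketch of the difference, with hypothetical function names. An optional parameter forces a None check on every use, while an empty default lets the body treat all inputs uniformly:

```python
from typing import Iterable, Optional


def tags_optional(tags: Optional[list[str]] = None) -> list[str]:
    """Optional collection: every use needs a None guard first."""
    if tags is None:
        return []
    return [t.upper() for t in tags]


def tags_default_empty(tags: Iterable[str] = ()) -> list[str]:
    """Empty collection by default: no special case to handle.

    The default is an immutable tuple, which avoids the mutable-default pitfall.
    """
    return [t.upper() for t in tags]
```

Both functions return the same results, but the second has one code path and a simpler type signature.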
%pip install pytest
%pip install databricks-labs-pylint...whl
%sh pylint ...
S3, boto3, etc
It would be good for detection and remediation recommendations to be consistent between the newly released https://pypi.org/project/databricks-labs-pylint/ and UCX.
Review the pylint and UCX detection capabilities and recommendations, research with Product and Field Eng, and then either change assessment.md and the detection queries and views or identify changes needed in the databricks-labs-pylint project.
Allow list of good locations
Less readable:
self.reader(table_query)
    .options(**self._get_jdbc_reader_options(jdbc_reader_options) | self._get_timestamp_options())
More readable:
options = self._get_jdbc_reader_options(jdbc_reader_options) | self._get_timestamp_options()
self.reader(table_query).options(**options)