Code Monkey home page Code Monkey logo

Comments (11)

RichieBzzzt avatar RichieBzzzt commented on May 18, 2024 1

Ah it was this issue I was thinking about with the encoding not being available in PowerShell 5.1 So yes set-content with encoding would fix the encoding.

from azure.databricks.cicd.tools.

pmooij avatar pmooij commented on May 18, 2024 1

herewith I confirm that with #151 the both the line-endings & encoding are both as expected again, great!

compared to the previous approach it's also removing the trailing white line now, this does effects all notebooks once, but it seems cleaner this way.

from azure.databricks.cicd.tools.

simondmorias avatar simondmorias commented on May 18, 2024 1

Awesome. I will upgrade the release from preview to public this weekend.

from azure.databricks.cicd.tools.

RichieBzzzt avatar RichieBzzzt commented on May 18, 2024

Opened #139 with potential fix.

from azure.databricks.cicd.tools.

pmooij avatar pmooij commented on May 18, 2024

@RichieBzzzt as I commented with you latest commit your latest changes are causing us normalization issues we didn't had before.
I'd request you to undo the normalisation to LF and put back in the normalisation to CRLF.

also I don't see how #139 should fix the issue you're describing, as the normalisation happens in the RegEx Replace and therefore creating the file using New-Item vs. Set-Content shouldn't make a difference, right?

btw, great work on fixing the phantom header line!

from azure.databricks.cicd.tools.

pmooij avatar pmooij commented on May 18, 2024

furthermore we're getting issues on Unicode characters
image

afaik Set-Content is implicitly using ANSI encoding, causing this Unicode mismatch
while New-Item is using UTF8, handling Unicode characters properly

from azure.databricks.cicd.tools.

RichieBzzzt avatar RichieBzzzt commented on May 18, 2024

@pmooij we are getting issues on our machines where users are using macos, ubuntu and windows. We have build that run tests on windows and ubuntu. I am testing and fixing issues I encounter from other users via pull requests to the module which they then confirm as fixed.

The whole reason we go through this merry dance of altering content is because the databricks api inserts a comment on the first row of every notebook, and we aim to remove it. Part of me would be much happier to accept this comment and and then we can remove most of this [magic].(

$Response = @()
$Response = Get-Content $tempLocalExportPath -Encoding UTF8
$NewResponse = $Response -ne "# Databricks notebook source"
Remove-Item $tempLocalExportPath
if ($Format -eq "SOURCE") {
$ResponseString = ($NewResponse.replace("[^`r]`n", "`n") -Join "`n")
}
if ((Test-Path -PathType Leaf -Path $LocalExportPath) -eq $true) {
Set-Content -path $LocalExportPath -value $ResponseString | Out-Null
}
else {
New-Item -force -path $LocalExportPath -value $ResponseString -type file | Out-Null
}
). Then we don't have to account for any line endings or the minutia of encoding, as frankly it has burnt up a lot of my time trying to make this work consistently across various OS and versions of PowerShell: ie Set-Content can be set to UTF8 but only in later versions of PowerShell, and some users are on corporate laptops that are stuck on 5.1. The disadvantage would be that EVERYONE using this module would be affected by that change first time of downloading notebooks, and then it will work as normal after that.

I'll have a chat with the maintainer, see what he thinks. But in the meantime I suggest you pin yourself to a version that works for you whilst we figure out a solution that works for everyone using the module, or someone else adds a PR that fixes all the issues before we get there.

from azure.databricks.cicd.tools.

pmooij avatar pmooij commented on May 18, 2024

thanks for you reply Richie. don't get me wrong, I really appreciate all the effort you guys are putting into this module!
skipping the whole top comment line removal and the therewith associated line ending & encoding normalization hustle sounds good to me, sounds good to discuss with Simon.

regarding the disadvantage that EVERYONE would be hit by this change: that's something EVERYONE is experiencing with the move from CRLF to LF normalization as well, so to speak for myself: it wouldn't be a big issue for me to deal with once.

for now i'll add the normalization to CRLF on top in our own scripts so we can stay with the latest version of the module.

for the encoding I don't understand why you could use UTF8 on Get-Content but why you couldn't use UTF8 on Set-Content. I'd say this should still be in place given the current approach.

from azure.databricks.cicd.tools.

RichieBzzzt avatar RichieBzzzt commented on May 18, 2024

OK I've got a potential fix here but I'm going to do some manual testing on a couple of different OS's to double check it works.

from azure.databricks.cicd.tools.

pmooij avatar pmooij commented on May 18, 2024

OK I've got a potential fix here but I'm going to do some manual testing on a couple of different OS's to double check it works.

using [Environment]::NewLine & [system.io.file]::WriteAllText() with the correct UTF8 encoding looks very neat

from azure.databricks.cicd.tools.

RichieBzzzt avatar RichieBzzzt commented on May 18, 2024

Although fix with #151 is merged, suggest we keep issue open for a while, see if anything unexpected happens.

from azure.databricks.cicd.tools.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.