knrafto / language-bash
Parse and pretty-print Bash shell scripts
License: BSD 3-Clause "New" or "Revised" License
For example, "x\012y" will be parsed as [Char 'x',Escape '0',Char '1',Char '2',Char 'y'].
In other situations I can deal with the span individually (for example, going from [Char 'l', Char 's'] to ExtCmd ls).
But for the above example I would have to make a special case for Escape '0'. Why not make it Escape '\n' (I am not sure whether \012 is \n), since "\012" actually denotes a single character?
On the other hand, one might say "the parser should return what is written, not what it means". But then I should not get Escape '0' either; I should get Char '\', Char '0'.
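On the open question above: octal 012 is decimal 10, which is the line feed character, so \012 and \n denote the same character once the escape is interpreted. A minimal sketch of that decoding step, assuming the up-to-three octal digits after the backslash have already been collected; decodeOctal is a hypothetical helper and not part of language-bash:

```haskell
import Data.Char (chr, digitToInt, isOctDigit)

-- | Decode the digits of an octal escape (the part after the backslash,
-- e.g. "012") into the character they denote.
-- Hypothetical helper for illustration; not a language-bash function.
decodeOctal :: String -> Maybe Char
decodeOctal ds
  | null ds || length ds > 3 = Nothing   -- accept one to three digits only
  | not (all isOctDigit ds)  = Nothing   -- reject anything outside 0-7
  | otherwise                = Just (chr (foldl step 0 ds))
  where
    -- Accumulate the value one base-8 digit at a time.
    step acc d = acc * 8 + digitToInt d
```

Whether the parser should perform this decoding itself or leave it to a later pass is exactly the design question raised in this issue; the sketch only shows what the decoded value would be.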
Consider the following bash snippet:
cat <<EOF
here document
EOF
If one parses it with Language.Bash.Parse.parse and pretty-prints it with Language.Bash.Pretty.prettyText, the result looks like this:
cat <<EOF
here document
EOF
;
Note the semicolon at the end: Bash will refuse the script with "line 4: syntax error near unexpected token `;'".
parse "" "ls && ls"
Right (List [Statement (Last (Pipeline {timed = False, timedPosix = False, inverted = False, commands = [Command (SimpleCommand [] [[Char 'l',Char 's'],[Char '&',Char '&'],[Char 'l',Char 's']]) []]})) Sequential])
Minor leftover point from #33.
IMO I don't think "conditional command" is accurate, maybe "block command" would be better terminology.
It would be useful for me if comments were put in the AST in a "Comment" construct (I plan to translate them into Comments in my new language).
Prelude> :m + Language.Bash.Parse
Prelude Language.Bash.Parse> parse "Test" "ls > log"
Loading package array-0.5.0.0 ... linking ... done.
Loading package deepseq-1.3.0.2 ... linking ... done.
Loading package bytestring-0.10.4.0 ... linking ... done.
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package text-1.1.0.0 ... linking ... done.
Loading package parsec-3.1.5 ... linking ... done.
Loading package pretty-1.1.1.1 ... linking ... done.
Loading package language-bash-0.3.0 ... linking ... done.
Left "Test" (line 1, column 6):
unexpected ">"
expecting redirection, |, |&, &&, ||, list terminator or pipeline
It would be nice if we had Monoid and Semigroup instances for the List type.
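A sketch of what such instances might look like, assuming List is essentially a wrapper around a list of statements. The List and Statement types below are simplified stand-ins defined only for this example; the real ones live in the library and carry more structure:

```haskell
-- Stand-in types for illustration only; they are NOT the library's
-- definitions, which carry more structure.
newtype Statement = Statement String
  deriving (Eq, Show)

newtype List = List [Statement]
  deriving (Eq, Show)

-- Appending two lists of statements concatenates them in order.
instance Semigroup List where
  List xs <> List ys = List (xs ++ ys)

-- The empty script is the identity element.
instance Monoid List where
  mempty = List []
```

With these instances, mconcat can assemble a script from separately built fragments.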
Test case:
module Main (main) where
import Language.Bash.Parse (parse)
main = print (parse "<input>" "(((")
This program uses up lots of memory and does not terminate for me. (I'm using language-bash version 0.6.1 from Hackage.)
[ is not parsed into Cond, Binary, etc., as [[ is. Based on [1], [2], and the code for test and [[, I'm guessing this is an oversight rather than a design decision.
Happy to code up a fix, which would reuse the test code, but I wanted to check that it would be accepted before I did.
[1] https://www.gnu.org/software/bash/manual/html_node/Bourne-Shell-Builtins.html#Bourne-Shell-Builtins
[2] https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html#Bash-Conditional-Expressions
The input to parse (https://hackage.haskell.org/package/language-bash-0.9.0/docs/Language-Bash-Parse.html#v:parse) is only String. That is really unfortunate. String is both an odd format (not even UTF-8) and inefficient. I generally don't use it at all.
Nothing else from the module is exposed, so it's hard to build my own version without forking the library.
Looks like arithmetic becomes an ArithSubst String, and it isn't parsed any further. What's the plan there? Should I instead try to parse the arithmetic strings as C?
> let Right e = parse "<stdin>" "while true; do :; done;" in putStrLn $ prettyText e
while true; do
:;
done;
For reference, the output from bash's declare -f is:
$ f() { while true; do :; done }
$ declare -f f
f ()
{
while true; do
:;
done
}
It seems like doDone l binds too tightly, so the whole block ends up beside the while p; text.
e.g.
echo 'foo' |
cat
Right now we can't handle a newline after |.
Sorry, I do not know how to reopen issue 4. I found that it is worse:
$ cat /tmp/test.sh
echo -e 'x\064y'
$ echo -e 'x\064y'
x4y
λ>bash <- readFile "/tmp/test.sh"
λ>parse "" bash
Right (List [Statement (Last (Pipeline {timed = False, timedPosix = False, inverted = False, commands = [Command (SimpleCommand [] [[Char 'e',Char 'c',Char 'h',Char 'o'],[Char '-',Char 'e'],[Single [Char 'x',Char '\\',Char '0',Char '6',Char '4',Char 'y']]]) []]})) Sequential])
One option: round-trip testing by generating syntax trees, pretty-printing them, and parsing them. But this doesn't test one element at a time or every possible variation of Bash syntax.
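A minimal sketch of the round-trip idea on a toy syntax tree, using only base. Everything here (Pipeline, prettyPipeline, parsePipeline, roundTrips) is a hypothetical stand-in rather than the library's API, and it assumes words contain no whitespace and are never the literal token "|":

```haskell
import Data.List (intercalate)

-- Toy stand-in for a syntax tree: a pipeline is a non-empty list of
-- commands, and each command is a non-empty list of plain words.
newtype Pipeline = Pipeline [[String]]
  deriving (Eq, Show)

-- | Toy pretty-printer: words joined by spaces, commands joined by " | ".
prettyPipeline :: Pipeline -> String
prettyPipeline (Pipeline cmds) = intercalate " | " (map unwords cmds)

-- | Toy parser: split on whitespace, then split the tokens at "|".
-- Fails if any command in the pipeline is empty.
parsePipeline :: String -> Maybe Pipeline
parsePipeline s
  | any null cmds = Nothing
  | otherwise     = Just (Pipeline cmds)
  where
    cmds = splitOn "|" (words s)

-- | Split a token list at every occurrence of a separator token.
splitOn :: Eq a => a -> [a] -> [[a]]
splitOn sep xs = case break (== sep) xs of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | The round-trip property: parsing the pretty-printed tree gives it back.
roundTrips :: Pipeline -> Bool
roundTrips p = parsePipeline (prettyPipeline p) == Just p
```

In a real test suite the trees would come from a generator (for example a QuickCheck Arbitrary instance) instead of hand-written samples. A printer that dropped the | separator would fail this property, because the printed text would parse back as a single command.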
> let Right s = parse "" "ls | tail"
> prettyText s
"ls tail;"
QuickCheck complains with "Use --quickcheck-replay=735772 to reproduce."
Again, wondering if this is a design decision?
When I call parseTestExpr on a list of words (the '[' and ']' have already been stripped off), the words are not parsed and double-quoted strings with command substitutions in them are not processed.
How were you thinking this should be accomplished?
convertTest :: [W.Word] -> Expr
convertTest ws = case condExpr of
    Left err -> Debug $ "doesn't parse" ++ show err ++ show strs
    Right e  -> convertCommand e
  where
    condExpr = C.parseTestExpr strs
    strs     = map W.unquote ws
I tried to get all top-level assignments and ended up with:
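import Control.Monad (join)
import Data.Maybe (catMaybes)
import Language.Bash.Syntax

-- Collect the assignments of every simple command that appears in a
-- top-level, sequential, single-pipeline statement.
extractAssignments :: List -> [Assign]
extractAssignments (List stms) = join $ fmap getAssign $ getCommands stms
  where
    getCommands :: [Statement] -> [Command]
    getCommands = join . fmap commands . catMaybes . fmap findPipes
      where
        findPipes (Statement (Last p@(Pipeline{})) Sequential) = Just p
        findPipes _ = Nothing
    getAssign :: Command -> [Assign]
    getAssign (Command (SimpleCommand ass _) _) = ass
    getAssign _ = []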
I believe this would be easier with optics. I would propose the optics package, because it has better documentation, an opaque interface with useful type errors and safer behavior in some corner cases (e.g. less implicit monoidal behavior) than lens.
While working on #24 it turned out that the current test suite performed poorly.
Proposal:
This is of course only for those tests involving pretty printing; we can keep the existing unit tests as they are.
Additionally, we could fuzz-test the implementation with QuickCheck.
I use language-bash in ghcup to parse variable assignments from files like /etc/os-release: https://gitlab.haskell.org/haskell/ghcup-hs/-/blob/master/lib/GHCup/Utils/Bash.hs
This works relatively well, but when I profile ghcup on the simplest command, ghcup list, I get this:
Thu Jun 25 21:56 2020 Time and Allocation Profiling Report (Final)
ghcup +RTS -p -RTS list
total time = 0.05 secs (52 ticks @ 1000 us, 1 processor)
total alloc = 56,666,312 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
word Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:300:1-43 15.4 40.2
jstring Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:316:1-32 11.5 5.5
try Control.Monad.Catch src/Control/Monad/Catch.hs:789:1-47 9.6 9.4
skipSpace Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:(81,1)-(86,50) 5.8 1.5
<?> Data.Aeson.Types.Internal Data/Aeson/Types/Internal.hs:532:1-74 3.8 0.4
unfoldDirContents Streamly.External.Posix.DirStream src/Streamly/External/Posix/DirStream.hs:(50,1)-(59,69) 3.8 1.1
dirContentsStream Streamly.External.Posix.DirStream src/Streamly/External/Posix/DirStream.hs:(70,1)-(72,83) 3.8 2.0
lexeme Text.Megaparsec.Lexer Text/Megaparsec/Lexer.hs:78:1-23 3.8 3.2
pathParser' URI.ByteString.Internal src/URI/ByteString/Internal.hs:(567,1)-(569,82) 3.8 1.6
decimal Data.Attoparsec.ByteString.Char8 Data/Attoparsec/ByteString/Char8.hs:(447,1)-(448,49) 1.9 0.1
copy Data.HashMap.Array Data/HashMap/Array.hs:(328,1)-(333,30) 1.9 0.0
object_' Data.Aeson.Parser.Internal Data/Aeson/Parser/Internal.hs:(135,49)-(137,22) 1.9 0.5
catchAll Control.Monad.Catch src/Control/Monad/Catch.hs:744:1-16 1.9 1.5
mapExcepts Haskus.Utils.Variant.Excepts src/lib/Haskus/Utils/Variant/Excepts.hs:102:1-33 1.9 0.0
liftE Haskus.Utils.Variant.Excepts src/lib/Haskus/Utils/Variant/Excepts.hs:110:1-38 1.9 0.0
mapVariantAt Haskus.Utils.Variant src/lib/Haskus/Utils/Variant.hs:(413,1)-(416,47) 1.9 0.0
uncons Data.ByteString.Lazy.UTF8 Data/ByteString/Lazy/UTF8.hs:(228,1)-(229,38) 1.9 0.6
parseAbs HPath src/HPath.hs:(142,1)-(147,38) 1.9 0.0
word Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:175:1-66 1.9 0.1
ioDesc Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:188:1-46 1.9 0.3
assign Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:217:1-43 1.9 0.2
anyOperator Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:209:1-51 1.9 3.0
select Language.Bash.Operator src/Language/Bash/Operator.hs:19:1-43 1.9 1.8
runParsecT Text.Megaparsec.Internal Text/Megaparsec/Internal.hs:(591,1)-(596,56) 1.9 0.1
getOptic Optics.Internal.Optic src/Optics/Internal/Optic.hs:70:5-12 1.9 0.0
wrapCompile Text.Regex.Posix.Wrap src/Text/Regex/Posix/Wrap.hsc:(372,1)-(384,42) 1.9 0.0
regNameParser URI.ByteString.Internal src/URI/ByteString/Internal.hs:(533,1)-(535,59) 1.9 0.3
getDownloadsF GHCup.Download lib/GHCup/Download.hs:(105,1)-(127,42) 1.9 0.0
printListResult Main app/ghcup/Main.hs:(1332,1)-(1377,18) 1.9 0.2
satisfying Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:(130,1)-(132,49) 0.0 1.3
pack Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:(99,1)-(122,14) 0.0 2.2
anyWord Language.Bash.Parse.Internal src/Language/Bash/Parse/Internal.hs:171:1-39 0.0 2.4
name Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:(313,1)-(316,38) 0.0 1.8
functionName Language.Bash.Parse.Word src/Language/Bash/Parse/Word.hs:(320,1)-(324,31) 0.0 1.5
getDownloads GHCup.Download lib/GHCup/Download.hs:(143,1)-(261,42) 0.0 2.2
This is on a 7-line bash file:
NAME="Exherbo"
PRETTY_NAME="Exherbo Linux"
ID="exherbo"
ANSI_COLOR="0;32"
HOME_URL="https://www.exherbo.org/"
SUPPORT_URL="irc://irc.freenode.net/#exherbo"
BUG_REPORT_URL="https://bugs.exherbo.org/"
I'm wondering if there's a way to reduce the allocations.
bash$ echo {{a,b}
{a {b
ghci> map (\c -> case c of Char x -> x) (head (braceExpand (map Char "{{a,b}")))
"{{a,b}"
Since the bash expansion algorithm knows about quoting and escaping, this would need to be made aware of those too. (It's really ugly...)
bash$ echo {q\{b{zp,\}c}{kmy\}}{tjl}
{q{bzp{kmy}}{tjl} {q{b}c{kmy}}{tjl}
QuickCheck found this failure in a CI job:
Tests
Properties
brace expansion: FAIL (0.08s)
*** Failed! Assertion failed (after 56 tests):
"\\}\\,,{y}{\\{\\ }\\}{}\\}s\\ p\\{\\},\\,lhhiq\\,qv}{}\\},}\\}\\ \\,{\\}\\ \\{a\\ \\,\\,z\\,t"
Use --quickcheck-replay '55 TFGenR 0000000309618D3500000000000F4240000000000000E1390000000056F692C0 0 72057594037927935 56 0' to reproduce.
Consider the following here document:
Heredoc Here "EOF" False (stringToWord "here document")
When pretty-printed, it looks like this:
<<EOF
here documentEOF
This is clearly not the intended result. Maybe we should add a trailing newline if it is not present in the Word being rendered.
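A sketch of the suggested fix on plain strings. The helper names are hypothetical, and a real fix would have to operate on the library's Word type instead of String; an empty body is left alone here so that an empty here-document does not gain a blank line:

```haskell
-- | Append a newline unless the text is empty or already ends with one.
-- Hypothetical helper on String, not on the library's Word type.
ensureTrailingNewline :: String -> String
ensureTrailingNewline s
  | null s         = s
  | last s == '\n' = s
  | otherwise      = s ++ "\n"

-- | Toy renderer: the operator and delimiter, the body, then the
-- delimiter on a line of its own.
renderHeredoc :: String -> String -> String
renderHeredoc delim body =
  "<<" ++ delim ++ "\n" ++ ensureTrailingNewline body ++ delim
```

With this, the body "here document" and the delimiter "EOF" render with the closing EOF on its own line, whether or not the body already ended in a newline.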
Megaparsec is more actively maintained and will give us better error messages.
It would be nice if we supported Text pretty printing in addition to simple Strings.
We could use grouping features to improve the output as well.
Personal choice: prettyprinter