Code Monkey home page Code Monkey logo

Comments (11)

dicktyr avatar dicktyr commented on August 18, 2024 1

a quick test runs as expected
(I'm not able to test more thoroughly at this time)

I noticed this bug in an automated script
as I use ksh extensively in my own projects
(actually in preference to other languages)
so I'm very grateful for your work on ksh

thank you!

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

Yikes. Thanks for the report. Will look into it ASAP.

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

Bug introduced by commit 6d76174 (dev), 4361b6a (1.0).

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024
$ cd /
$ unset x
$ ls -d {b,d}"*"
ls: b*: No such file or directory
ls: d*: No such file or directory
$ ls -d $x{b,d}"*"
bin dev
$ ls -d $x"*"{n,r}
bin  sbin usr  var
$ ls -d "*"$x{n,r}   
bin  sbin usr  var
$ ls -d $x"*"     
ls: *: No such file or directory

The nature of the bug seems to be that, if both a variable expansion (quoted or not, set or not) and a brace expansion are used, quoted wildcards are subject to expansion.

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

The closest I've been able to come to fixing this so far is the patch below. All the above reproducers act correctly with this patch and all the current regression tests pass.

diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..8717b62e7 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2583,7 +2583,7 @@ static void endfield(Mac_t *mp,int split)
 		mp->atmode = 0;
 		if(mp->patfound)
 		{
-			int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+			int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
 			sh.argaddr = 0;
 #if SHOPT_BRACEPAT
 			/* in POSIX mode, disallow brace expansion for unquoted expansions */

Unfortunately the patch also reintroduces #549 for cases where any part of the word is quoted -- even if the quoted part is empty. For example:

$ mkdir testdir
$ cd testdir
$ touch a b c ./-
$ p='[a\-c]'
$ echo $p  # OK
- a c
$ echo ""$p  # BUG
a b c
$ echo $p""  # BUG
a b c

The fundamental problem is that, by the time endfield() is reached, we have a complete word and we can no longer distinguish between the quoted and unquoted parts of it. The Mac_t flags apply to the entire word.

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

Patch version two. In mac_copy(), don't internally backslash-escape a backslash in a glob pattern bracket expression. This also fixes the #549 regression reintroduced by the previous patch, at least for standard glob patterns. For this, we need to use the bracketexpr flag (introduced as a copyto() local variable in 6c73c8c) in mac_copy(), so we move it to the Mac_t struct, making it globally accessible. Initialisation is automatic. Please test this.

Patch v2
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..29ac26aed 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -76,6 +76,7 @@ typedef struct  _mac_
 	char		macsub;		/* set to 1 when running mac_substitute */
 	int		dotdot;		/* set for .. in subscript */
 	void		*nvwalk;	/* for name space walking */
+	char		bracketexpr; 	/* set when in [brackets] within a non-ERE glob pattern */
 } Mac_t;
 
 #undef ESCAPE
@@ -437,7 +438,6 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 	char		oldquote = mp->quote;	/* save "double quoted" state */
 	char		ansi_c = 0;		/* set when processing ANSI C escape codes */
 	int32_t		ere = 0;		/* bitmask of pattern options indicating an extended regular expression */
-	char		bracketexpr = 0; 	/* set when in [brackets] within a non-ERE glob pattern */
 	Sfio_t		*sp = mp->sp;
 	Stk_t		*stkp = sh.stk;
 	char		*resume = 0;
@@ -533,7 +533,7 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 			if(mp->pattern)
 			{
 				/* preserve \ for escaping glob pattern bracket expression operators */
-				if(bracketexpr && n==S_BRAOP)
+				if(mp->bracketexpr && n==S_BRAOP)
 					break;
 				/* preserve \digit for pattern matching */
 				/* also \alpha for extended patterns */
@@ -636,8 +636,8 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 				mp->pattern = c;
 			break;
 		    case S_ENDCH:
-			if(bracketexpr && cp[-1]==RBRACT && !(mp->quote || mp->lit))
-				bracketexpr--;
+			if(mp->bracketexpr && cp[-1]==RBRACT && !(mp->quote || mp->lit))
+				mp->bracketexpr--;
 			if((mp->lit || cp[-1]!=endch || mp->quote!=newquote))
 				goto pattern;
 			if(endch==RBRACE && mp->pattern && brace)
@@ -738,12 +738,12 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 				cp = first = fcseek(0);
 				break;
 			}
-			if(mp->pattern==1 && !ere && !bracketexpr)
+			if(mp->pattern==1 && !ere && !mp->bracketexpr)
 			{
-				bracketexpr++;
+				mp->bracketexpr++;
 				/* a ] following [, as in []abc], should not close the bracket expression */
 				if(cp[0]==RBRACT && cp[1])
-					bracketexpr++;
+					mp->bracketexpr++;
 			}
 			/* FALLTHROUGH */
 		    case S_PAT:
@@ -883,7 +883,7 @@ static void copyto(Mac_t *mp,int endch, int newquote)
 			break;
 		    case S_BRAOP:
 			/* escape a quoted !^- within a bracket expression */
-			if(!bracketexpr || !(mp->quote || mp->lit))
+			if(!mp->bracketexpr || !(mp->quote || mp->lit))
 				continue;
 			if(c)
 				sfwrite(stkp,first,c);
@@ -2481,7 +2481,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				continue;
 			}
 			if(n==S_ESC)
-				sfputc(stkp,ESCAPE);
+			{
+				if(!mp->bracketexpr)
+					sfputc(stkp,ESCAPE);
+			}
 			else if(n==S_EPAT)
 			{
 				/* don't allow extended patterns in this case */
@@ -2583,7 +2586,7 @@ static void endfield(Mac_t *mp,int split)
 		mp->atmode = 0;
 		if(mp->patfound)
 		{
-			int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+			int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
 			sh.argaddr = 0;
 #if SHOPT_BRACEPAT
 			/* in POSIX mode, disallow brace expansion for unquoted expansions */

edit: It's broken, do not apply

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

It still isn't right. With patch v2 (and before):

$ touch '\'
$ pat='[x&y]'
$ echo $pat
\

That pattern should not resolve to the file \; the pattern didn't even contain a backslash.

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

That is just going to have to be a separate bug. And I'm not sure it will ever be fixed without redesigning the entire split/glob mechanism from the ground up.

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

Patch version two. In mac_copy(), don't internally backslash-escape a backslash in a glob pattern bracket expression. This also fixes the #549 regression reintroduced by the previous patch, at least for standard glob patterns.

Argh. No it doesn't fix those. I don't know what made me think that. I must have been testing the wrong binary.

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

So, patch v2 is total nonsense. We cannot use the bracketexpr flag from copyto() in mac_copy(), becasue mac_copy() is called at S_EOF, meaning at the end of processing.

I have a new patch where mac_copy() does its own detection of a bracket expression, making that fix work:

Patch v3 (also broken, do not apply)
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..9d19df04a 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2451,6 +2451,7 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 	}
 	else if(!mp->quote && mp->split && (mp->ifs||mp->pattern))
 	{
+		char	bracketexpr = 0; 	/* set when in [brackets] */
 		/* split words at ifs characters */
 		state = sh.ifstable;
 		if(mp->pattern)
@@ -2467,6 +2468,8 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				if(state[c]==0)
 					state[c] = S_PAT;
 			}
+			if(state[RBRACT]==0 && state[LBRACT]==S_PAT)
+				state[RBRACT] = S_ENDCH;
 			if(state[ESCAPE]==0)
 				state[ESCAPE] = S_ESC;
 		}
@@ -2481,7 +2484,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				continue;
 			}
 			if(n==S_ESC)
-				sfputc(stkp,ESCAPE);
+			{
+				if(!bracketexpr)
+					sfputc(stkp,ESCAPE);
+			}
 			else if(n==S_EPAT)
 			{
 				/* don't allow extended patterns in this case */
@@ -2491,7 +2497,21 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 					mp->noextpat = 1;
 			}
 			else if(n==S_PAT)
+			{
 				mp->patfound = mp->pattern;
+				if(!bracketexpr && mp->pattern)
+				{
+					bracketexpr++;
+					/* a ] following [, as in []abc], should not close the bracket expression */
+					if(cp[0]==RBRACT && cp[1])
+						bracketexpr++;
+				}
+			}
+			else if(n==S_ENDCH)
+			{
+				if(bracketexpr)
+					bracketexpr--;
+			}
 			else if(n && mp->ifs)
 			{
 				if(mbwide() && n==S_MBYTE)
@@ -2557,8 +2577,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
 				if(state[c]==S_PAT)
 					state[c] = 0;
 			}
-			if(sh.ifstable[ESCAPE]==S_ESC)
-				sh.ifstable[ESCAPE] = 0;
+			if(state[RBRACT]==S_ENDCH)
+				state[RBRACT] = 0;
+			if(state[ESCAPE]==S_ESC)
+				state[ESCAPE] = 0;
 		}
 	}
 	else
@@ -2583,7 +2605,7 @@ static void endfield(Mac_t *mp,int split)
 		mp->atmode = 0;
 		if(mp->patfound)
 		{
-			int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+			int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
 			sh.argaddr = 0;
 #if SHOPT_BRACEPAT
 			/* in POSIX mode, disallow brace expansion for unquoted expansions */
diff --git a/src/cmd/ksh93/tests/glob.sh b/src/cmd/ksh93/tests/glob.sh
index ae4a8e101..d95eeae42 100755
--- a/src/cmd/ksh93/tests/glob.sh
+++ b/src/cmd/ksh93/tests/glob.sh
@@ -471,6 +471,8 @@ test_glob '<~(K)[]-z]>' ~(K)["]-z"]
 unquoted_patvar='[\!N]'; test_glob '<[\!N]>' $unquoted_patvar
 unquoted_patvar='[\^N]'; test_glob '<[\^N]>' $unquoted_patvar
 unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar
+unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' ""$unquoted_patvar
+unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar""
 : > -; test_glob '<->' $unquoted_patvar
 : > a > c; test_glob '<-> <a> <c>' $unquoted_patvar
 : > !; unquoted_patvar='[\!N]'; test_glob '<!>' $unquoted_patvar

But that doesn't fix the #549 regression reintroduced by patch v1 either, it just breaks it in the opposite way:

$ mkdir testdir
$ cd testdir
$ touch a b c ./-
$ p='[a\-c]'
$ echo $p      # BUG
a b c
$ echo ""$p    # OK
- a c
$ echo $p""    # OK
- a c

And, as a bonus, we get a lot of regression test failures as well:

test glob begins at 2023-06-13+13:12:12
	glob.sh[477]: FAIL: glob -- expected '<[\!N]>', got '<\> <b>'
	glob.sh[478]: FAIL: glob -- expected '<[\^N]>', got '<\> <b>'
	glob.sh[479]: FAIL: glob -- expected '<[a\-c]>', got '<b>'
	glob.sh[480]: FAIL: glob -- expected '<[a\-c]>', got '<[a-c]>'
	glob.sh[481]: FAIL: glob -- expected '<[a\-c]>', got '<[a-c]>'
	glob.sh[482]: FAIL: glob -- expected '<->', got '<b>'
	glob.sh[483]: FAIL: glob -- expected '<-> <a> <c>', got '<a> <b> <c>'
	glob.sh[484]: FAIL: glob -- expected '<!>', got '<!> <-> <\> <a> <b> <c>'
	glob.sh[485]: FAIL: glob -- expected '<^>', got '<!> <-> <\> <^> <a> <b> <c>'
	glob.sh[489]: FAIL: glob -- expected '<~(S)[\!N]>', got '<\> <b>'
	glob.sh[490]: FAIL: glob -- expected '<~(S)[\^N]>', got '<\> <b>'
	glob.sh[491]: FAIL: glob -- expected '<~(S)[a\-c]>', got '<b>'
	glob.sh[492]: FAIL: glob -- expected '<->', got '<b>'
	glob.sh[493]: FAIL: glob -- expected '<-> <a> <c>', got '<a> <b> <c>'
	glob.sh[494]: FAIL: glob -- expected '<!>', got '<!> <-> <\> <a> <b> <c>'
	glob.sh[495]: FAIL: glob -- expected '<^>', got '<!> <-> <\> <^> <a> <b> <c>'
	glob.sh[497]: FAIL: glob -- expected '<~(K)[\!N]>', got '<\> <b>'
	glob.sh[498]: FAIL: glob -- expected '<~(K)[\^N]>', got '<\> <b>'
	glob.sh[499]: FAIL: glob -- expected '<~(K)[a\-c]>', got '<b>'
	glob.sh[500]: FAIL: glob -- expected '<->', got '<b>'
	glob.sh[501]: FAIL: glob -- expected '<-> <a> <c>', got '<a> <b> <c>'
	glob.sh[502]: FAIL: glob -- expected '<!>', got '<!> <-> <\> <a> <b> <c>'
	glob.sh[503]: FAIL: glob -- expected '<^>', got '<!> <-> <\> <^> <a> <b> <c>'
test glob failed at 2023-06-13+13:12:12 with exit code 23 [ 254 tests 23 errors ]
test quoting begins at 2023-06-13+13:13:15
	quoting.sh[53]: FAIL: nested $(print -r - '') differs from ''
	quoting.sh[56]: FAIL: "nested $(print -r - '')" differs from ''
test quoting failed at 2023-06-13+13:13:16 with exit code 2 [ 96 tests 2 errors ]

from ksh.

McDutchie avatar McDutchie commented on August 18, 2024

As far as I can tell, the "design" of the split/glob mechanism, which is decades of kludge upon kludge upon kludge etc., is simply incompatible with correct operation. Any change in the internal backslash escaping mechanism will introduce regressions. At least I sure don't know how to fix things. And they never did get it right at AT&T. This is simply hopeless. I'm throwing in the towel.

The first, simple patch above at least improves things, and fixes the unacceptable regression that @dicktyr reported. So I'm just going to commit it, make an urgent 1.0.6 release, and hope there aren't any other serious issues left to be discovered.

And we're just going to have to live with those corner case bugs involving backslashes. I cannot fix them. A comprehensive redesign is needed where that internal backslash escaping mechanism is replaced by something completely different. But that is a pipe dream, at least for the time being.

from ksh.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.