Here's the bug.
Let's use this example:
% printf '%s' 'a' | grep -o 'b*'
The code from earlier:
/*
 * rdar://problem/86536080 - if our first match
 * was 0-length, we wouldn't progress past that
 * point.  Incrementing nst here ensures that if
 * no other pattern matches, we'll restart the
 * search at one past the 0-length match and
 * either make progress or end the search.
 */
if (pmatch.rm_so == pmatch.rm_eo) {
	if (MB_CUR_MAX > 1) {
		wchar_t wc;
		int advance;
		advance = mbtowc(&wc,
		    &pc->ln.dat[nst],
		    MB_CUR_MAX);
		assert(advance > 0);
		nst += advance;
	} else {
		nst++;
	}
}
Here's the problem: pc->ln.dat is the string for the current line. nst is an offset into that string. Note that this code is enclosed in a loop. The first time around that loop, pc->ln.dat is "a", and nst is 0. Thus &pc->ln.dat[nst] is effectively "a". mbtowc returns 1 as we would expect.
The loop iterates, and now pc->ln.dat is still "a", but nst is 1, so &pc->ln.dat[nst] is "" (the empty string). When mbtowc is given a pointer to a null char (as we have here), it returns 0. Given that, the assertion now fails.
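The two mbtowc results described above can be checked in isolation. This is only a minimal sketch: char_width_at is a hypothetical helper written for illustration, not something from grep's source, and it assumes the buffer is null-terminated the way grep's line buffer is.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of grep): return what mbtowc reports
 * for the character starting at s[off]. The caller must ensure that
 * off is no greater than the index of the terminating null char.
 */
int
char_width_at(const char *s, size_t off)
{
	wchar_t wc;

	assert(s != NULL);
	/* Reset any conversion state before a fresh conversion. */
	(void)mbtowc(NULL, NULL, 0);
	/* Returns 0 when s[off] is the null char, -1 on an invalid sequence. */
	return mbtowc(&wc, &s[off], MB_CUR_MAX);
}
```

Calling char_width_at("a", 0) gives 1, while char_width_at("a", 1) lands on the terminator and gives 0, which is exactly the value that assert(advance > 0) rejects.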
The problem can be stated in one of two ways:
1. The loop should have exited early after the first iteration (or at least changed the local match state so that we don't arrive at the aforementioned code block), or
2. The code block should be amended so that we never try to read at or past the terminating null char.
For option 2, something like this -- as I have tested by compiling Apple's grep from source -- would suffice:
diff --git a/grep/util.c b/grep/util.c
index f362f97..ab3aec1 100644
--- a/grep/util.c
+++ b/grep/util.c
@@ -691,7 +691,7 @@ procline(struct parsec *pc)
 #ifdef __APPLE__
 	/* rdar://problem/86536080 */
 	if (pmatch.rm_so == pmatch.rm_eo) {
-		if (MB_CUR_MAX > 1) {
+		if (MB_CUR_MAX > 1 && nst < pc->ln.len) {
 			wchar_t wc;
 			int advance;
@@ -721,7 +721,7 @@ procline(struct parsec *pc)
 	 * either make progress or end the search.
 	 */
 	if (pmatch.rm_so == pmatch.rm_eo) {
-		if (MB_CUR_MAX > 1) {
+		if (MB_CUR_MAX > 1 && nst < pc->ln.len) {
 			wchar_t wc;
 			int advance;
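For readers who want to poke at the guarded logic outside of grep, here is a rough standalone sketch of the patched block. The function name skip_empty_match and its parameters are hypothetical stand-ins for pc->ln.dat, pc->ln.len and nst. One deliberate difference, which is my assumption and not Apple's code: instead of asserting, it falls back to a one-byte step when mbtowc does not report a positive width.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical standalone version of the patched block. dat is a line
 * buffer of len bytes with a null char at dat[len]; nst is the current
 * offset. Returns the offset at which to restart the search after a
 * 0-length match.
 */
size_t
skip_empty_match(const char *dat, size_t len, size_t nst)
{
	assert(dat != NULL && nst <= len);

	if (MB_CUR_MAX > 1 && nst < len) {
		wchar_t wc;
		int advance;

		advance = mbtowc(&wc, &dat[nst], MB_CUR_MAX);
		/* Assumption: step one byte on an invalid sequence or embedded null. */
		return nst + (advance > 0 ? (size_t)advance : 1);
	}
	/* Single-byte locale, or already at the end of the line. */
	return nst + 1;
}
```

With the line "a" (len 1), starting at offset 0 yields 1, and starting at offset 1 yields 2 without ever handing mbtowc the terminator, so the caller's loop can see that it has run off the end of the line and stop.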
To restate the problem: the latest grep on macOS indexes into the current string out of bounds. The only reason the error isn't more catastrophic is that grep terminates the current line buffer with an additional null character (which is not part of the original input file). Reading that terminator is what trips the assert, which checks how many bytes wide the current character is, even though that character lies outside the string.