Posted over 8 years ago by Jan Palus (https://bugs.launchpad.net/~jan-palus)
|
I just published the first version of git find on gh/mirabilos/git-find
for easy collaboration. The repository deliberately only contains the
script and the manual page so it can easily be merged into git.git with
complete history later, should
they accept it. git find is MirOS licenced. It does require
a recent mksh (Update: I did start it
in POSIX sh first, but it eventually turned out to require arrays, and
I don’t know perl(1) and am not going to rewrite it in C) and some common
utility extensions to deal with NUL-separated lines (sort -z,
grep -z, git ls-tree -z); also, support for '\0'
in tr(1) and a comm(1) that does not choke on embedded NULs in lines.
To install or uninstall it, run…
$ git clone [email protected]:mirabilos/git-find.git
$ cd git-find
$ sudo ln -sf "$PWD/git-find" /usr/lib/git-core/
$ sudo cp git-find.1 /usr/local/share/man/man1/
… hack …
$ sudo rm /usr/lib/git-core/git-find \
/usr/local/share/man/man1/git-find.1
… then you can call it as “git find”
and look at the documentation with “git help find”, as is customary.
The idea behind this utility is to have a tool like “git grep” that
acts on the list of files known to git (and not e.g. ignored files)
to quickly search for, say, all PNG files in the repository (but not
the generated ones). “git find” acts on the index for the HEAD, i.e.
whatever commit is currently checked-out (unlike “git grep” which also
knows about “git add”ed files; fix welcome) and then offers a filter
syntax similar to find(1) to follow up: parenthesēs, ! for
negation, and -a and -o for boolean AND and OR are supported, as
well as -name, -regex and -wholename and
their case-insensitive variants. Note that -regex uses grep(1) without -E
(or with -E, if the global option -E is given),
and the pattern matches use mksh(1)’s own matching, which ignores the locale and
doesn’t do [[:alpha:]] character classes yet. On the plus
side, the output is guaranteed to be sorted; on the minus side, it
is rather wastefully using temporary files (under $TMPDIR
of course, so use of tmpfs is recommended). -print0 is the
only output option (-print being the default).
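The NUL-separated plumbing described above can be approximated in a few lines. This is only a sketch of the idea, not the actual git-find implementation: the function names list_tracked and filter_paths are made up for illustration, and it assumes a grep(1) and sort(1) that understand -z.

```shell
# List every path recorded in the HEAD commit, NUL-terminated.
# (Hypothetical helper name; only meaningful inside a git work tree.)
list_tracked() {
    git ls-tree -r -z --name-only HEAD
}

# Keep only the NUL-terminated paths on stdin that match the extended
# regular expression in $1 and emit them sorted, still NUL-terminated.
# (Hypothetical helper name; needs the -z extensions of grep and sort.)
filter_paths() {
    grep -z -E -- "$1" | LC_ALL=C sort -z
}

# Usage inside a work tree, converting to one path per line for reading:
#   list_tracked | filter_paths '\.png$' | tr '\0' '\n'
```

Keeping the NUL terminators until the very last step is what makes file names with embedded newlines safe to pass through the pipeline.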
Another mode “forwards” the file list to the system find;
since find doesn’t support DOS-style response files, this only works
if the number of files stays below the operating system’s argument limit.
This mode supports the full range (except -maxdepth) of the
system find(1) filters, e.g. -mmin -1, and can do -ls or even -exec, but
it incurs a filesystem access penalty for the entire tree and doesn’t
sort the output.
The idea here is that it can collaboratively be improved, reviewed,
fixed, etc. and then, should they agree, with the entire history, subtree-merged into git.git and shipped to the world.
Part of the development was sponsored by tarent solutions GmbH, the
rest and the entire manual page were done during my vacation.
|
izabera made a good point on IRC the other
day for why we will need to have at least two locales in
MirBSD – C and C.UTF-8 (the latter being widespread
enough by now, thanks to me,
interestingly enough). He uses code which leads to
unexpected results…
$ generate() { tr -dc "[:alnum:]" < /dev/urandom | dd bs="$len" count=1; }
$ len=10; echo $(generate 2>/dev/null)
Ut流54Ȫf
… because tr(1) was the first utility I
converted to Unicode, to explore possibilities and craft the OPTU encoding
and, thus, “流” is, indeed, an alphanumeric character.
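For scripts that need ASCII-only output today, a locale-independent variant might look like the following. It is a sketch under assumptions: it pins LC_ALL=C for the tr(1) call only, reads a bounded 4096-byte sample instead of an endless stream, and swaps dd for head -c (widespread, but not found in every historic head(1)).

```shell
# Print up to $1 random ASCII alphanumeric characters, then a newline.
# LC_ALL=C restricts [:alnum:] to A-Z, a-z and 0-9 regardless of the
# caller's locale. Only a 4096-byte sample of /dev/urandom is read, so
# this sketch is meant for short strings.
generate() {
    head -c 4096 /dev/urandom | LC_ALL=C tr -dc '[:alnum:]' | head -c "$1"
    echo
}

generate 10
```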
This implies two things: we need to change MirBSD libc locale functions
back to support two charsets (and make setlocale(3) match), and mksh(1)
should implement locale tracking (to change set ±U whenever one
of the relevant parameters (${LC_ALL:-${LC_CTYPE:-${LANG:-C}}})
changes in the session; users could still set utf8-mode manually
though). For this to not break anything, we’ll have to audit scripts in
MirBSD (usually adding export LC_ALL=C at their beginning is
enough, and we need this for portable scripts anyway) and remove all
occurrences of #ifndef __MirBSD__ before setlocale(3) calls in
applications. This will take a while.
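The lookup order for the relevant parameters can be written down as a small helper. This is a sketch of the idea only, not mksh code: effective_ctype and wants_utf8 are hypothetical names, and the pattern used to recognise a UTF-8 codeset is an assumption.

```shell
# Print the locale name that governs character handling, using the
# precedence LC_ALL, then LC_CTYPE, then LANG, then "C".
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if that locale names a UTF-8 codeset (e.g. C.UTF-8 or
# en_US.utf8). The pattern match is an assumption for illustration.
wants_utf8() {
    case $(effective_ctype) in
    *.[Uu][Tt][Ff]-8* | *.[Uu][Tt][Ff]8*) return 0 ;;
    *) return 1 ;;
    esac
}

# A shell tracking the locale would re-evaluate this whenever one of
# the three parameters changes and then switch its utf8-mode to match.
if wants_utf8; then echo "UTF-8 locale"; else echo "single-byte locale"; fi
```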
Secondly, I opened an
issue with POSIX about handling of the (deprecated, and for good reason)
`-style (backtick) command substitutions. The GNU autoconf texinfo manual gives
good advice for portable shell scripts, and we all knew that
foo="bar `echo \"baz\"`" wasn’t portable due to use of more than one set of double
quotes, but my (and the yash authors’) reading of the standard (and mksh R52’s POSIX mode) makes it set $foo to
bar "baz" instead of the historic bar baz now, and I wish
to get this clarified (and, possibly, the standard changed to match historic
practice, as this breaks at least the Acrobat Reader 5 start script). Nothing
has been decided yet (due to the holidays, I’m sure), but we got input from
some other people involved in shell.
So, if any #!/bin/sh scripts break or behave weirdly with R52,
you now know why. I’m waiting for an official statement.
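Until that is settled, scripts can sidestep the ambiguity altogether. Two ways to do so, sketched here with the variable names foo, foo2 and baz chosen purely for illustration:

```shell
# Option 1: $(...) starts a fresh quoting context, so the inner double
# quotes are ordinary quotes around baz and never reach echo.
foo="bar $(echo "baz")"

# Option 2: run the backtick substitution outside of double quotes in
# its own assignment, then interpolate the variable.
baz=`echo "baz"`
foo2="bar $baz"

printf '%s\n' "$foo" "$foo2"
```

Both assignments yield bar baz, without depending on how a given shell treats \" inside backticks inside double quotes.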