gawk-windows

This project is a fork of the well-known GNU Awk (https://www.gnu.org/software/gawk/).

The goal of the fork is to implement a fully functional native gawk interpreter for Windows that matches the capabilities of the Linux version of gawk.



Highlights of this fork.

1. Support for UTF-8 encoding.

To enable UTF-8, it is enough to specify a UTF-8 locale when starting gawk, for example with the "--locale=ru_RU.UTF-8" switch.
If the "--locale" switch is not specified, the values of the environment variables LC_ALL, LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_NUMERIC, LC_TIME and LANG are taken into account, just as in the UNIX environment.
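
The lookup order among these variables can be sketched in C as follows. This is only an illustration that assumes the usual POSIX precedence (LC_ALL first, then the category's own variable, then LANG); the helper name pick_locale is hypothetical and is not taken from the gawk sources.

```c
#include <stddef.h>

/* Hypothetical helper (not from the gawk sources): choose the locale name
   for one category using the POSIX precedence: LC_ALL, then the category's
   own variable (for example LC_CTYPE), then LANG.
   Unset (NULL) and empty values are skipped. */
static const char *pick_locale(const char *lc_all,
                               const char *lc_category,
                               const char *lang)
{
    if (lc_all != NULL && lc_all[0] != '\0')
        return lc_all;
    if (lc_category != NULL && lc_category[0] != '\0')
        return lc_category;
    if (lang != NULL && lang[0] != '\0')
        return lang;
    return "C";   /* POSIX default when nothing is set */
}
```

A caller would pass the results of getenv, for example pick_locale(getenv("LC_ALL"), getenv("LC_CTYPE"), getenv("LANG")).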

The "--locale" switch recognizes various forms of locale names, both standard in the UNIX environment, for example "ru_RU.UTF-8", and accepted in Windows, for example "Russian_Russia.65001".
Mixed spellings of the same locale name are also accepted, for example: "ru-RU.UTF8", "Russian_Russia.utf-8", "Russian_Russia.cp65001", etc.
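
One way to treat these spellings uniformly is to normalize the codeset suffix before comparing it. The C sketch below is an assumption about how such matching could be done, not the fork's actual code; the function name locale_is_utf8 is hypothetical, and locale modifiers such as "@euro" are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the gawk sources): returns 1 if the codeset
   suffix of a locale name denotes UTF-8, 0 otherwise. */
static int locale_is_utf8(const char *name)
{
    const char *dot, *p;
    char buf[16];
    size_t n = 0;

    assert(name != NULL);
    dot = strrchr(name, '.');
    if (dot == NULL)
        return 0;                   /* no codeset part, e.g. "C" */

    /* Copy the codeset in lower case, dropping '-', so that
       "UTF-8", "utf-8" and "UTF8" all become "utf8". */
    for (p = dot + 1; *p != '\0'; p++) {
        if (*p == '-')
            continue;
        if (n + 1 >= sizeof buf)
            return 0;               /* too long to be a known UTF-8 alias */
        buf[n++] = (char)tolower((unsigned char)*p);
    }
    buf[n] = '\0';

    /* Windows code page 65001 is UTF-8; accept it with or without "cp". */
    if (strncmp(buf, "cp", 2) == 0)
        return strcmp(buf + 2, "65001") == 0;
    return strcmp(buf, "utf8") == 0 || strcmp(buf, "65001") == 0;
}
```

With this sketch, all the spellings listed above are recognized as UTF-8, while a name such as "Russian_Russia.1251" is not.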

2. Support for all Unicode characters in regular expressions (when using UTF-8 encoding).

Internal gawk functions operate on 32-bit Unicode characters (encoded in UTF-32).
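
To show what converting UTF-8 input into 32-bit characters involves, here is a minimal C sketch that decodes the first UTF-8 sequence of a string into a UTF-32 code point. It is a generic decoder written for illustration; the names utf8_first_code_point and UTF32_INVALID are hypothetical and do not come from the gawk sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF32_INVALID 0xFFFFFFFFu   /* hypothetical error marker */

/* Decode the first UTF-8 sequence of a NUL-terminated string.
   Returns the code point, or UTF32_INVALID for malformed input. */
static uint32_t utf8_first_code_point(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    uint32_t c, min;
    size_t len, i;

    assert(str != NULL);
    if (s[0] < 0x80)
        return s[0];                        /* ASCII, one byte */
    if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1Fu; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0Fu; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07u; min = 0x10000;
    } else {
        return UTF32_INVALID;               /* invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UTF32_INVALID;           /* also stops at the final NUL */
        c = (c << 6) | (s[i] & 0x3Fu);
    }

    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return UTF32_INVALID;
    return c;
}
```

For example, the two bytes 0xD0 0x96 decode to U+0416 (the Cyrillic letter Zhe), and the four bytes 0xF0 0x9F 0x98 0x80 decode to U+1F600, a character outside the 16-bit range.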

3. Support for escaping double quotes by doubling them in command-line arguments, which makes it possible to embed a gawk call in a cmd.exe command chain:

(echo.a) | gawk "{ gsub(/a/, ""&b""); print }"
ab

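Note:
gawk also supports the traditional way of escaping a double quote with a backslash, as in \".
However, cmd.exe does not treat the backslash specially: it considers the quoted string closed at the \" sequence, so a special character that follows, such as &, must itself be escaped with ^.
Doubling the double quote is therefore easier. For comparison, the backslash form looks like this: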

(echo.a) | gawk "{ gsub(/a/, \"^&b\"); print }"
ab
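
The two escaping styles can be sketched as a small unquoting routine in C. This is a simplified illustration under stated assumptions, not the parser used by this fork: the function name unquote_arg is hypothetical, the input is the argument text as the program receives it (after cmd.exe has removed any ^), and splitting into arguments and runs of several backslashes are ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified illustration (not gawk's real parser): remove the enclosing
   double quotes from one argument. Inside quotes, "" yields a literal
   quote; \" yields a literal quote anywhere.
   Returns a malloc'ed string that the caller must free, or NULL. */
static char *unquote_arg(const char *arg)
{
    char *out, *dst;
    int in_quotes = 0;

    assert(arg != NULL);
    out = (char *)malloc(strlen(arg) + 1);  /* output never exceeds input */
    if (out == NULL)
        return NULL;
    dst = out;

    while (*arg != '\0') {
        if (arg[0] == '\\' && arg[1] == '"') {
            *dst++ = '"';                   /* \" -> literal quote */
            arg += 2;
        } else if (arg[0] == '"' && in_quotes && arg[1] == '"') {
            *dst++ = '"';                   /* "" inside quotes -> literal quote */
            arg += 2;
        } else if (arg[0] == '"') {
            in_quotes = !in_quotes;         /* opening or closing quote: dropped */
            arg++;
        } else {
            *dst++ = *arg++;                /* ordinary character */
        }
    }
    *dst = '\0';
    return out;
}
```

Under these assumptions, both command lines shown above produce the same awk program text: { gsub(/a/, "&b"); print }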

4. Ported all 13 standard gawk extensions (dynamically loaded plugins):

- filefuncs.dll implements the functions: chdir, stat, fts
- fnmatch.dll   implements the functions: fnmatch
- inplace.dll   implements the functions: begin, end
- intdiv.dll    implements the functions: intdiv
- ordchr.dll    implements the functions: ord, chr
- readdir.dll   implements the functions: readdir
- readfile.dll  implements the functions: readfile
- revoutput.dll implements the functions: revoutput
- revtwoway.dll implements the functions: revtwoway
- rwarray.dll   implements the functions: writea, reada
- rwarray0.dll  implements the functions: writea, reada
- time.dll      implements the functions: gettimeofday, sleep
- testext.dll   is used for testing gawk API for extensions

5. Almost all tests (more than 500) have been ported. The exceptions are a few highly platform-dependent tests:

poundbang, devfd devfd1 devfd2, pty1 pty2, timeout, fork fork2, ignrcas3

6. Full support for Windows XP (only the 32-bit version of gawk, built with MinGW.org).

7. Statically linked executable files and dynamic extensions: there are no external dependencies, so simple copying is enough for installation.



Internal changes to the codebase:

1. Avoiding the platform-dependent integer type "long" (8 bytes on 64-bit UNIX, but 4 bytes on 64-bit Windows). The awk_long_t type, whose size does not depend on the OS, was introduced in its place.
2. Using unsigned types wherever possible to represent non-negative values.
3. Ability to build with a C++ compiler.
4. Code cleanup: most warnings of C/C++ compilers in pedantic build mode have been eliminated or suppressed.
5. Support for 3 build environments: MinGW.org (gcc-32), mingw-64 (gcc-32/64, clang-32/64), MSVC (cl-32/64, clang-32/64).
6. Support for building with the compiler's static code analysis enabled (build.bat supports the "analyze" option).
7. The interface for dynamic extensions has been significantly redesigned: all standard C library functions are provided by the executable, which also holds the global state of the program (for example, locale settings).
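
The portability problem behind item 1 of the list above can be illustrated with a short C sketch. The type demo_long_t and the function fits_in_long are hypothetical names used only here; the actual definition of awk_long_t is in the fork's sources and may differ (in particular, the 64-bit width is an assumption of this example).

```c
#include <limits.h>
#include <stdint.h>

/* demo_long_t is a hypothetical stand-in for a fixed-size integer type.
   The real awk_long_t may be defined differently. */
typedef int64_t demo_long_t;

/* C11 compile-time check: the width is the same on every platform. */
_Static_assert(sizeof(demo_long_t) * CHAR_BIT == 64,
               "demo_long_t must be 64 bits wide");

/* Returns 1 if value can be stored in a plain long on this platform.
   For INT64_C(5000000000) this is true on 64-bit UNIX (8-byte long)
   and false on 64-bit Windows (4-byte long). */
static int fits_in_long(demo_long_t value)
{
    return value >= LONG_MIN && value <= LONG_MAX;
}
```

Code that stores such values in a fixed-size type behaves identically on both systems, whereas code that stores them in "long" silently depends on the platform.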



Building gawk from the sources.

1) The build is driven by the batch file build.bat, which must be run from a cmd.exe window.

2) Install any one of the following 3 build environments.

 1. MinGW.org (http://www.mingw.org/, installer - https://osdn.net/projects/mingw/downloads/68260/mingw-get-setup.exe/)

    After installing mingw-get, install gcc and mpfr:

    C:\MinGW\bin\mingw-get.exe install gcc
    C:\MinGW\bin\mingw-get.exe install mpfr

    Before building, set the following environment variables in the cmd.exe window:

    set "PATH=C:\MinGW\bin"
    set "PATH=%PATH%;%SystemRoot%\System32;%SystemRoot%;%SystemRoot%\System32\Wbem"

 2. mingw-64 (http://mingw-w64.org/doku.php, msys2 installer - https://sourceforge.net/projects/msys2/files/Base/x86_64/)

    After installing msys2, open an msys2 window and run:

    $ pacman -Syu

    After restarting the terminal, run again:

    $ pacman -Su

    Install at least one of the 4 sets of compilers:

    $ pacman -S mingw-w64-x86_64-gcc
    $ pacman -S mingw-w64-i686-gcc
    $ pacman -S mingw-w64-x86_64-clang  mingw-w64-x86_64-lld
    $ pacman -S mingw-w64-i686-clang    mingw-w64-i686-lld

    Before building, set the following environment variables in the cmd.exe window:

    - to build a 32-bit version of gawk:

    set "PATH=C:\msys64\mingw32\bin"
    set "PATH=%PATH%;%SystemRoot%\System32;%SystemRoot%;%SystemRoot%\System32\Wbem"

    - to build a 64-bit version of gawk:

    set "PATH=C:\msys64\mingw64\bin"
    set "PATH=%PATH%;%SystemRoot%\System32;%SystemRoot%;%SystemRoot%\System32\Wbem"

 3. MSVC (https://visualstudio.microsoft.com/).

    Install Visual Studio or Build Tools for Visual Studio.

    To use the clang compiler in the Visual Studio environment, make sure it is installed. To check, run the Visual Studio Installer:

    "C:\Program Files (x86)\Microsoft Visual Studio\Installer\vs_installer.exe"

    Please select
    Modify -> Individual components -> Search components -> clang -> C++ Clang Compiler for Windows

    Click "Modify" or, if clang is already installed, "Close".

    Set up the build environment by running the following commands in the cmd.exe window:

    for /f "tokens=1 delims==" %f in ('set ^| %SystemRoot%\System32\find.exe /i "Microsoft Visual Studio"') do set %f=
    set "PATH=%SystemRoot%\System32;%SystemRoot%;%SystemRoot%\System32\Wbem"

    to build a 32-bit version of gawk:

    - when using Build Tools for Visual Studio

    "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x86

    - when using Visual Studio Community

    "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x86

    or, to build a 64-bit version of gawk:

    - when using Build Tools for Visual Studio

    "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64

    - when using Visual Studio Community

    "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64

    To use clang, additionally set the following environment variables:

    set LLVMAR="%VCINSTALLDIR%Tools\Llvm\bin\llvm-ar.exe"
    set CLANGMSVC="%VCINSTALLDIR%Tools\Llvm\bin\clang.exe"
    set CLANGMSVCXX="%VCINSTALLDIR%Tools\Llvm\bin\clang++.exe"

    or, to use 64-bit versions of compilers:

    set LLVMAR="%VCINSTALLDIR%Tools\Llvm\x64\bin\llvm-ar.exe"
    set CLANGMSVC="%VCINSTALLDIR%Tools\Llvm\x64\bin\clang.exe"
    set CLANGMSVCXX="%VCINSTALLDIR%Tools\Llvm\x64\bin\clang++.exe"

3) Download three third-party projects and set the paths to them:

 https://github.com/mbuilov/mscrtx.git
 https://github.com/mbuilov/libutf16.git
 https://github.com/mbuilov/unicode_ctype.git

 If you unpack the .zip archives of these projects into C:\projects, set the following environment variables before building:

 set "MSCRTX=C:\projects\mscrtx-master"
 set "LIBUTF16=C:\projects\libutf16-master"
 set "UNICODE_CTYPE=C:\projects\unicode_ctype-master"

4) Now run build.bat in the gawk source directory.

 build.bat requires build parameters, which depend on the build environment, compiler, build type, bitness, etc.
 If build.bat is called without parameters, it prints brief help on the allowed parameters.

 Examples of running build.bat:

 - for MinGW.org

   build all gcc

   to build a debug version (for debugging, use C:\MinGW\bin\gdb.exe, which is installed with the command "C:\MinGW\bin\mingw-get.exe install gdb"):

   build all gcc c++ pedantic debug 32

 - for mingw-64

   build all clang 64

 - for MSVC

   build all cl c++ pedantic debug 32

   build all clang-msvc c++ pedantic 64

 Etc.

 The result of the build (in the "dist" directory) is a statically linked GNU Awk interpreter. To install it, just copy gawk.exe to the target computer.
 Dynamic extensions are installed the same way, by copying them into the directory that contains gawk.exe.



Running tests.

 Standard Windows programs are sufficient to test gawk; nothing additional is required.
 The only exception is the optional gawk networking tests, which require enabling the standard Windows component "Simple TCP/IP Services".

 The tests themselves are supplied with the gawk source code, in the "test" directory.
 test.bat is used to run the tests.

 test.bat assumes that the gawk instance under test is in the "dist" directory. This location can be overridden by setting the BLD_DIST environment variable, e.g.:

 set "BLD_DIST=C:\projects\gawk-windows\dist"

 By default, running test.bat without parameters executes all tests.
 However, you can also specify which test suite to run.
 A list of test suites can be found by running:

 test.bat help



The author of this fork is Michael M. Builov ([email protected]).

The changes made are published under the same license as the original GNU Awk source code: GNU GPLv3.
