
Comments (3)

kelson42 commented on September 16, 2024

Comment by Brion:

Can this be done with the standard POSIX file APIs? Or do we need to use Win32 APIs for file access? :P

(Use of legacy 8-bit encodings in POSIX APIs is a problem for PHP, and thus MediaWiki, when running on Windows servers as well.)

Sounds like you gotta use the Win32 wide-char APIs to get Unicode filename support... at least that's what glib does on Windows for this reason -- https://developer.gnome.org/glib/2.26/glib-File-Utilities.html#glib-File-Utilities.description
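If file names were kept as UTF-8 in std::string (the convention glib uses), they would have to be converted to UTF-16 before being handed to a wide-char API. The following is only a minimal, portable sketch of that conversion; `utf8ToUtf16` is a hypothetical helper, not part of libzim, and on Windows the Win32 function `MultiByteToWideChar` with `CP_UTF8` would normally do this job instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of libzim: decodes UTF-8 bytes into UTF-16
// code units. Simplified on purpose: it rejects truncated sequences and bad
// continuation bytes, but does not check for overlong encodings or for
// surrogate code points encoded directly in UTF-8.
std::u16string utf8ToUtf16(const std::string& in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size())
  {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    char32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    if (extra > in.size() - i - 1)
      throw std::invalid_argument("truncated UTF-8 sequence");

    for (std::size_t k = 1; k <= extra; ++k)
    {
      const unsigned char c = static_cast<unsigned char>(in[i + k]);
      if ((c & 0xC0) != 0x80)
        throw std::invalid_argument("invalid UTF-8 continuation byte");
      cp = (cp << 6) | (c & 0x3F);
    }
    i += extra + 1;

    if (cp > 0x10FFFF)
      throw std::invalid_argument("code point out of range");

    if (cp < 0x10000)
    {
      out.push_back(static_cast<char16_t>(cp));
    }
    else
    {
      cp -= 0x10000;  // split into a high/low surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

For example, the two UTF-8 bytes 0xC3 0xA9 ("é") become the single code unit 0x00E9, and a code point above U+FFFF becomes a surrogate pair.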

from libzim.

kelson42 commented on September 16, 2024

lvandeve wrote:

Hello,

I have done a little bit of research on this (I must admit it was on a Linux computer however).

I hope this may help at least with fixing this issue.

Most of File, FileImpl, zim::ifstream and zim::streambuf work with std::string. There is one intermediate string -> const char* -> string conversion in FileImpl, which the attached patch removes (so that std::string is used everywhere). This is to hopefully support the NUL characters that may appear among the octets of a UTF-16 string stored in an 8-bit char string.
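To illustrate why that intermediate conversion matters, here is a small sketch. Both function names are hypothetical, and the UTF-16LE byte order is an assumption based on what Windows uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: stores UTF-16LE code units as raw bytes in a
// std::string, the way the patch comment assumes Kiwix passes file names.
std::string packUtf16le(const std::u16string& s)
{
  std::string bytes;
  bytes.reserve(s.size() * 2);
  for (char16_t unit : s)
  {
    bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
    bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
  }
  return bytes;
}

// Simulates the old FileImpl(const char*) path:
// string -> const char* -> string.
std::string roundTripThroughCString(const std::string& s)
{
  return std::string(s.c_str());  // copying stops at the first NUL byte
}
```

With the name "a.zim", `packUtf16le` yields 10 bytes, every second one NUL. After `roundTripThroughCString` only the first byte ("a") survives, whereas passing the std::string by reference keeps all 10.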

In addition, the patch adds a comment at the locations in fstream.cpp where a different Windows API probably needs to be called.

I think that this way, file.h does not need to be changed to have wchar_t or wstring constructors.

The patch follows:


diff --git a/zimlib/include/zim/file.h b/zimlib/include/zim/file.h
index a6ac75b..8aa5f0d 100644
--- a/zimlib/include/zim/file.h
+++ b/zimlib/include/zim/file.h
@@ -39,7 +39,7 @@ namespace zim
       File()
         { }
       explicit File(const std::string& fname)
-        : impl(new FileImpl(fname.c_str()))
+        : impl(new FileImpl(fname))
         { }
 
       const std::string& getFilename() const   { return impl->getFilename(); }
diff --git a/zimlib/include/zim/fileimpl.h b/zimlib/include/zim/fileimpl.h
index 1cf584d..ada65ee 100644
--- a/zimlib/include/zim/fileimpl.h
+++ b/zimlib/include/zim/fileimpl.h
@@ -53,7 +53,7 @@ namespace zim
       offset_type getOffset(offset_type ptrOffset, size_type idx);
 
     public:
-      explicit FileImpl(const char* fname);
+      explicit FileImpl(const std::string& fname);
 
       time_t getMTime() const   { return zimFile.getMTime(); }
 
diff --git a/zimlib/src/fileimpl.cpp b/zimlib/src/fileimpl.cpp
index 8c072eb..d7a7f4a 100644
--- a/zimlib/src/fileimpl.cpp
+++ b/zimlib/src/fileimpl.cpp
@@ -38,7 +38,7 @@ namespace zim
   //////////////////////////////////////////////////////////////////////
   // FileImpl
   //
-  FileImpl::FileImpl(const char* fname)
+  FileImpl::FileImpl(const std::string& fname)
     : zimFile(fname),
       direntCache(envValue("ZIM_DIRENTCACHE", DIRENT_CACHE_SIZE)),
       clusterCache(envValue("ZIM_CLUSTERCACHE", CLUSTER_CACHE_SIZE))
diff --git a/zimlib/src/fstream.cpp b/zimlib/src/fstream.cpp
index 5ce72f5..e4accf8 100644
--- a/zimlib/src/fstream.cpp
+++ b/zimlib/src/fstream.cpp
@@ -59,6 +59,24 @@ namespace zim
   //
   streambuf::OpenfileInfo::OpenfileInfo(const std::string& fname_)
     : fname(fname_),
+// I think regular std::string with 8-bit characters
+// is OK up to this point, that is, in the ctor and fields of File,
+// FileImpl, ifstream, streambuf etc... (that is, no wchar_t or wstring
+// necessary there and no API change needed), the string just may have
+// NULL-characters if bytes of UTF-16 are present, but std::string
+// supports that (const char* does not).
+// So c_str() or any C/C++ API that uses C strings should not be used at
+// this point or anywhere before.
+// So here a Windows-specific API that supports UTF-16 is needed for
+// WIN32, and the fname string needs to converted here to what it
+// needs if necessary. That is still TODO, hence the comment here.
+// Apparently the fname received here on Windows is UTF-16 encoded with
+// two chars per wchar_t, if used by Kiwix (the other possibility would
+// have been that it was UTF-8 encoded and needed to be converted to
+// UTF-16). Maybe a comment in the File constructor could also be nice
+// to ensure all users will be consistent with this?
+// This concludes my investigation, I hope it can help fixing the
+// issue :)
 #ifdef HAVE_OPEN64
       fd(::open64(fname.c_str(), O_RDONLY | O_LARGEFILE | O_BINARY))
 #else
@@ -293,6 +311,8 @@ time_t streambuf::getMTime() const
   if (mtime || files.empty())
     return mtime;
 
+// See comment under streambuf::OpenfileInfo::OpenfileInfo about UTF-16:
+// the same applies here as well to the c_str() call and stat.
   const char* fname = files.front()->fname.c_str();
 
 #ifdef HAVE_STAT64
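
The TODO left in the patch comment could perhaps be filled in along the following lines. This is only a sketch under the patch's own assumption that, on Windows, fname holds UTF-16LE bytes (two chars per wchar_t). `unpackUtf16le` and `openReadOnly` are hypothetical names, not libzim functions; `_wopen` is the wide-char counterpart of open in the Microsoft C runtime. The Windows branch has not been tested here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <fcntl.h>
#ifdef _WIN32
#include <io.h>      // _wopen
#else
#include <unistd.h>  // close, for callers that release the descriptor
#endif

// Hypothetical helper: rebuilds 16-bit code units from a byte string that
// holds UTF-16LE data. A trailing odd byte, if any, is ignored.
std::u16string unpackUtf16le(const std::string& bytes)
{
  std::u16string units;
  units.reserve(bytes.size() / 2);
  for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
  {
    const unsigned lo = static_cast<unsigned char>(bytes[i]);
    const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
    units.push_back(static_cast<char16_t>(lo | (hi << 8)));
  }
  return units;
}

// Hypothetical replacement for the ::open call in OpenfileInfo.
// Returns a file descriptor, or -1 on failure; the caller closes it.
int openReadOnly(const std::string& fname)
{
#ifdef _WIN32
  // Assumption from the patch comment: fname is UTF-16LE on Windows.
  const std::u16string units = unpackUtf16le(fname);
  const std::wstring wide(units.begin(), units.end());  // wchar_t is 16-bit here
  return ::_wopen(wide.c_str(), _O_RDONLY | _O_BINARY);
#else
  // POSIX file names are plain byte strings, so c_str() is fine.
  return ::open(fname.c_str(), O_RDONLY);
#endif
}
```

The getMTime() hunk would need the same treatment, for instance with `_wstat64` in place of stat64 on Windows.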


kelson42 commented on September 16, 2024

@mgautierfr You will probably need to look at this for kiwix-desktop 2.0.

