Comments (7)
This is also reproducible with in-tree input, for instance with test/enc/enc001.txt. It means that make test fails on all Linux platforms I tried.
I meant to take a better look at this, and was tracking it (and have some extra info) here:
https://trello.com/c/EgR0JZaK/1180-pict-broken-with-utf-16-possibly-others
from pict.
An update here. I quickly debugged this and found that reading lines from a Unicode-encoded file is broken:
diff --git a/cli/mparser.cpp b/cli/mparser.cpp
index 4e12245..46ba5f3 100644
--- a/cli/mparser.cpp
+++ b/cli/mparser.cpp
@@ -107,6 +107,7 @@ bool readLineFromFile( wifstream& file, wstring& line )
if( file.eof()
|| c == L'\n'
|| c == 0 ) return( true );
+ // execution never gets here, no line is ever read
line += c;
}
And this seems to be where the execution loops forever. I'll follow up with a fix.
from pict.
On Linux, I've traced this all the way to libstdc++. The file.get() call ultimately reaches the stream buffer's sbumpc(). There, the current buffer position (gptr()) is always equal to the end-of-buffer pointer (egptr()). A snippet from where this is checked:
/**
 *  @brief  Getting the next character.
 *  @return  The next character, or eof.
 *
 *  If the input read position is available, returns that character
 *  and increments the read pointer, otherwise calls and returns
 *  @c uflow().
 */
int_type
sbumpc()
{
  int_type __ret;
  if (__builtin_expect(this->gptr() < this->egptr(), true))
    {
      __ret = traits_type::to_int_type(*this->gptr());
      this->gbump(1);
    }
  else
    __ret = this->uflow();
  return __ret;
}
At this point, I imagine the libstdc++ code is not that naive and buggy, so I tend to believe that something else is required for the buffer to operate correctly. I've briefly looked at imbue and locales regarding the stream, but I really need a better grasp on the fundamentals here.
from pict.
Thanks for trying to get to the bottom of this. Really appreciate it.
from pict.
Some news here: wifstreams do operate differently when a locale is set via imbue. With a simple hack such as this:
$ git diff
diff --git a/cli/mparser.cpp b/cli/mparser.cpp
index 4e12245..43bf28e 100644
--- a/cli/mparser.cpp
+++ b/cli/mparser.cpp
@@ -1,5 +1,6 @@
#include <fstream>
#include <sstream>
+#include <locale>
#include "model.h"
using namespace std;
@@ -436,6 +437,8 @@ bool CModelData::readModel( const wstring& filePath )
return( false );
}
+ locale loc(locale("en_US.UTF-8"));
+ file.imbue(loc);
wstring line;
// read definition of parameters
pict can then parse the model in the test/enc/enc003.txt file:
$ ./pict test/enc/enc003.txt
A B C
a 1 x
a 3 y
c 1 z
b 1 y
c 2 x
b 2 z
a 2 y
b 3 x
a 3 z
c 3 y
But this "correct" operation is dependent on the content of the file being read (which in this case is UTF-8 with a BOM). Based on my experiments, at least on GNU/Linux, wifstream does not seem to be a good framework for building a getEncodingType() function. ifstream (or even fopen(), open(), etc.), on the other hand, is immune to locales, BOMs, etc.
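To illustrate the byte-oriented approach, here is a rough sketch of BOM sniffing with a plain ifstream opened in binary mode. The enum Bom and the function detectBom are hypothetical names for this example; PICT's actual getEncodingType() may look quite different.

```cpp
#include <fstream>
#include <string>

// Hypothetical names for illustration only.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Inspect the first bytes of a file without any locale or wide-stream
// conversion. An unreadable or empty file is reported as Bom::None.
// Note: a UTF-32LE BOM (FF FE 00 00) would be reported as Utf16LE here.
Bom detectBom(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    unsigned char b[3] = {0, 0, 0};
    file.read(reinterpret_cast<char*>(b), sizeof b);
    const std::streamsize n = file.gcount();  // bytes actually read

    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return Bom::Utf8;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return Bom::Utf16LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return Bom::Utf16BE;
    return Bom::None;
}
```

Because the stream is narrow and binary, the result depends only on the bytes on disk, not on the process locale.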
To summarize, it looks like this will need more than a quick hack.
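As a side note, a variant of the imbue hack that does not depend on a named system locale being installed could look like the sketch below. It swaps in std::codecvt_utf8 (deprecated since C++17, but still shipped by libstdc++) instead of locale("en_US.UTF-8"). The helper name readFirstLineUtf8 is hypothetical, and the sketch assumes a 32-bit wchar_t as on GNU/Linux. It still only handles UTF-8 input, so it does not address the general problem.

```cpp
#include <codecvt>   // std::codecvt_utf8, std::consume_header
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: read the first line of a UTF-8 file (with or
// without a BOM) into a wstring. Assumes wchar_t can hold any code point.
bool readFirstLineUtf8(const std::string& path, std::wstring& line)
{
    std::wifstream file;
    // Imbue before opening so the conversion facet is in place for the
    // very first read. The locale takes ownership of the facet.
    file.imbue(std::locale(file.getloc(),
        new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>));
    file.open(path, std::ios::binary);
    if (!file) return false;
    return static_cast<bool>(std::getline(file, line));
}
```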
from pict.
Thanks for looking into this, Cleber. I poked around a bit as well. The differences between platforms are annoying. If you ever wondered why readLineFromFile reads each character in a loop, it is because a long time ago getline() behaved differently on Windows and MacOS, and reading text char-by-char was the least common denominator that worked. I suppose getline isn't quite working these days either.
The reliance on BOMs is a partial solution at best. It might be time to start passing the input locale/encoding to PICT explicitly as a param:
pict.exe model.txt -l "en_US.UTF-8"
and use that for anything other than ANSI or files with the couple of already supported BOMs.
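A sketch of how such an option value might be turned into a std::locale, falling back to the classic "C" locale when the name is not available on the system. The -l flag is only a proposal in this thread, and localeFromOption is a hypothetical helper, not existing PICT code.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper for a proposed "-l <locale>" option.
// Sets ok to false and returns the classic locale if the requested
// locale name cannot be constructed on this system.
std::locale localeFromOption(const std::string& name, bool& ok)
{
    try
    {
        std::locale loc(name.c_str());  // throws std::runtime_error if unknown
        ok = true;
        return loc;
    }
    catch (const std::runtime_error&)
    {
        ok = false;
        return std::locale::classic();
    }
}
```

The resulting locale could then be passed to file.imbue() before the model file is read, with the ok flag used to warn the user about an unknown locale name.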
from pict.
I think this issue is fixed by #60.
u8_rus.txt works well with #60.
from pict.