Comments (22)
Looks good now!
from hledger-flow.
This looks like an issue related to the concurrent processing of statements, which I thought I had fixed.
Look at this commit message:
6822508
Does your output look similar?
You can run the command using this data to avoid sharing private information:
https://github.com/apauley/hledger-flow-example
The only unexpected version in your setup is the GHC version, which is old.
Did you compile using a different stack.yaml, or specify that the system GHC should be used?
Please download the binary release that I have compiled using GHC 8.6.4 and compare:
https://github.com/apauley/hledger-flow/releases/tag/v0.11-beta
from hledger-flow.
Another thing, the example finances repo contains a submodule, so before you can run the command against the sample data you'll have to:
git submodule init
git submodule update
https://github.com/apauley/hledger-flow#after-cloning-this-repository
from hledger-flow.
I'm adding an import step to CI so I can see the output on more machines.
Here is an example build (look for the hledger-flow import step):
https://circleci.com/gh/apauley/hledger-flow/56
Looks OK to me, I haven't been able to reproduce the issue.
Waiting for @lestephane to provide more info, e.g. if this happens when compiled using GHC 8.6.4
from hledger-flow.
It is unreproducible with my version of hledger-flow (the one with the old ghc) and your examples repo.
I'm still stumped.
from hledger-flow.
I'm using the GHC from the ubuntu 16.04 repo. Wouldn't want to mess with that if I can help it.
from hledger-flow.
> Look at this commit message:
> 6822508
The mangled output does indeed look similar
from hledger-flow.
Your examples repo only deals with one financial institution. I have two.
Could it be that you've fixed the concurrency problem for the one-institution+multiple-statements case, but that the problem still exists for the multiple-institutions+multiple-statements case?
from hledger-flow.
> I'm using the GHC from the ubuntu 16.04 repo. Wouldn't want to mess with that if I can help it.
Stack uses its own version of GHC that doesn't interfere with the system GHC. It is specified by the resolver in stack.yaml, so unless you're modifying the build instructions, stack should compile with GHC 8.6.4, which it downloads itself.
You can check the GHC version by running:
stack exec -- ghc --version
from hledger-flow.
> It is unreproducible with my version of hledger-flow (the one with the old ghc) and your examples repo.
And if you run it with the version that I compiled on your personal data?
Could the mangled output be the output from the preprocess scripts?
What do you use for those scripts?
hledger-flow converts the input in parallel, so the preprocess scripts and other commands will be running concurrently.
I haven't seen mangled output coming from these external scripts myself, but I guess it is still possible.
The example repo uses a Python preprocess script.
from hledger-flow.
> Your examples repo only deals with one financial institution. I have two.
> Could it be that you've fixed the concurrency problem for the one-institution+multiple-statements case, but that the problem still exists for the multiple-institutions+multiple-statements case?
I don't think this is the issue - I'm also running hledger-flow against data with multiple owners, each with multiple institutions and accounts over many years, and I don't see it there.
from hledger-flow.
If we go with the theory that hledger-flow's parallel processing doesn't always play well with external script output, then there are some options we can consider.
At the moment hledger-flow doesn't do anything with the external script output; it leaves that entirely to those scripts. This is nice in the sense that users will see the output from those scripts as it happens.
But we could also change it and let hledger-flow capture the output, and then produce the output only after the script has exited.
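As a rough illustration of the capture-then-emit option, here is a minimal Python sketch. It is not hledger-flow's Haskell implementation; `run_captured` and `run_all` are hypothetical names, and it assumes each script's output is small enough to hold in memory until the script exits.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

# Serialises writes so one script's captured output is emitted as a single unit.
_print_lock = threading.Lock()


def run_captured(cmd):
    """Run cmd, hold back its stdout/stderr until it exits, then emit both."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with _print_lock:
        sys.stdout.write(result.stdout)
        sys.stderr.write(result.stderr)
        sys.stdout.flush()
        sys.stderr.flush()
    return result


def run_all(cmds, workers=4):
    """Run the commands concurrently; each one's output stays contiguous."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_captured, cmds))
```

The trade-off is the one described above: output can no longer be mangled, but the user only sees it once each script has finished.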
@lestephane I'd like to reproduce the issue on my end, so I'm keen to know more about what you use for preprocess scripts.
from hledger-flow.
For the preprocess script I was using Python (a script I admittedly don't need if I use .rules correctly):
#!/usr/bin/env python3
import sys, csv
import os.path
from datetime import datetime

def sanitize(infile, outfile):
    csv_reader = csv.reader(infile, dialect='excel', delimiter=';')
    csv_writer = csv.writer(outfile, dialect='unix', delimiter=',')
    seen_first_row = False
    for row in csv_reader:
        if not seen_first_row:
            seen_first_row = True
            csv_writer.writerow(row)
            continue
        date = row[0]
        when = datetime.strptime(date, "%Y %B %d")
        row[0] = datetime.strftime(when, "%Y-%m-%d")
        csv_writer.writerow(row)
    infile.close()
    outfile.close()

def parse_args(args):
    if len(args) < 2:
        usage = "USAGE: {0} statement.csv [output.csv]".format(os.path.basename(args[0]))
        print(usage, file=sys.stderr)
        sys.exit(1)
    else:
        infile = open(args[1])
        if len(args) == 2 or args[2] == '-':
            outfile = sys.stdout
        else:
            outfile = open(args[2], 'w')
    return infile, outfile

if __name__ == '__main__':
    inputfile, outputfile = parse_args(sys.argv)
    sanitize(inputfile, outputfile)
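To make the date handling concrete, the conversion applied to the first column of each data row can be isolated as below. This is only an illustration: `to_iso` is a hypothetical helper, the sample date is made up, and `%B` assumes English month names (Python's default C locale).

```python
from datetime import datetime


def to_iso(date_text):
    """Convert a date written like '2016 April 28' into ISO form."""
    when = datetime.strptime(date_text, "%Y %B %d")
    return when.strftime("%Y-%m-%d")


# Hypothetical sample value, not taken from a real statement.
print(to_iso("2016 April 28"))
```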
from hledger-flow.
Did your concurrency fix also cover standard error (STDERR) mangling, not just standard output (STDOUT)?
from hledger-flow.
> Did your concurrency fix also cover standard error (STDERR) mangling, not just standard output (STDOUT)?
It handles both stdout and stderr, but only for its own output, not for that of external scripts:
3d28857#diff-0eb3276dd6fe22479e50626f9d79774fR72
I'll try out some options to make this better in a few days.
One option is to capture the external scripts' stdout and stderr and make it part of hledger-flow's output.
Another option is to add a command-line switch that disables parallel processing, to help with debugging issues like these - issue #12
from hledger-flow.
I'm rather clueless about Haskell, but if the scripts' output is indeed just forwarded as-is to the terminal without any line buffering, that would cause the problem. A command-line switch to disable concurrency would work for me.
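For what it's worth, line-by-line forwarding could look roughly like the Python sketch below. It is an assumption about one possible approach, not a description of what hledger-flow does; `forward_lines` and the `label` prefix are hypothetical. Each complete line from the child process is written under a lock, so lines from concurrently running scripts can interleave with each other but never split mid-line.

```python
import subprocess
import sys
import threading

# Shared by all forwarding threads so that writes never overlap.
_write_lock = threading.Lock()


def forward_lines(cmd, label, out=sys.stdout):
    """Run cmd and forward its stdout one complete line at a time."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            with _write_lock:
                out.write(f"[{label}] {line}")
                out.flush()
    # Leaving the with-block waits for the process, so returncode is set.
    return proc.returncode
```

Calling `forward_lines` from several threads, one per script, keeps the output live while still preventing mangled lines.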
from hledger-flow.
@lestephane I've released version 0.11.1, which has a --sequential flag:
https://github.com/apauley/hledger-flow/releases/tag/v0.11.1.0-beta
Can you test if this solves the mangled output issue?
This is what it looks like on the example data:
$ hledger-flow import --sequential --verbose
2019-04-14 22:38:52.755648259 SAST hledger-flow Starting import
Collecting input files...
Found 3 input files in 0.106195961s. Proceeding with import...
2019-04-14 22:38:52.86193147 SAST hledger-flow Found a preprocess file at 'import/gawie/bogart/cheque/preprocess'
2019-04-14 22:38:52.86199349 SAST hledger-flow Begin: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-04-28.csv'
2019-04-14 22:38:52.886898512 SAST hledger-flow End: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-04-28.csv' ExitSuccess (0.024896382s)
2019-04-14 22:38:52.886963452 SAST hledger-flow Did not find a construct file at 'import/gawie/bogart/cheque/construct'
2019-04-14 22:38:52.887060572 SAST hledger-flow Begin: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-04-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules'
2019-04-14 22:38:53.059614055 SAST hledger-flow End: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-04-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules' ExitSuccess (0.172543803s)
2019-04-14 22:38:53.059679085 SAST hledger-flow Found a preprocess file at 'import/gawie/bogart/cheque/preprocess'
2019-04-14 22:38:53.059721615 SAST hledger-flow Begin: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-05-28.csv'
2019-04-14 22:38:53.083864217 SAST hledger-flow End: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-05-28.csv' ExitSuccess (0.024135742s)
2019-04-14 22:38:53.083935607 SAST hledger-flow Did not find a construct file at 'import/gawie/bogart/cheque/construct'
2019-04-14 22:38:53.084022167 SAST hledger-flow Begin: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-05-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules'
2019-04-14 22:38:53.261712753 SAST hledger-flow End: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-05-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules' ExitSuccess (0.177682536s)
2019-04-14 22:38:53.261788333 SAST hledger-flow Found a preprocess file at 'import/gawie/bogart/cheque/preprocess'
2019-04-14 22:38:53.261828203 SAST hledger-flow Begin: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-03-30.csv'
2019-04-14 22:38:53.286723964 SAST hledger-flow End: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-03-30.csv' ExitSuccess (0.024890661s)
2019-04-14 22:38:53.286792934 SAST hledger-flow Did not find a construct file at 'import/gawie/bogart/cheque/construct'
2019-04-14 22:38:53.286883454 SAST hledger-flow Begin: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-03-30.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules'
2019-04-14 22:38:53.459486947 SAST hledger-flow End: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-03-30.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules' ExitSuccess (0.172595953s)
2019-04-14 22:38:53.459847977 SAST hledger-flow Looking for possible extra include files for 'import/gawie/bogart/cheque/2016-include.journal' among these 2 options: ["import/gawie/bogart/cheque/2016-opening.journal","import/gawie/bogart/cheque/_manual_/2016/pre-import.journal"]. Found 1: ["import/gawie/bogart/cheque/2016-opening.journal"]
2019-04-14 22:38:53.459891177 SAST hledger-flow Looking for possible extra include files for 'import/gawie/bogart/cheque/2016-include.journal' among these 2 options: ["import/gawie/bogart/cheque/2016-closing.journal","import/gawie/bogart/cheque/_manual_/2016/post-import.journal"]. Found 0: []
2019-04-14 22:38:53.460122558 SAST hledger-flow Looking for possible extra include files for 'import/gawie/bogart/2016-include.journal' among these 2 options: ["import/gawie/bogart/2016-opening.journal","import/gawie/bogart/_manual_/2016/pre-import.journal"]. Found 0: []
2019-04-14 22:38:53.460160128 SAST hledger-flow Looking for possible extra include files for 'import/gawie/bogart/2016-include.journal' among these 2 options: ["import/gawie/bogart/2016-closing.journal","import/gawie/bogart/_manual_/2016/post-import.journal"]. Found 0: []
2019-04-14 22:38:53.460367028 SAST hledger-flow Looking for possible extra include files for 'import/gawie/2016-include.journal' among these 2 options: ["import/gawie/2016-opening.journal","import/gawie/_manual_/2016/pre-import.journal"]. Found 0: []
2019-04-14 22:38:53.460392258 SAST hledger-flow Looking for possible extra include files for 'import/gawie/2016-include.journal' among these 2 options: ["import/gawie/2016-closing.journal","import/gawie/_manual_/2016/post-import.journal"]. Found 0: []
2019-04-14 22:38:53.460578048 SAST hledger-flow Looking for possible extra include files for 'import/2016-include.journal' among these 2 options: ["import/2016-opening.journal","import/_manual_/2016/pre-import.journal"]. Found 0: []
2019-04-14 22:38:53.460770778 SAST hledger-flow Looking for possible extra include files for 'import/2016-include.journal' among these 2 options: ["import/2016-closing.journal","import/_manual_/2016/post-import.journal"]. Found 0: []
Imported 3 journals in 0.705221059s
See how the Begin: and End: lines are grouped.
With parallel processing all the Begin: lines are shown together at the same time, and the End: lines only later when they finish.
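The difference in ordering can be reproduced with a small Python sketch. It does not use hledger-flow at all: `run_jobs` is a hypothetical function, and the barrier is an artificial stand-in for work that takes longer than starting the jobs does, which forces every job to begin before any of them ends.

```python
import threading


def run_jobs(names, parallel):
    """Return the Begin/End events in the order they were logged."""
    events = []
    lock = threading.Lock()
    # In parallel mode every job waits here until all jobs have begun.
    barrier = threading.Barrier(len(names)) if parallel else None

    def job(name):
        with lock:
            events.append(f"Begin: {name}")
        if barrier is not None:
            barrier.wait()
        with lock:
            events.append(f"End: {name}")

    if parallel:
        threads = [threading.Thread(target=job, args=(n,)) for n in names]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    else:
        for n in names:
            job(n)
    return events
```

With `parallel=False` each Begin is immediately followed by its End; with `parallel=True` all Begin events come first, in whatever order the threads happened to run.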
from hledger-flow.
I'll use the --sequential flag all the time. To save typing, I use a wrapper:
$ cat $(which hlflow)
#!/usr/bin/env bash
set -euxo pipefail
if echo "$@" | awk "/import/{exit 1}"; then
    hledger-flow "$@"
else
    hledger-flow "$@" --sequential
fi
I'm experiencing some other usability issues which I need to report before confirming
from hledger-flow.
--sequential helps, thanks
from hledger-flow.
@lestephane Can you test if the latest release has any effect on the mangled output?
Especially when parallel processing is enabled.
I saw there were some output statements that didn't use the concurrency-safe functions, so I changed them.
If this doesn't make it better then I'll continue with capturing the output of the scripts.
https://github.com/apauley/hledger-flow/releases/tag/v0.11.1.1-beta
from hledger-flow.
@lestephane Just out of interest: hledger-flow now captures the output of external processes, and outputs it safely after the process has exited. But only when doing parallel processing - when doing sequential processing it lets the external process display its own output as it runs.
I'm keen to know if you see any more mangled output, especially when importing in parallel.
If you do see anything, please re-open this issue. Thanks!
from hledger-flow.
No problem, I haven't seen any mangled output so far. Running in parallel from now on
from hledger-flow.