Comments (22)

lestephane avatar lestephane commented on June 12, 2024 1

Looks good now!

from hledger-flow.

apauley avatar apauley commented on June 12, 2024

This looks like an issue related to the concurrent processing of statements, which I thought I had fixed.

Look at this commit message:
6822508

Does your output look similar?

You can run the command using this data to avoid sharing private information:
https://github.com/apauley/hledger-flow-example

The only unexpected version in your setup is the GHC version, which is old.
Did you compile using a different stack.yaml, or specify that the system GHC should be used?

Please download the binary release that I have compiled using GHC 8.6.4 and compare:
https://github.com/apauley/hledger-flow/releases/tag/v0.11-beta

apauley avatar apauley commented on June 12, 2024

Another thing: the example finances repo contains a submodule, so before you can run the command against the sample data you'll have to run:

git submodule init
git submodule update

https://github.com/apauley/hledger-flow#after-cloning-this-repository

apauley avatar apauley commented on June 12, 2024

I'm adding an import step to CI so I can see the output on more machines.

Here is an example build (look for the hledger-flow import step):
https://circleci.com/gh/apauley/hledger-flow/56

Looks OK to me; I haven't been able to reproduce the issue.

Waiting for @lestephane to provide more info, e.g. whether this also happens when compiled using GHC 8.6.4.

lestephane avatar lestephane commented on June 12, 2024

It is unreproducible with my version of hledger-flow (the one with the old ghc) and your examples repo.
I'm still stumped.

lestephane avatar lestephane commented on June 12, 2024

I'm using the GHC from the Ubuntu 16.04 repo. Wouldn't want to mess with that if I can help it.

lestephane avatar lestephane commented on June 12, 2024

> Look at this commit message:
> 6822508

The mangled output does indeed look similar.

lestephane avatar lestephane commented on June 12, 2024

Your examples repo only deals with one financial institution. I have two.
Could it be that you've fixed the concurrency problem for the one-institution+multiple-statements case, but that the problem still exists for the multiple-institutions+multiple-statements case?

apauley avatar apauley commented on June 12, 2024

> I'm using the GHC from the ubuntu 16.04 repo. Wouldn't want to mess with that if I can help it.

Stack uses its own version of GHC, which doesn't interfere with the system GHC. The version is determined by the resolver in stack.yaml, so unless you've modified the build instructions, stack should compile with GHC 8.6.4, which it downloads itself.

You can check the GHC version by running:
stack exec -- ghc --version

apauley avatar apauley commented on June 12, 2024

> It is unreproducible with my version of hledger-flow (the one with the old ghc) and your examples repo.

And what happens if you run the version that I compiled against your personal data?

Could the mangled output be the output from the preprocess scripts?
What do you use for those scripts?

hledger-flow converts the input in parallel, so the preprocess scripts and other commands will be running concurrently.
I haven't seen mangled output coming from these external scripts myself, but I guess it is still possible.

The example repo uses a Python preprocess script.

apauley avatar apauley commented on June 12, 2024

> Your examples repo only deals with one financial institution. I have two.
> Could it be that you've fixed the concurrency problem for the one-institution+multiple-statements case, but that the problem still exists for the multiple-institutions+multiple-statements case?

I don't think this is the issue. I'm also running hledger-flow against data with multiple owners, each with multiple institutions and accounts over many years, and I don't see it there.

apauley avatar apauley commented on June 12, 2024

If we go with the theory that hledger-flow's parallel processing doesn't always play well with external script output, then there are some options we can consider.

At the moment hledger-flow doesn't do anything with the external script output; it leaves that entirely to those scripts. This is nice in the sense that users will see the output from those scripts as it happens.

But we could also change it and let hledger-flow capture the output, and then produce the output only after the script has exited.

@lestephane I'd like to reproduce the issue on my end, so I'm keen to know more about what you use for preprocess scripts.

lestephane avatar lestephane commented on June 12, 2024

For the preprocess script I was using Python (a script I admittedly don't need if I use .rules correctly):

#!/usr/bin/env python3

import sys, csv
import os.path
from datetime import datetime


def sanitize(infile, outfile):
    # Input is semicolon-delimited; the 'unix' dialect writes comma-delimited
    # output with every field quoted and '\n' line endings.
    csv_reader = csv.reader(infile, dialect='excel', delimiter=';')
    csv_writer = csv.writer(outfile, dialect='unix', delimiter=',')

    seen_first_row = False

    for row in csv_reader:
        # Pass the header row through unchanged.
        if not seen_first_row:
            seen_first_row = True
            csv_writer.writerow(row)
            continue

        # Rewrite dates such as "2016 April 28" as ISO 8601 ("2016-04-28").
        date = row[0]
        when = datetime.strptime(date, "%Y %B %d")

        row[0] = when.strftime("%Y-%m-%d")

        csv_writer.writerow(row)

    infile.close()
    outfile.close()


def parse_args(args):
    if len(args) < 2:
        usage = "USAGE: {0} statement.csv [output.csv]".format(os.path.basename(args[0]))
        print(usage, file=sys.stderr)
        sys.exit(1)
    else:
        # newline='' lets the csv module handle line endings itself.
        infile = open(args[1], newline='')
        if len(args) == 2 or args[2] == '-':
            outfile = sys.stdout
        else:
            outfile = open(args[2], 'w', newline='')
        return infile, outfile


if __name__ == '__main__':
    inputfile, outputfile = parse_args(sys.argv)
    sanitize(inputfile, outputfile)

lestephane avatar lestephane commented on June 12, 2024

Did your concurrency fix also cover standard error (STDERR) mangling, not just standard output (STDOUT)?

apauley avatar apauley commented on June 12, 2024

> Did your concurrency fix also cover standard error (STDERR) mangling, not just standard output (STDOUT)?

It handles both stdout and stderr, but only for its own output, not for external scripts:
3d28857#diff-0eb3276dd6fe22479e50626f9d79774fR72

I'll try out some options to make this better in a few days.
One option is capturing external script stdout and stderr and making it part of hledger-flow output.

Another option is to add a command-line switch that disables parallel processing, to help with debugging issues like these - issue #12

lestephane avatar lestephane commented on June 12, 2024

I'm rather clueless about Haskell, but if the scripts' output is indeed just forwarded as-is to the terminal without any line buffering, that would cause the problem. A command-line switch to disable concurrency would work for me.
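
The interleaving suspected here can be illustrated with a small deterministic simulation. This is a sketch only, not hledger-flow code: the function names (`chunks`, `forward_unbuffered`, `forward_line_buffered`) and the sample lines are hypothetical, and a round-robin loop stands in for a real scheduler. It assumes each script's output reaches the terminal in small partial writes.

```python
from itertools import zip_longest


def chunks(text, size):
    """Split text into pieces of at most `size` characters, as partial writes might."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def forward_unbuffered(streams):
    """Forward chunks to a shared sink in round-robin order, with no buffering."""
    sink = []
    for group in zip_longest(*streams, fillvalue=""):
        sink.extend(group)
    return "".join(sink)


def forward_line_buffered(streams):
    """Hold each stream's chunks back until a full line is available, then emit it whole."""
    sink = []
    pending = [""] * len(streams)
    for group in zip_longest(*streams, fillvalue=""):
        for i, chunk in enumerate(group):
            pending[i] += chunk
            while "\n" in pending[i]:
                line, pending[i] = pending[i].split("\n", 1)
                sink.append(line + "\n")
    # Flush any trailing text that never ended in a newline.
    sink.extend(p for p in pending if p)
    return "".join(sink)


if __name__ == "__main__":
    a = chunks("script A: converted statement 1\n", 8)
    b = chunks("script B: converted statement 2\n", 8)
    print(forward_unbuffered([a, b]), end="")
    print(forward_line_buffered([a, b]), end="")
```

The unbuffered variant mixes fragments of both lines, while the line-buffered variant keeps each line intact. Real terminals and pipes behave less regularly than this round-robin model, so treat it as an illustration of the mechanism rather than a reproduction of the bug.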

apauley avatar apauley commented on June 12, 2024

@lestephane I've released version 0.11.1 that has a sequential flag:
https://github.com/apauley/hledger-flow/releases/tag/v0.11.1.0-beta

Can you test if this solves the mangled output issue?

This is what it looks like on the example data:

$ hledger-flow import --sequential --verbose
2019-04-14 22:38:52.755648259 SAST	hledger-flow Starting import
Collecting input files...
Found 3 input files in 0.106195961s. Proceeding with import...
2019-04-14 22:38:52.86193147 SAST	hledger-flow Found a preprocess file at 'import/gawie/bogart/cheque/preprocess'
2019-04-14 22:38:52.86199349 SAST	hledger-flow Begin: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-04-28.csv'
2019-04-14 22:38:52.886898512 SAST	hledger-flow End:   executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-04-28.csv' ExitSuccess (0.024896382s)
2019-04-14 22:38:52.886963452 SAST	hledger-flow Did not find a construct file at 'import/gawie/bogart/cheque/construct'
2019-04-14 22:38:52.887060572 SAST	hledger-flow Begin: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-04-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules'
2019-04-14 22:38:53.059614055 SAST	hledger-flow End:   importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-04-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules' ExitSuccess (0.172543803s)
2019-04-14 22:38:53.059679085 SAST	hledger-flow Found a preprocess file at 'import/gawie/bogart/cheque/preprocess'
2019-04-14 22:38:53.059721615 SAST	hledger-flow Begin: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-05-28.csv'
2019-04-14 22:38:53.083864217 SAST	hledger-flow End:   executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-05-28.csv' ExitSuccess (0.024135742s)
2019-04-14 22:38:53.083935607 SAST	hledger-flow Did not find a construct file at 'import/gawie/bogart/cheque/construct'
2019-04-14 22:38:53.084022167 SAST	hledger-flow Begin: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-05-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules'
2019-04-14 22:38:53.261712753 SAST	hledger-flow End:   importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-05-28.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules' ExitSuccess (0.177682536s)
2019-04-14 22:38:53.261788333 SAST	hledger-flow Found a preprocess file at 'import/gawie/bogart/cheque/preprocess'
2019-04-14 22:38:53.261828203 SAST	hledger-flow Begin: executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-03-30.csv'
2019-04-14 22:38:53.286723964 SAST	hledger-flow End:   executing 'import/gawie/bogart/cheque/preprocess' on 'import/gawie/bogart/cheque/1-in/2016/123456789_2016-03-30.csv' ExitSuccess (0.024890661s)
2019-04-14 22:38:53.286792934 SAST	hledger-flow Did not find a construct file at 'import/gawie/bogart/cheque/construct'
2019-04-14 22:38:53.286883454 SAST	hledger-flow Begin: importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-03-30.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules'
2019-04-14 22:38:53.459486947 SAST	hledger-flow End:   importing 'import/gawie/bogart/cheque/2-preprocessed/2016/123456789_2016-03-30.csv' using rules file 'import/gawie/bogart/cheque/bogart-cheque.rules' ExitSuccess (0.172595953s)
2019-04-14 22:38:53.459847977 SAST	hledger-flow Looking for possible extra include files for 'import/gawie/bogart/cheque/2016-include.journal' among these 2 options: ["import/gawie/bogart/cheque/2016-opening.journal","import/gawie/bogart/cheque/_manual_/2016/pre-import.journal"]. Found 1: ["import/gawie/bogart/cheque/2016-opening.journal"]
2019-04-14 22:38:53.459891177 SAST	hledger-flow Looking for possible extra include files for 'import/gawie/bogart/cheque/2016-include.journal' among these 2 options: ["import/gawie/bogart/cheque/2016-closing.journal","import/gawie/bogart/cheque/_manual_/2016/post-import.journal"]. Found 0: []
2019-04-14 22:38:53.460122558 SAST	hledger-flow Looking for possible extra include files for 'import/gawie/bogart/2016-include.journal' among these 2 options: ["import/gawie/bogart/2016-opening.journal","import/gawie/bogart/_manual_/2016/pre-import.journal"]. Found 0: []
2019-04-14 22:38:53.460160128 SAST	hledger-flow Looking for possible extra include files for 'import/gawie/bogart/2016-include.journal' among these 2 options: ["import/gawie/bogart/2016-closing.journal","import/gawie/bogart/_manual_/2016/post-import.journal"]. Found 0: []
2019-04-14 22:38:53.460367028 SAST	hledger-flow Looking for possible extra include files for 'import/gawie/2016-include.journal' among these 2 options: ["import/gawie/2016-opening.journal","import/gawie/_manual_/2016/pre-import.journal"]. Found 0: []
2019-04-14 22:38:53.460392258 SAST	hledger-flow Looking for possible extra include files for 'import/gawie/2016-include.journal' among these 2 options: ["import/gawie/2016-closing.journal","import/gawie/_manual_/2016/post-import.journal"]. Found 0: []
2019-04-14 22:38:53.460578048 SAST	hledger-flow Looking for possible extra include files for 'import/2016-include.journal' among these 2 options: ["import/2016-opening.journal","import/_manual_/2016/pre-import.journal"]. Found 0: []
2019-04-14 22:38:53.460770778 SAST	hledger-flow Looking for possible extra include files for 'import/2016-include.journal' among these 2 options: ["import/2016-closing.journal","import/_manual_/2016/post-import.journal"]. Found 0: []
Imported 3 journals in 0.705221059s

See how the Begin:s and End:s are grouped.
With parallel processing all the Begin:s are shown together at the same time, and the End:s only later when they finish.
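
The difference in grouping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how hledger-flow is implemented: `run_jobs` and the job names are hypothetical, and the barrier exists only to force every job to log its Begin before any job logs its End, which real parallel runs do not guarantee.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_jobs(names, sequential):
    """Run one dummy job per name and return the ordered list of Begin/End events."""
    if not names:
        return []
    events = []
    lock = threading.Lock()
    # Parallel mode only: make all jobs wait for each other after logging Begin.
    barrier = None if sequential else threading.Barrier(len(names))

    def job(name):
        with lock:
            events.append(f"Begin: {name}")
        if barrier is not None:
            barrier.wait()
        with lock:
            events.append(f"End:   {name}")

    # One worker means jobs run strictly one after another.
    workers = 1 if sequential else len(names)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(job, names))
    return events


if __name__ == "__main__":
    for mode in (True, False):
        print("sequential" if mode else "parallel")
        for event in run_jobs(["statement-1", "statement-2", "statement-3"], mode):
            print("  " + event)
```

In sequential mode each Begin is immediately followed by its own End; in parallel mode the Begin lines appear together and the End lines follow.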

lestephane avatar lestephane commented on June 12, 2024

I'll use the --sequential flag all the time. To save typing, I use a wrapper:

$ cat $(which hlflow)
#!/usr/bin/env bash

set -euxo pipefail

if echo "$@" | awk "/import/{exit 1}"; then
    hledger-flow "$@"
else
    hledger-flow "$@" --sequential
fi

I'm experiencing some other usability issues, which I need to report before confirming.

lestephane avatar lestephane commented on June 12, 2024

--sequential helps, thanks

apauley avatar apauley commented on June 12, 2024

@lestephane Can you test if the latest release has any effect on the mangled output?
Especially when parallel processing is enabled.
I saw there were some output statements that didn't use the concurrency-safe functions, so I changed them.

If this doesn't make it better then I'll continue with capturing the output of the scripts.

https://github.com/apauley/hledger-flow/releases/tag/v0.11.1.1-beta

apauley avatar apauley commented on June 12, 2024

@lestephane Just out of interest, hledger-flow now captures the output of external processes and emits it safely after the process has exited. It does this only when processing in parallel; when processing sequentially it lets the external process display its own output as it runs.
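
The behaviour described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the actual hledger-flow implementation (which is written in Haskell): `run_script`, `run_all`, and the `sequential` parameter are hypothetical names chosen for illustration.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

_print_lock = threading.Lock()


def run_script(argv, capture):
    """Run one external command; return (exit code, captured stdout, captured stderr)."""
    if not capture:
        # Sequential mode: the child inherits our stdout/stderr and prints as it runs.
        return subprocess.run(argv).returncode, "", ""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr


def run_all(commands, sequential=False):
    """Run every command; in parallel mode, emit each command's output in one piece."""
    if sequential:
        return [run_script(argv, capture=False)[0] for argv in commands]

    def task(argv):
        code, out, err = run_script(argv, capture=True)
        # Emit only after the process has exited, holding a lock so that
        # output from two finished processes cannot interleave.
        with _print_lock:
            sys.stdout.write(out)
            sys.stderr.write(err)
            sys.stdout.flush()
            sys.stderr.flush()
        return code

    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, commands))


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('preprocess output')"]
    print(run_all([cmd, cmd, cmd]))
    print(run_all([cmd], sequential=True))
```

The trade-off is the one discussed earlier in the thread: captured output cannot interleave, but it only becomes visible once the script has finished.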

I'm keen to know if you see any more mangled output, especially when importing in parallel.

If you do see anything, please re-open this issue. Thanks!

lestephane avatar lestephane commented on June 12, 2024

No problem. I haven't seen any mangled output so far. Running in parallel from now on.
