Comments (22)

apauley commented on June 12, 2024

Hi @lestephane, thanks for detailing your workaround. I've been preoccupied with moving my family and me from Africa to Europe over the last few months, but once I'm settled I would really like to get a good solution for this in.

lestephane commented on June 12, 2024

I've tried it a few times today and it seems to work.

lestephane commented on June 12, 2024

> I think it should work if you move --batch-size 200 to just before import

Right you are, my wrapper helper script ordering of those arguments was to blame.

apauley commented on June 12, 2024

I changed the behaviour intentionally in v0.11.3 to always be consistent about what the base directory is:
#41

My thinking at the time was that it would be less confusing if the base dir were always the same, in the same way that git treats the root of any git repo.

You do make a good case though for the need to import a subset of files.
Should we automatically treat a subdir of the hledger-flow base dir as an indication that we should only import that subdir and below? Possibly.

The other option could be to have another command-line flag. I'm leaning towards the first option.

The part where you get an error when specifying the base dir as "." while in a subdir is unexpected; I would have expected it to detect the top-level base dir even in that case.
I'll look at this unexpected error first.
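For illustration, the kind of upward search described here (finding the top-level base dir from any sub-directory, the way git finds the root of a repository) could look like the shell function below. It is a hypothetical sketch, not hledger-flow's implementation, and it assumes the base dir can be recognised by the fact that it contains an import directory; find_base_dir is an illustrative name.

```bash
# Hypothetical helper (not hledger-flow's code): print the absolute path of the
# nearest directory, starting at $1 (default: the working directory) and moving
# upwards, that contains an "import" sub-directory.
find_base_dir() {
  local dir
  # Resolve the start directory to an absolute path; CDPATH is cleared so that
  # cd never prints anything to stdout.
  dir="$(CDPATH= cd -- "${1:-.}" && pwd)" || return 1
  while true; do
    if [ -d "$dir/import" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Reached the filesystem root without finding a base dir.
    [ "$dir" = "/" ] && return 1
    dir="$(dirname "$dir")"
  done
}
```

With this approach, running find_base_dir (or find_base_dir .) from inside import/personal resolves to the same top-level directory as running it from the base dir itself.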

lestephane commented on June 12, 2024

Another reason to provide this partial import mode is that when something goes wrong and I need to run in --verbose mode, I typically also have to include --sequential. If hledger-flow import imports everything every time, a lot of logging will occur, and unless you know what you're looking for, this will get tedious.

I don't have an elegant solution for how we should activate the partial import (yet). But I'm also leaning towards the first option; it's the one that causes the least head-scratching.

To take a relatable(?) example:

When I run grep -r PATTERN, the search starts in the working directory and works its way downwards into sub-directories.

When I run grep -r PATTERN . or grep -r PATTERN /some/directory, the search starts in the specified directory and, again, works its way downwards into sub-directories.

The directionality expectation is derived from that. From my prior expectation of how Unix tools work, I'd expect hledger-flow import to start importing from the working directory, working its way downwards into sub-directories, and hledger-flow import /some/directory to start importing from the specified directory, working its way downwards into sub-directories.
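A hypothetical shell sketch of this grep-like scoping: start from the given directory, or from the working directory when none is given, and collect input files downwards. It assumes input files sit under 1-in directories, as in the paths quoted elsewhere in this thread; list_input_files is an illustrative name, not part of hledger-flow.

```bash
# Hypothetical sketch (not hledger-flow's code): list input files found at or
# below the run directory ($1, defaulting to the working directory).
# Assumption: input files live somewhere under a "1-in" directory.
list_input_files() {
  local rundir="${1:-.}"
  # find's -path pattern lets "*" match "/" too, so any depth is accepted.
  find "$rundir" -type f -path '*/1-in/*' | sort
}
```

Called from import/personal it would only see that owner's files; called from the base dir (or with the base dir as an argument) it would see everything.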

If you want to do the full import correctly without needing to think about the correct working directory,
it's possible to set up a bash alias like so:

# Always run a full import from the base directory, whatever the working directory is
hledger_import() {
    hledger-flow import /finance/rootdirectory "$@"
}
alias hlimport="hledger_import"

lestephane commented on June 12, 2024

The workaround I currently use doesn't work either: in sub-account import mode, v0.11.1.2 removes all siblings from the include files in the parent directory of the directory where I run the import. That's another head-scratcher.

$ alias | grep hl
alias hl='${HOME}/.local/bin/hledger-flow-v0.11.1.2'
$ cd ~/Finance/import/personal
$ git diff
(empty)
$ hl import
...
$ git diff
diff --git a/import/2018-include.journal b/import/2018-include.journal
index 62a832f..3726eb7 100644
--- a/import/2018-include.journal
+++ b/import/2018-include.journal
@@ -1,6 +1,5 @@
 ### Generated by hledger-flow - DO NOT EDIT ###
 
 !include 2018-opening.journal
-!include business-de/2018-include.journal
 !include personal/2018-include.journal
 !include 2018-closing.journal

I think it's fair to assume that running an import in a sub-account is unlikely to require a modification of any parent includes, and so all existing parent includes should be left untouched. This keeps the directionality of side-effects pointing downwards, always. If I need to regenerate includes to account for new years, I need to go to the import root to run the import.

apauley commented on June 12, 2024

Running in a subdirectory isn't something I tried to support until v0.11.3, and then the behaviour I had in mind was to always import everything.

So I think the behaviour in earlier versions may be unpredictable. We'll have to change it and document it so that it becomes part of the supported feature set.

lestephane commented on June 12, 2024

I've tried v0.11.3 just now, but it also imports everything. So my current workaround for partial import is to use v0.11.1.2 for imports, and revert changes made to includes in parent directories.

lestephane commented on June 12, 2024

Another side-effect of the automatic detection of the import directory (in import everything mode) is that hledger-flow attempts to import some directories that exist outside of the import hierarchy. These are directories that I moved out of the way because they were work-in-progress. I don't expect them to cause an import error when I run an import in a subaccount of the root import directory.

~/Finance/import/personal$ hledger-flow --version
hledger-flow 0.12.3.0 linux x86_64 ghc 8.6

~/Finance/import/personal$ hledger-flow import --sequential
Collecting input files...
Found 81 input files in 0.795055531s. Proceeding with import...
I couldn't find the right number of directories between "import" and the input file:
/home/lestephane/Vault/Finance/wip/bisq/account-1/1-in/2019/2019.csv

hledger-flow expects to find input files in this structure:
import/owner/bank/account/filestate/year/trxfile

Have a look at the documentation for a detailed explanation:
https://github.com/apauley/hledger-flow#input-files
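To make the structure rule concrete, here is a hypothetical shell check (not hledger-flow's actual code) that requires exactly six components after the last import directory, matching import/owner/bank/account/filestate/year/trxfile. The wip path in the error above fails it simply because it has no import component at all; has_valid_import_layout is an illustrative name.

```bash
# Hypothetical check (illustration only, not hledger-flow's code): succeed if
# exactly owner/bank/account/filestate/year/trxfile follows the last "import"
# directory in the given path.
has_valid_import_layout() {
  local path="$1" after slashes
  case "$path" in
    */import/*) after="${path##*/import/}" ;;  # strip up to the last /import/
    import/*)   after="${path#import/}" ;;      # relative path starting at import/
    *) return 1 ;;                              # no "import" directory at all
  esac
  # Six components are separated by exactly five slashes.
  slashes="$(printf '%s' "$after" | tr -cd '/')"
  [ "${#slashes}" -eq 5 ]
}
```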

apauley commented on June 12, 2024

@lestephane Could you please rerun the command above with --show-options (and include that output whenever you report something)? I'd like to see what hledger-flow is using as the base dir.

Please do it with the latest 0.12.3.1 release, I've made a change to always use an absolute path for the base dir.

lestephane commented on June 12, 2024
~/Vault/Finance/import/personal$ hledger-flow-v0.12.3.1 import --show-options
RuntimeOptions {baseDir = FilePath "/home/lestephane/Vault/Finance/", hfVersion = "hledger-flow 0.12.3.1 linux x86_64 ghc 8.6", hledgerInfo = HledgerInfo {hlPath = FilePath "/home/lestephane/Vault/Finance/hledger", hlVersion = "hledger 1.14.99"}, sysInfo = SystemInfo {os = "linux", arch = "x86_64", compilerName = "ghc", compilerVersion = Version {versionBranch = [8,6], versionTags = []}}, verbose = False, showOptions = True, sequential = False}
Collecting input files...
Found 139 input files in 0.674321691s. Proceeding with import...
I couldn't find the right number of directories between "import" and the input file:
/home/lestephane/Vault/Finance/wip/paypal/account/1-in/2019/2019.csv
...

lestephane commented on June 12, 2024

(Here is my current workaround for anyone interested)

I use the latest hledger-flow import when I need a full import, which takes longer and longer as the number of files grows on my end (20 seconds for 290 files nowadays, and it will surely get worse). Since Haskell also uses all available cores, that's 20 seconds where the laptop is not responsive. Can't have that.

So if I'm only working in one account subdirectory and do not want this delay, I run the newest version of hledger-flow that does not have this import-everything-every-time bug (v0.11.1.2), using an alias:

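# Import only the current sub-directory using v0.11.1.2, then restore any
# include files in parent directories that the import rewrote.
function _hlimport() {
  PATH="${PATH}:${HOME}/.local/bin" "${HOME}/.local/bin/hledger-flow-v0.11.1.2" import "$@"
  # git status -s prints paths relative to the working directory, so parent files
  # start with "../". Select modified yearly includes and all-years.journal in
  # parent directories and check each one out again from the index.
  git status -s |
    awk '$1~/^MM?/ && $2~/^(..\/)+([[:digit:]]{4}-include|all-years)\.journal/{print $2}' |
      xargs --verbose --no-run-if-empty --max-lines=1 git checkout --
}
alias hlimport="_hlimport"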

Once the alias is in place, an hlimport invocation imports only the subdirectory I'm in (which is good), but also modifies includes in parent directories (which is bad, see my June 5 comment in this issue).

That's where the git checkout comes in, to restore those parent includes to their values from the git index.

import/personal/wallet/cash$ hlimport
Collecting input files...
Found 37 input files in 0.023231917s. Proceeding with import...
Imported 37 journals in 1.043626395s    
git checkout -- ../../../../all-years.journal 
git checkout -- ../../../2017-include.journal 
git checkout -- ../../../2018-include.journal 
git checkout -- ../../../2019-include.journal 
git checkout -- ../../../all-years.journal 
git checkout -- ../../2017-include.journal 
git checkout -- ../../2018-include.journal 
git checkout -- ../../2019-include.journal 

This trick is only meant to save time for localized work on one account sub-directory at a time.

After each work session you need to commit work on the account before working on another one.

And when the parent includes do have new modifications that need to be kept (additions mostly), then git add or git commit those first. If ever in doubt whether the include files are all correct, just rerun the entire hledger-flow import using the latest release.
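If you want to see which parent include files the git checkout step would restore before anything is reverted, a dry-run filter along these lines may help. It is a sketch that reads git status -s output on stdin and approximates the awk filter with grep -E and cut; preview_parent_includes is a hypothetical name.

```bash
# Hypothetical dry-run filter: from `git status -s` lines on stdin, print the
# parent-directory include files (YYYY-include.journal, all-years.journal) that
# are modified in the index ("M?") or in the work tree (" M"). Nothing is
# checked out; this only approximates the awk filter in the workaround.
preview_parent_includes() {
  grep -E '^( M|M.) (\.\./)+([[:digit:]]{4}-include|all-years)\.journal$' |
    cut -c4-  # drop the two status letters and the separating space
}
```

Usage: git status -s | preview_parent_includes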

apauley commented on June 12, 2024

@lestephane You can check out v0.13 for now:
https://github.com/apauley/hledger-flow/releases/tag/v0.13.0.0

It should solve one of your problems: the need to use v0.11.1.2.
I'm still looking at the issue where include files in parent directories are regenerated with just a subset of journals.

Example use: hledger-flow --show-options import --experimental-rundir ./import/gawie/bogart/cheque

I removed the bug label, because the earlier error (Unable to find an hledger-flow import directory at './') was fixed, and I think the behaviour in 0.12.x is correct, even though it prevented you from getting fast feedback. The processing of a subset of files on the other hand is currently producing unexpected results (the include files). So it is faster but not 100% correct.

I think fast feedback is an important use case, I hope to release some more updates to address this.

apauley commented on June 12, 2024

@lestephane I haven't released anything yet, but there is something that mostly works in the branch rundir-improvements (#78).

You can compile that branch and test it a bit if you'd like.
There have been a lot of annoying corner cases that I fixed as I found them, so please let me know if you find anything else unexpected.

Usage:

hledger-flow import --enable-future-rundir ./import/gawie/bogart

A known issue in that branch:
if you're doing a full import (using the top-level base dir) with --enable-future-rundir it generates unnecessary yearly include files in the base dir.

hledger-flow import --help
Usage: hledger-flow import [DIR] [--enable-future-rundir]
  Uses hledger with your own rules and/or scripts to convert electronic statements into categorised journal files

Available options:
  DIR                      The directory to import. Use the base directory for a
                           full import or a sub-directory for a partial import.
                           Defaults to the current directory. This behaviour is
                           changing: see --enable-future-rundir
  --enable-future-rundir   Enable the future (0.14.x) default behaviour now:
                           start importing only from the directory that was
                           given as an argument, or the currect directory.
                           Previously a full import was always done. This switch
                           will be removed in 0.14.x
  -h,--help                Show this help text

lestephane commented on June 12, 2024

Can you confirm that the branch is rundir and not rundir-improvements?

$ git fetch --all
Fetching origin
remote: Enumerating objects: 125, done.
remote: Counting objects: 100% (125/125), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 125 (delta 60), reused 96 (delta 40), pack-reused 0
Receiving objects: 100% (125/125), 45.55 KiB | 1.17 MiB/s, done.
Resolving deltas: 100% (60/60), completed with 3 local objects.
From https://github.com/apauley/hledger-flow
   ced0b70..e8e508b  master     -> origin/master
 * [new branch]      rundir     -> origin/rundir
 * [new tag]         v0.13.2.0  -> v0.13.2.0
 * [new tag]         v0.13.1.0  -> v0.13.1.0

apauley commented on June 12, 2024

@lestephane The branch is gone, I merged and released it a few hours ago:
https://github.com/apauley/hledger-flow/releases/tag/v0.13.2.0

The known issue I mentioned is also fixed.

Does the new behaviour match what you would expect?

lestephane commented on June 12, 2024

Compiling; should I be worried about this warning?

~/GitRepos/hledger-flow$ stack install
Stack has not been tested with GHC versions above 8.6, and using 8.8.3, this may fail <<<
Preparing to install GHC to an isolated location.
This will not interfere with any system-level installation.
ghc-8.8.3:   50.33 MiB / 187.19 MiB ( 26.89%) downloaded...^C 

~/GitRepos/hledger-flow$ stack upgrade
Current Stack version: 2.1.3, available download version: 2.1.3
Skipping binary upgrade, you are already running the most recent version

lestephane commented on June 12, 2024

Looking good so far, and I didn't notice any unexpectedly modified include files.

apauley commented on June 12, 2024

> Looking good so far, and I didn't notice any unexpectedly modified include files.

Great, let's close this issue, I think the main issue is solved. If it isn't we can re-open.
But for issues that are only related rather than exactly the same, I'd prefer that a new issue be opened. We can link to this one though if that happens.

Future releases will still remove the flag and make this behaviour the default.

> compiling, should I be worried about this warning?
>
> ~/GitRepos/hledger-flow$ stack install
> Stack has not been tested with GHC versions above 8.6, and using 8.8.3, this may fail <<<

No need to worry about this, it compiles successfully despite the warning.
I expect that newer versions of stack will stop complaining.

apauley commented on June 12, 2024

--enable-future-rundir is now the default behaviour, as of release 0.14.1. The option has been deprecated (will be removed in a future release).

Specifying the option in the latest release doesn't do anything, other than print a message to the console.

lestephane commented on June 12, 2024

@apauley the ability to specify an arbitrary directory has disappeared. Was that intentional?

$ hlimport import/personal/wallet/cash/
using hledger flow executable: hledger-flow-async-batches-793f882bb22ac7b89a98077ee95b3464bbc5c0e0...
/home/lestephane/.local/bin/hledger-flow-async-batches-793f882bb22ac7b89a98077ee95b3464bbc5c0e0 +RTS -N10 -RTS --show-options import --batch-size 200 import/personal/wallet/cash/
Invalid argument `import/personal/wallet/cash/'

apauley commented on June 12, 2024

> @apauley the ability to specify an arbitrary directory has disappeared. Was that intentional?

batch size is an option on the main hledger-flow command, and if you put it after the import subcommand it is interpreted as an option on import.

I think it should work if you move --batch-size 200 to just before import
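The ordering rule can be baked into a small wrapper so that global options always come before the import sub-command. This is a minimal sketch: --batch-size comes from the batch-enabled build quoted above and may not exist in other releases, and both hlimport_batched and the HLEDGER_FLOW variable (used to pick which executable runs) are hypothetical names, not hledger-flow settings.

```bash
# Hypothetical wrapper: global options go before "import", and everything the
# caller passes (e.g. a DIR) goes after it.
# HLEDGER_FLOW is an illustrative hook for choosing the executable.
hlimport_batched() {
  local hf="${HLEDGER_FLOW:-hledger-flow}"
  # --batch-size exists only in the batch-enabled build shown in this thread.
  "$hf" --show-options --batch-size 200 import "$@"
}
```

Usage: hlimport_batched import/personal/wallet/cash/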
