snivilised / pixa
🔧 Directory tree based bulk image processor (also serves as a working example of how to use extendio, cobrass and arcadia)
License: MIT License
When a target path already exists (due to a prior run), e.g.:
/Users/plastikfan/dev/test/pics/blur/TRASH
the next run in the same location appears to append the relative path to the existing one, resulting in an excessively deep, replicated path:
/Users/plastikfan/dev/test/pics/blur/TRASH/blur/TRASH/blur/TRASH/
The output file is generated in the same folder as the source. The original file is either moved to a central trash directory or to a local trash directory that we create. We never delete automatically, as there always needs to be a way to get back to the original; leave the user in control of deletion.
This reflects the fact that pixa should not be tied to magick. Theoretically, we should be able to use another application, so the domain package should be something generic.
Why on earth didn't the tests fail? Because the config readers are mocked out!
The template paths defined in the path finder are strings, which are not type safe. We can improve on this by using a fake enum.
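The fake-enum idea above can be sketched like this: an int type whose constants index a lookup table, so a misspelt template identifier becomes a compile error instead of a silent bad path. All names and template strings here are hypothetical, not pixa's actual types.

```go
package main

import "fmt"

// pathTemplate is a "fake enum": an int type whose constants index into a
// lookup table, replacing raw template-path strings with compile-time
// checked identifiers.
type pathTemplate int

const (
	TemplateTrash pathTemplate = iota
	TemplateOutput
	TemplateJournal
)

// templatePaths maps each enum value to its (hypothetical) template string.
var templatePaths = map[pathTemplate]string{
	TemplateTrash:   "{{root}}/TRASH/{{name}}",
	TemplateOutput:  "{{output}}/{{profile}}/{{name}}",
	TemplateJournal: "{{root}}/{{name}}.$journal",
}

func (t pathTemplate) String() string {
	return templatePaths[t]
}

func main() {
	// The compiler rejects misspelt identifiers outright; only the
	// template bodies remain as data.
	fmt.Println(TemplateTrash)
}
```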
The scheme config also has to be taken out of the sampler config, as it can also be used in a full run.
Since ImageMagick is a complex tool, there is no need to write a front end for it; that would be impractical. Instead, we'll have a sub-command called magick that delegates arguments directly to it. However, we also have a default command that does not delegate to magick. The default will just navigate the tree, showing the items found according to parameters meant for the pixa command. This way the default command simply becomes a navigator, without invoking magick.
Next, we can also define profiles that act as a facade over magick. For example, the shrink facade will invoke one of the few ways we can compress an image, e.g. gaussian-blur, sampling-factor, or adaptive-resize.
We really don't want to go overboard here, because this is really just a test tool with some residual value, rather than a test tool that has no value other than to test (e.g. scorpio).
We can implement a magick sub-command which delegates all parameters to ImageMagick; doing so combines the power of ImageMagick with pixa's ability to perform concurrent tree navigation.
magick source.jpg -strip -interlace Plane -gaussian-blur 0.05 -quality 85% result.jpg
magick source.jpg -strip -interlace Plane -sampling-factor 4:2:0 -quality 85% result.jpg
-strip -interlace Plane -gaussian-blur 0.05 -quality 60% -adaptive-resize 60% img_original.jpg img_resize.jpg
The quality option is generic, so it can be populated inside a parent entity inherited by the child profiles. The same goes for any other common options.
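In Go, the inheritance described above is naturally expressed with struct embedding: common options live in a parent struct that each concrete profile embeds. The type and field names below are illustrative, not pixa's actual definitions.

```go
package main

import "fmt"

// CommonOptions holds flags shared by every profile (quality, interlace,
// etc.); each concrete profile embeds it rather than redeclaring fields.
type CommonOptions struct {
	Quality   int
	Interlace string
}

// BlurProfile and SamplerProfile are hypothetical child profiles; the
// embedded CommonOptions fields are promoted onto them.
type BlurProfile struct {
	CommonOptions
	GaussianBlur float32
}

type SamplerProfile struct {
	CommonOptions
	SamplingFactor string
}

func main() {
	blur := BlurProfile{
		CommonOptions: CommonOptions{Quality: 85, Interlace: "Plane"},
		GaussianBlur:  0.05,
	}
	// Quality is promoted from the embedded parent entity.
	fmt.Println(blur.Quality)
}
```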
We also probably need some way of reading options from a file, for ease of use. This would be in addition to the viper config already provided by spf13.
This allows the user to try out multiple profiles in the same batch to see which one works best. It requires a sample entry in the config, which is just an array of strings; each string refers to a profile.
The sample config also needs a number of images to try (let's default to 3) and a minimum image size in MB (default = 1).
For each of the images run the selected profile.
By default, the target directory is the current one. For each file we create a relative path that contains the profile name. The output file name defaults to the same as the input file, but this can be changed by a config setting.
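The per-profile path rule above amounts to joining the directory, the profile name and the (unchanged) input file name. A minimal sketch, with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resultPath builds the relative result location described in the note:
// a sub-path bearing the profile name, with the output file name
// defaulting to the input name. Purely illustrative.
func resultPath(dir, profile, name string) string {
	return filepath.Join(dir, profile, name)
}

func main() {
	fmt.Println(resultPath("/pics", "blur", "screen-shot-1.png"))
	// → /pics/blur/screen-shot-1.png
}
```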
Actually we can create many more config settings without necessarily creating flags. This will avoid complicating the cli.
Some more design notes:
Further tasks:
There is no need to have a full versus sampling controller. This is because I initially made the mistake of thinking that multi-run was exclusive to sampling, i.e. not applicable to a full run.
Seeing output like this:
dummy:executing: 'magick /Users/plastikfan/dev/test/pics/blur/TRASH/blur/TRASH/blur/TRASH/screen-shot-1.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/blur/TRASH/screen-shot-1.png'
is too verbose. The full paths should be shown in the log, but the output should contain only relative paths, to reduce the output the user has to contend with.
And to simplify further, consider not emitting output for every file; just write a summary.
Make use of filepath.Rel.
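The filepath.Rel suggestion could look something like this: trim the traversal root from the absolute item path for console output, while the full path still goes to the log. The helper name is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relativise trims the traversal root from an absolute item path so
// console output only shows the part the user cares about.
func relativise(root, item string) string {
	rel, err := filepath.Rel(root, item)
	if err != nil {
		return item // fall back to the absolute path
	}
	return rel
}

func main() {
	fmt.Println(relativise(
		"/Users/plastikfan/dev/test/pics",
		"/Users/plastikfan/dev/test/pics/blur/TRASH/screen-shot-1.png",
	))
	// → blur/TRASH/screen-shot-1.png
}
```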
See: run-configuration
Change to 5m
Include options definitions with appropriate cobrass validators.
Should not implement the functionality, that will be done under a different issue (#3)
To allow pixa to delegate options to magick, the user would invoke using --. Any options provided after -- are not interpreted by cobra; this is a POSIX standard. See this cobra issue for more info.
Since we don't need to do anything with third party parameters except delegate them, there is no need for them to be defined as native types; they can just be strings.
Also, the hierarchy needs to be flattened. Initially, it seemed like a good idea to group parameters into categories, but in the fullness of time this has proven not to be. We probably need to be able to handle third party parameters with some sort of iteration, and if they are located in different sub-structs this becomes much more difficult.
Currently in pflag there is no way to capture unknown flags, i.e. the flags which we class as third party and which would theoretically be specified on the command line by the end user following --.
Because of this, we unfortunately have to define them explicitly on our native command so that they are not unknown; since we are then aware of them, we can delegate them as required. This is a sub-optimal workaround because it means we need to do more work and handle args that we shouldn't have to. magick is quite extensive and has many flags. Another way we can do this is to allow third party flags to be specified as a single unit, e.g.:
pixa shrink ... --third "--foo --bar 2 --baz 3"
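A naive sketch of splitting that single --third value back into tokens for delegation; strings.Fields is the simplest possible approach and would not honour quoted values containing spaces, which a real implementation would need to handle. The helper name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitThird breaks the single --third argument back into individual
// tokens to pass straight through to the third party program.
// Caveat: does not support quoted values with embedded spaces.
func splitThird(third string) []string {
	return strings.Fields(third)
}

func main() {
	fmt.Println(splitThird("--foo --bar 2 --baz 3"))
	// → [--foo --bar 2 --baz 3]
}
```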
The saving grace we have is the profiles feature. The user can specify the name of a profile and the flags defined for that profile can easily be retrieved. However, the intention behind the profile was as a supplementary way of defining these third party flags; we would ideally like to allow the user to define them on the command line. It may be that using a profile becomes the only way to define third party flags.
We can define commonly used flags like --GaussianBlur, --SamplingFactor, --Interlace and --Quality natively on the appropriate command and then rely on the profiles feature to allow other flags to be defined. The explicit flags will override any matching one in the specified profile.
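The override rule above is an ordinary map merge: start from the profile's flags, then overlay the explicitly supplied ones so they win. A minimal sketch with a hypothetical helper, not pixa's real API:

```go
package main

import "fmt"

// mergeFlags overlays flags given explicitly on the command line onto
// the flags loaded from a named profile; explicit values win.
func mergeFlags(profile, explicit map[string]string) map[string]string {
	merged := make(map[string]string, len(profile)+len(explicit))
	for k, v := range profile {
		merged[k] = v
	}
	for k, v := range explicit {
		merged[k] = v // explicit flag overrides the profile entry
	}
	return merged
}

func main() {
	profile := map[string]string{"gaussian-blur": "0.05", "quality": "85"}
	explicit := map[string]string{"quality": "60"}
	merged := mergeFlags(profile, explicit)
	fmt.Println(merged["quality"], merged["gaussian-blur"])
	// → 60 0.05
}
```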
Sample mode should not be considered transparent, because typically the workflow would be:
If transparent, the originals are moved to the trash location, but this is a problem for a sample run, since the user would have to manually retrieve the originals back to the source location and re-run. Transparency is only good for a full run.
We need a better workflow.
Before a non-transparent run, any sample files found in the same dir as the origin should be moved to the --output location. The reason for this is that the user may perform a transparent sample run, followed by a non-transparent full run. In this case, the sample files will be in the wrong location and should be moved to the --output.
Specify a profile name and run like sample version except without the image count.
The above is optional
Update unit tests which need to reference the root of the repo, using a git call:
git rev-parse --show-toplevel
executed from Go using something like:
cmd := exec.Command("git", "rev-parse", "--show-toplevel")
output, err := cmd.CombinedOutput()
if err != nil {
	log.Fatal(err)
}
repo := strings.TrimSpace(string(output))
create 2 new sub-commands:
In order for this to work, we may need a new kind of journal that can be consulted to see what happened.
The framework needs to allow it to be used across multiple commands, and then be made generic enough to be used by other applications. Eventually, this framework will be implemented in arcadia and/or cobrass, depending on the relevant pieces of functionality. Actually, a clear separation should be established between the functionality that is arcadia specific and that which is more generic and should be implemented in cobrass.
Legend:
Need the following entities:
It should be easy to switch a command between the simple and scheduled profiles. E.g. a visit command defined to use the simple profile should be switchable over to the scheduled profile simply by setting a property.
Configuring the navigation and then setting up the appropriate workflow are 2 separate phases. A workflow step will need to execute the navigator, therefore the workflow infrastructure has a dependency on the navigator, but not the other way around.
The term duplicant is not a real word; rather, it is a riff on duplicate, but duplicate is not adequate because it implies a copy. The concept that duplicant represents is that when pixa runs in a mode where multiple results are created for each input file, each of these is a duplicant of the original (not a duplicate). And just to be clear, a duplicant can only be created when multiple results are created. So this means a full or sample run is executed where a scheme is referenced which contains more than 1 profile.
Transparency mode for duplicants can't really exist, because transparency mode always refers to the idea that the file system remains the same as it did before pixa was run, except that input files are replaced inline by the output and the input is placed elsewhere. This means that to clean up, the user needs to delete files in alternative locations (either a central trash location or an inline trash location). Having said this, what we can say for duplicant transparency mode is that it might be advisable to always create result files in the same location as the input, to make it easy to compare input and result files. To combat the obvious clash in names, result files need to be decorated with an appropriate tag, which would be an amalgam of <scheme name>/<profile>/ADHOC.
The destination for the input is transparent if --trash has not been specified. The destination for the input file should be renamed to the origin path (item.Parent) + item.Name + an original file tag, eg "INPUT". So for example:
The path finder should have a new method Yield_ which represents an output and is equivalent to the Destination method for the input.
Another requirement that emerges from this issue is that the TRASH location is only used if not in transparent mode. So transparency has a separate effect on the INPUT and the yields (or duplicants). It also means that transparent can't be set explicitly; it is derived from the presence of --output (which applies to yield) and/or --trash (which applies to input).
On further thought, it may be better to drop the duplicant terminology; it will only end in tears! So instead, create a Result method on the path finder which accepts a resultInfo containing fields that allow it to return the result path, in the same manner as the Destination method does. It will apply to the duplicant scenario already described and to a single output result.
It is currently hardcoded to ".$journal", but should be accessed from config via finder.statics.
to include journal file management.
Need to define an e2e test with nothing mocked out, so that we can check the app works outside the confines of a test.
plastikfan@Hydra (main) ❯ pixa shrink ~/dev/test/pics --profile blur --sample
===> 💥💥💥 USING DUMMY EXECUTOR !!!!
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x128 pc=0x14a1b6f]
goroutine 1 [running]:
github.com/snivilised/pixa/src/app/command.(*MsAdvancedConfigReader).Read(0x8?, {0x0, 0x0})
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/config-readers.go:73 +0x2f
github.com/snivilised/pixa/src/app/command.(*Bootstrap).viper(0xc00020b180)
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/bootstrap.go:264 +0x114
github.com/snivilised/pixa/src/app/command.(*Bootstrap).buildShrinkCommand(0xc00020b180, 0xc00018caf8)
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/shrink-cmd.go:342 +0x716
github.com/snivilised/pixa/src/app/command.(*Bootstrap).Root(0xc00020b180, {0x0, 0x0, 0x1ae7c78?})
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/bootstrap.go:191 +0x638
github.com/snivilised/pixa/src/app/command.Execute()
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/root-cmd.go:27 +0x26
main.main()
/Users/plastikfan/dev/github/snivilised/pixa/src/app/main/main.go:10 +0xf
Also, there is a mix-up in ConfigureOptionsInfo, which has a redundant Viper field that is duplicated (probably just a leftover from a previous refactoring that should have been removed but wasn't).
Non transparent concept encompasses the following points:
Execution should stop immediately if a timeout occurs. We may need to check extendio's functionality to see what happens when a third party error occurs, and implement new behaviour to define what should happen, i.e. do we want to continue on error or bail immediately.
I recently discovered the great looking bubbletea library from Charm and think it would be a great fit for pixa. This issue will track multiple child issues.
See these other resources:
define a model
Forgot to do this when implementing the logging.
This will simply navigate the directory tree.
eg:
pixa shrink ~/dev/test/pics --profile blur --skim
===> 💥💥💥 USING DUMMY EXECUTOR !!!!
🔧 pixa Running shrink, with args: '/Users/plastikfan/dev/test/pics'
===> 🛡️ beginning traversal ...
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/screen-shot-1.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/screen-shot-1.png'
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/screen-shot-2.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/screen-shot-2.png'
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/wonky.avatar.jpg --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/wonky.avatar.jpg'
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/wonky.avatar.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/wonky.avatar.png'
💩 error occurred during navigation (journal file '/Users/plastikfan/dev/test/pics/wonky.avatar.$journal.txt' not found) [started: 'Tue, 16 Jan 2024 15:32:22 GMT', elapsed: '7.006276ms']
Error: journal file '/Users/plastikfan/dev/test/pics/wonky.avatar.$journal.txt' not found
Usage:
main shrink [flags]
Flags:
-F, --files string files extended glob filter: <glob>|<suffixes csv> (negate-able with leading !)
-X, --files-rx string files-rx file regular expression filter (negate-able with leading !)
-Z, --folders-gb string folders-gb folder glob filter (negate-able with leading !)
-Y, --folders-rx string folders-rx folder regular expression filter (negate-able with leading !)
-b, --gaussian-blur float32 gaussian-blur (see magick documentation for more info) (default 0.05)
-h, --help help for shrink
-i, --interlace string interlace (see magick documentation for more info) (default "plane")
-o, --output string output creates a mirror of the source directory tree containing processed images
-q, --quality int quality (see magick documentation for more info) (default 80)
-f, --sampling-factor string sampling-factor (see magick documentation for more info) (default "4:2:0")
-s, --strip strip (see magick documentation for more info)
-t, --trash string trash indicates where deleted items are moved to
Global Flags:
--cpu cpu denotes parallel execution with all available processors
--depth uint depth denotes the number of sub directories to navigate
-D, --dry-run dry-run allows the user to see the effects of a command without running it
--last last is a flag that indicates which n items are to be sampled
--no-files uint no-files files specifies the number of files to sample (default 3)
--no-folders uint no-folders folders specifies the number of folders to sample (default 3)
--now int now denotes parallel execution with this number of workers in pool (default -1)
-P, --profile string profile specifies which set of flags/options to load from config
--sample sample is a flag that activates sampling
-S, --scheme string scheme is a collection of profiles, typically to repeat an operation over
-K, --skim skim sets the navigator to not descend into sub directories
The current config is a bit of a problem, because it is causing duplicated code. The config interfaces are defined in proxy, but the implementations are also defined in proxy. The problem starts with the mock data, which would need access to command, but doing so reveals an import cycle. So the resolution would be to create a new cfg package containing the implementation classes, which could then be reused by the mock config data.
extendio's navigator, by default, assumes the whole directory structure rooted at the path specified by the user should be traversed. However, when creating a client app like pixa, the normal mode of usage would be to process just the directory specified. We should conform to this norm so as not to surprise the user, particularly when file system writes occur. We can then introduce a --recurse flag that allows the full traversal to occur.
This may need a fix in extendio to achieve. In fact, there is an outstanding issue in extendio whose aim is to limit the traversal depth.
We should only use the worker pool if the user specifies a particular flag on the command line, either --cpu for all CPUs or --now for n workers; otherwise we proceed on the existing single threaded, synchronised basis.
Or perhaps, if we can define a flag that can have an optional value, we can just use --cpu. The problem with --cpu is that it's a bit of a misnomer when we specify a number greater than the number of CPUs; in reality we really mean the number of workers, which is not related to the number of CPUs the host has. Given this, we should probably use --now instead, and if the user omits the option value, we set it to NumCPU.
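The --now fallback rule above can be sketched in a few lines, assuming a sentinel value (here 0) means "flag given with no value". The helper name and sentinel convention are assumptions, not the real flag plumbing:

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount resolves the --now flag: a positive value is taken
// verbatim, while the sentinel 0 falls back to the number of CPUs.
func workerCount(now int) int {
	if now > 0 {
		return now
	}
	return runtime.NumCPU()
}

func main() {
	fmt.Println(workerCount(4)) // explicit value wins
	fmt.Println(workerCount(0)) // falls back to runtime.NumCPU()
}
```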
Actually, this doesn't have to be implemented by explicitly passing in an instance of storage.VirtualFS. We already have hook functions, so we should use that existing functionality instead.
The navigator uses Lstat and ReadDir (from the file system), so we should define functions for these that work with the virtual file system.
In the case of Lstat, the client can just set the QueryStatus hook function, eg:
o.Hooks.QueryStatus = e.Vfs.Lstat
For ReadDirectory, the client could do something like this:
o.Hooks.ReadDirectory = func(dirname string) ([]fs.DirEntry, error) {
	contents, err := e.Vfs.ReadDir(dirname)
	if err != nil {
		return nil, err
	}

	return lo.Filter(contents, func(item fs.DirEntry, index int) bool {
		return item.Name() != ".DS_Store"
	}), nil
}
But this is boilerplate code, so we should provide one that the client can use.
Using this approach, we can avoid universally mocking out the file system, which just adds more complexity to unit tests.
This is typical new logger functionality that should be created by the client:
func NewLogger(info *LoggerInfo) Ref {
	return utils.NewRoProp(lo.TernaryF(info.Enabled,
		func() Logger {
			if info.Path == "" {
				panic(i18n.NewInvalidConfigEntryError(info.Path, "Store/Logging/Path"))
			}

			ws := zapcore.AddSync(&lumberjack.Logger{
				Filename:   info.Path,
				MaxSize:    info.Rotation.MaxSizeInMb,
				MaxBackups: info.Rotation.MaxNoOfBackups,
				MaxAge:     info.Rotation.MaxAgeInDays,
			})
			config := zap.NewProductionEncoderConfig()
			config.EncodeTime = zapcore.TimeEncoderOfLayout(info.TimeStampFormat)
			core := zapcore.NewCore(
				zapcore.NewJSONEncoder(config),
				ws,
				info.Level,
			)

			return zap.New(core)
		}, func() Logger {
			return zap.NewNop()
		}))
}
and to clean up ...
func (n *navigator) finish() error {
	return n.log.Get().Sync()
}
For every run that actually creates or overwrites files, we first create a dummy text file with the same main name at the start of the batch.
After every successful image operation, we remove the associated journal file. With this in place we will be able to implement the resume feature with greater reliability, though it might require a double navigation of the directory tree. That is also good, because it would allow us to implement a progress report too.
check how this is done in the standard find command
Both the sample and full modes require a look-ahead phase to support journalling, so this issue will implement this dual phase navigation.
It is assumed that magick is the third party program to be invoked. However, this shouldn't be a hard rule; we should be able to integrate with any appropriate third party program.
We should introduce the concept of an agent that represents the third party program and interfaces with it as such. The agent is aware of anything that is specific to that program.
Also introduce a program field in the advanced config that allows it to be switched to whatever program the user wishes. However, to prevent mis-configuration, we should not just allow anything to be used for this field; it should be vetted against an allowed list. In theory, more than one external program could be associated with a particular agent if they have the same interface (in reality this would be unlikely, but we should not make any assumptions upfront).
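The allowed-list vetting could be as simple as a set lookup when the advanced config is read. The list contents and function name here are illustrative assumptions:

```go
package main

import "fmt"

// allowedPrograms is the vetting list for the program field in the
// advanced config; contents are illustrative.
var allowedPrograms = map[string]bool{
	"magick": true,
	"dummy":  true,
}

// vetProgram rejects any configured program name that is not on the
// allowed list, guarding against mis-configuration.
func vetProgram(name string) error {
	if !allowedPrograms[name] {
		return fmt.Errorf("program %q is not on the allowed list", name)
	}
	return nil
}

func main() {
	fmt.Println(vetProgram("magick")) // accepted
	fmt.Println(vetProgram("gimp"))   // rejected
}
```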
We probably need to introduce a new section into the advanced config for the explicit purpose of holding any settings regarding the third party executable.