snivilised / pixa
🔧 Directory tree based bulk image processor (also serves as a working example of how to use extendio, cobrass and arcadia)
License: MIT License
When a target path already exists (due to a prior run), e.g.:
/Users/plastikfan/dev/test/pics/blur/TRASH
the next run in the same location appears to append the relative path to the existing one, resulting in an excessively deep, replicated path:
/Users/plastikfan/dev/test/pics/blur/TRASH/blur/TRASH/blur/TRASH/
The output file is generated in the same folder as the source. The original file is either moved to a central trash directory or to a local trash directory that we create. We never delete automatically, as there always needs to be a way to get back to the original; leave the user in control of deletion.
This reflects the fact that pixa should not be tied to magick. Theoretically, we should be able to use another application, so the domain package should be something generic.
Why on earth didn't the tests fail? Because the config readers are mocked out!
The template paths defined in the path finder are strings, which are not type safe. We can improve on this by using a fake enum.
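The fake-enum idea above can be sketched like this: an int type whose constants index a lookup table, so a misspelt template identifier becomes a compile error instead of a silent bad path. All names and template strings here are hypothetical, not pixa's actual types.

```go
package main

import "fmt"

// pathTemplate is a "fake enum": an int type whose constants index into a
// lookup table, replacing raw template-path strings with compile-time
// checked identifiers.
type pathTemplate int

const (
	TemplateTrash pathTemplate = iota
	TemplateOutput
	TemplateJournal
)

// templatePaths maps each enum value to its (hypothetical) template string.
var templatePaths = map[pathTemplate]string{
	TemplateTrash:   "{{root}}/TRASH/{{name}}",
	TemplateOutput:  "{{output}}/{{profile}}/{{name}}",
	TemplateJournal: "{{root}}/{{name}}.$journal",
}

func (t pathTemplate) String() string {
	return templatePaths[t]
}

func main() {
	// The compiler rejects misspelt identifiers outright; only the
	// template bodies remain as data.
	fmt.Println(TemplateTrash)
}
```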
The scheme config also has to be taken out of the sampler config, as it can also be used in a full run.
Since ImageMagick is a complex tool, there is no need to write a front end for it; that would be impractical. Instead, we'll have a sub-command called magick that delegates arguments directly to it. However, we also have a default command that does not delegate to magick. The default will just navigate the tree, showing the items found according to parameters meant for the pixa command. This way the default command simply becomes a navigator, without invoking magick.
Next, we can also define profiles that act as a facade over magick. For example, the shrink facade will invoke one of the few ways we can compress an image, e.g. gaussian-blur, sampling-factor, or adaptive-resize.
We really don't want to go overboard here, because this is really just a test tool with some residual value, rather than a test tool that has no value other than to test (e.g. scorpio).
We can implement a magick sub-command which delegates all parameters to ImageMagick; doing so combines the power of ImageMagick with pixa's ability to perform concurrent tree navigation.
magick source.jpg -strip -interlace Plane -gaussian-blur 0.05 -quality 85% result.jpg
magick source.jpg -strip -interlace Plane -sampling-factor 4:2:0 -quality 85% result.jpg
-strip -interlace Plane -gaussian-blur 0.05 -quality 60% -adaptive-resize 60% img_original.jpg img_resize.jpg
The quality option is generic, so it can be populated inside a parent entity inherited by the child profiles. The same goes for any other common options.
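In Go, the inheritance described above is naturally expressed with struct embedding: common options live in a parent struct that each concrete profile embeds. The type and field names below are illustrative, not pixa's actual definitions.

```go
package main

import "fmt"

// CommonOptions holds flags shared by every profile (quality, interlace,
// etc.); each concrete profile embeds it rather than redeclaring fields.
type CommonOptions struct {
	Quality   int
	Interlace string
}

// BlurProfile and SamplerProfile are hypothetical child profiles; the
// embedded CommonOptions fields are promoted onto them.
type BlurProfile struct {
	CommonOptions
	GaussianBlur float32
}

type SamplerProfile struct {
	CommonOptions
	SamplingFactor string
}

func main() {
	blur := BlurProfile{
		CommonOptions: CommonOptions{Quality: 85, Interlace: "Plane"},
		GaussianBlur:  0.05,
	}
	// Quality is promoted from the embedded parent entity.
	fmt.Println(blur.Quality)
}
```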
We also probably need some way of reading options from a file, for ease of use. This would be in addition to the viper config already provided by spf13.
This allows the user to try out multiple profiles in the same batch to see which one works best. It requires a sample entry in the config, which is just an array of strings; each string refers to a profile.
The sample config also needs a number of images to try (let's default to 3) and a minimum image size in MB (default = 1).
For each of the images run the selected profile.
By default, the target directory is the current one. For each file we create a relative path that contains the profile name. The output file name defaults to the same as the input file, but this can be changed by a config setting.
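The per-profile path rule above amounts to joining the directory, the profile name and the (unchanged) input file name. A minimal sketch, with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resultPath builds the relative result location described in the note:
// a sub-path bearing the profile name, with the output file name
// defaulting to the input name. Purely illustrative.
func resultPath(dir, profile, name string) string {
	return filepath.Join(dir, profile, name)
}

func main() {
	fmt.Println(resultPath("/pics", "blur", "screen-shot-1.png"))
	// → /pics/blur/screen-shot-1.png
}
```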
Actually we can create many more config settings without necessarily creating flags. This will avoid complicating the cli.
Some more design notes:
Further tasks:
There is no need to have a full versus sampling controller. This is because I initially made the mistake of thinking that multi-run was exclusive to sampling, i.e. not applicable to a full run.
Seeing output like this:
dummy:executing: 'magick /Users/plastikfan/dev/test/pics/blur/TRASH/blur/TRASH/blur/TRASH/screen-shot-1.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/blur/TRASH/screen-shot-1.png'
is too verbose. The full paths should be shown in the log, but the output should contain only relative paths, to reduce the output the user has to contend with.
And to simplify further, consider not emitting output for every file; just write a summary.
Make use of filepath.Rel.
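The filepath.Rel suggestion could look something like this: trim the traversal root from the absolute item path for console output, while the full path still goes to the log. The helper name is hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// relativise trims the traversal root from an absolute item path so
// console output only shows the part the user cares about.
func relativise(root, item string) string {
	rel, err := filepath.Rel(root, item)
	if err != nil {
		return item // fall back to the absolute path
	}
	return rel
}

func main() {
	fmt.Println(relativise(
		"/Users/plastikfan/dev/test/pics",
		"/Users/plastikfan/dev/test/pics/blur/TRASH/screen-shot-1.png",
	))
	// → blur/TRASH/screen-shot-1.png
}
```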
See: run-configuration
Change to 5m
Include options definitions with appropriate cobrass validators.
Should not implement the functionality, that will be done under a different issue (#3)
To allow pixa to delegate options to magick, the user would invoke using --. Any options provided after -- are not interpreted by cobra; this is a POSIX standard. See this cobra issue for more info.
Since we don't need to do anything with third party parameters except delegate them, there is no need for them to be defined as native types; they can just be strings.
Also, the hierarchy needs to be flattened. Initially, it seemed like a good idea to group parameters into categories, but in the fullness of time this has proven not to be. We probably need to be able to handle third party parameters with some sort of iteration, and if they are located in different sub-structs this becomes much more difficult.
Currently in pflag there is no way to capture unknown flags, i.e. the flags which we class as third party and which would theoretically be specified on the command line by the end user following --.
Because of this, we unfortunately have to define them explicitly on our native command so that they are not unknown; since we are then aware of them, we can delegate them as required. This is a sub-optimal workaround because it means we need to do more work and handle args that we shouldn't have to. magick is quite extensive and has many flags. Another way we can do this is to allow third party flags to be specified as a single unit, e.g.:
pixa shrink ... --third "--foo --bar 2 --baz 3"
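A naive sketch of splitting that single --third value back into tokens for delegation; strings.Fields is the simplest possible approach and would not honour quoted values containing spaces, which a real implementation would need to handle. The helper name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitThird breaks the single --third argument back into individual
// tokens to pass straight through to the third party program.
// Caveat: does not support quoted values with embedded spaces.
func splitThird(third string) []string {
	return strings.Fields(third)
}

func main() {
	fmt.Println(splitThird("--foo --bar 2 --baz 3"))
	// → [--foo --bar 2 --baz 3]
}
```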
The saving grace we have is the profiles feature. The user can specify the name of a profile and the flags defined for that profile can easily be retrieved. However, the intention behind the profile was as a supplementary way of defining these third party flags; we would ideally like to allow the user to define them on the command line. It may be that using a profile becomes the only way to define third party flags.
We can define commonly used flags like --GaussianBlur, --SamplingFactor, --Interlace and --Quality natively on the appropriate command and then rely on the profiles feature to allow other flags to be defined. The explicit flags will override any matching one in the specified profile.
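The override rule above is an ordinary map merge: start from the profile's flags, then overlay the explicitly supplied ones so they win. A minimal sketch with a hypothetical helper, not pixa's real API:

```go
package main

import "fmt"

// mergeFlags overlays flags given explicitly on the command line onto
// the flags loaded from a named profile; explicit values win.
func mergeFlags(profile, explicit map[string]string) map[string]string {
	merged := make(map[string]string, len(profile)+len(explicit))
	for k, v := range profile {
		merged[k] = v
	}
	for k, v := range explicit {
		merged[k] = v // explicit flag overrides the profile entry
	}
	return merged
}

func main() {
	profile := map[string]string{"gaussian-blur": "0.05", "quality": "85"}
	explicit := map[string]string{"quality": "60"}
	merged := mergeFlags(profile, explicit)
	fmt.Println(merged["quality"], merged["gaussian-blur"])
	// → 60 0.05
}
```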
Sample mode should not be considered transparent, because typically the workflow would be:
If transparent, the originals are moved to the trash location, but this is a problem for a sample run, since the user would have to manually retrieve the originals back to the source location and re-run. Transparency is only good for a full run.
We need a better workflow.
Before a non-transparent run, any sample files found in the same dir as the origin should be moved to the --output location. The reason for this is that the user may perform a transparent sample run, followed by a non-transparent full run. In this case, the sample files will be in the wrong location and should be moved to the --output.
Specify a profile name and run like sample version except without the image count.
The above is optional
Update unit tests which need to reference the root of the repo, using a git call:
git rev-parse --show-toplevel
executed from Go using something like:
cmd := exec.Command("git", "rev-parse", "--show-toplevel")
output, err := cmd.CombinedOutput()
if err != nil {
	log.Fatal(err)
}
repo := strings.TrimSpace(string(output))
create 2 new sub-commands:
In order for this to work, we may need a new kind of journal that can be consulted to see what happened.
The framework needs to allow it to be used across multiple commands, and then be made generic enough to be used by other applications. Eventually, this framework will be implemented in arcadia and/or cobrass, depending on the relevant pieces of functionality. Actually, a clear separation should be established between the functionality that is arcadia specific and that which is more generic and should be implemented in cobrass.
Legend:
Need the following entities:
It should be easy to switch a command between the simple and scheduled profiles. E.g. a visit command defined to use the simple profile should be switchable over to the scheduled profile simply by setting a property.
Configuring the navigation and then setting up the appropriate workflow are 2 separate phases. A workflow step will need to execute the navigator, therefore the workflow infrastructure has a dependency on the navigator, but not the other way around.
The term duplicant is not a real word; rather, it is a riff on duplicate, but duplicate is not adequate because it implies a copy. The concept that duplicant represents is that when pixa runs in a mode where multiple results are created for each input file, each of these is a duplicant of the original (not a duplicate). And just to be clear, a duplicant can only be created when multiple results are created. So this means a full or sample run is executed where a scheme is referenced which contains more than 1 profile.
Transparency mode for duplicants can't really exist, because transparency mode always refers to the idea that the file system remains the same as it did before pixa was run, except that input files are replaced inline by the output and the input is placed elsewhere. This means that to clean up, the user needs to delete files in alternative locations (either a central trash location or an inline trash location). Having said this, what we can say for duplicant transparency mode is that it might be advisable to always create result files in the same location as the input, to make it easy to compare input and result files. To combat the obvious clash in names, result files need to be decorated with an appropriate tag, which would be an amalgam of <scheme name>/<profile>/ADHOC.
The destination for the input is transparent if --trash has not been specified. The destination for the input file should be renamed to the origin path (item.Parent) + item.Name + an original file tag, eg "INPUT". So for example:
The path finder should have a new method Yield_ which represents an output and is equivalent to the Destination method for the input.
Another requirement that emerges from this issue is that the TRASH location is only used if not in transparent mode. So transparency has a separate effect on the INPUT and the yields (or duplicants). It also means that transparent can't be set explicitly; it is derived from the presence of --output (which applies to yield) and/or --trash (which applies to input).
On further thought, it may be better to drop the duplicant terminology; it will only end in tears! So instead, create a Result method on the path finder which accepts a resultInfo containing fields that allow it to return the result path, in the same manner as the Destination method does. It will apply to the duplicant scenario already described and to a single output result.
It is currently hardcoded to ".$journal", but should be accessed from config via finder.statics.
to include journal file management.
Need to define an e2e test with nothing mocked out, so that we can check the app works outside the confines of a test.
plastikfan@Hydra (main) ❯ pixa shrink ~/dev/test/pics --profile blur --sample
===> 💥💥💥 USING DUMMY EXECUTOR !!!!
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x128 pc=0x14a1b6f]
goroutine 1 [running]:
github.com/snivilised/pixa/src/app/command.(*MsAdvancedConfigReader).Read(0x8?, {0x0, 0x0})
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/config-readers.go:73 +0x2f
github.com/snivilised/pixa/src/app/command.(*Bootstrap).viper(0xc00020b180)
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/bootstrap.go:264 +0x114
github.com/snivilised/pixa/src/app/command.(*Bootstrap).buildShrinkCommand(0xc00020b180, 0xc00018caf8)
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/shrink-cmd.go:342 +0x716
github.com/snivilised/pixa/src/app/command.(*Bootstrap).Root(0xc00020b180, {0x0, 0x0, 0x1ae7c78?})
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/bootstrap.go:191 +0x638
github.com/snivilised/pixa/src/app/command.Execute()
/Users/plastikfan/dev/github/snivilised/pixa/src/app/command/root-cmd.go:27 +0x26
main.main()
/Users/plastikfan/dev/github/snivilised/pixa/src/app/main/main.go:10 +0xf
Also, there is a mix-up in ConfigureOptionsInfo, which has a redundant Viper field that is duplicated (probably just a leftover from a previous refactoring that should have been removed but wasn't).
Non transparent concept encompasses the following points:
Execution should stop immediately if a timeout occurs. We may need to check extendio's functionality to see what happens when a third party error occurs, and implement new behaviour to define what should happen, i.e. do we want to continue on error or bail immediately.
I recently discovered the great looking bubbletea library from Charm and think it would be a great fit for pixa. This issue will track multiple child issues.
See these other resources:
define a model
Forgot to do this when implementing the logging.
This will simply navigate the directory tree.
eg:
pixa shrink ~/dev/test/pics --profile blur --skim
===> 💥💥💥 USING DUMMY EXECUTOR !!!!
🔧 pixa Running shrink, with args: '/Users/plastikfan/dev/test/pics'
===> 🛡️ beginning traversal ...
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/screen-shot-1.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/screen-shot-1.png'
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/screen-shot-2.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/screen-shot-2.png'
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/wonky.avatar.jpg --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/wonky.avatar.jpg'
✨ dummy:executing: 'magick /Users/plastikfan/dev/test/pics/$pixa$/blur/TRASH/wonky.avatar.png --gaussian-blur 0.05 --interlace plane --strip /Users/plastikfan/dev/test/pics/wonky.avatar.png'
💩 error occurred during navigation (journal file '/Users/plastikfan/dev/test/pics/wonky.avatar.$journal.txt' not found) [started: 'Tue, 16 Jan 2024 15:32:22 GMT', elapsed: '7.006276ms']
Error: journal file '/Users/plastikfan/dev/test/pics/wonky.avatar.$journal.txt' not found
Usage:
main shrink [flags]
Flags:
-F, --files string files extended glob filter: <glob>|<suffixes csv> (negate-able with leading !)
-X, --files-rx string files-rx file regular expression filter (negate-able with leading !)
-Z, --folders-gb string folders-gb folder glob filter (negate-able with leading !)
-Y, --folders-rx string folders-rx folder regular expression filter (negate-able with leading !)
-b, --gaussian-blur float32 gaussian-blur (see magick documentation for more info) (default 0.05)
-h, --help help for shrink
-i, --interlace string interlace (see magick documentation for more info) (default "plane")
-o, --output string output creates a mirror of the source directory tree containing processed images
-q, --quality int quality (see magick documentation for more info) (default 80)
-f, --sampling-factor string sampling-factor (see magick documentation for more info) (default "4:2:0")
-s, --strip strip (see magick documentation for more info)
-t, --trash string trash indicates where deleted items are moved to
Global Flags:
--cpu cpu denotes parallel execution with all available processors
--depth uint depth denotes the number of sub directories to navigate
-D, --dry-run dry-run allows the user to see the effects of a command without running it
--last last is a flag that indicates which n items are to be sampled
--no-files uint no-files files specifies the number of files to sample (default 3)
--no-folders uint no-folders folders specifies the number of folders to sample (default 3)
--now int now denotes parallel execution with this number of workers in pool (default -1)
-P, --profile string profile specifies which set of flags/options to load from config
--sample sample is a flag that activates sampling
-S, --scheme string scheme is a collection of profiles, typically to repeat an operation over
-K, --skim skim sets the navigator to not descend into sub directories
The current config is a bit of a problem, because it is causing duplicated code. The config interfaces are defined in proxy, but the implementations are also defined in proxy. The problem starts with the mock data, which would need access to command, but doing so reveals an import cycle. So the resolution would be to create a new cfg package containing the implementation classes, which could then be reused by the mock config data.
extendio's navigator, by default, assumes the whole directory structure rooted at the path specified by the user should be traversed. However, when creating a client app like pixa, the normal mode of usage would be to process just the directory specified. We should conform to this norm so as not to surprise the user, particularly when file system writes occur. We can then introduce a --recurse flag that allows the full traversal to occur.
This may need a fix in extendio to achieve. In fact, there is an outstanding issue in extendio whose aim is to limit the traversal depth.
We should only use the worker pool if the user specifies a particular flag on the command line, either --cpu for all CPUs or --now for n workers; otherwise we proceed on the existing single threaded, synchronised basis.
Or perhaps, if we can define a flag that can have an optional value, we can just use --cpu. The problem with --cpu is that it's a bit of a misnomer when we specify a number greater than the number of CPUs; in reality we really mean the number of workers, which is not related to the number of CPUs the host has. Given this, we should probably use --now instead, and if the user omits the option value, we set it to NumCPU.
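The --now fallback rule above can be sketched in a few lines, assuming a sentinel value (here 0) means "flag given with no value". The helper name and sentinel convention are assumptions, not the real flag plumbing:

```go
package main

import (
	"fmt"
	"runtime"
)

// workerCount resolves the --now flag: a positive value is taken
// verbatim, while the sentinel 0 falls back to the number of CPUs.
func workerCount(now int) int {
	if now > 0 {
		return now
	}
	return runtime.NumCPU()
}

func main() {
	fmt.Println(workerCount(4)) // explicit value wins
	fmt.Println(workerCount(0)) // falls back to runtime.NumCPU()
}
```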
Actually, this doesn't have to be implemented by explicitly passing in an instance of storage.VirtualFS. We already have hook functions, so we should use that existing functionality instead.
The navigator uses Lstat and ReadDir (from the file system), so we should define functions for these that work with the virtual file system.
In the case of Lstat, the client can just set the QueryStatus hook function, eg:
o.Hooks.QueryStatus = e.Vfs.Lstat
For ReadDirectory, the client could do something like this:
o.Hooks.ReadDirectory = func(dirname string) ([]fs.DirEntry, error) {
	contents, err := e.Vfs.ReadDir(dirname)
	if err != nil {
		return nil, err
	}

	return lo.Filter(contents, func(item fs.DirEntry, index int) bool {
		return item.Name() != ".DS_Store"
	}), nil
}
But this is boilerplate code, so we should provide one that the client can use.
Using this approach, we can avoid universally mocking out the file system, which just adds more complexity to unit tests.
This is typical new logger functionality that should be created by the client:
func NewLogger(info *LoggerInfo) Ref {
	return utils.NewRoProp(lo.TernaryF(info.Enabled,
		func() Logger {
			if info.Path == "" {
				panic(i18n.NewInvalidConfigEntryError(info.Path, "Store/Logging/Path"))
			}

			ws := zapcore.AddSync(&lumberjack.Logger{
				Filename:   info.Path,
				MaxSize:    info.Rotation.MaxSizeInMb,
				MaxBackups: info.Rotation.MaxNoOfBackups,
				MaxAge:     info.Rotation.MaxAgeInDays,
			})
			config := zap.NewProductionEncoderConfig()
			config.EncodeTime = zapcore.TimeEncoderOfLayout(info.TimeStampFormat)
			core := zapcore.NewCore(
				zapcore.NewJSONEncoder(config),
				ws,
				info.Level,
			)

			return zap.New(core)
		}, func() Logger {
			return zap.NewNop()
		}))
}
and to clean up ...
func (n *navigator) finish() error {
	return n.log.Get().Sync()
}
For every run that actually creates or overwrites files, we first create a dummy text file with the same main name at the start of the batch.
After every successful image operation, we remove the associated journal file. With this in place we will be able to implement the resume feature with greater reliability, though it might require a double navigation of the directory tree. That is also good, because it would allow us to implement a progress report too.
check how this is done in the standard find command
Both the sample and full modes require a look-ahead phase to support journalling, so this issue will implement this dual phase navigation.
It is assumed that magick is the third party program to be invoked. However, this shouldn't be a hard rule; we should be able to integrate with any appropriate third party program.
We should introduce the concept of an agent that represents the third party program and interfaces with it as such. The agent is aware of anything that is specific to that program.
Also introduce a program field in the advanced config that allows it to be switched to whatever program the user wishes. However, to prevent mis-configuration, we should not just allow anything to be used for this field; it should be vetted against an allowed list. In theory, more than one external program could be associated with a particular agent if they have the same interface (in reality this would be unlikely, but we should not make any assumptions upfront).
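The allowed-list vetting could be as simple as a set lookup when the advanced config is read. The list contents and function name here are illustrative assumptions:

```go
package main

import "fmt"

// allowedPrograms is the vetting list for the program field in the
// advanced config; contents are illustrative.
var allowedPrograms = map[string]bool{
	"magick": true,
	"dummy":  true,
}

// vetProgram rejects any configured program name that is not on the
// allowed list, guarding against mis-configuration.
func vetProgram(name string) error {
	if !allowedPrograms[name] {
		return fmt.Errorf("program %q is not on the allowed list", name)
	}
	return nil
}

func main() {
	fmt.Println(vetProgram("magick")) // accepted
	fmt.Println(vetProgram("gimp"))   // rejected
}
```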
We probably need to introduce a new section into the advanced config for the explicit purpose of holding any settings regarding the third party executable.