seandenigris / resources-live

License: GNU General Public License v3.0

Smalltalk 100.00%

pharo smalltalk dynabook gtoolkit

resources-live's Introduction

Resources Live

TLDR: Free users to communicate with objects instead of managing files. Files conflate what a thing is with where it is. We take care of the boring location part for you. So you can "play an mp3" instead of "opening an mp3 file in a player app". Nicer, no?

Overview

NB. This section is an export of the class comment of BaselineOfResourcesLive. When viewed from inside the system, it is live, dynamic and beautiful. "Just the markdown" only gives you a taste. We suggest you dive in and view the documentation as it was intended as quickly as possible - it will be more enjoyable and productive!

Motivation

What is a file? A file is data, which often represents something in a user's domain e.g. a photo. Files that matter to users have a label e.g. /Users/me/image_1.jpg. This label consists of a name and a location - although with the advent of universal search (e.g. Mac Spotlight) one could debate how useful/interesting the location is in many cases. Typically the location is thought of as a concrete place. The filesystem is like an organized closet, where the closet is subdivided into shelves, and then the shelves are subdivided into plastic bins, etc.

But what does any of the above have to do with what matters? If we take a photo, and that photo happens to be stored digitally, why can't we deal with it abstractly so that we don't have to know or care what its label is or where its bits are stored?

One problem with the filesystem paradigm is that files need to be in one place, and can be linked to other places. This creates tension. The first pain point is balancing navigability and maintenance. If I have one giant folder, it can be hard to find any given thing, but if I have a folder tree 6 levels deep, it can be a maintenance headache, as well as requiring effort to drill down through all those levels.

The next pain point is where real-life categories aren't cleanly separated. Let's say I work for a company, and also belong to an employee union. I am interviewed about an issue on which the company and the union are aligned. Where do I put the video of the interview? In the company folder? In the union folder? Yes, I could symlink from one to the other, but the point is that it really belongs to both domains equally and filesystems don't support that concept. Also, linking doesn't work for all use cases e.g. syncing services like Dropbox and Google Sync can't properly handle them.

So we're mixing two concepts: what a digital thing is, and what it represents in the real world. ResourcesLive is an attempt to disentangle these two. Its mission is to handle the first (the digital entity, the file) magically for the user, although the user should be able to customize this if they really care. It then models the second concept (what the digital thing represents in the real world) on its own terms. So an image file might have #edit and #view capabilities, a sound can be #play-ed, etc.

Usage

A typical use case is to have a resource library particular to your project. E.g. a library of mp3 files for a music app. We'll take that example, and assume you're using SimplePersistence to store your data.

Set the file library location; otherwise, a system-wide default location is used. NB Be careful with this, because there is no conflict management if multiple images/apps use the same library. ResourcesLive uses its backup directory to determine the library location, so:

    MyProjectDB class>>initialize
    	"Add the following line *before* you send #restoreLastBackup"
    	ResourcesLiveDB backupDirectoryParent: self backupDirectoryParent.

Set up serialization

    MyProjectDB class>>schema
    	^ { MyProject. ResourcesLiveDB }.

Installation

In GToolkit (preferably) or Pharo (v. 9 best supported at time of writing), do the following:

    [
    	EpMonitor current disable.
    	[ Metacello new
    		baseline: 'ResourcesLive';
    		repository: 'github://seandenigris/ResourcesLive';
    		"onConflict: [ :ex | ex allow ];"
    		load ] ensure: [ EpMonitor current enable ].
    ] fork.

N.B. The outer fork is only needed on GT, and only if you want the UI to stay responsive during the load.

Disclaimer

This project is part of a ~20 year (as of 2021) exploration of the Dynabook idea (a la Alan Kay). It's intensely personal and opinionated and I've open sourced it due to repeated requests. Use at your own risk. Any part may change at any time. I'm happy to give support when I have time in the form of explanations, but do not expect me to implement any particular feature, or even accept PRs if they don't feel right. That said, I'm happy to have anyone along on the journey :)

License Explanation

The license is MIT. However, my original intent was to release my Dynabook libraries under a copyfarleft license (free use for cooperatives, but negotiated licenses for those utilizing paid labor for profit). I love sharing any work I do, but am disgusted by the prospect that (especially multi-billion-dollar) corporations will exploit my work for free, especially toward ends with which I don't philosophically agree. However, after many discussions with colleagues, it appears that at this moment there is just no way to protect one's work from parasites without effectively keeping it from everyone. Even GPL, which doesn't even come close to "solving" the problem stated above, seems enough to put off most people. In closing, now that my intentions are clear, I request the following from any entity utilizing wage labor or selling for profit who uses my work:

Attribution
Pay for what you use, or don't use it

While there may be no legal means for me to enforce the above given that this code is released under MIT, my intentions should be clear; violate the above at risk to your own conscience.

resources-live's Issues

Reference files vs. program data

Should we make a distinction between:
a) reference files, which we want to stick around until we manually delete them, just like a normal filesystem
b) program data, like a sound file that pronounces a ForeignWord domain object

In the latter case, it's possible that the sound file is no longer relevant when there are no more references to it.

Possible avenues for exploration:

a garbage collection flag?
hold the resources in a weak collection?

Archive/Location Policies

Use case: I have Launcher GT templates. In over a year, I've never had to go back and use an old one, but since they've already been semi-manually created, it might be useful to keep the history around. However, I don't need it locally, but want to back the deprecated templates up to a server.

What to do if the server is not available? I was thinking about maybe queuing move commands and clearing the queue on next mounting/connection.
Design? How do we signal the library? Whose responsibility is the move primarily? Library? Resource? Collaboration?
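A minimal Playground sketch of the "queue the moves" idea. Everything here is hypothetical: the mount point is made up, and "server available" is approximated by the archive folder existing (e.g. a mounted volume). Moves that can't happen now are queued and flushed on the next connection.

```smalltalk
| archiveFolder pendingMoves archive flushPending |
"Hypothetical mount point of the backup server"
archiveFolder := '/Volumes/Archive/Launcher Templates' asFileReference.
pendingMoves := OrderedCollection new.
"Copy, then delete, rather than #moveTo:, since a plain rename
 may not work across volumes"
archive := [ :aFile |
	archiveFolder exists
		ifTrue: [
			aFile copyTo: archiveFolder / aFile basename.
			aFile delete ]
		ifFalse: [ pendingMoves add: aFile ] ].
"Call this on the next mount/connection"
flushPending := [
	archiveFolder exists ifTrue: [
		pendingMoves copy do: [ :each |
			each copyTo: archiveFolder / each basename.
			each delete.
			pendingMoves remove: each ] ] ].
```

Whether the library, the resource, or a collaboration between them owns this queue is exactly the open question above; the blocks only make the mechanics concrete.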

Is a Resource a Resource without a Library?

OCR: Second Try based on user feedback

Use case:

When OCRing a receipt, the amount 12.43 is misread as 12-43.
We know from the domain that there won't be a negative amount here, and certainly there wouldn't be a dash in the middle. The format must be something like: \$?\d+(\.\d+)?
The user indicates that this particular area on the receipt is an amount
We want to retry to OCR just that area as an amount

How to go about this? Two ways that pop out are: 1) give a pattern to the engine, or if we can't do that, 2) restrict allowed characters to numbers and decimal point (fairly straightforward with Tesseract, although there may have been a bug prior to 4.1)

Next question: who needs to do/know about this? In our OCR element, we currently have the capability for the user to say "this area should be an amount". Now we have the text and location. I guess for now we can put it in the element. We want to:

1. See if the existing text is compatible
2a. If it is, use it
2b. If it isn't, re-OCR using some rules and try again (i.e. go back to step 1, but don't get into an infinite loop)

CURRENT: Validation of number is embedded in visitor/reader - we should attempt to validate first?
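Validating first could look something like this. It's a sketch using Pharo's String>>matchesRegex: (which matches the whole string) with the amount pattern above; reOcrBlock is a hypothetical stand-in for re-running the engine on just that area, e.g. with characters restricted to digits and the decimal point.

```smalltalk
| isAmount amountFrom |
"Accept amounts like 12.43 or $12.43; reject misreads like 12-43"
isAmount := [ :aString | aString matchesRegex: '\$?\d+(\.\d+)?' ].
"Use the existing OCR text if it is compatible; otherwise re-OCR once.
 reOcrBlock is hypothetical: it should answer the re-read text"
amountFrom := [ :ocrText :reOcrBlock |
	(isAmount value: ocrText)
		ifTrue: [ ocrText ]
		ifFalse: [ | retried |
			retried := reOcrBlock value.
			"Only a single retry, so we can never loop forever"
			(isAmount value: retried)
				ifTrue: [ retried ]
				ifFalse: [ nil ] ] ].
```

For example, amountFrom value: '12-43' value: [ '12.43' ] answers '12.43', while a retry that still fails answers nil so the element can fall back to asking the user.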

Soft #deleteAll

It would be nice to go through the OS trash system instead of rm…
See https://apple.stackexchange.com/questions/50844/how-to-move-files-to-trash-from-command-line for possible Mac implementation
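A rough, Mac-only sketch that avoids shelling out: move the file into the user's ~/.Trash folder instead of deleting it. This assumes ~/.Trash is the trash for files on the boot volume, and it is only an approximation of the Finder's behavior (e.g. "Put Back" won't know the original location). The approaches in the linked answer go through the Finder instead.

```smalltalk
| trashFolder softDelete |
"Assumption: on macOS, files on the boot volume are trashed into ~/.Trash"
trashFolder := FileLocator home / '.Trash'.
softDelete := [ :aFileReference | | target |
	target := trashFolder / aFileReference basename.
	"Don't overwrite something already in the trash with the same name"
	target exists ifTrue: [
		target := trashFolder / (UUID new asString, '-', aFileReference basename) ].
	aFileReference moveTo: target.
	target ].
```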

Sane Filenames

Although we could easily make totally random filenames (e.g. UIDs), what if the project becomes abandonware or the data gets corrupted? We'd like to leave the filesystem in a state so that operations can continue merrily along without the app. So the principle is: Looking at the filesystem, no one should have any idea that the app even exists; it should look like the work of an extremely-disciplined user.
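As an illustration of the principle, a name an extremely-disciplined user might choose: ISO date, then a lowercase, dash-separated title. The naming scheme itself is just an example assumption, not a rule the library currently follows.

```smalltalk
| slug saneName |
"Lowercase the title and replace anything non-alphanumeric with a dash"
slug := [ :aString |
	aString asLowercase collect: [ :each |
		each isAlphaNumeric ifTrue: [ each ] ifFalse: [ $- ] ] ].
"Zero-padded ISO date, underscore, slug, then the extension"
saneName := [ :aDate :title :extension |
	aDate year printString, '-',
	(aDate monthIndex printPaddedWith: $0 to: 2), '-',
	(aDate dayOfMonth printPaddedWith: $0 to: 2), '_',
	(slug value: title), '.', extension ].
```

For example, the union interview video from the Motivation section, recorded on 2021-03-14, would be named 2021-03-14_interview-with-union-rep.mp4: anyone browsing the folder can tell what it is without the app.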

Ensure file locations are contained by lib folder

FS magic can get confusing e.g. (FileLocator home / ') resolve' = FileLocator root resolve "WTF".

Here's a snippet to translate a reference into a "locator + path":

    | locator subchild relPath result |
    locator := FileLocator home.
    subchild := (FileLocator imageDirectory / 'k') resolve: 'd/j/r'.
    relPath := locator makeRelative: subchild.
    result := locator withPath: relPath.
    self assert: result = subchild.
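One way to actually check containment is sketched below. It compares canonical path segments instead of string prefixes, so /x/library-old/a isn't mistaken for something inside /x/library, and '..' segments can't sneak out of the folder. It assumes FileReference>>canonicalize and Path>>segments are available in your Pharo version; the library location is hypothetical.

```smalltalk
| libraryFolder isInLibrary |
"Hypothetical library location, for illustration only"
libraryFolder := FileLocator imageDirectory / 'library'.
isInLibrary := [ :aFileOrLocator | | librarySegments fileSegments |
	"Resolve locators, make absolute, and drop '.' / '..' before comparing"
	librarySegments := libraryFolder asFileReference asAbsolute canonicalize path segments.
	fileSegments := aFileOrLocator asFileReference asAbsolute canonicalize path segments.
	"Strictly inside: longer than the library path and starting with it"
	fileSegments size > librarySegments size and: [
		(fileSegments first: librarySegments size) = librarySegments ] ].
```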

Moves: To Verify or Not to Verify

On the same volume, a move just updates the directory index; it doesn't move the bits. If we force a secure move, the file has to be copied unnecessarily. Different use cases might favor different strategies. Maybe moving and then verifying via checksum would be a good default?
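The "move, then verify via checksum" strategy might be sketched like this, using SHA256 from Pharo's hashing package. Note that the check only buys something when bytes are really copied (e.g. across volumes); a same-volume rename will trivially match.

```smalltalk
| checksum verifiedMove |
"SHA-256 of the file's bytes"
checksum := [ :aFile |
	aFile binaryReadStreamDo: [ :in | SHA256 hashMessage: in upToEnd ] ].
"Hash before, move, hash after; complain loudly on a mismatch"
verifiedMove := [ :source :destination | | expected |
	expected := checksum value: source.
	source moveTo: destination.
	(checksum value: destination) = expected
		ifFalse: [ Error signal: 'Checksum mismatch after moving ', source basename ].
	destination ].
```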

URL Import - Detect File Type?

E.g. if we are downloading HTML vs. mp3, can we select the right resource subclass? Zinc? downloadTo:?
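One possible approach with Zinc: ask the server via a HEAD request and branch on the Content-Type. The symbols below are placeholders for whichever resource subclass the library would actually pick, and servers that omit the header answer nil, so a fallback (e.g. the URL's file extension) would still be needed.

```smalltalk
| resourceKindFor |
"Map a ZnMimeType (or nil) to a placeholder resource kind"
resourceKindFor := [ :aMimeType |
	aMimeType isNil
		ifTrue: [ #unknown ]
		ifFalse: [
			aMimeType main = 'audio'
				ifTrue: [ #audio ]
				ifFalse: [
					(aMimeType main = 'text' and: [ aMimeType sub = 'html' ])
						ifTrue: [ #webPage ]
						ifFalse: [ #generic ] ] ] ].
```

In use, the MIME type would come from something like resourceKindFor value: (ZnClient new url: aUrl; head; response) contentType, before deciding whether to #downloadTo: at all.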

Use case: Import Image from Webpage

Imported from ResourceInfo class comment 2015-08-10: I have a webpage open in Safari and I want to import an image from the page into Smalltalk

Library - New Item Spotter

Is string aUrl or aFile?

Then adapt:

    gtSpotterNewFor: aStep
    	<gtSearch>
    	aStep listProcessor
    		title: 'New Resource';
    		priority: 50;
    		itemName: [ :input | '+ ', input ];
    		previewObject: [ :input | LlAuthoredWork new title: input; yourself ];
    		wantsToDisplayOnEmptyQuery: false;
    		filter: GtNullFilter
    			item: [ :filter :context | 
    				| isExisting |
    				isExisting := self works anySatisfy: [ :e | 
    					e title = context textTrimmed ].
    				isExisting ifFalse: [ 
    					filter value: context textTrimmed ] ];
    		send: [ :newName :spotterStep | 
    			| work |
    			work := LlAuthoredWork new title: newName; yourself.
    			self beAwareOf: work.
    			work ]

Guaranteed Unique Filenames

For now, we just have a #deny: check in #import: to ensure the destination file doesn't already exist. If we later switch to a counter or UUID, the files can be updated with e.g.:

    "Answer the new destination for each resource; the rename itself is not shown"
    self collect: [ :e | | destination |
        destination := e file parent / (RlResources library nextID asString, '.', e file extension) ].

I asked about UUIDs on Pharo Users and Peter Uhnák seemed to think they are robust, including across platforms and images.
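A UUID-based variant of the destination calculation, as a sketch: it keeps the source's extension and uses Pharo's UUID class. The block and argument names are made up for illustration.

```smalltalk
| uniqueDestination |
"Name the destination with a fresh UUID, keeping the source's extension"
uniqueDestination := [ :sourceFile :targetFolder | | name |
	name := UUID new asString.
	sourceFile extension isEmpty
		ifFalse: [ name := name, '.', sourceFile extension ].
	targetFolder / name ].
```

The trade-off is the one raised under "Sane Filenames": UUIDs make collisions a non-issue, but the result no longer looks like the work of a disciplined user if the app ever goes away.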

HTML SmaCC Parser

Initial import from antlr4 is already done. I used John Brant's script to convert the grammar for both the lexer and parser from https://raw.githubusercontent.com/antlr/grammars-v4/master/html. I pasted the results into the source view of https://github.com/seandenigris/Resources-Live/blob/master/src/ResourcesLive/RlHTMLParser.class.st, which also generated https://github.com/seandenigris/Resources-Live/blob/master/src/ResourcesLive/RlHTMLScanner.class.st, but the parser does not work. To fix it (per John Brant on Discord GToolkit help channel 10/12/2020):

Looking at your grammar, I think the next step would be to try to fix the TODO parts that are in the grammar that the conversion tool couldn't handle. It appears that there are two main issues with the grammar that weren't handled by the conversion. The first is that SmaCC doesn't have non-greedy matching for the scanner (.*?). The other is the pushMode/popMode code. For the non-greedy matching, the regex needs to be modified. Some of them are easy to modify like SCRIPT_OPEN which can be changed to <script [^\>]* \> since it only ends with a > we can take any character except for the >. For items like SCRIPTLET that end with either a ?> or %>, then you would need a more complex regex similar to the one for a C-style comment /* */ (e.g., \/ \* [^\*]* \*+ ([^\/\*] [^\*]* \*+)* \/ handles C comments). For the push/popMode stuff, you'll need to add a production before the token is used in the grammar. For example, in the script production, you would write PushScript <SCRIPT_OPEN> .... Then you'll need to create a PushScript : [self scope: #SCRIPT]; . Similarly for popMode, you would create a production like Pop to add before that token. For now, you could define it as Pop : [self scope: #default];. If a stack is really needed, then the push and pop rules will need to be modified a little.

Emails - Hard to get timestamp

There are situations where the Date header is in another timezone (seems to often be GMT). The date in the local TZ might appear in another header, but not necessarily be easily parsable. Since Pharo's TZ implementation is broken (i.e. it can't convert from GMT to a local historical offset due to DST), allow a cacheable timestamp which can be set manually when needed.

Can't Download from GH Release URL

e.g. the following fails:

'https://github.com/SquareBracketAssociates/Booklet-DataFrame/releases/download/continuous/DataFrame-wip.pdf' asUrl asRlResource

Reported at svenvc/zinc#89. There are several possible workarounds described in svenvc/zinc#69, but they require a more recent Zinc than the one in GT (reported on Discord).

De-duplication

E.g. a file that was an email attachment that is also filed by topic.

Two possible options:

If email is documentation, link from topic and recreate (will probably be compressed)
Remove multipart and link to topic

Merge LITCacheable with RlCache

Refer to seandenigris/Living-Library#17

Resource drag and drop - handle once for lib or for each resource type?

File Locations

The dream is that we hand a file to the library and it figures it out. The point of this library is to take away the burden of file-y things like locations. However, there may be cases where manipulating the location might be beneficial.

Example 1: The Cloud

I store a PDF in Google Drive. This introduces a concern not handled by the file

Example 2: Fine-grained control over

For example, I have some files that would be most easily navigable in a hierarchy. I don't want to write a web app to access my data, so I dump them into the cloud so that I have a poor man's UI on my mobile device by navigating through the files. If they were all in the same folder and there were a lot of files, this would not be possible.

Now, the next issue is where to put the logic. Injecting it into RL naively is causing a rapid multiplication of messages, one for every cell in the matrix of file + location + move/copy. Two possible ideas: an Import object, or hooks on the object that gets passed in e.g. anObject preferredLocationIn: self "aResourceLibrary"

Archive some emails without attachment

Email Quicklook (Mac)

Was using qlmanage, which now seems to be broken (see fault) for eml files. Maybe this helps?

Importing Emails from Mac Mail

Problem description moved to wiki page. Now we just have to automate and integrate the process.

ApplescriptScript

Where was I going with this?! Let's remove it for now (ResourcesLive-SeanDeNigris.3)

Cloud Services - e.g. Google Drive and Dropbox

We'd like to represent the file locations in an abstract way so that they can be found locally if synced, and remotely if not. For the moment, for Drive, we have Resources wrapping the file's URL

HEIC - cache jpeg version

Currently a new temp file is created on each access
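A possible fix is to remember the converted JPEG per source file and reuse it while it still exists. In this sketch, convertBlock is a hypothetical stand-in for the existing conversion code (the one that currently writes a new temp file each time).

```smalltalk
| jpegCache jpegFor |
"Source file path -> converted JPEG file"
jpegCache := Dictionary new.
jpegFor := [ :heicFile :convertBlock | | cached |
	cached := jpegCache at: heicFile fullName ifAbsent: [ nil ].
	"Reuse the earlier conversion unless its file has disappeared"
	(cached notNil and: [ cached exists ])
		ifTrue: [ cached ]
		ifFalse: [ jpegCache at: heicFile fullName put: (convertBlock value: heicFile) ] ].
```

Invalidating the entry when the HEIC file itself changes (e.g. by also storing its modification time) is left out of the sketch.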

[MetaC]: Why is PharoEnhancements Not Loaded?

It is a dependency of our monolithic main package, but it seems not to be loaded on GT on P10. The load was working previously on P9-based GT.

Resource asMorph - do we want to pull in LivingWhatever?

Remove #macLock:

Ported to Pharo-Enhancements (where it should've been!!)

Library Folder: "Moving" vs. "Setting"

If I move my folder, all the resources get moved. But let's say I'm making a change that doesn't require moving the resources, e.g. from /Users/me/Library to FileLocator library, which points to the same place. The convention that a simple setter like #folder: just sets the variable is so ingrained that I find it confusing that it also moves all the resources. Wouldn't it be better to have #moveFolder:, so one better realizes the implications, and then maybe #basicFolder: could just set the var?

aZnUrl asRlResource vs. LlWebResource

Web resources can be cached, and ZnUrls are downloaded and the url is set as the source. Are these really two different use cases?

Can we generalize from File to Location?

The original title of this issue was "Model File or Domain Object?" e.g. Email vs. EmailFile.
In my experience so far, it seems important that the domain object is the primary object. Maybe it uses a specialized file object underneath. In fact, the question seems to be even bigger: maybe there's a generalized location underneath, of which a file could be one kind, along with a pointer to a physical resource (e.g. "in my file cabinet under 'z'") or to a resource on the web (e.g. an article or a YouTube video).

Import Really Has 3 Strategies

The options are:

Move
Copy
In-place

Can we unify these into one, instead of having a separate #importInPlace:?
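One way to unify them is to pass the strategy as an argument and look it up, sketched here with blocks in a Dictionary. The strategy symbols are made up for illustration; each strategy answers the file the library should track afterwards.

```smalltalk
| strategies import |
"Each strategy answers the file the library should record"
strategies := Dictionary new
	at: #move put: [ :source :destination | source moveTo: destination. destination ];
	at: #copy put: [ :source :destination | source copyTo: destination. destination ];
	at: #inPlace put: [ :source :destination | source ];
	yourself.
"Single entry point instead of import: plus importInPlace:"
import := [ :source :destination :strategy |
	(strategies at: strategy) value: source value: destination ].
```

Adding a fourth option (say, move-then-verify) then means adding one entry rather than one more selector.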

Protected File URLs

We have a URL that points to a file, which requires some login before access is permitted. Maybe there's an abstraction missing, like what we're handing out is not really a plain URL, but a Mechanize link, which could be mostly polymorphic with URL, but carries the client with it.

LivingLibrary vs. ResourcesLive

What is the relationship between these two projects?

LivingLibrary

Presents everything - paper materials, websites, files…

ResourcesLive

Like a SuperFinder, it represents files as live objects, totally customizable to the user with no context switching

Conclusion

LivingLibrary would be the outer layer, and would use ResourcesLive to manage files