samvera / valkyrie
A Data Mapper library to enable multiple backends for storage of files and metadata in Samvera
License: Other
I think this is because it's storing them in string fields.
The README says to run rake server:development, but this causes an error because the tmp directory is not part of the git clone.
$ rake server:development
Loading configuration from /Users/jcoyne/.solr_wrapper.yml
Unable to copy /var/folders/9t/rygbnddx0b1ckw6tjs3m18qm0000gq/T/d20170706-12948-s9kond to tmp/blacklight-core: No such file or directory @ dir_s_mkdir - tmp/blacklight-core
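The error comes from mkdir failing on tmp/blacklight-core because its parent tmp/ does not exist. Until the README or the rake task is fixed, one workaround is to create the directory first. This is a minimal sketch. It assumes the task only needs tmp/ to exist under the project root. The helper name ensure_tmp_dir is hypothetical and not part of Valkyrie:

```ruby
require "fileutils"

# Hypothetical helper: create <root>/tmp if it is missing so that
# solr_wrapper can copy the Blacklight core into tmp/blacklight-core.
def ensure_tmp_dir(root = Dir.pwd)
  tmp = File.join(root, "tmp")
  FileUtils.mkdir_p(tmp) # no-op when the directory already exists
  tmp
end
```

Running mkdir -p tmp from the project root before rake server:development should have the same effect.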
I have no idea what the interface for this might look like, but the speed difference was dramatic and it helped a lot. A transaction buffer (as is possible now) looks something like this:
memory_adapter = Valkyrie::Persistence::Memory::Adapter.new
adapter = Valkyrie::AdapterContainer.new(
persister: CompositePersister.new(Valkyrie::Adapter.find(:postgres).persister, memory_adapter.persister),
query_service: Valkyrie::Adapter.find(:postgres).query_service
)
# Save a bunch of resources via adapter.persister.save(model: book).
#
# Now that you have a bunch of saved objects in the database,
# you can DRASTICALLY speed up Solr indexing (18 mins -> a few seconds for 2600 objects) by indexing them all in one call:
Valkyrie::Adapter.find(:solr).persister.save_all(models: memory_adapter.query_service.find_all)
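CompositePersister is not defined above. Here is a minimal sketch of what it might look like, as an illustration and not Valkyrie's actual class. It assumes each wrapped persister responds to save(model:) and save_all(models:). It also assumes each persister should store whatever the previous one returned, so an id assigned by Postgres is the id the memory adapter keeps:

```ruby
# Hypothetical sketch of a persister that fans writes out to several
# underlying persisters, in order. Each persister receives what the previous
# one returned, so an id assigned by the first (e.g. Postgres) is the id the
# others store.
class CompositePersister
  attr_reader :persisters

  def initialize(*persisters)
    @persisters = persisters
  end

  def save(model:)
    persisters.inject(model) { |current, persister| persister.save(model: current) }
  end

  # Delegates to each persister's own save_all so that backends with a
  # bulk path (such as the Solr persister) can still batch the work.
  def save_all(models:)
    persisters.inject(models) { |current, persister| persister.save_all(models: current) }
  end
end
```

Delegating save_all, rather than looping over save, keeps the batching benefit described above for any persister that has one.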
If the method name is find_by_id, I don't think a named id parameter provides any extra clarity:
.find_by_id(id: id)
vs.
.find(id: id)
or
.find_by_id(id)
I would prefer either of the latter two forms. I think they're more honest.
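For comparison, here are the two preferred signatures side by side. This uses a hypothetical in-memory query service, not Valkyrie's real one. find_by_id takes a positional id, and the generically named find takes a keyword:

```ruby
# Hypothetical query service showing the two signatures discussed above.
class ExampleQueryService
  def initialize(records)
    @records = records # Hash of id => record
  end

  # The method name already says "by id", so a positional argument suffices.
  def find_by_id(id)
    @records.fetch(id) # raises KeyError for an unknown id
  end

  # A generic name reads better with a keyword: find(id: "123").
  def find(id:)
    find_by_id(id)
  end
end
```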
How do we store mime_type and filename? In Fedora these are stored with the binary.
The shared specs are a good step, but as we start to solidify some of the interfaces we should probably find a good way to add proper documentation.
Which features, once implemented in an example repository, would show that this is a viable pattern for Hydra (and/or Hyrax)?
Ideas I'd like feedback on:
I now have a branch which has two folders in this repository. However, now that I think about it, I wonder if we can turn off autoloading of the lib directory and just have an entire gem structure in lib/valkyrie.
Convert methods to use named parameters in persister classes.
Supporting multiple types per property will be important for use cases such as controlled in-repository terms. We need some way to distinguish "3" as a remote ID from "3" the string.
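One option is to wrap identifiers in a dedicated value type instead of a bare String, so a persister can tell the two apart when it serializes. This is a minimal sketch with a hypothetical ID class:

```ruby
# Hypothetical value type: an ID is never == to a plain String,
# even when the underlying characters are the same.
class ID
  attr_reader :id

  def initialize(id)
    @id = id.to_s
  end

  def ==(other)
    other.is_a?(ID) && other.id == id
  end
  alias eql? ==

  # Consistent with eql? so IDs behave correctly as Hash keys and in uniq.
  def hash
    [self.class, id].hash
  end

  def to_s
    id
  end
end
```

A property holding [ID.new("3"), "3"] then keeps both meanings of "3".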
When we work on #53 we're going to need to store the user's identifier. Originally I thought that was going to be the username or email or whatever devise said was the primary key, but I realized it might be better to simply support GlobalIDs as a data type in Valkyrie.
That way you could just have GlobalID turn them into objects if you wanted that, and there'd be a difference between "tpend" and "gid://app/User/1".
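In an application, the globalid gem's GlobalID::Locator.locate would turn such a string back into a record. The sketch below does not do that lookup. It uses only the standard library's URI to show the distinction being drawn. The helper name global_id_parts is hypothetical:

```ruby
require "uri"

# Hypothetical helper: returns [app, model_name, model_id] for a
# GlobalID-style string such as "gid://app/User/1", or nil for a plain
# string such as "tpend".
def global_id_parts(value)
  uri = URI.parse(value)
  return nil unless uri.scheme == "gid"

  _, model_name, model_id = uri.path.split("/", 3)
  [uri.host, model_name, model_id]
rescue URI::InvalidURIError
  nil # not parseable as a URI at all, so treat it as a plain string
end
```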
Virtus objects can have metadata attached to properties; ordered: true seems like one we could add.
However, this will probably be annoyingly difficult to implement for the AF adapter.
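The same idea can be sketched without Virtus as a class-level attribute declaration that records options per property. An adapter could then check ordered? when deciding whether it must preserve sequence. The attribute and ordered? methods here are hypothetical:

```ruby
# Hypothetical attribute DSL that stores per-property options
# (such as ordered: true) so adapters can inspect them later.
class Resource
  # Options are stored per class (class instance variable), so a subclass
  # sees only the attributes it declares itself.
  def self.attribute_options
    @attribute_options ||= {}
  end

  def self.attribute(name, **options)
    attribute_options[name] = options
    attr_accessor name
  end

  def self.ordered?(name)
    attribute_options.fetch(name, {}).fetch(:ordered, false)
  end
end

class ExampleBook < Resource
  attribute :title
  attribute :member_ids, ordered: true
end
```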
For things like NOIDs. I think this is necessary, especially for migration.
It's in the charter.
I haven't dug into it a lot, but it seems to have a lot of good and similar opinions to Valkyrie, with a lot more work put into it:
Now there are two: Disk & Memory.
I think we'll need at least three for a valid prototype:
In the future I'd like to look at
The alternative is to fix the raw Fedora adapter's performance problem (#72).
This would be nested resources (using hash URIs) for the edm:TimeSpan use case (UCSB).
I think this basically means making blacklight-access-controls work. What's the difference between that and HydraAccessControls, @jcoyne?
File upload gems tend to be pretty locked into ActiveRecord norms. It would be nice if we could prove that CarrierWave can be used with a Valkyrie model without too much interference.
Is https://coderwall.com/p/e9d_ja/using-carrierwave-uploader-for-tableless-model-in-rails relevant?
Question here about where one should draw the line between "query powered by the fact that you have a Solr index" and "query necessary for the backend to support."
I suggest we return void. Returning a File could be expensive and we may not use the result.
It takes work in each persister to navigate back and forth between native ruby datatypes and the data-store. We need to document which data types we support.
Right now all that's supported is Internal IDs, language-tagged RDF Literals, and strings. Dates? Times? Integers? ::RDF::URIs?
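Here is what that work looks like in practice. A persister backed by string fields has to tag each value with its type and convert it back on read. Otherwise a Date comes back as a String. This is a minimal sketch of such a round trip. It assumes a hypothetical {"@type", "@value"} hash encoding, not any adapter's real format:

```ruby
require "date"
require "time"

# Hypothetical encoding: every value is stored as a string plus a type tag,
# and the tag drives deserialization on the way back out.
module TypedValue
  module_function

  def serialize(value)
    case value
    # Note: DateTime is a Date subclass and would need its own branch here.
    when Date    then { "@type" => "date",    "@value" => value.iso8601 }
    # Time#iso8601 without an argument drops sub-second precision.
    when Time    then { "@type" => "time",    "@value" => value.iso8601 }
    when Integer then { "@type" => "integer", "@value" => value.to_s }
    # Anything else (e.g. an ::RDF::URI) silently degrades to a string.
    else              { "@type" => "string",  "@value" => value.to_s }
    end
  end

  def deserialize(hash)
    raw = hash.fetch("@value")
    case hash.fetch("@type")
    when "date"    then Date.iso8601(raw)
    when "time"    then Time.iso8601(raw)
    when "integer" then Integer(raw, 10)
    else                raw
    end
  end
end
```

Each additional supported type means another branch in every persister, which is why the list needs to be documented.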
In bulk migration use cases, it might be more efficient to load up a lot of resources, change them in memory, and then persist them all at once (at least for solr/postgres.) The implementations can sometimes be complex (postgres in particular), and it's not efficient for all adapters (AF for instance). Do we want this?
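One way to offer it without burdening every adapter is a default save_all that simply loops over save. Adapters with a real bulk path (Solr, Postgres) would override it. This is a sketch, and the module and class names are hypothetical:

```ruby
# Hypothetical mixin: adapters get a correct (if slow) save_all for free.
module DefaultSaveAll
  def save_all(models:)
    models.map { |model| save(model: model) }
  end
end

# An adapter with no efficient bulk path (ActiveFedora-like) uses the default.
class OneAtATimePersister
  include DefaultSaveAll
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def save(model:)
    @calls += 1
    model
  end
end

# An adapter with a native batch API (Solr-like) overrides it.
class BatchingPersister
  include DefaultSaveAll
  attr_reader :batches

  def initialize
    @batches = []
  end

  def save(model:)
    save_all(models: [model]).first
  end

  def save_all(models:)
    @batches << models # stands in for a single bulk request
    models
  end
end
```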
Piotr Solnica, the main developer behind Virtus, wrote this comment a while ago: https://www.reddit.com/r/ruby/comments/3sjb24/virtus_to_be_abandoned_by_its_creator/ and there hasn't been much activity on the gem recently.
This might not be an issue and Virtus might be stable enough for our needs, but we might have to eliminate Virtus at some point.
Create a short list of steps on what it takes to add a new work type (e.g. Book or Page). Consider a generator.
I think our forms have dirty tracking, but our models don't (on purpose). We should find a way to document that.
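To illustrate the split, the form wraps the model and records which attributes were assigned new values, while the model stays a plain value object. This is a minimal sketch. ChangeTrackingForm and Work are hypothetical, not the project's actual form or model classes:

```ruby
# The model: plain attributes, no change tracking by design.
Work = Struct.new(:title, :creator, keyword_init: true)

# Hypothetical form object that tracks changes on top of the model.
class ChangeTrackingForm
  def initialize(model)
    @model = model
    @changes = {}
  end

  def [](attr)
    @changes.fetch(attr) { @model[attr] }
  end

  def []=(attr, value)
    if value == @model[attr]
      @changes.delete(attr) # assigning the original value is not a change
    else
      @changes[attr] = value
    end
  end

  def changed?(attr = nil)
    attr ? @changes.key?(attr) : @changes.any?
  end

  # Returns a new model with the changes applied; the original is untouched.
  def sync
    @model.dup.tap { |copy| @changes.each { |attr, value| copy[attr] = value } }
  end
end
```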
Probably going to be something along the lines of:
fedora_adapter = Valkyrie::Adapter.find(:fedora)
postgres_adapter = Valkyrie::Adapter.find(:postgres)
book = fedora_adapter.query_service.find_by(id: "myid")
book.id = nil
new_book = postgres_adapter.persister.save(model: book)
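Here is the same pattern made self-contained, using hypothetical in-memory stand-ins for the two adapters (the real ones need Fedora and Postgres running). Clearing the id lets the target persister assign its own. If ids must survive the migration (e.g. NOIDs), the target would instead need to accept caller-supplied ids:

```ruby
# Hypothetical stand-ins for adapter.persister / adapter.query_service.
Book = Struct.new(:id, :title, keyword_init: true)

class InMemoryAdapter
  def initialize(prefix)
    @prefix = prefix
    @store = {}
  end

  def persister
    self
  end

  def query_service
    self
  end

  # Assigns an id if the model has none; stores and returns copies so that
  # callers mutating a result (e.g. book.id = nil) cannot alter the store.
  def save(model:)
    saved = model.dup
    saved.id ||= "#{@prefix}-#{@store.size + 1}"
    @store[saved.id] = saved
    saved.dup
  end

  def find_by(id:)
    @store.fetch(id).dup
  end
end

fedora_adapter = InMemoryAdapter.new("fedora")
postgres_adapter = InMemoryAdapter.new("pg")
original = fedora_adapter.persister.save(model: Book.new(title: "Moby Dick"))

# Migration: read from the source, drop the id, save into the target.
book = fedora_adapter.query_service.find_by(id: original.id)
book.id = nil
new_book = postgres_adapter.persister.save(model: book)
```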
Try really hard to use hydra-derivatives here.
This would be an adapter which is proven to be able to interact with the way Hyrax stores data in Fedora/Solr. It will probably be difficult, and isn't actually part of the charter.
The use case exists in Hyrax, and at least two institutions I know of use it (UCSB & CHF):
I have a record which has complex metadata as one of its properties, e.g. a date range where it's important that the beginning and the end of the range are stored together.
Possible implementation:
it "can save nested resources" do
book = resource_class.new(title: "Sub-nested")
book2 = resource_class.new(title: "Nested", nested_resource: book)
book3 = persister.save(model: resource_class.new(nested_resource: book2))
reloaded = query_service.find_by(id: book3.id)
expect(reloaded.nested_resource.first.title).to eq ["Nested"]
expect(reloaded.nested_resource.first.nested_resource.first.title).to eq ["Sub-nested"]
end
Now, the problem: Getting that test to pass with the postgres & memory adapters took about 20 LOC. Both natively support the concept of nesting and the abstractions are already written and debugged. However, for the other two adapters:
ActiveFedora: There's no interface for "here's a nested resource, build out the hash URIs and handle this for me please." I can't imagine how to write one, either. I could see this working out with something lower level, i.e. a Fedora persister which directly integrates with LDP, but I don't think that's an option ATM. Maybe the solution here is to reach out to the institutions that have implemented this and see what they've done, so we can at least have a compatibility layer.
Solr: There is no such thing as nesting. You can add "child documents", but they're indexed independently, require an ID, and don't have the same lifespan as their parents (https://issues.apache.org/jira/browse/SOLR-6096).
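The missing ActiveFedora piece ("build out the hash URIs") amounts roughly to a recursive walk. Each nested resource gets a fragment identifier under its parent document's URI, and the parent links to it. This is a minimal sketch on plain hashes. The :nested_resource key mirrors the spec above. The UUID fragment scheme and the helper name assign_hash_uris are assumptions:

```ruby
require "securerandom"

# Hypothetical: assign hash URIs (document URI + "#fragment") to every
# nested resource, depth first. Returns a flat list of [uri, attributes]
# pairs in which each parent lists its children's URIs, roughly what a
# triplestore-backed persister would need to write.
def assign_hash_uris(uri, resource, out = [])
  document = uri.split("#").first
  # Accept a single nested hash, an array of them, or nil.
  children = [resource[:nested_resource]].flatten.compact
  child_uris = children.map { "#{document}##{SecureRandom.uuid}" }
  attributes = resource.reject { |key, _| key == :nested_resource }
  attributes[:nested_resource] = child_uris unless child_uris.empty?
  out << [uri, attributes]
  children.zip(child_uris).each { |child, child_uri| assign_hash_uris(child_uri, child, out) }
  out
end
```

The hard part the comment above points to is not this walk but wiring it into ActiveFedora's save path and lifecycle.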
So I'm inclined to say we either:
DEPRECATION WARNING: schema_migrations_table_name is deprecated and will be removed from Rails 5.2 (called from block (2 levels) in <top (required)> at /Users/jcoyne/workspace/valkyrie/spec/support/database_cleaner.rb:4)
Right now there are four: Memory/Postgres/Fedora/Solr.
I think I'd like to keep Memory/Postgres/Fedora working at least. If Solr doesn't, it's not the end of the world. If it does, it might make some interactions easier. There may be ongoing problems with supporting multiple data types, though.
test.com is a real domain; we should probably use example.com instead and/or make it easier to configure which URIs are used.
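Here is a minimal sketch of such a setting. It assumes a hypothetical module-level option that specs and adapters read instead of hard-coding a host. It defaults to example.com, which RFC 2606 reserves for documentation:

```ruby
require "uri"

# Hypothetical configuration: one place to change the base URI used for
# generated identifiers, defaulting to the reserved example.com domain.
module ExampleConfig
  DEFAULT_BASE_URI = "http://example.com/".freeze

  class << self
    attr_writer :base_uri

    def base_uri
      @base_uri || DEFAULT_BASE_URI
    end

    def uri_for(path)
      URI.join(base_uri, path).to_s
    end
  end
end
```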
Also, any references to repository should be storage_adapter. Consistent naming is important.