Comments (1)
I realized it may be difficult, maybe even impossible, to bring all media to this human level of understanding when the media doesn't contain human-level content (for example, a video consisting of a graphic effect).
Besides, someone could also ask: WHAT level of comprehension? (... in space and time.)
This can be solved by adding an extra number/ID that specifies how deep the analysis has been:
1) raw-data
Each bit would be specified by a specific reference media (one for each media type, to be specific).
For example, whatever the algo does, when picture number 10 is presented, bit number 10 must be set.
2) concrete - human-understandable objects.
[building, elephant, trees, human, stone, big-face, ...]
Only very frequent objects should be on the list, so that each bit is activated about 50% of the time.
Same here: other bits MAY be activated as well.
3) properties, including abstract ones
[social, violent, natural, NSFW, ...]
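The three levels above could be sketched roughly as follows. This is only an illustration: the names `AnalysisDepth`, `OBJECT_VOCABULARY`, `labels_to_bits` and `tagged_code` are made up for this example and are not part of iscc-specs, and the object detector that would produce the labels is assumed to exist elsewhere.

```python
from enum import IntEnum


class AnalysisDepth(IntEnum):
    """Hypothetical depth ID stored next to the hash bits."""
    RAW_DATA = 1    # 1) raw-data
    OBJECTS = 2     # 2) concrete, human-understandable objects
    PROPERTIES = 3  # 3) properties, including abstract ones


# Assumed fixed vocabulary: the position in the list is the bit index.
OBJECT_VOCABULARY = ["building", "elephant", "trees", "human", "stone", "big-face"]


def labels_to_bits(labels, vocabulary):
    """Set bit i when vocabulary[i] is among the detected labels."""
    bits = 0
    for index, name in enumerate(vocabulary):
        if name in labels:
            bits |= 1 << index
    return bits


def tagged_code(depth, bits):
    """Pair the bit field with the depth ID that says how deep the analysis went."""
    return (int(depth), bits)


detected = {"elephant", "trees"}  # output of some (assumed) object detector
code = tagged_code(AnalysisDepth.OBJECTS, labels_to_bits(detected, OBJECT_VOCABULARY))
```

Two codes would then only be compared bit by bit when their depth IDs match.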
Supplying sample data could help to align the different algos with each other without being part of the specification.
In the case of 1) raw-data, images would be rotated, scaled, ...
For text, characters could be deleted, words moved, ...
In the case of 2), a collection of media could be supplied:
if we want an elephant picture, it would contain different elephants;
for text, it could include multiple languages.
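As a rough sketch of how such text sample data could be generated for case 1), the two perturbations mentioned (deleting characters, moving words) might look like this. The helper names `delete_character` and `move_word` are hypothetical, and whitespace-based word splitting is an assumption made for simplicity.

```python
import random


def delete_character(text, rng):
    """Return a copy of text with one randomly chosen character removed."""
    if not text:
        return text
    pos = rng.randrange(len(text))
    return text[:pos] + text[pos + 1:]


def move_word(text, rng):
    """Return a copy of text with one word moved to a random position."""
    words = text.split()
    if len(words) < 2:
        return text
    word = words.pop(rng.randrange(len(words)))
    words.insert(rng.randrange(len(words) + 1), word)
    return " ".join(words)


rng = random.Random(42)  # seeded so the sample set is reproducible
original = "the quick brown fox jumps over the lazy dog"
samples = [delete_character(original, rng), move_word(original, rng)]
```

An algo at the raw-data level should map each of these samples to a hash close to that of the original text.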
For the existing pHash algo, a single XOR value can be calculated so that its output complies with the specification.
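One way to read this, as a sketch only: if reference media number i must produce a hash with bit i set, then a single mask can be derived by looking at bit i of the existing algo's hash for reference i and flipping it where needed. The function names and the bit numbering (bit 0 is the least significant bit) are assumptions for this example, and the reference hashes are placeholder integers rather than real pHash output.

```python
def compliance_mask(reference_hashes):
    """Build one XOR mask so that (hash_i ^ mask) has bit i set for every reference i."""
    mask = 0
    for i, h in enumerate(reference_hashes):
        if not (h >> i) & 1:
            # Bit i is 0 for reference i, so the mask must flip it.
            mask |= 1 << i
    return mask


def compliant_hash(raw_hash, mask):
    """Apply the fixed mask to any hash produced by the existing algo."""
    return raw_hash ^ mask


# Placeholder hashes of reference media 0, 1 and 2 from some existing algo.
reference_hashes = [0b000, 0b010, 0b011]
mask = compliance_mask(reference_hashes)
```

Because the same mask is XORed into every hash, Hamming distances between hashes of the existing algo stay unchanged.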
Contra:
the implementation at the raw-data level would be specified very little.
Pro:
depending on the use case, it is up to the user how deeply they want their content to be analyzed:
1) raw-data-based, 2) object-based, 3) theme-based.
This can be desirable, as some video hash algos can be slow (on specific hardware).
The aim would be a specification that does not need to be changed, while staying open to keeping the content matching up to date.
P.S.: Reference media need to be in a lossless format. Sample/training data do not.
from iscc-specs.