sn_data_types's Introduction

PLEASE NOTE THAT THIS REPOSITORY HAS NOW BEEN ARCHIVED

All data types development now takes place in the safe_network repository

sn_data_types

Safe Network Data Types

License

This SAFE Network library is dual-licensed under the Modified BSD (LICENSE-BSD https://opensource.org/licenses/BSD-3-Clause) or the MIT license (LICENSE-MIT https://opensource.org/licenses/MIT) at your option.

Contributing

Want to contribute? Great 🎉

There are many ways to give back to the project, whether it be writing new code, fixing bugs, or just reporting errors. All forms of contributions are encouraged!

For instructions on how to contribute, see our Guide to contributing.

sn_data_types's People

Contributors

actions-user, bochaco, davidrusu, dirvine, iancoleman, jacderida, joshuef, lionel-faber, madadam, nbaksalyar, octol, oetyng, ravinderjangra, s-coyle, scorch-dev, ustulation, yoga07

sn_data_types's Issues

Fix inconsistencies in `Response` variants

Responses that return a single value usually start with Get: GetADataValue, GetMDataShell, etc. Responses that may return multiple values are usually named starting with List: ListMDataValues, ListAuthKeysAndVersions, etc.

There are two inconsistencies, however, which may cause confusion: ListMDataUserPermissions, which returns a single value and should be GetMDataUserPermissions to match GetADataUserPermissions, and GetADataPermissions, which returns a list of values and should be ListADataPermissions to match ListMDataPermissions.

Fix the above when we do the next refactor.
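
For illustration, a sketch of how the renamed variants might look (the payload types here are stand-ins, not the crate's real definitions):

// Sketch only: payload types are placeholders for the real ones.
type MDataPermissionSet = u8;
type ADataPermissions = Vec<u8>;
type Error = String;

pub enum Response {
    /// Single value, so a `Get` prefix (was `ListMDataUserPermissions`).
    GetMDataUserPermissions(Result<MDataPermissionSet, Error>),
    /// Multiple values, so a `List` prefix (was `GetADataPermissions`).
    ListADataPermissions(Result<ADataPermissions, Error>),
    // ...the remaining variants are unchanged.
}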

Rename `Request::DeleteAData`

Just noticed this minor point: when possible, DeleteAData should probably be renamed to DeleteUnpubAData for consistency with DeleteUnpubIData.

Adjust naming and docs to reflect actual index behaviour

An entry is specified as a struct with key and value.

pub struct Entry {
    /// Key.
    pub key: Vec<u8>,
    /// Contained data.
    pub value: Vec<u8>,
}

Then follows this:

/// Returns the last entries index.
fn entries_index(&self) -> u64 {
    self.inner.data.len() as u64
}

The comment and function name are misleading: the function returns the next unused index, not the "last entries index".
Consider a data instance with 0 items: there is no entry in it, so there is no last entries index to return.
A data instance with 1 item will return 1, which is not the index of the last entry; the last (and only) entry sits at index 0.

Other examples of this:

/// If the specified last_entries_index does not match the last recorded entries index, an
/// error will be returned.
fn append(&mut self, mut entries: Entries, last_entries_index: u64) -> Result<()> {
    [..]
    if last_entries_index != self.inner.data.len() as u64 {
        return Err(Error::InvalidSuccessor(self.inner.data.len() as u64));
    }
    [..]
}

This inconsistency percolates all the way to the top layers of the API.


The proposed change is to acknowledge (in all code and names, everywhere) that we are actually passing in the next unused index. Then passing in 0 when expecting an empty AD is exactly as it should be.

Name change in code and documentation:

last_entries_index -> next_unused_index
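
A sketch of the rename against a simplified stand-in struct (the real one lives in append_only_data.rs):

struct Inner {
    data: Vec<Vec<u8>>,
}

struct AppendOnly {
    inner: Inner,
}

impl AppendOnly {
    /// Returns the next unused index: the number of entries appended so
    /// far (0 for empty data), not the index of the last entry.
    fn next_unused_index(&self) -> u64 {
        self.inner.data.len() as u64
    }
}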

Exporting hex_string<->bytes conversion utilities

There are several consumers of sn_data_types that currently use some version of hex_string<->Vec<u8> conversion. This includes sn_node, sn_api, and sn_routing to name a few. All of these either re-implement functions like parse_hex or bytes_to_hex or pull in the hex or hex_fmt crates.

Usually this comes up when dealing with public/private keys exposed by this crate. Example use below:

let bytes = hex_to_vec(cli_args.key);
let key = PublicKey::Ed25519(
    ed25519_dalek::PublicKey::from_bytes(&bytes)?
);

Some reimplementation/redundant sites I found:

  • sn_node
  • sn_api
  • sn_message (actually, this isn't a reimplementation, as it imports HexFmt; but PublicKey could also implement fmt::LowerHex, or just be converted to hex before display)

Consumers of sn_data_types could benefit from reduced code redundancy and easier dependency management if sn_data_types re-exported the hex crate, exposing the encode and decode functions that its consumers currently use or re-implement.

Alternatively, convenience functions could be added to the utils.rs file exposing vec_to_hex and hex_to_vec, so that consumers don't need to worry about handling the FromHexError type exposed by the hex crate, as they would if we re-exported hex directly.
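
A sketch of what those utils.rs helpers could look like, assuming the hex crate as a dependency and a stand-in error type in place of this crate's real one:

/// Stand-in for this crate's own error type; the real helpers would
/// presumably map into something like `Error::FailedToParse`.
#[derive(Debug)]
pub struct ParseError(String);

/// Encode bytes as a lowercase hex string.
pub fn vec_to_hex(bytes: &[u8]) -> String {
    hex::encode(bytes)
}

/// Decode a hex string, hiding the hex crate's `FromHexError` from
/// consumers by converting it into our own error type.
pub fn hex_to_vec(s: &str) -> Result<Vec<u8>, ParseError> {
    hex::decode(s).map_err(|e| ParseError(e.to_string()))
}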

I can throw together a PR shortly to support this concept with a concrete implementation, but for now I'll open this issue to present the idea.

Description in readme.md

The readme for this repository should make it clear why it is needed in addition to the safe_core interfaces in the safe_client_libs repo, so devs know what to expect, and not expect, this library to do.

More context in this forum thread: https://forum.safedev.org/t/why-only-500-mib-limit-on-upload/2647/6 and from the preceding post:

This API in safe_core [fn get_unseq_mdata_value] and this API in safe-nd [fn get] might seem similar, since they do the same thing, i.e. get the value for a particular key in mutable data. The difference is that the safe-nd API requires a mutable data object to be available in memory, while the safe_core API fetches the value from the network.
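
To illustrate the distinction with a rough, hypothetical sketch (simplified stand-in types, not the real safe_core or safe-nd signatures):

use std::collections::BTreeMap;

/// Stand-in for the in-memory unsequenced mutable data type.
struct UnseqMutableData {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl UnseqMutableData {
    /// safe-nd style: a pure lookup on an object already in memory.
    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.data.get(key)
    }
}

// safe_core style (signature sketched as a comment only): fetch the
// value from the network without holding the whole object locally.
// fn get_unseq_mdata_value(client: &Client, name: XorName, tag: u64, key: Vec<u8>)
//     -> Box<dyn Future<Item = Vec<u8>, Error = CoreError>>;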

Support for streaming read/write of immutable data types

I'm looking into implementing FUSE directly on the Rust APIs and wondering what support exists or is planned for writing and reading data that may be too large to hold in memory.

I remember there were streaming r/w functions in the old SAFE NFS APIs, but am not sure if these just held the data in memory regardless.

It should be relatively straightforward to do streaming read and write given the data is chunked/de-chunked in the client, but I'm struggling to find where this happens in the current libraries. I have found the SelfEncryptor which I thought would be used by any relevant code, but I can't find anything which uses it!

Help!

  • Can you say what streaming read/write support, if any, exists for immutable data, or whether it is planned?

  • And can you point me to the code which currently generates the chunks, and also to where they are used to build the data map?

I built the Rust docs within safe-api, but it's still quite hard to find things and see how they work together, so I'd like to look at the code which does these things; maybe I can fill in any gaps myself until the API supports sequential read/write (assuming that's not there yet).

Restore maximum upper boundary for Coins

It could be considered standard practice that any type designed to represent a concept in the system should not be allowed to take a value that is invalid for that concept.

The limit was recently removed, and the rationale for it was described as follows:

I think the limit could not be enforced the way it was implemented. It's not enough to check that the value of one Coins instance is within the limit, because what we actually want is to check that the total amount of coins in the network is within the limit, which is more about the farming algorithm rules. So this check wasn't really useful and kind of provided a false sense of security.

The things mentioned above seem to be very different, though.
One is not allowing an invalid representation; the other is checking "that the total amount of coins in the network is within limit".
I agree that the latter cannot be achieved by the former.
But that doesn't disqualify the rationale for the former:
not allowing an invalid thing in the system to be represented.

I don't see that as providing a false sense of security, because it seems obvious to me that checking that the network amount is within the limit cannot possibly be done that way.

Checking that the total amount is within the limit is a completely different piece of functionality, living somewhere else. But allowing Coins to surpass the maximum possible value breaks the definition, the domain logic, of what Coins is.

This is even more true in an interface library like safe-nd, where the Coins struct will be used in many places. It would perhaps have been different if Coins were a very internal piece of safe_vault, for example, with logic in the layer immediately above making the check redundant.
But here we are providing a data type for external use, one that is supposed to represent something very well defined in our system.
If we didn't call it Coins, but perhaps Amount, things would be different, because an Amount doesn't have that boundary defined in our domain logic. But Coins has very strict boundaries and a very strict definition in the system.

And that holds regardless of any other enforcement of business rules. (Otherwise, you're coupling the type to that business rule, expecting it to exist and cover up that case.)

But that's still beside the point; these are two very different things, so I don't see the connection.

To summarize

Types representing a business object should not be allowed to take invalid values for that object, regardless of any enforcement of other rules, such as "the network should not contain more than this or that limit".
In this case it's obvious that rule is not enforced within a Coins instance anyway, since you could just add together multiple Coins (of any value > 0) to break the invariant about how much is allowed to exist in the system; our system has a fixed total for that, and it is not going to change.
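
For illustration, a minimal sketch of the invariant argued for here, with a hypothetical MAX_COINS_VALUE standing in for the network's fixed total supply (the real constant and error variant may differ):

/// Hypothetical upper bound in nano-coins; the real constant would
/// come from the network's domain rules.
const MAX_COINS_VALUE: u64 = 4_294_967_296 * 1_000_000_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Coins(u64); // value in nano-coins

#[derive(Debug)]
pub enum Error {
    ExcessiveValue,
}

impl Coins {
    /// A value no valid `Coins` can hold is rejected at construction,
    /// independently of any network-wide total-supply enforcement.
    pub fn from_nano(value: u64) -> Result<Self, Error> {
        if value > MAX_COINS_VALUE {
            return Err(Error::ExcessiveValue);
        }
        Ok(Coins(value))
    }
}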

Consider adding a type for `BTreeMap<PublicKey, MDataPermissionSet>`

Consider adding a new type for BTreeMap<PublicKey, MDataPermissionSet> and replacing it in the six places where it occurs in mutable_data.rs, as well as two places in response.rs. This will make the code less verbose and will also be useful when we refer to this type in safe_client_libs.
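
A sketch of the alias, with stand-in types for illustration:

use std::collections::BTreeMap;

// Stand-ins for the crate's real types.
type PublicKey = [u8; 32];
type MDataPermissionSet = u8;

/// Proposed named type replacing the spelled-out map in the eight
/// places mentioned above.
pub type MDataPermissions = BTreeMap<PublicKey, MDataPermissionSet>;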

Remove `unwrap` and possible network crash

From @Fraser999's review comment here:

Hmm - I just noticed std::str::from_utf8(&self.data).unwrap() below (not part of this PR, I know).

Apart from the fact that we should always prefer the unwrap! macro, couldn't that cause the network to crash? There's no requirement for the values to be valid UTF-8 strings afaik, so if vaults e.g. try to log a message containing one of these, they could panic.

I think for all large binary data values we should use Andreas' hex_fmt crate, as we do here for example; or, if there is no real benefit in seeing the value, then just printing .. is a good option too.
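
A sketch of the hex_fmt approach, assuming hex_fmt as a dependency and a simplified stand-in for the type holding the bytes:

use std::fmt;

use hex_fmt::HexFmt;

/// Stand-in for the type whose Debug impl currently calls
/// `str::from_utf8(&self.data).unwrap()`.
struct Value {
    data: Vec<u8>,
}

impl fmt::Debug for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // No UTF-8 assumption: print the raw bytes as hex, so a
        // non-UTF-8 value can never cause a panic while logging.
        write!(f, "Value({})", HexFmt(&self.data))
    }
}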

Sequence cannot be fetched with `?v=1`

Fetching a Sequence throws an error when ?v=1 is specified (lots of info in this forum post).

Simplest way to reproduce the bug:

$ safe seq store "first value"
$ safe seq append "second value" <the xorurl>
$ safe cat <the xorurl>?v=1
# this spins at ~100% CPU for about 10 seconds, then fails with:
sn_cli error: [Error] VersionNotFound - Version '1' is invalid for the Sequence found at "<the xorurl>?v=1"

This is because of a bug in in_range.

/// Gets a list of items which are within the given indices.
pub fn in_range(&self, start: Index, end: Index) -> Option<Entries> {
    let start_index = to_absolute_index(start, self.len() as usize)?;
    let num_items = to_absolute_index(end, self.len() as usize)?; // end_index
    self.current_lseq().map(|lseq| {
        lseq.iter()
            .take(num_items)
            .enumerate()
            .take_while(|(i, _)| i >= &start_index)
            .map(|(_, entry)| entry.clone())
            .collect::<Entries>()
    })
}

L385 - num_items would have to be end - start, not just end, but there's no subtraction for the Index type.

L389 and 391 - take and take_while don't communicate the underlying intention very clearly (in my opinion). I'm not up to speed on the most efficient way to extract a range of items in Rust, but I replaced these two with a single .filter whose one conditional seemed to express the intention more clearly:
.filter(|(i, _)| i >= &start_index && i < &end_index)
though I'm not sure this stays efficient for a large number of items (it has to iterate over all of them).

My fix for this bug was to rename num_items to end_index and replace the two take* lines with the single .filter line above, which resolved the bug. But I have not submitted a PR, because I feel the filter is too inefficient and I am not sure what to use instead.
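
For comparison, a sketch of a variant that expresses the range directly and stops early instead of scanning everything. This is only a sketch over a plain slice, and it assumes the absolute indices have already been resolved (i.e. it does not handle Index::FromEnd itself):

/// Skip the first `start_index` items, then take exactly
/// `end_index - start_index`; iteration stops at `end_index`.
fn range_of<T: Clone>(items: &[T], start_index: usize, end_index: usize) -> Vec<T> {
    items
        .iter()
        .skip(start_index)
        .take(end_index.saturating_sub(start_index))
        .cloned()
        .collect()
}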

As a small additional detail, it would be handy to add documenting comments about the range, e.g.

/// Gets a list of items which are within the given indices.
/// Includes the item at `start`, excludes the item at `end`
/// for a total of `end - start` items, i.e. the range [start, end)
pub fn in_range(&self, start: Index, end: Index) -> Option<Entries> {
  ...

Module-level documentation not being generated by cargo doc

All modules are private so documentation for them does not appear in cargo doc.

Options:

  1. Move documentation to lib.rs
  2. Move all documentation to somewhere like a wiki or repo and link to it in the lib.rs documentation. This is what we do in safe_core.

Rework `AppendOperation` struct

As this struct is only used in Request::Append, it does not need to contain an address; every other adata variant in Request already explicitly contains an address field. Instead, we can make AppendOperation an enum with Seq and Unseq variants, so the Request::AppendSeq and Request::AppendUnseq variants collapse into a single Request::AppendAData variant. This would simplify the API, but would require changes in safe_vault and safe_client_libs.
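
A sketch of the proposed shape (stand-in types; Request abbreviated):

// Stand-ins for the crate's real types.
type ADataAddress = [u8; 32];
type Entries = Vec<(Vec<u8>, Vec<u8>)>;

/// Proposed: no embedded address, and one variant per kind of append.
pub enum AppendOperation {
    /// Sequenced append: the caller supplies the expected index.
    Seq { entries: Entries, index: u64 },
    /// Unsequenced append: no index check.
    Unseq { entries: Entries },
}

pub enum Request {
    /// Collapses `AppendSeq` and `AppendUnseq`; the address is held
    /// explicitly, like every other adata variant.
    AppendAData {
        address: ADataAddress,
        append: AppendOperation,
    },
    // ...other variants unchanged.
}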

Inconsistent `new` functions for unpub types

Unpublished idata, as well as mdata (which is always unpublished), always require an owner key at creation via the new function. Unpublished adata, however, does not require an owner key in its new. This inconsistency should be addressed when possible.
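
A sketch of the consistent signature for unpublished adata, using a heavily simplified stand-in struct:

// Stand-ins for the crate's real types.
type XorName = [u8; 32];
type PublicKey = [u8; 32];

struct UnpubSeqAppendOnlyData {
    name: XorName,
    tag: u64,
    owner: PublicKey,
}

impl UnpubSeqAppendOnlyData {
    /// Proposed: require the owner key at creation, matching the `new`
    /// functions of unpublished idata and mdata.
    pub fn new(name: XorName, tag: u64, owner: PublicKey) -> Self {
        Self { name, tag, owner }
    }
}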

Update encoding

We need to make the following changes:

  1. Merge the two error variants FailedToParseCoins and FailedToParseIdentity(String) into a single FailedToParse(String), and ensure a meaningful message is added when constructing one of these variants (e.g. to replace the ones in coins.rs, consider something like Error::FailedToParse("Can't parse coin units".to_string()) and Error::FailedToParse("Can't parse coin remainder".to_string())).
  2. Create a utils.rs module. It should contain:
    • the verify_signature method currently in lib.rs
    • a serialisation helper:
    pub(crate) fn serialise<T: Serialize>(data: &T) -> Vec<u8> {
        unwrap!(bincode::serialize(data))
    }
    • an encoding helper:
    pub(crate) fn encode<T: Serialize>(data: &T) -> String {
        let serialised = serialise(&data);
        multibase::encode(Base::Base32z, &serialised)
    }
    • a decoding helper:
    pub(crate) fn decode<I: AsRef<str>, O: DeserializeOwned>(encoded: &I) -> Result<O> {
        let (base, decoded) =
            multibase::decode(encoded).map_err(|e| Error::FailedToParse(e.to_string()))?;
        if base != Base::Base32z {
            return Err(Error::FailedToParse(format!(
                "Expected z-base-32 encoding, but got {:?}",
                base
            )));
        }
        Ok(bincode::deserialize(&decoded).map_err(|e| Error::FailedToParse(e.to_string()))?)
    }
  3. Replace all occurrences of bincode::serialize with the new helper
  4. Remove base64 from the crate and exclusively use multibase z-base-32 encoding
  5. Rename all existing encode_to_base64 helpers to encode_to_z_base_32, with a similar change for the decode helpers.
  6. Provide encode/decode helpers (and tests; see the sketch after this list) for
    • XorName
    • PublicKey
    • AppPublicId
    • ClientPublicId
    • NodePublicId
    • PublicId
    • ADataAddress
    • IDataAddress
    • MDataAddress
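
A hypothetical round-trip test for step 6, assuming the encode/decode helpers from step 2 are in scope and that XorName derives Serialize, Deserialize, and PartialEq:

#[test]
fn xor_name_encoding_round_trip() {
    let name = XorName([0xAB; 32]);
    // z-base-32 string out...
    let encoded = utils::encode(&name);
    // ...and the same value back.
    let decoded: XorName = unwrap!(utils::decode(&encoded));
    assert_eq!(name, decoded);
}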

Add missing `MData` functions

Add functions to MData that are implemented by its Seq and Unseq variants, such as get, entries, values, etc. For example, entries should match on the variant and return MDataEntries instead of MDataSeqEntries and MDataUnseqEntries. This would simplify some code in SCL and the vaults; for example:

(MDataKind::Seq, Ok(MData::Seq(mdata))) => {
    Response::ListMDataEntries(Ok(mdata.entries().clone().into()))
}
(MDataKind::Unseq, Ok(MData::Unseq(mdata))) => {
    Response::ListMDataEntries(Ok(mdata.entries().clone().into()))
}

could become:

// A single match arm can carry only one guard, so this assumes a
// kind() accessor on MData rather than per-alternative guards:
(kind, Ok(mdata)) if mdata.kind() == kind => {
    Response::ListMDataEntries(Ok(mdata.entries().clone()))
}

This is low priority for now but it would make the API nicer to use.
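
A sketch of the unified entries accessor (stand-in entry types; the real structs hold more than just their entries):

use std::collections::BTreeMap;

// Stand-ins: seq values carry a version, unseq values are bare bytes.
type MDataSeqEntries = BTreeMap<Vec<u8>, (Vec<u8>, u64)>;
type MDataUnseqEntries = BTreeMap<Vec<u8>, Vec<u8>>;

enum MData {
    Seq(MDataSeqEntries),
    Unseq(MDataUnseqEntries),
}

/// Kind-agnostic result of `MData::entries`.
enum MDataEntries {
    Seq(MDataSeqEntries),
    Unseq(MDataUnseqEntries),
}

impl MData {
    /// Matches on the variant so callers no longer have to.
    fn entries(&self) -> MDataEntries {
        match self {
            MData::Seq(entries) => MDataEntries::Seq(entries.clone()),
            MData::Unseq(entries) => MDataEntries::Unseq(entries.clone()),
        }
    }
}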

Add missing Get function for AppendOnlyData.

Owners and Permissions can be fetched by index, but an Entry (the data itself) cannot. This would be useful instead of resorting to get_in_range; a get-by-index seems to be simply missing here.

Proposing to add

(in trait AppendOnlyData<P> in append_only_data.rs)

/// Fetches entry at index.
fn entry(&self, entry_index: impl Into<Index>) -> Option<&Entry>;

as well as

(in impl<P> AppendOnlyData<P> in append_only_data.rs)

/// Returns the entry at the index.
fn entry(&self, entry_index: impl Into<Index>) -> Option<&Entry> {
    let index = to_absolute_index(entry_index.into(), self.inner.data.len())?;
    self.inner.data.get(index)
}

and

(in impl Data in append_only_data.rs)

/// Fetches entry at index.
pub fn entry(&self, entry_index: impl Into<Index>) -> Option<&Entry> {
    match self {
        Data::PubSeq(data) => data.entry(entry_index),
        Data::PubUnseq(data) => data.entry(entry_index),
        Data::UnpubSeq(data) => data.entry(entry_index),
        Data::UnpubUnseq(data) => data.entry(entry_index),
    }
}

as well as a corresponding test (sketched below).
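
A self-contained sketch of what such a test could check, using minimal stand-ins for Entry, Index, and the data type (the real test would exercise the trait methods proposed above):

#[derive(Debug, PartialEq)]
struct Entry {
    key: Vec<u8>,
    value: Vec<u8>,
}

enum Index {
    FromStart(u64),
    FromEnd(u64),
}

/// Simplified version of the crate's index resolution.
fn to_absolute_index(index: Index, count: usize) -> Option<usize> {
    match index {
        Index::FromStart(i) => Some(i as usize),
        Index::FromEnd(i) if i as usize <= count => Some(count - i as usize),
        Index::FromEnd(_) => None,
    }
}

struct Data {
    entries: Vec<Entry>,
}

impl Data {
    fn entry(&self, index: Index) -> Option<&Entry> {
        let i = to_absolute_index(index, self.entries.len())?;
        self.entries.get(i)
    }
}

#[test]
fn fetch_entry_by_index() {
    let entry = |k: &[u8]| Entry { key: k.to_vec(), value: vec![] };
    let data = Data { entries: vec![entry(b"a"), entry(b"b")] };
    assert_eq!(data.entry(Index::FromStart(0)), Some(&entry(b"a")));
    assert_eq!(data.entry(Index::FromEnd(1)), Some(&entry(b"b")));
    // Out of bounds returns None rather than panicking.
    assert_eq!(data.entry(Index::FromStart(2)), None);
}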

Any other places needing changes for this, @m-cat?

Questions about the new data types

Note: this is a copy of a forum reply, posted here so it can be addressed at some point.

I was looking at the docs for sn_data_types to see what might have changed, and they seem different from the model presented in the OP. Immutable data is clear enough (PrivateBlob and PublicBlob), but looking at SeqMap and UnseqMap I'm not sure of their correspondence with the OP, or how data that is unpublished becomes published.

I speculate that an UnseqMap is always private (as for Mutable Data in the OP), as it is not versioned.

I am, though, having trouble reconciling the other new types with the descriptions of the types in the OP, so maybe a new overview in a new topic is needed?

For example, the docs say that SeqMap is for unpublished data, unlike the AppendOnlyData of the OP, which can be published or unpublished. So there seem to be differences, and I can't see how things become published in the API (but maybe this is still to be implemented).

I'm interested because I'm wondering how to implement a publicly writeable data structure to handle submission of issues by non-owners to a decentralised bug tracker (cf. git-bug). I'm looking for something like a public noticeboard or inbox, which can be polled for new issues by any client checking the status of a remote code repository.

One of the questions I was trying to answer is whether I can check the creator of each entry in such a publicly writeable, "append only" style data structure. This would help discourage spam by making it easier for a client to skip over entries created by identities on a blacklist.

Owner/Perm changes use `next_unused_index` instead of `current_index` as references

Background

In issue #120, it is proposed to correctly identify the index a user passes in when doing a mutation as next_unused_index instead of last_entry_index (and equivalently for owners and permissions index).

Problem

This issue describes another, similar case.
To distinguish when an ownership or permission change happened, we record the indices of the data and of the other collection (permissions or owners) at that time. Those indices should point at the last item, so what we want is last_entry_index (and equivalently for the owners and permissions indices), not next_unused_index.

It appears that when referring to the data/owner/permission index we were at when making the change, we currently use next_unused_index, i.e. we point to an index that doesn't exist yet.

Example 1:

If we look at the check_unpub_permission test at the bottom of the file append_only_data.rs:

let mut inner = SeqAppendOnlyData::<UnpubPermissions>::new(XorName([1; 32]), 100);

// no owner
let data = Data::from(inner.clone());
assert_eq!(
    data.check_permission(Action::Read, public_key_0),
    Err(Error::InvalidOwners)
);

// no permissions
unwrap!(inner.append_owner(
    Owner {
        public_key: public_key_0,
        entries_index: 0,
        permissions_index: 0,
    },
    0,
));

Here it seems that we're adding an owner before anything else is added.
So we rightly pass in index 0 as the next_unused_index for the owner.
But the owner object states that entries and permissions are at index 0 when this change happened (the change being that the owner was added). They are not; those vecs are empty.

Example 2:

If we take an arbitrary data version and permissions version, like 2743 and 14 respectively (i.e. those are the last indices of items in those vecs), the way it works now is that when an ownership change happens, it is stored with data_index: 2743 + 1 and permissions_index: 14 + 1, both of which are indices that don't yet exist.
And when you want to get the actual data, and the actual permissions, that were there when the ownership changed, you have to request them with [owner.data_index - 1] instead of just owner.data_index, and equivalently for permissions.
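
A tiny arithmetical sketch of the off-by-one, using the hypothetical numbers from the example above:

fn main() {
    let entries_len: u64 = 2744; // indices 0..=2743 exist
    let last_entry_index = entries_len - 1; // 2743: what the Owner should record
    let next_unused_index = entries_len; // 2744: what is recorded today
    // The stored index points one past the data that actually existed
    // at the time of the change, forcing callers to subtract 1.
    assert_eq!(last_entry_index, next_unused_index - 1);
}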

Possible solutions

[...]

RUSTSEC-2020-0036: failure is officially deprecated/unmaintained

failure is officially deprecated/unmaintained

Details
Status unmaintained
Package failure
Version 0.1.8
URL rust-lang-deprecated/failure#347
Date 2020-05-02

The failure crate is officially end-of-life: it has been marked as deprecated
by the former maintainer, who has announced that there will be no updates or
maintenance work on it going forward.

Some actively developed alternatives to switch to are suggested in the advisory; see the advisory page for additional details.
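
As one possible direction (an assumption on my part, not a project decision), the derive-based thiserror crate can replace failure's Fail derives with minimal churn. A sketch, using a couple of this crate's existing error variants:

use thiserror::Error;

// Hypothetical migration of two error variants from `failure`-style
// definitions to `thiserror`.
#[derive(Debug, Error)]
pub enum Error {
    #[error("failed to parse: {0}")]
    FailedToParse(String),
    #[error("exceeded the maximum coin value")]
    ExcessiveValue,
}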
