ossfs's People

Contributors

aguschin, dependabot[bot], efiop, github-actions[bot], isidentical, karajan1001, skshetry

ossfs's Issues

Solve CI failures

The CI tests are currently failing because the CI script is wrong and the required secrets are missing. To fix this, we need to repair the script and add the secrets.

  • script repaired
  • add secrets

Add version support for ossfs.

OSS itself supports object versioning, and some other filesystems such as s3fs already support this feature. ossfs should support it too.

Better handling errors from

Currently, OSSFS's handling of network exceptions is scattered across the code and hard to maintain. We can add an intermediate layer, _call_ossfs, to manage all of the network exceptions and translate them.

Migrate to simulator.

Using a simulator can:

  1. greatly accelerate the tests (they can be executed in parallel).
  2. Provide other users a way to run the tests.

Unneeded deletion of cloud object.

Currently, in the put_file method, resumable_upload causes problems in combination with the prior deletion of the existing object.

Support ECS RAM role authentication

The code in core.py shows that the current implementation doesn't support ECS RAM role authentication.
The capability is provided by oss2 through the combination of ProviderAuth and EcsRamRoleCredentialsProvider.
This is a very important authentication scenario, and we hope ossfs can prioritize supporting it.
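
A rough sketch of how the two oss2 classes might be combined. The metadata URL and the helper names are assumptions for illustration, and this has not been wired into ossfs; the oss2 import is deferred so the URL helper works without the SDK installed.

```python
# Metadata service address queried from inside an ECS instance. This base
# URL is an assumption based on Alibaba Cloud's documented convention.
ECS_METADATA_BASE = (
    "http://100.100.100.200/latest/meta-data/ram/security-credentials/"
)


def ecs_role_auth_host(role_name):
    """Build the credentials URL for a RAM role (hypothetical helper)."""
    return ECS_METADATA_BASE + role_name


def make_ecs_ram_role_auth(role_name):
    """Return an oss2 auth object backed by the instance's RAM role."""
    # Imported lazily so the helper above can be used without oss2.
    import oss2
    from oss2.credentials import EcsRamRoleCredentialsProvider

    provider = EcsRamRoleCredentialsProvider(ecs_role_auth_host(role_name))
    return oss2.ProviderAuth(provider)
```

The returned object could then be passed wherever ossfs currently builds its oss2 auth, provided the code runs on an ECS instance with the role attached.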

Permission problem in ossfs.ls

Currently, ossfs.ls first calls ls on the parent directory and then on the path itself. But the parent directory might have different permissions, for example if the object is at the root of the bucket, and this causes a failure.

Address compatibility with other filesystems.

In OSS, we don't use an address like oss://bucket/path, as in S3 or HDFS.
Instead, endpoints are always required for an OSS address.

Currently, ossfs can only recognize an address like http://oss-cn-hangzhou.aliyuncs.com/mybucket/myobj, while the input might be oss://bucket/path.
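
One way both address styles could be accepted, as a sketch only. The function name and the default_endpoint parameter are hypothetical; the idea is that an oss:// address carries no endpoint, so it has to come from somewhere else.

```python
from urllib.parse import urlsplit


def split_oss_address(address, default_endpoint=None):
    """Return (endpoint, bucket, key) for either address style."""
    parts = urlsplit(address)
    if parts.scheme == "oss":
        # oss://bucket/path has no endpoint, so the caller must supply one.
        return default_endpoint, parts.netloc, parts.path.lstrip("/")
    if parts.scheme in ("http", "https"):
        # http://endpoint/bucket/key: the host is the endpoint and the
        # first path segment is the bucket.
        endpoint = f"{parts.scheme}://{parts.netloc}"
        bucket, _, key = parts.path.lstrip("/").partition("/")
        return endpoint, bucket, key
    raise ValueError(f"unsupported address: {address!r}")
```

For example, split_oss_address("oss://bucket/path", "http://oss-cn-hangzhou.aliyuncs.com") yields the same three-part form as the full HTTP address.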

Dir cache for file status.

Getting file status (isdir, isfile, size...) from the remote is a very slow operation, and many other operations require it. With the help of dircache, we could store this data in memory, which would greatly improve performance.
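
The idea can be sketched with a plain dictionary keyed by directory path. The class, the list_dir callable and the entry layout below are hypothetical; fsspec provides its own dircache machinery, which is not reproduced here.

```python
class CachedStatus:
    """Toy filesystem that remembers directory listings in memory."""

    def __init__(self, list_dir):
        # list_dir: callable mapping a directory path to a list of entries.
        self._list_dir = list_dir
        self.dircache = {}
        self.remote_calls = 0

    def ls(self, path):
        path = path.rstrip("/")
        if path not in self.dircache:
            # Only the first listing of a directory goes to the remote.
            self.remote_calls += 1
            self.dircache[path] = self._list_dir(path)
        return self.dircache[path]

    def info(self, path):
        path = path.rstrip("/")
        parent = path.rsplit("/", 1)[0]
        for entry in self.ls(parent):
            if entry["name"] == path:
                return entry
        raise FileNotFoundError(path)


def fake_list_dir(path):
    """Stand-in for a remote listing call."""
    return [
        {"name": f"{path}/a.txt", "type": "file", "size": 3},
        {"name": f"{path}/sub", "type": "directory", "size": 0},
    ]
```

Here two info calls on entries of the same directory trigger a single remote listing; the cache would still need invalidating after writes and deletes.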

Removing OSS-emulator related code.

Some of the tests on an OSS emulator fail because the emulator behaves differently from the real server.
This code shouldn't be included in ossfs.

set a default endpoint

ossfs should come with a default endpoint; otherwise, if you don't set anything in your environment / config, it will fail like this:

(.venv38) (Python 3.8.5+) [ 11:35ÖÖ ]  [ <hey>@<hey>:~/temp_remotes/ossfs_remote(master✗) ]
 $ dvc push -vv
#2021-07-15 11:35:36,932 TRACE: Namespace(all_branches=False, all_commits=False, all_tags=False, cd='.', cmd='push', cprofile=False, cprofile_dump=None, func=<class 'dvc.command.data_sync.CmdDataPush'>, glob=False, instrument=False, instrument_open=False, jobs=None, pdb=False, quiet=0, recursive=False, remote=None, run_cache=False, targets=[], verbose=2, version=None, with_deps=False)
2021-07-15 11:35:37,022 DEBUG: Check for update is enabled.
2021-07-15 11:35:37,526 DEBUG: Preparing to upload data to 'oss://dvc-test-github/batuhan-cache'
2021-07-15 11:35:37,526 DEBUG: Preparing to collect status from oss://dvc-test-github/batuhan-cache
2021-07-15 11:35:37,526 DEBUG: Collecting information from local cache...
2021-07-15 11:35:37,527 TRACE: Assuming '/home/isidentical/temp_remotes/ossfs_remote/.dvc/cache/8d/da64717821b0fcbbcdb48afe082822' is unchanged since it is read-only             
2021-07-15 11:35:37,528 DEBUG: Collecting information from remote cache...                                                                                                        
2021-07-15 11:35:37,528 DEBUG: Matched '0' indexed hashes
2021-07-15 11:35:37,528 DEBUG: Querying 1 hashes via object_exists
2021-07-15 11:35:37,537 ERROR: unexpected error - 'NoneType' object has no attribute 'strip'                                                                                      
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/isidentical/dvc/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/isidentical/dvc/dvc/command/base.py", line 50, in do_run
    return self.run()
  File "/home/isidentical/dvc/dvc/command/data_sync.py", line 57, in run
    processed_files_count = self.repo.push(
  File "/home/isidentical/dvc/dvc/repo/__init__.py", line 51, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/isidentical/dvc/dvc/repo/push.py", line 44, in push
    pushed += self.cloud.push(objs, jobs, remote=remote)
  File "/home/isidentical/dvc/dvc/data_cloud.py", line 79, in push
    return remote_obj.push(
  File "/home/isidentical/dvc/dvc/remote/base.py", line 57, in wrapper
    return f(obj, *args, **kwargs)
  File "/home/isidentical/dvc/dvc/remote/base.py", line 488, in push
    ret = self._process(
  File "/home/isidentical/dvc/dvc/remote/base.py", line 345, in _process
    dir_status, file_status, dir_contents = self._status(
  File "/home/isidentical/dvc/dvc/remote/base.py", line 193, in _status
    self.hashes_exist(
  File "/home/isidentical/dvc/dvc/remote/base.py", line 145, in hashes_exist
    return indexed_hashes + self.odb.hashes_exist(list(hashes), **kwargs)
  File "/home/isidentical/dvc/dvc/objects/db/base.py", line 468, in hashes_exist
    remote_hashes = self.list_hashes_exists(hashes, jobs, name)
  File "/home/isidentical/dvc/dvc/objects/db/base.py", line 419, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/isidentical/dvc/dvc/objects/db/base.py", line 410, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/home/isidentical/dvc/dvc/fs/fsspec_wrapper.py", line 84, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/home/isidentical/ossfs/ossfs/core.py", line 450, in exists
    bucket = self._get_bucket(bucket_name, connect_timeout)
  File "/home/isidentical/ossfs/ossfs/core.py", line 139, in _get_bucket
    return oss2.Bucket(
  File "/home/isidentical/.venv38/lib/python3.8/site-packages/oss2/api.py", line 347, in __init__
    super(Bucket, self).__init__(auth, endpoint, is_cname, session, connect_timeout,
  File "/home/isidentical/.venv38/lib/python3.8/site-packages/oss2/api.py", line 191, in __init__
    self.endpoint = _normalize_endpoint(endpoint.strip())
AttributeError: 'NoneType' object has no attribute 'strip'
------------------------------------------------------------
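
The traceback shows endpoint arriving as None at oss2.Bucket. A possible fallback is sketched below; the default value, the OSS_ENDPOINT variable name and the function name are all assumptions, since the issue does not say which endpoint should be the default.

```python
import os

# Hypothetical default; the issue does not name the endpoint to use.
DEFAULT_ENDPOINT = "http://oss-cn-hangzhou.aliyuncs.com"


def resolve_endpoint(endpoint=None, environ=None):
    """Pick the explicit endpoint, then the environment, then the default."""
    environ = os.environ if environ is None else environ
    resolved = endpoint or environ.get("OSS_ENDPOINT") or DEFAULT_ENDPOINT
    # The result is always a string, so later .strip() calls cannot fail.
    return resolved.strip()
```

Resolving the endpoint once, before the bucket object is created, would turn the AttributeError above into either a working default or a clear configuration error.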

`info` performance optimization

In ossfs.ls, we first list the object, then the directory, while info first lists the directory and then the specific object. This is because the directory listing can benefit from dircache. By reordering the object and directory lookups, we can accelerate all info-related operations (du, stat, size, isdir, isfile, ukey, etc.)
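
As an illustration of why lookup order matters, the sketch below consults a cached directory listing first and only falls back to a per-object request on a miss. The function name, the dircache layout and the head_object callable are hypothetical and do not mirror the actual ossfs code.

```python
def info_cached_first(path, dircache, head_object):
    """Look a path up in cached listings before asking the remote."""
    parent = path.rsplit("/", 1)[0]
    for entry in dircache.get(parent, []):
        if entry["name"] == path:
            # Cache hit: no network round trip is needed.
            return entry
    # Cache miss: fall back to a single request for this object.
    return head_object(path)
```

Every operation built on info (du, stat, size, isdir, isfile, ukey) would inherit the saving whenever the parent listing is already cached.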
