tencentyun / hadoop-cos

hadoop-cos (the CosN file system) integrates Tencent Cloud COS with big-data compute frameworks such as Apache Hadoop, Spark, and Tez, so that data stored on COS can be read and written the same way as HDFS. It can also serve as Deep Storage for query and analytics engines such as Druid.

Home Page: https://cloud.tencent.com/document/product/436/6884?!preview=true&lang=zh

License: MIT License


hadoop-cos's Introduction

HADOOP-COS

Features

Hadoop-COS lets upper-layer compute frameworks run with Tencent Cloud COS as the underlying file system, so that data stored in the COS object storage service can be processed with Hadoop, Spark, Tez, and similar engines.

Limitations

Only COS V5 is supported.

Requirements

Operating system

Linux or Windows

Software dependencies

Hadoop 2.6.0 or later

NOTE

  1. hadoop-cos has been officially integrated into Apache Hadoop 3.3.0: https://hadoop.apache.org/docs/r3.3.0/hadoop-cos/cloud-storage/index.html
  2. On Apache Hadoop versions before 3.3.0, and on CDH, the NameNode must be restarted after installing the hadoop-cos jar before the jar can be loaded.
  3. To build a jar for a specific Hadoop version, change hadoop.version in the pom file and recompile.

Installation

Get the hadoop-cos distribution and its dependencies

Download: hadoop-cos release

Install hadoop-cos

  1. Copy hadoop-cos-{hadoop.version}-x.x.x.jar and cos_api-bundle-5.x.x.jar into $HADOOP_HOME/share/hadoop/tools/lib.

NOTE: Choose the jar that matches your Hadoop version. If the release does not provide a jar for your version, change the Hadoop version number in the pom file and rebuild, as sketched below.
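A rebuild might look like the following sketch (hedged: it assumes the pom exposes the Hadoop version as the Maven property hadoop.version; all version numbers are placeholders):

mvn clean package -DskipTests -Dhadoop.version=3.2.4
# Copy the rebuilt plugin jar, plus the cos_api-bundle jar from the release
# page, into Hadoop's tools lib directory.
cp target/hadoop-cos-*.jar "$HADOOP_HOME/share/hadoop/tools/lib/"
cp cos_api-bundle-5.x.x.jar "$HADOOP_HOME/share/hadoop/tools/lib/"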

  2. Modify hadoop_env.sh under $HADOOP_HOME/etc/hadoop, adding the following lines to put the cosn jars on the Hadoop classpath:
for f in $HADOOP_HOME/share/hadoop/tools/lib/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
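After reloading the environment, a quick sanity check with the stock hadoop classpath command should show the cosn jars (this check is a suggestion, not part of the official steps):

hadoop classpath | tr ':' '\n' | grep -i cos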

Usage

Hadoop configuration

Modify $HADOOP_HOME/etc/hadoop/core-site.xml to add the COS credentials and implementation classes, for example:

<configuration>

    <property>
        <name>fs.defaultFS</name>
        <value>cosn://examplebucket-1250000000000</value>
    </property>

    <property>
        <name>fs.cosn.credentials.provider</name>
        <value>org.apache.hadoop.fs.auth.SimpleCredentialProvider</value>
        <description>

            This option allows the user to specify how to get the credentials.
            Comma-separated class names of credential provider classes which implement
            com.qcloud.cos.auth.COSCredentialsProvider:

            1.org.apache.hadoop.fs.auth.SessionCredentialProvider: Obtain the secretId and secretKey from the URI:cosn://secretId:secretKey@example-1250000000000/;
            2.org.apache.hadoop.fs.auth.SimpleCredentialProvider: Obtain the secret id and secret key
            from fs.cosn.userinfo.secretId and fs.cosn.userinfo.secretKey in core-site.xml;
            3.org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider: Obtain the secret id and secret key
            from system environment variables named COS_SECRET_ID and COS_SECRET_KEY.

            If unspecified, the default order of credential providers is:
            1. org.apache.hadoop.fs.auth.SessionCredentialProvider
            2. org.apache.hadoop.fs.auth.SimpleCredentialProvider
            3. org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider
            4. org.apache.hadoop.fs.auth.SessionTokenCredentialProvider
            5. org.apache.hadoop.fs.auth.CVMInstanceCredentialsProvider
            6. org.apache.hadoop.fs.auth.CPMInstanceCredentialsProvider
            7. org.apache.hadoop.fs.auth.EMRInstanceCredentialsProvider
        </description>
    </property>

    <property>
        <name>fs.cosn.userinfo.secretId</name>
        <value>xxxxxxxxxxxxxxxxxxxxxxxxx</value>
        <description>Tencent Cloud Secret Id</description>
    </property>

    <property>
        <name>fs.cosn.userinfo.secretKey</name>
        <value>xxxxxxxxxxxxxxxxxxxxxxxx</value>
        <description>Tencent Cloud Secret Key</description>
    </property>

    <property>
        <name>fs.cosn.bucket.region</name>
        <value>ap-xxx</value>
        <description>The region where the bucket is located</description>
    </property>

    <property>
        <name>fs.cosn.bucket.endpoint_suffix</name>
        <value>cos.ap-xxx.myqcloud.com</value>
        <description>COS endpoint to connect to.
        For public cloud users, it is recommended not to set this option; setting the correct region is sufficient.</description>
    </property>

    <property>
        <name>fs.cosn.impl</name>
        <value>org.apache.hadoop.fs.CosFileSystem</value>
        <description>The implementation class of the CosN Filesystem</description>
    </property>

    <property>
        <name>fs.AbstractFileSystem.cosn.impl</name>
        <value>org.apache.hadoop.fs.CosN</value>
        <description>The implementation class of the CosN AbstractFileSystem.</description>
    </property>

    <property>
        <name>fs.cosn.tmp.dir</name>
        <value>/tmp/hadoop_cos</value>
        <description>Temporary files will be placed here.</description>
    </property>

    <property>
        <name>fs.cosn.upload.buffer</name>
        <value>mapped_disk</value>
        <description>The type of upload buffer. Available values: non_direct_memory, direct_memory, mapped_disk</description>
    </property>

    <property>
        <name>fs.cosn.upload.buffer.size</name>
        <value>33554432</value>
        <description>The total size of the upload buffer pool. -1 means unlimited.</description>
    </property>

    <property>
        <name>fs.cosn.upload.part.size</name>
        <value>8388608</value>
        <description>The part size for MultipartUpload.
        Since COS supports at most 10000 parts per upload, users should estimate the maximum size of a single file.
        For example, an 8MB part size allows writing a single file of up to 78GB.</description>
    </property>

    <property>
        <name>fs.cosn.maxRetries</name>
        <value>3</value>
        <description>The maximum number of retries for reading or writing files to
    COS, before we signal failure to the application.</description>
    </property>

    <property>
        <name>fs.cosn.retry.interval.seconds</name>
        <value>3</value>
        <description>The number of seconds to sleep between each COS retry.</description>
    </property>

    <property>
        <name>fs.cosn.read.ahead.block.size</name>
        <value>1048576</value>
        <description>
            Bytes to read ahead during a seek() before closing and
            re-opening the cosn HTTP connection.
        </description>
    </property>

    <property>
        <name>fs.cosn.read.ahead.queue.size</name>
        <value>8</value>
        <description>The length of the pre-read queue.</description>
    </property>

    <property>
        <name>fs.cosn.customer.domain</name>
        <value></value>
        <description>The customer domain.</description>
    </property>

    <property>
        <name>fs.cosn.server-side-encryption.algorithm</name>
        <value></value>
        <description>The server side encryption algorithm.</description>
    </property>

    <property>
        <name>fs.cosn.server-side-encryption.key</name>
        <value></value>
        <description>The SSE-C server side encryption key.</description>
    </property>

    <property>
        <name>fs.cosn.client-side-encryption.enabled</name>
        <value></value>
        <description>Enable or disable the client encryption function</description>
    </property>
  
    <property>
        <name>fs.cosn.client-side-encryption.public.key.path</name>
        <value>/xxx/xxx.key</value>
        <description>The absolute path to the public key</description>
    </property>
  
    <property>
        <name>fs.cosn.client-side-encryption.private.key.path</name>
        <value>/xxx/xxx.key</value>
        <description>The absolute path to the private key</description>
    </property>

</configuration>
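With org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider, credentials can also be supplied per shell session rather than written into core-site.xml. A minimal sketch using the variable names from the provider description above (the key values are placeholders):

export COS_SECRET_ID=AKIDxxxxxxxxxxxxxxxxxxxx
export COS_SECRET_KEY=xxxxxxxxxxxxxxxxxxxx
hadoop fs -ls cosn://examplebucket-1250000000000/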

Configuration reference

Each entry below gives the property key, what it does, and its default value where one exists.

fs.defaultFS: The default file system for Hadoop. To make COS the default, set this to cosn://bucket-appid; COS objects can then be addressed by bare paths such as /hadoop/inputdata/test.dat. If COS should not be the default file system, leave this option alone and use a full URI when accessing COS objects, e.g. cosn://testbucket-1252681927/hadoop/inputdata/test.dat.

fs.cosn.credentials.provider: How the secret id and secret key are obtained. Eight providers are currently supported: 1. org.apache.hadoop.fs.auth.SessionCredentialProvider: from the request URI, in the form cosn://{secretId}:{secretKey}@examplebucket-1250000000000/; 2. org.apache.hadoop.fs.auth.SimpleCredentialProvider: from fs.cosn.userinfo.secretId and fs.cosn.userinfo.secretKey in core-site.xml; 3. org.apache.hadoop.fs.auth.EnvironmentVariableCredentialProvider: from the environment variables COS_SECRET_ID and COS_SECRET_KEY; 4. org.apache.hadoop.fs.auth.SessionTokenCredentialProvider: uses a session token; 5. org.apache.hadoop.fs.auth.CVMInstanceCredentialsProvider: temporary COS credentials from the role bound to a Tencent Cloud CVM instance; 6. org.apache.hadoop.fs.auth.CPMInstanceCredentialsProvider: temporary COS credentials from the role bound to a Tencent Cloud CPM (bare-metal) instance; 7. org.apache.hadoop.fs.auth.EMRInstanceCredentialsProvider: temporary COS credentials from the role bound to a Tencent Cloud EMR instance; 8. org.apache.hadoop.fs.auth.RangerCredentialsProvider: obtains keys through Ranger. If unspecified, providers are tried in this order: SessionCredentialProvider, SimpleCredentialProvider, EnvironmentVariableCredentialProvider, SessionTokenCredentialProvider, CVMInstanceCredentialsProvider, CPMInstanceCredentialsProvider, EMRInstanceCredentialsProvider.

fs.cosn.useHttps: Whether to use HTTPS. Default: false.

fs.cosn.bucket.endpoint_suffix: The COS endpoint to connect to (optional). Public-cloud COS users only need to fill in the region option correctly. Replaces the legacy option fs.cosn.userinfo.endpoint_suffix. When setting this option, remove the fs.cosn.bucket.region option or the endpoint will not take effect.

fs.cosn.userinfo.secretId/secretKey: Your account's API credentials, available from the Cloud API Key console.

fs.cosn.impl: The FileSystem implementation class for cosn; always org.apache.hadoop.fs.CosFileSystem.

fs.AbstractFileSystem.cosn.impl: The AbstractFileSystem implementation class for cosn; always org.apache.hadoop.fs.CosN.

fs.cosn.posix_extension.enabled: Whether to enable the CosN extended semantics: seek-write (random write) and the HCFS hflush/hsync semantics. Enabling it degrades performance because of read/write amplification, so it should normally stay off. Default: false.

fs.cosn.bucket.region: The region the bucket is located in, using a region abbreviation from the available-regions list, e.g. ap-beijing or ap-guangzhou. Replaces the legacy option fs.cosn.userinfo.region.

fs.cosn.tmp.dir: An existing local directory; temporary files generated at runtime are staged here. Default: /tmp/hadoop_cos.

fs.cosn.block.size: The block size of the CosN file system. Default: 134217728 (128MB).

fs.cosn.upload.buffer: The buffer type used for uploads. Three types are supported: non_direct_memory (JVM heap memory), direct_memory (off-heap memory), and mapped_disk (a buffer backed by a memory-mapped file). Default: mapped_disk.

fs.cosn.upload.buffer.size: The size of the upload buffer; -1 means unlimited, in which case the buffer type must be mapped_disk. A positive value must be at least one block size. Replaces the legacy option fs.cosn.buffer.size. Default: 134217728 (128MB).

fs.cosn.upload.part.size: The part size for multipart upload. COS supports at most 10000 parts per upload, so estimate the largest single file you may write: an 8MB part size supports single files up to 78GB, and the maximum part size of 2GB supports single files up to 19TB. Default: 8388608 (8MB).

fs.cosn.upload_thread_pool: Number of threads used for concurrent streaming uploads to COS. Default: 10.

fs.cosn.io_thread_pool.maxSize: Upper bound on the size of the IO thread pool. Default: 2 * CPU cores + 1.

fs.cosn.copy_thread_pool: Number of threads available for concurrent copy and delete during directory copy operations. Default: 3.

fs.cosn.read.ahead.block.size: Size of a read-ahead block. Default: 1048576 (1MB).

fs.cosn.read.ahead.queue.size: Length of the read-ahead queue. Default: 8.

fs.cosn.maxRetries: Retries for errors caused by server-side throttling or jitter while reading or writing CosN. Default: 200.

fs.cosn.client.maxRetries: Retries for client-side errors caused by weak networks or DNS jitter. Default: 5.

fs.cosn.retry.interval.seconds: Seconds to sleep between retries; applies to the server-side retries (fs.cosn.maxRetries). Default: 3.

fs.cosn.max.connection.num: Maximum number of connections kept in the COS connection pool. This relates to the per-host read/write concurrency against COS and should be at least that concurrency. Default: 1024.

fs.cosn.customer.domain: Custom COS domain. Default: empty.

fs.cosn.server-side-encryption.algorithm: Server-side encryption algorithm; SSE-C and SSE-COS are supported. Default: empty (no encryption).

fs.cosn.server-side-encryption.key: Required when SSE-C server-side encryption is enabled; a base64-encoded AES-256 key. Default: empty (no encryption).

fs.cosn.client-side-encryption.enabled: Whether to enable client-side encryption. When enabled, the public and private keys must be configured, and the append and truncate interfaces are unavailable. Default: false.

fs.cosn.client-side-encryption.public.key.path: Absolute path to the client-side encryption public key file.

fs.cosn.client-side-encryption.private.key.path: Absolute path to the client-side encryption private key file.

fs.cosn.crc64.checksum.enabled: Whether to enable CRC64 checksums. Off by default, in which case hadoop fs -checksum cannot return a file's CRC64 value. Default: false.

fs.cosn.crc32c.checksum.enabled: Whether to enable CRC32C checksums. Off by default, in which case hadoop fs -checksum cannot return a file's CRC32C value. Only one checksum type can be enabled at a time. Default: false.

fs.cosn.traffic.limit: Bandwidth limit for uploads and downloads, from 819200 to 838860800, in bits/s. Default: -1 (unlimited).
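As a worked example of one of these switches, enabling CRC64 checksums lets hadoop fs -checksum return a file's CRC64 value (a sketch; the bucket name and path are placeholders):

<property>
    <!-- Allow hadoop fs -checksum to report CRC64 values for CosN files. -->
    <name>fs.cosn.crc64.checksum.enabled</name>
    <value>true</value>
</property>

hadoop fs -checksum cosn://examplebucket-1250000000000/path/to/file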

Getting started

The command format is hadoop fs -ls -R cosn://examplebucket-1250000000000/<path>, or hadoop fs -ls -R /<path> once fs.defaultFS has been set to cosn://. The example below uses a bucket named hdfs-test-1252681929; append a concrete path after it as needed.

hadoop fs -ls -R cosn://hdfs-test-1252681929/
-rw-rw-rw-   1 root root       1087 2018-06-11 07:49 cosn://hdfs-test-1252681929/LICENSE
drwxrwxrwx   - root root          0 1970-01-01 00:00 cosn://hdfs-test-1252681929/hdfs
drwxrwxrwx   - root root          0 1970-01-01 00:00 cosn://hdfs-test-1252681929/hdfs/2018
-rw-rw-rw-   1 root root       1087 2018-06-12 03:26 cosn://hdfs-test-1252681929/hdfs/2018/LICENSE
-rw-rw-rw-   1 root root       2386 2018-06-12 03:26 cosn://hdfs-test-1252681929/hdfs/2018/ReadMe
drwxrwxrwx   - root root          0 1970-01-01 00:00 cosn://hdfs-test-1252681929/hdfs/test
-rw-rw-rw-   1 root root       1087 2018-06-11 07:32 cosn://hdfs-test-1252681929/hdfs/test/LICENSE
-rw-rw-rw-   1 root root       2386 2018-06-11 07:29 cosn://hdfs-test-1252681929/hdfs/test/ReadMe

Run the built-in MapReduce wordcount

Note: hadoop-mapreduce-examples-2.7.2.jar in the command below is for version 2.7.2; if your version differs, change the version number accordingly.

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount cosn://examplebucket-1250000000000/mr/input cosn://examplebucket-1250000000000/mr/output3

On success, counter statistics are returned, for example:

File System Counters
        COSN: Number of bytes read=72
        COSN: Number of bytes written=40
        COSN: Number of read operations=0
        COSN: Number of large read operations=0
        COSN: Number of write operations=0
        FILE: Number of bytes read=547350
        FILE: Number of bytes written=1155616
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=0
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=0
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Map-Reduce Framework
        Map input records=5
        Map output records=7
        Map output bytes=59
        Map output materialized bytes=70
        Input split bytes=99
        Combine input records=7
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=70
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=653262848
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=36
    File Output Format Counters
        Bytes Written=40

Per-bucket configuration

Background: buckets in different campuses may each need different settings, so a single configuration can now carry per-bucket overrides for multiple buckets.
Core rule: a bucket-level option takes precedence; if it is not set, the base option applies; if that is not set either, the default applies.
Example: the subkey of fs.cosn.upload.buffer (the key with the fs.cosn. prefix trimmed) is upload.buffer. If the bucket-appid is test-123456, the bucket-level option is fs.cosn.bucket.test-123456.upload.buffer (format: fs.cosn.bucket.{bucket-appid}.{subkey}).
Usage: hadoop fs -ls cosn://test-123456/
Notes (a core-site.xml sketch follows the list):

  1. If fs.cosn.userinfo.appid is set, bucket-level options can be written as fs.cosn.bucket.{bucket}.{subkey} and the bucket accessed as hadoop fs -ls cosn://test/.
  2. For a metadata-acceleration bucket accessed as hadoop fs -ls cosn://test/, set fs.cosn.trsf.fs.ofs.use.short.bucketname to true in addition to fs.cosn.userinfo.appid.
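In core-site.xml form, a bucket-level override might look like the following sketch (test-123456 and the chosen buffer type are placeholder values):

<property>
    <!-- Overrides fs.cosn.upload.buffer for bucket test-123456 only. -->
    <name>fs.cosn.bucket.test-123456.upload.buffer</name>
    <value>direct_memory</value>
</property>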

Simplified per-bucket configuration

Background: older versions of the OFS Java SDK restricted mount-point names to the bucket-appid form; newer versions (1.1.8+) also accept the bare bucket name without the appid, and hadoop-cos supports this starting with 8.3.0. To access a metadata-acceleration bucket POSIX-style as cosn://bucket/, add these options:
fs.cosn.trsf.fs.ofs.use.short.bucketname true
fs.cosn.userinfo.appid 12345678
In the same way, the per-bucket options of the previous section can be keyed by the short bucket name so that access via cosn://test/ works:
fs.cosn.bucket.test.trsf.fs.ofs.use.short.bucketname true
fs.cosn.bucket.test.upload.buffer 1024
fs.cosn.bucket.test.*
A core-site.xml sketch of the two base options follows.
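In core-site.xml form, the two base options might look like this sketch (12345678 is a placeholder appid):

<property>
    <name>fs.cosn.trsf.fs.ofs.use.short.bucketname</name>
    <value>true</value>
</property>

<property>
    <name>fs.cosn.userinfo.appid</name>
    <value>12345678</value>
</property>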

Plugin configuration precedence

Precedence: bucket-level options override the metadata-acceleration options (fs.cosn.trsf.*), which override the defaults. Conversion order: bucket-level options are applied first; options explicitly set as fs.cosn.trsf.* are not converted; options without an explicit fs.cosn.trsf.* setting are converted automatically, for example the following:

  1. fs.cosn.userinfo.appid
  2. fs.cosn.bucket.region
  3. fs.cosn.userinfo.region
  4. fs.cosn.server-side-encryption.algorithm
  5. fs.cosn.server-side-encryption.context

hadoop-cos's People

Contributors

abyswang, beanwang24, chenjunjiedada, dependabot[bot], happybin92, iabetor, liuyongqing, panxing4game, rickyma, suninsky, vintmd, yuyang733


hadoop-cos's Issues

org.apache.hadoop.fs.PartialListing NoSuchMethodError

hadoop version:3.1.1
run distcp cmd: sudo -u hdfs hadoop distcp hdfs://xxx/ks3/ods_spider cosn://xxx/
error info:
Error: org.apache.hadoop.fs.PartialListing.<init>(Ljava/lang/String;[Lorg/apache/hadoop/fs/FileMetadata;[Lorg/apache/hadoop/fs/FileMetadata;)V
20/09/28 11:28:26 INFO mapreduce.Job: Task Id : attempt_1601263446127_0001_m_000004_0, Status : FAILED

It turns out that org.apache.hadoop.fs.PartialListing is shadowed by hadoop-common's own class of the same name. I suggest renaming the class or moving it into a dedicated package (consistent with the COS module in hadoop-cloud-storage: https://github.com/apache/hadoop/blob/82ff7bc9abc8f3ad549db898953d98ef142ab02d/hadoop-cloud-storage-project/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/PartialListing.java)

hadoop-cos build with hadoop-3.3.4 fails

When I build hadoop-cos (v8.2.1) against Apache Hadoop 3.3.4 to produce the jar, mvn package returns an error.

mvn package:

[INFO] Changes detected - recompiling the module!
[INFO] Compiling 68 source files to /opt/soft/hadoop-cos/target/classes
[INFO] -------------------------------------------------------------
[WARNING] COMPILATION WARNING :
[INFO] -------------------------------------------------------------
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/buffer/CosNMappedBuffer.java:[5,18] sun.nio.ch.FileChannelImpl is internal proprietary API and may be removed in a future release
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/buffer/CosNRandomAccessMappedBuffer.java:[6,18] sun.nio.ch.FileChannelImpl is internal proprietary API and may be removed in a future release
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/buffer/CosNMappedBuffer.java:[48,29] sun.nio.ch.FileChannelImpl is internal proprietary API and may be removed in a future release
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/buffer/CosNMappedBuffer.java:[50,27] sun.nio.ch.FileChannelImpl is internal proprietary API and may be removed in a future release
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/buffer/CosNRandomAccessMappedBuffer.java:[73,23] sun.nio.ch.FileChannelImpl is internal proprietary API and may be removed in a future release
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/cosn/buffer/CosNRandomAccessMappedBuffer.java:[75,21] sun.nio.ch.FileChannelImpl is internal proprietary API and may be removed in a future release
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/CosNFSDataOutputStream.java: Some input files use or override a deprecated API.
[WARNING] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/CosNFSDataOutputStream.java: Recompile with -Xlint:deprecation for details.
[INFO] 8 warnings
[INFO] -------------------------------------------------------------
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/CosNSeekableFSDataOutputStream.java:[52,15] abort() in org.apache.hadoop.fs.CosNSeekableFSDataOutputStream cannot implement abort() in org.apache.hadoop.fs.Abortable
return type void is not compatible with org.apache.hadoop.fs.Abortable.AbortableResult
[INFO] 1 error
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:25 min
[INFO] Finished at: 2023-01-12T11:11:48Z
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-cos: Compilation failure
[ERROR] /opt/soft/hadoop-cos/src/main/java/org/apache/hadoop/fs/CosNSeekableFSDataOutputStream.java:[52,15] abort() in org.apache.hadoop.fs.CosNSeekableFSDataOutputStream cannot implement abort() in org.apache.hadoop.fs.Abortable
[ERROR] return type void is not compatible with org.apache.hadoop.fs.Abortable.AbortableResult
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
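For context, Hadoop 3.3.1 and later ship org.apache.hadoop.fs.Abortable, whose abort() returns Abortable.AbortableResult rather than void, which is why the old void override no longer compiles. A minimal, hedged sketch of the shape the interface expects (an illustration of the contract as I understand it, not the project's actual patch):

import java.io.IOException;
import org.apache.hadoop.fs.Abortable;

class AbortableStreamSketch implements Abortable {
    @Override
    public Abortable.AbortableResult abort() {
        // A real output stream would cancel its in-flight multipart upload here.
        return new Abortable.AbortableResult() {
            @Override
            public boolean alreadyClosed() {
                return false; // the stream had not already been closed
            }

            @Override
            public IOException anyCleanupException() {
                return null;  // no exception occurred during cleanup
            }
        };
    }
}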

Should the CosN implementation override the method that returns the default port?

Following your configuration, submitting a job produced:

org.apache.hadoop.fs.InvalidPathException: Invalid path name Wrong FS: cosn://xxxx/emr/hadoop-yarn/staging/hadoop/.staging/job_1555056072016_0008, expected: cosn://xxxx/ 

Analysis shows the problem is in AbstractFileSystem: here thisPort is -1 while thatPort ends up 0:

public void checkPath(Path path) {
...
    // Ports must match, unless this FS instance is using the default port, in
    // which case the port may be omitted from the given URI
    int thisPort = this.getUri().getPort();
    int thatPort = uri.getPort();
    if (thatPort == -1) { // -1 => defaultPort of Uri scheme
      thatPort = this.getUriDefaultPort();
    }
    if (thisPort != thatPort) {
      throw new InvalidPathException("Wrong FS: " + path + ", expected: "
          + this.getUri());
    }
}


Looking at this logic, do we need to override getUriDefaultPort in the CosN class? After the change below the problem seems to be gone; I'm not sure whether my configuration is wrong or your code has a bug.

public class CosN extends DelegateToFileSystem {
    public CosN(URI theUri, Configuration conf) throws IOException, URISyntaxException {
        super(theUri, new CosFileSystem(), conf, CosFileSystem.SCHEME, false);
    }

    @Override
    public int getUriDefaultPort() {
        return -1;
    }
}

Do CosN upload timestamps behave inconsistently with HDFS, causing problems?

With your configuration I hit this exception:

2019-04-15 14:45:47,683 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:hadoop (auth:SIMPLE) cause:java.io.IOException: Resource cosn://xxx-xxx/user/hadoop/.flink/application_1555078596113_0014/logback.xml changed on src filesystem (expected 1555259286000, was 1555310742000

When we submit Flink jobs, the timestamp mismatch makes downloading the resource files fail.
Root-cause analysis
Comparing with YARN's code, the error comes from comparing the resource's timestamp with the file's timestamp on COS.
File: FSDownload.java

  private Path copy(Path sCopy, Path dstdir) throws IOException {
    FileSystem sourceFs = sCopy.getFileSystem(conf);
    Path dCopy = new Path(dstdir, "tmp_"+sCopy.getName());
    FileStatus sStat = sourceFs.getFileStatus(sCopy);
    if (sStat.getModificationTime() != resource.getTimestamp()) {
      throw new IOException("Resource " + sCopy +
          " changed on src filesystem (expected " + resource.getTimestamp() +
          ", was " + sStat.getModificationTime());
    }
    if (resource.getVisibility() == LocalResourceVisibility.PUBLIC) {
      if (!isPublic(sourceFs, sCopy, sStat, statCache)) {
        throw new IOException("Resource " + sCopy +
            " is not publicly accessable and as such cannot be part of the" +
            " public cache.");
      }
    }

    FileUtil.copy(sourceFs, sStat, FileSystem.getLocal(conf), dCopy, false,
        true, conf);
    return dCopy;
  }

When Flink constructs the resource, it uses the file's local timestamp (screenshot omitted). But I found that COS records the upload time, not the file's last local modification time:

-rw-rw-rw-   1 hadoop hadoop       2331 2019-04-15 15:14 /user/hadoop/.flink/application_1555078596113_0017/logback.xml

The local file's timestamp is:

-rwxrwxrwx 1 root users 2331 Apr 15 00:28 ./conf/logback.xml

So it keeps failing with "expected 1555259286000": the 00:28 timestamp never matches the file system, whereas with HDFS the timestamp would match logback.xml's local time. Is the timestamp behavior here inconsistent with HDFS, i.e. not fully HDFS-compatible?

Add license

The project needs to add a license so that users know what license applies.

apache hadoop-3.3.0 return java.lang.NoClassDefFoundError: org/apache/hadoop/fs/chdfs/PosixSeekable

apache hadoop-3.3.0 with hadoop-cos v8.2.1

1. core-site.xml (when I use Tencent's COS config):

<property>
    <name>fs.cosn.impl</name>
    <value>org.apache.hadoop.fs.CosFileSystem</value>
</property>

<property>
    <name>fs.AbstractFileSystem.cosn.impl</name>
    <value>org.apache.hadoop.fs.CosN</value>
</property>

hadoop fs -ls -R cosn://xxx/ returns:

2023-01-13 06:38:43,527 INFO fs.RangerCredentialsClient: begin to init ranger client, impl []
2023-01-13 06:38:43,691 INFO fs.CosNativeFileSystemStore: hadoop cos retry times: 200, cos client retry times: 5
2023-01-13 06:38:44,669 INFO fs.CosFileSystem: The cos bucket is the normal bucket.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/chdfs/PosixSeekable
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.fs.CosFileSystem.getActualFileSystemByClassName(CosFileSystem.java:160)
at org.apache.hadoop.fs.CosFileSystem.initialize(CosFileSystem.java:144)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3414)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.chdfs.PosixSeekable
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 31 more

2. core-site.xml (Apache Hadoop config from the Tencent Cloud website):

<property>
    <name>fs.cosn.impl</name>
    <value>org.apache.hadoop.fs.cosn.CosNFileSystem</value>
</property>

<property>
    <name>fs.AbstractFileSystem.cosn.impl</name>
    <value>org.apache.hadoop.fs.cosn.CosN</value>
</property>

hadoop fs -ls -R cosn://xxx/ returns:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.cosn.CosNFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2665)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3378)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3411)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:158)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3474)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3442)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:524)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.cosn.CosNFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2569)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2663)
... 16 more

3. core-site.xml (from the Apache Hadoop website):

<property>
    <name>fs.cosn.impl</name>
    <value>org.apache.hadoop.fs.cosn.CosNFileSystem</value>
    <description>The implementation class of the CosN Filesystem</description>
</property>

<property>
    <name>fs.AbstractFileSystem.cosn.impl</name>
    <value>org.apache.hadoop.fs.cos.CosN</value>
    <description>The implementation class of the CosN AbstractFileSystem.</description>
</property>

Should the value of fs.AbstractFileSystem.cosn.impl be org.apache.hadoop.fs.cos.CosN or org.apache.hadoop.fs.cosn.CosN?

Can the redundant log output be suppressed?

2019-07-19 14:59:53.637 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosFsDataOutputStream  - OutputStream for key 
2019-07-19 14:59:53.637 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosFsDataOutputStream  - OutputStream for key [/user/xxxx/x/root/.flinxxxk/application_1562922929987_0188/shipFiles/jackson-module-paranamer-2.9.7.jar] upload complete
2019-07-19 14:59:53.778 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosNativeFileSystemStore  - store file input stream md5 hash content length
2019-07-19 14:59:53.778 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosNativeFileSystemStore  - store file input stream md5 hash content length
2019-07-19 14:59:53.808 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosFsDataOutputStream  - OutputStream for key [/user/xxxxx/xxxxx/root/.flink/application_1562922929987_0188/shipFiles/commons-logging-1.2.jar] upload complete
2019-07-19 14:59:53.808 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosFsDataOutputStream  - OutputStream for key [/user/xxx/x/root/.flinxxxk/application_1562922929987_0188/shipFiles/commons-logging-1.2.jar] upload complete
2019-07-19 14:59:53.970 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosNativeFileSystemStore  - store file input stream md5 hash content length
2019-07-19 14:59:53.970 [queuedThreadPool-269] INFO  org.apache.hadoop.fs.CosNativeFileSystemStore  - store file input stream md5 hash content length
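Until the plugin demotes these messages to DEBUG, one workaround is raising the log level for the noisy classes in Hadoop's log4j.properties (a hedged sketch; the logger names are copied from the output above):

# Silence per-file upload chatter from the COS output stream and store.
log4j.logger.org.apache.hadoop.fs.CosFsDataOutputStream=WARN
log4j.logger.org.apache.hadoop.fs.CosNativeFileSystemStore=WARN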

java.io.IOException: cosn://null : java.lang.IllegalArgumentException: uin and request id must be exist

When fs.cosn.credentials.provider is not specified, the following error occurs:

java.io.IOException: cosn://null : java.lang.IllegalArgumentException: uin and request id must be exist

	at org.apache.hadoop.fs.CosNativeFileSystemStore.handleException(CosNativeFileSystemStore.java:1060)
	at org.apache.hadoop.fs.CosNativeFileSystemStore.initialize(CosNativeFileSystemStore.java:209)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy7.initialize(Unknown Source)
	at org.apache.hadoop.fs.CosFileSystem.initialize(CosFileSystem.java:104)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3354)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3403)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3371)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:477)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
	at org.apache.hadoop.fs.CosFileSystemTest.getFileStatus(CosFileSystemTest.java:15)

Test:

  @Test
  public void getFileStatus() throws IOException {
    FileSystem fs = CosFileSystem.get(new Configuration());
    String pathname = "/test-warehouse/tinytable";
    Path path = new Path(pathname);
    System.out.println(fs.getFileStatus(path));
  }

New authentication: DLFInstanceCredentialsProvider throws on initialization; an exception should not be thrown there.
https://github.com/tencentyun/hadoop-cos/blob/master/src/main/java/org/apache/hadoop/fs/auth/DLFInstanceCredentialsProvider.java#L41
