Code Monkey home page Code Monkey logo

daybits's Introduction

daybits

压缩表示日期的一种数据格式。阿里巴巴集团安全部( http://weibo.com/alisec )数据团队在清洗大数据的过程中,经常需要保存精度到天的时间数据,比如说某人在历史哪些天登录过。这样的数据很多,需要关注存储成本。daybits是为解决这个问题而设计的压缩表示日期数据的数据格式。

数据格式

  daybits := <beforeYears>#<years>
  beforeYears := <year>;<year>; ...
  years := <year>;<year>;<year>; ...
  year := <quarter>,<quarter>,<quarter>,<quarter>
  quarter = base64_str

其中#之前是2013以前的数据,如果#不存在,或者#之后的是2013年开始的数据。年中每个季度的数据使用逗号分隔,每个季度的数据是一个base64字符串,原始数据是一个byte数组,每个byte可以表示8天。

odps函数

函数名 返回数据类型 函数类型 UDF实现类 函数介绍
daybits_concat String UDAF com.alibaba.daybits.support.odps.udf.DayBitsConcat 聚合函数,输入参数是日期,返回daybits格式字符串,用于在原始明细数据中构建daybits数据
daybits_merge String UDAF com.alibaba.daybits.support.odps.udf.DayBitsMerge 聚合函数,输入参数是daybits数据,返回是合并之后的daybits数据
daybits_count Bigint UDF com.alibaba.daybits.support.odps.udf.DayBitsCount 查看daybits数据中存在多少天,输入是daybits数据,可以指定开始和结束时间
daybits_exists Boolean UDF com.alibaba.daybits.support.odps.udf.DayBitsExists 查看daybits数据中是否存在指定范围的日期,输入是daybits数据和开始和结束时间
daybits_explain String UDF com.alibaba.daybits.support.odps.udf.DayBitsExplain 把daybits解析为可读的字符串,输入是daybits数据,可以指定开始和结束时间
daybits_first BigInt UDF com.alibaba.daybits.support.odps.udf.DayBitsFirst 返回第日期
daybits_last BigInt UDF com.alibaba.daybits.support.odps.udf.DayBitsLast 返回最后日期
daybits_and String UDF com.alibaba.daybits.support.odps.udf.DayBitsAnd daybits数据的交集
daybits_or String UDF com.alibaba.daybits.support.odps.udf.DayBitsOr dyabits数据的并集
daybits_get Boolean UDFcom.alibaba.daybits.support.odps.udf.DayBitsGet 判断某个日期在daybits中是否存在
daybits_set String UDF com.alibaba.daybits.support.odps.udf.DayBitsSet 设置daybits某一天为true

daybits_concat

用途:聚合函数,用于通过原始数据构建daybits。
函数定义:

STRING daybits_concat(STRING date)

参数:
date yyyymmdd格式日期字符串

返回值: daybits格式字符串

示例:

select member_id, daybits_concat(ds)
from my_table
group by member_id

daybits_merge

用途:聚合函数,用于通过合并多个daybits。
函数定义:

STRING daybits_merge(STRING daybits)

参数:
daybits daybits格式字符串

返回值: daybits格式字符串

示例:

select member_id, daybits_merge(event_trace)
from my_table
group by member_id

daybits_set

用途:设置daybits数据中某一天的值
函数定义:

STRING daybits_set(STRING daybits, STRING date)
STRING daybits_set(STRING daybits, BIGINT date)
STRING daybits_set(STRING daybits, STRING date, BOOLEAN value)
STRING daybits_set(STRING daybits, BIGINT date, BOOLEAN value)

参数:
daybits daybits格式字符串
date yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: daybits格式字符串

示例:

select member_id, daybits_set(event_trace, '20140322')
from my_table

select member_id, daybits_set(event_trace, 20140322)
from my_table

select member_id, daybits_set(event_trace, '20140322', false)
from my_table

select member_id, daybits_set(event_trace, 20140322, false)
from my_table

daybits_get

用途: 判断daybits数据中某一天的值
函数定义:

 BOOLEAN daybits_get(STRING daybits, STRING date)
 BOOLEAN daybits_get(STRING daybits, BIGINT date)

参数:
daybits daybits格式字符串
date yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: 是否已经设置

示例:

select daybits_get(';AAAAAChCywMgAg==', 20140205) from dual

select daybits_get(';AAAAAChCywMgAg==', '20140205‘) from dual

daybits_count

用途: 判断daybits数据中存在的天数
函数定义:

 BIGINT daybits_count(STRING daybits)
 BIGINT daybits_count(STRING daybits, STRING start)
 BIGINT daybits_count(STRING daybits, STRING start, STRING end)
 BIGINT daybits_count(STRING daybits, BIGINT start)
 BIGINT daybits_count(STRING daybits, BIGINT start, BIGINT end)

参数:
daybits daybits格式字符串
start 开始日期 yyyymmdd格式日期字符串,或等价整数,比如20140701
end 结束日期 yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: 是否已经设置

示例:

-- ;AAAAAChCywMgAg==的结果是: 20140205,20140207,20140211,20140216,20140218,20140219,20140221,20140224,20140225,20140226,20140227,20140311,20140315
select daybits_count(';AAAAAChCywMgAg==') from dual -- 返回14

select daybits_count(';AAAAAChCywMgAg==', 20140205) from dual -- 返回13

select daybits_count(';AAAAAChCywMgAg==', 20140205, 20140207) from dual -- 返回2

-- 查找20140301~20140331期间出现的记录
select *
from my_tabel
where daybits_count(event_trace, 20140301, 20140331) > 0

daybits_exists

用途: 判断daybits数据中存在的天数
函数定义:

 BOOLEAN daybits_exists(STRING daybits)
 BOOLEAN daybits_exists(STRING daybits, STRING start)
 BOOLEAN daybits_exists(STRING daybits, STRING start, STRING end)
 BOOLEAN daybits_exists(STRING daybits, BIGINT start)
 BOOLEAN daybits_exists(STRING daybits, BIGINT start, BIGINT end)

参数:
daybits daybits格式字符串
start 开始日期 yyyymmdd格式日期字符串,或等价整数,比如20140701
end 结束日期 yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: 是否存在

示例:

select daybits_exists(';AAAAAChCywMgAg==', 20140206, 20140207) from dual -- 返回true

-- 查找20140301~20140331期间出现的记录
select *
from my_tabel
where daybits_exists(event_trace, 20140301, 20140331)

daybits_explain

用途: 将daybits字符串解析为可读的日期字符串
函数定义:

 STRING daybits_explain(STRING daybits)
 STRING daybits_explain(STRING daybits, STRING start)
 STRING daybits_explain(STRING daybits, STRING start, STRING end)
 STRING daybits_explain(STRING daybits, BIGINT start)
 STRING daybits_explain(STRING daybits, BIGINT start, BIGINT end)

参数:
daybits daybits格式字符串
start 开始日期 yyyymmdd格式日期字符串,或等价整数,比如20140701
end 结束日期 yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: 以逗号分隔的日期

示例:

-- 返回 20140205,20140207,20140211,20140216,20140218,20140219,20140221,20140224,20140225,20140226,20140227,20140311,20140315
select daybits_explain(';AAAAAChCywMgAg==') from dual

select daybits_explain(';AAAAAChCywMgAg==', 20140205, 20140207) from dual -- 返回20140205,20140207

daybits_first

用途: 返回daybits字符串中的首次日期
函数定义:

 BIGINT daybits_first(STRING daybits)
 BIGINT daybits_first(STRING daybits, STRING start)
 BIGINT daybits_first(STRING daybits, STRING start, STRING end)
 BIGINT daybits_first(STRING daybits, BIGINT start)
 BIGINT daybits_first(STRING daybits, BIGINT start, BIGINT end)

参数:
daybits daybits格式字符串
start 开始日期 yyyymmdd格式日期字符串,或等价整数,比如20140701
end 结束日期 yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: 返回yyyymmdd格式的整数

示例:

select daybits_first(';AAAAAChCywMgAg==') from dual -- 返回 20140205

select daybits_first(';AAAAAChCywMgAg==', 20140206, 20140207) from dual -- 返回20140207

daybits_last

用途: 返回daybits字符串中的末次日期
函数定义:

 BIGINT daybits_last(STRING daybits)
 BIGINT daybits_last(STRING daybits, STRING start)
 BIGINT daybits_last(STRING daybits, STRING start, STRING end)
 BIGINT daybits_last(STRING daybits, BIGINT start)
 BIGINT daybits_last(STRING daybits, BIGINT start, BIGINT end)

参数:
daybits daybits格式字符串
start 开始日期 yyyymmdd格式日期字符串,或等价整数,比如20140701
end 结束日期 yyyymmdd格式日期字符串,或等价整数,比如20140701

返回值: 返回yyyymmdd格式的整数

示例:

select daybits_last(';AAAAAChCywMgAg==') from dual -- 返回 20140315

select daybits_last(';AAAAAChCywMgAg==', 20140206, 20140207) from dual -- 返回20140207

daybits_and

用途: 返回两个daybits数据的交集
函数定义:

 STRING daybits_and(STRING daybits_a, STRING daybits_b)

参数:
daybits_a daybits格式字符串
daybits_b daybits格式字符串

返回值: 返回两个daybits数据的交集,daybits格式字符串

示例:

select daybits_and(';AAAAAChCywMgAg==', ';AAAAAChCywMgAg==') from dual -- 返回 ';AAAAAChCywMgAg=='

daybits_or

用途: 返回两个daybits数据的并集
函数定义:

 STRING daybits_or(STRING daybits_a, STRING daybits_b)

参数:
daybits_a daybits格式字符串
daybits_b daybits格式字符串

返回值: 返回两个daybits数据的并集,daybits格式字符串

示例:

select daybits_or(';AAAAAChCywMgAg==', ';AAAAAChCywMgAg==') from dual -- 返回 ';AAAAAChCywMgAg=='

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.