
cikm_analyticup_2017's Introduction

CIKM_AnalytiCup_2017

This repo describes the solution of Team 怀北村明远湖. CIKM AnalytiCup 2017 is an open competition sponsored by Shenzhen Meteorological Bureau, Alibaba Group and CIKM2017. Our team placed third in the first phase and fourth in the second phase.

Introduction

Short-term precipitation forecasting, such as rainfall prediction, is the task of predicting the rainfall amount over a short horizon from current observations. In this challenge, the sponsors provide a set of radar maps, each covering the radar reflectivity of a target site and its surrounding area. The maps are measured at 15 time spans with an interval of 6 minutes, and at 4 heights, from 0.5km to 3.5km with an interval of 1km. Each radar map covers an area of 101km*101km around the site. The area is divided into 101*101 grid cells, and the target site is located at the centre, i.e. (50, 50).
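As a rough illustration of the data layout described above, the sketch below reshapes one flat record into a (time, height, row, column) array and reads the reflectivity series at the target site. The axis order follows the `reshape(15,4,101,101)` call that appears in the repo's traceback; the function name `to_radar_cube` and the length check are illustrative assumptions, not the repo's code.

```python
import numpy as np

N_TIMES, N_HEIGHTS, SIZE = 15, 4, 101  # dimensions stated in the text


def to_radar_cube(flat_values):
    """Reshape a flat record into (time, height, row, col).

    Hypothetical helper: raises early if the record is truncated.
    """
    arr = np.asarray(flat_values, dtype=np.float32)
    expected = N_TIMES * N_HEIGHTS * SIZE * SIZE  # 612060 values per record
    if arr.size != expected:
        raise ValueError(f"expected {expected} values, got {arr.size}")
    return arr.reshape(N_TIMES, N_HEIGHTS, SIZE, SIZE)


# Dummy record standing in for one line of the real data set.
cube = to_radar_cube(np.zeros(N_TIMES * N_HEIGHTS * SIZE * SIZE))

# Reflectivity at the target site (50, 50) for every time span and height.
site_series = cube[:, :, 50, 50]
print(cube.shape, site_series.shape)
```

A record with fewer than 612060 values cannot be reshaped this way, so checking the length first gives a clearer error than the raw NumPy one.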

Our task is to predict the total rainfall amount on the ground at each target site between 1 and 2 hours into the future. In this challenge, we combine Random Forest, XGBoost and Bidirectional Gated Recurrent Units (GRUs) into an ensemble model to tackle this problem and achieve satisfying results.

Data Process

Percentile Method

A statistical method is applied to reduce the dimensionality of the radar data. For a single radar map, we take the 25th, 50th, 75th and 100th percentiles of the reflectivity values in neighborhoods of various sizes around the target site, ranging from the center to the whole map.
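A minimal sketch of this idea for a single 101*101 map is shown below. The neighborhood radii `(5, 15, 30, 50)` and the function name `percentile_features` are assumptions chosen for illustration; the text does not state which scales were actually used.

```python
import numpy as np


def percentile_features(radar_map, radii=(5, 15, 30, 50), center=(50, 50)):
    """Return the 25th/50th/75th/100th percentiles for square windows
    of growing size around the target site (radii are hypothetical)."""
    r0, c0 = center
    feats = []
    for r in radii:
        # Square window of side 2*r + 1, clipped at the top/left edge.
        window = radar_map[max(r0 - r, 0): r0 + r + 1,
                           max(c0 - r, 0): c0 + r + 1]
        feats.extend(np.percentile(window, [25, 50, 75, 100]))
    return np.array(feats)


# Synthetic map, only to show the output size: 4 percentiles per scale.
demo_map = np.arange(101 * 101, dtype=float).reshape(101, 101)
features = percentile_features(demo_map)
print(features.shape)
```

With a radius of 50 the window covers the whole map, so the last scale corresponds to the "whole map" end of the range described above.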

"Wind" Method

We first reduce the original data (15*4*101*101) to a smaller size (15*4*10*10), then shrink it to 15*4*6*6 features by judging the wind direction. The whole preprocessing step borrows ideas from CNNs, especially the convolution operation and max pooling.
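The first reduction step can be sketched as non-overlapping max pooling. How the repo handles the leftover 101st row and column is not stated; this sketch assumes it is simply dropped, and the name `max_pool_blocks` is hypothetical. The later wind-based shrink to 6*6 is not covered here.

```python
import numpy as np


def max_pool_blocks(cube, block=10):
    """Max-pool the last two axes of a (time, height, rows, cols) array
    over non-overlapping block*block windows, dropping any remainder."""
    t, h, rows, cols = cube.shape
    n_r, n_c = rows // block, cols // block
    cropped = cube[:, :, :n_r * block, :n_c * block]
    # Split each spatial axis into (number of blocks, block size).
    blocks = cropped.reshape(t, h, n_r, block, n_c, block)
    return blocks.max(axis=(3, 5))


# Synthetic cube with the dimensions given in the text.
cube = np.arange(15 * 4 * 101 * 101, dtype=np.float32).reshape(15, 4, 101, 101)
pooled = max_pool_blocks(cube)
print(pooled.shape)
```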

We use the fourth layer of data to determine the wind direction. To calculate it, we try two ways of choosing representative data. The first uses the maximum value in each 10*10 frame as the representative. The second takes the average of the five largest values instead. After selecting the representative data, we determine the wind direction by calculating the deviation between the initial position and the positions in the following frames, voting on the moving direction, and finally taking the direction with the most votes, based on the given thresholds, as the resulting wind direction.
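The voting step might look roughly like the sketch below. It uses the first variant (the position of the maximum in each 10*10 frame) and a hypothetical displacement threshold of one cell; the function name, the threshold value and the encoding of a direction as a pair of signs are all assumptions rather than the repo's implementation.

```python
from collections import Counter

import numpy as np


def estimate_wind_direction(frames, threshold=1):
    """frames: (n_times, rows, cols) array for one height layer.

    Returns (row_sign, col_sign), each in {-1, 0, 1}, chosen by majority
    vote over the displacement of the per-frame maximum from frame 0.
    """
    n_times, rows, cols = frames.shape
    # Representative position per frame: location of the maximum value.
    flat_idx = frames.reshape(n_times, -1).argmax(axis=1)
    pos = np.stack(np.unravel_index(flat_idx, (rows, cols)), axis=1)

    votes = Counter()
    for d_row, d_col in pos[1:] - pos[0]:
        # A move smaller than the threshold counts as "no movement".
        v_row = int(np.sign(d_row)) if abs(d_row) >= threshold else 0
        v_col = int(np.sign(d_col)) if abs(d_col) >= threshold else 0
        votes[(v_row, v_col)] += 1
    if not votes:  # fewer than two frames: nothing to vote on
        return (0, 0)
    return votes.most_common(1)[0][0]


# Synthetic example: a single peak drifting toward larger row and column.
frames = np.zeros((15, 10, 10))
for t in range(15):
    frames[t, min(t, 9), min(t, 9)] = 1.0
direction = estimate_wind_direction(frames)
print(direction)
```

The second variant would replace the `argmax` step with a position derived from the five largest values in each frame.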

Model

Random Forest, XGBoost and Bidirectional GRUs are combined in a model ensemble.
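The text does not say how the three models are combined. One common option, shown here purely as an assumption, is a weighted average of their predictions; the model keys, the weights and the function name `blend_predictions` are hypothetical.

```python
import numpy as np


def blend_predictions(preds_by_model, weights=None):
    """Weighted average of per-model predictions (equal weights by default)."""
    names = sorted(preds_by_model)
    stacked = np.stack([np.asarray(preds_by_model[n], dtype=float) for n in names])
    if weights is None:
        w = np.full(len(names), 1.0 / len(names))
    else:
        w = np.array([weights[n] for n in names], dtype=float)
        w = w / w.sum()  # normalise so the weights sum to 1
    return w @ stacked


# Made-up rainfall predictions for two sites from each model.
preds = {"rf": [1.0, 2.0], "xgb": [3.0, 4.0], "gru": [5.0, 6.0]}
blended = blend_predictions(preds)
print(blended)
```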

Requirements

  • Python 3.6
  • Keras
  • XGBoost
  • sklearn

Dataset


cikm_analyticup_2017's Issues

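Missing data problem

Someone asked you about this before. Following your advice, I commented out check_code('simple', 'online') in the main file and enabled check_code('all', 'no'). The code then points to specific files. Are files missing, or did something go wrong while running the data-processing code? Is there a workflow document I can refer to?

Question: runtime error

Running main.py generates the train data; the last record is as follows: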

['train_8152', 13.76, 23.04, 79.52, 0.0, 0.0, 0.0, 95.72, 111.44, 47.6, 23.72, 10.12, 1.92, 128.96, 113.48, 64.28, 10.84, 20.56, 5.72, 120.56, 67.64, 69.6, 75.56, 18.68, 4.0, 15.28, 10.92, 138.44, 98.44, 10.56, 2.0, 11.6, 27.56, 23.24, 74.24, 8.56, 0.88, 18.56, 35.24, 103.16, 2.12, 0.0, 0.0, 122.72, 92.6, 75.8, 30.2, 3.04, 1.52, 123.92, 106.28, 68.72, 12.12, 26.72, 3.04, 127.04, 88.76, 48.52, 84.92, 28.6, 0.0, 22.52, 20.4, 124.48, 128.6, 22.08, 1.76, 10.32, 24.68, 20.08, 88.52, 19.84, 0.0, 52.68, 46.24, 93.44, 1.76, 0.0, 0.0, 116.96, 84.92, 77.36, 25.44, 10.2, 4.8, 118.16, 99.08, 70.04, 30.72, 39.48, 0.0, 130.28, 93.68, 33.2, 120.44, 36.28, 0.0, 29.12, 26.68, 112.64, 142.04, 21.96, 0.0, 10.96, 28.16, 19.8, 70.6, 26.48, 0.0, 61.44, 36.04, 89.6, 3.76, 0.0, 4.8, 128.24, 87.32, 80.48, 38.48, 13.64, 6.64, 112.4, 95.24, 71.24, 55.88, 50.96, 8.4, 127.28, 96.8, 21.08, 133.28, 60.92, 6.2, 31.16, 25.64, 67.04, 130.16, 44.96, 0.0, 20.28, 24.32, 27.8, 71.16, 42.04, 1.04, 94.48, 64.28, 81.92, 18.72, 0.0, 2.2, 134.24, 90.2, 93.92, 43.64, 4.76, 9.56, 102.68, 91.76, 75.44, 58.64, 68.48, 10.72, 135.8, 98.0, 33.8, 130.76, 78.44, 7.44, 27.12, 21.08, 72.8, 124.16, 43.76, 0.0, 21.2, 22.2, 16.8, 75.92, 41.12, 69.16, 116.76, 73.16, 82.04, 40.56, 0.0, 0.0, 132.32, 97.52, 115.04, 71.6, 18.44, 0.0, 87.32, 90.32, 83.84, 74.48, 105.68, 11.0, 129.2, 91.4, 17.44, 124.52, 87.08, 18.88, 23.8, 15.88, 32.28, 124.52, 76.84, 26.04, 19.52, 23.0, 30.72, 56.32, 61.52, 122.6, 129.32, 95.6, 101.36, 44.44, 0.0, 0.0, 118.16, 90.56, 127.4, 72.68, 54.08, 1.92, 57.84, 81.92, 86.48, 97.4, 138.68, 26.16, 114.32, 79.28, 41.56, 132.2, 112.88, 46.96, 30.88, 22.16, 13.24, 97.16, 81.2, 52.36, 23.36, 25.4, 41.96, 39.4, 59.96, 110.48, 131.36, 105.8, 114.08, 31.28, 0.0, 2.64, 113.48, 110.24, 137.72, 71.84, 59.36, 0.0, 35.96, 66.68, 79.4, 96.08, 140.48, 16.6, 109.68, 68.72, 61.88, 124.4, 126.8, 58.6, 33.52, 2.56, 15.2, 101.0, 84.68, 51.92, 17.56, 22.52, 36.92, 11.72, 60.88, 118.4, 136.28, 137.84, 133.4, 23.16, 7.64, 0.0, 
118.64, 117.68, 135.32, 70.76, 57.56, 3.76, 55.56, 63.08, 81.56, 144.44, 128.72, 17.76, 108.44, 72.8, 55.28, 117.44, 110.72, 80.76, 36.68, 8.4, 8.0, 94.16, 83.96, 54.32, 24.24, 23.12, 36.16, 4.48, 79.08, 129.32, 137.0, 147.2, 136.64, 60.56, 5.12, 0.0, 118.16, 114.32, 132.68, 80.36, 61.76, 2.4, 20.4, 59.6, 81.32, 148.4, 134.72, 41.68, 93.36, 75.32, 65.24, 113.6, 107.72, 97.08, 26.36, 19.64, 9.16, 78.88, 83.6, 59.12, 28.88, 25.4, 44.32, 20.48, 64.28, 125.96, 131.12, 145.64, 140.72, 86.52, 4.52, 0.76, 122.24, 84.2, 120.2, 109.28, 103.88, 1.92, 23.84, 53.84, 86.24, 132.2, 150.32, 71.8, 79.16, 72.68, 57.2, 79.72, 116.24, 88.04, 39.84, 45.84, 14.24, 25.0, 89.96, 88.16, 33.8, 51.32, 24.96, 23.12, 62.44, 122.24, 127.08, 156.92, 138.08, 81.32, 15.8, 0.0, 110.72, 75.92, 106.04, 114.68, 117.44, 59.36, 47.4, 26.56, 76.64, 133.76, 148.88, 71.24, 60.8, 61.4, 46.92, 65.72, 125.6, 92.84, 42.8, 38.44, 52.68, 8.28, 83.0, 106.28, 30.92, 56.6, 38.24, 0.0, 45.2, 106.84, 6.5]

After that, an error is raised. What could be the problem? The error message is as follows:
File "/home/simon/source/code/CIKM-Cup-2017-master/huaibeicun/model/main.py", line 64, in
check_code('all','no')

File "/home/simon/source/code/CIKM-Cup-2017-master/huaibeicun/model/main.py", line 26, in check_code
train_add = dp.dataprocess(trainfile, data_type='train', windversion='old')

File "../dataprocess/data_process8.py", line 280, in dataprocess
id_label, con_mat = train_convolution(line, data_type)

File "../dataprocess/data_process8.py", line 220, in train_convolution
mat = np.array(record).reshape(15,4,101,101)

ValueError: cannot reshape array of size 251031 into shape (15,4,101,101)

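Incomplete data

Many .csv files are missing when main.py is called. Is that because the data in the archive is incomplete?
Also, many file paths are hard-coded in the files, for example: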

/home/Team4/Team4/
