
uchihashikenshi / attention_time


Python implementation of a time-series model with (optional) attention where the encoder is CNN, decoder is LSTM, and more.

Jupyter Notebook 98.58% Python 1.42%

attention_time's Issues

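Create a dataset split by time

Requirements

For the train:test split, we have so far split by website; also run experiments with a split by time.
This seems closer to the actual usage scenario.

Implementation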
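
The difference between the two splits can be sketched as follows. This is only an illustration, not the repository's code: the dict-of-lists layout, the function names `split_by_site` and `split_by_time`, and the `cutoff` parameter are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: one count series per website, all on the same time axis.
Series = Dict[str, List[int]]


def split_by_site(data: Series, train_sites: List[str]) -> Tuple[Series, Series]:
    """Whole sites go to either train or test (the split used so far)."""
    train = {s: v for s, v in data.items() if s in train_sites}
    test = {s: v for s, v in data.items() if s not in train_sites}
    return train, test


def split_by_time(data: Series, cutoff: int) -> Tuple[Series, Series]:
    """Every site appears in both sets: steps before `cutoff` train, the rest test."""
    train = {s: v[:cutoff] for s, v in data.items()}
    test = {s: v[cutoff:] for s, v in data.items()}
    return train, test


data = {"site_a": [1, 2, 3, 4, 5, 6], "site_b": [6, 5, 4, 3, 2, 1]}
site_train, site_test = split_by_site(data, ["site_a"])
time_train, time_test = split_by_time(data, cutoff=4)
print(site_test)
print(time_train, time_test)
```

With the time split, the model is always evaluated on steps that come after everything it was trained on, which is what makes it resemble deployment.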

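Code cleanup

Requirements

For data generation, first obtain the standard representation of the time series (get_timeseries_normal_representation.py), then apply the training-data generation script (data.py) to it.
Create a script that prints statistics (stats.py).
Extend the code so that several data split methods can be specified (time split and site split are enough for now).

Implementation

Implement under the data/ folder.
Save each dataset under the data/dataname/ folder.
Implement data.py as a subclass of the BaseData class.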
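
A minimal sketch of what such a layout might look like. The issue does not show the `BaseData` interface, so the `load` and `stats` methods, the `root` attribute, and the `ToyData` subclass are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseData(ABC):
    """Hypothetical base class; the real BaseData interface is not shown in the issue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.root = f"data/{name}/"  # data/dataname/ convention from the issue

    @abstractmethod
    def load(self) -> Dict[str, List[float]]:
        """Return the normalized time series, keyed by series id."""

    def stats(self) -> Dict[str, float]:
        """Summary numbers of the kind a stats.py script might print."""
        series = self.load()
        lengths = [len(v) for v in series.values()]
        return {
            "n_series": len(series),
            "min_len": min(lengths),
            "max_len": max(lengths),
            "mean_len": sum(lengths) / len(lengths),
        }


class ToyData(BaseData):
    """In-memory stand-in for a dataset-specific subclass."""

    def load(self) -> Dict[str, List[float]]:
        return {"a": [0.0, 1.0, 2.0], "b": [3.0, 4.0, 5.0, 6.0, 7.0]}


toy = ToyData("toy")
print(toy.root, toy.stats())
```

Keeping the statistics on the base class means each new dataset only has to implement `load`.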

Experiments on new data

Requirements

Add experimental results on tweet hashtags and memetracker.

Implementation

Written directly in an IPython notebook.

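Switch to XGBoost

Requirements

XGBoost seems to be more widely used than scikit-learn's GBDT, so switch to it.

Implementation

There is a library called xgboost, so simply using it is probably enough.

note

"Python XGBoost variable importance plot / visualization implementation" (http://sinhrks.hatenablog.com/entry/2015/08/27/000235) looks like a good guide to follow.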

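Change how the training data is created

Requirements

When the dataset is split to create training and test data, setting stride to a value at or below the input dimension (30) causes the (n+1)-th input to contain part of the n-th target.
This was left alone because it seemed to have little effect on the bookmark data, but the results saturate far too much, so the effect may well show up as training progresses.

Implementation

In preprocessing, split train and test at the web page extraction stage.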
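
The overlap described in the requirements can be checked with a small index-only sketch. The input length of 30 comes from the issue; the one-step `horizon`, the series length of 200, and the helper names are assumptions made for this example.

```python
from typing import List, Tuple

Window = Tuple[range, range]  # (input indices, target indices)


def make_windows(n_steps: int, input_len: int, horizon: int, stride: int) -> List[Window]:
    """Return the index ranges of each sliding window over a series of n_steps."""
    windows = []
    start = 0
    while start + input_len + horizon <= n_steps:
        inputs = range(start, start + input_len)
        targets = range(start + input_len, start + input_len + horizon)
        windows.append((inputs, targets))
        start += stride
    return windows


def leaked_steps(windows: List[Window]) -> int:
    """Count target steps of window n that reappear in the input of window n+1."""
    leaked = 0
    for (_, targets), (next_inputs, _) in zip(windows, windows[1:]):
        leaked += len(set(targets) & set(next_inputs))
    return leaked


# input_len=30 is from the issue; horizon=1 and n_steps=200 are assumed.
for stride in (1, 30, 31):
    w = make_windows(n_steps=200, input_len=30, horizon=1, stride=stride)
    print(f"stride={stride}: windows={len(w)}, leaked target steps={leaked_steps(w)}")
```

If such windows are shuffled into train and test sets, a test target can sit inside a training input, which is why separating train and test before windowing avoids the problem regardless of stride.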

note

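Compute the all-dimensions-correct rate for the CNN

Requirements

For the multilabel CNN, compute the probability that all dimensions are predicted correctly at the same time.
It appears to be buggy right now anyway, so fix that first.

Implementation

Add the all-dimensions-correct rate to the accuracy metrics.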
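
A small sketch of the two quantities, assuming binary labels that have already been thresholded. The function names are illustrative and not taken from the repository.

```python
from typing import Sequence


def per_label_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots predicted correctly."""
    total = sum(len(row) for row in y_true)
    hits = sum(t == p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    return hits / total


def exact_match_accuracy(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples where every label dimension is correct at once."""
    hits = sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [1, 0, 0]]
print(per_label_accuracy(y_true, y_pred))
print(exact_match_accuracy(y_true, y_pred))
```

The exact-match rate can never exceed the per-label rate, so a large gap between the two is expected. scikit-learn's `accuracy_score` reports this exact-match (subset) accuracy when given multilabel indicator arrays.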

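Visualize XGBoost results

Requirements

Implementation

Visualize the important variables.
Visualize the decision trees.
- Rather than specifying a tree by hand, we want a mechanism that automatically finds the trees that contribute most for a given data point, or the trees with the highest importance on average.

note

Try following "Python XGBoost variable importance plot / visualization implementation" (http://sinhrks.hatenablog.com/entry/2015/08/27/000235).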

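Estimate the duration of a burst

Review the experimental setup

Requirements

Run experiments systematically.
Use scikit-learn's cross validation, grid search, and so on?
Fix the problem that the test data is also being resampled.

Implementation

With Keras, grid search and the like should be usable through a wrapper.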
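
One way to keep the test data untouched is to hold it out before any resampling. The sketch below assumes that "resampling" here means oversampling minority classes, which the issue does not state; the function name and the toy data are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label)


def split_then_oversample(
    samples: List[Sample], test_ratio: float, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Hold out the test set first, then balance classes in the training set only."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]

    counts = Counter(label for _, label in train)
    target = max(counts.values())
    balanced = list(train)
    for label, count in counts.items():
        pool = [s for s in train if s[1] == label]
        # Duplicate random minority samples until the class reaches the target size.
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced, test


# Toy data (assumed): every fifth sample is a positive.
samples = [([float(i)], 1 if i % 5 == 0 else 0) for i in range(50)]
train, test = split_then_oversample(samples, test_ratio=0.2)
print(Counter(label for _, label in train), len(test))
```

Because the duplicates are drawn only from the training portion, the test set keeps its original class distribution and contains no repeated samples.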

Would you mind using English? Thanks!

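Create a kNN baseline

Requirements

Add kNN as a learner and use it as a baseline.

Implementation

Create model/kNN/ and put the code there.
Make it callable from the command line later, all in one go.
For the metric, euclidean and DTW should be enough.
It is probably better to create a data_feed-style class and inherit from it.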
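
A dependency-free sketch of such a baseline with both metrics. The function names, the absolute-difference step cost in DTW, and the toy series are assumptions for illustration, not the repository's implementation.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Point-wise distance; both series are assumed to have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def knn_predict(
    train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int, metric: Metric
) -> str:
    """Majority vote over the k training series closest to `query`."""
    nearest = sorted(train, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ([0, 0, 1, 5, 1, 0], "burst"),
    ([0, 1, 5, 1, 0, 0], "burst"),
    ([1, 1, 1, 1, 1, 1], "flat"),
    ([2, 2, 2, 2, 2, 2], "flat"),
]
query = [0, 0, 0, 1, 5, 1]  # a burst shifted to the right
print(knn_predict(train, query, k=1, metric=euclidean))
print(knn_predict(train, query, k=1, metric=dtw))
```

On this toy input the two metrics disagree: the shifted burst is closest to a flat series under the euclidean metric, while DTW aligns the peaks and picks a burst.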

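Create a base class for models

Requirements

Create base classes for CNN and LSTM and store them in the parent directory.

Implementation

When importing from the parent directory, a relative path such as ..base does not seem to work.
It may be better to stop splitting directories (manage CNN, LSTM, seq2seq, and so on all in the same directory).
With a Keras implementation the model definition fits in a single file, so this should not be a problem.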
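
One common cause of a failing `..base` import, which may or may not be what happened here, is running the file directly as a script instead of as a module of its package. The sketch below builds a throwaway package in a temporary directory to show the difference on Python 3; the `model/base.py` and `model/cnn/train.py` layout and the `BaseModel` class are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: model/base.py holds the shared class, model/cnn/train.py uses it.
files = {
    "model/__init__.py": "",
    "model/base.py": "class BaseModel:\n    name = 'base'\n",
    "model/cnn/__init__.py": "",
    "model/cnn/train.py": "from ..base import BaseModel\nprint(BaseModel.name)\n",
}

with tempfile.TemporaryDirectory() as root:
    for rel, body in files.items():
        path = Path(root, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(body)

    # Run as a script: the file has no parent package, so `..base` cannot resolve.
    as_script = subprocess.run(
        [sys.executable, "model/cnn/train.py"], cwd=root, capture_output=True, text=True
    )
    # Run as a module from the project root: the relative import resolves.
    as_module = subprocess.run(
        [sys.executable, "-m", "model.cnn.train"], cwd=root, capture_output=True, text=True
    )

print("script failed:", as_script.returncode != 0)
print("module output:", as_module.stdout.strip())
```

If this is the cause, keeping the sub-directories and launching training with `python -m` from the project root would be an alternative to flattening everything into one directory.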
