
jdata's Introduction

JData

The revised code lives here.

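  • train.py: training code; its result scores 0.07 online
  • cutDataSet.py: splits the dataset by time
  • fetchSamples.py: extracts positive and negative samples
  • generateFeature.py: processes the user data and comment data and produces the feature set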

jdata's People

Contributors

xls1994, zeco-01

jdata's Issues

pandas.read_csv is unbearably slow at reading data

pandas.read_csv is extremely slow at reading the data. One option is to set up a data warehouse and then access the data with SQL.
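The warehouse idea can be tried on a small scale before committing to real infrastructure. The sketch below uses SQLite from the Python standard library as a stand-in for a data warehouse; the table name `actions` and both function names are hypothetical, and the `type` column is assumed to exist in the CSV header. It is an illustration of the approach, not the repository's code.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="actions"):
    """Copy a CSV file with a header row into a SQLite table."""
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # the first line holds the column names
        columns = ", ".join('"{}"'.format(name) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(
            'CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns)
        )
        # executemany pulls rows from the reader one by one, so the file
        # is never held in memory all at once. Values are stored as text.
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders),
            reader,
        )
    conn.commit()
    return conn


def count_by_type(conn, table="actions"):
    """Example query: number of rows for each value of the `type` column."""
    cursor = conn.execute(
        'SELECT "type", COUNT(*) FROM "{}" '
        'GROUP BY "type" ORDER BY "type"'.format(table)
    )
    return cursor.fetchall()
```

The load is paid once; later runs can open the database file and query only the rows and columns they need instead of parsing the CSV again.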

Another approach is to read the file in chunks:

```python
import logging

import pandas


# Read a CSV file in chunks
def get_data(fname, chunk_size=100001):
    # Why not simply call
    #   df = pandas.read_csv(fname, header=0, usecols=["sku_id", "type"])
    # when loading the data? Because the file is too large.
    reader = pandas.read_csv(fname, header=0, iterator=True)
    chunks = []
    loop = True
    loop_i = 0
    while loop:
        try:
            loop_i = loop_i + 1
            logging.warning('loop:' + str(loop_i))
            chunk = reader.get_chunk(chunk_size)
            chunks.append(chunk)
        except StopIteration:
            # The reader is exhausted: the whole file has been read.
            loop = False
            logging.warning(
                'Finished reading, iteration stopped; loop:' + str(loop_i))
        except KeyboardInterrupt:
            # Ctrl-C stops reading early and continues with the chunks
            # collected so far.
            loop = False
            logging.warning(
                'Reading interrupted manually, moving on to the next step; '
                'loop:' + str(loop_i))
    df_ac = pandas.concat(chunks, ignore_index=True)
    return df_ac
```
PS: pandas.read_csv is extremely memory-hungry when reading very large files.
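Reading in chunks only lowers peak memory if the chunks are not all concatenated back into one DataFrame at the end. A minimal sketch of that idea follows, assuming the file has a `type` column and that a per-type row count is the result wanted; the function name `count_types_in_chunks` is hypothetical.

```python
import pandas


def count_types_in_chunks(fname, chunk_size=100000):
    """Count rows per `type` value while holding one chunk at a time."""
    totals = None
    # usecols loads only the column that is needed; chunksize makes
    # read_csv return an iterator of DataFrames instead of one big one.
    for chunk in pandas.read_csv(
            fname, header=0, usecols=["type"], chunksize=chunk_size):
        counts = chunk["type"].value_counts()
        if totals is None:
            totals = counts
        else:
            # Align on the type value; a value missing on one side counts as 0.
            totals = totals.add(counts, fill_value=0)
    if totals is None:
        return {}
    return {key: int(value) for key, value in totals.items()}
```

Each chunk is reduced to a small Series of counts before the next one is read, so memory use depends on `chunk_size` rather than on the size of the file.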
