
sneaky's Issues

feed: optional (all) retrieved data storage

It has happened a few times that we missed data we could have retrieved from the response, e.g. style id and release date.
Instead of re-fetching all the data on each run to recover those fields, we should expose a flag that saves the entirety of each response; then, when looking for or experimenting with something new, we don't have to refetch. We'd also have a golden copy to test parsing against.
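
A minimal sketch of what the flag could look like, assuming json responses; store_response, save_raw, and the directory layout are hypothetical names, not existing code:

    import json
    import pathlib
    import time

    RAW_DIR = pathlib.Path("raw_responses")

    def store_response(item_id: str, payload: dict, save_raw: bool = False) -> None:
        """Optionally persist the full response for later parsing experiments."""
        if not save_raw:
            return
        RAW_DIR.mkdir(exist_ok=True)
        # One timestamped file per item, so repeated runs never overwrite a golden copy.
        out = RAW_DIR / f"{item_id}.{int(time.time())}.json"
        out.write_text(json.dumps(payload, ensure_ascii=False, indent=2))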

strategy scoring model: cut off by mid-margin-rate 30%, and discount suspiciously lucrative items

Oftentimes when the current crossing margin rate exceeds 100%, the pair is over-priced on du: nearby sizes all sell at a much lower price.
As a start, our scoring should discount such overly lucrative pairs.

In the longer term, an ideal solution would be to use a more up-to-date du api to track historical transaction px, or to come up with a reasonable estimate from the prices of nearby sizes. Historical listing prices may help as well.
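
As a starting point, the discount could be as simple as flattening the score above the cutoff. A sketch, assuming a per-(style, size) margin rate is already computed; the cutoff and penalty values are placeholders to tune:

    def adjusted_score(margin_rate: float, cutoff: float = 1.0, penalty: float = 0.5) -> float:
        """Score by margin rate, but discount suspiciously lucrative pairs.

        Above the cutoff (e.g. a 100% crossing margin) the du quote is likely
        stale or over-priced, so extra margin counts for less.
        """
        if margin_rate <= cutoff:
            return margin_rate
        return cutoff + penalty * (margin_rate - cutoff)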

strategy: research price volatility

E.g. du demonstrated a 30% price swing within a day on a recently released pair. Our turnaround time is long enough that we need to be careful about gauging volatility while we hold the product.
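
A crude gauge we could start with, assuming a series of daily prices per (style, size); both helpers are hypothetical:

    import statistics

    def daily_swing(prices: list) -> float:
        """Largest absolute day-over-day return in a series of daily prices."""
        returns = [abs(b / a - 1.0) for a, b in zip(prices, prices[1:])]
        return max(returns, default=0.0)

    def realized_vol(prices: list) -> float:
        """Standard deviation of day-over-day returns, a crude volatility gauge."""
        returns = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
        return statistics.pstdev(returns) if len(returns) > 1 else 0.0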

accounting: annotation

It's useful to record under what prediction we made the decision to bid on a pair, and how much we intend to list it at to make a satisfactory amount. This should be written as an annotation (currently by hand?) into a shared accounting doc.

feed: log exceptions and summary

Instead of print, we should store the url and exception in an exceptions log, and report how many pages were parsed, how many succeeded, and how many raised exceptions.
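
A minimal sketch with the standard logging module; parse_page and the counter names are hypothetical stand-ins for the existing feed loop:

    import logging

    exc_log = logging.getLogger("feed.exceptions")
    exc_log.addHandler(logging.FileHandler("exceptions.log"))

    stats = {"parsed": 0, "succeeded": 0, "failed": 0}

    def parse_page(url: str) -> None:
        stats["parsed"] += 1
        try:
            ...  # existing parsing logic goes here
            stats["succeeded"] += 1
        except Exception:
            stats["failed"] += 1
            exc_log.exception("failed to parse %s", url)

    # At the end of a run:
    # print(f"parsed {stats['parsed']}, ok {stats['succeeded']}, failed {stats['failed']}")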

feed: stockxapi rate limit

stockx appears to be one of those sites that constantly upgrade their anti-bot mechanisms.
On 06/02/19 my auth requests got through as long as they had User-Agent set.
On 06/09/19 I had to add Referer, Origin, and Content-Type.
On 06/12/19 I had to add these to get_details requests as well, and I still get 403s after the first few requests. As a short-term solution, perhaps a rate limit, or multiple sources, will do.

The goal of this is to be able to keep scraping stockx uninterrupted. I can think of:

  • adding a rate limit on our side, or
  • finding out whether additional header fields can just let our requests through, or
  • switching to a different framework with such support built in

I believe they ultimately want people to use their api, but what I'm doing now is probably too brutal.
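
A sketch of the first option, wrapping requests.Session so calls are spaced out; the 5-second interval is a guess at what stockx tolerates, not a known limit:

    import time
    import requests

    class RateLimitedSession:
        """Wrap requests.Session so consecutive calls are spaced out."""

        def __init__(self, min_interval: float = 5.0):
            self.session = requests.Session()
            self.min_interval = min_interval
            self._last_request = 0.0

        def get(self, url: str, **kwargs) -> requests.Response:
            wait = self.min_interval - (time.time() - self._last_request)
            if wait > 0:
                time.sleep(wait)
            self._last_request = time.time()
            return self.session.get(url, **kwargs)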

@djian618

feed: compare historical px with listing px

We recently added support for historical transaction times. In the scoring mechanism we should consider discounting an item when there's a large gap between its historical prices and its current listing px: the current listing px could just be one seller charging unreasonably high.
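
A hedged sketch of such a discount; the 25% gap threshold and the halving penalty are arbitrary starting points:

    def listing_gap_discount(listing_px: float, last_traded_px: float,
                             max_gap: float = 0.25) -> float:
        """Return a 0..1 multiplier; shrink the score as the listing px
        drifts above the last traded px beyond max_gap (e.g. 25%)."""
        if last_traded_px <= 0:
            return 1.0
        gap = listing_px / last_traded_px - 1.0
        if gap <= max_gap:
            return 1.0
        # Halve the weight for every additional max_gap of drift.
        return 0.5 ** ((gap - max_gap) / max_gap)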

feed: du

reference: https://github.com/levislin2016/du-app-api

Hurdle 1: log in. Without a web interface, the best we can do seems to be MitM'ing ourselves: manually proxy the phone through the laptop, install a cert on the phone, capture the traffic, and decrypt the https headers on the laptop.
There are commercial apps providing an integrated solution: Fiddler and Charles Proxy both do this. With the latter we were able to see exactly what the phone sent to authenticate itself, although replicating that hasn't worked yet: 'pwd'-based log in appears to be gone in Du v4.1.1.
We could either try a few more combinations with our iOS Du v4.1.1, or switch to an Android Du v3.5.5 and try pwd log in there.

strategyv2: investigate irrational historical transaction prices

Feed has confirmed transaction history like this one:
On du, 304775-101 saw a size 42.5 buy at 3000 CNY on Dec 23, while the immediately preceding transaction was 5079 CNY on Dec 12.
Such a big discrepancy will hurt how well to-last conditions can make decisions.

A similar investigation should be

  • conducted on the field "du_listing_bid", and should
  • discount findings where the stockx spread is absurdly wide.

These are probably the biggest blockers before autobid.
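
As a first cut, a median-deviation filter could drop such prints before to-last conditions are computed; the 40% threshold is an assumption to tune:

    import statistics

    def drop_irrational_prices(prices: list, max_dev: float = 0.4) -> list:
        """Drop transactions deviating from the median by more than max_dev.

        For the 304775-101 example above, 3000 CNY against a ~5000 CNY median
        is a ~40% deviation and would be filtered out.
        """
        if len(prices) < 3:
            return prices
        med = statistics.median(prices)
        return [p for p in prices if abs(p / med - 1.0) <= max_dev]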

feed: more reasonable structure for rescraping: static-map sampling

One way to facilitate market data study is, instead of rescraping everything each time we run a scraper, to keep the thing running all the time.
When we get a quote, we schedule an event one day later to issue another get request for the same item.

A few advantages of this approach:

  • facilitates market data study
  • fewer botched-run clean-ups
  • pushes for / could benefit from db-based storage
  • if done right, may help with the stockx perimeterx situation
  • helps with combining everything into a single (more or less 'idempotent') decision-making unit

The static mapping (the urls we've identified from each site that we care about) could be scraped at a lower frequency. Each time we start, we first send requests to those whose last mark we missed, and work on scheduled events from there.

This unfortunately requires an overhaul of the current system.
We should also use the opportunity to think the system design through more carefully.
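
A minimal sketch of the always-on loop, using a (due_time, url) min-heap; fetch is the existing per-item scrape, passed in by the caller:

    import heapq
    import time

    RESAMPLE_INTERVAL = 24 * 60 * 60  # one day, per the proposal above

    queue = []  # (due_time, url) min-heap

    def schedule(url: str, delay: float = 0.0) -> None:
        heapq.heappush(queue, (time.time() + delay, url))

    def run_forever(fetch) -> None:
        """Long-running loop: fetch each url when due, then requeue it a day out."""
        while queue:
            due, url = heapq.heappop(queue)
            time.sleep(max(0.0, due - time.time()))
            fetch(url)
            schedule(url, RESAMPLE_INTERVAL)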

Investigate drop in matching rate

In aa30861 we noticed match rate between sx and fc dropped from

'found 187 model matches, 2149 (model, size) matches'

to

'found 71 model matches, 998 (model, size) matches'

This may be associated with the latest scrape missing some items, or with faulty / stricter matching logic introduced recently.
This item is to investigate further whether the drop is expected.

feed: du historical transaction prices

As it turns out, the current version / api endpoint we use does not include historical transaction prices in its response: we only get time, size, and buyer.
We played around with the version string in the request header and the get params, but didn't get a different response.
Presumably it's the API endpoint that matters:
https://m.poizon.com/product/lastSoldList? --> https://app.poizon.com/api/v1/app/product/ice/lastSoldList
Our current sign value does not work with the second api endpoint: both 500 and 403 get triggered.

du: smarter starting point

It seems we currently do a dumb full-site listing, oftentimes picking up t-shirts etc.
We should have a smarter query that filters down to shoes.

strategy reporting: add 30% CNY price benchmark to report

Jian found this benchmark particularly useful for estimating whether we want to bid on a du-overpriced pair: the idea being that even if we don't sell at the overpriced value, we can still make a decent margin. We should prioritize implementing this.
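
Assuming the benchmark means re-running the margin with the du price haircut by 30%, a sketch of the report column; this interpretation and all names are guesses:

    def benchmark_margin(du_px_cny: float, cost_cny: float, haircut: float = 0.30) -> float:
        """Margin if the pair only sells at 30% below the current du price.

        A positive value suggests the bid survives even a du-overpriced quote.
        """
        conservative_px = du_px_cny * (1.0 - haircut)
        return conservative_px / cost_cny - 1.0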

strategy: volume indication

With du integrated, stockx's past trades may not be a good indicator of volume in the Chinese market. We should try to extract more info from the du / fc responses.

stockx: parsing failed

After the stockx update last week, our api endpoint is no longer working.
Queries still work, but getting individual items / book building is busted.

feed: static mapping where possible

We expect much of the data we retrieve to be fairly static: style id, name, release date, color, and even url. Only the price and the volume / transaction history are not. We could save some queries / traffic if we cache the static fields.
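
A minimal sketch of such a cache, persisting the static fields to a json file keyed by item; the field list and file name are assumptions:

    import json
    import pathlib

    CACHE_FILE = pathlib.Path("static_fields.json")
    STATIC_FIELDS = ("style_id", "name", "release_date", "color", "url")

    def load_cache() -> dict:
        return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

    def cache_static(item_id: str, response: dict, cache: dict) -> None:
        """Record the fields that rarely change; prices stay fetched live."""
        if item_id not in cache:
            cache[item_id] = {k: response.get(k) for k in STATIC_FIELDS}
            CACHE_FILE.write_text(json.dumps(cache, ensure_ascii=False, indent=2))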

feedv2: investigate match rate between du and stockx

The current query model leaves many entries unmatched:
dropping 587 keys in du.mapping.20191221-150959.csv not present in current intersection
dropping 1051 keys in stockx.mapping.2019-12-21T18:58:50.487Z.csv
write 213 merged result to merged.csv

This is not ideal and should be investigated.
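
One plausible culprit is key normalization before the join. A sketch of canonicalizing style ids on both sides before intersecting; the separator and casing rules are guesses:

    import re

    def normalize_style_id(raw: str) -> str:
        """Canonicalize style ids so 'AQ4176 100', 'aq4176-100', etc. match."""
        return re.sub(r"[\s_/]+", "-", raw.strip().upper())

    def match_rate(du_keys: set, sx_keys: set) -> float:
        du_norm = {normalize_style_id(k) for k in du_keys}
        sx_norm = {normalize_style_id(k) for k in sx_keys}
        inter = du_norm & sx_norm
        return len(inter) / max(1, min(len(du_norm), len(sx_norm)))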

feed: original style ID is broken in email reports

Since our update to use the sanitized style ID in sx scrape results, we lose the original style ID in email reports.
It's not critical now, as we have a sell.flightclub.com url for all our items, but we should still bring the original style ID back.

feedv2: investigate data drop

A strategy run suggests the following:

total (style_id, size) pairs 28464
total (style_id, size) pairs 3579 with data
total (style_id, size) pairs 3579 with fresh data
total (style_id, size) pairs 1379 with fresh transactions
total (style_id, size) pairs 524 satisfying profit cutoff ratio (bid to last) of 0.01

We need to investigate why the count drops so much at the has-data filter, right after fresh update runs.

feed: du clean up

Clean up unneeded code in the du scrapy project and add comments.
We should also store more info, like past transactions, even though we may not use it now.

history / goal / roadmap: feed + strategy

If we implement #31, we can

  • combine feed and strategy into one binary,
  • have a stream of sampled data, and
  • have a reasonable enough scoring mechanism

s.t.

  • given a risk configuration (maxpos, flatvalue)

the binary can

  • find optimal models to allocate assets to
  • automatically place bids, or provide a human with high level stats and ask for confirmation

A human only needs to configure the allocation, approve the program's decisions, and ship the shoes out.

feed: stockx feed organization

Most of what we care about lives in main. We should specify what it can do and what we care about, make book building and the parts we don't care about optional, and refactor some functionality out of main.

strategy: du's fees

Take du's fees (3.5%, 1%, 8, 15, 10 CNY) into account in the profit margin calculation.
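
A sketch of the fee-aware margin; how the listed numbers map to percentage vs. flat fees is an assumption that needs confirming against du's actual fee schedule:

    def du_net_proceeds(sell_px_cny: float) -> float:
        """Sell price minus du's fees.

        Assumes 3.5% commission and a 1% payment fee on the sale price, plus
        flat 8 + 15 + 10 CNY charges; the exact mapping of the numbers in
        this issue to fee types is an assumption.
        """
        percentage_fees = sell_px_cny * (0.035 + 0.01)
        flat_fees = 8 + 15 + 10
        return sell_px_cny - percentage_fees - flat_fees

    def margin(sell_px_cny: float, cost_cny: float) -> float:
        return du_net_proceeds(sell_px_cny) / cost_cny - 1.0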
