
sneaky's Issues

feed: optional (all) retrieved data storage

It has happened a few times that we missed data we could have retrieved from the response, e.g. style id and release date.
Instead of re-fetching all the data on each run to recover those fields, we should expose a flag that saves the entirety of each response; then, when looking for or experimenting with something new, we don't have to refetch. We'd also have a golden copy to test parsing against.
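
A minimal sketch of what the flag could look like, assuming json responses; store_response, save_raw, and the directory layout are hypothetical names, not existing code:

    import json
    import pathlib
    import time

    RAW_DIR = pathlib.Path("raw_responses")

    def store_response(item_id: str, payload: dict, save_raw: bool = False) -> None:
        """Optionally persist the full response for later parsing experiments."""
        if not save_raw:
            return
        RAW_DIR.mkdir(exist_ok=True)
        # One timestamped file per item, so repeated runs never overwrite a golden copy.
        out = RAW_DIR / f"{item_id}.{int(time.time())}.json"
        out.write_text(json.dumps(payload, ensure_ascii=False, indent=2))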

strategy scoring model: cut off by mid-margin-rate 30%, and discount suspiciously lucrative items

Oftentimes when the current crossing margin rate exceeds 100%, the pair is over-priced on du: nearby sizes all sell at a much lower price.
As a start, our scoring should discount such overly lucrative pairs.

In the longer term, an ideal solution would be to use a more up-to-date du api to track historical transaction px, or to come up with a reasonable estimate from the prices of nearby sizes. Historical listing prices may help as well.
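
As a starting point, the discount could be as simple as flattening the score above the cutoff. A sketch, assuming a per-(style, size) margin rate is already computed; the cutoff and penalty values are placeholders to tune:

    def adjusted_score(margin_rate: float, cutoff: float = 1.0, penalty: float = 0.5) -> float:
        """Score by margin rate, but discount suspiciously lucrative pairs.

        Above the cutoff (e.g. a 100% crossing margin) the du quote is likely
        stale or over-priced, so extra margin counts for less.
        """
        if margin_rate <= cutoff:
            return margin_rate
        return cutoff + penalty * (margin_rate - cutoff)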

strategy: research price volatility

E.g. du demonstrated a 30% price swing within a day on a recently released pair. Our turnaround time is long enough that we need to be careful about gauging volatility while we hold the product.
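
A crude gauge we could start with, assuming a series of daily prices per (style, size); both helpers are hypothetical:

    import statistics

    def daily_swing(prices: list) -> float:
        """Largest absolute day-over-day return in a series of daily prices."""
        returns = [abs(b / a - 1.0) for a, b in zip(prices, prices[1:])]
        return max(returns, default=0.0)

    def realized_vol(prices: list) -> float:
        """Standard deviation of day-over-day returns, a crude volatility gauge."""
        returns = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
        return statistics.pstdev(returns) if len(returns) > 1 else 0.0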

accounting: annotation

It's useful to record under what prediction we made the decision to bid on a pair, and how much we intend to list it at to make a satisfactory amount. This should be written as an annotation (currently by hand?) into a shared accounting doc.

feed: log exceptions and summary

Instead of print, we should store the url and exception in an exceptions log, and report how many pages were parsed, how many succeeded, and how many raised exceptions.
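
A minimal sketch with the standard logging module; parse_page and the counter names are hypothetical stand-ins for the existing feed loop:

    import logging

    exc_log = logging.getLogger("feed.exceptions")
    exc_log.addHandler(logging.FileHandler("exceptions.log"))

    stats = {"parsed": 0, "succeeded": 0, "failed": 0}

    def parse_page(url: str) -> None:
        stats["parsed"] += 1
        try:
            ...  # existing parsing logic goes here
            stats["succeeded"] += 1
        except Exception:
            stats["failed"] += 1
            exc_log.exception("failed to parse %s", url)

    # At the end of a run:
    # print(f"parsed {stats['parsed']}, ok {stats['succeeded']}, failed {stats['failed']}")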

feed: stockxapi rate limit

stockx appears to be one of those sites that constantly upgrade their anti-bot mechanisms.
On 06/02/19 my auth requests got through as long as they had User-Agent set.
On 06/09/19 I had to add Referer, Origin, and Content-Type.
On 06/12/19 I had to add these to get_details requests as well, and I still get 403s after the first few requests. As a short-term solution, perhaps a rate limit, or multiple sources, will do.

The goal of this is to be able to keep scraping stockx uninterrupted. I can think of:

  • adding a rate limit on our side, or
  • finding out whether additional header fields can just let our requests through, or
  • switching to a different framework with such support built in

I believe they ultimately want people to use their api, but what I'm doing now is probably too brutal.
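
A sketch of the first option, wrapping requests.Session so calls are spaced out; the 5-second interval is a guess at what stockx tolerates, not a known limit:

    import time
    import requests

    class RateLimitedSession:
        """Wrap requests.Session so consecutive calls are spaced out."""

        def __init__(self, min_interval: float = 5.0):
            self.session = requests.Session()
            self.min_interval = min_interval
            self._last_request = 0.0

        def get(self, url: str, **kwargs) -> requests.Response:
            wait = self.min_interval - (time.time() - self._last_request)
            if wait > 0:
                time.sleep(wait)
            self._last_request = time.time()
            return self.session.get(url, **kwargs)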

@djian618

feed: compare historical px with listing px

We recently added support for historical transaction times. In the scoring mechanism we should consider discounting an item when there's a large gap between its historical prices and its current listing px: the current listing px could just be one seller charging unreasonably high.
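
A hedged sketch of such a discount; the 25% gap threshold and the halving penalty are arbitrary starting points:

    def listing_gap_discount(listing_px: float, last_traded_px: float,
                             max_gap: float = 0.25) -> float:
        """Return a 0..1 multiplier; shrink the score as the listing px
        drifts above the last traded px beyond max_gap (e.g. 25%)."""
        if last_traded_px <= 0:
            return 1.0
        gap = listing_px / last_traded_px - 1.0
        if gap <= max_gap:
            return 1.0
        # Halve the weight for every additional max_gap of drift.
        return 0.5 ** ((gap - max_gap) / max_gap)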

feed: du

reference: https://github.com/levislin2016/du-app-api

Hurdle 1: log in. Without a web interface, the best we can do seems to be MitM'ing ourselves: manually proxy the phone through the laptop, install a cert on the phone, capture the traffic, and decrypt the https headers on the laptop.
There are commercial apps providing an integrated solution: Fiddler and Charles Proxy both do this. With the latter we were able to see exactly what the phone sent to authenticate itself, although replicating that hasn't worked yet: 'pwd'-based log in appears to be gone in Du v4.1.1.
We could either try a few more combinations with our iOS Du v4.1.1, or switch to an Android Du v3.5.5 and try pwd log in there.

strategyv2: investigate irrational historical transaction prices

Feed has confirmed transaction history like this one:
On du, 304775-101 saw a size 42.5 buy at 3000 CNY on Dec 23, while the immediately preceding transaction was 5079 CNY on Dec 12.
Such a big discrepancy will hurt how well to-last conditions can make decisions.

A similar investigation should be

  • conducted on the field "du_listing_bid", and should
  • discount findings where the stockx spread is absurdly wide.

These are probably the biggest blockers before autobid.
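
As a first cut, a median-deviation filter could drop such prints before to-last conditions are computed; the 40% threshold is an assumption to tune:

    import statistics

    def drop_irrational_prices(prices: list, max_dev: float = 0.4) -> list:
        """Drop transactions deviating from the median by more than max_dev.

        For the 304775-101 example above, 3000 CNY against a ~5000 CNY median
        is a ~40% deviation and would be filtered out.
        """
        if len(prices) < 3:
            return prices
        med = statistics.median(prices)
        return [p for p in prices if abs(p / med - 1.0) <= max_dev]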

feed: more reasonable structure for rescraping: static-map sampling

One way to facilitate market data study is, instead of rescraping everything each time we run a scraper, to keep the thing running all the time.
When we get a quote, we schedule an event one day later to issue another get request for the same item.

A few advantages of this approach:

  • facilitates market data study
  • fewer botched-run clean-ups
  • pushes for / could benefit from db-based storage
  • if done right, may help with the stockx perimeterx situation
  • helps with combining everything into a single (more or less 'idempotent') decision-making unit

The static mapping (the urls we've identified from each site that we care about) could be scraped at a lower frequency. Each time we start, we first send requests to those whose last mark we missed, and work on scheduled events from there.

This unfortunately requires an overhaul of the current system.
We should also use the opportunity to think the system design through more carefully.
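
A minimal sketch of the always-on loop, using a (due_time, url) min-heap; fetch is the existing per-item scrape, passed in by the caller:

    import heapq
    import time

    RESAMPLE_INTERVAL = 24 * 60 * 60  # one day, per the proposal above

    queue = []  # (due_time, url) min-heap

    def schedule(url: str, delay: float = 0.0) -> None:
        heapq.heappush(queue, (time.time() + delay, url))

    def run_forever(fetch) -> None:
        """Long-running loop: fetch each url when due, then requeue it a day out."""
        while queue:
            due, url = heapq.heappop(queue)
            time.sleep(max(0.0, due - time.time()))
            fetch(url)
            schedule(url, RESAMPLE_INTERVAL)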

Investigate drop in matching rate

In aa30861 we noticed match rate between sx and fc dropped from

'found 187 model matches, 2149 (model, size) matches'

to

'found 71 model matches, 998 (model, size) matches'

This may be associated with the latest scrape missing some items, or with faulty / stricter matching logic introduced recently.
This item is to investigate further whether the drop is expected.

feed: du historical transaction prices

As it turns out, the current version / api endpoint we use does not include historical transaction prices in its response: we only get time, size, and buyer.
We played around with the version string in the request header and the get params, but didn't get a different response.
Presumably it's the API endpoint that matters:
https://m.poizon.com/product/lastSoldList? --> https://app.poizon.com/api/v1/app/product/ice/lastSoldList
Our current sign value does not work with the second api endpoint: both 500 and 403 get triggered.

du: smarter starting point

It seems we currently do a dumb full-site listing, oftentimes picking up t-shirts etc.
We should have a smarter query that filters down to shoes.

strategy reporting: add 30% CNY price benchmark to report

Jian found this benchmark particularly useful for estimating whether we want to bid on a du-overpriced pair: the idea being that even if we don't sell at the overpriced value, we can still make a decent margin. We should prioritize implementing this.
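
Assuming the benchmark means re-running the margin with the du price haircut by 30%, a sketch of the report column; this interpretation and all names are guesses:

    def benchmark_margin(du_px_cny: float, cost_cny: float, haircut: float = 0.30) -> float:
        """Margin if the pair only sells at 30% below the current du price.

        A positive value suggests the bid survives even a du-overpriced quote.
        """
        conservative_px = du_px_cny * (1.0 - haircut)
        return conservative_px / cost_cny - 1.0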

strategy: volume indication

With du integrated, stockx's past trades may not be a good indicator of volume in the Chinese market. We should try to extract more info from the du / fc responses.

stockx: parsing failed

After the stockx update last week, our api endpoint is no longer working.
Queries still work, but getting individual items / book building is busted.

feed: static mapping where possible

We expect much of the data we retrieve to be fairly static: style id, name, release date, color, and even url. Only the price and the volume / transaction history are not. We could save some queries / traffic if we cache the static fields.
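
A minimal sketch of such a cache, persisting the static fields to a json file keyed by item; the field list and file name are assumptions:

    import json
    import pathlib

    CACHE_FILE = pathlib.Path("static_fields.json")
    STATIC_FIELDS = ("style_id", "name", "release_date", "color", "url")

    def load_cache() -> dict:
        return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

    def cache_static(item_id: str, response: dict, cache: dict) -> None:
        """Record the fields that rarely change; prices stay fetched live."""
        if item_id not in cache:
            cache[item_id] = {k: response.get(k) for k in STATIC_FIELDS}
            CACHE_FILE.write_text(json.dumps(cache, ensure_ascii=False, indent=2))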

feedv2: investigate match rate between du and stockx

The current query model leaves many entries unmatched:
dropping 587 keys in du.mapping.20191221-150959.csv not present in current intersection
dropping 1051 keys in stockx.mapping.2019-12-21T18:58:50.487Z.csv
write 213 merged result to merged.csv

This is not ideal and should be investigated.
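
One plausible culprit is key normalization before the join. A sketch of canonicalizing style ids on both sides before intersecting; the separator and casing rules are guesses:

    import re

    def normalize_style_id(raw: str) -> str:
        """Canonicalize style ids so 'AQ4176 100', 'aq4176-100', etc. match."""
        return re.sub(r"[\s_/]+", "-", raw.strip().upper())

    def match_rate(du_keys: set, sx_keys: set) -> float:
        du_norm = {normalize_style_id(k) for k in du_keys}
        sx_norm = {normalize_style_id(k) for k in sx_keys}
        inter = du_norm & sx_norm
        return len(inter) / max(1, min(len(du_norm), len(sx_norm)))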

feed: original style ID is broken in email reports

Since our update to use the sanitized style ID in sx scrape results, we lose the original style ID in email reports.
It's not critical now, as we have a sell.flightclub.com url for all our items, but we should still bring the original style ID back.

feedv2: investigate data drop

A strategy run suggests the following:

total (style_id, size) pairs 28464
total (style_id, size) pairs 3579 with data
total (style_id, size) pairs 3579 with fresh data
total (style_id, size) pairs 1379 with fresh transactions
total (style_id, size) pairs 524 satisfying profit cutoff ratio (bid to last) of 0.01

We need to investigate why the count drops so much at the has-data filter, right after fresh update runs.

feed: du clean up

Clean up unneeded code in the du scrapy project and add comments.
We should also store more info, like past transactions, even though we may not use it now.

history / goal / roadmap: feed + strategy

If we implement #31, we can

  • combine feed and strategy into one binary,
  • have a stream of sampled data, and
  • have a reasonable enough scoring mechanism

s.t.

  • given a risk configuration (maxpos, flatvalue)

the binary can

  • find optimal models to allocate assets to
  • automatically place bids, or provide a human with high level stats and ask for confirmation

A human only needs to configure the allocation, approve the program's decisions, and ship the shoes out.

feed: stockx feed organization

Most of what we care about lives in main. We should specify what it can do and what we care about, make book building and the parts we don't care about optional, and refactor some functionality out of main.

strategy: du's fees

Take du's fees (3.5%, 1%, 8, 15, 10 CNY) into account in the profit margin calculation.
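
A sketch of the fee-aware margin; how the listed numbers map to percentage vs. flat fees is an assumption that needs confirming against du's actual fee schedule:

    def du_net_proceeds(sell_px_cny: float) -> float:
        """Sell price minus du's fees.

        Assumes 3.5% commission and a 1% payment fee on the sale price, plus
        flat 8 + 15 + 10 CNY charges; the exact mapping of the numbers in
        this issue to fee types is an assumption.
        """
        percentage_fees = sell_px_cny * (0.035 + 0.01)
        flat_fees = 8 + 15 + 10
        return sell_px_cny - percentage_fees - flat_fees

    def margin(sell_px_cny: float, cost_cny: float) -> float:
        return du_net_proceeds(sell_px_cny) / cost_cny - 1.0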
