zhehaowang / sneaky
Feed and strategy for cross-venue sneakers trading (Du, StockX).
License: GNU Lesser General Public License v3.0
It has happened a few times that we missed data we could have retrieved from the response, e.g. style ID and release date.
Instead of re-fetching all the data on each run to recover those fields, we should expose a flag to save the entire response; then when looking for or experimenting with something, we don't have to refetch. We'd also have a golden copy to test parsing against.
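A minimal sketch of the proposed save-raw behaviour (flag wiring omitted; the function and file-naming scheme are assumptions, not the repo's actual API):

```python
# Hypothetical helper: when the save-raw flag is on, dump each raw
# response payload to disk so later experiments can re-parse without
# re-fetching, and so parser tests have a golden copy.
import json
import os
import time

def save_raw_response(out_dir, item_id, payload):
    """Write the full response payload for item_id under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "%s.%d.json" % (item_id, int(time.time())))
    with open(path, "w") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    return path
```

Re-parsing then becomes a pure function over the saved JSON files, with no network involved.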
Often, when the current crossing margin rate exceeds 100%, the pair is overpriced on Du: nearby sizes all sell at a much lower price.
As a start, our scoring should discount overly lucrative pairs.
In the longer term, an ideal solution would be to use a more up-to-date Du API to keep track of historical transaction prices, or to come up with a reasonable estimate from the prices of nearby sizes. Historical listing prices may help as well.
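One way to sketch the nearby-sizes estimate: take the median of neighbouring sizes' prices and flag a size whose own price is far above it. Function names and the 1.3x threshold are illustrative assumptions:

```python
# Hypothetical sketch: estimate a fair price for a size from its
# neighbours' last-sale prices, so an overly lucrative pair can be
# discounted when its own price is far above the neighbourhood.
def neighbor_price_estimate(prices_by_size, size, window=1.0):
    """Median of last-sale prices for sizes within `window` of `size`."""
    nearby = sorted(p for s, p in prices_by_size.items()
                    if 0 < abs(s - size) <= window)
    if not nearby:
        return None
    mid = len(nearby) // 2
    if len(nearby) % 2:
        return nearby[mid]
    return (nearby[mid - 1] + nearby[mid]) / 2

def overpriced(prices_by_size, size, threshold=1.3):
    """True if this size's price exceeds the neighbour estimate by threshold."""
    est = neighbor_price_estimate(prices_by_size, size)
    return est is not None and prices_by_size[size] > threshold * est
```

E.g. a size listed at 2000 CNY surrounded by sizes trading around 1000 CNY would be flagged and discounted in scoring.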
E.g. Du demonstrated a 30% swing in a day on a recently released pair. Our turnaround time is long enough that we need to gauge volatility carefully while we hold the product.
It's useful to record under what prediction we made the decision to bid on a pair, and how much we want to list it at to make a satisfactory amount. This should be written as an annotation (currently by hand?) in a shared accounting doc.
Instead of printing, we should store URLs and exceptions in an exceptions log, and report how many pages were parsed, how many succeeded, and how many raised exceptions.
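A minimal sketch of that accounting, assuming a `parse_one(url)` callable exists somewhere in the scraper (names hypothetical):

```python
# Hypothetical sketch: log each failing URL with its traceback to an
# exceptions log, then summarise parsed / succeeded / raised counts.
import logging

logging.basicConfig(filename="exceptions.log", level=logging.INFO)

def parse_all(urls, parse_one):
    ok, failed = 0, 0
    for url in urls:
        try:
            parse_one(url)
            ok += 1
        except Exception:
            failed += 1
            # logging.exception records the full traceback alongside the URL
            logging.exception("failed to parse %s", url)
    logging.info("parsed %d pages: %d succeeded, %d raised",
                 len(urls), ok, failed)
    return ok, failed
```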
stockx appears to be one of those sites constantly upgrading their anti-bot mechanisms.
On 06/02/19 my auth requests got through if they had User-Agent set.
On 06/09/19 I had to add Referer, Origin, and Content-Type.
On 06/12/19 I had to add these to get_details requests as well, and I still got 403 after the first few requests. As a short-term solution, perhaps a rate limit, or multiple sources, will do.
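A sketch of the short-term rate-limit idea: pace requests and back off on 403. The header values mirror what worked as of 06/12/19; the pacing numbers are assumptions, not a guaranteed fix for stockx's anti-bot checks:

```python
# Hypothetical mitigation: throttle requests and exponentially back off
# whenever a 403 comes back, instead of hammering the endpoint.
import time

HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://stockx.com/",
    "Origin": "https://stockx.com",
    "Content-Type": "application/json",
}

def fetch_with_backoff(get, url, min_interval=5.0, max_retries=3):
    """Call get(url, headers) up to max_retries times, sleeping between
    attempts and doubling the wait after each 403."""
    wait = min_interval
    resp = None
    for _ in range(max_retries):
        resp = get(url, HEADERS)
        if resp.status_code != 403:
            return resp
        time.sleep(wait)
        wait *= 2
    return resp
```

`get` would be e.g. a thin wrapper over `requests.get`; injecting it keeps the backoff logic testable.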
The goal of this is to be able to continue scraping stockx uninterrupted. I can think of
I believe they ultimately want people to use their API, but what I'm doing now is probably too brutal.
We recently added support for historical transaction times. In the scoring mechanism we should consider discounting a pair when there's a large gap between historical prices and the current listing price: the current listing could be a seller charging unreasonably high.
With Du we introduced size conversion. This should be per-brand and per-gender; right now we assume men's.
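The intended structure could be a chart keyed by (brand, gender). A sketch below; the Nike men's US-to-EU values follow the commonly published chart, but should be verified against each venue before use, and the other entries are placeholders:

```python
# Hypothetical per-brand, per-gender size conversion table.
SIZE_CHART = {
    ("nike", "men"): {8.0: 41, 8.5: 42, 9.0: 42.5, 9.5: 43, 10.0: 44},
    # ("nike", "women"): {...},   # to be filled in
    # ("adidas", "men"): {...},   # to be filled in
}

def us_to_eu(brand, gender, us_size):
    """Convert a US size to EU for the given brand and gender."""
    chart = SIZE_CHART.get((brand, gender))
    if chart is None or us_size not in chart:
        raise KeyError("no conversion for %s/%s US %s" % (brand, gender, us_size))
    return chart[us_size]
```

Failing loudly on a missing entry is deliberate: a silently wrong size conversion would corrupt cross-venue matching.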
2596c9b introduced discrepancies in the mid profit rate calc: the 30p marks and the mid-price-percent in the report treat Du flat fees and shipping fees differently, presumably.
We should unify these.
reference: https://github.com/levislin2016/du-app-api
Hurdle 1: log in. Without a web interface, the best we can do seems to be to MitM ourselves: set a manual proxy to the laptop, install a cert on the phone, and capture and decrypt HTTPS traffic on the laptop.
There are commercial apps providing an integrated solution: Fiddler and Charles Proxy both do this. With the latter we were able to see exactly what the phone sent to authenticate itself, although replicating that hasn't worked yet: 'pwd'-based log-in appears to be gone in Du v4.1.1.
We could either try a few more combinations with our iOS Du v4.1.1, or switch to Android Du v3.5.5 and try pwd log-in there.
Send the report as a CSV file attachment. Split out report-object serialization.
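A sketch of the attachment side, assuming the report has already been serialized to CSV text (function and subject names are hypothetical):

```python
# Hypothetical sketch: build the report email with the CSV attached,
# keeping message construction separate from sending.
import smtplib
from email.message import EmailMessage

def build_report_email(csv_text, filename, sender, recipient):
    """Build the report email with the CSV attached."""
    msg = EmailMessage()
    msg["Subject"] = "sneaky report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Report attached as CSV.")
    # str content defaults to maintype "text"; subtype makes it text/csv
    msg.add_attachment(csv_text, subtype="csv", filename=filename)
    return msg

def send_report(msg, smtp_host="localhost"):
    with smtplib.SMTP(smtp_host) as s:
        s.send_message(msg)
```

Splitting build from send also makes the serialization path unit-testable without an SMTP server.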
Feed has confirmed transaction history like this one:
Du 304775-101 saw a size 42.5 buy at 3000 CNY on Dec 23, while the one immediately before was 5079 CNY on Dec 12.
Such a big discrepancy will hurt how well to-last conditions can make decisions.
Similar investigation should be
These are probably the biggest blockers before autobid.
One way to facilitate market-data study is, instead of rescraping everything each time we run a scraper, to keep the thing running all the time.
When we get a quote, we schedule an event to send another GET request for the same item one day later.
A few advantages of this way:
The static mapping (URLs we identified from each site that we care about) could be scraped at a lower frequency. Each time we start, we begin by sending requests to those whose last mark we missed, and work on scheduled events from there.
This unfortunately requires an overhaul of the current system.
We should also consider the system more carefully with this.
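The always-on loop above could be sketched as a min-heap of due times; after each quote, the same item is requeued a day later. Class and function names are hypothetical, and a real overhaul would also need persistence across restarts:

```python
# Hypothetical sketch of the always-on quote scheduler: items sit in a
# (due_time, item) min-heap and are re-queued one interval after fetch.
import heapq
import time

class QuoteScheduler:
    def __init__(self, fetch, interval=86400):
        self.fetch = fetch          # fetch(item) -> quote
        self.interval = interval    # seconds between re-quotes (one day)
        self.queue = []             # (due_time, item) min-heap

    def add(self, item, due=None):
        heapq.heappush(self.queue, (time.time() if due is None else due, item))

    def run_once(self, now=None):
        """Fetch every item that is due; reschedule each a day later."""
        now = time.time() if now is None else now
        done = []
        while self.queue and self.queue[0][0] <= now:
            _, item = heapq.heappop(self.queue)
            done.append(self.fetch(item))
            heapq.heappush(self.queue, (now + self.interval, item))
        return done
```

On startup, items whose last mark predates one interval would be `add`ed with a past due time so they are fetched immediately.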
In aa30861 we noticed match rate between sx and fc dropped from
'found 187 model matches, 2149 (model, size) matches'
to
'found 71 model matches, 998 (model, size) matches'
This may be associated with the latest scrape missing some items, or faulty / stricter matching logic introduced recently.
This item is to further investigate if this drop is expected.
As it turns out the current version / api endpoint we use does not have historical transaction price in its response: we only have time, size and buyer.
We played around with the version string in the request header and GET params, but didn't get a different response.
Presumably it's the API endpoint that matters.
https://m.poizon.com/product/lastSoldList? --> https://app.poizon.com/api/v1/app/product/ice/lastSoldList
Our current sign value does not work with the second API endpoint: both 500 and 403 are triggered.
It seems we now do a dumb full-site listing, often finding t-shirts etc.
We should use a smarter query to filter down to shoes.
Jian found this benchmark particularly useful in estimating whether we want to bid on a Du-overpriced pair: the idea being that even if we don't sell at the overpriced value, we can still make a decent margin. We should prioritize implementing this.
With Du integrated, stockx's past trades may not be a good indicator of volume in the Chinese market. We should try to extract more info from Du / fc responses.
After the stockx update last week, our API endpoint is no longer working.
Query still works, but getting individual items / book building is busted.
We expect much of the data we retrieve to be fairly static: style ID, name, release date, color, and even URL. Only the price and volume / transaction history are not. We could save some queries / traffic if we cache the static ones.
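A minimal sketch of that cache, keyed by product ID with a JSON file behind it (file name and field list are assumptions):

```python
# Hypothetical static-field cache: fetch a product once, keep only the
# fairly static fields, and skip the query on subsequent runs.
import json
import os

STATIC_FIELDS = ("style_id", "name", "release_date", "color", "url")

def load_cache(path="static_cache.json"):
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def get_static(cache, product_id, fetch):
    """Return cached static fields, fetching and caching on a miss."""
    if product_id not in cache:
        full = fetch(product_id)
        cache[product_id] = {k: full.get(k) for k in STATIC_FIELDS}
    return cache[product_id]

def save_cache(cache, path="static_cache.json"):
    with open(path, "w") as f:
        json.dump(cache, f, indent=2)
```

Prices and transaction history stay out of the cache on purpose, since those are exactly the fields that move.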
Idea: use product IDs available on stockx and flightclub
Both JS and Python should use ISO 8601.
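On the JS side, `new Date().toISOString()` already produces ISO 8601 (e.g. the `2019-12-21T18:58:50.487Z` in the stockx mapping file name). A matching Python helper could look like this (name hypothetical):

```python
# Hypothetical helper: UTC timestamp in the same shape as JS's
# toISOString(), e.g. "2019-12-21T18:58:50.487Z".
from datetime import datetime, timezone

def iso_now():
    return (datetime.now(timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z"))
```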
Pre-release models are deemed volatile and risky given our long holding time. For now we should exclude them from our results, after we figure out #22.
The current query model leaves many entries unmatched:
dropping 587 keys in du.mapping.20191221-150959.csv not present in current intersection
dropping 1051 keys in stockx.mapping.2019-12-21T18:58:50.487Z.csv
write 213 merged result to merged.csv
This is not ideal and should be investigated.
We need the per-model, per-size last sale price from stockx.
Since our update to use sanitized style IDs in sx scrape results, we lose the original style ID in email reports.
It's not critical now, as we have a sell.flightclub.com URL for all our items, but we should still bring the original style ID back.
A strategy run suggests the following:
total (style_id, size) pairs 28464
total (style_id, size) pairs 3579 with data
total (style_id, size) pairs 3579 with fresh data
total (style_id, size) pairs 1379 with fresh transactions
total (style_id, size) pairs 524 satisfying profit cutoff ratio (bid to last) of 0.01
We need to investigate why the number drops so much at the has-data filter, even right after fresh update runs.
Clean up unneeded code in the Du scrapy project and add comments.
We should also store more info, like past transactions, even though we may not use it now.
This would allow us to display data like volatility, high, and low, which should help decision making.
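The display stats could be computed straight from the stored transaction prices; using the population standard deviation as the volatility measure here is an assumption:

```python
# Hypothetical sketch: summary stats over stored past transaction
# prices, for display alongside each pair.
import statistics

def transaction_stats(prices):
    """High / low / mean / volatility over a list of transaction prices."""
    if not prices:
        return None
    return {
        "high": max(prices),
        "low": min(prices),
        "mean": statistics.mean(prices),
        "volatility": statistics.pstdev(prices),
    }
```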
If we implement #31, we can
s.t.
the binary can
A human only needs to configure the allocation, approve the program's decisions, and ship the shoes out.
Most of what we care about lives in main. We should specify what it can do and what we care about, make book building and the parts we don't care about optional, and refactor some functionality out of main.
Take into account Du's fees (3.5%, 1%, and flat 8, 15, 10 CNY) in the profit margin calculation.
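A sketch of the fee-aware margin, assuming the two percentages apply to the sale price and the three flat fees are per-pair charges in CNY; which fee is which (commission, service, inspection / packaging / shipping) is an assumption here, not confirmed from Du:

```python
# Hypothetical fee-aware margin calc for Du.
DU_PCT_FEES = 0.035 + 0.01        # assumed percentage fees on sale price
DU_FLAT_FEES_CNY = 8 + 15 + 10    # assumed flat fees per pair, in CNY

def du_profit_margin(sell_px_cny, cost_cny):
    """Net profit rate after Du fees, relative to cost."""
    proceeds = sell_px_cny * (1 - DU_PCT_FEES) - DU_FLAT_FEES_CNY
    return (proceeds - cost_cny) / cost_cny
```

E.g. selling at 1000 CNY with an 800 CNY cost nets 922 CNY of proceeds, a 15.25% margin rather than the naive 25%.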