
Comments (12)

eprbell commented on May 29, 2024

The way to fix that is to add a .pyi file in the src/stubs directory, which contains a few examples of pyi stubs for other libraries that don't have typing information. The .pyi file contains typed "declarations" for the symbols you're using (no need to add declarations for everything in the library, just what you need).
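As an illustration, a minimal stub might look like this (hypothetical file path and declarations; only declare what the code actually imports):

```python
# src/stubs/pandas/__init__.pyi (hypothetical path and contents).
# Declare only the symbols the project actually uses, with "..." bodies;
# mypy will pick these up instead of complaining about missing stubs.
from typing import Any

class DataFrame:
    def __getitem__(self, key: str) -> Any: ...

def read_csv(filepath_or_buffer: str) -> DataFrame: ...
```

The stubs directory also needs to be on mypy's search path, e.g. via the mypy_path setting.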

from dali-rp2.

eprbell commented on May 29, 2024

Thanks for reporting a typo and for the detailed thread: I'm always happy when people get into the guts of the code :-). I'll fix the typo: the existing code should work even with the different dataframe format, because it accesses only row 0.

Curious: have you actually hit the 1-day case? The code keeps trying larger and larger time windows until it does find some price data. I was hoping Coinbase Pro had historical prices with 1-min precision most of the time and that we almost never needed more than 300 or 900 seconds, so that's why I was asking if you found a case in which the code actually reached the 84600 value.
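The escalating-window behavior described here can be sketched roughly as follows (hypothetical granularity ladder and fetch callback, not the actual dali-rp2 code):

```python
from typing import Callable, List, Optional

# Hypothetical ladder of candle window sizes, in seconds, widest last.
TIME_GRANULARITY: List[int] = [60, 300, 900, 3600, 21600, 86400]

def find_price(fetch: Callable[[int], Optional[float]]) -> Optional[float]:
    # Try the narrowest window first; widen until the exchange returns data.
    for seconds in TIME_GRANULARITY:
        price = fetch(seconds)
        if price is not None:
            return price
    return None  # no data even at the 1-day window
```

With a fetch callback that only has 15-minute data, the loop would stop at the 900-second window.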

The reason I picked the highest price is just determinism and ability to explain easily what is happening (e.g. during an audit): I want the code to always pick a value that reflects a simple unchanging principle. Other such principles could be: lowest price, open price, close price, etc. This makes it easy to explain what the code is doing. I would disagree that highest price is not representative: we don't know what the actual price was within the selected time window (whether it's 1-min or 900-min or other), so this means it could be any of the prices within that window and therefore high price is as valid a guess as any other (i.e. low, open, close).

In your algorithm I see a couple of issues (correct me if I'm missing something):

  • the selected window is the smallest for which there is data (the code tries with 1 min and if there is no data, it retries with 5 mins, etc.), so there is no middle timestamp;
  • the principle for selecting the price is hard to explain (again, audit).

However I like your idea of being able to pass the "selection principle" as a parameter to the -s switch: perhaps -s could accept an argument like open, close, low, high. Unfortunately I won't have the time to implement it in the near future, but if you'd like to submit a PR for this, I'll be happy to review it!
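For instance, the -s argument could map straight onto a field of the fetched candle (a hypothetical helper, not the project's actual API):

```python
from decimal import Decimal
from typing import Dict

VALID_PRINCIPLES = ("open", "high", "low", "close")

def select_price(candle: Dict[str, Decimal], principle: str) -> Decimal:
    # Reject unknown principles early so misconfiguration fails loudly.
    if principle not in VALID_PRINCIPLES:
        raise ValueError(f"unknown price selection principle: {principle}")
    return candle[principle]

candle = {"open": Decimal("10"), "high": Decimal("12"),
          "low": Decimal("9"), "close": Decimal("11")}
spot = select_price(candle, "open")  # Decimal("10")
```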


stevendavis commented on May 29, 2024

I'll be happy to work on a PR and contribute something besides bombarding you with questions.

I've never hit the 1-day case, it's something I noticed when I cut and pasted the code into a script to get a better understanding of how it worked. I'm thinking it will be a good idea to generate a more descriptive log message to document the source of the price lookup. This will tell us how far the retry mechanism has to go to find a price.

Something to be aware of with the highest price approach is that as the retry mechanism increases the duration of the quotation bar, the open timestamp and the open price stay the same, but the high price strays further and further away from the open price and probably further and further away from the actual trade price. You can see this in the ETH examples above. The start time of the quotation bar is never more than one minute away from the actual trade time (assuming the trade time is known precisely).

Thinking about my proposed algorithm a bit more, I think an "interpolated" price could work well -- a weighted average of the open price and the close price, where the weights always add to 1.0. If the actual trade time is at the start of the quotation bar, the interpolated price will match the open price. If the actual trade time is at the end of the quotation bar, the interpolated price will match the close price. As the actual trade time moves farther away from the quotation start time, the relative weight of the open price decreases and the weight of the close price increases proportionally. Sounds complicated, but it's really very simple.
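A sketch of that weighting (hypothetical helper; times in epoch seconds):

```python
def interpolated_price(open_price: float, close_price: float,
                       trade_ts: float, bar_start: float,
                       bar_seconds: float) -> float:
    # w is 0.0 at the start of the bar (pure open price) and 1.0 at the
    # end of the bar (pure close price); the two weights always sum to 1.
    w = (trade_ts - bar_start) / bar_seconds
    w = min(max(w, 0.0), 1.0)  # clamp if the trade falls outside the bar
    return (1.0 - w) * open_price + w * close_price
```

A trade halfway through a 60-second bar with open 100 and close 110 would interpolate to 105.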

Are you opposed to adding a section to the config file? We may add multiple historical quote providers in the future and may need additional configuration parameters to control the price lookup behavior. For now, I'm envisioning something like this:

[price_lookup]
selection = open
on_bad_quotes = ignore  # or error
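As a sketch, such a section could be parsed with the standard library's configparser (hypothetical keys; note that inline # comments must be enabled explicitly):

```python
import configparser

# Inline "#" comments are ignored by ConfigParser unless enabled here.
parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
parser.read_string("""
[price_lookup]
selection = open
on_bad_quotes = ignore  # or error
""")

selection = parser.get("price_lookup", "selection")          # "open"
on_bad_quotes = parser.get("price_lookup", "on_bad_quotes")  # "ignore"
```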

I just noticed something else. Something doesn't look right in this code: it looks like the first 3 lines don't have any effect, and the last line always overwrites the notes variable. I can address this in the PR as well. What is the intended behavior?

    notes: str = ""
    if transaction.notes:
        notes = f"{notes}; "
    notes = f"spot_price read from Coinbase Pro; {transaction.notes if transaction.notes else ''}"


eprbell commented on May 29, 2024

I'll be happy to work on a PR and contribute something besides bombarding you with questions.

Awesome! And keep the questions/comments coming: they're very helpful in improving both code and docs.

I've never hit the 1-day case, it's something I noticed when I cut and pasted the code into a script to get a better understanding of how it worked. I'm thinking it will be a good idea to generate a more descriptive log message to document the source of the price lookup. This will tell us how far the retry mechanism has to go to find a price.

Better logs: sounds good.

Something to be aware of with the highest price approach is that as the retry mechanism increases the duration of the quotation bar, the open timestamp and the open price stay the same, but the high price strays further and further away from the open price and probably further and further away from the actual trade price. You can see this in the ETH examples above. The start time of the quotation bar is never more than one minute away from the actual trade time (assuming the trade time is known precisely).

Thinking about my proposed algorithm a bit more, I think an "interpolated" price could work well -- a weighted average of the open price and the close price, where the weights always add to 1.0. If the actual trade time is at the start of the quotation bar, the interpolated price will match the open price. If the actual trade time is at the end of the quotation bar, the interpolated price will match the close price. As the actual trade time moves farther away from the quotation start time, the relative weight of the open price decreases and the weight of the close price increases proportionally. Sounds complicated, but it's really very simple.

My main concern with an interpolated price is that it's created artificially and doesn't necessarily reflect real price (the interpolated price may not even have occurred in reality). For the same reason I have no problem with high, low, open, close because they are actual numbers returned by the exchange.

Are you opposed to adding a section to the config file? We may add multiple historical quote providers in the future and may need additional configuration parameters to control the price lookup behavior. For now, I'm envisioning something like this:

[price_lookup]
selection = open
on_bad_quotes = ignore  # or error

Not opposed: it's a good idea. Perhaps add the provider as well.

I just noticed something else. Something doesn't look right in this code: it looks like the first 3 lines don't have any effect, and the last line always overwrites the notes variable. I can address this in the PR as well. What is the intended behavior?

    notes: str = ""
    if transaction.notes:
        notes = f"{notes}; "
    notes = f"spot_price read from Coinbase Pro; {transaction.notes if transaction.notes else ''}"

Ouch, nice catch. This is a vestige of an older implementation. You're right that the correct implementation is just line 4: the first 3 lines can be removed.
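For illustration, the surviving line wrapped as a helper (the Transaction class here is a hypothetical stand-in for the real transaction type):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transaction:  # hypothetical stand-in, just enough for the example
    notes: Optional[str] = None

def build_notes(transaction: Transaction) -> str:
    # Only the final assignment from the original snippet has any effect.
    return f"spot_price read from Coinbase Pro; {transaction.notes if transaction.notes else ''}"
```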


stevendavis commented on May 29, 2024

The historical_price_cache stores only a single quote, currently the high price, for each AssetAndTimestamp. To allow the user to select between open, close, high, low, or some kind of average price, we will need to store all 4 prices. This has some impacts for users with an existing cache:

  • if the user wants to set the spot price to the open price, we cannot use the cached high price: we need to go back to Coinbase Pro and fetch the prices again;
  • some users might be upset if we destructively overwrite the existing cache. Maybe we will need a way to detect a single-price cache and non-destructively upgrade it to a multi-price cache when loading and saving the cache.

Not sure what the best approach is here. @jamesbaber1 has a different caching approach mentioned in #7 (comment).


eprbell commented on May 29, 2024

The existing DaLI cache is fairly simple and doesn't even have the ability to check for new values in the native source. As it is, it's meant to be used by developers as a way to speed up runtime. It's not that useful to users, at least until we implement the ability to update the cache with new values.

I would suggest a simple approach like this:

  • read element from cache;
  • if the element is not a 4-value list, then ignore it and overwrite the cache with the 4-value list.

With this approach the worst that can happen is that one run will be slower than the user expects.
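That fallback could look roughly like this (the cache shapes here are assumptions, not the actual DaLI format):

```python
from typing import Callable, List, Union

def upgrade_entry(value: Union[float, List[float]],
                  fetch_ohlc: Callable[[], List[float]]) -> List[float]:
    # New-format entries are 4-value [open, high, low, close] lists;
    # anything else is treated as a stale single-price entry and refetched.
    if isinstance(value, list) and len(value) == 4:
        return value
    return fetch_ohlc()
```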


jamesbaber1 commented on May 29, 2024

I have currently implemented my cache in several levels: price history, raw trades, and then normalized trades. This has helped with debugging and also with minimizing web requests when re-running.

The price history cache works like this:

  • Fetch the 1m OHLCV data for a pair according to a time delta around the trade, which in my case I set to one day, so all minutes from that day.
  • Currently using Pandas' nearest method to match the trade timestamp with the closest minute that was available. (Some days are missing minutes for periods of time, so sometimes a few minutes off is as good as you can do.)
  • Save the dataframe to a CSV.
  • The next time price history is requested for the pair, that CSV is loaded first and the nearest value is scanned; if it is a minute off, it requests data from the exchange, concatenates them, drops duplicates, reindexes the dataframe, and then saves it again.
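The nearest-minute matching in the steps above can be sketched without pandas, using only the standard library (candle timestamps assumed to be sorted epoch seconds):

```python
import bisect
from typing import List

def nearest_minute(candle_times: List[int], trade_time: int) -> int:
    # candle_times holds the start of each available 1-minute candle, sorted.
    i = bisect.bisect_left(candle_times, trade_time)
    if i == 0:
        return candle_times[0]
    if i == len(candle_times):
        return candle_times[-1]
    before, after = candle_times[i - 1], candle_times[i]
    # Prefer the earlier candle on ties.
    return before if trade_time - before <= after - trade_time else after
```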

I'll have to run some speed tests, but there is a lot of optimization that I could still do. At the moment I'm just trying to ensure I'm using the most accurate data available.

I was also thinking about this, because so far I have been using the close, but was going to circle back to it. Obviously your trade took place inside some candlestick resolution, and letting the user have a few options would be nice. For example, if the user wants to always over-report, it chooses the highest value out of OHLC on a buy and the lowest value on a sell. Or more of an average, etc., like @stevendavis mentioned.


stevendavis commented on May 29, 2024

I'm still becoming familiar with the developer guidelines and the new tools I have not used on other projects.

Mypy is generating the error below. What is right way to fix this?

$ mypy src tests
src\dali\transaction_resolver.py:19:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker  [import]
    import pandas as pd
    ^
src\dali\transaction_resolver.py:19:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 32 source files)


jamesbaber1 commented on May 29, 2024

Ah, good to know. I have never attempted to package pandas as a dependency before. Yeah, if you haven't used pandas before, it has some great use cases, especially when it comes to working with tables. I don't know if I could make a faster sorting and searching algorithm with just Python lists for this use case, but I know pandas is pretty optimized and most of the compute is done in the binary rather than in Python, so I figured it would be pretty fast. I would need to do some speed tests against a Python-only implementation to verify, though.

Either way, it seems 90% of the time is spent on web requests and file I/O.


stevendavis commented on May 29, 2024

@eprbell Thanks for the tip. It was a struggle, but I was eventually able to stop the MyPy errors.

I've run into a snag with the unit tests. I modified the code to force UTC timezone, and now the TestODSOutputDiff unit tests are failing. I may be mistaken, but I think the unit test expected results are incorrect and will need to be updated. What do you think?

        transaction_utc_timestamp = transaction.timestamp_value.astimezone(timezone.utc)  # CB Pro API expects UTC only
        from_timestamp: str = transaction_utc_timestamp.strftime("%Y-%m-%d-%H-%M")
        retry_count: int = 0
        while retry_count < len(time_granularity):
            try:
                seconds = time_granularity[retry_count]
                to_timestamp: str = (transaction_utc_timestamp + timedelta(seconds=seconds)).strftime("%Y-%m-%d-%H-%M")


eprbell commented on May 29, 2024

I'm not sure: can you post the code in a PR so I can see it?


stevendavis commented on May 29, 2024

Closing after recent PRs.
#31
#36
#44

