
Comments (7)

jongillham commented on August 27, 2024

Hi @astec and thanks for your interest. You are right, caching is hard, and it's going to take me a while to get the nds cache strategy back in my head, so please forgive me if the following is not 100% accurate.

First of all, I think your analysis of step 7 is incorrect, which might explain your issue with the nds strategy:

  1. nds.putMulti() executes deferrer that removes from memcache tx.lockMemcacheItems.

I think this statement is wrong. Within a transaction a lock is only placed here when we are sure the transaction will succeed, and it is never removed, although it does expire after memcacheLockTime, which happens to be 32 seconds (2 seconds longer than the maximum time a transaction can last).

However, outside a transaction the memcache lock is removed in the putMulti method, as you suggest. Removing the lock from within the nds.putMulti method within a transaction would not work for a couple of reasons, one being that a subsequent function within the transaction might update the same entity after we had already removed the lock.

The reason we create memcache lock items in the first place is to ensure that any stale cache entry is guaranteed to be removed before a subsequent Put and not returned by Get. You can see the lock being used in the lockMemcache and loadMemcache functions.

If we used your suggestion and the actual datastore.Put failed but the memcache.Set did not, then we would have updated the items in memcache but not the datastore, breaching the consistency guarantees.
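To make the locking idea concrete, here is a minimal sketch of it, not the actual nds code; the lockKeys helper, the nonce handling and the key format are illustrative assumptions. A short-lived lock item with a random value is set for each key before the datastore write, so concurrent Gets treat any cached value for those keys as unusable until the write has completed or the lock has expired.

```go
// A minimal sketch of the locking idea, not nds's actual implementation.
package example

import (
	"context"
	"crypto/rand"
	"time"

	"google.golang.org/appengine/memcache"
)

// memcacheLockTime mirrors the 32 second expiry mentioned above: 2 seconds
// longer than the maximum time a transaction can last.
const memcacheLockTime = 32 * time.Second

// lockKeys writes short-lived lock items so that a concurrent Get treats
// any cached value for these keys as unusable until the datastore write
// has completed or the lock has expired. It returns the random nonce so
// the caller can later recognise its own locks.
func lockKeys(ctx context.Context, keys []string) ([]byte, error) {
	nonce := make([]byte, 8)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	items := make([]*memcache.Item, 0, len(keys))
	for _, k := range keys {
		items = append(items, &memcache.Item{
			Key:        k,
			Value:      nonce, // the random value identifies our own lock later
			Expiration: memcacheLockTime,
		})
	}
	return nonce, memcache.SetMulti(ctx, items)
}
```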

I hope this helps. It's fun to think about all the edge cases and error pathways in order to keep cache consistency. Maybe/hopefully you can come up with a better strategy!


trakhimenok commented on August 27, 2024

@jongillham,

First of all, sorry: I made some links to my fork instead of your repo, and some led to the wrong lines due to my recent commit. I have edited them to permalinks.

I agree that statement 7 is incorrect; I overlooked the if condition.

I still believe you can save entities to memcache on Put() consistently within nds.RunInTransaction().

Consider this flow:

  1. Instantiate tx before calling datastore.RunInTransaction(), at line 28 instead of line 30.
  2. During PutMulti() within the transaction, lock the items in memcache (the same as is done for the non-transactional flow now).
  3. Instead of returning immediately at line 29, check for an error and, if it is nil, call memcacheSetMulti for the successfully locked items.
  4. If the transaction or memcacheSetMulti() failed, probably try to delete the locked items?
  5. Profit!?

If the put/transaction failed, we don't save to memcache. While the tx is in progress, other processes would not override the items, as they are locked. If the memcache lock was not successful, we simply do not put the entities to memcache and basically fall back to the current behaviour. If everything works as intended (which I guess it would 99.99% of the time in my use case), we gain 50% fewer datastore reads for read-tx_read-tx_put scenarios.
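Here is a rough sketch of that flow, assuming a hypothetical transaction context that records which items PutMulti successfully locked inside the transaction; the txContext type, its fields and helper names are stand-ins, not nds's actual API, and the plumbing from the transaction context to tx is elided.

```go
// A sketch of steps 1-4 above; names are illustrative, not nds's API.
package example

import (
	"context"

	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/memcache"
)

// txContext is a hypothetical stand-in for the state nds would carry
// through the transaction.
type txContext struct {
	lockedItems []*memcache.Item // items locked by PutMulti inside the tx
}

func newTxContext() *txContext { return &txContext{} }

func (tx *txContext) lockedKeys() []string {
	keys := make([]string, len(tx.lockedItems))
	for i, item := range tx.lockedItems {
		keys[i] = item.Key
	}
	return keys
}

func runInTransactionWithCache(ctx context.Context, f func(tc context.Context) error) error {
	tx := newTxContext() // step 1: instantiate tx before datastore.RunInTransaction

	err := datastore.RunInTransaction(ctx, func(tc context.Context) error {
		// step 2: PutMulti inside f locks the touched items in memcache and
		// records them in tx.lockedItems, as the non-transactional path does.
		return f(tc)
	}, nil)

	if err != nil {
		// step 4: the commit failed, so clear the locks and never write the
		// new values to memcache.
		memcache.DeleteMulti(ctx, tx.lockedKeys())
		return err
	}

	// step 3: the commit succeeded, so write the cached values back.
	if cerr := memcache.SetMulti(ctx, tx.lockedItems); cerr != nil {
		// If the cache write fails, just clear the locks; the datastore
		// commit has already happened and readers fall back to it.
		memcache.DeleteMulti(ctx, tx.lockedKeys())
	}
	return nil
}
```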

Am I missing anything?


jongillham commented on August 27, 2024

I like that idea, and from what I can see it would work 99.99% of the time, as you suggest. However, there is a 0.01% race condition that would lead to cache inconsistency. I imagine that under high instance and memcache load it could become an issue and be incredibly difficult to debug.

Consider the scenario that will cause cache inconsistency:
t = 0s: A transaction is created and a Put is called within it. A memcache lock, as you suggest in point 2, is created. This lock lasts for 32 seconds at most but can be shorter if it is purged.
t = 30s: Processing within the transaction commits, as with your point 3.
t = 30s - 32s: The instance decides to run another goroutine before memcacheSetMulti has been called.
t = 32s: The original lock from t = 0s expires from memcache, in the best case.
t = 33s: Another datastore Put updates the entity referenced at t = 0s with a new value.
t = 34s: Finally point 4 gets run because the instance has started running our original goroutine again and memcacheSetMulti is called; it now replaces the new memcache value set at t = 33s with the stale one that was originally added at t = 0s.

I often see long pauses and/or memcache purges between App Engine RPC calls that could make this race condition a reality.


trakhimenok commented on August 27, 2024

At first I thought it was a very good point and we were doomed.

But after some consideration I think there is no race, or that it can be mitigated.

Basically we don't care about evicted or expired lock items in memcache. We just need to know whether the lock was successful in the first place, so that we can try to do a CompareAndSwap later, on transaction success.

If the lock is evicted or expires while the 1st transaction is in progress and a 2nd transaction overwrites the entity with another value, then:

  1. If the 2nd datastore transaction was successful, the 1st one will fail on commit (thanks to the datastore), so the 1st one would not write to memcache at all. (So no race?)

  2. Even if it has not failed, it still should not be an issue, as we should use memcache.CompareAndSwapMulti(). It seems that would require another memcache.GetMulti() right after the transaction is successful. When we get the items before the CompareAndSwap, we should check that they are our own locks (the mechanics you already have with the random numbers). If they are our own, CompareAndSwap; if not (I don't see how we can get here), delete just in case. (Sketched below.)
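A sketch of that CompareAndSwap step, assuming we still have the random lock value written when the items were locked and the committed entity values as raw bytes; the helper name and parameters are illustrative, not nds's actual API.

```go
// A sketch of the post-commit CompareAndSwap idea; names are illustrative.
package example

import (
	"bytes"
	"context"

	"google.golang.org/appengine/memcache"
)

// swapLocksForValues runs only after the datastore transaction has committed.
// It replaces our own lock items with the committed entity values using
// CompareAndSwap, so a value written by a later Put (after our lock expired
// or was evicted) is never overwritten with stale data.
func swapLocksForValues(ctx context.Context, lockNonce []byte, newValues map[string][]byte) error {
	keys := make([]string, 0, len(newValues))
	for k := range newValues {
		keys = append(keys, k)
	}

	// GetMulti returns items that carry the CAS token memcache checks later.
	items, err := memcache.GetMulti(ctx, keys)
	if err != nil {
		return err
	}

	swap := make([]*memcache.Item, 0, len(items))
	var stale []string
	for key, item := range items {
		if bytes.Equal(item.Value, lockNonce) {
			// Still our own lock: safe to swap in the committed value.
			item.Value = newValues[key]
			swap = append(swap, item)
		} else {
			// Not our lock (shouldn't normally happen): delete just in case.
			stale = append(stale, key)
		}
	}

	if len(stale) > 0 {
		memcache.DeleteMulti(ctx, stale)
	}
	// CompareAndSwapMulti fails for any item modified since GetMulti,
	// which is exactly the protection wanted here.
	return memcache.CompareAndSwapMulti(ctx, swap)
}
```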


jongillham commented on August 27, 2024
  1. If the 2nd datastore transaction was successful, the 1st one will fail on commit (thanks to the datastore), so the 1st one would not write to memcache at all. (So no race?)

Unfortunately, the datastore transaction contention semantics will not help us at this point, because the transaction would have already been committed at this line, which you mention in point 3 of the comment above.

  2. Even if it has not failed, it still should not be an issue, as we should use memcache.CompareAndSwapMulti(). It seems that would require another memcache.GetMulti() right after the transaction is successful. When we get the items before the CompareAndSwap, we should check that they are our own locks (the mechanics you already have with the random numbers). If they are our own, CompareAndSwap; if not (I don't see how we can get here), delete just in case.

I currently can't think of a reason why this wouldn't work! I'm going to have to think about its interaction with Get and Delete datastore operations some more.


trakhimenok commented on August 27, 2024

Good to know we have hope.

I don't get your point regarding the datastore tx semantics. As I stated before, the memcache check & put should happen outside of the datastore tx.

Instead of "return datastore.RunInTransaction()" you should do "err = datastore.RunInTransaction()" and, if err is nil, do the memcache checks & put. If the 2nd transaction committed, the 1st should fail: we get err != nil and, instead of saving to memcache, we delete from memcache (to clear the locks in case it failed for a reason other than a conflict). They can't both return err == nil if they update the same entity in parallel, can they? If they do, it means they are not overlapping and we are fine to put to memcache. Of course the developer should first call Get() within the transaction, but it's the same requirement as for nds-less code.

So I still don't see a race here.

P.S.
Sorry for the formatting; writing from my phone.


trakhimenok commented on August 27, 2024

By the way, tx.lockMemcacheItems should probably be a map keyed by the entity key rather than a slice, to handle tx auto-retries gracefully.

I suspect that if f() gets executed multiple times you can get duplicates in tx.lockMemcacheItems and unpredictable behavior.
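A tiny sketch of that change, keying the pending lock items by the encoded datastore key so a retried transaction function overwrites its earlier entries instead of appending duplicates; the field and method names are illustrative, not nds's actual code.

```go
// An illustrative sketch of map-keyed transaction lock items.
package example

import (
	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/memcache"
)

// txContext is an illustrative stand-in for nds's transaction state.
type txContext struct {
	// lockMemcacheItems maps the encoded datastore key to its pending lock
	// item; with a slice, an automatically retried transaction function
	// could append the same key more than once.
	lockMemcacheItems map[string]*memcache.Item
}

// addLock records a lock item for a key; on a transaction retry the earlier
// entry for the same key is simply overwritten.
func (tx *txContext) addLock(key *datastore.Key, item *memcache.Item) {
	if tx.lockMemcacheItems == nil {
		tx.lockMemcacheItems = make(map[string]*memcache.Item)
	}
	tx.lockMemcacheItems[key.Encode()] = item
}
```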

