locks's People

Contributors

benoitc, timclassic, uwiger, x0id


locks's Issues

Deadlock due to outdated lock_info

I was testing locks using my testcase.
I believe there is a bug in the lock_info handling of locks_server and locks_agent which may cause a deadlock.

My testcase has 3 concurrent clients/agents, namely C1, C2, and C3, and 3 locks, [1], [2], and [3].

  • C1 requests locks in the order of [[1], [2], [3]]
  • C2 requests locks in the order of [[2], [3], [1]]
  • C3 requests locks in the order of [[3], [1], [2]]
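
In code, the scenario is roughly the following sketch (not the actual test case; it only uses the basic transaction API shown in other issues here: locks:begin_transaction/0, locks:lock/3 and locks:end_transaction/1):

%% Three clients take the same three locks in rotated orders, so that
%% every pair of clients can conflict. Assumes the locks application
%% is already started on the local node.
-module(locks_deadlock_repro).
-export([run/0]).

run() ->
    Orders = [[[1], [2], [3]],
              [[2], [3], [1]],
              [[3], [1], [2]]],
    Parent = self(),
    Pids = [spawn_link(fun() -> client(Order, Parent) end) || Order <- Orders],
    %% Wait for all three clients; with the bug, one of them may hang here.
    [receive {done, Pid} -> ok end || Pid <- Pids],
    ok.

client(Order, Parent) ->
    {Agent, _} = locks:begin_transaction(),
    %% Acquire the locks one at a time, in this client's order.
    [{ok, _} = locks:lock(Agent, Oid, write) || Oid <- Order],
    ok = locks:end_transaction(Agent),
    Parent ! {done, self()}.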

Here is a sketch of how the bug happened:

  1. C1, C2, and C3 competed for the locks.
    Thanks to the deadlock resolution algorithm, C1 and C2 eventually acquired all of their locks and finished.

  2. During the resolution process, C3 received the lock_info of [2] (due to locks_agent:send_indirects/1)
    even though C3 had not yet reached the point of requesting it, i.e. C3 was not in [2]'s queue.

  3. The locks_server then removed its local lock_info entry for [2], since the queue was now empty.
    This effectively reset the vsn of the lock_info.

  4. When C3 finally requested [2], the locks_server responded with a lock_info whose
    vsn was lower than the one C3 had already been told about. Thus C3 got stuck.

I've tried fixing this by not removing lock_info entries in locks_server, but my fix seems to fail the test in other ways. Maybe that breaks the algorithm?

In lock_objects the last lock_nowait is missing the last parameter, Req?

-spec lock_objects(pid(), objs()) -> ok.
%%
lock_objects(Agent, Objects) ->
    lists:foreach(fun({Obj, Mode}) when Mode == read; Mode == write ->
                          lock_nowait(Agent, Obj, Mode);
                     ({Obj, Mode, Where}) when Mode == read; Mode == write ->
                          lock_nowait(Agent, Obj, Mode, Where);
                     ({Obj, Mode, Where, Req})
                        when (Mode == read orelse Mode == write)
                             andalso (Req == all
                                      orelse Req == any
                                      orelse Req == majority
                                      orelse Req == majority_alive
                                      orelse Req == all_alive) ->
                          lock_nowait(Agent, Obj, Mode, Where);
                     (L) ->
                          error({illegal_lock_pattern, L})
                  end, Objects).
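
Presumably the {Obj, Mode, Where, Req} clause was meant to forward Req as well. A guess at the intended call (assuming a lock_nowait/5 that takes the requirement as its last argument) would be:

    %% last clause body, with Req forwarded (assumed intent):
    lock_nowait(Agent, Obj, Mode, Where, Req);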

read vs write locks

I have a quick question about the way locks are shared or exclusive.

Suppose I hold a write lock on [Resource, OID] and another client requests a read lock on [Resource]. Will clients be able to perform reads while the client that requested the write is modifying the resource, or will the write lock exclude all reads on the resource?
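
In code, the situation I am asking about is something like the following sketch (resource and oid1 are placeholder object ids):

%% In client 1's process:
{A1, _} = locks:begin_transaction(),
{ok, _} = locks:lock(A1, [resource, oid1], write).

%% In client 2's process, while client 1 still holds its write lock:
{A2, _} = locks:begin_transaction(),
%% Does this return immediately (a shared lock on the parent, compatible
%% with the child write lock), or does it block until client 1 ends its
%% transaction?
{ok, _} = locks:lock(A2, [resource], read).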

Application *locks* has stopped on double write-lock

I have two actors which run at approximately the same time. Each of them begins a transaction. Each acquires a read lock on the same oid(). Then the first tries to upgrade its read lock to a write lock. The second does the same, and the application crashes immediately:

Logs of the first actor:

Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
(n1@dch-mbp)1> application:ensure_all_started(locks).
{ok,[locks]}
(n1@dch-mbp)2> {Agent, TrRes} = locks:begin_transaction().
{<0.46.0>,{ok,[]}}
(n1@dch-mbp)3> locks:lock(Agent, [table], read).
{ok,[]}
(n1@dch-mbp)4> locks:lock(Agent, [table], write).
=ERROR REPORT==== 21-Oct-2015::14:45:19 ===
** Generic server locks_server terminating 
** Last message in was {'$gen_cast',{surrender,[table],<0.55.0>}}
** When Server state == {st,{locks_server_locks,locks_server_agents},
                            {dict,2,16,16,8,80,48,
                                  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                   [],[]},
                                  {{[],[],[],[],[],[],[],
                                    [[<0.55.0>|#Ref<0.0.0.76>]],
                                    [],[],[],[],[],[],
                                    [[<0.46.0>|#Ref<0.0.0.69>]],
                                    []}}},
                            <0.44.0>}
** Reason for termination == 
** {function_clause,[{locks_server,queue_entries_,
                                   [[{entry,<0.55.0>,<0.53.0>,4,direct}]],
                                   [{file,"src/locks_server.erl"},{line,211}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,214}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,214}]},
                     {locks_server,queue_entries_,1,
                                   [{file,"src/locks_server.erl"},{line,212}]},
                     {locks_server,queue_entries,1,
                                   [{file,"src/locks_server.erl"},{line,207}]},
                     {locks_server,notify,3,
                                   [{file,"src/locks_server.erl"},{line,193}]},
                     {locks_server,handle_cast,2,
                                   [{file,"src/locks_server.erl"},{line,142}]},
                     {gen_server,handle_msg,5,
                                 [{file,"gen_server.erl"},{line,604}]}]}

=INFO REPORT==== 21-Oct-2015::14:45:19 ===
    application: locks
    exited: shutdown
    type: temporary
** exception error: {cannot_lock_objects,[{req,[table],
                                               read,
                                               ['n1@dch-mbp'],
                                               0,all},
                                          {req,[table],write,['n1@dch-mbp'],1,all}]}
     in function  locks_agent:await_reply/1 (src/locks_agent.erl, line 397)
     in call from locks_agent:lock_/6 (src/locks_agent.erl, line 380)
(n1@dch-mbp)5> application:which_applications().
[{stdlib,"ERTS  CXC 138 10","1.19.4"},
 {kernel,"ERTS  CXC 138 10","2.16.4"}]

Logs of the second actor:

Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
(n2@dch-mbp)1> 
User switch command
 --> r 'n1@dch-mbp'
 --> c
Eshell V5.10.4  (abort with ^G)
(n1@dch-mbp)1> {Agent, TrRes} = locks:begin_transaction().
{<0.55.0>,{ok,[]}}
(n1@dch-mbp)2> locks:lock(Agent, [table], read).
{ok,[]}
(n1@dch-mbp)3> locks:lock(Agent, [table], write).
** exception error: {cannot_lock_objects,[{req,[table],
                                               read,
                                               ['n1@dch-mbp'],
                                               0,all},
                                          {req,[table],write,['n1@dch-mbp'],1,all}]}
     in function  locks_agent:await_reply/1 (src/locks_agent.erl, line 397)
     in call from locks_agent:lock_/6 (src/locks_agent.erl, line 380)

I am new to locks and am trying to learn how it works. I essentially need lock-upgrade functionality, which is why I was curious how this behaves. Maybe I am missing something and what I did goes against the very basics of what locks should do.

locks_leader cannot coexist with nodes not running locks application

locks_leader makes the assumption that all connected nodes are running the 'locks' application. If a node not running 'locks' connects to a node running a locks_leader process, the locks_leader process deadlocks.

Steps to reproduce

  1. Start node 'a'.
  2. Start the 'locks' application.
  3. Start a locks_leader process.
  4. Observe that the locks_leader process on node 'a' is the leader and responsive.
  5. Start named node 'b' and connect it to 'a'.
  6. Observe that the locks_leader process on node 'a' is now stuck in safe_loop and no longer responds to normal messages.
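
A minimal sketch of step 5 in shell terms (node names are placeholders; on node 'a' the locks application and a locks_leader process are assumed to already be running):

%% On node b@host, started WITHOUT the locks application:
net_kernel:connect_node('a@host').
%% => true. On a@host the locks_leader process receives nodeup, calls
%%    include_node -> locks_agent:lock_nowait, gets a 'waiting'
%%    notification, gives up leadership and sits in safe_loop forever,
%%    because b@host will never reply to the lock request.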

Investigation

  1. locks_leader receives the nodeup message, processed on line 558 of locks_leader.erl.
  2. The new node is not in nodes, so include_node (line 693) is called.
  3. include_node calls locks_agent:lock_nowait.
  4. locks_agent sends a {locks_agent, _, 'waiting'} message, handled on line 571.
  5. The process gives up leadership, causing it to enter safe_loop, but a response from the newly connected node will never come since it is not running 'locks'.

I'm not sure what to do next. I don't know the locks application well enough to attempt a fix. Any guidance would be helpful.

cancel lock? what's needed

How could we cancel/release a lock before the end of the transaction? Any idea how it could be implemented?

Crash releasing write lock with outstanding read lockers

I was playing with locks to see if I could use it to help me synchronize shutdown of a process that may have in-flight new requests. On a single node I started three lock agents. I first took out a write lock with one, and then requested read locks with the other two, and they blocked (as expected).

Then, I ended the transaction for the agent that had the write lock with end_transaction/1, and both of the blocking read locks crashed.

To replicate, I added the following test to locks_tests.erl, added it to run_test_/0, and ran it:

one_lock_wrr_clients() ->
    L = [?MODULE, ?LINE],
    script([1,2,3],
           [{1, ?LINE, locks, lock, ['$agent', L, write], match({ok,[]})},
            {2, ?LINE, locks, lock_nowait, ['$agent', L, read], match(ok)},
            {3, ?LINE, locks, lock_nowait, ['$agent', L, read], match(ok)},
            {1, ?LINE, locks, end_transaction, ['$agent'], match(ok)},
            {2, ?LINE, locks, await_all_locks, ['$agent'],
             match({have_all_locks, []})},
            {3, ?LINE, locks, await_all_locks, ['$agent'],
             match({have_all_locks, []})}
           ]).

I am not confident that I should see {have_all_locks, []} for both calls to await_all_locks/1, but that doesn't really matter, because I get the following crash:

=ERROR REPORT==== 16-Oct-2015::01:14:18 ===
    locks_agent: aborted
    reason: function_clause
    trace: [{locks_agent,lock_holder,
                         [[{w,[{entry,<0.81.0>,<0.80.0>,3,direct},
                               {entry,<0.79.0>,<0.78.0>,2,direct}]}]],
                         [{file,"src/locks_agent.erl"},{line,1080}]},
            {locks_agent,'-pp_locks/1-lc$^0/1-0-',1,
                         [{file,"src/locks_agent.erl"},{line,1077}]},
            {locks_agent,all_locks_status,1,
                         [{file,"src/locks_agent.erl"},{line,1068}]},
            {locks_agent,check_if_done,2,
                         [{file,"src/locks_agent.erl"},{line,933}]},
            {locks_agent,handle_call,3,
                         [{file,"src/locks_agent.erl"},{line,509}]},
            {locks_agent,handle_msg,2,
                         [{file,"src/locks_agent.erl"},{line,266}]},
            {locks_agent,loop,1,[{file,"src/locks_agent.erl"},{line,261}]},
            {locks_agent,agent_init,3,
                         [{file,"src/locks_agent.erl"},{line,228}]}]
ERROR: {mismatch,
           [exit,
            {{function_clause,
                 [{locks_agent,agent_init,3,
                      [{file,"src/locks_agent.erl"},{line,235}]}]},
             {gen_server,call,[<0.79.0>,await_all_locks,infinity]}},
            normal,[]]}

Even though this crash was created with a lock_nowait/await_all_locks pair, it is exactly the same crash I received in my original testing.

The analysis I have done so far makes me think this is related to lock upgrades from read to write. Are lock upgrades supposed to happen automatically? Am I fundamentally misunderstanding the API? I don't know yet if this crash occurs if each agent is on a different node.

When using more than 2 nodes, the first 2 nodes hang and stop responding to gen_server calls

After I start locks on 2 nodes, everything works as expected: one node becomes the leader, and the locks_leader process on both nodes is in gen_server mode (current function is gen_server:loop). When I start a 3rd node, the first 2 nodes become unresponsive to gen_server calls (the current function of the locks_leader process on these 2 nodes is locks_leader:safe_loop). locks_leader is waiting for a have_all_locks message but never gets it.

I modified the check_if_done function (line 787) in locks_agent.erl to try to resolve it:

check_if_done(#state{pending = Pending} = State, Msgs) ->
    case ets:info(Pending, size) of
        0 ->
            Msg = {have_all_locks, []},
            notify_msgs([Msg|Msgs], have_all(State));
        _ ->
            check_if_done_(State, Msgs)
    end.

After this change, the leader node is in gen_server:loop, but the other 2 nodes are in locks_leader:safe_loop.

I also modified the get_locks function (line 1194) to handle the case where the ets lookup returns an empty list:
get_locks([H|T], Ls) ->
    case ets_lookup(Ls, H) of
        [L] -> [L | get_locks(T, Ls)];
        []  -> get_locks(T, Ls)
    end;
get_locks([], _) ->
    [].

After all that, I am still having issues and continue to debug. Could you please check and tell me whether I am on the right path? Thank you.

compile error in r16

rebar compile
==> examples (compile)
==> locks (compile)
src/locks_agent.erl: error in parse transform 'locks_watcher':
  {{badmatch,{ok,{locks_watcher,[{abstract_code,no_abstract_code}]}}},
   [{locks_watcher,get_exprs,2,[{file,"src/locks_watcher.erl"},{line,111}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,71}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,96}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,100}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,95}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,98}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,98}]},
    {locks_watcher,transform,1,[{file,"src/locks_watcher.erl"},{line,100}]}]}
make: *** [compile] Error 1
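
The {abstract_code,no_abstract_code} match failure suggests that locks_watcher.beam was compiled without debug_info, which the locks_watcher parse transform needs in order to read its own abstract code. A guess at a workaround (not a confirmed fix) is to make sure debug_info is in the compile options, e.g. in rebar.config:

%% rebar.config (sketch): keep abstract code in the beams so the
%% locks_watcher parse transform can read it via beam_lib.
{erl_opts, [debug_info]}.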

how does the leader election compare to raft?

I am curious how the implementation compares with Raft. Did you have a look at it?

Also, can members be added to and removed from the cluster dynamically? How many participants can it handle?

No-one becomes a leader after netsplit

I found an unexpected behaviour of locks_leader; I confirmed it using test_cb.erl. The repro is very simple:

  1. start test_cb on nodes A and B; each process is a leader
  2. connect A with B; one of test_cb processes will become the leader
  3. disconnect A and B; each process is a leader again
  4. connect A with B; both processes will be "elected" at the same time, after which both will surrender.

test_cb node A logs
test_cb node B logs
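
For completeness, the connect/disconnect steps above map to the following shell calls (node names are placeholders; how test_cb is started on each node is left out):

%% Step 2, on A (after test_cb is running on both nodes):
net_kernel:connect_node('b@host').     %% one of the two becomes the leader
%% Step 3, on A:
erlang:disconnect_node('b@host').      %% each node is a leader again
%% Step 4, on A:
net_kernel:connect_node('b@host').     %% both get "elected", both surrender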

release tag ?

What prevents tagging a release right now? It would be cool to have locks available on hex.pm soon :)

locks_leader:call/2 times out after a new node joins

Trying out the locks app (master branch, c9b585a), I ran into an interesting failure in the scenario described below.

There were 4 nodes alive and connected to each other - A, B, C and D. Node D was the leader.
At some point a new node E was started; it discovered the other running nodes and connected to them.
Before node E had even connected to the other nodes, it decided it was a leader.

Once node E connected to the other nodes, it sent its leadership info to them. For each of the 3 non-leaders A, B and C, node E's locks_leader callback elected(State, Election, Pid) was called with the Pid of the "joined" node's process. In turn, the locks_leader callback surrendered(State, Synch, Election) was called on nodes A, B and C.

When the new leader E connected to the old leader D, a netsplit (from locks_leader's point of view) happened. Node D won: its locks_leader callback elected(State, Election, undefined) was called, and all other nodes (A, B, C and E) were notified via the surrendered(State, Synch, Election) callback, so node E was informed that it was no longer a leader.

Since then, all locks_leader:call/2 calls made on nodes A, B and C have ended up with a timeout. The same call made on D and E works as usual with no errors. So it seems that the internal state of the locks_leader on the "passive" nodes A, B and C was corrupted by the fighting leaders D and E...
