Hi, I am testing orchestrator with 5.7.17, Master and two slaves.


GTID not found properly (5.7) and some graceful-master-takeover issues (openark/orchestrator)

Comments (30)

shlomi-noach commented on May 30, 2024

> GTID appears as disabled in the master, the web interface shows the button to enable it, when obviously it is enabled in all the replication chain (GTID_MODE=ON)

Can you please issue:

select @@global.gtid_mode, @@global.gtid_purged

on your master?

> This issue causes that the takeover doesn't use GTID (I guess)

That makes sense. We need to solve the GTID recognition problem.

> Instance B was in read-only before the takeover; after the takeover, read-only is not disabled. Is this a feature, or something I should add via hooks? It would be nice to have a parameter to end the process in the state you prefer, depending on the takeover reasons/conditions.

By default orchestrator does not RESET SLAVE ALL and does not SET GLOBAL read_only=0. To do both, set ApplyMySQLPromotionAfterMasterFailover: true

I apologize for the inconvenient name. I'll be working to minimize the number of configuration params.
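As a rough illustration of what that setting controls, the sketch below reads a JSON config fragment and lists the extra statements that would be applied on the promoted server. The helper name `promotion_statements` and the inline JSON fragment are hypothetical; only the key name and the two statements come from the explanation above.

```python
import json

# The two statements named above; applied on the promoted server only when the flag is set.
PROMOTION_STATEMENTS = ["RESET SLAVE ALL", "SET GLOBAL read_only=0"]


def promotion_statements(config_text):
    """Return the extra statements implied by the config (hypothetical helper)."""
    config = json.loads(config_text)
    # A missing key is treated as off, matching the default behavior described above.
    if config.get("ApplyMySQLPromotionAfterMasterFailover", False):
        return list(PROMOTION_STATEMENTS)
    return []


if __name__ == "__main__":
    fragment = '{"ApplyMySQLPromotionAfterMasterFailover": true}'
    for statement in promotion_statements(fragment):
        print(statement)
```

With the flag absent or false the list is empty, which corresponds to the promoted server staying read-only after the takeover.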

> Also, for some reason the role change old master -> new slave doesn't work. It executes a CHANGE MASTER, but apparently the replication username on the old master is empty, so the change master operation fails (the orchestrator user has SELECT on mysql.slave_master_info in the cluster).

As per #57:

> What's missing in this story is the MASTER_USER and MASTER_PASSWORD, which are likely to not exist, because the old master likely had no replication info.
> So that leads to the case where even after positioning, the old master can't truly replicate from the promoted master. Nonetheless, it is placed in the correct position to assume replication once credential settings are applied.

The problem is orchestrator doesn't have the username & password of your replication user.

> Finally, it would be nice to add a feature to force a refactoring of the topology when you have one master and several slaves below it. It requires moving the slaves below the newly elected master just before the master takeover. The process will take a bit longer, moving the slaves and waiting until they are ready.

This can be easily scripted on the user's side. I really think that in the event of planned takeover the user should choose the identity of the new master. If orchestrator were to choose the identity -- fine, but no promises held that everything would work. Perhaps your setup is such where the promoted server would not be the one you'd expect.
You may find this statement confusing. Your own setup may be simple enough, but there are various setups that are not as simple to deal with: servers with no log-slave-updates (can happen with 5.7 GTID), a mixture of 5.6 and 5.7, etc.
Some servers may not be able to grab the VIP the current master has. Or are in an unreliable physical location. Please understand orchestrator has "seen it all" and much of its behavior is crafted by experiencing non-trivial scenarios.

To this end, when things go bad, orchestrator is very smart in making the best of a situation. But at planned failovers, it would very much like you to set up your topology in a way that makes sense to you and will guarantee survival of all servers you care about.

from orchestrator.

ecortestws commented on May 30, 2024

GTID data from the master:

mysql>  select @@global.gtid_mode, @@global.gtid_purged\G
*************************** 1. row ***************************
@@global.gtid_mode: ON
@@global.gtid_purged: 17255cd9-b2f6-11e6-b59d-005056946d8b:1-15546,
604d9088-a5c6-11e6-8f72-005056945836:1-8796013,
9cb4118b-a5c6-11e6-96c0-005056945189:1-37645
1 row in set (0.00 sec)

There are so many options that I didn't see that one. Will check it, thanks!
About username/password issue, orchestrator can read username/password from the new master just before the take over, and use them when demoting the old master. Otherwise, some kind of warning such as "no credentials found" would be useful.
You are right about the planned case, is just that reading the documentation I read that orchestrator could require several steps to finish in the target state. This would be that scenario, and the only requirement could be that you specify the new master rather than letting orchestrator select it. As you said, it can be done on the user's side :-)

shlomi-noach commented on May 30, 2024

GTID

looking into!

> About username/password issue, orchestrator can read username/password from the new master just before the take over

It cannot. You cannot reveal the password by SHOW SLAVE STATUS. There is a potential solution (utilized by orchestrator) in the event you use system tables for master-info.

> is just that reading the documentation I read that orchestrator could require several steps to finish in the target state

More than anything, I'd appreciate help with documentation!

sjmudd commented on May 30, 2024

@shlomi-noach: slave username and password information are available via mysql.slave_master_info:

root@somehost [mysql]> select Host, Port,  User_name, User_password from slave_master_info;
+-----------------------+------+-----------+---------------+
| Host                  | Port | User_name | User_password |
+-----------------------+------+-----------+---------------+
| somehost.mydomain.com | 3306 | some_user | some_password |
+-----------------------+------+-----------+---------------+
1 row in set (0.00 sec)

So in theory you could try these credentials. However, depending on existing grants this may or may not work as expected, as the grant may be for 'some_user'@'%' (any address), 'some_user'@'192.168.9.10' (specific address) or anything in between, some of which may work and others may not.

It may be worth having an option to try or check the configuration but specific site configs may vary.
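To make the idea concrete, here is a minimal sketch of turning a row read from mysql.slave_master_info into a CHANGE MASTER TO statement. The function names are hypothetical and this is not orchestrator's implementation; the quoting is deliberately minimal, and whether the resulting credentials actually work still depends on the grants discussed above.

```python
def quote_sql(value):
    """Quote a string literal for MySQL (minimal escaping, sketch only)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def change_master_credentials(row):
    """Build a CHANGE MASTER TO statement from a mysql.slave_master_info row.

    `row` is a dict keyed by column name (User_name, User_password).
    Returns None when no replication user is recorded, so the caller can
    warn instead of issuing a statement with empty credentials.
    """
    user = row.get("User_name") or ""
    password = row.get("User_password") or ""
    if not user:
        return None
    return "CHANGE MASTER TO MASTER_USER={}, MASTER_PASSWORD={}".format(
        quote_sql(user), quote_sql(password)
    )


if __name__ == "__main__":
    # Values taken from the example output above.
    row = {"User_name": "some_user", "User_password": "some_password"}
    print(change_master_credentials(row))
```

Returning None for an empty user name leaves room for the "no credentials found" warning suggested earlier in the thread.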

For what it's worth for planned topology changes (of the master) I don't use orchestrator but custom scripts. This gives a bit more control and reduces downtime, but orchestrator is nearly always used manually both before and afterwards to arrange the topology as needed to minimise the impact of the master changeover. I guess I could use orchestrator and most of what's described here is what I do already but I have more freedom to check stuff both before and afterwards which makes me feel more comfortable. Maybe I need to look again at how well orchestrator handles this task as it simplifies things if the amount of software used is reduced.

ecortestws commented on May 30, 2024

That's right, the orchestrator user has SELECT permissions on slave_master_info and the information is there; it just needs to be read and used. If you expect orchestrator to execute this task cleanly, the user should ensure that the replication user has permissions on all the nodes involved.

shlomi-noach commented on May 30, 2024

Reading credentials from slave_master_info is already implemented for make-co-master so should be easy to apply to graceful-takeover

https://github.com/github/orchestrator/blob/55cedffe8da1163df6d6d2374207cae97ae375fe/go/inst/instance_topology.go#L900-L911

fuyar commented on May 30, 2024

Hello, just wanted to report the same issue with GTID based replication (Percona 5.7.16) not being recognized on a simple master - 4 slaves topology.

On the master :

+--------------------+----------------------+
| @@global.gtid_mode | @@global.gtid_purged |
+--------------------+----------------------+
| ON                 |                      |
+--------------------+----------------------+
1 row in set (0.00 sec)

shlomi-noach commented on May 30, 2024

@fuyar thank you!

fuyar commented on May 30, 2024

no problem @shlomi-noach :)

Seems like Orchestrator was finally able to detect GTIDs on the 4 slaves (I rechecked this morning without having changed anything in the meantime).

oracle_gtid: 0 still for the master in the 'database_instance' table but yeah as the master is not a slave of anyone it should be ok I suppose ?

shlomi-noach commented on May 30, 2024

> but yeah as the master is not a slave of anyone it should be ok I suppose ?

~~That's the very bug; because the master is not identified as gtid-enabled, orchestrator doesn't run a gtid-based failover.~~

shlomi-noach commented on May 30, 2024

OK, have taken a closer look into GTID recoveries:

The fact that the master does not show as oracle_gtid in database_instance, or that it shows as GTID based replication: false, is irrelevant to the failover mechanism.
The failover looks at the set of replicas and determines whether they're using GTIDs; GTID-based failover takes place when there's at least one GTID replica, and when all valid replicas (valid == responsive) are GTID. In other words, if a single valid replica is not a GTID based replica, then failover is not GTID based. This shouldn't happen in reality, but was added as a safety mechanism in the unlikely event that GTID->non-GTID replication is made possible in the far far future.
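The rule can be sketched in a few lines. This is an illustration of the description above, not orchestrator's code: the function name and the dict keys `responsive` and `gtid` are hypothetical, and the sketch assumes "at least one GTID replica" is counted among the valid replicas.

```python
def is_gtid_failover(replicas):
    """Decide whether a failover would be GTID based (sketch of the rule above).

    Each replica is a dict with hypothetical boolean keys:
    "responsive" (the replica is valid) and "gtid" (it replicates via GTID).
    """
    valid = [r for r in replicas if r["responsive"]]
    # At least one GTID replica is required ...
    has_gtid = any(r["gtid"] for r in valid)
    # ... and a single valid non-GTID replica rules GTID failover out.
    all_gtid = all(r["gtid"] for r in valid)
    return has_gtid and all_gtid


if __name__ == "__main__":
    replicas = [
        {"responsive": True, "gtid": True},
        {"responsive": False, "gtid": False},  # unresponsive, so ignored
    ]
    print(is_gtid_failover(replicas))
```

Note that the master's own GTID flag does not appear anywhere in the decision, which is the point being made.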

shlomi-noach commented on May 30, 2024

@ecortestws my last comment suggests that:

> This issue causes that the takeover doesn't use GTID (I guess)

is wrong. Are you able to show that the recovery was not based on GTID? To be clear, it's a completely valid assumption on your side, but I believe it is incorrect. The logs actually specify the type of recovery. Look for:

topology_recovery: RecoverDeadMaster: masterRecoveryType=...

I realize that was 15 days ago and you may not have the logs at this time.
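If the logs are still around, a small script can pull the recovery type out of them. The sketch below is only an assumption about how one might scan a log file; the function name is hypothetical, and the sample line follows the format named above, using a line reported in this thread.

```python
import re

# Matches the log entry named above and captures the recovery type.
RECOVERY_TYPE = re.compile(r"RecoverDeadMaster: masterRecoveryType=(\w+)")


def recovery_types(lines):
    """Return the recovery type from every matching log line, in order."""
    found = []
    for line in lines:
        match = RECOVERY_TYPE.search(line)
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    sample = [
        "2017-02-14 08:15:34 DEBUG topology_recovery: RecoverDeadMaster: "
        "masterRecoveryType=MasterRecoveryPseudoGTID",
    ]
    print(recovery_types(sample))
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.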

ecortestws commented on May 30, 2024

Hi @shlomi-noach:
from the logs:
2017-02-14 08:15:34 DEBUG topology_recovery: RecoverDeadMaster: masterRecoveryType=MasterRecoveryPseudoGTID

shlomi-noach commented on May 30, 2024

@ecortestws thank you. Then, indeed, orchestrator didn't recognize this to be a GTID recovery.

shlomi-noach commented on May 30, 2024

Applying replication-credentials on demoted master is addressed by #93

ecortestws commented on May 30, 2024

Hi @shlomi-noach,
any progress on GTID issue?
Thanks
Eduardo

shlomi-noach commented on May 30, 2024

@ecortestws Perfect timing. I am setting up an environment for this now.

shlomi-noach commented on May 30, 2024

@ecortestws can you confirm your servers are Percona Server? If so, this is identified in #96 and solved via #98 (no release yet)

My current GTID testing environment is happily identifying GTID topologies.

#106 makes the web interface recognize a GTID master as "using GTID" -- but this is a visualization matter only; recoveries are using a lower level logic.

shlomi-noach commented on May 30, 2024

@ecortestws can you test https://github.com/github/orchestrator/releases/tag/v2.1.0 ?

ecortestws commented on May 30, 2024

@shlomi-noach my servers are Oracle MySQL. Will try the new release and let you know.

ecortestws commented on May 30, 2024

@shlomi-noach I have tested it but it didn't work as expected.

orchestrator -version
2.1.0
05241ab2608de7ed5dd66a363690a33db36e9954

2017-02-14 08:15:34 DEBUG topology_recovery: RecoverDeadMaster: masterRecoveryType=MasterRecoveryPseudoGTID
2017-02-14 08:15:35 INFO ChangeMasterTo: Changed master on 10.102.92.162:3306 to: 10.102.92.161:3306, bin-log.000256:80539410. GTID: false 10.102.92.161:3306

ecortestws commented on May 30, 2024

The web interface now shows GTID enabled in the master.

shlomi-noach commented on May 30, 2024

@ecortestws thank you. Are you again looking at an A-B-C chain with graceful-master-takeover?

I'll run some more checks and may come back with more questions.

ecortestws commented on May 30, 2024

@shlomi-noach yes, the same approach, the same topology. Moved C from A to B before the takeover, and verified that the replication chain was healthy. I have all the logs, let me know if you need anything else. I understand that #93 hasn't been merged yet, so the issue with the credentials after the takeover is expected. Thanks.

shlomi-noach commented on May 30, 2024

#93 is now merged

shlomi-noach commented on May 30, 2024

@ecortestws I'm happy if you can share the logs. If they contain sensitive data, can you please share them with me via email? My address is [email protected]

shlomi-noach commented on May 30, 2024

OK I'm able to reproduce this.

The reason this happens: the auto_position is not set by default, and orchestrator uses that to recognize GTID replication. I'm looking into improving this.

shlomi-noach commented on May 30, 2024

@ecortestws can you please confirm https://github.com/github/orchestrator/releases/tag/v2.1.1-BETA works for you?

Make sure that the replicas are on auto_position=1, as this is a requirement for a GTID-based recovery.
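One way to check that requirement before testing is sketched below. It assumes you have already collected each replica's SHOW SLAVE STATUS row as a dict (the `Auto_Position` column and the `MASTER_AUTO_POSITION` option are standard MySQL); the function name and the replica names are hypothetical.

```python
def auto_position_fixes(replica_status):
    """Map replica name -> statements needed to switch it to auto-positioning.

    `replica_status` maps a replica name to its SHOW SLAVE STATUS row (a dict).
    Replicas already on Auto_Position=1 are left out of the result.
    """
    fixes = {}
    for name, status in replica_status.items():
        if int(status.get("Auto_Position", 0)) != 1:
            fixes[name] = [
                "STOP SLAVE",
                "CHANGE MASTER TO MASTER_AUTO_POSITION=1",
                "START SLAVE",
            ]
    return fixes


if __name__ == "__main__":
    # Hypothetical replica names and status rows.
    status = {
        "replica-1": {"Auto_Position": 1},
        "replica-2": {"Auto_Position": 0},
    }
    for name, statements in auto_position_fixes(status).items():
        print(name, "; ".join(statements))
```

The sketch only prints the statements; running them is left to the operator, since stopping replication on a live replica is a decision worth making by hand.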

ecortestws commented on May 30, 2024

@shlomi-noach it worked, but replication was not started on the demoted master. Is this expected behavior? The credentials were in place, and after executing "START SLAVE" on the old master it started syncing with the new master.

from orchestrator.

shlomi-noach commented on May 30, 2024

@ecortestws This is expected behavior. I see advantages and reasons for both starting and not starting replication automatically; "not starting" is on the safer side.
