mptevents

The MPT drivers from LSI have an event reporting mechanism built into the driver. It gives a sneak peek at what happens in the SAS network at a level below that of the OS itself, and can provide insights when debugging SAS issues.

mptevents is a small daemon intended to expose that information by reporting it all to syslog.

How to Use

Run this daemon at system startup, or whenever debugging is needed. Give it the name of the control device to open, such as /dev/mpt2ctl or /dev/mpt3ctl, and it will report the events as syslog messages.
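A rough sketch of that startup step, assuming the daemon simply opens the control device with open(2) and logs through syslog(3). The function names, the O_RDWR open mode and the "mptevents" syslog ident are assumptions of this example, not taken from the mptevents source:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical helper: open the MPT control device (e.g. "/dev/mpt3ctl")
 * and prepare syslog. The O_RDWR mode and the ident are assumptions.
 * Returns the file descriptor, or -1 after logging why the open failed. */
static int open_ctl_device(const char *path)
{
    assert(path != NULL);
    openlog("mptevents", LOG_PID, LOG_DAEMON);

    int fd = open(path, O_RDWR);
    if (fd < 0) {
        int err = errno; /* save it before syslog() can change errno */
        syslog(LOG_ERR, "cannot open %s: %s", path, strerror(err));
        errno = err;
        return -1;
    }
    syslog(LOG_INFO, "reading events from %s", path);
    return fd;
}

/* Hypothetical counterpart: release the device and the syslog connection. */
static void close_ctl_device(int fd)
{
    if (fd >= 0)
        close(fd);
    closelog();
}
```

Reading the events themselves presumably goes through driver-specific ioctls from the headers under the mpt directory, which this sketch deliberately leaves out.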

This daemon will try to auto-detect each supported host in /sys/class/scsi_host. Any unsupported host (e.g. ahci) will be ignored.

If you give it no arguments, it will try to auto-detect the control device and use it if it finds only one. If more than one is available, you'll need to decide which one to use and pass it explicitly.
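The auto-detection could be sketched as follows, assuming that each host directory under /sys/class/scsi_host exposes a proc_name attribute naming its driver, and that mpt2sas and mpt3sas hosts are served by /dev/mpt2ctl and /dev/mpt3ctl respectively. The table and helper names are hypothetical. Because several hosts can sit behind the same control device, the sketch counts distinct devices rather than hosts:

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping: sysfs proc_name of a SCSI host -> control device. */
struct mpt_driver {
    const char *proc_name;
    const char *ctl_device;
};

static const struct mpt_driver mpt_drivers[] = {
    { "mpt2sas", "/dev/mpt2ctl" },
    { "mpt3sas", "/dev/mpt3ctl" },
};
#define NUM_MPT_DRIVERS (sizeof(mpt_drivers) / sizeof(mpt_drivers[0]))

/* Table index of a supported driver, or -1 for anything else (e.g. "ahci"). */
static int driver_index(const char *proc_name)
{
    assert(proc_name != NULL);
    for (size_t i = 0; i < NUM_MPT_DRIVERS; i++)
        if (strcmp(proc_name, mpt_drivers[i].proc_name) == 0)
            return (int)i;
    return -1;
}

/* Read <sysfs_root>/<host>/proc_name for each host (normally
 * sysfs_root is "/sys/class/scsi_host"). Returns the number of names read. */
static size_t read_host_drivers(const char *sysfs_root, char names[][32], size_t max)
{
    DIR *dir = opendir(sysfs_root);
    if (dir == NULL)
        return 0;

    size_t n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */
        char path[512];
        snprintf(path, sizeof(path), "%s/%s/proc_name", sysfs_root, ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;
        if (fgets(names[n], 32, f) != NULL) {
            names[n][strcspn(names[n], "\n")] = '\0'; /* strip newline */
            n++;
        }
        fclose(f);
    }
    closedir(dir);
    return n;
}

/* Returns the control device if exactly one distinct device is needed,
 * NULL otherwise; *num_devices receives the number of distinct devices. */
static const char *select_ctl_device(const char *const names[], size_t n, int *num_devices)
{
    int seen[NUM_MPT_DRIVERS] = {0};
    int count = 0, found = -1;
    for (size_t i = 0; i < n; i++) {
        int idx = driver_index(names[i]);
        if (idx < 0)
            continue; /* unsupported host: ignored */
        if (!seen[idx]) {
            seen[idx] = 1;
            count++;
            found = idx;
        }
    }
    if (num_devices != NULL)
        *num_devices = count;
    return count == 1 ? mpt_drivers[found].ctl_device : NULL;
}
```

A caller would build a `const char *` array from the names returned by read_host_drivers() and pass it to select_ctl_device(); a NULL result with a count of 2 is the case where the user has to choose.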

Understanding the logs

The generated logs look something like the following. I have added comments after each excerpt to explain some of the things that can be seen there.

SAS Device Status Change: ioc=1 context=2 tag=ffff rc=8(INTERNAL_DEVICE_RESET) port=0 asc=00 ascq=00 handle=000a reserved2=0 SASAddress=5000cca02b0458ba

Here the device was reset for some reason. The SAS HBA most likely lost contact with the device and issued an internal reset to clear out all IOs associated with it. We can see both the SAS address and the handle, each of which identifies the device.

SAS Device Status Change: ioc=1 context=3 tag=ffff rc=14(COMPLETED_INTERNAL_DEV_RESET) port=0 asc=00 ascq=00 handle=000a reserved2=0 SASAddress=5000cca02b0458ba

Here the reset completed. There were no pending IOs to abort, so it was quick (syslog also provides timestamps, although they only reflect when mptevents received the event).
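Because syslog timestamps each line, the time between a reset and its completion can be measured by pairing the two events on the device handle. A minimal sketch, assuming the line format shown above; the parsing helpers and the fixed-size table are hypothetical and not part of mptevents:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* rc values as they appear in the log lines above */
#define RC_INTERNAL_DEVICE_RESET        8
#define RC_COMPLETED_INTERNAL_DEV_RESET 14
#define MAX_PENDING 64

struct pending_reset {
    unsigned handle; /* device handle, e.g. 0x000a */
    double start;    /* timestamp of the reset event, in seconds */
    int active;
};

static struct pending_reset pending[MAX_PENDING];

/* Parse the number following key (e.g. " handle=") in the given base. */
static int get_field(const char *line, const char *key, int base, unsigned *out)
{
    const char *p = strstr(line, key);
    if (p == NULL)
        return 0;
    p += strlen(key);
    char *end;
    unsigned long v = strtoul(p, &end, base);
    if (end == p)
        return 0;
    *out = (unsigned)v;
    return 1;
}

/* Slot already tracking this handle, else the first free slot, else -1. */
static int find_slot(unsigned handle)
{
    int free_slot = -1;
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].active && pending[i].handle == handle)
            return i;
        if (!pending[i].active && free_slot < 0)
            free_slot = i;
    }
    return free_slot;
}

/* Feed one log line with its timestamp. Returns the reset duration when a
 * completion matches an earlier reset of the same handle, else -1.0. */
static double track_reset(const char *line, double t)
{
    unsigned rc, handle;
    if (strstr(line, "SAS Device Status Change:") == NULL ||
        !get_field(line, " rc=", 10, &rc) ||
        !get_field(line, " handle=", 16, &handle))
        return -1.0;

    int slot = find_slot(handle);
    if (slot < 0)
        return -1.0; /* table full */

    if (rc == RC_INTERNAL_DEVICE_RESET) {
        pending[slot] = (struct pending_reset){ handle, t, 1 };
    } else if (rc == RC_COMPLETED_INTERNAL_DEV_RESET && pending[slot].active) {
        pending[slot].active = 0;
        return t - pending[slot].start;
    }
    return -1.0;
}
```

A long gap between the two events would suggest the HBA had outstanding IOs to abort, while a short one matches the quick case described above.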

SAS Discovery: context=12 flags=01(IN_PROGRESS) reason=1(STARTED) physical_port=0 discovery_status=0() reserved1=0
SAS Device Status Change: ioc=1 context=13 tag=ffff rc=8(INTERNAL_DEVICE_RESET) port=0 asc=00 ascq=00 handle=000a reserved2=0 SASAddress=5000cca02b0458ba
SAS Topology Change List: context=14 enclosure_handle=2 expander_dev_handle=9 num_phys=37 num_entries=1 start_phy_num=11 exp_status=3(RESPONDING) physical_port=0 reserved1=0 reserved2=0
SAS Topology Change List Entry (1/1): attached_dev_handle=a link_rate=a(prev=RATE_6_0,next=UNKNOWN_LINK_RATE) phy_status=5(DELAY_NOT_RESPONDING)
SAS Device Status Change: ioc=1 context=15 tag=ffff rc=14(COMPLETED_INTERNAL_DEV_RESET) port=0 asc=00 ascq=00 handle=000a reserved2=0 SASAddress=5000cca02b0458ba
SAS Discovery: context=16 flags=00(IN_PROGRESS) reason=2(COMPLETED) physical_port=0 discovery_status=0() reserved1=0

This shows a typical disk dropping from the SAS network. First a "SAS Discovery" starts, then the information about the device is cleared, and we also get a "SAS Topology Change List" showing a single device switching from 6Gbps to an unknown link rate, which means that no device is present anymore.
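The link_rate field packs two rates into one byte. Judging from the entries in these logs (a decodes as 6.0 to unknown, a0 as unknown to 6.0), the previous rate sits in the low nibble and the new rate in the high nibble. A small decoder under that assumption; the 1.5, 3.0 and 12.0 Gbps codes and the "OTHER" fallback are our reading of the MPI2 headers, not mptevents output, and should be checked against the copies in the mpt directory:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 0x0 and 0xA match the logs above; 0x8, 0x9 and 0xB are assumed from the
 * MPI2/MPI2.5 headers. Anything else is reported as "OTHER". */
static const char *link_rate_to_text(uint8_t rate)
{
    switch (rate) {
    case 0x0: return "UNKNOWN_LINK_RATE";
    case 0x8: return "RATE_1_5";
    case 0x9: return "RATE_3_0";
    case 0xA: return "RATE_6_0";
    case 0xB: return "RATE_12_0";
    default:  return "OTHER";
    }
}

/* Split the link_rate byte: low nibble = previous rate, high nibble = new. */
static void decode_link_rate(uint8_t link_rate, const char **prev, const char **next)
{
    *prev = link_rate_to_text(link_rate & 0x0F);
    *next = link_rate_to_text((link_rate >> 4) & 0x0F);
}
```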

SAS Discovery: context=17 flags=01(IN_PROGRESS) reason=1(STARTED) physical_port=0 discovery_status=0() reserved1=0
SAS Topology Change List: context=18 enclosure_handle=2 expander_dev_handle=9 num_phys=37 num_entries=1 start_phy_num=11 exp_status=3(RESPONDING) physical_port=0 reserved1=0 reserved2=0
SAS Topology Change List Entry (1/1): attached_dev_handle=a link_rate=0(prev=UNKNOWN_LINK_RATE,next=UNKNOWN_LINK_RATE) phy_status=2(DELAY_NOT_RESPONDING)
SAS Discovery: context=19 flags=02(DEVICE_CHANGE) reason=2(COMPLETED) physical_port=0 discovery_status=0() reserved1=0
SAS Discovery: context=20 flags=01(IN_PROGRESS) reason=1(STARTED) physical_port=0 discovery_status=0() reserved1=0
SAS Discovery: context=21 flags=02(DEVICE_CHANGE) reason=2(COMPLETED) physical_port=0 discovery_status=0() reserved1=0
SAS Topology Change List: context=22 enclosure_handle=2 expander_dev_handle=9 num_phys=37 num_entries=1 start_phy_num=11 exp_status=3(RESPONDING) physical_port=0 reserved1=0 reserved2=0
SAS Topology Change List Entry (1/1): attached_dev_handle=a link_rate=a0(prev=UNKNOWN_LINK_RATE,next=RATE_6_0) phy_status=1(DELAY_NOT_RESPONDING)

The above shows how a device joins. Again there is a SAS discovery and a topology change list showing a device, oddly with no new link rate; this could be associated with a fast removal and reinsertion causing the DELAY_NOT_RESPONDING state. By the second discovery, the device is already joining in.
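To pick joins and departures out of a long log, the same link_rate layout can be applied to the text lines directly. A heuristic sketch, assuming, as the explanation above does, that an unknown link rate (0) means no device is attached and that the previous rate is in the low nibble; the enum and function names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum topo_change { TOPO_PARSE_ERROR, TOPO_DEVICE_JOINED, TOPO_DEVICE_LEFT, TOPO_OTHER };

/* Classify a "SAS Topology Change List Entry" line by its link_rate byte.
 * Heuristic: a rate of 0 (unknown) is treated as "no device attached". */
static enum topo_change classify_topology_entry(const char *line)
{
    const char *p = strstr(line, "link_rate=");
    if (p == NULL)
        return TOPO_PARSE_ERROR;
    p += strlen("link_rate=");
    char *end;
    unsigned long rate = strtoul(p, &end, 16); /* stops at the '(' */
    if (end == p)
        return TOPO_PARSE_ERROR;

    unsigned prev = rate & 0x0F;        /* previous link rate */
    unsigned next = (rate >> 4) & 0x0F; /* new link rate */
    if (prev == 0 && next != 0)
        return TOPO_DEVICE_JOINED;
    if (prev != 0 && next == 0)
        return TOPO_DEVICE_LEFT;
    return TOPO_OTHER; /* e.g. unknown -> unknown, or a speed change */
}
```

Applied to the entries above, the first sequence classifies as a departure, the unknown-to-unknown entry as neither, and the final entry as a join.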

Support

Since the tool is quite obscure and requires an understanding of the SAS/SATA protocols and how things behave at a very low level, I'm also happy to lend a hand by looking at debug logs and trying to explain them. This is partly for my own benefit: while I have seen plenty of such traces when debugging SAS/SATA issues, I'm always interested in seeing more of them. My email is provided below; feel free to contact me with your problems and I'll do my best to help. I cannot guarantee anything besides a sincere attempt to debug such issues.

License

My code is licensed under the MIT license (see LICENSE file). The files under the mpt directory are taken verbatim from the Linux kernel and are thus licensed under the GPLv2.

Author

Baruch Even [email protected]

Issues

Incorrect output of phy status

The switch is missing a break after each case:

diff --git a/mptparser.c b/mptparser.c
index fb85e92..dc43e69 100644
--- a/mptparser.c
+++ b/mptparser.c
@@ -467,11 +467,11 @@ static const char *sas_topo_phy_status_to_text(uint8_t status)

        const char *rc = "UNKNOWN";
        switch (status & MPI2_EVENT_SAS_TOPO_RC_MASK) {
-               case MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED: rc = "TARG_ADDED";
-               case MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING: rc = "TARG_NOT_RESPONDING";
-               case MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED: rc = "PHY_CHANGED";
-               case MPI2_EVENT_SAS_TOPO_RC_NO_CHANGE: rc = "NO_CHANGE";
-               case MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING: rc = "DELAY_NOT_RESPONDING";
+               case MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED: rc = "TARG_ADDED"; break;
+               case MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING: rc = "TARG_NOT_RESPONDING"; break;
+               case MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED: rc = "PHY_CHANGED"; break;
+               case MPI2_EVENT_SAS_TOPO_RC_NO_CHANGE: rc = "NO_CHANGE"; break;
+               case MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING: rc = "DELAY_NOT_RESPONDING"; break;
        }

        if (i > 0)
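Without the break statements, a matching case falls through every assignment below it and always ends at "DELAY_NOT_RESPONDING", which is consistent with the sample logs above, where phy_status 1, 2 and 5 all print DELAY_NOT_RESPONDING. A standalone sketch of the corrected reason-code lookup; the constant values are our reading of the MPI2 headers (verify them against the copies in mpt/), and the function covers only the reason code, not whatever the real sas_topo_phy_status_to_text() appends after it:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as we read them from the MPI2 headers; check against mpt/. */
#define MPI2_EVENT_SAS_TOPO_RC_MASK                 (0x0F)
#define MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED           (0x01)
#define MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING  (0x02)
#define MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED          (0x03)
#define MPI2_EVENT_SAS_TOPO_RC_NO_CHANGE            (0x04)
#define MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING (0x05)

/* Reason-code part of phy_status. Every case ends in break, so a match
 * no longer falls through to the last assignment. */
static const char *sas_topo_phy_rc_to_text(uint8_t status)
{
    const char *rc = "UNKNOWN";
    switch (status & MPI2_EVENT_SAS_TOPO_RC_MASK) {
    case MPI2_EVENT_SAS_TOPO_RC_TARG_ADDED:           rc = "TARG_ADDED"; break;
    case MPI2_EVENT_SAS_TOPO_RC_TARG_NOT_RESPONDING:  rc = "TARG_NOT_RESPONDING"; break;
    case MPI2_EVENT_SAS_TOPO_RC_PHY_CHANGED:          rc = "PHY_CHANGED"; break;
    case MPI2_EVENT_SAS_TOPO_RC_NO_CHANGE:            rc = "NO_CHANGE"; break;
    case MPI2_EVENT_SAS_TOPO_RC_DELAY_NOT_RESPONDING: rc = "DELAY_NOT_RESPONDING"; break;
    }
    return rc;
}
```

Under these values, the phy_status=1 and phy_status=2 entries in the logs above would decode as TARG_ADDED and TARG_NOT_RESPONDING.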

Allow to ignore events on startup

Sometimes the events reported at startup are not interesting to the user. It should be possible to ignore them and just let the application consume that data so it knows which event to expect next.
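One way this could work, assuming every event carries the increasing context counter seen in the logs above (context=12, 13, 14, ...): treat the first read as a snapshot that only records the newest context, and report later events only if their context is newer. The event_record layout is a simplified, hypothetical stand-in for the driver's real structure, and counter wraparound is ignored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified event record; only the context counter matters. */
struct event_record {
    uint32_t event;
    uint32_t context;
};

struct event_filter {
    uint32_t last_context; /* newest context handled so far */
    int primed;            /* set once the startup snapshot was consumed */
};

/* Compacts the records to report to the front of recs and returns their
 * count. With skip_startup set, the first call only records last_context. */
static size_t filter_events(struct event_filter *f, struct event_record *recs,
                            size_t n, int skip_startup)
{
    int first = !f->primed;          /* is this the startup snapshot? */
    uint32_t seen = f->last_context; /* threshold from earlier reads */
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t ctx = recs[i].context;
        if (!first && ctx <= seen)
            continue;                /* already handled in an earlier read */
        if (ctx > f->last_context)
            f->last_context = ctx;   /* track the newest context */
        if (!(first && skip_startup))
            recs[out++] = recs[i];   /* keep for reporting */
    }
    f->primed = 1;
    return out;
}
```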

Update MPI headers

There have been a few updates since the code was written; none of them seem major.

There is a new Power Performance Event, an added 12G speed value and a few other non-event changes relating to firmware upgrades.

Need ability to debug parsing of messages

If a message is not parsed properly, there is no real way to get the raw data for debugging. We need a mode in which the raw information read from the driver is written to a file to facilitate such debugging.
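A minimal sketch of such a raw dump, assuming a hypothetical dump_raw() helper that is called on each buffer right after it is read from the driver. It writes one line of hex bytes per buffer and flushes immediately, so the data survives even if the parser later crashes on it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append one raw buffer to the debug file as a line of
 * space-separated hex bytes. Returns 0 on success, -1 on a write error. */
static int dump_raw(FILE *out, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        if (fprintf(out, i ? " %02x" : "%02x", p[i]) < 0)
            return -1;
    if (fputc('\n', out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1; /* keep the file current */
}
```

A text format keeps the file readable with ordinary tools; writing the buffers unchanged with fwrite() would be an equally valid choice if the goal is to replay them into the parser.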
