Comments (11)

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
This must be telepathy: yesterday, while sketching ideas for lsyncd-2, it
occurred to me that this had been forgotten along the way, and that I should
add to the description that, exactly because of this, lsyncd cannot guarantee
data transfer. For now I'd suggest a cron job that sends kill -HUP to lsyncd
every 24 hours (or every x days) to check whether something was missed due to
a network error.

When putting lsyncd into practice I unfortunately discovered, by surprise, that
some rsync errors are inherent in the design, because lsyncd always lags behind
the real file system. For example:
* directory X is created
* lsyncd rsyncs creation of X.
* file X/r is created
* directory X is recursively deleted.
* lsyncd tries to rsync X/r -- must fail!
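
The sequence above can be reproduced in miniature without rsync. This is only a sketch: `replay_events` is a hypothetical helper, not part of lsyncd, that replays a queued list of paths against the current state of a directory and reports which of them have vanished in the meantime.

```shell
# Hypothetical helper: replay queued relative paths against the file system
# as it is *now*, the way a lagging sync daemon would see it.
replay_events() {
    re_root=$1
    shift
    for re_rel in "$@"; do
        if [ -e "$re_root/$re_rel" ]; then
            echo "sync $re_rel"
        else
            echo "vanished $re_rel"
        fi
    done
}

demo_root=$(mktemp -d)
mkdir "$demo_root/X"             # directory X is created
: > "$demo_root/X/r"             # file X/r is created (event gets queued)
rm -rf "$demo_root/X"            # directory X is recursively deleted
replay_events "$demo_root" X/r   # prints: vanished X/r
rm -rf "$demo_root"
```

By the time the queued event for X/r is processed, the source is gone, so any transfer attempted at that point cannot succeed.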

I'm a little unsure how to handle this: what should happen if there are
multiple targets? Shouldn't the unresponsiveness of one target be ignored
while the others continue to be fed?

Otherwise, could you maybe help me? These are the return values rsync can
produce, according to its manpage. Which behaviour should be selected in each
case? (Possible behaviours I see are: Continue, Retry, Restart, Die)

       0      Success
       1      Syntax or usage error
       2      Protocol incompatibility
       3      Errors selecting input/output files, dirs
       4      Requested  action  not supported: an attempt was made to manipulate
              64-bit files on a platform that cannot support them; or  an
              option  was specified that is supported by the client and not by
              the server.
       5      Error starting client-server protocol
       6      Daemon unable to append to log-file
       10     Error in socket I/O
       11     Error in file I/O
       12     Error in rsync protocol data stream
       13     Errors with program diagnostics
       14     Error in IPC code
       20     Received SIGUSR1 or SIGINT
       21     Some error returned by waitpid()
       22     Error allocating core memory buffers
       23     Partial transfer due to error
       24     Partial transfer due to vanished source files
       25     The --max-delete limit stopped deletions
       30     Timeout in data send/receive
       35     Timeout waiting for daemon connection
        *     What to do on unknown exit codes?


Original comment by [email protected] on 13 Oct 2010 at 12:21

from lsyncd.

GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024

Restarting lsyncd hardly counts as a workaround for me (24h is too long, and
killing it every 15 minutes would create more problems with partial
transfers). Rsync errors caused by FS changes during transfer are hard to
avoid (maybe some kind of cheap zfs/netapp-style snapshots are the way to go),
but loss of network connectivity is a different matter.

Multiple targets are hard. The only good solution is to keep a per-target
queue of outstanding transfers and retry them at configurable intervals.
Currently I have no plans to use lsyncd for multiple targets (being mostly
interested in 1-1 long-distance replication), so some simpler solution will
suffice -- e.g. if some targets are unreachable (and the queue is not empty),
sleep for a configurable amount of time (instead of waiting for inotify
forever) and retry the transfer.

Typical network-related rsync errors are 
  5      Error starting client-server protocol
  10     Error in socket I/O
  11     Error in file I/O
  12     Error in rsync protocol data stream
plus, probably, other errors due to temporary conditions like
  3      Errors selecting input/output files, dirs
  6      Daemon unable to append to log-file
  14     Error in IPC code
  20     Received SIGUSR1 or SIGINT
  21     Some error returned by waitpid()
  22     Error allocating core memory buffers
  30     Timeout in data send/receive
  35     Timeout waiting for daemon connection
I think those are good candidates for later retry.

2nd category are configuration-related errors like
  1      Syntax or usage error
  2      Protocol incompatibility
  4      Requested  action  not supported
Those must be corrected by configuration changes, so the target must be marked
as 'dead' with a corresponding log record.

3rd category are errors related to FS activity during transfer
  23     Partial transfer due to error
  24     Partial transfer due to vanished source files
'Partial transfer' usually means that a file was changed while rsyncing (so a
retry should fix it).
Vanished files are probably ok to ignore (as far as I remember, rsync builds a
list of files to transfer before the actual run and complains if something
changes during the sync, but those errors are harmless).

Last category is artificial limits violation
  25     The --max-delete limit stopped deletions
Those probably will not occur in the lsyncd context, because there is no
default for the max number of deleted files and there is no sane reason to
explicitly set this parameter in the config.

Original comment by [email protected] on 13 Oct 2010 at 2:49


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Typical network-related rsync errors are 
  5      Error starting client-server protocol
  10     Error in socket I/O
  11     Error in file I/O
  12     Error in rsync protocol data stream

---> So RETRY after say (DELAY) seconds?

plus, probably, other errors due to temporary conditions like
  3      Errors selecting input/output files, dirs
  6      Daemon unable to append to log-file
  14     Error in IPC code
  20     Received SIGUSR1 or SIGINT
  21     Some error returned by waitpid()
  22     Error allocating core memory buffers
  30     Timeout in data send/receive
  35     Timeout waiting for daemon connection
I think those are good candidates for later retry.
---> So RETRY after say (DELAY) seconds?

2nd category are configuration-related errors like
  1      Syntax or usage error
  2      Protocol incompatibility
  4      Requested  action  not supported
Those must be corrected by configuration changes, so the target must be marked
as 'dead' with a corresponding log record.
---> Die?

3rd category are errors related to FS activity during transfer
  23     Partial transfer due to error
  24     Partial transfer due to vanished source files
'Partial transfer' usually means that a file was changed while rsyncing (so a
retry should fix it).
Vanished files are probably ok to ignore (as far as I remember, rsync builds a
list of files to transfer before the actual run and complains if something
changes during the sync, but those errors are harmless).
---> This is a good candidate for CONTINUE. I just tested: if the source does
not exist, rsync exits with code 23.

Last category is artificial limits violation
  25     The --max-delete limit stopped deletions
Those probably will not occur in the lsyncd context, because there is no
default for the max number of deleted files and there is no sane reason to
explicitly set this parameter in the config.
---> I suppose, DIE (if a user configured it for some reason, I suppose he
does not want the files deleted then)

All other undocumented or future ones --- DIE? Since something should be
reconfigured.

About separate lists for multiple targets --- yes, I plan this for lsyncd-2,
but in the 1 series I don't see a good way to fit it into the current data
structures.

Original comment by [email protected] on 13 Oct 2010 at 3:22


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
> > Typical network-related rsync errors are 
> >   5     Error starting client-server protocol
> >  10     Error in socket I/O
> >  11     Error in file I/O
> >  12     Error in rsync protocol data stream
> ---> So RETRY after say (DELAY) seconds?

Yes -- keep file in queue and retry after some delay. Also,
- stop further resync attempts for DELAY seconds (only queue them)
- log message to syslog on 'warning' level (like 'error NN, waiting DELAY 
seconds')
- disable target (until restart/HUP) after some (configurable) number of failed 
attempts 
  or when number of outstanding transfers reached some (configurable) threshold
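
The retry policy in the list above could be sketched roughly as follows. This is a minimal illustration, not lsyncd code: `retry_with_limit` and its variable names are hypothetical, it retries on any non-zero exit status rather than on specific rsync codes, it writes its 'warning' messages to stderr where a real setup might pass them to syslog (for example via `logger`), and "disabling the target" is reduced to giving up and returning the last exit status.

```shell
# retry_with_limit MAX_ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_with_limit() {
    rwl_max=$1
    rwl_delay=$2
    shift 2
    rwl_attempt=1
    while :; do
        if "$@"; then
            return 0
        else
            rwl_status=$?          # exit status of the failed command
        fi
        if [ "$rwl_attempt" -ge "$rwl_max" ]; then
            # Too many failures: give up (stand-in for "disable target").
            echo "error $rwl_status, giving up after $rwl_attempt attempts" >&2
            return "$rwl_status"
        fi
        echo "error $rwl_status, waiting $rwl_delay seconds" >&2
        sleep "$rwl_delay"
        rwl_attempt=$((rwl_attempt + 1))
    done
}
```

A call might look like `retry_with_limit 5 30 rsync -a /some/source/ target:/some/dest/`, where both paths are placeholders.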

> > plus, probably, other errors due to temporary conditions like
> > I think those are good candidates for later retry.
> ---> So RETRY after say (DELAY) seconds?

Yes. Those are mostly internal rsync errors (IPC code, malloc errors etc.) and
should not happen, but a retry after a good DELAY could improve things a bit.

> > 2nd category are configuration-related errors like
> ---> Die?

Or disable this target until restart/HUP, with a corresponding error message
in the log.

> > 3rd category are errors related to FS activity during transfer
> ---> This is a good candidate for CONTINUE. I just tested: if the source
> does not exist, rsync exits with code 23.

Yes, could be safely skipped.

> All other undocumented or future ones --- DIE? Since something should be
> reconfigured.

Probably yes (or disable target). 
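
The outcome of this exchange can be summarised as a lookup from exit code to action. The sketch below is one reading of the thread, not lsyncd code: `rsync_exit_action` is a hypothetical name, "die" also stands for the "disable target" alternative, and treating unknown codes as "die" follows the "Probably yes" answer above.

```shell
# Hypothetical helper: map an rsync exit code to the action discussed above.
rsync_exit_action() {
    case "$1" in
        0|23|24)
            # success, or FS activity during transfer: safe to carry on
            echo continue ;;
        3|5|6|10|11|12|14|20|21|22|30|35)
            # network-related or other temporary conditions
            echo retry ;;
        1|2|4|25)
            # configuration-related errors and the --max-delete limit
            echo die ;;
        *)
            # undocumented or future codes
            echo die ;;
    esac
}
```

For example, `rsync_exit_action 12` prints `retry` and `rsync_exit_action 24` prints `continue`.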

Original comment by [email protected] on 13 Oct 2010 at 4:35


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
According to the discussion group, different admins have different
understandings of what to do. So until a big recoding is done -- I currently
plan to use Lua as the new configuration tool -- how about simply using this
bash rsync "driver"? Simply change the <binary> setting to point to this script.

----
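#!/bin/bash
# rsync "driver": rerun rsync until its exit code says to stop.
while true; do
    /usr/bin/rsync "$@"
    err=$?
    case $err in
    3|5|6|10|11|12|14|20|21|22|30|35)
        # network or other temporary error: retry after a pause
        echo "rdriver: retry on $err"
        sleep 5
        ;;
    1|2|4|25)
        # configuration error: kill the parent (lsyncd) and stop retrying
        echo "rdriver: kill on $err"
        kill "$PPID"
        exit "$err"
        ;;
    *)
        # 0, 23, 24 and any unknown code: done
        echo "rdriver: done on $err"
        break
        ;;
    esac
done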
----

Kind regards, Axel

Original comment by [email protected] on 14 Oct 2010 at 9:56


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Ok, thanks. I'll try to use it for the time being.

Original comment by [email protected] on 14 Oct 2010 at 4:53


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
The problem I see with the bash workaround is that it can potentially spawn a
huge number of processes while the mirror is down, which could cause memory
exhaustion on the source, or many parallel rsync processes hammering the
target once it becomes available again.

Original comment by [email protected] on 8 Nov 2010 at 2:56


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Btw. when using rsync-over-ssh a network error is usually indicated by exit 
code 255.

Original comment by [email protected] on 8 Nov 2010 at 3:13


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
lsyncd 1.x halts and waits while a process is running, so there are never
more processes than the number of targets. If one target is down and the bash
script from above is blocking, it will halt all targets. If it recovers after
a longer period of time, the inotify queue will likely have overflowed, in
which case lsyncd does a restart to recursively sync what has been missed. I'm
addressing these issues in lsyncd 2.0, where multiple targets are treated
separately and will not block each other. For a single-target system, the
workaround should work just fine, as one would expect.


Original comment by [email protected] on 8 Nov 2010 at 7:26


GoogleCodeExporter avatar GoogleCodeExporter commented on June 24, 2024
Lsyncd 2.0 will repeat on network errors.

Original comment by [email protected] on 27 Nov 2010 at 1:20

  • Changed state: Fixed
