grke commented on September 14, 2024

Hello,

The only way that I can think of to explain this is that maybe the manifest
ordering is messed up in some way.

The way that burp chooses what to backup is by creating the phase1 scan, then
in phase2, it compares the phase1 scan with the manifest in the previous
successful backup.

The ordering of entries in both the phase1 scan file and the previous manifest
file has to be the same. If not, then I guess that the behaviour could look
like what you are seeing.

So, I guess the first thing to do would be to determine whether that is the
case.

If it is, then the second thing to do would be to figure out how it got that way:

  • did a resumed backup mess it up?
  • did somebody manually edit a manifest?
  • is there some other bug?
  • something else I didn't think of
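
A minimal sketch of that first check, assuming that file entries in both phase1.gz and manifest.gz are lines of the form `f` + four hex digits + path (as the excerpts later in this thread suggest). The helper names and the file paths in the usage note are hypothetical, and this is not how burp itself performs the comparison.

```python
import gzip
import re

# Assumption: a file entry line is 'f', four hex digits, then the path.
ENTRY = re.compile(r"^f[0-9A-Fa-f]{4}(.*)$")


def extract_paths(lines):
    """Return the paths of all file entries, in the order they appear."""
    paths = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            paths.append(match.group(1))
    return paths


def read_paths(gz_file):
    """Read a gzipped phase1 or manifest file and extract its file paths."""
    with gzip.open(gz_file, "rt", encoding="utf-8", errors="replace") as handle:
        return extract_paths(handle)


def first_order_mismatch(manifest_paths, phase1_paths):
    """Compare the relative order of the paths present in both lists.

    Returns (index, manifest_path, phase1_path) for the first position
    where the two orderings disagree, or None if they agree.
    """
    common = set(manifest_paths) & set(phase1_paths)
    a = [p for p in manifest_paths if p in common]
    b = [p for p in phase1_paths if p in common]
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index, x, y
    return None
```

It could be used as `first_order_mismatch(read_paths("previous/manifest.gz"), read_paths("working/phase1.gz"))`, where both paths are placeholders. Only paths present in both files are compared, so new or deleted files do not count as ordering problems.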

from burp.

moterpent commented on September 14, 2024

Thank you Graham. I'm not sure of the best way to go about what you have recommended, but here's what I did.

I started a new backup. I then identified a file that seems to be backed up every time and grepped for that file in the previous backup's manifest.gz (backup #10). Next, I grepped the in-progress backup's phase1.gz (backup #11). The path and filename changes are mine, as are the blank lines added to the outputs below. The file that was grepped for is "File2.txt". The grep included two lines before and two lines after the match (-B2 -A2). I've also included a graphical diff in case it makes it easier to see where the outputs differ and where they are identical.

I'm not great at interpreting the encoded lines from the files. Is this helpful and/or are there other approaches I might take and other information I could provide?

0000010 - manifest.gz

f0097D:/Path/To/File/File1.txt
x002B1405317392:e4cdca37df64544a5d392ab4ae66737b
t0099t/D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
x002B1406726416:5459eb19f697e3c876d1d1463148d356
t0099t/D:/Path/To/File/File3.txt
--
f0097D:/Path/To/File/File1.txt
x002B1405317392:e4cdca37df64544a5d392ab4ae66737b
t0099t/D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
x002B1406726416:5459eb19f697e3c876d1d1463148d356
t0099t/D:/Path/To/File/File3.txt

0000011 - phase1.gz

f0097D:/Path/To/File/File1.txt


r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT3iAA A A BjfTg0 BheN2v BjfTgc A g J A A
f0097D:/Path/To/File/File3.txt
--
f0097D:/Path/To/File/File1.txt


r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT3iAA A A BjfTg0 BheN2v BjfTgc A g J A A
f0097D:/Path/To/File/File3.txt

[Image: graphical diff of the manifest.gz and phase1.gz grep outputs]

grke commented on September 14, 2024

Hello,

If I am reading the grep correctly, it looks like both your manifest.gz and phase1.gz files are listing the same files twice.
If that is the case, that is most likely the root cause of the problem.

There ought to be a single entry for each file, a single entry being like this for the phase1:

r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt

And like this for the manifest:

t0099t/D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
x002B1406726416:5459eb19f697e3c876d1d1463148d356

It seems like there is something off in the phase1 scan, causing the same files to get listed more than once.
Maybe looking at your include/exclude configuration could shed some light?
The 'incexc' file in each backup directory will have this information.
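
A minimal sketch for confirming the duplication across a whole file rather than a single grep, assuming the same entry format as the excerpts above (`f` + four hex digits + path). The function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: a file entry line is 'f', four hex digits, then the path.
ENTRY = re.compile(r"^f[0-9A-Fa-f]{4}(.*)$")


def duplicated_paths(lines):
    """Return {path: count} for every file path listed more than once."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n > 1}
```

Feeding it the lines of a decompressed phase1 or manifest file (for example via `gzip.open(name, "rt")`) should return an empty dictionary when every file is listed exactly once.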

moterpent commented on September 14, 2024

It could very well be, as the includes section of the config is a bit more complicated than just "back up everything". Due to the size of what is being backed up and the sheer number of files/folders, I've been slowly adding "new content" (i.e. new paths) each time a backup runs. Some of this is because the backups would fail when I tried to do everything, but also because I wanted to at least have key things backed up as soon as possible, while the "less important" files were added over time.

Regardless, the backup include/exclude section looks something like this:

include = D:/Shares
include = D:/Shares/Share1
include = D:/Shares/Share1/Documents

include_regex = D:/Shares/Folder1
include_regex = D:/Shares/Folder2
include_regex = D:/Shares/Folder3

include_regex = D:/Shares/Share1/Folder1
include_regex = D:/Shares/Share1/Folder2
include_regex = D:/Shares/Share1/Folder3
...etc...
include_regex = D:/Shares/Share1/Documents/1
include_regex = D:/Shares/Share1/Documents/A
include_regex = D:/Shares/Share1/Documents/B
include_regex = D:/Shares/Share1/Documents/C
include_regex = D:/Shares/Share1/Documents/D
...etc...
exclude = D:/Shares/Share1/Documents/folder_with_corrupt_files
exclude = D:/Shares/Share1/Documents/another_folder_with_corrupt_files
exclude = D:/Shares/Share1/Documents/yet_another_folder_with_corrupt_files
...etc...
exclude_regex = ^[A-Z]:/recycler$
exclude_regex = ^[A-Z]:/\$recycle\.bin$
exclude_regex = ^[A-Z]:/pagefile\.sys$
exclude_regex = ^[A-Z]:/swapfile\.sys$
exclude_regex = ^[A-Z]:/hiberfil\.sys$

I realize there is redundancy within the first three includes, but I want to say I had to do it that way in order to get the regexes to work the way I wanted them to. It may also explain the issues I'm having. From reading and testing, I was under the impression that an include first builds a list of files, and then the regex picks files from that list and only those that match are backed up. Of course, any mistakes, misunderstandings/misinterpretations, and general ignorance are my own. If there's a better way to get it done, I'm open to any and all suggestions.

Thanks again for all the excellent help and assistance.

grke commented on September 14, 2024

Hello,

I think that there might be some complex interaction with how burp ends up interleaving the "include_regex" lines.
Given the information you provided above, it looks like you are not actually using any regexes in your "include_regex" lines.

I suggest replacing "include_regex" with simple "include" lines. For example, where you have lines like:

include_regex = D:/Shares/Folder1

replace them with:

include = D:/Shares/Folder1

Also, double check that all of your lines are using the same case.

moterpent commented on September 14, 2024

Thank you. That's helpful. Yes, it's true that I'm not using regex tokens in the regex specs. I'm merely using them to match a pattern. I went with regex because there was the possibility of wanting to do something like "/path/[A-C]" at some later date.

Maybe a better way to approach this would be to explain as literally as possible what I'm trying to accomplish.

Ultimately this all stems from a combination of issue #921 (burp terminates the backup when encountering a corrupt file/folder), along with having a lot of files and a large overall size of content to back up. A full initial backup would take days at least. The combination of needing to work around the corruption issue (failure after days of the backup running) and wanting to add pieces incrementally due to the sheer size resulted in the config shown in the earlier post.

I may be completely wrong in how burp handles the combination of "include" and "include_regex". If so, my apologies. From my testing and reading it seems that one must first specify an include and only then follow it with include_regex. The assumption being that the include will recurse through the specified path and will build a list. After which anything that matches the include_regex is what actually gets backed up. There seems to be a catch though. Let me try to step through it.

Let's say we have a folder structure that looks something like the following:

shares
├── important
│   ├── 1
│   ├── 2
│   └── 3
└── later
    ├── phase1
    │   ├── p.1
    │   ├── p.2
    │   └── p.3
    ├── phase2
    │   ├── p2.1
    │   ├── p2.2
    │   └── p2.3
    └── phase3
        ├── p3.1
        ├── p3.2
        └── p3.3

In the beginning I have a single simple include, like this:

include = /shares

This is great and backs up everything under said folder, except in my case there is the corruption issue which terminates the backup. But let's set that aside for a moment, and say there is a lot of data and we want to only select key paths and introduce the less important content in an incremental manner at a later time. The following does what one might expect. It backs up only the files/folders contained in "/shares/important". Good so far.

include = /shares
include_regex = /shares/important

There's a catch though. The following is functionally equivalent to the former, in that only files in "/shares/important" are backed up. Anything in "/shares/later/phase1" is ignored. I'm not sure why this is, as technically both are part of the same include, but I'm guessing burp applies each regex in order and, because the second regex isn't part of the first, there's nothing to match?

include = /shares
include_regex = /shares/important
include_regex = /shares/later/phase1

It seems one can work around this by specifying another "include" with the parent root of the match. I'm not sure how important the order of the options is, but something like this seems to work:

include = /shares
include = /shares/later
include_regex = /shares/important
include_regex = /shares/later/phase1

However, this is where we presumably start seeing the undesired behavior where a file shows up multiple times.

Any recommendations of another way of coming at this that might accomplish the desired results without the problematic artifacts?

grke commented on September 14, 2024

Hello,

Thanks for the extra details.
(Note, you referred to issue #931, but I think you meant #921)

I think that the way that include_regex works is not intuitive, and you will probably be better off not using it, especially since you are not using regexes.
I think you should try using a combination of simple includes/excludes instead.

For example...

include = /shares
exclude = /shares/later

...will back up:
/shares/important/1
/shares/important/2
/shares/important/3

Then you can change it to...

include = /shares
exclude = /shares/later
include = /shares/later/phase1

...and this will additionally back up:
/shares/later/phase1/p.1
/shares/later/phase1/p.2
/shares/later/phase1/p.3

Then you might change it to:

include = /shares
exclude = /shares/later
include = /shares/later/phase1
include = /shares/later/phase2
exclude = /shares/later/phase2/p2.2

...and this will additionally back up:
/shares/later/phase2/p2.1
/shares/later/phase2/p2.3

And so on.
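
The examples above are consistent with a simple "deepest matching rule wins" model. The following is a minimal sketch of that model only, not burp's actual code; the function names are hypothetical, and what happens when the same path is both included and excluded is not covered by the examples.

```python
from pathlib import PurePosixPath


def is_within(path, ancestor):
    """True if path equals ancestor or lies somewhere below it."""
    p, a = PurePosixPath(path), PurePosixPath(ancestor)
    return p == a or a in p.parents


def is_backed_up(path, includes, excludes):
    """Decide by the deepest include/exclude rule that covers the path."""
    rules = [(inc, True) for inc in includes] + [(exc, False) for exc in excludes]
    matching = [(rule, keep) for rule, keep in rules if is_within(path, rule)]
    if not matching:
        return False  # not covered by any include
    _, keep = max(matching, key=lambda item: len(PurePosixPath(item[0]).parts))
    return keep


includes = ["/shares", "/shares/later/phase1", "/shares/later/phase2"]
excludes = ["/shares/later", "/shares/later/phase2/p2.2"]

for candidate in ("/shares/important/1",
                  "/shares/later/phase2/p2.1",
                  "/shares/later/phase2/p2.2",
                  "/shares/later/phase3/p3.1"):
    print(candidate, is_backed_up(candidate, includes, excludes))
```

Because the decision depends only on which rule is deepest, the order in which the rules are listed does not change the result in this model.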

Ordering of the configs doesn't matter, as burp will sort them internally.
So, this is equivalent to the above:

exclude = /shares/later/phase2/p2.2
exclude = /shares/later
include = /shares/later/phase2
include = /shares/later/phase1
include = /shares

Internally, burp constructs and orders a set of starting points to recurse through on the client file system.
It will remove duplicates from this list.
In this case, the starting points will be:
/shares
/shares/later/phase1
/shares/later/phase2
As it recurses through each starting point, it will check your exclusions and not descend through those.
If you add an include that is a subpath of another include and hasn't been excluded, then burp will remove it from its starting set.
For example, adding "include = /shares/later/phase1/p.1" won't make "p.1" be backed up twice.
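
A minimal sketch of how such a starting set could be derived, written only from the description above. It is an illustration under that assumption, not burp's implementation, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def is_within(path, ancestor):
    """True if path equals ancestor or lies somewhere below it."""
    p, a = PurePosixPath(path), PurePosixPath(ancestor)
    return p == a or a in p.parents


def starting_points(includes, excludes):
    """Sort and de-duplicate includes, dropping redundant subpaths.

    An include is redundant when it sits under another include and no
    exclude cuts it off from that nearest parent include.
    """
    unique = sorted(set(includes))
    result = []
    for inc in unique:
        ancestors = [o for o in unique if o != inc and is_within(inc, o)]
        if not ancestors:
            result.append(inc)
            continue
        nearest = max(ancestors, key=len)  # deepest enclosing include
        cut_off = any(
            is_within(exc, nearest) and is_within(inc, exc) for exc in excludes
        )
        if cut_off:
            result.append(inc)  # recursion from 'nearest' never reaches it
    return result


print(starting_points(
    ["/shares", "/shares/later/phase1", "/shares/later/phase2",
     "/shares/later/phase1/p.1"],
    ["/shares/later", "/shares/later/phase2/p2.2"],
))
```

With the configuration above, this yields /shares, /shares/later/phase1 and /shares/later/phase2; the extra "/shares/later/phase1/p.1" include is dropped because recursion from /shares/later/phase1 already reaches it.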

moterpent commented on September 14, 2024

Thanks Graham. A brilliant and concise explanation. Thank you. That makes perfect sense. I'm going to refactor and will report back.

And yes you are correct. I did intend to refer to issue #921. I've corrected the original comment to refer to the correct issue.

grke commented on September 14, 2024

Thanks for the kind words.

Just bear in mind, it's not guaranteed that the include_regexes are causing the problem.
But removing them will at least be a step towards figuring it out, if that is not it.

moterpent commented on September 14, 2024

Out of curiosity, is there a way to get burp to include files/folders without using regex and without having to list each individual folder? Using your last example, something like this (see rows 5-6):

include = /shares
exclude = /shares/later
include = /shares/later/phase1
include = /shares/later/phase2
include = /shares/later/a*
include = /shares/later/b*
exclude = /shares/later/phase2/p2.2

Given there may be hundreds if not thousands of folders, creating an individual config line for each can get unwieldy. It would be nice to be able to just say "include all of the directories that start with the letters a - c", instead of having to list each individually like so:

include = /shares
exclude = /shares/later
include = /shares/later/phase1
include = /shares/later/phase2

include = /shares/later/Aardvark
include = /shares/later/Ace
include = /shares/later/Apple
include = /shares/later/Axe
...etc...
include = /shares/later/Ball
include = /shares/later/Bills
include = /shares/later/Bowls
...etc...

exclude = /shares/later/phase2/p2.2

Obviously, once I get a stable backup set with appropriate corruption exclusions, I can remove all of that, but I thought I would ask.

grke commented on September 14, 2024

Hello,

Possibly include_glob might do what you want.

It works differently to include_regex in that it generates a list of directories to include before starting the phase1 scan, whereas include_regex does its thing at the same time as recursing through the directories.
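
The "expand first, then scan" idea can be illustrated with a minimal sketch that turns patterns into plain include lines up front. It uses Python's fnmatch matching as a stand-in, since the exact pattern syntax of include_glob is not described in this thread; the function name and directory names are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_globs(parent, child_names, patterns):
    """Return plain 'include = ...' lines for children matching any pattern."""
    selected = sorted(
        name for name in child_names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )
    return [f"include = {parent}/{name}" for name in selected]


children = ["Aardvark", "Ace", "Ball", "Cat", "phase1"]
for line in expand_globs("/shares/later", children, ["A*", "B*"]):
    print(line)
```

The generated lines are ordinary includes, so they can also be pasted into a config that sticks to simple include/exclude entries.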

moterpent commented on September 14, 2024

I finally ran a complete backup without having any files/folders duplicated or otherwise backing up improperly. Sticking with only includes and excludes seems to resolve the issues. It's not optimal as there are hundreds of include and exclude entries, but it does solve the problem. I did not test include_glob, though I may do so at some point and will update this thread if and when that happens.

Thanks again Graham. Always appreciate the excellent help.

grke commented on September 14, 2024

Thanks for the update, sorry that you had troubles.
Keeping this open to remind me to investigate include_regex.

paulie-g commented on September 14, 2024

Would it not make sense to make sure the file list in each phase is unique? Producing duplicates through injudicious, but unobviously problematic, use of regex/glob matching ought not break backups, no?
