Comments (14)
Hello,
The only way that I can think of to explain this is that maybe the manifest
ordering is messed up in some way.
The way that burp chooses what to backup is by creating the phase1 scan, then
in phase2, it compares the phase1 scan with the manifest in the previous
successful backup.
The ordering of entries in both the phase1 scan file and the previous manifest
file has to be the same. If not, then I guess that the behaviour could look
like what you are seeing.
So, I guess the first thing to do would be to determine whether that is the
case.
If it is, then the second thing to do would be to figure out how it got that way:
- did a resumed backup mess it up?
- did somebody manually edit a manifest?
- is there some other bug?
- something else I didn't think of
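
One way to check the ordering is sketched below in Python. This is only a rough sketch and not part of burp. It assumes that a file entry is a line made of the letter "f", four hex digits, and then the path, which is how the entries look in grep output from these files. The function names are made up for illustration. It pulls the file paths out of each file and checks that the paths present in both appear in the same relative order.

```python
import gzip
import re

# Assumption: a file entry looks like 'f' + 4 hex digits + path.
ENTRY = re.compile(r"^f[0-9A-Fa-f]{4}(.*)$")


def entry_paths(lines):
    """Return the paths of all file entries, in the order they appear."""
    paths = []
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        if match:
            paths.append(match.group(1))
    return paths


def same_relative_order(manifest_paths, phase1_paths):
    """True if the paths present in both lists appear in the same order.

    Duplicated entries on one side only will also make this return False.
    """
    common = set(manifest_paths) & set(phase1_paths)
    left = [p for p in manifest_paths if p in common]
    right = [p for p in phase1_paths if p in common]
    return left == right


def read_gz_lines(filename):
    """Read a gzipped manifest or phase1 file as text lines."""
    with gzip.open(filename, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.readlines()
```

To use it, pass the lines of the previous backup's manifest.gz and the current phase1.gz through entry_paths() and hand both results to same_relative_order().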
from burp.
Thank you Graham. I'm not sure the best way to go about what you have recommended, but here's what I did.
I started a new backup. I then identified a file that seems to be backed up every time and grepped for that file in the previous backup's manifest.gz (backup #10). Next I grepped the in-progress backup's phase1.gz (backup #11). Path and filename changes are mine, as is the addition of blank lines in the outputs below. The file that was grepped for is "File2.txt". Grep included two lines before and two lines after (-B2 -A2) the match. I've also included a graphical diff in case it helps in more easily identifying what differs and what is identical.
I'm not great at interpreting the encoded lines from the files. Is this helpful and/or are there other approaches I might take and other information I could provide?
0000010 - manifest.gz
f0097D:/Path/To/File/File1.txt
x002B1405317392:e4cdca37df64544a5d392ab4ae66737b
t0099t/D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
x002B1406726416:5459eb19f697e3c876d1d1463148d356
t0099t/D:/Path/To/File/File3.txt
--
f0097D:/Path/To/File/File1.txt
x002B1405317392:e4cdca37df64544a5d392ab4ae66737b
t0099t/D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
x002B1406726416:5459eb19f697e3c876d1d1463148d356
t0099t/D:/Path/To/File/File3.txt
0000011 - phase1.gz
f0097D:/Path/To/File/File1.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT3iAA A A BjfTg0 BheN2v BjfTgc A g J A A
f0097D:/Path/To/File/File3.txt
--
f0097D:/Path/To/File/File1.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT3iAA A A BjfTg0 BheN2v BjfTgc A g J A A
f0097D:/Path/To/File/File3.txt
from burp.
Hello,
If I am reading the grep correctly, it looks like both your manifest.gz and phase1.gz files are listing the same files twice.
If that is the case, that is most likely the root cause of the problem.
There ought to be a single entry for each file, a single entry being like this for the phase1:
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
And like this for the manifest:
t0099t/D:/Path/To/File/File2.txt
r0039A A IH/ B A A A BT2PAA A A BjfTgc Bhd4ws BjfTgA A g J A A
f0097D:/Path/To/File/File2.txt
x002B1406726416:5459eb19f697e3c876d1d1463148d356
It seems like there is something off in the phase1 scan, causing the same files to get listed more than once.
Maybe looking at your include/exclude configuration could shed some light?
The 'incexc' file in each backup directory will have this information.
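
To confirm whether files really are listed more than once, something like the following Python sketch could be run against a manifest.gz or phase1.gz. It is not part of burp. It assumes that a file entry is a line made of "f", four hex digits, and then the path, as in the grep output above, and the function names are invented for this example.

```python
import gzip
import re
from collections import Counter

# Assumption: a file entry looks like 'f' + 4 hex digits + path.
ENTRY = re.compile(r"^f[0-9A-Fa-f]{4}(.*)$")


def duplicate_paths(lines):
    """Return {path: count} for every file path listed more than once."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.rstrip("\r\n"))
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n > 1}


def duplicate_paths_in_gz(filename):
    """Same check, reading a gzipped manifest or phase1 file."""
    with gzip.open(filename, "rt", encoding="utf-8", errors="replace") as fh:
        return duplicate_paths(fh)
```

An empty result means every file path appears only once in that file.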
from burp.
It could very well be, as the includes section of the config is a bit more complicated than just "backup everything". Due to the size of what is being backed up and the sheer number of files/folders, I've been slowly adding "new content" (i.e. new paths) each time a backup runs. Some of this is because the backups would fail when I tried to do everything, but also because I wanted to at least have key things backed up as soon as possible, while the "less important" files were added over time.
Regardless, the backup include/exclude section looks something like this:
include = D:/Shares
include = D:/Shares/Share1
include = D:/Shares/Share1/Documents
include_regex = D:/Shares/Folder1
include_regex = D:/Shares/Folder2
include_regex = D:/Shares/Folder3
include_regex = D:/Shares/Share1/Folder1
include_regex = D:/Shares/Share1/Folder2
include_regex = D:/Shares/Share1/Folder3
...etc...
include_regex = D:/Shares/Share1/Documents/1
include_regex = D:/Shares/Share1/Documents/A
include_regex = D:/Shares/Share1/Documents/B
include_regex = D:/Shares/Share1/Documents/C
include_regex = D:/Shares/Share1/Documents/D
...etc...
exclude = D:/Shares/Share1/Documents/folder_with_corrupt_files
exclude = D:/Shares/Share1/Documents/another_folder_with_corrupt_files
exclude = D:/Shares/Share1/Documents/yet_another_folder_with_corrupt_files
...etc...
exclude_regex = ^[A-Z]:/recycler$
exclude_regex = ^[A-Z]:/\$recycle\.bin$
exclude_regex = ^[A-Z]:/pagefile\.sys$
exclude_regex = ^[A-Z]:/swapfile\.sys$
exclude_regex = ^[A-Z]:/hiberfil\.sys$
I realize there is redundancy within the first three includes, but I want to say I had to do it that way in order to get the regexes to work the way I wanted them to. It may also explain the issues I'm having. From reading and testing I was under the impression that an include has to first build a list of files, and then the regex identifies files from said list and only backs up those that match. Of course, any mistakes, misunderstandings/misinterpretations, and general ignorance are my own. If there's a better way to get it done, I'm open to any and all suggestions.
Thanks again for all the excellent help and assistance.
from burp.
Hello,
I think that there might be some complex interaction with how burp ends up interleaving the "include_regex" lines.
Given the information you provided above, it looks like you are not actually using any regexes in your "include_regex" lines.
I suggest replacing "include_regex" with simple "include" lines. For example, where you have lines like:
include_regex = D:/Shares/Folder1
replace them with:
include = D:/Shares/Folder1
Also, double check that all of your lines are using the same case.
from burp.
Thank you. That's helpful. Yes, it's true that I'm not using regex tokens in the regex specs. I'm merely using them to match a pattern. I went with regex because there was the possibility of wanting to do something like "/path/[A-C]" at some later date.
Maybe a better way to approach this would be to explain as literally as possible what I'm trying to accomplish.
Ultimately this all stems from a combination of issue #921 (burp terminates the backup when encountering a corrupt file/folder), along with having a lot of files and a large overall size of content to back up. A full initial backup would take days at least. The combination of needing to work around the corruption issue (failure after days of the backup running), and wanting to incrementally add pieces due to the sheer size, resulted in the config shown in the earlier post.
I may be completely wrong about how burp handles the combination of "include" and "include_regex". If so, my apologies. From my testing and reading it seems that one must first specify an include and only then follow it with include_regex. My assumption is that the include recurses through the specified path and builds a list, after which anything that matches the include_regex is what actually gets backed up. There seems to be a catch though. Let me try to step through it.
Let's say we have a folder structure that looks something like the following:
shares
├── important
│   ├── 1
│   ├── 2
│   └── 3
└── later
    ├── phase1
    │   ├── p.1
    │   ├── p.2
    │   └── p.3
    ├── phase2
    │   ├── p2.1
    │   ├── p2.2
    │   └── p2.3
    └── phase3
        ├── p3.1
        ├── p3.2
        └── p3.3
In the beginning I have a single simple include, like this:
include = /shares
This is great and backs up everything under said folder, except in my case there is the corruption issue which terminates the backup. But let's set that aside for a moment, and say there is a lot of data and we want to only select key paths and introduce the less important content in an incremental manner at a later time. The following does what one might expect. It backs up only the files/folders contained in "/shares/important". Good so far.
include = /shares
include_regex = /shares/important
There's a catch though. The following is functionally equivalent to the former, in that only files in "/shares/important" are backed up. Anything in "/shares/later/phase1" is ignored. I'm not sure why this is, as technically both are part of the same include, but I'm guessing maybe burp applies each regex in order and because the second regex isn't part of the first, there's nothing to match?
include = /shares
include_regex = /shares/important
include_regex = /shares/later/phase1
It seems one can work around this by specifying another "include" with the parent directory of the match. I'm not sure how important the order of options is, but something like this seems to work:
include = /shares
include = /shares/later
include_regex = /shares/important
include_regex = /shares/later/phase1
However, this is where we presumably start seeing the undesired behavior where a file shows up multiple times.
Any recommendations of another way of coming at this that might accomplish the desired results without the problematic artifacts?
from burp.
Hello,
Thanks for the extra details.
(Note, you referred to issue #931, but I think you meant #921)
I think that the way that include_regex works is not intuitive, and you will probably be better off not using it, especially since you are not using regexes.
I think you should try using a combination of simple includes/excludes instead.
For example...
include = /shares
exclude = /shares/later
...will back up:
/shares/important/1
/shares/important/2
/shares/important/3
Then you can change it to...
include = /shares
exclude = /shares/later
include = /shares/later/phase1
...and this will additionally back up:
/shares/later/phase1/p.1
/shares/later/phase1/p.2
/shares/later/phase1/p.3
Then you might change it to:
include = /shares
exclude = /shares/later
include = /shares/later/phase1
include = /shares/later/phase2
exclude = /shares/later/phase2/p2.2
...and this will additionally back up:
/shares/later/phase2/p2.1
/shares/later/phase2/p2.3
And so on.
Ordering of the configs doesn't matter, as burp will sort them internally.
So, this is equivalent to the above:
exclude = /shares/later/phase2/p2.2
exclude = /shares/later
include = /shares/later/phase2
include = /shares/later/phase1
include = /shares
Internally, burp constructs and orders a set of starting points to recurse through on the client file system.
It will remove duplicates from this list.
In this case, the starting points will be:
/shares
/shares/later/phase1
/shares/later/phase2
As it recurses through each starting point, it will check your exclusions and not descend through those.
If you add an include that is a subpath of another include and hasn't been excluded, then burp will remove it from its starting set.
For example, adding "include = /shares/later/phase1/p.1" won't make "p.1" be backed up twice.
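
The starting point logic described above can be illustrated with a small Python sketch. This is not burp's actual code, just a rough model of the behaviour as described: an include is dropped from the starting set when some other include is its ancestor and no exclude sits on the path between the two. The function names are made up for the example, and case sensitivity is ignored.

```python
def parts(path):
    """Split a '/'-separated path into its components."""
    return tuple(p for p in path.split("/") if p)


def is_ancestor(a, b):
    """True if component tuple a is a strict ancestor of b."""
    return len(a) < len(b) and b[:len(a)] == a


def starting_points(includes, excludes):
    """Return the includes that the scan would need to start from."""
    unique = sorted(set(includes), key=parts)  # order and de-duplicate
    exc = [parts(e) for e in excludes]
    result = []
    for path in unique:
        p = parts(path)
        covered = False
        for other in unique:
            o = parts(other)
            if not is_ancestor(o, p):
                continue
            # Recursing from 'other' reaches 'path' unless an exclude
            # lies below 'other' and at or above 'path'.
            blocked = any(
                is_ancestor(o, e) and (e == p or is_ancestor(e, p))
                for e in exc
            )
            if not blocked:
                covered = True
                break
        if not covered:
            result.append(path)
    return result
```

With the includes and excludes from the example above, plus "include = /shares/later/phase1/p.1", this returns /shares, /shares/later/phase1 and /shares/later/phase2, so "p.1" is reached only once, through /shares/later/phase1.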
from burp.
Thanks Graham. A brilliant and concise explanation. Thank you. That makes perfect sense. I'm going to refactor and will report back.
And yes you are correct. I did intend to refer to issue #921. I've corrected the original comment to refer to the correct issue.
from burp.
Thanks for the kind words.
Just bear in mind, it's not guaranteed that the include_regexes are causing the problem.
But removing them will at least be a step towards figuring it out, if that is not it.
from burp.
Out of curiosity, is there a way to get burp to include files/folders without using regex and without having to list each individual folder? Using your last example, something like this (see rows 5-6):
include = /shares
exclude = /shares/later
include = /shares/later/phase1
include = /shares/later/phase2
include = /shares/later/a*
include = /shares/later/b*
exclude = /shares/later/phase2/p2.2
Given there may be hundreds if not thousands of folders, creating an individual config line can be a bit unruly. It would be nice to be able to just say, include all of the directories that start with the letters a - c, instead of having to list each individually like so:
include = /shares
exclude = /shares/later
include = /shares/later/phase1
include = /shares/later/phase2
include = /shares/later/Aardvark
include = /shares/later/Ace
include = /shares/later/Apple
include = /shares/later/Axe
...etc...
include = /shares/later/Ball
include = /shares/later/Bills
include = /shares/later/Bowls
...etc...
exclude = /shares/later/phase2/p2.2
Obviously once I get a stable backup set, with appropriate corruption exclusions, I can remove all of that, but I thought I would ask.
from burp.
Hello,
Possibly include_glob might do what you want.
It works differently to include_regex in that it generates a list of directories to include before starting the phase1 scan, whereas include_regex does its thing at the same time as recursing through the directories.
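
As a rough illustration of the "generate the list of directories first" idea, here is a Python sketch that expands a glob pattern into plain "include" lines. It uses Python's own glob module, so it is not burp's include_glob implementation and the pattern syntax may differ from burp's. The function name is invented for the example.

```python
import glob
import os


def include_lines(pattern):
    """Expand a glob pattern into 'include = <dir>' lines.

    Only directories that exist at the time of the call are listed.
    """
    dirs = sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
    return ["include = " + d.replace(os.sep, "/") for d in dirs]
```

Calling include_lines("/shares/later/[ABab]*"), for example, would list every existing directory under /shares/later whose name starts with a or b, and the output could be pasted into the client config.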
from burp.
I finally ran a complete backup without having any files/folders duplicated or otherwise backing up improperly. Sticking with only includes and excludes seems to resolve the issues. It's not optimal as there are hundreds of include and exclude entries, but it does solve the problem. I did not test include_glob, though I may do so at some point and will update this thread if and when that happens.
Thanks again Graham. Always appreciate the excellent help.
from burp.
Thanks for the update, sorry that you had troubles.
Keeping this open to remind me to investigate include_regex.
from burp.
Would it not make sense to make sure the file list in each phase is unique? Producing duplicates through injudicious, but unobviously problematic, use of regex/glob matching ought not break backups, no?
from burp.