Comments (13)

gtback commented on June 12, 2024

Thanks for reporting this, @cricard. I'll take a closer look.

from cti-python-stix2.

elegantmoose commented on June 12, 2024

Are you certain that the bottleneck is in Bundle()?

cricard commented on June 12, 2024

Pretty sure.

If I build the bundle manually (without calling Bundle()), it completes in a few seconds.

The function below completes in a couple of seconds for a dataset of around 3000 STIX2 objects:

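import uuid

def build_bundle(objects):
    # Build the bundle JSON by hand instead of calling Bundle().
    # Relies on str(obj) returning each STIX object's JSON text.
    bundle_header = '{\n\t"type": "bundle",\n\t"id": "bundle--%s",\n\t"spec_version": "2.0",\n\t"objects": [\n' % uuid.uuid4()
    bundle_footer = '\n\t]\n}'

    # Comma-separate the serialized objects (no trailing comma to strip).
    body = ',\n'.join(str(obj) for obj in objects)
    return bundle_header + body + bundle_footer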

But using the Bundle() function, it takes about 10-15 minutes with the same dataset.

from stix2 import Bundle

def build_bundle(objects):
    # Let stix2 construct the Bundle object from the same list.
    bundle = Bundle(objects=objects)
    return bundle

I only started experiencing the issue when I upgraded stix2 to version 0.3; it worked fine in 0.2. Two other people saw the same results (one on Windows, one on Ubuntu) using stix2 v0.3. All three of us were running Python 2.7.x.

If I decrease the dataset to 100 STIX objects, the Bundle() function works fine. Somewhere around 1000 STIX objects is where I experienced significant delays.

I sent gtback a copy of the script and dataset I was using out of band, so he can validate.

chisholm commented on June 12, 2024

I just tried creating a bundle with more than 5000 objects in it. That completed in about 0.01 sec. Then I tried generating the JSON string via code like str(my_bundle). I'm still waiting for that to complete...

Those two functions aren't actually doing the same thing. The first build_bundle() creates a string; the second creates an object. It isn't exactly an apples-to-apples comparison. (Perhaps you subsequently stringified the return value of the latter function?)

As for my stringification experiment, I tried commenting out the sorting here, and it sped up significantly. I gave up waiting on the experiment above; after removing the sort, it completed in about 2 sec.

packet-rat commented on June 12, 2024

In relation to @chisholm's comment on sorting: are we canonicalizing the bundle content? Are we canonicalizing STIX package contents in general?

cricard commented on June 12, 2024

@chisholm oops - I think you're right. The issue does appear to be in the stringification of the STIX objects, rather than Bundle(). Thanks for catching that!

gtback commented on June 12, 2024

Yes, the bottleneck is the way we sort property names when stringifying objects (to match the order in the spec, rather than plain alphabetic order). This was introduced in 0.3. There are probably lots of ways to fix this; I've been meaning to profile the code and see if there are any quick wins, but I'm open to anything. @chisholm, @mbastian1135, or anyone else, let me know if you want to take a look at this.
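For illustration, here is one way to sort keys into a fixed, spec-like order while keeping custom properties last and the output repeatable. This is a minimal sketch, not the stix2 implementation: `SPEC_ORDER` is a hypothetical, abridged ordering, and `spec_sort_key` and `ordered_items` are illustrative names. Precomputing a rank dictionary keeps each key lookup constant-time, so the sort should cost about the same as an alphabetical one.

```python
# Hypothetical, abridged spec-style order; stix2 keeps its own per-type ordering.
SPEC_ORDER = ["type", "id", "created_by_ref", "created", "modified",
              "name", "description", "pattern", "valid_from", "labels"]

# Rank each known property once so every lookup during the sort is O(1).
_RANK = {name: i for i, name in enumerate(SPEC_ORDER)}


def spec_sort_key(prop):
    """Known properties sort by spec position; custom ones follow alphabetically."""
    if prop in _RANK:
        return (0, _RANK[prop], "")
    return (1, 0, prop)


def ordered_items(obj):
    """Return obj's (key, value) pairs in spec order, custom properties last."""
    return [(k, obj[k]) for k in sorted(obj, key=spec_sort_key)]
```

On Python 2.7 (which the reporters were using), wrap the result in `collections.OrderedDict` before passing it to `json.dumps`, since plain dicts there do not preserve insertion order.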

emmanvg commented on June 12, 2024

We could introduce a new method to serialize objects that skips the property sorting and any other formatting, for when all you want is machine-to-machine operation. That may help performance a lot in cases where the "pretty" SDOs/SROs are not needed.

emmanvg commented on June 12, 2024

I can look at this if what I proposed seems like a good solution.

clenk commented on June 12, 2024

Maybe _STIXBase could have an option controlling whether its __str__ method uses the "pretty-print" version or the unsorted version. Not sure which should be the default.

gtback commented on June 12, 2024

@emmanvg and I talked about moving the actual serialization of _STIXBase objects into a separate serialize function that has a pretty option, and having __str__ call it with pretty=True (we could make False the default for serialize, though).
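The serialize(pretty=...) split described above could look roughly like the sketch below. `STIXObjectSketch` is a hypothetical stand-in for `_STIXBase`, and its `_ORDER` list is an assumed, abridged ordering; the real library's class and method signatures may differ. The compact form skips reordering and indentation entirely, while `__str__` keeps the human-friendly output.

```python
import json
from collections import OrderedDict


class STIXObjectSketch(object):
    """Hypothetical stand-in for _STIXBase; not the stix2 library's class."""

    # Assumed, abridged property order used only by the pretty form.
    _ORDER = ["type", "id", "created", "modified", "name"]

    def __init__(self, **properties):
        self._inner = dict(properties)

    def serialize(self, pretty=False):
        if not pretty:
            # Machine-to-machine form: no reordering, no indentation.
            return json.dumps(self._inner, separators=(",", ":"))
        # Pretty form: spec-like key order (custom keys last), indented.
        rank = {name: i for i, name in enumerate(self._ORDER)}
        keys = sorted(self._inner,
                      key=lambda k: (0, rank[k], "") if k in rank else (1, 0, k))
        ordered = OrderedDict((k, self._inner[k]) for k in keys)
        return json.dumps(ordered, indent=4)

    def __str__(self):
        # str(obj) keeps the readable output, as proposed in the thread.
        return self.serialize(pretty=True)
```

Making `pretty=False` the default for `serialize()` means bulk callers get the fast path unless they ask otherwise, while `str(obj)` output stays the same for interactive use.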

packet-rat commented on June 12, 2024

Assertion: it seems we're going to have to solve canonical representations if we're ever going to get to encrypting/signing STIX content (?). Research indicates that canonicalizing JSON makes stringification 4 to 5 times slower. If you agree with the assertion, any thoughts on how we can tackle this looming issue? This library/effort is "where the rubber meets the road" in terms of practical reference implementations based on the CTI TC standards, so I'm raising it here.
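As a point of reference, the simplest form of canonical JSON is sorted keys, no insignificant whitespace, and UTF-8 encoding; a signature would then be computed over those bytes. The sketch below (hypothetical helper names `canonical_bytes` and `digest`) hashes that form with SHA-256. It is only an approximation: a full scheme such as RFC 8785 (JSON Canonicalization Scheme) also pins down number and string formatting, which this sketch does not handle.

```python
import hashlib
import json


def canonical_bytes(obj):
    """Approximate canonical form: sorted keys, compact separators, UTF-8."""
    # Sorting keys and dropping whitespace yields one byte string per logical
    # object, as long as the data avoids float formatting edge cases.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def digest(obj):
    """SHA-256 over the canonical bytes; a signer would sign these bytes."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()
```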

gtback commented on June 12, 2024

We already had a "canonical" (or at least, standardized) representation in python-stix2 v0.2: keys in alphabetic order. This was necessary to ensure tests would pass repeatably. In v0.3, we changed the order from alphabetic to (roughly) the order in the spec, with custom properties at the end (but still repeatable). The implementation of this ordering is what's causing the performance impact, but we should be able to speed it up significantly, to roughly what we had before.

I've always asserted that the actual JSON (de-)serialization speed is negligible compared to whatever process creates the content being STIX-ified, or acts on the parsed STIX content. The exception would be if you're just shoveling STIX content around (as a sharing hub, for instance), in which case you should parse only enough to examine it, and then send the original, unmodified content.
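The sharing-hub pattern above can be sketched as follows: parse just enough to decide where a bundle goes, then forward the bytes exactly as received, so no re-serialization (and no sorting cost) happens. `route_bundle` and the `sinks` mapping are hypothetical names used only for illustration.

```python
import json


def route_bundle(raw, sinks):
    """Hypothetical hub step: inspect a bundle, then forward it unchanged.

    raw   -- the bundle exactly as received, as bytes
    sinks -- dict mapping an object type to a list of forwarded payloads
    """
    # Parse only to see which object types the bundle carries.
    bundle = json.loads(raw.decode("utf-8"))
    types = {o["type"] for o in bundle.get("objects", []) if "type" in o}
    for t in types:
        if t in sinks:
            # Forward the original bytes; never re-serialize the parsed copy.
            sinks[t].append(raw)
    return types
```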

Signing STIX content has been discussed, but is not in the current scope for STIX 2.1.
