Comments (25)

emmanvg avatar emmanvg commented on May 20, 2024 2

Hi @Cyb3rWard0g, the behavior you are experiencing may be a bug that needs to be addressed in stix2. I noticed that using dict() on a stix2 object will not correctly serialize datetime properties. While we figure this out, you can perform the following workaround (look at the attached image). In this step

> rename some of the columns and put everything in dictionaries.

Don't put them in dictionaries and instead serialize them and load the strings back to have a JSON object that you can pass to the pandas library. I hope this helps!

image

from cti.

chisholm avatar chisholm commented on May 20, 2024 2

Actually, this existing pandas issue is probably the cause. It was opened only 11 days ago and is still open.

emmanvg avatar emmanvg commented on May 20, 2024 1

@Cyb3rWard0g, I took a quick look at your project. From your usage basics notebook I can see what you meant by modifying the pandas matrix and how you manipulate the STIX data (I am not a pandas expert). Looking at your code, though, this method creates new dictionaries by reading properties directly from stix2 objects. That is why obj['created'] (and any other property that uses STIXdatetime) returns the STIXdatetime object itself. Older versions of pandas may have done additional work to serialize those datetime objects (since STIXdatetime subclasses datetime.datetime). Note that calling str(obj['created']) produces a string in a format similar to the ones in your usage documentation. Ultimately, I will leave the choice of workaround up to you, but you may have found a legitimate bug in cti-python-stix2.

chisholm avatar chisholm commented on May 20, 2024 1

> Hey chisholm ! very interesting. so when you say
>
> > In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas!
>
> You mean to follow the str(obj['created']) example to any object returning STIXdatetime correct?

If obj['created'] would be passed into pandas, yeah. Or convert to a datetime.datetime instance (i.e. from STIXdatetime to its superclass). Anything but passing STIXdatetime instances through :)

> I mentioned that they added this to their Pandas version 0.23.0

Actually, both of our stacktraces track further back than cast.py (or datetimes.py in my case). They track back to a "tslib.pyx" file. I suspect that's where the problem is, and is probably not Python code. My stacktrace references an array_to_datetime function. That is probably this. Looks like a funny combination of C and Python. I'm not familiar with how those languages integrate, but I suspect the actual bug/problem location is there. Oh yeah, my stacktrace specifically references line 544, which is here, and there is the nanosecond reference. From the pandas issue and that code comment in tslib.pyx (# i.e. a Timestamp object), it seems like they are trying to distinguish between a pandas Timestamp instance and a plain datetime.datetime instance, and aren't doing it correctly.
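
The conversion from a datetime subclass to its plain superclass, suggested earlier in this comment, could be sketched roughly as follows. This is only an illustration: `SubclassedDatetime` is a hypothetical stand-in for STIXdatetime and `to_plain_datetime` is a made-up helper name, neither of which comes from stix2 or pandas.

```python
import datetime


class SubclassedDatetime(datetime.datetime):
    """Hypothetical stand-in for a datetime subclass such as STIXdatetime."""


def to_plain_datetime(value):
    """Rebuild the value as an exact datetime.datetime, keeping tzinfo and fold."""
    return datetime.datetime(
        value.year, value.month, value.day,
        value.hour, value.minute, value.second, value.microsecond,
        tzinfo=value.tzinfo, fold=value.fold,
    )


created = SubclassedDatetime(2017, 5, 31, 21, 31, 52, 748000,
                             tzinfo=datetime.timezone.utc)
plain = to_plain_datetime(created)
print(type(created).__name__, "->", type(plain).__name__)
```

The explicit constructor call is used because, in recent Python versions, methods such as replace() may return an instance of the subclass rather than a plain datetime.datetime.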

chisholm avatar chisholm commented on May 20, 2024 1

> I pass the STIX object to a dictionary ...

I think you must mean the same bit of code emmanvg linked to earlier in the thread. Yeah, STIXdatetime is not JSON serializable via the default encoder. It's not a bug. Python's built-in encoder doesn't know anything about stix2 types. The types supported by the built-in encoder are listed here. As he notes, people who stick with the plain stix2 objects can call the serialize() method to obtain JSON. If you've got stix2 types embedded in a different data structure and want to serialize it all to JSON, you could try the encoder provided by the library, which he linked to. E.g. json.dumps(group_name, cls=STIXJSONEncoder).
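
The point about the built-in encoder can be reproduced with the standard library alone. The sketch below assumes a plain datetime.datetime in place of STIXdatetime; the `encode_extra` function and the `record` contents are illustrative, not part of stix2.

```python
import datetime
import json

record = {
    "group": "APT29",
    "created": datetime.datetime(2017, 5, 31, 21, 31, 52, 748000),
}

# The default encoder only knows dict, list, str, numbers, bool and None.
try:
    json.dumps(record)
    default_encoder_failed = False
except TypeError:
    default_encoder_failed = True


def encode_extra(value):
    """Called by json.dumps only for values the built-in encoder rejects."""
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


encoded = json.dumps(record, default=encode_extra)
print(default_encoder_failed, encoded)
```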

emmanvg avatar emmanvg commented on May 20, 2024 1

No problem @Cyb3rWard0g! I think it is OK to close. If you have stix2 library problems in the future I would recommend opening them there to better address it.

chisholm avatar chisholm commented on May 20, 2024 1

Yep. The main problem in this issue was a pandas bug, not a stix2 bug.

gtback avatar gtback commented on May 20, 2024 1

Thanks for the detailed report, @Cyb3rWard0g . And thanks for the helpful follow-ups, @emmanvg and @chisholm . Based on what everyone has said, this is:

  • definitely not an issue in the mitre/cti STIX content.
  • almost definitely related to a bug in pandas (linked above).
  • likely not a bug we need/want to fix in python-stix2. The correct way to convert a stix2 object to a string is with serialize(). If you want to serialize stix2 objects in part of a larger data structure, you can use STIXJSONEncoder as @chisholm suggests, or create your own encoder class that handles datetimes, STIX objects, and anything else that's needed.
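
A custom encoder class of the kind mentioned in the last bullet might look something like this. It is a minimal sketch using only the standard library; `DatetimeAwareEncoder` is a hypothetical name and is not the STIXJSONEncoder shipped with stix2.

```python
import datetime
import json


class DatetimeAwareEncoder(json.JSONEncoder):
    """Illustrative encoder; not the stix2 STIXJSONEncoder."""

    def default(self, o):
        # default() is only consulted for types the base encoder rejects.
        if isinstance(o, datetime.datetime):
            return o.isoformat()
        if isinstance(o, (set, frozenset)):
            return sorted(o)
        return super().default(o)  # raises TypeError for anything else


payload = {
    "group": "APT29",
    "created": datetime.datetime(2017, 5, 31, 21, 31, 52),
    "aliases": {"Cozy Bear", "The Dukes"},
}
text = json.dumps(payload, cls=DatetimeAwareEncoder, sort_keys=True)
print(text)
```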

There's not currently a way to convert a stix2 object to a "clean" Python dict with only built-in data types, other than calling serialize() and then json.loads()-ing the result. We could add something, but I don't know if the complexity of doing that is worth the minor performance benefits. In any event, that's not an issue in this repo, so I'm going to close this, but feel free to open another issue in cti-python-stix2 to discuss 😀
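
The serialize-then-json.loads() round trip described above can be sketched as follows. To keep the example runnable without stix2 installed, `FakeStixObject` is a hypothetical stand-in that merely exposes a serialize() method returning a JSON string; `to_plain_dict` is likewise an illustrative helper name.

```python
import datetime
import json


class FakeStixObject:
    """Hypothetical stand-in exposing serialize(), as stix2 objects do."""

    def __init__(self, **properties):
        self._properties = properties

    def serialize(self):
        # Assumption: datetimes are the only non-JSON values stored here.
        return json.dumps(self._properties, default=lambda v: v.isoformat())


def to_plain_dict(obj):
    """Round-trip through JSON so only built-in Python types remain."""
    return json.loads(obj.serialize())


group = FakeStixObject(type="intrusion-set", name="APT29",
                       created=datetime.datetime(2017, 5, 31, 21, 31, 52))
plain_group = to_plain_dict(group)
print(plain_group)
```

With a real stix2 object the same helper would be applied to the object itself, at the cost of one extra encode and decode pass.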

Thanks again!

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024 1

Thank you so much for all the detailed information and for teaching me the right way to do things. I learned a lot in this thread. Thank you again. Any future issues will be opened in the STIX repo. 👍 I hope you all have a great weekend!!!

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024 1

Thank you @emmanvg @chisholm @gtback 👍

isaisabel avatar isaisabel commented on May 20, 2024 1

As far as I know yes that's the best way to do it. You may get more info if you ask on the cti-python-stix2 issue tracker though since the team over there would know more about the capabilities of their library.

FWIW if you're trying to use ATT&CK with Pandas in 2021, we have an official way of doing that: https://github.com/mitre-attack/mitreattack-python/tree/master/mitreattack/attackToExcel#accessing-the-pandas-dataframes

jburns12 avatar jburns12 commented on May 20, 2024

Hey @Cyb3rWard0g...I'm going to pass this along to the folks working on the python-stix2 lib to get their feedback on this. As always, thanks so much for bringing this to our attention!

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

Thank you @jburns12 👍 it is probably nothing, but I figured I would let you guys know first just in case 😄 . Np. Thank you guys!

emmanvg avatar emmanvg commented on May 20, 2024

Hi @Cyb3rWard0g, it sounds like you might be working with stix2 objects rather than JSON when you reach line 5 in this section of the code. Since I cannot see at which step in your code this happened, my suggestion is to call serialize() on those objects.

<ipython-input-5-e6f5fde53c7b> in <module>()
      3 print(" ")
      4 df = all_attack['techniques']
----> 5 df = json_normalize(df)
      6 df.reindex(['matrix', 'object created','tactic', 'technique', 'technique id', 'data sources'], axis=1)[0:5]

For example, assuming all_attack['techniques'] is a stix2 object you can perform all_attack['techniques'].serialize(). Let me know if that helped!

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

Hey @emmanvg Thank you for the suggestion and for taking a look at this issue. When I work with the data from ATT&CK content in STIX, I first parse every field to rename some of the columns and put everything in dictionaries. Those dictionaries then get aggregated in a list. STIX objects do not get passed to the list. I checked every object in techniques with a for loop :

techniques = all_attack['techniques']
for t in techniques:
  print(type(t))

<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
..
....

Now, I don't know how to check whether those dictionaries are also considered STIX objects. Calling serialize() on the objects does not work because I am now working with lists and dictionaries. I loop through the STIX objects, copy them into dictionaries, and then collect those in a long list to manage the fields better and keep them consistent across all objects. I am releasing the first beta version of a script I am working on today. I hope it helps to provide more context to the question. 👍
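
One way to check what is actually stored inside those dictionaries is to walk them and report every value that is not a built-in JSON type. The helper below is a hypothetical sketch (the name `find_non_json_values` and the sample record are made up for illustration) and uses standard-library datetimes in place of stix2 types.

```python
import datetime

JSON_SCALARS = (str, int, float, bool, type(None))


def find_non_json_values(value, path="root"):
    """Yield (path, type name) for each nested value that is not a JSON type."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from find_non_json_values(item, f"{path}[{key!r}]")
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_non_json_values(item, f"{path}[{index}]")
    elif not isinstance(value, JSON_SCALARS):
        yield path, type(value).__name__


record = {
    "technique": "Example Technique",
    "created": datetime.datetime(2017, 5, 31),
    "refs": [1, {"x": datetime.date(2020, 1, 1)}],
}
offenders = list(find_non_json_values(record))
print(offenders)
```

Printing type(t) only inspects the outer container, so a dict can pass that check while still holding values that json.dumps or pandas will not accept.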

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

Hey @emmanvg ! Thank you very much for the information. I will then use that method to convert the STIX objects to JSON first. The reason I rename most of the fields is that when you want to have mappings going matrix -> tactic -> technique -> technique id -> group -> group id -> software -> software id , I cannot use the field "name" or simply "id" since it will create conflicts. I will work on the serialize update. Thank you again!!! 👍
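
The renaming described here, applied after the objects have been converted to plain dictionaries, might be sketched like this. The `rename_fields` helper and the sample technique values are hypothetical placeholders; only the column names and the APT29 / G0016 pair come from the thread.

```python
def rename_fields(record, renames):
    """Return a copy of record with keys renamed according to renames."""
    return {renames.get(key, key): value for key, value in record.items()}


# Illustrative records; both use the generic keys "name" and "id".
technique = {"name": "Example Technique", "id": "T0000"}
group = {"name": "APT29", "id": "G0016"}

# Renaming first lets the two records be merged into one row without clashes.
row = {
    **rename_fields(technique, {"name": "technique", "id": "technique id"}),
    **rename_fields(group, {"name": "group", "id": "group id"}),
}
print(row)
```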

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

I really appreciate the help, and thank you for providing all those details; they make perfect sense!!! 👍 I will try str(obj['created']) then. It seems that will solve the issue, since a string will be returned instead of a STIXdatetime object. This is why I wanted to share it with you guys first before opening all kinds of issues in Pandas without much information on how cti-python-stix2 handles those types. I am glad I could help 👍 Thank you again. I will share some updates on the changes to the script in the next couple of days.

chisholm avatar chisholm commented on May 20, 2024

This is easy to reproduce without python-stix2. If you subclass Python's datetime.datetime class, pandas blows up (this is with Python 3.6):

import datetime
import pandas as pd

class DTTest(datetime.datetime):
    pass

our_dt = DTTest(2012,1,2)
pd.to_datetime(our_dt)

produces:

Traceback (most recent call last):
  File "...", line 8, in <module>
    pd.to_datetime(our_dt)
  File "...\lib\site-packages\pandas\core\tools\datetimes.py", line 469, in to_datetime
    result = _convert_listlike(np.array([arg]), box, format)[0]
  File "...\lib\site-packages\pandas\core\tools\datetimes.py", line 368, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas\_libs\tslib.pyx", line 492, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 544, in pandas._libs.tslib.array_to_datetime
AttributeError: 'DTTest' object has no attribute 'nanosecond'

The stacktrace is a little different (I called a different function, to try to simplify), but I bet it's the same underlying issue. There is no documented "nanosecond" field on datetime. Instance attributes are documented here. It works fine with a plain datetime object. Pandas might be doing some voodoo it shouldn't be doing. I'd ask what's going on in a pandas help forum. Give them a simple test like this and ask why pandas is misbehaving.

In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas!

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

Hey @chisholm ! very interesting. so when you say

> In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas!

You mean to follow the str(obj['created']) example to any object returning STIXdatetime correct?

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

@chisholm thank you very much for providing more details. It is a 0.23.0 version issue since I am not having the same issues with Pandas 0.22.0 in Python2.7 and 3.6. I mentioned that they added this to their Pandas version 0.23.0 https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L920

            # we might have a sequence of the same-datetimes with tz's
            # if so coerce to a DatetimeIndex; if they are not the same,
            # then these stay as object dtype, xref GH19671
            try:
                from pandas._libs.tslibs import conversion
                from pandas import DatetimeIndex

                values, tz = conversion.datetime_to_datetime64(v)
                return DatetimeIndex(values).tz_localize(
                    'UTC').tz_convert(tz=tz)
            except (ValueError, TypeError):
                pass

I tracked the error I had back to that part of the library.

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

niceee makes sense. Thank you @chisholm 😄

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

Good evening @chisholm . I wanted to provide an update on how I am managing the STIXdatetime object type error. I noticed that even without using Pandas, simply calling json.dumps(stix_object) gives me the same error. I copy the STIX object into a dictionary to apply a standard naming convention to all the data I retrieve from TAXII. For example, when I run this function from my library, it returns a dictionary whose values are still STIX object types.

>>> group_name = lift.get_group_by_alias('Cozy Bear')
>>> group_name
{'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5', 'group_aliases': [u'APT29', u'The Dukes', u'Cozy Bear', u'CozyDuke'], 'grou_description': u'APT29 is threat group that has been attributed to the Russian government and has operated since at least 2008. (Citation: F-Secure The Dukes) (Citation: GRIZZLY STEPPE JAR) This group reportedly compromised the Democratic National Committee starting in the summer of 2015. (Citation: Crowdstrike DNC June 2016)', 'group_references': None, 'id': u'intrusion-set--899ce53f-13a0-479b-a0e4-67d46e241542', 'group': u'APT29', 'matrix': u'mitre-attack', 'created': '2017-05-31T21:31:52.748Z', 'url': u'https://attack.mitre.org/wiki/Group/G0016', 'modified': '2018-04-18T17:59:24.739Z', 'group_id': u'G0016', 'type': u'intrusion-set'}

I can confirm that the created field is a STIXdatetime:

<class 'stix2.utils.STIXdatetime'>

Then I just simply:

>>> import json
>>> json.dumps(group_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: '2017-05-31T21:31:52.748Z' is not JSON serializable

So I was wondering if it is also related to what @emmanvg mentioned earlier in this thread:

> Hi @Cyb3rWard0g, the behavior you are experiencing may be a bug that needs to be addressed in stix2. I noticed that using dict() on a stix2 object will not correctly serialize datetime properties. While we figure this out, you can perform the following workaround (look at the attached image). In this step

Anyway, I am just using the workaround of converting the STIXdatetime object to a str.

str(object['created'])

When I do that I get the following date, which can be passed as a string instead of a STIXdatetime:

"2017-05-31 21:31:52.748000+00:00"

Just sharing some updates on how I am approaching this error 👍 Thank you for all your help and time! Have a great weekend!

emmanvg avatar emmanvg commented on May 20, 2024

@Cyb3rWard0g, you should not call json.dumps(stix_obj) because stix2 objects implement a custom JSONEncoder found here. You should instead use stix_obj.serialize() and it will return a string of the serialized object. You could then turn it back into a dictionary by using json.loads(). Note that datetime properties will be serialized with a format like this one 2017-05-31T21:31:52.748Z. If str(object['created']) works best for you, I'd say continue down that path. 👍
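
The difference between the two string formats mentioned in this thread can be illustrated with the standard library. This is only a sketch: `to_stix_style_timestamp` is a hypothetical helper that assumes a timezone-aware input and millisecond precision, and the real stix2 serializer may handle precision differently.

```python
import datetime


def to_stix_style_timestamp(value):
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SS.mmmZ in UTC."""
    utc = value.astimezone(datetime.timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis


created = datetime.datetime(2017, 5, 31, 21, 31, 52, 748000,
                            tzinfo=datetime.timezone.utc)
via_str = str(created)                      # the str() workaround
via_format = to_stix_style_timestamp(created)
print(via_str)
print(via_format)
```

Either string avoids handing a datetime subclass to pandas or json.dumps; the second form simply matches the timestamps shown in the serialized output.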

Cyb3rWard0g avatar Cyb3rWard0g commented on May 20, 2024

Thank you so much @emmanvg and @chisholm . It now makes more sense. I appreciate the time you guys took to answer my questions and all the details. I think this can be closed then correct?

Morikko avatar Morikko commented on May 20, 2024

> There's not currently a way to convert a stix2 object to a "clean" Python dict with only built-in data types, other than calling serialize() and then json.loads()-ing the result

Just for confirmation, as I found nothing in the docs: is this still the recommended solution today if we need the data as a native dict?
