Comments (25)
Hi @Cyb3rWard0g, the behavior you are experiencing may be a bug that needs to be addressed in stix2. I noticed that using dict() on a stix2 object will not correctly serialize datetime properties. While we figure this out, you can perform the following workaround (look at the attached image). In this step:
rename some of the columns and put everything in dictionaries.
Don't put them in dictionaries; instead, serialize them and load the strings back so that you have a JSON object you can pass to the pandas library. I hope this helps!
from cti.
Actually, this existing pandas issue is probably the cause. Opened only 11 days ago, and is still open.
from cti.
@Cyb3rWard0g, I took a quick look at your project. Based on your usage basics notebook I noticed what you meant by modifying the pandas matrix and how you manipulate the STIX data (I am not a pandas expert). However, looking at your code, this method creates new dictionaries based on direct interaction with stix2 objects. That is why obj['created'] (and any other property that uses STIXdatetime) will return the STIXdatetime object instead. It may be possible that older versions of pandas performed additional work to serialize those datetime objects (since STIXdatetime subclasses datetime.datetime). Note that calling str(obj['created']) produces a string with a format similar to the ones in your usage documentation. Ultimately, I will leave the desired workaround up to you, but you may have found a legitimate bug in cti-python-stix2.
from cti.
Hey chisholm ! very interesting. so when you say
In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas!
You mean to follow the str(obj['created']) example to any object returning STIXdatetime correct?
If obj['created'] would be passed into pandas, yeah. Or convert to a datetime.datetime instance (i.e. from STIXdatetime to its superclass). Anything but passing through STIXdatetime's :)
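A minimal sketch of the "convert to its superclass" idea, using only the standard library. DTSubclass is a hypothetical stand-in for a datetime subclass such as STIXdatetime, and to_plain_datetime is an illustrative helper name, not part of stix2 or pandas:

```python
import datetime


class DTSubclass(datetime.datetime):
    """Hypothetical stand-in for a datetime subclass such as STIXdatetime."""


def to_plain_datetime(value):
    """Build an exact datetime.datetime from any datetime (sub)class instance."""
    return datetime.datetime(
        value.year, value.month, value.day,
        value.hour, value.minute, value.second, value.microsecond,
        tzinfo=value.tzinfo,
    )


created = DTSubclass(2017, 5, 31, 21, 31, 52, 748000,
                     tzinfo=datetime.timezone.utc)
plain = to_plain_datetime(created)
print(type(created).__name__, "->", type(plain).__name__)
```

The copy compares equal to the original value but is no longer an instance of the subclass, so code that checks for the exact datetime.datetime type should treat it as a plain datetime.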
I mentioned that they added this to their Pandas version 0.23.0
Actually, both of our stacktraces track further back than cast.py (or datetimes.py in my case). They track back to a "tslib.pyx" file. I suspect that's where the problem is, and it is probably not Python code. My stacktrace references an array_to_datetime function. That is probably this. Looks like a funny combination of C and Python. I'm not familiar with how those languages integrate, but I suspect the actual bug/problem location is there. Oh yeah, my stacktrace specifically references line 544, which is here, and there is the nanosecond reference. From the pandas issue and that code comment in tslib.pyx (# i.e. a Timestamp object), it seems like they are trying to distinguish between a pandas Timestamp instance and a plain datetime.datetime instance, and aren't doing it correctly.
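To illustrate the kind of distinction being described, here is a small standard-library sketch. It does not reproduce the pandas code; it only shows that a datetime subclass passes an isinstance check, fails an exact-type check, and has no nanosecond attribute, so any logic that assumes "not exactly datetime.datetime means Timestamp" would misfire. DTTest is the same illustrative subclass used elsewhere in this thread:

```python
import datetime


class DTTest(datetime.datetime):
    pass


our_dt = DTTest(2012, 1, 2)

# A subclass instance still counts as a datetime for isinstance checks...
is_datetime = isinstance(our_dt, datetime.datetime)
# ...but it is not exactly datetime.datetime...
is_exactly_datetime = type(our_dt) is datetime.datetime
# ...and it does not gain the Timestamp-style "nanosecond" attribute.
has_nanosecond = hasattr(our_dt, "nanosecond")

print(is_datetime, is_exactly_datetime, has_nanosecond)
```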
from cti.
I pass the STIX object to a dictionary ...
I think you must mean the same bit of code emmanvg linked to earlier in the thread. Yeah, STIXdatetime is not JSON serializable via the default encoder. It's not a bug. Python's built-in encoder doesn't know anything about stix2 types. The types supported by the built-in encoder are listed here. As he notes, people who stick with the plain stix2 objects can call the serialize() method to obtain JSON. If you've got stix2 types embedded in a different data structure and want to serialize it all to JSON, you could try the encoder provided by the library, which he linked to. E.g. json.dumps(group_name, cls=STIXJSONEncoder).
from cti.
No problem @Cyb3rWard0g! I think it is OK to close. If you have stix2 library problems in the future I would recommend opening them there to better address it.
from cti.
Yep. The main problem in this issue was a pandas bug, not a stix2 bug.
from cti.
Thanks for the detailed report, @Cyb3rWard0g. And thanks for the helpful follow-ups, @emmanvg and @chisholm. Based on what everyone has said, this is:
- definitely not an issue in the mitre/cti STIX content.
- almost definitely related to a bug in pandas (linked above).
- likely not a bug we need/want to fix in python-stix2. The correct way to convert a stix2 object to a string is with serialize(). If you want to serialize stix2 objects as part of a larger data structure, you can use STIXJSONEncoder as @chisholm suggests, or create your own encoder class that handles datetimes, STIX objects, and anything else that's needed.
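As a rough sketch of the "create your own encoder class" option, the following uses only the standard library. DatetimeEncoder is a hypothetical name, it handles datetimes only (not STIX objects), and it emits Python's isoformat() output rather than the STIX timestamp format; the sample dictionary is placeholder data modeled on the output shown in this thread:

```python
import datetime
import json


class DatetimeEncoder(json.JSONEncoder):
    """Hypothetical encoder that turns datetime values into ISO 8601 strings."""

    def default(self, o):
        # default() is only called for values the base encoder cannot handle.
        if isinstance(o, datetime.datetime):
            return o.isoformat()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


group = {
    "group": "APT29",
    "created": datetime.datetime(2017, 5, 31, 21, 31, 52, 748000,
                                 tzinfo=datetime.timezone.utc),
}

as_json = json.dumps(group, cls=DatetimeEncoder)
clean = json.loads(as_json)  # only built-in types remain after the round trip
print(clean["created"])
```

Because the isinstance check also matches subclasses of datetime.datetime, the same encoder should cover datetime subclasses embedded in a larger structure.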
There's not currently a way to convert a stix2 object to a "clean" Python dict with only built-in data types, other than calling serialize() and then json.loads()-ing the result. We could add something, but I don't know if the complexity of doing that is worth the minor performance benefits. In any event, that's not an issue in this repo, so I'm going to close this, but feel free to open another issue in cti-python-stix2 to discuss 😀
Thanks again!
from cti.
Thank you so much for all the detailed information and for teaching me the right way to do things. I learned a lot in this issue thread. Thank you again; any future issues will be opened in the STIX repo. 👍 I hope you all have a great weekend!!!
from cti.
Thank you @emmanvg @chisholm @gtback 👍
from cti.
As far as I know yes that's the best way to do it. You may get more info if you ask on the cti-python-stix2 issue tracker though since the team over there would know more about the capabilities of their library.
FWIW if you're trying to use ATT&CK with Pandas in 2021, we have an official way of doing that: https://github.com/mitre-attack/mitreattack-python/tree/master/mitreattack/attackToExcel#accessing-the-pandas-dataframes
from cti.
Hey @Cyb3rWard0g...I'm going to pass this along to the folks working on the python-stix2 lib to get their feedback on this. As always, thanks so much for bringing this to our attention!
from cti.
Thank you @jburns12 👍 it is probably nothing, but I figured I would let you guys know first just in case 😄 . Np. Thank you guys!
from cti.
Hi @Cyb3rWard0g, it sounds like you might be working with stix2 objects rather than JSON when you reach line 5 in this section of the code. Since I cannot see at which step in your code this happened, my suggestion is to call serialize() on those objects.
<ipython-input-5-e6f5fde53c7b> in <module>()
3 print(" ")
4 df = all_attack['techniques']
----> 5 df = json_normalize(df)
6 df.reindex(['matrix', 'object created','tactic', 'technique', 'technique id', 'data sources'], axis=1)[0:5]
For example, assuming all_attack['techniques'] is a stix2 object you can perform all_attack['techniques'].serialize(). Let me know if that helped!
from cti.
Hey @emmanvg Thank you for the suggestion and for taking a look at this issue. When I work with the data from ATT&CK content in STIX, I first parse every field to rename some of the columns and put everything in dictionaries. Those dictionaries then get aggregated in a list. STIX objects do not get passed to the list. I checked every object in techniques with a for loop :
techniques = all_attack['techniques']
for t in techniques:
    print(type(t))
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
..
....
Now, I don't know how to check if those dictionaries are also considered STIX objects. When I call serialize() on the objects it does not work because I am now working with lists and dictionaries. I loop through STIX objects and then pass them to dictionaries and then to a long list to manage the fields better and keep it consistent across all objects. I am releasing this first beta version of a script I am working on today. I hope it helps to provide more context to the question. 👍
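One way to check what is actually inside those dictionaries is to walk the structure and report every value that is not a built-in JSON-friendly type. The sketch below is an assumption about how that could be done, not something from the script itself; find_non_native and the sample rows are hypothetical, and a plain datetime stands in for a STIXdatetime value:

```python
import datetime

# Leaf types the default json encoder accepts without help.
JSON_NATIVE = (str, int, float, bool, type(None))


def find_non_native(value, path="$"):
    """Yield (path, type name) for leaf values that are not JSON-native."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from find_non_native(item, "%s.%s" % (path, key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from find_non_native(item, "%s[%d]" % (path, index))
    elif not isinstance(value, JSON_NATIVE):
        yield path, type(value).__name__


# Placeholder rows; the datetime stands in for a STIXdatetime value.
techniques = [
    {"technique": "Example technique",
     "object created": datetime.datetime(2017, 5, 31)},
    {"technique": "Another technique",
     "object created": "2017-05-31T21:31:52.748Z"},
]

problems = list(find_non_native(techniques))
print(problems)
```

Here every row is a plain dict, yet the first one still carries a non-native value, which matches the situation described above where type(t) reports dict for every element.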
from cti.
Hey @emmanvg ! Thank you very much for the information. I will then use that method to convert the STIX objects to JSON first. The reason why I rename most of the fields is that when you want to have mappings going matrix -> tactic -> technique -> technique id -> group -> group id -> software -> software id , I cannot use the field "name" or simply "id" since it will create conflicts. I will work on the serialize update. Thank you again!!! 👍
from cti.
I really appreciate the help, and thank you for providing all those details; they make perfect sense!!! 👍 I will try str(obj['created']) then. It seems that will solve the issue, since it will return a string instead of a STIXdatetime object. This is why I wanted to share it with you guys first before opening all kinds of issues in pandas without much information on how cti-python-stix2 was handling those functions. I am glad I could help 👍 Thank you again. I will share some updates on the changes to the script in the next couple of days.
from cti.
This is easy to reproduce without python-stix2. If you subclass Python's datetime.datetime class, pandas blows up (this is with Python 3.6):
import datetime
import pandas as pd

class DTTest(datetime.datetime):
    pass

our_dt = DTTest(2012,1,2)
pd.to_datetime(our_dt)
produces:
Traceback (most recent call last):
  File "...", line 8, in <module>
    pd.to_datetime(our_dt)
  File "...\lib\site-packages\pandas\core\tools\datetimes.py", line 469, in to_datetime
    result = _convert_listlike(np.array([arg]), box, format)[0]
  File "...\lib\site-packages\pandas\core\tools\datetimes.py", line 368, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas\_libs\tslib.pyx", line 492, in pandas._libs.tslib.array_to_datetime
  File "pandas\_libs\tslib.pyx", line 544, in pandas._libs.tslib.array_to_datetime
AttributeError: 'DTTest' object has no attribute 'nanosecond'
The stacktrace is a little different (I called a different function, to try to simplify), but I bet it's the same underlying issue. There is no documented "nanosecond" field on datetime. Instance attributes are documented here. It works fine with a plain datetime object. Pandas might be doing some voodoo it shouldn't be doing. I'd ask what's going on in a pandas help forum. Give them a simple test like this and ask why pandas is misbehaving.
In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas!
from cti.
Hey @chisholm ! very interesting. so when you say
In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas!
You mean to follow the str(obj['created']) example to any object returning STIXdatetime correct?
from cti.
@chisholm thank you very much for providing more details. It is a 0.23.0 version issue, since I am not having the same issues with Pandas 0.22.0 in Python 2.7 and 3.6. I mentioned that they added this to their Pandas version 0.23.0 https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L920
# we might have a sequence of the same-datetimes with tz's
# if so coerce to a DatetimeIndex; if they are not the same,
# then these stay as object dtype, xref GH19671
try:
    from pandas._libs.tslibs import conversion
    from pandas import DatetimeIndex
    values, tz = conversion.datetime_to_datetime64(v)
    return DatetimeIndex(values).tz_localize(
        'UTC').tz_convert(tz=tz)
except (ValueError, TypeError):
    pass
I tracked the error I had back to that part of the library.
from cti.
niceee makes sense. Thank you @chisholm 😄
from cti.
Good evening @chisholm. I wanted to provide an update on how I am managing the STIXdatetime object type error. I noticed that even without using pandas, simply calling json.dumps(stix_object) gives me the same error. I pass the STIX object to a dictionary to apply a standard naming convention to all the data I retrieve from TAXII. For example, when I run the function from my library, it returns a dictionary with STIX object types.
>>> group_name = lift.get_group_by_alias('Cozy Bear')
>>> group_name
{'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5', 'group_aliases': [u'APT29', u'The Dukes', u'Cozy Bear', u'CozyDuke'], 'grou_description': u'APT29 is threat group that has been attributed to the Russian government and has operated since at least 2008. (Citation: F-Secure The Dukes) (Citation: GRIZZLY STEPPE JAR) This group reportedly compromised the Democratic National Committee starting in the summer of 2015. (Citation: Crowdstrike DNC June 2016)', 'group_references': None, 'id': u'intrusion-set--899ce53f-13a0-479b-a0e4-67d46e241542', 'group': u'APT29', 'matrix': u'mitre-attack', 'created': '2017-05-31T21:31:52.748Z', 'url': u'https://attack.mitre.org/wiki/Group/G0016', 'modified': '2018-04-18T17:59:24.739Z', 'group_id': u'G0016', 'type': u'intrusion-set'}
I can confirm that the created field is a STIXdatetime:
<class 'stix2.utils.STIXdatetime'>
Then I just simply:
>>> import json
>>> json.dumps(group_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: '2017-05-31T21:31:52.748Z' is not JSON serializable
So I was wondering if it is also related to what @emmanvg mentioned earlier in this thread
Hi @Cyb3rWard0g, the behavior you are experiencing may be a bug that needs to be addressed in stix2. I noticed that using dict() on a stix2 object will not correctly serialize datetime properties. While we figure this out, you can perform the following workaround (look at the attached image). In this step
Anyway, I am just using the workaround of converting the STIXdatetime object to a str:
str(object['created'])
When I do that I get the following date, and it can be passed as a string instead of a STIXdatetime:
"2017-05-31 21:31:52.748000+00:00"
Just sharing some updates on how I am approaching this error 👍 Thank you for all your help and time! Have a great weekend!
from cti.
@Cyb3rWard0g, you should not call json.dumps(stix_obj) because stix2 objects implement a custom JSONEncoder found here. You should instead use stix_obj.serialize() and it will return a string of the serialized object. You could then turn it back into a dictionary by using json.loads(). Note that datetime properties will be serialized with a format like this one: 2017-05-31T21:31:52.748Z. If str(object['created']) works best for you, I'd say continue down that path. 👍
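To make the difference between the two string formats concrete, here is a standard-library sketch using a plain datetime with the same value as the example above. The second format only mimics the serialized form quoted in this thread for a UTC timestamp with millisecond precision; it is an illustration, not the library's own implementation:

```python
import datetime

created = datetime.datetime(2017, 5, 31, 21, 31, 52, 748000,
                            tzinfo=datetime.timezone.utc)

# What str() gives for a timezone-aware datetime.
as_str = str(created)

# A hand-built string in the same shape as the serialized STIX timestamp
# (assumes the value is already in UTC; sub-millisecond digits are dropped).
millis = created.microsecond // 1000
as_stix_like = created.strftime("%Y-%m-%dT%H:%M:%S") + ".%03dZ" % millis

print(as_str)
print(as_stix_like)
```

Both are ordinary strings, so either can be stored in a dictionary and passed to json.dumps() or pandas; they differ only in layout, which matters if the output is later compared with serialized STIX content.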
from cti.
Thank you so much @emmanvg and @chisholm . It now makes more sense. I appreciate the time you guys took to answer my questions and all the details. I think this can be closed then correct?
from cti.
There's not currently a way to convert a stix2 object to a "clean" Python dict with only built-in data types, other than calling serialize() and then json.loads()-ing the result
Just for confirmation, as I found nothing in the docs: is this still the recommended solution today if we need the data as a native dict?
from cti.