Good evening Team,
I hope you are all having a good day. I have been working with the ATT&CK STIX content for the past week and I want to report a problem that may lie in either Pandas or the STIX library; I am not sure which. I figured it would be good to share it here first in case I am missing something, and in case anyone else runs into trouble using Pandas 0.23.0 (the latest version) with ATT&CK STIX content via TAXII.
I tested the ATT&CK STIX content with Pandas 0.21.0 and 0.22.0 and everything worked. I was getting results like this:
{'contributors': 'NA',
'data sources': [u'File monitoring',
u'Process monitoring',
u'Process command-line parameters'],
'defense bypassed': 'NA',
'description': u'Collected data is staged in a central location or directory prior to Exfiltration. Data may be kept in separate files or combined into one file through techniques such as Data Compressed or Data Encrypted.\n\nInteractive command shells may be used, and common functionality within cmd and bash may be used to copy data into a staging location.\n\nDetection: Processes that appear to be reading files from disparate locations and writing them to the same directory or file may be an indication of data being staged, especially if they are suspected of performing encryption or compression on the files.\n\nMonitor processes and command-line arguments for actions that could be taken to collect and combine files. Remote access tools with built-in features may interact directly with the Windows API to gather and copy to a location. Data may also be acquired and staged through Windows system management tools such as Windows Management Instrumentation and PowerShell.\n\nPlatforms: Linux, macOS, Windows\n\nData Sources: File monitoring, Process monitoring, Process command-line parameters',
'detectable': 'NA',
'detectable description': 'NA',
'difficulty': 'NA',
'difficulty description': 'NA',
'effective permissions': 'NA',
'matrix': u'mitre-attack',
'network requirements': 'NA',
'object created': '2017-05-31T21:30:58.938Z',
'object created by ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5',
'object id': u'attack-pattern--7dd95ff6-712e-4056-9626-312ea4ab4c5e',
'object marking refs': ['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'],
'object modified': '2018-04-18T17:59:24.739Z',
'object type': u'attack-pattern',
'permission required': 'NA',
'platforms': [u'Linux', u'macOS', u'Windows'],
'references': 'NA',
'remote support': 'NA',
'system requirements': 'NA',
'tactic': [u'collection'],
'tactic type': 'NA',
'technique': u'Data Staged',
'technique id': u'T1074',
'url': u'https://attack.mitre.org/wiki/Technique/T1074'}
However, when I tested it with Pandas 0.23.0, I got the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/Library/Python/3.6/lib/python/site-packages/pandas/core/dtypes/cast.py in try_datetime(v)
913 require_iso8601=True,
--> 914 errors='raise')
915 except ValueError:
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-5-e6f5fde53c7b> in <module>()
3 print(" ")
4 df = all_attack['techniques']
----> 5 df = json_normalize(df)
6 df.reindex(['matrix', 'object created','tactic', 'technique', 'technique id', 'data sources'], axis=1)[0:5]
~/Library/Python/3.6/lib/python/site-packages/pandas/io/json/normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
202 # reasonably
203 data = nested_to_record(data, sep=sep)
--> 204 return DataFrame(data)
205 elif not isinstance(record_path, list):
206 record_path = [record_path]
~/Library/Python/3.6/lib/python/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
385 if is_named_tuple(data[0]) and columns is None:
386 columns = data[0]._fields
--> 387 arrays, columns = _to_arrays(data, columns, dtype=dtype)
388 columns = _ensure_index(columns)
389
~/Library/Python/3.6/lib/python/site-packages/pandas/core/frame.py in _to_arrays(data, columns, coerce_float, dtype)
7435 elif isinstance(data[0], collections.Mapping):
7436 return _list_of_dict_to_arrays(data, columns,
-> 7437 coerce_float=coerce_float, dtype=dtype)
7438 elif isinstance(data[0], Series):
7439 return _list_of_series_to_arrays(data, columns,
~/Library/Python/3.6/lib/python/site-packages/pandas/core/frame.py in _list_of_dict_to_arrays(data, columns, coerce_float, dtype)
7558 content = list(lib.dicts_to_array(data, list(columns)).T)
7559 return _convert_object_array(content, columns, dtype=dtype,
-> 7560 coerce_float=coerce_float)
7561
7562
~/Library/Python/3.6/lib/python/site-packages/pandas/core/frame.py in _convert_object_array(content, columns, coerce_float, dtype)
7578 return arr
7579
-> 7580 arrays = [convert(arr) for arr in content]
7581
7582 return arrays, columns
~/Library/Python/3.6/lib/python/site-packages/pandas/core/frame.py in <listcomp>(.0)
7578 return arr
7579
-> 7580 arrays = [convert(arr) for arr in content]
7581
7582 return arrays, columns
~/Library/Python/3.6/lib/python/site-packages/pandas/core/frame.py in convert(arr)
7575 if dtype != object and dtype != np.object:
7576 arr = lib.maybe_convert_objects(arr, try_float=coerce_float)
-> 7577 arr = maybe_cast_to_datetime(arr, dtype)
7578 return arr
7579
~/Library/Python/3.6/lib/python/site-packages/pandas/core/dtypes/cast.py in maybe_cast_to_datetime(value, dtype, errors)
1086 elif not (is_array and not (issubclass(value.dtype.type, np.integer) or
1087 value.dtype == np.object_)):
-> 1088 value = maybe_infer_to_datetimelike(value)
1089
1090 return value
~/Library/Python/3.6/lib/python/site-packages/pandas/core/dtypes/cast.py in maybe_infer_to_datetimelike(value, convert_dates)
948 value = try_datetime(v)
949 elif inferred_type == 'datetime':
--> 950 value = try_datetime(v)
951 elif inferred_type == 'timedelta':
952 value = try_timedelta(v)
~/Library/Python/3.6/lib/python/site-packages/pandas/core/dtypes/cast.py in try_datetime(v)
922 from pandas import DatetimeIndex
923
--> 924 values, tz = conversion.datetime_to_datetime64(v)
925 return DatetimeIndex(values).tz_localize(
926 'UTC').tz_convert(tz=tz)
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_datetime_to_tsobject()
AttributeError: 'STIXdatetime' object has no attribute 'nanosecond'
The reason I initially suspected the STIX library was the error message at the end:
AttributeError: 'STIXdatetime' object has no attribute 'nanosecond'
This error only shows up with version 0.23.0, so I checked what changed in that specific function in pandas:
~/Library/Python/3.6/lib/python/site-packages/pandas/core/dtypes/cast.py in try_datetime(v)
922 from pandas import DatetimeIndex
923
--> 924 values, tz = conversion.datetime_to_datetime64(v)
925 return DatetimeIndex(values).tz_localize(
926 'UTC').tz_convert(tz=tz)
**Pandas Version 0.22.0:**
https://github.com/pandas-dev/pandas/blob/0.22.x/pandas/core/dtypes/cast.py#L879
**Pandas Version 0.23.0:**
https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L908
So they added the following in version 0.23.0:
https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L920
    # we might have a sequence of the same-datetimes with tz's
    # if so coerce to a DatetimeIndex; if they are not the same,
    # then these stay as object dtype, xref GH19671
    try:
        from pandas._libs.tslibs import conversion
        from pandas import DatetimeIndex
        values, tz = conversion.datetime_to_datetime64(v)
        return DatetimeIndex(values).tz_localize(
            'UTC').tz_convert(tz=tz)
    except (ValueError, TypeError):
        pass
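One thing that stands out in that snippet: the `except` clause lists only `ValueError` and `TypeError`, while the traceback above ends in an `AttributeError`. Here is a minimal sketch of why that matters. The `convert` function is a hypothetical stand-in for the real conversion call, not pandas code:

```python
def convert(value):
    # Hypothetical stand-in for the conversion step. It fails the same way
    # the traceback does, with an AttributeError.
    raise AttributeError("'STIXdatetime' object has no attribute 'nanosecond'")


def try_convert(value):
    # Same shape as the quoted snippet: only ValueError and TypeError are
    # swallowed, so any other exception type propagates to the caller.
    try:
        return convert(value)
    except (ValueError, TypeError):
        pass
    return value


def escaped_exception_name(value):
    # Report which exception type, if any, escapes try_convert.
    try:
        try_convert(value)
    except Exception as exc:
        return type(exc).__name__
    return None


print(escaped_exception_name("2017-05-31T21:30:58.938Z"))
```

If that reading is right, the fallback path never gets a chance to leave the values as object dtype, which would explain why the error reaches `json_normalize`.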
I checked the STIXdatetime class arguments and I don't see nanoseconds as an option:
https://github.com/oasis-open/cti-python-stix2/blob/master/stix2/utils.py#L24
I am not sure if there is anything that needs to be done on the STIX library side.
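For what it is worth, the standard library `datetime.datetime` has no `nanosecond` attribute either (its finest field is `microsecond`), so a subclass only has one if it defines it. A small check, where `MyDatetime` is a made-up stand-in for illustration and not the real `STIXdatetime`:

```python
from datetime import datetime, timezone


class MyDatetime(datetime):
    """Hypothetical datetime subclass used only for illustration."""


plain = datetime(2017, 5, 31, 21, 30, 58, 938000, tzinfo=timezone.utc)
sub = MyDatetime(2017, 5, 31, 21, 30, 58, 938000, tzinfo=timezone.utc)

# Neither the stdlib class nor a bare subclass exposes `nanosecond`;
# both expose `microsecond`.
plain_has_ns = hasattr(plain, "nanosecond")
sub_has_ns = hasattr(sub, "nanosecond")
sub_has_us = hasattr(sub, "microsecond")
print(plain_has_ns, sub_has_ns, sub_has_us)
```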
I downgraded the Python 3 Pandas package to 0.22.0 and it worked fine. I didn't want to open an issue with Pandas before asking you whether this makes sense and whether nanoseconds might need to be defined as an argument for the STIXdatetime class.
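One possible workaround, short of downgrading, would be to turn every datetime value into a plain ISO 8601 string before handing the records to `json_normalize`. This is only a sketch under the assumption that the failure is triggered by datetime-subclass values reaching pandas; it has not been verified against 0.23.0. `to_plain` and `FakeStixDatetime` are hypothetical names, and the sample record is made up to mirror the output above:

```python
from datetime import datetime, timezone


def to_plain(value):
    """Recursively replace datetime instances (subclasses included) with ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


class FakeStixDatetime(datetime):
    """Hypothetical stand-in for a datetime subclass such as STIXdatetime."""


records = [
    {
        "technique id": "T1074",
        "platforms": ["Linux", "macOS", "Windows"],
        "object created": FakeStixDatetime(
            2017, 5, 31, 21, 30, 58, 938000, tzinfo=timezone.utc
        ),
    }
]

clean = to_plain(records)
print(clean[0]["object created"])
```

The cleaned list could then be passed to `json_normalize` in place of the original records, at the cost of the timestamp columns arriving as strings.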
I hope you all have a great weekend! No rush at all on this one. I will keep working with Pandas 0.22.0 for now. I don't need pandas to collect or filter the data initially; I use it to present the results better after collecting everything via the STIX and TAXII libraries. Therefore, if you want to close this issue because it involves an external library, I would understand. It is just that the STIXdatetime error message caught my attention, and I wasn't sure whether nanosecond is a standard or something that needs to be defined on the STIX side. If not, then this issue can be closed 😄
Once again guys, great job and thank you for all your help!! I hope you all have a great weekend!!!