Comments (17)
Done 👍
from pyais.
Hi @M0r13n and @Inrixia, regarding your discussion on turn
vs rate_of_turn
. I dont understand how this conversion differs in any way from for example speed
, lat
and lon
? Why is the raw value kept for the turn
field and not for the other fields?
For consistency, and better usability, I suggest to also make this conversion during decoding/encoding in pyais.
from pyais.
@M0r13n is turn actually a useful value without performing the conversion? I haven't needed to use (yet) it so didn't mind leaving it be, but if it's useless without the conversion may as well just convert it.
from pyais.
I haven't been able to check everything yet but I have looked over the changes and they look good. If you want to merge then feel free to. I'll double check all the types etc sometime soon.
from pyais.
Just ran a bunch of tests and using the new types, so far looking great!
Thanks again
from pyais.
Hey @Inrixia :-),
The naming is inconsistent for spare and regional, MessageType18 has reserved and reserved_2 rather than reserved_1 and reserved_2 and both fields change this up on a per message basis.
_1, _2 etc should be used for all fields which have multiple instances such as reserved and spare.
Yep.
Single bit fields accuracy, raim, dte, retransmit, assigned, cs, display, dsc, band, msg22, virtual_aid, power, addressed, band_a, band_b, structured and gnss are boolean according to the docs for some message types and int for others. As these are single bit fields they should all be boolean for all message types.
Yep.
turn according to the docs is a Signed integer with scale - renders as float, suffix is decimal places but its type is int in the library.
As the docs say renders as float, I believe this is actually meant to be a float.
This is a little bit more complicated than that. Currently the lib returns the raw values for turn
. That are values between -127 and 128 and are Integers. You, on the other hand, are talking about the rate of turn. To obtain the rate of turn one needs to divide the raw field value by 4.733 and square it. -> (turn_raw / 4.733)**2
. This value would then render as float.
data should be a string representation of the binary and not a int as you cannot have leading 0's in python integers.
Wouldn't this be the slowest most efficient way of storing binary data? I decided to store the data as an int, because you can apply bitwise operations to the data and easily convert the data to bytes, etc.
reserved/regional naming is completely inconsistent here.
I named the fields the same as the documentation that I used. That why it's sometimes regional and sometimes reserved. https://gpsd.gitlab.io/gpsd/AIVDM.html#_type_19_extended_class_b_cs_position_report
The dict of types you provided in #54 appears to be missing vendorid, model and serial from from Type 24 and addressed, structured, app_id from Type 25 & Type 26
Better?
{'accuracy': <class 'bool'>,
'addressed': <class 'bool'>,
'aid_type': <class 'int'>,
'ais_version': <class 'int'>,
'alt': <class 'int'>,
'app_id': <class 'int'>,
'assigned': <class 'bool'>,
'band': <class 'bool'>,
'callsign': <class 'str'>,
'cs': <class 'bool'>,
'dac': <class 'int'>,
'data': <class 'int'>,
'day': <class 'int'>,
'dest_mmsi': <class 'int'>,
'destination': <class 'str'>,
'display': <class 'bool'>,
'draught': <class 'float'>,
'dsc': <class 'bool'>,
'dte': <class 'bool'>,
'epfd': <class 'int'>,
'fid': <class 'int'>,
'gnss': <class 'bool'>,
'heading': <class 'int'>,
'hour': <class 'int'>,
'imo': <class 'int'>,
'increment1': <class 'int'>,
'increment2': <class 'int'>,
'increment3': <class 'int'>,
'increment4': <class 'int'>,
'interval': <class 'int'>,
'lat': <class 'float'>,
'lon': <class 'float'>,
'maneuver': <class 'int'>,
'minute': <class 'int'>,
'mmsi': <class 'int'>,
'mmsi1': <class 'int'>,
'mmsi2': <class 'int'>,
'mmsi3': <class 'int'>,
'mmsi4': <class 'int'>,
'mmsiseq1': <class 'int'>,
'mmsiseq2': <class 'int'>,
'mmsiseq3': <class 'int'>,
'mmsiseq4': <class 'int'>,
'model': <class 'int'>,
'month': <class 'int'>,
'msg22': <class 'bool'>,
'msg_type': <class 'int'>,
'name': <class 'str'>,
'name_ext': <class 'str'>,
'ne_lat': <class 'int'>,
'ne_lon': <class 'int'>,
'number1': <class 'int'>,
'number2': <class 'int'>,
'number3': <class 'int'>,
'number4': <class 'int'>,
'off_position': <class 'bool'>,
'offset1': <class 'int'>,
'offset1_1': <class 'int'>,
'offset1_2': <class 'int'>,
'offset2': <class 'int'>,
'offset2_1': <class 'int'>,
'offset3': <class 'int'>,
'offset4': <class 'int'>,
'partno': <class 'int'>,
'quiet': <class 'int'>,
'radio': <class 'int'>,
'raim': <class 'bool'>,
'repeat': <class 'int'>,
'reserved_1': <class 'int'>,
'reserved_2': <class 'int'>,
'retransmit': <class 'bool'>,
'second': <class 'int'>,
'seqno': <class 'int'>,
'serial': <class 'int'>,
'ship_type': <class 'int'>,
'shipname': <class 'str'>,
'spare_1': <class 'int'>,
'spare_2': <class 'int'>,
'spare_3': <class 'int'>,
'spare_4': <class 'int'>,
'speed': <class 'float'>,
'station_type': <class 'int'>,
'status': <class 'int'>,
'structured': <class 'bool'>,
'sw_lat': <class 'int'>,
'sw_lon': <class 'int'>,
'text': <class 'str'>,
'timeout1': <class 'int'>,
'timeout2': <class 'int'>,
'timeout3': <class 'int'>,
'timeout4': <class 'int'>,
'to_bow': <class 'int'>,
'to_port': <class 'int'>,
'to_starboard': <class 'int'>,
'to_stern': <class 'int'>,
'turn': <class 'int'>,
'txrx': <class 'int'>,
'type1_1': <class 'int'>,
'type1_2': <class 'int'>,
'type2_1': <class 'int'>,
'vendorid': <class 'str'>,
'virtual_aid': <class 'bool'>,
'year': <class 'int'>}
from pyais.
Wouldn't this be the slowest most efficient way of storing binary data? I decided to store the data as an int, because you can apply bitwise operations to the data and easily convert the data to bytes, etc.
100% correct I incorrectly thought that you were returning the binary values as a integer of 0's and 1's.
However returning as a int
is still really not ideal as for message types which have a large data field such as Type 17 the integer value that represents binary data can grow to massive sizes even for a bigint
and attempting to use this field as a normal int
or even a long
or bigint
in other libraries will not work with large data fields.
Because of this all unknown binary fields like spare should be a bytes
type as this would remove any ambiguity, chance for errors and incompatibility with external and/or strictly typed systems.
This is a little bit more complicated than that. Currently the lib returns the raw values for turn. That are values between -127 and 128 and are Integers. You, on the other hand, are talking about the rate of turn. To obtain the rate of turn one needs to divide the raw field value by 4.733 and square it. -> (turn_raw / 4.733)**2. This value would then render as float.
This makes sense and I'm happy with turn being the integer representation.
Though I noticed you added the rate_of_turn
property to types 1,2,3 but it does not appear when decoding and its types are missing from the type doc is this meant to be a accessible value?
If getting that field working is problematic or going to slow performance I suggest just removing it and requiring the user to perform the conversion.
Lines 536 to 544 in 8b7b74e
Better?
This is great thanks, though going over every field for every message type I have found that the following fields are missing from the types you provided:
course, channel_a, channel_b, power, dest1, empty_1, dest_2, empty_2, band_a, band_b, zonesize
I have gone over all the types for each field and checked to see if there are any inconsistencies, ie MessageType1
returning speed as a float
correctly but MessageType9
returning it as a int
. Inconsistencies in the typing like this cause systems which are type strict to not handle values properly causing major issues.
This is actually what prompted this chain of issues and fixes in the first place when we noticed some fields having issues.
Below are fields which have inconsistent types:
MessageType9
speed is a integer
should be a float
.
Line 682 in 8b7b74e
MessageType5, MessageType6, MessageType12, MessageType21
spare_1 is a boolean
while this is correct as its a single bit value spare_1 can be longer than 1 bit in other message types and thus should be consistent everywhere. Ideally of type bytes
re the binary field statement above.
Line 614 in 8b7b74e
Line 629 in 8b7b74e
Line 731 in 8b7b74e
Line 938 in 8b7b74e
MessageType27
speed is a integer
should be a float
.
course is a integer
should be a float
.
spare_1 is a boolean
should be bytes
re above.
Lines 1334 to 1337 in 8b7b74e
This was actually mentioned in the original PR regarding fixing return types #57 (comment)
And while the reply that they are unsigned integers is valid, it really needs to be consistent to not cause problems re the statement about strict type systems above.
This would be as simple as having the return be 57.0
rather than 57
for example.
MessageType23
ne_lat, ne_lon, sw_lat, sw_lon is a integer
should be a float
to be consistent with MessageType22Broadcast
Lines 1042 to 1045 in 8b7b74e
A minor issue, the description for the decode
function is incorrect now as the types are wrong I can only assume that this may be true for other parts of the library such as tests too, might be something worth checking.
Lines 310 to 315 in 8b7b74e
It's probably a good idea to try and think of a good way to check/enforce types ideally through tests.
I was going to work on something like this to generate a types dict but due to pythons lack of native types support afik the only way to do this would be to decode one of each message type & variant and then check/use the types of all the returned dicts for tests.
How are you generating your types? Or are you just doing them by hand?
from pyais.
Because of this all unknown binary fields like spare should be a bytes type as this would remove any ambiguity, chance for errors and incompatibility with external and/or strictly typed systems.
Seems legit. I also added some handy conversion functions bits2bytes
and bytes2bits
. I also decided to encode those bytes as Base64, when using the messages to_json()
method.
Though I noticed you added the rate_of_turn property to types 1,2,3 but it does not appear when decoding and its types are missing from the type doc is this meant to be a accessible value?
This would mess up the code, due to how attrs
works. We would need to add additional logic to handle raw fields and interpreted fields differently. So I decided to add rate_of_turn
as a nice property, which can be used - but is totally optional.
spare_1 is a boolean while this is correct as its a single bit value spare_1 can be longer than 1 bit in other message types and thus should be consistent everywhere. Ideally of type bytes re the binary field statement above.
I always assumed that spare
field would never be used. They just need to be present so that the iterative decoder uses correct bounds for fields.
It's probably a good idea to try and think of a good way to check/enforce types ideally through tests.
I decided to use attrs
__attrs_post_init__
method to enforce strict types.
def __attrs_post_init__(self) -> None:
"""
Called after __init__ by attrs.
Makes sure that every object has the correct datatype. Therefore
strict types are enforced.
"""
for field in self.fields():
val = getattr(self, field.name)
if val is None:
continue
d_type = field.metadata['d_type']
coerce = field.metadata['coerce']
if coerce is None:
coerce = d_type
try:
coerced_val = coerce_val(val, coerce)
except ValueError as err:
raise ValueError(f"Could not coerce value for field '{field.name}'") from err
setattr(self, field.name, coerced_val)
It's not the most efficient way to handle this, because now for every message all fields are iterated twice. But it is very readable and understandable. The impact is also fairly slim. When decoding those 82758 messages from the included sample NMEA file I got the following results:
master
: Decoding 82758 messages took: 3.63838291168212963-iterative-decoding
: Decoding 82758 messages took: 4.214369058609009
How are you generating your types? Or are you just doing them by hand?
They are generated here
from pyais.
Seems legit. I also added some handy conversion functions
bits2bytes
andbytes2bits
. I also decided to encode those bytes as Base64, when using the messagesto_json()
method.
I presume that asdict()
will still return the binary fields as bytes
though yes?
This would mess up the code, due to how
attrs
works. We would need to add additional logic to handle raw fields and interpreted fields differently. So I decided to addrate_of_turn
as a nice property, which can be used - but is totally optional.
Right, I was just confused as I tried accessing it even outside of asdict()
and was unable to. But as long as it doesn't impact performance I have 0 need for it so it doesn't affect me. That said perhaps you should look into it if you want to make it available for use, or perhaps I am was just incorrectly trying to access it.
I always assumed that
spare
field would never be used. They just need to be present so that the iterative decoder uses correct bounds for fields.
True but potentially they could be of use. Imo they should either be deemed usable and left in or deemed non usable and removed. But regardless of that all fields should have consistent types so it has to be changed to binary, or have the fields name changed. The former of which is the obvious solution without convoluting things.
I decided to use
attrs
__attrs_post_init__
method to enforce strict types.
I don't quite see how this is useful as its only enforcing types at the object/message type level and not overall so will still allow for inconsistencies between objects and message types. Setting the type for each field per object will just allow for the same inconsistencies to exist as they do now.
What is really needed is a test to catch code errors across types by having one source of truth, ie the dict of all fields and their types which each class can be validated against. This would also remove the need to run type checks at runtime which imo is pointless and a unnecessary overhead as any type errors should come from the decoder being incorrectly setup for a field and not change depending on the input given.
Looking at your latest commit 479825e on the https://github.com/M0r13n/pyais/compare/63-fix-inconsistencies branch it looks like your fixing the inconsistent types.
Though I don't understand why mmsi
is being coerced with a str
type when its type is a int
.
from pyais.
I presume that asdict() will still return the binary fields as bytes though yes?
Yes, see
True but potentially they could be of use. Imo they should either be deemed usable and left in or deemed non usable and removed. But regardless of that all fields should have consistent types so it has to be changed to binary, or have the fields name changed. The former of which is the obvious solution without convoluting things.
I just followed your suggestion and made them all bytes. Personally I don't really care, so bytes are totally fine for me :-)
What is really needed is a test to catch code errors across types by having one source of truth, ie the dict of all fields and their types which each class can be validated against.
Done. See. This method iterates over all attributes of a decoded message and makes sure that the decoded values actually match the given datatype.
Though I don't understand why mmsi is being coerced with a str type when its type is a int.
This is a bit messy
The problem with MMSIs is the following:
- during decoding, they are parsed like regular integers
- but their actual datatype is a string that stores leading zeros
This is done by to_mmsi
, whose body looks like:
def to_mmsi(v: typing.Union[str, int]) -> str:
return str(v).zfill(9)
This is a dilemma, because:
- the datatype can't be set to
str
, because then the lib would try to decode the field using ASCII6 - if the datatype remains
int
, everything works, but the type is misleading, asto_mmsi
always returns strings
Currently, the mmsi
field is always a string, even though it's datatype says otherwise. Which is unintuitive.
I see the following possibilities:
- Always use int's for the mmsi (no leading zeros)
- Use the additional
coerce
field to force types and make thing a bit more explicit - Add a
mmsi_str
datatype, which inherits fromstr
I think that the mmsi_str
is a nice compromise.
from pyais.
Done. See.
This is great but actually still will allow for inconsistent types if not paired with another check, as it is checking a test decoded object against its decoding classes types, two classes could have different types for the same named field and this would fail to identify that.
In order to resolve this either a single source of truth for field types would need to be created, and then all classes be validated against that, imo the single source of truth should be built from all the class types and fail/throw when two fields do not match.
Technically speaking you are already doing this here:
Lines 932 to 952 in b50fd4d
But you need to remove the special handling for the types that were inconsistent and should now be fixed (ironically the exact thing this function was meant to identify and prevent).
With this second check in place tests should handle testing individual classes against their expected types using real world data, and validating that all classes conform to the overall schema hopefully removing any chance of issues.
Also previously the above function failed to identify/generate types for some message types re:
This is great thanks, though going over every field for every message type I have found that the following fields are missing from the types you provided:
course, channel_a, channel_b, power, dest1, empty_1, dest_2, empty_2, band_a, band_b, zonesize
Was this issue resolved? I presume it was due to missing some classes when testing or the fact that some classes have split logic.
I decided to use
attrs
__attrs_post_init__
method to enforce strict types.
Was this runtime check removed? With the above changes to type tests it is redundant and just hurts performance.
This is a bit messy
Imo if the only reason its being turned into a string is for padding then its better to just have the library return the integer and leave it up to the user to zfill, this would potentially improve performance (though likely not noticeably) and give the ability to choose to the user.
If you wanted to still support a user getting the string mmsi out of the library you could add a mmsi_str
field as a optional property like rate_of_turn
which is only calculated if the user of the lib explicitly access it (I presume/hope?).
from pyais.
This is great but actually still will allow for inconsistent types if not paired with another check, as it is checking a test decoded object against its decoding classes types, two classes could have different types for the same named field and this would fail to identify that.
I think that we were talking at cross-purposes all along.
-
I tried to ensure that decoded values always have the data type defined by the respective class (e.g. a mmsi is actually of type str and not int, etc.)
-
You, on the other hand, aimed at the fact that identical fields also have identical data types across all classes
I guess that both concerns are valid and do complement each other. And I guess the only thing left to do is to:
But you need to remove the special handling for the types that were inconsistent and should now be fixed
Which I resolved.
Was this issue resolved? I presume it was due to missing some classes when testing or the fact that some classes have split logic.
Yes, this was resolved.
Was this runtime check removed? With the above changes to type tests it is redundant and just hurts performance.
Yes. But it still was useful, because I identified quite a few attributes that had wrong types. But because these were removed and additional unittests were added, this hook is now obsolete.
Imo if the only reason its being turned into a string is for padding then its better to just have the library return the integer and leave it up to the user to zfill, this would potentially improve performance (though likely not noticeably) and give the ability to choose to the user.
Yep. This is also consistent with other popular encoders. I looked at a few and they all seem to return Integers without leading zeros. Fun fact: Removing the conversion of an integer to a string with leading zeros made the lib ~15% faster when decoding. (83k messages).
from pyais.
Awesome! I suspected that removing the string conversion might speed things up a bit
Please link me the branch once you are done with the changes and I can take a look/run some tests.
And thanks again for all the work on resolving these issues
from pyais.
Here you go: #64
from pyais.
This is actually a good question. I do not see any problems. Do you @Inrixia ?
from pyais.
@Inrixia Did you have the chance to look at the PR? I would like to merge it, but would appreciate if you take a look at it.
from pyais.
Great to hear!
from pyais.
Related Issues (20)
- [docs] messages.rst content duplicated. HOT 1
- PiPY out of date? HOT 1
- Stream from serial/COM port HOT 2
- lat lon converters float accuracy HOT 2
- AtoN codes do not match R0126 (A-126) Table 3 or M.1371-5 TABLE 74 HOT 1
- Provide decoding of communication status field in types 1, 2, 3, 4, 9, 11, 18 HOT 5
- pyais fails to decode type 5 messages HOT 6
- Message Type 26 Logic may be incorrect HOT 2
- A (debugging?) print statement is present in code HOT 3
- Message decoding failed HOT 2
- Only one attribute of decode
- Urgent need HOT 4
- Encoding large (Type 5) messages causes NMEA max message length to exceed 82 characters. HOT 8
- Is the checksum check not commonly used? HOT 4
- failed parse some MessageType5 data , it throw 'MissingMultipartMessageException' HOT 6
- Different payload after decode/encode roundtrip HOT 8
- Proposal for update HOT 5
- TypeError: 'dict' object is not callable HOT 2
- Gatehouse wrapper messages feature HOT 8
- Variable-width text fields processed as constant-width ones HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyais.