Code Monkey home page Code Monkey logo

Comments (17)

M0r13n avatar M0r13n commented on July 25, 2024 2

Done 👍

from pyais.

freol35241 avatar freol35241 commented on July 25, 2024 1

Hi @M0r13n and @Inrixia, regarding your discussion on turn vs rate_of_turn. I dont understand how this conversion differs in any way from for example speed, lat and lon? Why is the raw value kept for the turn field and not for the other fields?

For consistency, and better usability, I suggest to also make this conversion during decoding/encoding in pyais.

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024 1

@M0r13n is turn actually a useful value without performing the conversion? I haven't needed to use (yet) it so didn't mind leaving it be, but if it's useless without the conversion may as well just convert it.

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024 1

I haven't been able to check everything yet but I have looked over the changes and they look good. If you want to merge then feel free to. I'll double check all the types etc sometime soon.

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024 1

Just ran a bunch of tests and using the new types, so far looking great!
Thanks again

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

Hey @Inrixia :-),

The naming is inconsistent for spare and regional, MessageType18 has reserved and reserved_2 rather than reserved_1 and reserved_2 and both fields change this up on a per message basis.
_1, _2 etc should be used for all fields which have multiple instances such as reserved and spare.

Yep.

Single bit fields accuracy, raim, dte, retransmit, assigned, cs, display, dsc, band, msg22, virtual_aid, power, addressed, band_a, band_b, structured and gnss are boolean according to the docs for some message types and int for others. As these are single bit fields they should all be boolean for all message types.

Yep.

turn according to the docs is a Signed integer with scale - renders as float, suffix is decimal places but its type is int in the library.
As the docs say renders as float, I believe this is actually meant to be a float.

This is a little bit more complicated than that. Currently the lib returns the raw values for turn. That are values between -127 and 128 and are Integers. You, on the other hand, are talking about the rate of turn. To obtain the rate of turn one needs to divide the raw field value by 4.733 and square it. -> (turn_raw / 4.733)**2. This value would then render as float.

data should be a string representation of the binary and not a int as you cannot have leading 0's in python integers.

Wouldn't this be the slowest most efficient way of storing binary data? I decided to store the data as an int, because you can apply bitwise operations to the data and easily convert the data to bytes, etc.

reserved/regional naming is completely inconsistent here.

I named the fields the same as the documentation that I used. That why it's sometimes regional and sometimes reserved. https://gpsd.gitlab.io/gpsd/AIVDM.html#_type_19_extended_class_b_cs_position_report

The dict of types you provided in #54 appears to be missing vendorid, model and serial from from Type 24 and addressed, structured, app_id from Type 25 & Type 26

Better?
{'accuracy': <class 'bool'>,
 'addressed': <class 'bool'>,
 'aid_type': <class 'int'>,
 'ais_version': <class 'int'>,
 'alt': <class 'int'>,
 'app_id': <class 'int'>,
 'assigned': <class 'bool'>,
 'band': <class 'bool'>,
 'callsign': <class 'str'>,
 'cs': <class 'bool'>,
 'dac': <class 'int'>,
 'data': <class 'int'>,
 'day': <class 'int'>,
 'dest_mmsi': <class 'int'>,
 'destination': <class 'str'>,
 'display': <class 'bool'>,
 'draught': <class 'float'>,
 'dsc': <class 'bool'>,
 'dte': <class 'bool'>,
 'epfd': <class 'int'>,
 'fid': <class 'int'>,
 'gnss': <class 'bool'>,
 'heading': <class 'int'>,
 'hour': <class 'int'>,
 'imo': <class 'int'>,
 'increment1': <class 'int'>,
 'increment2': <class 'int'>,
 'increment3': <class 'int'>,
 'increment4': <class 'int'>,
 'interval': <class 'int'>,
 'lat': <class 'float'>,
 'lon': <class 'float'>,
 'maneuver': <class 'int'>,
 'minute': <class 'int'>,
 'mmsi': <class 'int'>,
 'mmsi1': <class 'int'>,
 'mmsi2': <class 'int'>,
 'mmsi3': <class 'int'>,
 'mmsi4': <class 'int'>,
 'mmsiseq1': <class 'int'>,
 'mmsiseq2': <class 'int'>,
 'mmsiseq3': <class 'int'>,
 'mmsiseq4': <class 'int'>,
 'model': <class 'int'>,
 'month': <class 'int'>,
 'msg22': <class 'bool'>,
 'msg_type': <class 'int'>,
 'name': <class 'str'>,
 'name_ext': <class 'str'>,
 'ne_lat': <class 'int'>,
 'ne_lon': <class 'int'>,
 'number1': <class 'int'>,
 'number2': <class 'int'>,
 'number3': <class 'int'>,
 'number4': <class 'int'>,
 'off_position': <class 'bool'>,
 'offset1': <class 'int'>,
 'offset1_1': <class 'int'>,
 'offset1_2': <class 'int'>,
 'offset2': <class 'int'>,
 'offset2_1': <class 'int'>,
 'offset3': <class 'int'>,
 'offset4': <class 'int'>,
 'partno': <class 'int'>,
 'quiet': <class 'int'>,
 'radio': <class 'int'>,
 'raim': <class 'bool'>,
 'repeat': <class 'int'>,
 'reserved_1': <class 'int'>,
 'reserved_2': <class 'int'>,
 'retransmit': <class 'bool'>,
 'second': <class 'int'>,
 'seqno': <class 'int'>,
 'serial': <class 'int'>,
 'ship_type': <class 'int'>,
 'shipname': <class 'str'>,
 'spare_1': <class 'int'>,
 'spare_2': <class 'int'>,
 'spare_3': <class 'int'>,
 'spare_4': <class 'int'>,
 'speed': <class 'float'>,
 'station_type': <class 'int'>,
 'status': <class 'int'>,
 'structured': <class 'bool'>,
 'sw_lat': <class 'int'>,
 'sw_lon': <class 'int'>,
 'text': <class 'str'>,
 'timeout1': <class 'int'>,
 'timeout2': <class 'int'>,
 'timeout3': <class 'int'>,
 'timeout4': <class 'int'>,
 'to_bow': <class 'int'>,
 'to_port': <class 'int'>,
 'to_starboard': <class 'int'>,
 'to_stern': <class 'int'>,
 'turn': <class 'int'>,
 'txrx': <class 'int'>,
 'type1_1': <class 'int'>,
 'type1_2': <class 'int'>,
 'type2_1': <class 'int'>,
 'vendorid': <class 'str'>,
 'virtual_aid': <class 'bool'>,
 'year': <class 'int'>}

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024

Wouldn't this be the slowest most efficient way of storing binary data? I decided to store the data as an int, because you can apply bitwise operations to the data and easily convert the data to bytes, etc.

100% correct I incorrectly thought that you were returning the binary values as a integer of 0's and 1's.
However returning as a int is still really not ideal as for message types which have a large data field such as Type 17 the integer value that represents binary data can grow to massive sizes even for a bigint and attempting to use this field as a normal int or even a long or bigint in other libraries will not work with large data fields.

Because of this all unknown binary fields like spare should be a bytes type as this would remove any ambiguity, chance for errors and incompatibility with external and/or strictly typed systems.

This is a little bit more complicated than that. Currently the lib returns the raw values for turn. That are values between -127 and 128 and are Integers. You, on the other hand, are talking about the rate of turn. To obtain the rate of turn one needs to divide the raw field value by 4.733 and square it. -> (turn_raw / 4.733)**2. This value would then render as float.

This makes sense and I'm happy with turn being the integer representation.
Though I noticed you added the rate_of_turn property to types 1,2,3 but it does not appear when decoding and its types are missing from the type doc is this meant to be a accessible value?
If getting that field working is problematic or going to slow performance I suggest just removing it and requiring the user to perform the conversion.

pyais/pyais/messages.py

Lines 536 to 544 in 8b7b74e

@property
def rate_of_turn(self) -> typing.Optional[float]:
turn = self.turn
if not turn:
return 0.0
elif abs(turn) == 127 or abs(turn) == 128:
return None
else:
return math.copysign((round(turn / 4.733)) ** 2, turn)

Better?

This is great thanks, though going over every field for every message type I have found that the following fields are missing from the types you provided:
course, channel_a, channel_b, power, dest1, empty_1, dest_2, empty_2, band_a, band_b, zonesize

I have gone over all the types for each field and checked to see if there are any inconsistencies, ie MessageType1 returning speed as a float correctly but MessageType9 returning it as a int. Inconsistencies in the typing like this cause systems which are type strict to not handle values properly causing major issues.
This is actually what prompted this chain of issues and fixes in the first place when we noticed some fields having issues.

Below are fields which have inconsistent types:

MessageType9

speed is a integer should be a float.

speed = bit_field(10, int, default=0, signed=False)

MessageType5, MessageType6, MessageType12, MessageType21

spare_1 is a boolean while this is correct as its a single bit value spare_1 can be longer than 1 bit in other message types and thus should be consistent everywhere. Ideally of type bytes re the binary field statement above.

spare_1 = bit_field(1, bool, default=0)

spare_1 = bit_field(1, bool, default=0)

spare_1 = bit_field(1, bool, default=0, signed=False)

spare_1 = bit_field(1, bool, default=0)

MessageType27

speed is a integer should be a float.
course is a integer should be a float.
spare_1 is a boolean should be bytes re above.

pyais/pyais/messages.py

Lines 1334 to 1337 in 8b7b74e

speed = bit_field(6, int, default=0, signed=False)
course = bit_field(9, int, default=0, signed=False)
gnss = bit_field(1, bool, default=0, signed=False)
spare_1 = bit_field(1, bool, default=0)

This was actually mentioned in the original PR regarding fixing return types #57 (comment)
And while the reply that they are unsigned integers is valid, it really needs to be consistent to not cause problems re the statement about strict type systems above.
This would be as simple as having the return be 57.0 rather than 57 for example.

MessageType23

ne_lat, ne_lon, sw_lat, sw_lon is a integer should be a float to be consistent with MessageType22Broadcast

pyais/pyais/messages.py

Lines 1042 to 1045 in 8b7b74e

ne_lon = bit_field(18, int, from_converter=from_10th, to_converter=to_10th, default=0, signed=True)
ne_lat = bit_field(17, int, from_converter=from_10th, to_converter=to_10th, default=0, signed=True)
sw_lon = bit_field(18, int, from_converter=from_10th, to_converter=to_10th, default=0, signed=True)
sw_lat = bit_field(17, int, from_converter=from_10th, to_converter=to_10th, default=0, signed=True)

A minor issue, the description for the decode function is incorrect now as the types are wrong I can only assume that this may be true for other parts of the library such as tests too, might be something worth checking.

pyais/pyais/messages.py

Lines 310 to 315 in 8b7b74e

>>> nmea = NMEAMessage(b"!AIVDO,1,1,,,B>qc:003wk?8mP=18D3Q3wgTiT;T,0*13").decode()
MessageType18(msg_type=18, repeat=0, mmsi='1000000000', reserved=0, speed=1023,
accuracy=0, lon=181.0, lat=91.0, course=360.0, heading=511, second=31,
reserved_2=0, cs=True, display=False, dsc=False, band=True, msg22=True,
assigned=False, raim=False, radio=410340)
"""

It's probably a good idea to try and think of a good way to check/enforce types ideally through tests.
I was going to work on something like this to generate a types dict but due to pythons lack of native types support afik the only way to do this would be to decode one of each message type & variant and then check/use the types of all the returned dicts for tests.
How are you generating your types? Or are you just doing them by hand?

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

Because of this all unknown binary fields like spare should be a bytes type as this would remove any ambiguity, chance for errors and incompatibility with external and/or strictly typed systems.

Seems legit. I also added some handy conversion functions bits2bytes and bytes2bits. I also decided to encode those bytes as Base64, when using the messages to_json() method.

Though I noticed you added the rate_of_turn property to types 1,2,3 but it does not appear when decoding and its types are missing from the type doc is this meant to be a accessible value?

This would mess up the code, due to how attrs works. We would need to add additional logic to handle raw fields and interpreted fields differently. So I decided to add rate_of_turn as a nice property, which can be used - but is totally optional.

spare_1 is a boolean while this is correct as its a single bit value spare_1 can be longer than 1 bit in other message types and thus should be consistent everywhere. Ideally of type bytes re the binary field statement above.

I always assumed that spare field would never be used. They just need to be present so that the iterative decoder uses correct bounds for fields.

It's probably a good idea to try and think of a good way to check/enforce types ideally through tests.

I decided to use attrs __attrs_post_init__ method to enforce strict types.

    def __attrs_post_init__(self) -> None:
        """
        Called after __init__ by attrs.
        Makes sure that every object has the correct datatype. Therefore
        strict types are enforced.
        """
        for field in self.fields():
            val = getattr(self, field.name)
            if val is None:
                continue

            d_type = field.metadata['d_type']
            coerce = field.metadata['coerce']

            if coerce is None:
                coerce = d_type

            try:
                coerced_val = coerce_val(val, coerce)
            except ValueError as err:
                raise ValueError(f"Could not coerce value for field '{field.name}'") from err

            setattr(self, field.name, coerced_val)

It's not the most efficient way to handle this, because now for every message all fields are iterated twice. But it is very readable and understandable. The impact is also fairly slim. When decoding those 82758 messages from the included sample NMEA file I got the following results:

  • master: Decoding 82758 messages took: 3.638382911682129
  • 63-iterative-decoding: Decoding 82758 messages took: 4.214369058609009

How are you generating your types? Or are you just doing them by hand?

They are generated here

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024

Seems legit. I also added some handy conversion functions bits2bytes and bytes2bits. I also decided to encode those bytes as Base64, when using the messages to_json() method.

I presume that asdict() will still return the binary fields as bytes though yes?

This would mess up the code, due to how attrs works. We would need to add additional logic to handle raw fields and interpreted fields differently. So I decided to add rate_of_turn as a nice property, which can be used - but is totally optional.

Right, I was just confused as I tried accessing it even outside of asdict() and was unable to. But as long as it doesn't impact performance I have 0 need for it so it doesn't affect me. That said perhaps you should look into it if you want to make it available for use, or perhaps I am was just incorrectly trying to access it.

I always assumed that spare field would never be used. They just need to be present so that the iterative decoder uses correct bounds for fields.

True but potentially they could be of use. Imo they should either be deemed usable and left in or deemed non usable and removed. But regardless of that all fields should have consistent types so it has to be changed to binary, or have the fields name changed. The former of which is the obvious solution without convoluting things.

I decided to use attrs __attrs_post_init__ method to enforce strict types.

I don't quite see how this is useful as its only enforcing types at the object/message type level and not overall so will still allow for inconsistencies between objects and message types. Setting the type for each field per object will just allow for the same inconsistencies to exist as they do now.

What is really needed is a test to catch code errors across types by having one source of truth, ie the dict of all fields and their types which each class can be validated against. This would also remove the need to run type checks at runtime which imo is pointless and a unnecessary overhead as any type errors should come from the decoder being incorrectly setup for a field and not change depending on the input given.

Looking at your latest commit 479825e on the https://github.com/M0r13n/pyais/compare/63-fix-inconsistencies branch it looks like your fixing the inconsistent types.
Though I don't understand why mmsi is being coerced with a str type when its type is a int.

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

I presume that asdict() will still return the binary fields as bytes though yes?

Yes, see

True but potentially they could be of use. Imo they should either be deemed usable and left in or deemed non usable and removed. But regardless of that all fields should have consistent types so it has to be changed to binary, or have the fields name changed. The former of which is the obvious solution without convoluting things.

I just followed your suggestion and made them all bytes. Personally I don't really care, so bytes are totally fine for me :-)

What is really needed is a test to catch code errors across types by having one source of truth, ie the dict of all fields and their types which each class can be validated against.

Done. See. This method iterates over all attributes of a decoded message and makes sure that the decoded values actually match the given datatype.

Though I don't understand why mmsi is being coerced with a str type when its type is a int.

This is a bit messy

The problem with MMSIs is the following:

  • during decoding, they are parsed like regular integers
  • but their actual datatype is a string that stores leading zeros

This is done by to_mmsi, whose body looks like:

def to_mmsi(v: typing.Union[str, int]) -> str:
    return str(v).zfill(9)

This is a dilemma, because:

  • the datatype can't be set to str, because then the lib would try to decode the field using ASCII6
  • if the datatype remains int, everything works, but the type is misleading, as to_mmsi always returns strings

Currently, the mmsi field is always a string, even though it's datatype says otherwise. Which is unintuitive.

I see the following possibilities:

  1. Always use int's for the mmsi (no leading zeros)
  2. Use the additional coerce field to force types and make thing a bit more explicit
  3. Add a mmsi_str datatype, which inherits from str

I think that the mmsi_str is a nice compromise.

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024

Done. See.

This is great but actually still will allow for inconsistent types if not paired with another check, as it is checking a test decoded object against its decoding classes types, two classes could have different types for the same named field and this would fail to identify that.

In order to resolve this either a single source of truth for field types would need to be created, and then all classes be validated against that, imo the single source of truth should be built from all the class types and fail/throw when two fields do not match.

Technically speaking you are already doing this here:

pyais/tests/test_decode.py

Lines 932 to 952 in b50fd4d

def test_types_for_messages(self):
"""Make sure that the types are consistent for all messages"""
types = {}
for typ, msg in MSG_CLASS.items():
for field in msg.fields():
d_type = field.metadata['d_type']
f_name = field.name
if f_name in types:
if typ == 9 and f_name == 'speed' and d_type == int:
continue
if f_name == 'spare':
continue
if typ == 27 and f_name == 'speed' and d_type == int:
continue
if typ == 27 and f_name == 'course' and d_type == int:
continue
assert d_type == types[f_name], f"{typ}.{f_name}: {d_type} vs. {types[f_name]}"
else:
types[f_name] = d_type
pprint(types)

But you need to remove the special handling for the types that were inconsistent and should now be fixed (ironically the exact thing this function was meant to identify and prevent).
With this second check in place tests should handle testing individual classes against their expected types using real world data, and validating that all classes conform to the overall schema hopefully removing any chance of issues.

Also previously the above function failed to identify/generate types for some message types re:

This is great thanks, though going over every field for every message type I have found that the following fields are missing from the types you provided:
course, channel_a, channel_b, power, dest1, empty_1, dest_2, empty_2, band_a, band_b, zonesize

Was this issue resolved? I presume it was due to missing some classes when testing or the fact that some classes have split logic.


I decided to use attrs __attrs_post_init__ method to enforce strict types.

Was this runtime check removed? With the above changes to type tests it is redundant and just hurts performance.


This is a bit messy

Imo if the only reason its being turned into a string is for padding then its better to just have the library return the integer and leave it up to the user to zfill, this would potentially improve performance (though likely not noticeably) and give the ability to choose to the user.

If you wanted to still support a user getting the string mmsi out of the library you could add a mmsi_str field as a optional property like rate_of_turn which is only calculated if the user of the lib explicitly access it (I presume/hope?).

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

This is great but actually still will allow for inconsistent types if not paired with another check, as it is checking a test decoded object against its decoding classes types, two classes could have different types for the same named field and this would fail to identify that.

I think that we were talking at cross-purposes all along. 😆

  • I tried to ensure that decoded values always have the data type defined by the respective class (e.g. a mmsi is actually of type str and not int, etc.)

  • You, on the other hand, aimed at the fact that identical fields also have identical data types across all classes

I guess that both concerns are valid and do complement each other. And I guess the only thing left to do is to:

But you need to remove the special handling for the types that were inconsistent and should now be fixed

Which I resolved.

Was this issue resolved? I presume it was due to missing some classes when testing or the fact that some classes have split logic.

Yes, this was resolved.

Was this runtime check removed? With the above changes to type tests it is redundant and just hurts performance.

Yes. But it still was useful, because I identified quite a few attributes that had wrong types. But because these were removed and additional unittests were added, this hook is now obsolete.

Imo if the only reason its being turned into a string is for padding then its better to just have the library return the integer and leave it up to the user to zfill, this would potentially improve performance (though likely not noticeably) and give the ability to choose to the user.

Yep. This is also consistent with other popular encoders. I looked at a few and they all seem to return Integers without leading zeros. Fun fact: Removing the conversion of an integer to a string with leading zeros made the lib ~15% faster when decoding. (83k messages).

from pyais.

Inrixia avatar Inrixia commented on July 25, 2024

Awesome! I suspected that removing the string conversion might speed things up a bit

Please link me the branch once you are done with the changes and I can take a look/run some tests.

And thanks again for all the work on resolving these issues

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

@Inrixia

Here you go: #64

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

This is actually a good question. I do not see any problems. Do you @Inrixia ?

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

@Inrixia Did you have the chance to look at the PR? I would like to merge it, but would appreciate if you take a look at it.

from pyais.

M0r13n avatar M0r13n commented on July 25, 2024

Great to hear!

from pyais.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.