byteverse / country
Haskell data types and functions for countries
License: Other
The functions encodeEnglish and decode form a Prism':
https://hackage.haskell.org/package/lens-4.15.1/docs/src/Control.Lens.Prism.html#prism%27
GHC 9.8 uses this version of deepseq, so it would be nice if country
accepted it.
This is a little tricky because ISO doesn't explicitly spell out what the continents are, but the model most people are familiar with holds that there are seven.
Unlike countries, continents have no assigned numbers. However, it is beneficial to be able to refer to them (internally) by number, and a Word8 is sufficient for this. They can be assigned numbers from 0 to 6 arbitrarily, as an implementation detail available through an Unsafe module.
A function that resolves a country to the continent on which it is located would be useful. That is the chief aim of this addition.
Functions for the English names of continents and for continent codes would be useful as well.
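The proposal above could be sketched roughly as follows. This is only an illustration, not the library's API: the Continent type, the seven constructors, the two-letter codes, and the tiny country lookup table are all assumptions made for the example, and the internal numbers 0 to 6 simply follow constructor order.

```haskell
import Data.Word (Word8)

-- Assumed seven-continent model; constructor order fixes the internal
-- numbers 0..6 (an arbitrary implementation detail).
data Continent
  = Africa | Antarctica | Asia | Europe | NorthAmerica | Oceania | SouthAmerica
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Internal numeric representation; a Word8 is more than enough.
toWord8 :: Continent -> Word8
toWord8 = fromIntegral . fromEnum

fromWord8 :: Word8 -> Maybe Continent
fromWord8 n
  | n <= 6 = Just (toEnum (fromIntegral n))
  | otherwise = Nothing

-- English names (hypothetical function name).
continentNameEnglish :: Continent -> String
continentNameEnglish c = case c of
  Africa -> "Africa"
  Antarctica -> "Antarctica"
  Asia -> "Asia"
  Europe -> "Europe"
  NorthAmerica -> "North America"
  Oceania -> "Oceania"
  SouthAmerica -> "South America"

-- Two-letter codes; these are a common convention, not an ISO standard.
continentCode :: Continent -> String
continentCode c = case c of
  Africa -> "AF"
  Antarctica -> "AN"
  Asia -> "AS"
  Europe -> "EU"
  NorthAmerica -> "NA"
  Oceania -> "OC"
  SouthAmerica -> "SA"

-- Illustrative country lookup keyed by alpha-2 code; a real version would
-- take a Country and cover every entry.
continentOfAlpha2 :: String -> Maybe Continent
continentOfAlpha2 code =
  lookup code [("EE", Europe), ("JP", Asia), ("BR", SouthAmerica)]
```

A real implementation would more likely wrap the Word8 in a newtype, as the library does for Country, and expose the raw number only from the Unsafe module.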
We hit an issue in some code we wrote, where a test confirms that an invalid ISO-3166-1 numeric country code doesn't decode to a country.
That test passed consistently until, in one instance, it didn't. I wasn't able to reproduce this, but after spending some time reading the code I think the issue lies in how the ByteArrays used for country codes are allocated.
The country library uses the encapsulated ByteArray technique in a few places, and it generally looks like this:
https://github.com/andrewthad/country/blob/573499b9368bd86775cbd51d773cfd60622084e9/country/src/Country/Unexposed/Names.hs#L85-L90
The clue comes in the numeric case:
https://github.com/andrewthad/country/blob/573499b9368bd86775cbd51d773cfd60622084e9/country/src/Country/Unexposed/Names.hs#L239-L242
We create this clear function to set the byte at a given index to 0, but then only call it on index 0. The only indexes we touch after allocating the ByteArray are 0 and those corresponding to valid numeric country codes.
But Data.Primitive.ByteArray.newByteArray doesn't initialise the array it gives back.
So when we look up an invalid country code between 1 and 999 in the ByteArray, we get some arbitrary uninitialised byte, which has a small chance of being 1 and therefore of giving back an invalid Country value.
Anyway, if this sounds correct I'm happy to open a PR to fix!
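One possible shape for the fix, as a minimal sketch using the primitive package: zero the whole array with setByteArray before marking the valid codes. The names validityTable and isValidNumeric, the table size of 1000, and the use of 1 as the "valid" marker are assumptions for illustration and not the library's actual code.

```haskell
import Control.Monad.ST (runST)
import Data.Foldable (for_)
import Data.Primitive.ByteArray
  ( ByteArray, indexByteArray, newByteArray, setByteArray
  , unsafeFreezeByteArray, writeByteArray )
import Data.Word (Word16, Word8)

-- | Hypothetical lookup table: byte i is 1 when i is a valid numeric
-- country code and 0 otherwise.
validityTable :: [Word16] -> ByteArray
validityTable codes = runST $ do
  m <- newByteArray 1000
  -- newByteArray does not initialise its contents, so clear every byte
  -- (offset 0, count 1000) rather than only index 0.
  setByteArray m 0 1000 (0 :: Word8)
  -- Mark the valid codes, ignoring anything outside the table.
  for_ (filter (< 1000) codes) $ \c ->
    writeByteArray m (fromIntegral c) (1 :: Word8)
  unsafeFreezeByteArray m

-- | Bounds-checked membership test against the table.
isValidNumeric :: ByteArray -> Word16 -> Bool
isValidNumeric table n =
  n < 1000 && indexByteArray table (fromIntegral n) == (1 :: Word8)
```

With the table fully cleared, a lookup of an unused code between 1 and 999 reads a 0 instead of whatever happened to be in memory.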
Instead of the attoparsec library, consider using the parsers library, which generalises it and leaves the choice of implementation to the end user. The attoparsec library is an instance of the type classes within parsers.
Hi, yesterday I had a bug where I used congo from Country.Identifier rather than congoDemocraticRepublicOfThe. I assumed these were aliases, since people sometimes call the Democratic Republic of the Congo "Congo", but congo is actually the Republic of the Congo.
Would you be open to a PR that distinguishes the two in Country.Identifier with a comment? Looking at the code, if you're open to this, the best way to do it would be to add a comment column to the CSV and pull the comment out of that when present.
I haven't considered all the countries, but some others that might benefit from comments are North/South Korea (koreaDemocraticPeoplesRepublicOf/koreaRepublicOf), laoPeoplesDemocraticRepublic (Laos), and holySee (Rome, The Vatican, Vatican City).
Another way this might be implemented is to pull in the data from aliases.txt to generate comments for each country. Something like:
-- | Suriname, Dutch Guiana, Netherlands Guiana, Republic of Suriname, Republiek Suriname, Surinam
suriname :: Country
This would be more verbose (e.g. Russia has 23 aliases), but might be helpful for people who don't use the English/American names I'm familiar with.
Thoughts?
@andrewthad can you do this? Alternatively, we could figure out a way for me to do it. I don't mind maintaining any CI in whatever project.
How does this project compare to https://hackage.haskell.org/package/iso3166-country-codes?
Functions such as alphaTwoUpper and alphaThreeUpper return a Text that is documented to have a fixed length (2 for alphaTwoUpper), but the type does not enforce this. Consider instead using a data type, data TwoChars = TwoChars Char Char, with functions to and from Text. That pair of functions will form a Prism'. This added safety helps users of the library.
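A minimal sketch of that suggestion, assuming only the text package. The names toText and fromText are made up for the example, and no check for uppercase ASCII is included; only the length is enforced.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Exactly two characters, enforced by construction.
data TwoChars = TwoChars Char Char
  deriving (Eq, Ord, Show)

-- Always succeeds: every TwoChars has a Text rendering.
toText :: TwoChars -> Text
toText (TwoChars a b) = T.pack [a, b]

-- May fail: only a Text of length exactly 2 corresponds to a TwoChars.
fromText :: Text -> Maybe TwoChars
fromText t = case T.unpack t of
  [a, b] -> Just (TwoChars a b)
  _ -> Nothing
```

The pair satisfies fromText (toText x) == Just x, which is the round-trip property a Prism' needs.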
It'd be nice to have wrapper types whose ToJSON instance for a country gives things like "EE" or "EST".
Instead of the two functions encodeNumeric and decodeNumeric, consider a single value of type Prism' Word16 Country (decoding a Word16 may fail; encoding a Country always succeeds). The prism' construction function achieves this.
https://hackage.haskell.org/package/lens-4.15.1/docs/src/Control.Lens.Prism.html#prism%27
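To illustrate the idea without depending on lens, here is a sketch using a simplified record in place of a real Prism'. SimplePrism, its fields build and match, and the toy Country type with three hard-coded codes are all stand-ins invented for the example. With lens itself, the equivalent would presumably be prism' encodeNumeric decodeNumeric, whose type comes out as Prism' Word16 Country given the argument order of prism'.

```haskell
import Data.Word (Word16)

-- Simplified stand-in for lens's Prism' s a: a total "build" direction
-- and a partial "match" direction.
data SimplePrism s a = SimplePrism
  { build :: a -> s
  , match :: s -> Maybe a
  }

-- Toy Country for the example; the real type lives in the country library.
newtype Country = Country Word16
  deriving (Eq, Show)

-- Toy versions of the two functions being combined.
encodeNumeric :: Country -> Word16
encodeNumeric (Country n) = n

decodeNumeric :: Word16 -> Maybe Country
decodeNumeric n
  | n `elem` [4, 250, 840] = Just (Country n)
  | otherwise = Nothing

-- One value carrying both directions.
numeric :: SimplePrism Word16 Country
numeric = SimplePrism { build = encodeNumeric, match = decodeNumeric }
```

The law to keep in mind is that match numeric (build numeric c) should be Just c for every valid c.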
The Word16 is guaranteed to be less than 1000. This can be represented in the type system as three digits.
https://hackage.haskell.org/package/digit
Once that is safely represented, functions to and from Word16 and other number types can be implemented.
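A rough sketch of the three-digit representation, using only base. The Digit type here is a hand-rolled stand-in rather than the type from the digit package, and ThreeDigits, toWord16, and fromWord16 are hypothetical names.

```haskell
import Data.Word (Word16)

-- Hand-rolled decimal digit; derived Enum maps D0..D9 to 0..9.
data Digit = D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9
  deriving (Eq, Ord, Show, Enum, Bounded)

-- Exactly three digits, so every value denotes a number in 0..999.
data ThreeDigits = ThreeDigits Digit Digit Digit
  deriving (Eq, Ord, Show)

-- Total: every ThreeDigits fits in a Word16.
toWord16 :: ThreeDigits -> Word16
toWord16 (ThreeDigits a b c) =
  fromIntegral (fromEnum a * 100 + fromEnum b * 10 + fromEnum c)

-- Partial: only values below 1000 have a three-digit form.
fromWord16 :: Word16 -> Maybe ThreeDigits
fromWord16 n
  | n < 1000 =
      Just (ThreeDigits (digit (n `div` 100))
                        (digit (n `div` 10 `mod` 10))
                        (digit (n `mod` 10)))
  | otherwise = Nothing
  where
    -- Safe here because each argument is in 0..9.
    digit :: Word16 -> Digit
    digit = toEnum . fromIntegral
```

Conversions for other number types could follow the same pattern, with the range check living in the partial direction only.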
No idea what this error really means, tbh, but trying to build under 8.4 I get:
$ stack build
country-0.1.4: configure (lib)
streaming-0.2.1.0: download
country-0.1.4: build (lib)
colonnade-1.2.0: configure
colonnade-1.2.0: build
disjoint-containers-0.2.3: configure
disjoint-containers-0.2.3: build
streaming-0.2.1.0: configure
streaming-0.2.1.0: build
disjoint-containers-0.2.3: copy/register
colonnade-1.2.0: copy/register
streaming-0.2.1.0: copy/register
Log files have been written to: /home/kb/workspace/country/.stack-work/logs/
Progress: 4/7
-- While building custom Setup.hs for package country-0.1.4 using:
/home/kb/.stack/setup-exe-cache/x86_64-linux-nopie/Cabal-simple_mPHDZzAJ_2.2.0.0_ghc-8.4.1 --builddir=.stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.0 build lib:country --ghc-options " -ddump-hi -ddump-to-file -fdiagnostics-color=always"
Process exited with code: ExitFailure 1
Logs have been written to: /home/kb/workspace/country/.stack-work/logs/country-0.1.4.log
Configuring country-0.1.4...
Preprocessing library for country-0.1.4..
Building library for country-0.1.4..
[1 of 8] Compiling Country.Unexposed.Alias ( src/Country/Unexposed/Alias.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.0/build/Country/Unexposed/Alias.o )
[2 of 8] Compiling Country.Unexposed.Encode.English ( src/Country/Unexposed/Encode/English.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.0/build/Country/Unexposed/Encode/English.o )
[3 of 8] Compiling Country.Unexposed.Names ( src/Country/Unexposed/Names.hs, .stack-work/dist/x86_64-linux-nopie/Cabal-2.2.0.0/build/Country/Unexposed/Names.o )
/home/kb/workspace/country/country/src/Country/Unexposed/Names.hs:108:20: error:
Illegal kind: ((:) (GHC.Types.TupleRep ([] :: [] GHC.Types.RuntimeRep)) ((:) GHC.Types.LiftedRep ([] :: [] GHC.Types.RuntimeRep) :: [] GHC.Types.RuntimeRep) :: [] GHC.Types.RuntimeRep)
Did you mean to enable TypeInType?
|
108 | deriving (Eq,Ord,Prim,Hashable,Storable)
| ^^^^
/home/kb/workspace/country/country/src/Country/Unexposed/Names.hs:108:20: error:
Illegal kind: ([] :: [] GHC.Types.RuntimeRep)
Did you mean to enable TypeInType?
|
108 | deriving (Eq,Ord,Prim,Hashable,Storable)
| ^^^^
/home/kb/workspace/country/country/src/Country/Unexposed/Names.hs:108:20: error:
Illegal kind: ((:) GHC.Types.LiftedRep ([] :: [] GHC.Types.RuntimeRep) :: [] GHC.Types.RuntimeRep)
Did you mean to enable TypeInType?
|
108 | deriving (Eq,Ord,Prim,Hashable,Storable)
| ^^^^
/home/kb/workspace/country/country/src/Country/Unexposed/Names.hs:108:20: error:
Illegal kind: ([] :: [] GHC.Types.RuntimeRep)
Did you mean to enable TypeInType?
|
108 | deriving (Eq,Ord,Prim,Hashable,Storable)
| ^^^^
...
The offending line is https://github.com/andrewthad/country/blob/master/country/src/Country/Subdivision.hs#L83 which fails to parse with Haddock, causing the Haddock build to fail.