
haskell-with-utf8's Introduction

with-utf8

Get your IO right on the first try.

Reading files in Haskell is trickier than it could be due to the non-obvious interactions between file encodings and system locale. This library is meant to make it easy once and for all by providing “defaults” that make more sense in the modern world.

See this blog post for more details on why this library needs to exist and for an explanation of some of the opinionated decisions it is based on.

Use

See the documentation on Hackage for details; this is a quick summary.

Step 1: Get it

The library is on Hackage; go ahead and add it to the dependencies of your project.

Step 2: Wrap your main

Import withUtf8 from Main.Utf8 and wrap it around your main:

import Main.Utf8 (withUtf8)

main :: IO ()
main = withUtf8 $ do
  {- ... your main function ... -}
  putStrLn "Hello, мир! ✓"

This will make sure that if your program reads something from stdin or outputs something to stdout/stderr, it will not fail with a runtime error due to encoding issues.
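To see what kind of problem this prevents, here is a deliberately crude, base-only approximation: force UTF-8 on the three standard handles before doing any I/O. This is an illustration under stated assumptions, not how withUtf8 is implemented; the helper name setStdHandlesUtf8 is hypothetical, it never restores the previous encodings, and the library's real behaviour is more nuanced (see its Hackage documentation).

```haskell
import System.IO

-- Hypothetical helper, NOT part of with-utf8: unconditionally switch the
-- standard handles to UTF-8. Unlike withUtf8, nothing is restored afterwards.
setStdHandlesUtf8 :: IO ()
setStdHandlesUtf8 = mapM_ (`hSetEncoding` utf8) [stdin, stdout, stderr]

main :: IO ()
main = do
  setStdHandlesUtf8
  -- Without the line above, an ASCII locale can make this fail at runtime
  -- with an "invalid character" encoding error.
  putStrLn "Привет, мир! ✓"
```

Wrapping main with withUtf8 is preferable to this kind of ad-hoc setup, since it is scoped to the wrapped action.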

Step 3: Read files using UTF-8

If you are going to read a text file (to be precise, if you are going to open a file in text mode), you’ll probably use withFile, openFile, or readFile. Grab the first two from System.IO.Utf8 and the last one from Data.Text.IO.Utf8. Starting from text-2.1, Data.Text.IO.Utf8 is available in the text package itself, hence this module in with-utf8 is now deprecated.

Note: it is best to import these modules qualified.

Note: there is no System.IO.Utf8.readFile because it’s 2024 and you should not read Strings from files.

All these functions make sure that the content is treated as UTF-8, regardless of the system locale.
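The core idea behind these functions can be sketched with base alone: open the file, then set the handle's encoding to utf8 so the locale no longer matters. The helper readFileUtf8 below is hypothetical and not part of with-utf8; it returns a String only to stay within base, whereas the README recommends reading Text.

```haskell
import System.IO

-- Hypothetical helper (not part of with-utf8): read a whole file as UTF-8,
-- ignoring the locale, and force the contents before the handle is closed.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  contents <- hGetContents h
  -- hGetContents is lazy; force it while the handle is still open.
  length contents `seq` pure contents

main :: IO ()
main = do
  let path = "example-utf8.txt"
  -- Write some non-ASCII text explicitly as UTF-8 so the demo is self-contained.
  withFile path WriteMode $ \h -> hSetEncoding h utf8 >> hPutStr h "naïve café ✓"
  s <- readFileUtf8 path
  putStrLn (show (length s) ++ " characters read")
```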

If, for some reason, you really need to use withFile/openFile from base, or you got your file handle from somewhere else, wrap the code that works with it in a call to withHandle from System.IO.Utf8:

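import qualified Data.Text.IO as T
import qualified System.IO as IO
import qualified System.IO.Utf8 as Utf8

doSomethingWithAFile :: IO.Handle -> IO ()
doSomethingWithAFile h = Utf8.withHandle h $ do
    {- ... work with the file ... -}
    -- For example, read the whole file as UTF-8 and echo it.
    contents <- T.hGetContents h
    T.putStr contents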
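The idea of temporarily switching an existing handle to UTF-8 can be sketched with base's bracket, hGetEncoding and hSetEncoding. This is an approximation of the concept only; withUtf8Handle is a hypothetical name, and the real withHandle may differ in details (see the Hackage documentation).

```haskell
import Control.Exception (bracket)
import System.IO

-- Hypothetical helper approximating the idea behind withHandle: switch a
-- handle to UTF-8 for the duration of an action, then restore whatever
-- encoding (or binary mode) it had before, even if the action throws.
withUtf8Handle :: Handle -> IO a -> IO a
withUtf8Handle h action = bracket setUtf8 restore (const action)
  where
    setUtf8 = do
      old <- hGetEncoding h -- Nothing means the handle is in binary mode
      hSetEncoding h utf8
      pure old
    restore Nothing = hSetBinaryMode h True
    restore (Just enc) = hSetEncoding h enc

main :: IO ()
main = withFile "example.txt" WriteMode $ \h -> do
  hSetEncoding h latin1 -- pretend the handle came with another encoding
  withUtf8Handle h (hPutStrLn h "UTF-8 inside: ✓")
  enc <- hGetEncoding h
  print enc -- the previous encoding is back
```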

Step 4: Write files using UTF-8

When writing a file, either open it using withFile/openFile from System.IO.Utf8 or write to it directly with writeFile from Data.Text.IO.Utf8. Starting from text-2.1, Data.Text.IO.Utf8 is available in the text package itself, hence this module in with-utf8 is now deprecated.

Note: it is best to import these modules qualified.

Note: there is no System.IO.Utf8.writeFile.

If, for some reason, you really need to use withFile/openFile from base, do the same as in the previous step: wrap the code that works with the handle in withHandle from System.IO.Utf8.
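For writing, the same principle applies in reverse: set the encoding before the first write. A base-only sketch, where writeFileUtf8 is a hypothetical helper (not the library's API) and the bytes are read back in binary mode to show what actually landed on disk.

```haskell
import System.IO

-- Hypothetical helper (not part of with-utf8): write a String to a file
-- as UTF-8, independently of the current locale.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

main :: IO ()
main = do
  writeFileUtf8 "out.txt" "\9920" -- U+26C0, the symbol used in the Windows issue
  -- Read the raw bytes back in binary mode to check what was written.
  bytes <- withBinaryFile "out.txt" ReadMode $ \h -> do
    b <- hGetContents h
    length b `seq` pure b
  print (map fromEnum bytes)
```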

Troubleshooting

Locales are pretty straightforward, but some people might have their terminals misconfigured for various reasons. To help troubleshoot any potential issues, this package comes with a tool called utf8-troubleshoot.

This tool outputs some basic information about locale settings in the OS and what they end up being mapped to in Haskell. If you are looking for help, please provide the output of this tool; if you are helping someone, ask them to run this tool and share its output.
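To illustrate where such information comes from, here is a much smaller, hypothetical cousin of utf8-troubleshoot. It is not the tool's actual source; it only queries what base exposes in GHC.IO.Encoding and System.IO, plus a few locale environment variables.

```haskell
import GHC.IO.Encoding (getFileSystemEncoding, getLocaleEncoding, initLocaleEncoding)
import System.Environment (lookupEnv)
import System.IO (hGetEncoding, stderr, stdout)

-- Hypothetical mini troubleshooter: collect what GHC thinks the encodings
-- are, plus the locale variables that usually influence them.
encodingReport :: IO [String]
encodingReport = do
  loc <- getLocaleEncoding
  fs <- getFileSystemEncoding
  out <- hGetEncoding stdout
  err <- hGetEncoding stderr
  vars <- mapM describeVar ["LANG", "LC_CTYPE", "LC_ALL"]
  pure $
    [ "initLocaleEncoding = " ++ show initLocaleEncoding
    , "locale encoding = " ++ show loc
    , "file system encoding = " ++ show fs
    , "stdout = " ++ show out
    , "stderr = " ++ show err
    ] ++ vars
  where
    describeVar name = do
      value <- lookupEnv name
      pure (name ++ maybe " is not set" (" = " ++) value)

main :: IO ()
main = encodingReport >>= mapM_ putStrLn
```

When asking for help, the output of the real utf8-troubleshoot is still what maintainers expect.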

Contributing

If you encounter any issues when using this library or have ideas for improvements, please open an issue on GitHub. You are also very welcome to submit a pull request if you feel like doing so.

License

MPL-2.0 © Serokell

haskell-with-utf8's People

Contributors

erikd, gromakovsky, heitor-lassarote, karandit, kirelagin, rvem, sereja313, serokell-bot


Forkers

f-f, amesgen, erikd

haskell-with-utf8's Issues

`withUtf8` should also `setFileSystemEncoding`

This "file system encoding" is responsible for decoding command line arguments and environment variables (source). If we make an assumption that stdin is encoded in utf-8, it's fair to assume that command line arguments also have the same encoding.

I have doubts about environment variables, though.

The issue has arisen in this discussion.

Doing setFileSystemEncoding in withUtf8 initialization seems to be sufficient to fix the problem.

If you think it's a good idea, I can make a PR.
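A minimal sketch of what the proposal could look like, using setFileSystemEncoding from base's GHC.IO.Encoding. This is hypothetical code, not something merged into with-utf8; it relies on the issue's premise that GHC decodes command-line arguments and environment variables with the file system encoding, and it has not been checked on Windows. Using the //ROUNDTRIP suffix is a design choice here: it lets bytes that are not valid UTF-8 survive a decode/encode cycle instead of raising an error.

```haskell
import Control.Exception (bracket)
import GHC.IO.Encoding (getFileSystemEncoding, setFileSystemEncoding)
import System.Environment (getArgs)
import System.IO (mkTextEncoding)

-- Hypothetical sketch of the proposal: while the action runs, use UTF-8 as
-- the file system encoding, then put the previous encoding back.
withUtf8FileSystemEncoding :: IO a -> IO a
withUtf8FileSystemEncoding action =
  bracket getFileSystemEncoding setFileSystemEncoding $ \_ -> do
    -- //ROUNDTRIP keeps undecodable bytes recoverable instead of failing.
    enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
    setFileSystemEncoding enc
    action

main :: IO ()
main = withUtf8FileSystemEncoding $ do
  args <- getArgs
  print args
```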

Failed to automatically update flake.lock

I tried updating flake.lock, but failed:

Error during flake update: Input haskell.nix is missing from the flake.lock root nodes. Check spelling or consider using the allow_missing_inputs configuration option.

Compilation error when building the executable with GHC 8.8.4 on Windows

with-utf8                  > Configuring with-utf8-1.0.2.1...
with-utf8                  > build
with-utf8                  > Preprocessing library for with-utf8-1.0.2.1..
with-utf8                  > Building library for with-utf8-1.0.2.1..
with-utf8                  > [1 of 6] Compiling Paths_with_utf8
with-utf8                  > [2 of 6] Compiling System.IO.Utf8.Internal
with-utf8                  > [3 of 6] Compiling System.IO.Utf8
with-utf8                  > [4 of 6] Compiling Main.Utf8
with-utf8                  > [5 of 6] Compiling Data.Text.Lazy.IO.Utf8
with-utf8                  > [6 of 6] Compiling Data.Text.IO.Utf8
with-utf8                  > Preprocessing executable 'utf8-troubleshoot' for with-utf8-1.0.2.1..
with-utf8                  > Building executable 'utf8-troubleshoot' for with-utf8-1.0.2.1..
with-utf8                  > [1 of 2] Compiling Main
with-utf8                  > 
with-utf8                  > app\utf8-troubleshoot\Main.hs:23:31: error:
with-utf8                  >     Module `GHC.IO.Encoding.Iconv' does not export `localeEncodingName'
with-utf8                  >    |
with-utf8                  > 23 | import GHC.IO.Encoding.Iconv (localeEncodingName)
with-utf8                  >    |                               ^^^^^^^^^^^^^^^^^^
with-utf8                  > 

Error from here

As you can see from the log above, the library compiles fine, but the executable does not (even though the base version in the build is supposed to contain the module). I cannot find a straightforward way to exclude the executable from the build, so I was wondering: would it be worth splitting it out into a separate package?

`withUtf8` mistakenly protects characters on a UTF-8-capable terminal

This was reported by @Heimdell:

In REPL, w/o withUtf8, I get proper
[✓] unidict-1.12.2.2.9.6
[✓] waila-1.8.26
[✓] wailaharvestability-1.1.12
but after using an executable, with main wrapped in withUtf8, I get:
[?] unidict-1.12.2.2.9.6
[?] waila-1.8.26
[?] wailaharvestability-1.1.12

# System
  * OS = linux
  * arch = x86_64
  * compiler = ghc 8.8
  * TERM = xterm-256color
# GHC
  * initLocaleEncoding = ASCII
  * locale encoding = ASCII
  * stdout = Just ASCII
  * stderr = Just ASCII
# Environment
  * LANG = en_US.UTF-8
  * LC_COLLATE is not set
  * LC_CTYPE = en_US.UTF-8
  * LC_MESSAGES is not set
  * LC_MONETARY = ru_RU.UTF-8
  * LC_NUMERIC = ru_RU.UTF-8
  * LC_TIME = ru_RU.UTF-8
  * LC_ALL= is not set
$ localectl list-locales
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
ru_RU.utf8
ru_UA.utf8

accept a UTF-8 BOM for improved compatibility?

Thanks for with-utf8 !

We would like to use it for hledger, but if I understand correctly, it doesn't support a UTF-8 BOM. I know this is deprecated (https://stackoverflow.com/questions/2223882/whats-the-difference-between-utf-8-and-utf-8-without-bom, but also https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8), but we aim to just work with random bank data from all over the world, and, e.g., I found BOM support necessary to read PayPal CSV files in 2018. What are your thoughts on this? Would it be a good improvement, making with-utf8 an easier choice?
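For reference when weighing this request, base's System.IO already provides a utf8_bom encoding: on input it skips a leading BOM, and on output it prepends one. The sketch below compares it with plain utf8 on a hypothetical file; whether with-utf8 should adopt this behaviour is exactly the open question of the issue, so treat this as background, not as the library's API.

```haskell
import System.IO

-- Read a whole file with a given encoding, forcing it before the handle closes.
readWith :: TextEncoding -> FilePath -> IO String
readWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  s <- hGetContents h
  length s `seq` pure s

main :: IO ()
main = do
  -- Hypothetical file name; utf8_bom prepends the bytes EF BB BF on output.
  let path = "bank-export.csv"
  withFile path WriteMode $ \h -> hSetEncoding h utf8_bom >> hPutStr h "Date,Amount"
  plain <- readWith utf8 path
  skipped <- readWith utf8_bom path
  print (plain == '\xFEFF' : "Date,Amount") -- plain utf8 keeps the BOM as U+FEFF
  print (skipped == "Date,Amount") -- utf8_bom drops the leading BOM
```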

Use utf-8 codepage on Windows

Currently, withUtf8 does not provide full UTF-8 support on Windows, because it does not change the active code page, and the default code page does not support UTF-8 symbols.

Steps to reproduce:

Create a.hs with

import Main.Utf8
import System.IO.CodePage

main = do
    withUtf8 $ putStrLn "\9920"
    withUtf8 $ withCP65001 $ putStrLn "\9920"
    withCP65001 $ putStrLn "\9920"
    putStrLn "\9920"

and run it in cmd (my machine has Windows 10 installed).
Output is

?
⛀
⛀
a.hs: <stdout>: commitBuffer: invalid argument (invalid character)

So when the default code page is used, symbols are rendered as question marks; when the correct code page is set, printing works with or without withUtf8; and when neither withCP65001 nor withUtf8 is used, there is an error.

Suggestion

We could include withCP65001 in withUtf8. It makes printing user-friendly on Windows and does nothing on other systems.

Support GHC 9.2

New base-4.16.0.0.

Currently blocked by haskell.nix not providing GHC 9.2.1.

Delete Data.Text.IO.Utf8 module

A module with this name has been added to text recently: haskell/text#503. It seems to contain all functions from our module, so most likely we can delete it. Anyway, we need to do something with it because it causes a name collision, which isn't convenient.
