snoyberg / file-embed

Use Template Haskell to embed file contents directly.

License: Other

Haskell 99.49% Shell 0.51%

file-embed's Introduction

file-embed

Use Template Haskell to read a file or all the files in a directory, and turn them into (path, bytestring) pairs embedded in your Haskell code.

file-embed's Issues

Why export `getDir`?

getDir :: FilePath -> IO [(FilePath, ByteString)]

seems like a rather strange export, considering everything else that is exported is in the Q monad, and getDir is listed under Embed at compile time.

embedFile truncates files that contain certain multi-byte Unicode code points on older GHC versions

When I feed a file to embedFile that contains certain Unicode code points (in my case, one of them being the "sprocket" code point, "⚙"), the embedded bytestring ends up truncated on GHC 7.4; on GHC 7.6, everything works as expected. The offending code point is also mangled.

The following is a small example that reproduces the problem:

{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE CPP #-}

module Main where

import qualified Data.ByteString.UTF8 as BS8
import qualified Data.ByteString as BS
import Data.Monoid
import Data.FileEmbed

main :: IO ()
main = do
    putStrLn "***** Loaded at runtime with Data.ByteString.readFile: "
    loadStr >>= BS.putStr
    putStrLn "***** Embedded using Data.FileEmbed.embedFile: "
    BS.putStr embeddedStr

embeddedStr :: BS.ByteString
embeddedStr = $(embedFile __FILE__)

loadStr :: IO BS.ByteString
loadStr = BS.readFile __FILE__

-- What follows is some comment to trigger the bug.
-- ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙ ⚙
-- If this line appears in full, then all is fine.

It should print its own source twice, but on my test system (32-bit Debian stable, GHC 7.4.1), only the first copy is correct; the second prints a series of â and cuts off before the last line.

At a quick glance, I suspect that the use of Data.ByteString.Char8.pack/unpack is the culprit. I think I understand why it is used here (limitations of older TH implementations), but apparently not all code points survive the round trip. Maybe the bytestring could be constructed via a [Word8] literal instead?
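
A minimal sketch of the [Word8] idea, under the assumption that a plain list literal is acceptable: each byte becomes an integer literal, so no Char conversion is involved. `bytesToExp` is a hypothetical helper for illustration, not file-embed's `bsToExp`.

```haskell
{-# LANGUAGE TemplateHaskellQuotes #-}
import qualified Data.ByteString as B
import Language.Haskell.TH

-- | Build the expression @B.pack [w1, w2, ...]@ from raw bytes.
-- Hypothetical helper for illustration; not file-embed's bsToExp.
bytesToExp :: B.ByteString -> Exp
bytesToExp bs =
  AppE
    (VarE 'B.pack)                                   -- Data.ByteString.pack
    (ListE [LitE (IntegerL (fromIntegral w)) | w <- B.unpack bs])
```

The UTF-8 encoding of "⚙" (U+2699) is the three bytes 226, 154, 153, and each would appear in the generated list unchanged. A list literal with one element per byte is likely to be costly to compile for large files, so this is a correctness sketch rather than a performance recommendation.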

Failure to embed large files with -O0 on GHC 9.4, Mac OS X

Hi, I've found that when I try to build file-embed using GHC 9.4 with -O0, compilation fails when embedding a file larger than ~20MB. I get this error from the compiler:

$  cabal build test
Resolving dependencies...
Build profile: -w ghc-9.4.4 -O0
In order, the following will be built (use -v for more details):
 - file-embed-0.0.15.0 (lib) (first run)
 - file-embed-0.0.15.0 (test:test) (first run)
Configuring library for file-embed-0.0.15.0..
Preprocessing library for file-embed-0.0.15.0..
Building library for file-embed-0.0.15.0..
[1 of 1] Compiling Data.FileEmbed   ( Data/FileEmbed.hs, /Users/csasarak/file-embed/dist-newstyle/build/x86_64-osx/ghc-9.4.4/file-embed-0.0.15.0/noopt/build/Data/FileEmbed.o, /Users/csasarak/file-embed/dist-newstyle/build/x86_64-osx/ghc-9.4.4/file-embed-0.0.15.0/noopt/build/Data/FileEmbed.dyn_o )

Data/FileEmbed.hs:70:1: warning: [-Wunused-imports]
    The import of ‘Control.Applicative’ is redundant
      except perhaps to import instances from ‘Control.Applicative’
    To import instances alone, use: import Control.Applicative()
   |
70 | import Control.Applicative ((<$>))
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ld: warning: -undefined dynamic_lookup may not work with chained fixups
Configuring test suite 'test' for file-embed-0.0.15.0..
Preprocessing test suite 'test' for file-embed-0.0.15.0..
Building test suite 'test' for file-embed-0.0.15.0..
[1 of 1] Compiling Main             ( test/main.hs, /Users/csasarak/file-embed/dist-newstyle/build/x86_64-osx/ghc-9.4.4/file-embed-0.0.15.0/t/test/noopt/build/test/test-tmp/Main.o, /Users/csasarak/file-embed/dist-newstyle/build/x86_64-osx/ghc-9.4.4/file-embed-0.0.15.0/t/test/noopt/build/test/test-tmp/Main.dyn_o )
ghc-9.4.4: Stack space overflow: current size 33624 bytes.
ghc-9.4.4: Use `+RTS -Ksize -RTS' to increase it.
Error: cabal: Failed to build test:test from file-embed-0.0.15.0.

I've made a sample using file-embed's tests to demonstrate this behavior here that you can try yourself.

1. Check out the branch here
2. Find (or make: dd if=/dev/zero of=./zero_data bs=1024 count=20000) a file ~20MB or greater.
3. Provide an absolute path to the file from step 2 here.
4. Run cabal build test

Altering the cabal.project file to have optimization: 1 (or no optimization directive at all) and running cabal build test will resolve this problem.

Other things I've checked:

I added traces in the code and found that it is indeed using bytesPrimL from #36 in bsToExp.
Compiling with -O0 and -ffull-laziness is also successful.

This may be a bug in the compiler, but I wanted to ask here. Thank you for maintaining this package. Please let me know if there's any additional info I can provide.

Best wishes,
-Chris

linker string sharing causes corruption

more details in IntersectMBO/cardano-node#5223

https://i.imgur.com/xr5JQkW.png

GHC found a "00" in some unrelated code and the "0000000000000000000000000000000000000000" in the dummySpace, and then shared the last 3 bytes (the "00" and the terminating null) between both strings.

Embedding a git rev within the binary then changed the "00" in the unrelated code.

embedDir does not work

On my machine:

bitonic@stringer /tmp % cat foo.hs
{-# LANGUAGE TemplateHaskell #-}

import Data.ByteString
import Data.FileEmbed

myDir :: [(FilePath, Data.ByteString.ByteString)]
myDir = $(embedDir "static")
bitonic@stringer /tmp % mkdir static
bitonic@stringer /tmp % cat > static/blah
quux
bitonic@stringer /tmp % ghci foo.hs
GHCi, version 7.4.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 1] Compiling Main             ( foo.hs, interpreted )
Loading package filepath-1.3.0.0 ... linking ... done.
Loading package old-locale-1.0.0.4 ... linking ... done.
Loading package old-time-1.1.0.0 ... linking ... done.
Loading package bytestring-0.9.2.1 ... linking ... done.
Loading package unix-2.5.1.0 ... linking ... done.
Loading package directory-1.1.0.2 ... linking ... done.
Loading package array-0.4.0.0 ... linking ... done.
Loading package deepseq-1.3.0.0 ... linking ... done.
Loading package containers-0.4.2.1 ... linking ... done.
Loading package pretty-1.1.1.0 ... linking ... done.
Loading package template-haskell ... linking ... done.
Loading package file-embed-0.0.4.2 ... linking ... done.
*** Exception: blah: getFileStatus: does not exist (No such file or directory)
λ>

I'm on Ubuntu 12.04, 64-bit.

Compiling with ghc yields the same result.
embedFile and getDir work fine, which puzzles me. Maybe the path when running ghci/ghc is different?
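
The exception names `blah` rather than `static/blah`, which would be consistent with a file-status check on the bare entry name instead of the path joined with the directory. That is only a guess from the log, not a confirmed diagnosis. A minimal sketch of a directory walk that keeps the two paths apart (`readDirRelative` is a hypothetical helper, not file-embed's code):

```haskell
import qualified Data.ByteString as B
import System.Directory
import System.FilePath ((</>))

-- | Read every regular file under @top@, returning paths relative to @top@.
-- Hypothetical helper for illustration; not file-embed's implementation.
readDirRelative :: FilePath -> IO [(FilePath, B.ByteString)]
readDirRelative top = go ""
  where
    go rel = do
      names <- listDirectory (top </> rel)
      concat <$> mapM (visit rel) names
    visit rel name = do
      let relPath  = rel </> name      -- path reported to the caller
          fullPath = top </> relPath   -- path actually handed to the OS
      isDir <- doesDirectoryExist fullPath
      if isDir
        then go relPath
        else do
          contents <- B.readFile fullPath
          pure [(relPath, contents)]
```

Every filesystem call receives `fullPath`, so the result does not depend on the entry name alone resolving from the current working directory.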

The docs on makeRelativeToProject seem to be broken

The haddock isn't showing up for some reason.

(file-embed version 0.0.11.1)

`template-haskell-2.18.0.0` has its own `makeRelativeToProject`

Building shakespeare fails with this error:

Text/Shakespeare/Base.hs:288:9: error:
    Ambiguous occurrence ‘makeRelativeToProject’
    It could refer to
       either ‘Language.Haskell.TH.Syntax.makeRelativeToProject’,
              imported from ‘Language.Haskell.TH.Syntax’ at Text/Shakespeare/Base.hs:30:1-33
           or ‘Data.FileEmbed.makeRelativeToProject’,
              imported from ‘Data.FileEmbed’ at Text/Shakespeare/Base.hs:33:24-44
    |
288 |   fp <- makeRelativeToProject rawFp
    |         ^^^^^^^^^^^^^^^^^^^^^

Text/Shakespeare/Base.hs:297:9: error:
    Ambiguous occurrence ‘makeRelativeToProject’
    It could refer to
       either ‘Language.Haskell.TH.Syntax.makeRelativeToProject’,
              imported from ‘Language.Haskell.TH.Syntax’ at Text/Shakespeare/Base.hs:30:1-33
           or ‘Data.FileEmbed.makeRelativeToProject’,
              imported from ‘Data.FileEmbed’ at Text/Shakespeare/Base.hs:33:24-44
    |
297 |   fp <- makeRelativeToProject rawFp
    |         ^^^^^^^^^^^^^^^^^^^^^

Naturally, the implementation is different: in template-haskell-2.18, it relies on a new method on the Quasi class (qGetPackageRoot), which reads the package root set by the package-root flag.

Details:

There are a few extra flags which have been introduced to make working with multiple
units easier.

.. ghc-flag:: -working-dir ⟨dir⟩
    :shortdesc: Specify the directory a unit is expected to be compiled in.
    :type: dynamic
    :category:

    It is common to assume that a package is compiled in the directory where its
    cabal file resides. Thus, all paths used in the compiler are assumed to be relative
    to this directory. When there are multiple home units the compiler is often
    not operating in the standard directory and instead where the cabal.project
    file is located. In this case the `-working-dir` option can be passed which specifies
    the path from the current directory to the directory the unit assumes to be its root,
    normally the directory which contains the cabal file.

    When the flag is passed, any relative paths used by the compiler are offset
    by the working directory. Notably this includes `-i`:ghc-flag: and `-I⟨dir⟩`:ghc-flag: flags.


    This option can also be queried by the ``getPackageRoot`` Template Haskell
    function. It is intended to be used with helper functions such as ``makeRelativeToProject``
    which make relative filepaths relative to the compilation directory rather than
    the directory which contains the .cabal file.

Implementation:

-- | Get the package root for the current package which is being compiled.
-- This can be set explicitly with the -package-root flag but is normally
-- just the current working directory.
--
-- The motivation for this flag is to provide a principled means to remove the
-- assumption from splices that they will be executed in the directory where the
-- cabal file resides. Projects such as haskell-language-server can't and don't
-- change directory when compiling files but instead set the -package-root flag
-- appropriately.
getPackageRoot :: Q FilePath
getPackageRoot = Q qGetPackageRoot

-- | The input is a filepath, which if relative is offset by the package root.
makeRelativeToProject :: FilePath -> Q FilePath
makeRelativeToProject fp | isRelative fp = do
  root <- getPackageRoot
  return (root </> fp)
makeRelativeToProject fp = return fp

Memory usage during compilation

When I use either this package or wai-app-static to embed a file around 25MB in size (an executable to be served over http), I see that compilation takes over 3-4 GB of RAM and sometimes crashes my laptop.

Is this expected, and can anything be done to make it more efficient? It seems that such high memory usage should not actually be needed during compilation.

Does `$(embedDir "someDir")` produce its result lazily?

There are various language-agnostic test suites for things like JSON floating around the internet containing lots of test .json files, and I often want to wrap them up into libraries. This could be a problem though if the test suite is very large. If I use $(embedDir "someDir") in my code, will the resulting list be produced lazily?

Encoding-specific variants?

Using file-embed with non-ASCII files is rather fragile, as it depends on the build system's locale, and it doesn't work at all when building e.g. in a plain Docker container (simonmichael/hledger#420 (comment)). In hledger, we've worked around this by setting the handle encoding to UTF-8 prior to reading from it.

I'm actually surprised that I haven't seen any issues here regarding this. Maybe building in a pure Nix shell or a plain Docker container is not such a common scenario? Nevertheless, I think that extending the API with variants of the existing functions, like embedFile' :: TextEncoding -> FilePath -> Q Exp and others, would be a useful change, and not only for the hledger project.

What do you think?
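
A minimal sketch of what such a variant could look like, assuming it embeds a String literal: the handle encoding is set explicitly before reading, as in the hledger workaround, so the build locale no longer matters. `readFileWith` and `embedStringWith` are hypothetical names and not part of file-embed.

```haskell
import Language.Haskell.TH
import System.IO

-- | Read a text file with an explicit encoding instead of the locale default.
-- Hypothetical helper for illustration.
readFileWith :: TextEncoding -> FilePath -> IO String
readFileWith enc path = withFile path ReadMode $ \h -> do
  hSetEncoding h enc
  contents <- hGetContents h
  -- Force the lazy string before withFile closes the handle.
  length contents `seq` pure contents

-- | Embed the decoded file contents as a string literal.
-- Hypothetical variant; the name and result type are assumptions.
embedStringWith :: TextEncoding -> FilePath -> Q Exp
embedStringWith enc path = LitE . StringL <$> runIO (readFileWith enc path)
```

A call site would be a splice such as $(embedStringWith utf8 "docs/manual.txt") in a module other than the one defining the helper. The sketch does not register the file for recompilation tracking.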

Have dummySpace / inject / injectFile take an additional magic string

It looks to me like an executable can only have one injected bytestring, because the magic string is always the same for dummySpace.

I can make a pull request adding this, if you want me to!

Document that file paths need to be added to extra-source-files list to get recompilation detection in cabal projects

I opened a PR, but GitHub ate it and my tabs are gone, sorry.

GHC 8.10 updates

Notes from testing with 8.10.1-alpha2 (https://gist.github.com/DanBurton/277f5fd807711816a3422f03a9fa2723):

/Users/simon/src/file-embed/Data/FileEmbed.hs:146:21: error:
    • Couldn't match expected type ‘Maybe Exp’ with actual type ‘Exp’
    • In the expression: LitE $ StringL path
      In the first argument of ‘TupE’, namely
        ‘[LitE $ StringL path, exp']’
      In the second argument of ‘($!)’, namely
        ‘TupE [LitE $ StringL path, exp']’
    |
146 |     return $! TupE [LitE $ StringL path, exp']
    |                     ^^^^^^^^^^^^^^^^^^^

/Users/simon/src/file-embed/Data/FileEmbed.hs:146:42: error:
    • Couldn't match expected type ‘Maybe Exp’ with actual type ‘Exp’
    • In the expression: exp'
      In the first argument of ‘TupE’, namely
        ‘[LitE $ StringL path, exp']’
      In the second argument of ‘($!)’, namely
        ‘TupE [LitE $ StringL path, exp']’
    |
146 |     return $! TupE [LitE $ StringL path, exp']
    |                                          ^^^^
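
In template-haskell 2.16, which ships with GHC 8.10, the `TupE` constructor takes a list of `Maybe Exp` (to represent tuple sections), which matches the errors above. One possible way to stay compatible across versions without CPP is the `tupE` smart constructor, sketched here with a hypothetical helper name `pairExp`; this is an assumption about a workable fix, not necessarily the change file-embed adopted.

```haskell
import Language.Haskell.TH

-- | Build the expression @(path, contents)@ without naming TupE directly,
-- so the same source compiles before and after the constructor changed.
-- Hypothetical helper for illustration.
pairExp :: FilePath -> Exp -> Q Exp
pairExp path contents = tupE [litE (stringL path), pure contents]
```

Unlike the original `return $! TupE [...]`, this version does not force the result, which may or may not matter for the use in Data/FileEmbed.hs.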

embedTextFile

Could we add such a function? I can make a PR.

import qualified Data.Text

myFile :: Data.Text.Text
myFile = $(embedTextFile "dirName/fileName")

Documentation for injection?

I was mystified by the injection utilities in this package. Maybe there could be more documentation about what they are used for?

Is the summary accurate?

The summary is:

"Use Template Haskell to read a file or all the files in a directory, and turn them into (path, text) pairs embedded in your haskell code."

But it seems like "(path, bytestring)" would be more accurate than "(path, text)". It's almost too small of a thing to bring up, but I thought it was worthwhile since it's the first thing people see on Hackage.

add function: maybeEmbedFile

Would be useful to me to have it available.

I'm unfamiliar with template-haskell. If embedFile is (runIO $ B.readFile fp) >>= bsToExp, what would this function look like? I don't know how to embed a Maybe as a Q Exp.

Thanks
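
One way to approach the question above, as a minimal sketch rather than an API that file-embed provides: apply the `Just` constructor to the embedded expression, or emit `Nothing` when the file cannot be read. `maybeEmbedWith` is a hypothetical name, and the embedding function is passed in as a parameter so the sketch does not depend on any particular file-embed export.

```haskell
{-# LANGUAGE TemplateHaskellQuotes #-}
import Control.Exception (IOException, try)
import qualified Data.ByteString as B
import Language.Haskell.TH

-- | Embed @Just contents@ if the file can be read, @Nothing@ otherwise.
-- The first argument turns the bytes into an expression.
-- Hypothetical helper for illustration.
maybeEmbedWith :: (B.ByteString -> Q Exp) -> FilePath -> Q Exp
maybeEmbedWith toExp fp = do
  result <- runIO (try (B.readFile fp) :: IO (Either IOException B.ByteString))
  case result of
    Left _         -> pure (ConE 'Nothing)               -- unreadable or missing
    Right contents -> AppE (ConE 'Just) <$> toExp contents
```

Assuming your file-embed version exports `bsToExp`, a call site might be $(maybeEmbedWith bsToExp "optional.txt") with type Maybe ByteString, placed in a different module from the helper because of Template Haskell's stage restriction. Note that any read error, not only a missing file, yields Nothing here.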

License does not match cabal file

The license file appears to be a BSD 2-clause license, but the file-embed.cabal lists the license as BSD3.

file-embed makes building more fragile?

I love file-embed, but I'm not sure how to avoid file path-related breakage if I use it. I've embedded some docs in hledger. I'm seeing two problems:

1. I can no longer run ghci or stack ghci from the top of a multi-package project, e.g. stack ghci PKG; I have to do cd PKG; stack ghci PKG. This is not the end of the world, but I wonder if it can be avoided.
2. The travis build breaks: https://travis-ci.org/simonmichael/hledger/builds/124363146 . The command (stack +RTS -N2 -RTS build --test --haddock) works when I run it locally. I'm not sure what's going wrong here.

Adding Template Haskell functions to transform a list of tuples into filenames

Hi Michael,
I'm trying to talk a GUI framework maintainer into using file-embed, and I realize that if we built utilities into the file itself, we'd need a template-haskell dependency, so it might be better to do it upstream.

Here, I'd like to add a few TH definitions to transform a list of tuples (should I use strict-tuples? Well, it's only likely to leak memory at the TH compile stage, so probably not worth the dependency) into separate name and asset definitions. Given the framework's goals (simple to use), it'd be more stable than embedDir, because it requires the user to specify both the filenames and the target definition.

Not sure if that'd be okay with you.

Regards,

Liam

Does embedding result in more memory usage?

Embedding a file increases the binary size, obviously. But is it always loaded into memory? And can it get GC'd (I think this would be hard, as I define it as a top-level symbol)?

I tried figuring this out by simply embedding a big file, but it did not yield any obvious results... Measuring memory usage on Linux is not as straightforward as it might seem.
