
factscore's People

Contributors: martiansideofthemoon, shmsw25


factscore's Issues

Some cases couldn't be found in the default db

assert results is not None and len(results)==1, f"topic in your data ({title}) is likely to be not a valid title in the DB."

It seems that the topic Francisco Urroz (from the unlabeled data) triggers this error. I used the default enwiki-20230401.db file and manually queried the database for this topic, but it returns nothing.
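For anyone hitting this assertion, it may help to inspect the database schema before querying, since the table and column layout isn't documented. A minimal stdlib sketch; the path and the `documents` table name in the commented query are assumptions, not confirmed names from the repo:

```python
import sqlite3

def dump_schema(db_path):
    """Return (table name, CREATE statement) pairs so we can see which
    column actually holds the article title before querying a topic."""
    con = sqlite3.connect(db_path)
    try:
        return list(con.execute(
            "SELECT name, sql FROM sqlite_master WHERE type = 'table'"))
    finally:
        con.close()

# Example (path is wherever factscore.download_data placed the DB):
# for name, sql in dump_schema(".cache/factscore/enwiki-20230401.db"):
#     print(name, sql)
# Then query the suspect title directly, e.g. (table/column names assumed):
# con.execute("SELECT * FROM documents WHERE title = ?", ("Francisco Urroz",))
```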

Scoring existing generated text

Hi, thanks a lot for sharing your work.

I am wondering if FActScore can be used for scoring any existing generated text. For example, for a QA task where the model generates an answer along with some reasoning (CoT like), would it be possible to give a 'factuality score' to the generation? Note that I am talking about the case where we have an existing KB that can be used to fact-check (e.g. Wikipedia), but we don't know exactly which article(s) would be relevant for it (unlike in your example where you generate biographies and therefore you already know which is the relevant entity to search for in the KB).

I am looking for a way to get a 'factuality score' for a piece of text, which I can then use as a feature for a separate ML task. I would be grateful for any pointers or suggestions.

Sof

edit: clarified the question

The ChatGPT-based AtomicFactGenerator of FactScore sometimes abstains from answering

Dear authors,

Firstly, I would like to express my gratitude for your valuable work.

I recently used your tool to assess the factual precision of bios generated by my LLMs. However, I have observed that when a generated bio includes toxic content, the ChatGPT-based AtomicFactGenerator consistently declines to break it down into atomic facts. I was wondering if you have any suggestions or ideas to address this issue.

Thank you once again for your remarkable contribution.

sqlite3.DatabaseError: file is not a database error after Installation

Hello

I am simply leaving this issue for others who might run into the same trouble as I did.
When downloading the dataset, the enwiki-20230401.db file was downloaded but contained the HTML below instead of the actual database.

This HTML file is obviously not a valid SQLite database, so this results in the error: sqlite3.DatabaseError: file is not a database when the DocDB class in factscore/retrieval.py is instantiated.

 <!DOCTYPE html><html><head><title>Google Drive - Virus scan warning</title><meta http-equiv="content-type" content="text/html; charset=utf-8"/><style nonce="gsdNE0FBuGc2QHXrVczymg">.goog-link-button{position:relative;color:#15c;text-dec
oration:underline;cursor:pointer}.goog-link-button-disabled{color:#ccc;text-decoration:none;cursor:default}body{color:#222;font:normal 13px/1.4 arial,sans-serif;margin:0}.grecaptcha-badge{visibility:hidden}.uc-main{padding-top:50px;text
-align:center}#uc-dl-icon{display:inline-block;margin-top:16px;padding-right:1em;vertical-align:top}#uc-text{display:inline-block;max-width:68ex;text-align:left}.uc-error-caption,.uc-warning-caption{color:#222;font-size:16px}#uc-downloa
d-link{text-decoration:none}.uc-name-size a{color:#15c;text-decoration:none}.uc-name-size a:visited{color:#61c;text-decoration:none}.uc-name-size a:active{color:#d14836;text-decoration:none}.uc-footer{color:#777;font-size:11px;padding-b
ottom:5ex;padding-top:5ex;text-align:center}.uc-footer a{color:#15c}.uc-footer a:visited{color:#61c}.uc-footer a:active{color:#d14836}.uc-footer-divider{color:#ccc;width:100%}.goog-inline-block{position:relative;display:-moz-inline-box;
display:inline-block}* html .goog-inline-block{display:inline}*:first-child+html .goog-inline-block{display:inline}sentinel{}</style><link rel="icon" href="//ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png"/></head><body><div 
class="uc-main"><div id="uc-dl-icon" class="image-container"><div class="drive-sprite-aux-download-file"></div></div><div id="uc-text"><p class="uc-warning-caption">Google Drive can't scan this file for viruses.</p><p class="uc-warning-
subcaption"><span class="uc-name-size"><a href="/open?id=1mekls6OGOKLmt7gYtHs0WGf5oTamTNat">enwiki-20230401.db</a> (20G)</span> is too large for Google to scan for viruses. Would you still like to download this file?</p><form id="downlo
ad-form" action="https://drive.usercontent.google.com/download" method="get"><input type="submit" id="uc-download-link" class="goog-inline-block jfk-button jfk-button-action" value="Download anyway"/><input type="hidden" name="id" value
="1mekls6OGOKLmt7gYtHs0WGf5oTamTNat"><input type="hidden" name="export" value="download"><input type="hidden" name="confirm" value="t"><input type="hidden" name="uuid" value="2a9c3286-763e-4367-9f8a-7030271416f4"></form></div></div><div
 class="uc-footer"><hr class="uc-footer-divider"></div></body></html>

This is the Google Drive warning page, telling us that the file could not be scanned for viruses.

Workaround

Simply download the file from the author's link and replace the wrong file with it.

Link: https://drive.google.com/file/d/1mekls6OGOKLmt7gYtHs0WGf5oTamTNat/view
Replace enwiki-20230401.db wherever you ran the installation. It is located in {INSTALL_FOLDER}/.cache/factscore/.
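A quick way to tell whether a downloaded file is a real SQLite database or the Drive warning page is to check the 16-byte SQLite magic header. A minimal stdlib sketch:

```python
def looks_like_sqlite(path):
    """True if the file starts with the 16-byte SQLite magic header.
    The Google Drive warning page starts with '<!DOCTYPE html>' instead."""
    with open(path, "rb") as f:
        return f.read(16) == b"SQLite format 3\x00"

# e.g. looks_like_sqlite(".cache/factscore/enwiki-20230401.db")
```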

Reproducing this bug

I ran into this issue while trying to use FactScore on a Debian 12 machine with Python 3.9.18.
I simply followed the instructions for installation:

pip install --upgrade factscore
python -m spacy download en_core_web_sm

python -m factscore.download_data

Issue running get_score() on M1 Mac due to Torch build

I'm having trouble running get_score() on an M1 Macbook Pro.

I tried changing instances of "cuda" to "mps" in the source code for the models loaded locally, but haven't had any luck; I get the following error:

    [222](file:///.../lib/python3.10/site-packages/torch/cuda/__init__.py?line=221) if _cudart is None:
    [223](file:///...env/lib/python3.10/site-packages/torch/cuda/__init__.py?line=222)     raise AssertionError(
    [224](file:///...env/lib/python3.10/site-packages/torch/cuda/__init__.py?line=223)         "libcudart functions unavailable. It looks like you have a broken build?")

AssertionError: Torch not compiled with CUDA enabled

Do you have any suggestions?
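Rather than hard-coding "cuda" or "mps", one option is to patch each occurrence with a small device picker. A sketch of the selection logic only; the availability flags would come from torch.cuda.is_available() and torch.backends.mps.is_available(), and whether FactScore works correctly on MPS beyond device placement is untested:

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Pick the best available torch device string.

    In practice the flags come from torch.cuda.is_available() and
    torch.backends.mps.is_available(); FactScore hard-codes "cuda",
    so each occurrence in the source would need this substitution."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"

# e.g. device = pick_device(torch.cuda.is_available(),
#                           torch.backends.mps.is_available())
```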

I have verified MPS support. My environment is as follows:

# Name                    Version                   Build  Channel
aiohttp                   3.9.1                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
annotated-types           0.6.0                    pypi_0    pypi
anyio                     3.7.1                    pypi_0    pypi
appnope                   0.1.3              pyhd8ed1ab_0    conda-forge
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py310h1a28f6b_0  
arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
async-lru                 2.0.4           py310hca03da5_0  
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.1.0          py310hca03da5_0  
babel                     2.13.1             pyhd8ed1ab_0    conda-forge
beautifulsoup4            4.12.2          py310hca03da5_0  
bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
blis                      0.7.11                   pypi_0    pypi
brotli-python             1.0.9           py310hc377ac9_7  
bzip2                     1.0.8                h620ffc9_4  
ca-certificates           2023.11.17           hf0a4a13_0    conda-forge
cached-property           1.5.2                      py_0  
catalogue                 2.0.10                   pypi_0    pypi
certifi                   2023.11.17      py310hca03da5_0  
cffi                      1.16.0          py310h80987f9_0  
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7                    pypi_0    pypi
cloudpathlib              0.16.0                   pypi_0    pypi
comm                      0.1.4              pyhd8ed1ab_0    conda-forge
confection                0.1.4                    pypi_0    pypi
cryptography              41.0.3          py310hd4332d6_0  
cymem                     2.0.8                    pypi_0    pypi
cyrus-sasl                2.1.28               h9131b1a_1  
dataclasses-json          0.6.3                    pypi_0    pypi
debugpy                   1.6.7           py310h313beb8_0  
decorator                 5.1.1              pyhd3eb1b0_0  
defusedxml                0.7.1              pyhd3eb1b0_0  
en-core-web-sm            3.7.1                    pypi_0    pypi
entrypoints               0.4             py310hca03da5_0  
exceptiongroup            1.2.0              pyhd8ed1ab_0    conda-forge
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
factscore                 0.2.0                    pypi_0    pypi
fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
frozenlist                1.4.0                    pypi_0    pypi
fsspec                    2023.10.0                pypi_0    pypi
gettext                   0.21.1               h0186832_0    conda-forge
glib                      2.78.1               h9e231a4_1    conda-forge
glib-tools                2.78.1               h9e231a4_1    conda-forge
gst-plugins-base          1.14.1               h313beb8_1  
gstreamer                 1.14.1               h80987f9_1  
huggingface-hub           0.19.4                   pypi_0    pypi
icu                       73.2                 hc8870d7_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
importlib-metadata        6.9.0              pyha770c72_0    conda-forge
importlib_metadata        6.9.0                hd8ed1ab_0    conda-forge
importlib_resources       6.1.1              pyhd8ed1ab_0    conda-forge
ipykernel                 6.26.0             pyh3cd1d5f_0    conda-forge
ipython                   8.18.1             pyh31011fe_1    conda-forge
ipywidgets                8.1.1              pyhd8ed1ab_0    conda-forge
isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2           py310hca03da5_0  
joblib                    1.3.2                    pypi_0    pypi
jpeg                      9e                   h80987f9_1  
json5                     0.9.14             pyhd8ed1ab_0    conda-forge
jsonpatch                 1.33                     pypi_0    pypi
jsonpointer               2.1                pyhd3eb1b0_0  
jsonschema                4.20.0             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.11.2          pyhd8ed1ab_0    conda-forge
jsonschema-with-format-nongpl 4.20.0             pyhd8ed1ab_0    conda-forge
jupyter                   1.0.0           py310hca03da5_8  
jupyter-lsp               2.2.1              pyhd8ed1ab_0    conda-forge
jupyter_client            8.6.0           py310hca03da5_0  
jupyter_console           6.6.3           py310hca03da5_0  
jupyter_core              5.5.0           py310hca03da5_0  
jupyter_events            0.9.0              pyhd8ed1ab_0    conda-forge
jupyter_server            2.11.1             pyhd8ed1ab_0    conda-forge
jupyter_server_terminals  0.4.4           py310hca03da5_1  
jupyterlab                4.0.9              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.3.0              pyhd8ed1ab_0    conda-forge
jupyterlab_server         2.25.2             pyhd8ed1ab_0    conda-forge
jupyterlab_widgets        3.0.9           py310hca03da5_0  
krb5                      1.20.1               hf3e1bf2_1  
langchain                 0.0.344                  pypi_0    pypi
langchain-core            0.0.8                    pypi_0    pypi
langcodes                 3.3.0                    pypi_0    pypi
langsmith                 0.0.68                   pypi_0    pypi
libclang                  14.0.6          default_h1b80db6_1  
libclang13                14.0.6          default_h24352ff_1  
libcxx                    16.0.6               h4653b0c_0    conda-forge
libedit                   3.1.20221030         h80987f9_0  
libffi                    3.4.4                hca03da5_0  
libglib                   2.78.1               hb438215_1    conda-forge
libiconv                  1.17                 he4db4b2_0    conda-forge
libllvm14                 14.0.6               h7ec7a93_3  
libpng                    1.6.39               h80987f9_0  
libpq                     12.15                h02f6b3c_1  
libsodium                 1.0.18               h1a28f6b_0  
libsqlite                 3.44.2               h091b4b1_0    conda-forge
libzlib                   1.2.13               h53f4e23_5    conda-forge
lz4-c                     1.9.4                h313beb8_0  
markupsafe                2.1.1           py310h1a28f6b_0  
marshmallow               3.20.1                   pypi_0    pypi
matplotlib-inline         0.1.6           py310hca03da5_0  
mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
multidict                 6.0.4                    pypi_0    pypi
murmurhash                1.0.10                   pypi_0    pypi
mypy-extensions           1.0.0                    pypi_0    pypi
mysql                     5.7.24               ha71a6ea_2  
nbclient                  0.8.0           py310hca03da5_0  
nbconvert                 7.11.0             pyhd8ed1ab_0    conda-forge
nbconvert-core            7.11.0             pyhd8ed1ab_0    conda-forge
nbconvert-pandoc          7.11.0             pyhd8ed1ab_0    conda-forge
nbformat                  5.9.2           py310hca03da5_0  
ncurses                   6.4                  h313beb8_0  
nest-asyncio              1.5.8              pyhd8ed1ab_0    conda-forge
nltk                      3.8.1                    pypi_0    pypi
notebook                  7.0.6           py310hca03da5_0  
notebook-shim             0.2.3           py310hca03da5_0  
numpy                     1.26.2                   pypi_0    pypi
openai                    0.27.10                  pypi_0    pypi
openssl                   3.2.0                h0d3ecfb_1    conda-forge
overrides                 7.4.0           py310hca03da5_0  
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pandoc                    3.1.3                hce30654_0    conda-forge
pandocfilters             1.5.0              pyhd3eb1b0_0  
parso                     0.8.3              pyhd3eb1b0_0  
pcre2                     10.42                hb066dcc_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pillow                    10.1.0                   pypi_0    pypi
pip                       23.3.1          py310hca03da5_0  
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.0.0              pyhd8ed1ab_0    conda-forge
ply                       3.11            py310hca03da5_0  
preshed                   3.0.9                    pypi_0    pypi
prometheus_client         0.19.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.41             pyha770c72_0    conda-forge
prompt_toolkit            3.0.41               hd8ed1ab_0    conda-forge
psutil                    5.9.0           py310h1a28f6b_0  
ptyprocess                0.7.0              pyhd3eb1b0_2  
pure_eval                 0.2.2              pyhd3eb1b0_0  
pycparser                 2.21               pyhd3eb1b0_0  
pydantic                  2.5.2                    pypi_0    pypi
pydantic-core             2.14.5                   pypi_0    pypi
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
pyopenssl                 23.2.0          py310hca03da5_0  
pyqt                      5.15.10         py310h313beb8_0  
pyqt5-sip                 12.13.0         py310h80987f9_0  
pysocks                   1.7.1           py310hca03da5_0  
pysqlite-binary           0.5.1.3380300            pypi_0    pypi
python                    3.10.13              hb885b13_0  
python-dateutil           2.8.2              pyhd3eb1b0_0  
python-dotenv             1.0.0              pyhd8ed1ab_1    conda-forge
python-fastjsonschema     2.19.0             pyhd8ed1ab_0    conda-forge
python-json-logger        2.0.7           py310hca03da5_0  
pytorch                   1.13.1                 py3.10_0    pytorch
pytz                      2023.3.post1    py310hca03da5_0  
pyyaml                    6.0.1           py310h80987f9_0  
pyzmq                     25.1.0          py310h313beb8_0  
qt-main                   5.15.2              h0917680_10  
qtconsole                 5.5.1              pyhd8ed1ab_0    conda-forge
qtconsole-base            5.5.1              pyha770c72_0    conda-forge
qtpy                      2.4.1           py310hca03da5_0  
rank-bm25                 0.2.2                    pypi_0    pypi
readline                  8.2                  h1a28f6b_0  
referencing               0.31.1             pyhd8ed1ab_0    conda-forge
regex                     2023.10.3                pypi_0    pypi
requests                  2.31.0          py310hca03da5_0  
rfc3339-validator         0.1.4           py310hca03da5_0  
rfc3986-validator         0.1.1           py310hca03da5_0  
rpds-py                   0.10.6          py310hf0e4da2_0  
safetensors               0.4.1                    pypi_0    pypi
scikit-learn              1.3.2                    pypi_0    pypi
scipy                     1.11.4                   pypi_0    pypi
send2trash                1.8.2           py310hca03da5_0  
sentence-transformers     2.2.2                    pypi_0    pypi
sentencepiece             0.1.99                   pypi_0    pypi
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
sip                       6.7.12          py310h313beb8_0  
six                       1.16.0             pyhd3eb1b0_1  
smart-open                6.4.0                    pypi_0    pypi
sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.5             py310hca03da5_0  
spacy                     3.7.2                    pypi_0    pypi
spacy-legacy              3.0.12                   pypi_0    pypi
spacy-loggers             1.0.5                    pypi_0    pypi
sqlalchemy                2.0.23                   pypi_0    pypi
sqlite                    3.44.2               hf2abe2d_0    conda-forge
srsly                     2.4.8                    pypi_0    pypi
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
tenacity                  8.2.3                    pypi_0    pypi
terminado                 0.18.0             pyh31c8845_0    conda-forge
thinc                     8.2.1                    pypi_0    pypi
threadpoolctl             3.2.0                    pypi_0    pypi
tinycss2                  1.2.1           py310hca03da5_0  
tk                        8.6.13               h5083fa2_1    conda-forge
tokenizers                0.15.0                   pypi_0    pypi
tomli                     2.0.1           py310hca03da5_0  
torchvision               0.14.1                   pypi_0    pypi
tornado                   6.3.3           py310h80987f9_0  
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.14.0             pyhd8ed1ab_0    conda-forge
transformers              4.35.2                   pypi_0    pypi
typer                     0.9.0                    pypi_0    pypi
types-python-dateutil     2.8.19.14          pyhd8ed1ab_0    conda-forge
typing-extensions         4.8.0                hd8ed1ab_0    conda-forge
typing-inspect            0.9.0                    pypi_0    pypi
typing_extensions         4.8.0              pyha770c72_0    conda-forge
tzdata                    2023c                h04d1e81_0  
uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
urllib3                   1.26.18         py310hca03da5_0  
wasabi                    1.1.2                    pypi_0    pypi
wcwidth                   0.2.12             pyhd8ed1ab_0    conda-forge
weasel                    0.3.4                    pypi_0    pypi
webcolors                 1.13               pyhd8ed1ab_0    conda-forge
webencodings              0.5.1           py310hca03da5_1  
websocket-client          1.6.4              pyhd8ed1ab_0    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
widgetsnbextension        4.0.9              pyhd8ed1ab_0    conda-forge
xz                        5.4.2                h80987f9_0  
yaml                      0.2.5                h1a28f6b_0  
yarl                      1.9.3                    pypi_0    pypi
zeromq                    4.3.5                h965bd2d_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h53f4e23_5    conda-forge
zstd                      1.5.5                hd90d995_0  

License?

Hi, thank you for the great work; what a well-executed study. I want to use your data and code to probe a new idea I have been thinking about. Have you thought about licensing your code so that I can use it in my study?

Having issues using the LLaMA-7B model

I tried to use LLaMA-7B for FactScore, but ran into a problem when running download_data.
After the checkpoint shards are loaded, RecursionError: maximum recursion depth exceeded while calling a Python object occurs while
running recover_instruct_llama(args.llama_7B_HF_path, os.path.join(args.model_dir, "inst-llama-7B")).

FileNotFoundError (demons.json) for custom knowledge base

Hello,

First off, I'd like to express my appreciation for this great package you've developed. I'm in the process of testing a scenario where I evaluate the quality of generated summaries based on a custom knowledge base. Any guidance or pointers would be greatly appreciated!

For this purpose, I created the following knowledge.jsonl file:

{"title": "Gravity", "text": "Gravity is a force by which a planet or other body draws objects toward its center. The force of gravity keeps all of the planets in orbit around the sun."}
{"title": "Photosynthesis", "text": ["Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments.", "In simple words, it is the process where plants make their own food using sunlight."]}
{"title": "Pythagorean Theorem", "text": "In mathematics, the Pythagorean theorem, also known as Pythagoras's theorem, is a fundamental relation in Euclidean geometry among the three sides of a right triangle. It states that the square of the hypotenuse is equal to the sum of the squares of the other two sides."}

and, following the example in the README, run the code:

fs = FactScorer(openai_key="...")
fs.register_knowledge_source("science_knowledge_base",
                             data_path="/content/knowledge.jsonl",
                             db_path="/content/knowledge_db")
topics = ["Gravity", "Photosynthesis", "Pythagorean Theorem"]
generations = ["Gravity is a force that draws objects toward the center of a planet or body, keeping planets in orbit around the sun.",
               "Photosynthesis allows plants and certain organisms to create food using sunlight and chlorophyll.",
               "This theorem in Euclidean geometry relates the three sides of a right triangle, stating that the hypotenuse's square is the sum of the squares of the other sides."]

out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")

In the last line however I receive the following error message:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
[<ipython-input-47-58ee9f60532e>](https://localhost:8080/#) in <cell line: 2>()
      1 # now, when you compute a score, specify knowledge source to use
----> 2 out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")
      3 print (out["score"]) # FActScore
      4 print (out["respond_ratio"]) # % of responding (not abstaining from answering)
      5 print (out["num_facts_per_response"]) # average number of atomic facts per response

1 frames
[/usr/local/lib/python3.10/dist-packages/factscore/factscorer.py](https://localhost:8080/#) in get_score(self, topics, generations, gamma, atomic_facts, knowledge_source, verbose)
    127         else:
    128             if self.af_generator is None:
--> 129                 self.af_generator = AtomicFactGenerator(key_path=self.openai_key,
    130                                                         demon_dir=os.path.join(self.data_dir, "demos"),
    131                                                         gpt3_cache_file=os.path.join(self.cache_dir, "InstructGPT.pkl"))

[/usr/local/lib/python3.10/dist-packages/factscore/atomic_facts.py](https://localhost:8080/#) in __init__(self, key_path, demon_dir, gpt3_cache_file)
     27 
     28         # get the demos
---> 29         with open(self.demon_path, 'r') as f:
     30             self.demons = json.load(f)
     31 

FileNotFoundError: [Errno 2] No such file or directory: '.cache/factscore/demos/demons.json'

I'm trying to understand the role of demons.json and why it is necessary. Despite my efforts to comb through the code, I couldn't quite grasp its purpose. Could you shed some light on this?

System: I am running this on Colab and installed the factscore package using pip install --upgrade factscore.

Thank you very much in advance!
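For anyone debugging the same error, a quick stdlib check of whether the demos file exists at the path from the traceback; the relative .cache/factscore default is taken from the error message above, so adjust it if you passed a custom data_dir:

```python
import os

def has_demos(data_dir=".cache/factscore"):
    """Check for the demos file the AtomicFactGenerator opens.
    If it is missing, `python -m factscore.download_data` should fetch it."""
    return os.path.exists(os.path.join(data_dir, "demos", "demons.json"))
```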

Using unlabeled data to generate atomic facts and retrieving evidence

Hello! I have a few questions.

  1. Do I understand correctly that in the current pipeline, generating atomic facts for unlabeled data uses both the actual unlabeled data and the demos data, even when we only need to process the unlabeled data? Why is the demos data always used in these calculations?

  2. Did you notice that the retrieved evidence is the same for all atomic facts within a particular bio? It seems it should depend on each particular atomic fact.

Any way to run without an OpenAI key?

Hi, I have access to Llama through Huggingface and am trying to run FactScore through the CLI: python -m factscore.factscorer --input_path data/unlabeled/InstructGPT.jsonl --model_name retrieval+llama+npm

Shouldn't this use llama to do atomic fact generation? I am getting an error: AssertionError: Please place your OpenAI APT Key in api.key. Is there a way to use FactScore without an OpenAI key?

Generation details for models under data/labeled

Thank you for the excellent work!

I have a question regarding the generation of outputs in the data/labeled files. Specifically, I'm curious about the parameters and prompts you used during this process. I've noticed that my generated text (e.g. from ChatGPT) is much longer than the content in your file. Could you please provide information on the settings you employed, such as temperature, max_tokens, and prompts, when generating the biographies? Your assistance in this matter would be greatly appreciated.

Thank you in advance!

FActScore can be slow to run

Pointed out by a number of people including Katherine Tian.

Sewon's reply / points:

You are right that FactScore does not work in a batch. I think it should be possible to modify the code to make it work in a batch. My recommendation is to identify which part of the pipeline is causing a bottleneck in speed, and then parallelize from the slowest parts one by one. For instance, there are four possible bottlenecks: (1) atomic fact generation, (2) GTR retrieval, (3) InstLLAMA generation, (4) NPM verification.

If (1) is the bottleneck: the bottleneck is coming from OpenAI API, which is possibly because you are sharing the API key with many others and you run into the Rate Limit error. You can check if this is the case by always printing these lines. If this is the case, the best strategy is to use another API key that is not shared by others. Other than that, it's not straightforward how to speed this part up. (In the past, we've heard from users that this is the main bottleneck in speed in their cases, but it's definitely possible it's not the case for you.)

If (2) is the bottleneck, you can make encoding of the query vector (in passage retrieval) work in a batch.

If (3) is the bottleneck, you can make the _generate function work in a batch.

If (4) is the bottleneck, you can make npm work in a batch, or skip NPM by specifying retrieval+llama instead of retrieval+llama+npm, which I think should give decent results.
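To follow this advice, a small timer can attribute wall-clock time to each candidate stage before deciding what to parallelize. A minimal sketch; the stage names are illustrative, not actual FactScore hooks:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # stage name -> accumulated seconds

@contextmanager
def timed(stage):
    """Accumulate wall-clock time for one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start

# Wrap each candidate bottleneck, then sort `totals` to find the slowest:
# with timed("atomic_facts"):  ... generate atomic facts ...
# with timed("retrieval"):     ... GTR retrieval ...
# with timed("llama"):         ... InstLLAMA generation ...
# with timed("npm"):           ... NPM verification ...
# print(sorted(totals.items(), key=lambda kv: -kv[1]))
```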

Batching for speed.

Hi, I would like to ask whether batch inference has been tried with LLaMA?

I followed https://huggingface.co/docs/transformers/llm_tutorial#wrong-padding-side , where they pass both the input_ids and the attention_mask to the model, but I got 'nan' values. If I only pass in the input_ids it is fine; however, I'm not sure whether omitting the attention mask affects the final output.

Also, there seems to be a bug in

if prompt.endswith(" True or False?\nAnswer:"):

where " True or False?\nAnswer:" is never detected, since

prompt = "{}\n\nInput: {} True or False?\nOutput:".format(definition.strip(), atom.strip())

ends with a different suffix. As a result, the model's generated length is 128 tokens instead of 1. This wastes cost, and if GPT-3.5 is used, both 'true' and 'false' may appear among the 128 tokens, leading to wrong decisions.
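To make the mismatch concrete, here is a minimal reproduction using the two string literals from the issue; the values of `definition` and `atom` are made up for illustration:

```python
# Values for `definition` and `atom` are made up for illustration.
definition = "Answer the question about Francisco Urroz based on the given context."
atom = "Francisco Urroz is a footballer."

prompt = "{}\n\nInput: {} True or False?\nOutput:".format(
    definition.strip(), atom.strip())

# The existing check never fires: the prompt ends with "\nOutput:",
# not "\nAnswer:", so the 1-token generation path is skipped.
assert not prompt.endswith(" True or False?\nAnswer:")

# Matching the actual suffix would restore the intended behavior:
assert prompt.endswith(" True or False?\nOutput:")
```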

problem using the factscorer function with LLaMa

Good morning,

I was trying to use the factscorer function.

With this code:
Screenshot 2024-01-14 at 15 55 19

I received this error message:
Screenshot 2024-01-14 at 15 58 43

If I remove openai_key=None from factscore = FactScorer(model_name="retrieval+llama", data_dir="/kaggle/working/"), I get this error instead:
AssertionError: Please place your OpenAI APT Key in api.key.

I would like to know whether I need an OpenAI key to use FactScore even though my model is LLaMA, as you can see in the first screenshot.

Thank you!

About the enwiki-20230401

After downloading the data and setting up the environment, I ran this command:

python -m factscore.factscorer --input_path "/root/FNDLLM/test.jsonl" --model_name "retrieval+llama+npm" --use_atomic_facts --data_dir '/root/.cache/factscore/'

and got:

File "/root/anaconda3/envs/factstore/lib/python3.7/site-packages/factscore/retrieval.py", line 57, in build_db
    with open(data_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.cache/factscore/enwiki-20230401.jsonl'

I didn't find enwiki-20230401.jsonl in the downloaded data. Where is it?

Regarding human evaluation

Hi, thanks for your great work!
To reproduce your error-rate results, would it be possible for you to release the human evaluation results?
If not, could you tell us which 500 entities you used for the experiments?

Better ranking between models

Yizhong: there is a clear tradeoff between the number of facts and precision.

Possible solution: a graph of number of facts versus precision.
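One existing way to trade off the two axes is FActScore's length penalty (the gamma argument to get_score). The sketch below is my reading of that penalty (multiply precision by exp(1 - gamma/num_facts) when a response has fewer than gamma facts) and should be verified against factscorer.py before relying on it:

```python
import math

def penalized_score(precision, num_facts, gamma=10):
    """Penalize responses with fewer than `gamma` atomic facts, so a
    model cannot inflate precision simply by saying very little."""
    if num_facts >= gamma:
        return precision
    return precision * math.exp(1 - gamma / num_facts)
```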
