Hello,
First off, I'd like to say how much I appreciate this great package. I'm testing a scenario in which I evaluate the factual accuracy of generated summaries against a custom knowledge base. Any guidance or pointers would be greatly appreciated!
For this purpose, I created the following `knowledge.jsonl` file:
```
{"title": "Gravity", "text": "Gravity is a force by which a planet or other body draws objects toward its center. The force of gravity keeps all of the planets in orbit around the sun."}
{"title": "Photosynthesis", "text": ["Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments.", "In simple words, it is the process where plants make their own food using sunlight."]}
{"title": "Pythagorean Theorem", "text": "In mathematics, the Pythagorean theorem, also known as Pythagoras's theorem, is a fundamental relation in Euclidean geometry among the three sides of a right triangle. It states that the square of the hypotenuse is equal to the sum of the squares of the other two sides."}
```
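Before registering the file, it can help to confirm that every line parses as JSON and has the expected fields. The sketch below is not part of FactScore. It assumes each record needs a string `title` and a `text` that is either a string or a list of strings, because the Photosynthesis entry above uses the list form. The helper name `validate_knowledge_jsonl` is hypothetical.

```python
import json


def validate_knowledge_jsonl(lines):
    """Hypothetical helper: return a list of (line_number, problem) tuples.

    Assumes each record needs a string "title" and a "text" that is either
    a string or a list of strings. This mirrors the records above and is an
    assumption, not something taken from FactScore's own loader.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank lines are skipped rather than reported
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        if not isinstance(record.get("title"), str):
            problems.append((lineno, "missing or non-string 'title'"))
        text = record.get("text")
        is_str = isinstance(text, str)
        is_str_list = isinstance(text, list) and all(isinstance(t, str) for t in text)
        if not (is_str or is_str_list):
            problems.append((lineno, "'text' must be a string or a list of strings"))
    return problems
```

Passing the lines of `/content/knowledge.jsonl` (for example, an open file handle) should return an empty list for the three records above if the assumed shape is right.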
Then, following the example in the README, I ran this code:
```python
from factscore.factscorer import FactScorer

fs = FactScorer(openai_key="...")
fs.register_knowledge_source("science_knowledge_base",
                             data_path="/content/knowledge.jsonl",
                             db_path="/content/knowledge_db")

topics = ["Gravity", "Photosynthesis", "Pythagorean Theorem"]
generations = ["Gravity is a force that draws objects toward the center of a planet or body, keeping planets in orbit around the sun.",
               "Photosynthesis allows plants and certain organisms to create food using sunlight and chlorophyll.",
               "This theorem in Euclidean geometry relates the three sides of a right triangle, stating that the hypotenuse's square is the sum of the squares of the other sides."]

out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")
```
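A quick pre-flight check on the inputs can rule out mismatches before calling `get_score`. In this setup, `topics[i]` is meant to describe `generations[i]`, and each topic is meant to match a `title` in the knowledge file. The sketch assumes exact, case-sensitive title matching, which may not be how FactScore actually retrieves passages. `check_inputs` is a hypothetical helper.

```python
def check_inputs(topics, generations, titles):
    """Hypothetical pre-flight check; returns a list of human-readable problems.

    Assumes topics[i] describes generations[i] and that each topic should
    appear verbatim (case-sensitive) as a "title" in the knowledge file.
    """
    problems = []
    if len(topics) != len(generations):
        problems.append(f"{len(topics)} topics but {len(generations)} generations")
    known = set(titles)
    for topic in topics:
        if topic not in known:
            problems.append(f"topic {topic!r} has no matching title")
    return problems


titles = ["Gravity", "Photosynthesis", "Pythagorean Theorem"]
topics = ["Gravity", "Photosynthesis", "Pythagorean Theorem"]
print(check_inputs(topics, ["g1", "g2", "g3"], titles))  # prints []
```

With the inputs above, the check finds nothing wrong, which suggests the failure lies elsewhere than in the topic/title pairing.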
The last line, however, raises the following error:
```
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-47-58ee9f60532e> in <cell line: 2>()
1 # now, when you compute a score, specify knowledge source to use
----> 2 out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")
3 print (out["score"]) # FActScore
4 print (out["respond_ratio"]) # % of responding (not abstaining from answering)
5 print (out["num_facts_per_response"]) # average number of atomic facts per response

/usr/local/lib/python3.10/dist-packages/factscore/factscorer.py in get_score(self, topics, generations, gamma, atomic_facts, knowledge_source, verbose)
127 else:
128 if self.af_generator is None:
--> 129 self.af_generator = AtomicFactGenerator(key_path=self.openai_key,
130 demon_dir=os.path.join(self.data_dir, "demos"),
131 gpt3_cache_file=os.path.join(self.cache_dir, "InstructGPT.pkl"))

/usr/local/lib/python3.10/dist-packages/factscore/atomic_facts.py in __init__(self, key_path, demon_dir, gpt3_cache_file)
27
28 # get the demos
---> 29 with open(self.demon_path, 'r') as f:
30 self.demons = json.load(f)
31

FileNotFoundError: [Errno 2] No such file or directory: '.cache/factscore/demos/demons.json'
```
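The path in the error (`.cache/factscore/demos/demons.json`) is relative, so Python resolves it against the current working directory, which on Colab is usually `/content`. The traceback also shows that `atomic_facts.py` opens this file and passes it to `json.load`. The diagnostic below follows those same steps to show where the file is expected and whether it loads. `check_demons` is a hypothetical helper, and its default path is copied from the error message, not read from FactScore's configuration.

```python
import json
import os

# Relative path copied verbatim from the FileNotFoundError above.
DEMONS_PATH = os.path.join(".cache", "factscore", "demos", "demons.json")


def check_demons(path=DEMONS_PATH):
    """Hypothetical diagnostic: return (absolute_path, top_level_entry_count_or_None).

    A relative path is resolved against os.getcwd(), just as open() does.
    The count is None when no file exists at that location.
    """
    resolved = os.path.abspath(path)
    if not os.path.isfile(resolved):
        return resolved, None
    with open(resolved, "r") as f:
        demons = json.load(f)  # the same call atomic_facts.py makes in the traceback
    return resolved, len(demons)


print("cwd:", os.getcwd())
print(check_demons())
```

If the count comes back as `None`, the file is simply absent from the directory that the relative path resolves to.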
I'm trying to understand what `demons.json` is for and why it is required. I've looked through the code but couldn't work out its purpose. Could you shed some light on this?
System: I'm running this on Colab and installed the package with `pip install --upgrade factscore`.
Thank you very much in advance!