meanrecipe

Sometimes when I want a recipe to cook something new I will find several recipes for the same thing and try to use them as a guide to generate an average or "consensus" recipe. This code should make it easy to generate consensus recipes (useful!) and also show variation between recipes (interesting!).

Finding a consensus recipe requires first clustering many recipes. This is because a single recipe (e.g. a recipe for brownies) might have many significant variations (e.g. brownies can have just cocoa, just chocolate, or both). This code will first cluster recipes and then use the clusters to deliver the consensus recipe.
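To make the idea of "variation between recipes" concrete, here is a minimal sketch of one way to compare two recipes by the ingredients they contain. It uses the Jaccard distance between ingredient sets, which is an assumption for illustration and not necessarily the measure this project uses; the names jaccardDistance and set are hypothetical.

```go
package main

import "fmt"

// set builds an ingredient set; every stored value is true.
func set(items ...string) map[string]bool {
	s := make(map[string]bool, len(items))
	for _, it := range items {
		s[it] = true
	}
	return s
}

// jaccardDistance returns 1 - |A∩B| / |A∪B| for two ingredient sets.
// 0 means identical ingredient lists, 1 means nothing in common.
// Both maps are assumed to hold only true values (as built by set).
func jaccardDistance(a, b map[string]bool) float64 {
	inter, union := 0, len(b)
	for k := range a {
		if b[k] {
			inter++
		} else {
			union++
		}
	}
	if union == 0 {
		return 0
	}
	return 1 - float64(inter)/float64(union)
}

func main() {
	// Two hypothetical brownie recipes that differ only in cocoa vs. chocolate.
	cocoa := set("flour", "sugar", "eggs", "cocoa")
	chocolate := set("flour", "sugar", "eggs", "chocolate")
	fmt.Printf("distance: %.2f\n", jaccardDistance(cocoa, chocolate))
}
```

Recipes with a small distance would end up in the same cluster, while a swap such as cocoa for chocolate pushes them apart.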

How does it work

The quick-and-dirty implementation goes like this:

  1. Choose a recipe (e.g. brownies, crepes, pancakes).
  2. Search to find thousands of corresponding recipes.
  3. Download all the recipes and convert them to gzipped text for processing.
  4. Use a really simple (read: bad) context-extractor to grab ingredients.
  5. Cluster the recipes based on the presence of ingredients.
  6. Take the mean values (after removing outliers) for ingredients in a given cluster to create an average recipe.
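
Step 6 can be sketched roughly as follows. The outlier rule here (drop anything more than two standard deviations from the mean) and the "±" figure (standard deviation as a percentage of the mean) are assumptions chosen for illustration; the actual code may use different rules. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// meanAndStd returns the mean and population standard deviation of xs.
func meanAndStd(xs []float64) (mean, std float64) {
	if len(xs) == 0 {
		return 0, 0
	}
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		std += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(std / float64(len(xs)))
}

// trimmedMean drops values more than two standard deviations from the mean
// (an assumed outlier rule), then returns the mean of what is left and its
// relative spread in percent.
func trimmedMean(xs []float64) (mean, spreadPct float64) {
	m, s := meanAndStd(xs)
	kept := make([]float64, 0, len(xs))
	for _, x := range xs {
		if s == 0 || math.Abs(x-m) <= 2*s {
			kept = append(kept, x)
		}
	}
	mean, std := meanAndStd(kept)
	if mean != 0 {
		spreadPct = 100 * std / mean
	}
	return mean, spreadPct
}

func main() {
	// Cups of flour from six hypothetical recipes; 40 stands in for a parsing error.
	flour := []float64{2, 2.25, 2.5, 2.25, 2, 40}
	mean, pct := trimmedMean(flour)
	fmt.Printf("%.2f cup flour (± %.0f%%)\n", mean, pct)
}
```

Without the trimming step, the single bad value would drag the mean from about 2.2 cups to 8.5 cups, which is why removing outliers matters when the parser is unreliable.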

The context-extractor works by finding the most likely "ingredient" section in the web page and then trying to parse those ingredients using a greedy search from a list of likely ingredients (top_5k.txt). It's not a great implementation. However, its errors are pretty random, which means you can get okay results as long as you have ~hundreds of recipes.
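
As a rough illustration of the greedy search, the sketch below tries the longest known ingredient names first and returns the first one found in a line of text. Reading "greedy" as "longest match first" is an assumption, the short list stands in for top_5k.txt, and matchIngredient is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// matchIngredient returns the longest known ingredient that appears in line,
// or "" if none does. Trying longer names first keeps "brown sugar" from
// being reported as plain "sugar". Names in known are assumed lower-case.
func matchIngredient(line string, known []string) string {
	sorted := append([]string(nil), known...) // copy so the caller's slice is untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return len(sorted[i]) > len(sorted[j])
	})
	lower := strings.ToLower(line)
	for _, name := range sorted {
		if strings.Contains(lower, name) {
			return name
		}
	}
	return ""
}

func main() {
	// Stand-in for the real ingredient list.
	known := []string{"sugar", "brown sugar", "flour", "baking soda"}
	lines := []string{
		"1 cup packed Brown Sugar",
		"2 1/4 cups all-purpose flour",
		"Preheat the oven",
	}
	for _, line := range lines {
		fmt.Printf("%q -> %q\n", line, matchIngredient(line, known))
	}
}
```

Plain substring matching like this will also produce false hits (a short name found inside a longer word), which is the kind of random error that averages out over many recipes.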

Here are some examples of running the code.

Chocolate chip cookies

$ meanrecipe -recipe 'chocolate chip cookies'

The output will contain multiple recipes for 'chocolate chip cookies', clustered according to ingredients. For example, here is the first cluster, the most popular recipe:

Cluster 1 (35% of 1041)

Ingredients:
- 1 teaspoon baking soda (± 42%)
- ⅞ cup brown sugar (± 40%)
- ⅞ cup butter (± 29%)
- 1 ⅝ cup chocolate (± 62%)
- 2 eggs (± 62%)
- 2 ¼ cup flour (± 40%)
- ¾ teaspoon salt (± 67%)
- ¾ cup sugar (± 51%)
- 1 ⅝ teaspoon vanilla (± 75%)

Directions:
1. Preheat oven to 300 degrees F (150 degrees C).

2. Sift together the flour, baking powder and salt, set aside. In a
medium bowl, cream the butter and sugar together until fluffy.
Gradually stir in the dry ingredients, then stir in the walnuts and
chocolate chips.

3. Roll or scoop dough into walnut sized balls. Place them on unprepared
cookie sheets 1 1/2 inches apart. Flatten cookies slightly. Bake for
15 to 20 minutes, until light golden brown. Remove from sheets to cool
on racks.

The second cluster (the second most popular 'chocolate chip cookie' recipe) has pinpointed a variation - the inclusion of baking powder.

Cluster 2 (16% of 1041)
Variation:
 +baking powder

Ingredients:
- 1 ⅛ teaspoon baking powder (± 80%)
- ⅞ teaspoon baking soda (± 49%)
- 1 ⅛ cup brown sugar (± 80%)
- ⅞ cup butter (± 40%)
- 1 ⅝ cup chocolate (± 85%)
- 2 eggs (± 55%)
- 2 ⅛ cup flour (± 51%)
- ¾ teaspoon salt (± 71%)
- ¾ cup sugar (± 80%)
- 2 teaspoon vanilla (± 78%)

Directions:
1. Preheat oven to 300 degrees F (150 degrees C).

2. Sift together the flour, baking powder and salt, set aside. In a
medium bowl, cream the butter and sugar together until fluffy.
Gradually stir in the dry ingredients, then stir in the walnuts and
chocolate chips.

3. Roll or scoop dough into walnut sized balls. Place them on unprepared
cookie sheets 1 1/2 inches apart. Flatten cookies slightly. Bake for
15 to 20 minutes, until light golden brown. Remove from sheets to cool
on racks.

Reading further down you can find even more variations, for example this recipe which uses cocoa:

Cluster 5 (4% of 1041)
Variation:
 +cocoa

Ingredients:
- ¾ teaspoon baking soda (± 57%)
- ¾ cup brown sugar (± 57%)
- ¾ cup butter (± 52%)
- 1 ⅛ cup chocolate (± 72%)
- ⅜ cup cocoa (± 80%)
- 1 eggs (± 44%)
- 1 ½ cup flour (± 67%)
- ⅝ teaspoon salt (± 88%)
- ¾ cup sugar (± 88%)
- 1 ½ teaspoon vanilla (± 67%)

Directions:
1. Preheat oven to 350 degrees F (175 degrees C). Grease cookie sheets.
Stir together the flour, cocoa, baking powder, baking soda, salt and
cinnamon; set aside.

2. In a large bowl, cream together the margarine, brown sugar and white
sugar. Beat in the egg and vanilla. Stir in the dry ingredients using
a wooden spoon. Mix in the oats and chocolate chips. Drop by
tablespoonfuls onto cookie sheets, leaving 2 inches between cookies.

3. Bake for 8 to 10 minutes in the preheated oven, or until lightly
browned.  Allow cookies to cool on baking sheet for 5 minutes before
removing to a wire rack to cool completely.

Try it

Web

You can try it on the web at https://meanrecipe.schollz.com/

Install

Download from the latest releases, or download with Go:

$ go get github.com/schollz/meanrecipe

Run

Just run from the command line and specify the food that you want.

$ meanrecipe -recipe 'chocolate chip cookies'

Be patient, as it will take 3-5 minutes to download and pre-process the data. Data is only downloaded once; if you run it a second time, it will use the previous data.

You can also generate a different number of clusters using -clusters X, where X is the number of clusters.

To make sure certain ingredients are included just use -include 'chocolate, oats' (for example).
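
For illustration, here is a minimal sketch of how a comma-separated value like the one passed to -include could be turned into a list of required ingredients and checked against a recipe. The helper names parseInclude and hasAll are hypothetical, and the tool's own handling may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInclude splits a comma-separated value such as "chocolate, oats"
// into trimmed, lower-cased ingredient names, skipping empty entries.
func parseInclude(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ",") {
		if name := strings.ToLower(strings.TrimSpace(part)); name != "" {
			out = append(out, name)
		}
	}
	return out
}

// hasAll reports whether a recipe's ingredient set contains every required name.
func hasAll(ingredients map[string]bool, required []string) bool {
	for _, name := range required {
		if !ingredients[name] {
			return false
		}
	}
	return true
}

func main() {
	required := parseInclude("chocolate, oats")
	recipe := map[string]bool{"flour": true, "oats": true, "chocolate": true}
	fmt.Println(required, hasAll(recipe, required))
}
```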

Roadmap

This is a quick-and-dirty project. I don't plan to do much more on it; it was just a fun thing.

However, here are some things I realize this project does not do and would be great to implement:

  • Making food volumes more accurate. The code specifies a constant density for ingredients that are specified in weight so that they can be converted to volumes (volumes are necessary for normalization before taking means). In reality different foods have different densities, of course.
  • Making proportions more accurate. This will be easier if the previous item is finished.
  • Adding in a specifier for the variation in the amount (show the mean and the standard deviation of the mean?).
  • Adding in recipe directions. Is there a way towards consensus directions? This might be really really hard.
  • In general, making the parsing (from websites) and the food tagging better. There are more sophisticated taggers (see NYT food tagger).
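
As a sketch of the first roadmap item, a per-ingredient density table with a constant fallback might look like the following. The density values are rough, illustrative figures in g/mL, the 1.0 g/mL fallback is an assumption (the text above only says a constant is used), and gramsToMilliliters is a hypothetical name.

```go
package main

import "fmt"

// defaultDensity is the constant fallback in g/mL (assumed here to be that of water).
const defaultDensity = 1.0

// density holds approximate values in g/mL, for illustration only.
var density = map[string]float64{
	"flour":  0.53,
	"sugar":  0.85,
	"butter": 0.91,
}

// gramsToMilliliters converts a weight to a volume using the ingredient's
// density when it is known and the constant fallback otherwise.
func gramsToMilliliters(ingredient string, grams float64) float64 {
	d, ok := density[ingredient]
	if !ok {
		d = defaultDensity
	}
	return grams / d
}

func main() {
	for _, name := range []string{"flour", "sugar", "molasses"} {
		fmt.Printf("100 g %s is about %.0f mL\n", name, gramsToMilliliters(name, 100))
	}
}
```

With a single constant, 100 g of flour and 100 g of sugar would convert to the same volume, which skews the proportions that the means are computed from.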

License

MIT

meanrecipe's People

Contributors

georgiosgoniotakis, schollz

meanrecipe's Issues

ERROR not enough data

Awesome tool, by the way, but I wasn't able to get it to work. How can I troubleshoot? Here's what I got:

[INFO] querying urls for 'best beef taco recipe'
[INFO] found 344 urls for 'best beef taco recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [10s:0s]
[INFO] downloaded 292 urls in 11.126007525s
[INFO] querying urls for 'favorite beef taco recipe'
[INFO] found 345 urls for 'favorite beef taco recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [6s:0s]
[INFO] downloaded 58 urls in 7.084882323s
[INFO] querying urls for 'homemade beef taco recipe'
[INFO] found 362 urls for 'homemade beef taco recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [3s:0s]
[INFO] downloaded 45 urls in 6.166418219s
[INFO] querying urls for 'simple recipe for beef taco'
[INFO] found 375 urls for 'simple recipe for beef taco'
[INFO] downloading...
 100% |████████████████████████████████████████| [4s:0s]
[INFO] downloaded 0 urls in 4.192123679s
[INFO] querying urls for 'basic beef taco recipe'
[INFO] found 402 urls for 'basic beef taco recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [3s:0s]
[INFO] downloaded 40 urls in 3.356336038s
[INFO] querying urls for 'recipe for beef taco from scratch'
[INFO] found 412 urls for 'recipe for beef taco from scratch'
[INFO] downloading...
 100% |████████████████████████████████████████| [5s:0s]
[INFO] downloaded 5 urls in 7.41211692s
[INFO] querying urls for 'yummy beef taco recipe'
[INFO] found 396 urls for 'yummy beef taco recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [4s:0s]
[INFO] downloaded 41 urls in 4.646652188s
[INFO] getting all recipes
[INFO] parsing 387 prospective recipes
 100% |████████████████████████████████████████| [0s:0s]
[INFO] got 1 recipes
[INFO] requiring 1 ingredients: [beef]
[INFO] finding best cluster
ERROR not enough data

code stuck?

1st try, seems to just get stuck

python3 run.py --recipe 'sweet potato soup'

Runs as far as

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1199/1201 [06:00<00:00,  3.24it/s]
1199it [06:00,  3.33it/s]

Then seems to hang. 100% cpu used by pandoc, but seems to be making no progress.

I had tried your example

python3 run.py --recipe brownies

which ran to completion OK

"ERROR no urls" on direction-gathering?

This happened both for meanrecipe -recipe coleslaw and meanrecipe -recipe 'chocolate chip cookies'; seems like a widespread issue.
I think only the end where it errors out is relevant, but here's the full log just in case:

meanrecipe -recipe coleslaw

         ___ ___    ___   ____  ____       ____     ___    __  ____  ____   ___
        |   |   |  /  _] /    ||    \     |    \   /  _]  /  ]|    ||    \ /  _]
        | _   _ | /  [_ |  o  ||  _  |    |  D  ) /  [_  /  /  |  | |  o  )  [_
        |  \_/  ||    _]|     ||  |  |    |    / |    _]/  /   |  | |   _/    _]
        |   |   ||   [_ |  _  ||  |  |    |    \ |   [_/   \_  |  | |  | |   [_
        |   |   ||     ||  |  ||  |  |    |  .  \|     \     | |  | |  | |     |
        |___|___||_____||__|__||__|__|    |__|\_||_____|\____||____||__| |_____|
                  _  _
                _/0\/ \_
        .-.   .-` \_/\0/ '-.
       /:::\ / ,_________,  \
      /\:::/ \  '. (:::/  `'-;
      \ `-'`\ '._ `"'"'\__    \
      `'-.  \   `)-=-=(  `,   |
          \  `-"`      `"-`   /


[INFO] querying urls for 'best coleslaw recipe'
[INFO] found 405 urls for 'best coleslaw recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [9s:0s]
[INFO] downloaded 391 urls in 9.7500575s
[INFO] querying urls for 'favorite coleslaw recipe'
[INFO] found 336 urls for 'favorite coleslaw recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [9s:0s]
[INFO] downloaded 334 urls in 9.0114886s
[INFO] querying urls for 'homemade coleslaw recipe'
[INFO] found 393 urls for 'homemade coleslaw recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [6s:0s]
[INFO] downloaded 391 urls in 6.6838181s
[INFO] querying urls for 'simple recipe for coleslaw'
[INFO] found 412 urls for 'simple recipe for coleslaw'
[INFO] downloading...
 100% |████████████████████████████████████████| [7s:0s]
[INFO] downloaded 405 urls in 8.2969438s
[INFO] querying urls for 'basic coleslaw recipe'
[INFO] found 426 urls for 'basic coleslaw recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [7s:0s]
[INFO] downloaded 424 urls in 7.6144668s
[INFO] querying urls for 'recipe for coleslaw from scratch'
[INFO] found 378 urls for 'recipe for coleslaw from scratch'
[INFO] downloading...
 100% |████████████████████████████████████████| [6s:0s]
[INFO] downloaded 374 urls in 6.619885s
[INFO] querying urls for 'yummy coleslaw recipe'
[INFO] found 358 urls for 'yummy coleslaw recipe'
[INFO] downloading...
 100% |████████████████████████████████████████| [7s:0s]
[INFO] downloaded 354 urls in 8.1550018s
[INFO] getting all recipes
[INFO] parsing 1571 prospective recipes
 100% |████████████████████████████████████████| [12s:0s]
[INFO] got 1571 recipes
[INFO] finding best cluster
[INFO] clustering with 1090 / 1571 recipes
[INFO] wrote analyzed recipes to recipes/coleslaw/mean_recipes.json
[INFO] clustering with 1090 / 1571 recipes
[INFO] wrote analyzed recipes to recipes/coleslaw/mean_recipes.json
[INFO] clustering with 1090 / 1571 recipes
[INFO] wrote analyzed recipes to recipes/coleslaw/mean_recipes.json
[INFO] clustering with 1090 / 1571 recipes
[INFO] wrote analyzed recipes to recipes/coleslaw/mean_recipes.json
[INFO] clustering with 1090 / 1571 recipes
[INFO] wrote analyzed recipes to recipes/coleslaw/mean_recipes.json
[INFO] map[cabbage:6 lemon juice:1 sugar:7 apple cider vinegar:2 celery seed:4 dijon mustard:1 milk:1 vinegar:1 carrot:5 mayonnaise:7 pepper:4 salt:5 onion:2 white vinegar:2 mustard:1 white wine vinegar:1 buttermilk:1]
[INFO] getting directions for recipe 0
[INFO] getting recipe url for coleslaw +[] -[salt] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=&ingExcl=salt&sort=re)
2021-10-21 21:48:23 [WARN] run.go Run:145 no urls
[INFO] getting directions for recipe 1
[INFO] getting recipe url for coleslaw +[apple+cider+vinegar dijon+mustard] -[] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=apple+cider+vinegar,dijon+mustard&ingExcl=&sort=re)
2021-10-21 21:48:24 [WARN] run.go Run:145 no urls
[INFO] getting directions for recipe 2
[INFO] getting recipe url for coleslaw +[apple+cider+vinegar] -[carrot] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=apple+cider+vinegar&ingExcl=carrot&sort=re)
2021-10-21 21:48:25 [WARN] run.go Run:145 no urls
[INFO] getting directions for recipe 3
[INFO] getting recipe url for coleslaw +[buttermilk lemon+juice milk onion white+vinegar] -[] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=buttermilk,lemon+juice,milk,onion,white+vinegar&ingExcl=&sort=re)
2021-10-21 21:48:25 [WARN] run.go Run:145 no urls
[INFO] getting directions for recipe 4
[INFO] getting recipe url for coleslaw +[onion vinegar] -[] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=onion,vinegar&ingExcl=&sort=re)
2021-10-21 21:48:25 [WARN] run.go Run:145 no urls
[INFO] getting directions for recipe 5
[INFO] getting recipe url for coleslaw +[white+vinegar] -[carrot cabbage] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=white+vinegar&ingExcl=carrot,cabbage&sort=re)
2021-10-21 21:48:26 [WARN] run.go Run:145 no urls
[INFO] getting directions for recipe 6
[INFO] getting recipe url for coleslaw +[mustard white+wine+vinegar] -[salt] (https://www.allrecipes.com/search/results/?wt=coleslaw&ingIncl=mustard,white+wine+vinegar&ingExcl=salt&sort=re)
2021-10-21 21:48:27 [WARN] run.go Run:145 no urls
ERROR no urls

I get the same error when I re-run the command; I assume it's related to gathering the directions.
I can still read the mean_recipes.json file manually to find the cluster info, but relying on the JSON alone loses much of the usability, given its poor readability (formatting) and the lack of directions.

prettytable dependency

Fun project! FYI you seem to have forgotten to include prettytable in the dependency list.

getting urls...
https://duckduckgo.com/?q=simple+cheesecake+recipe
https://duckduckgo.com/?q=favorite+cheesecake+recipe
https://duckduckgo.com/?q=delicious+cheesecake+recipe
https://duckduckgo.com/?q=diy+cheesecake+recipe
https://duckduckgo.com/?q=homemade+cheesecake+recipe
https://duckduckgo.com/?q=best+cheesecake+recipe
https://duckduckgo.com/?q=recipes+for+cheesecake+
https://duckduckgo.com/?q=easy+cheesecake+recipe
downloading 556 cheesecake recipes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 556/556 [01:32<00:00,  1.63it/s]
Traceback (most recent call last):
  File "run.py", line 88, in <module>
    start()
  File "/usr/local/var/pyenv/versions/sandbox3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/var/pyenv/versions/sandbox3/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/var/pyenv/versions/sandbox3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/var/pyenv/versions/sandbox3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "run.py", line 83, in start
    from analyze import get_clusters
  File "/Users/ariebovenberg/Git/consensus-cookery/analyze.py", line 18, in <module>
    from prettytable import PrettyTable
ModuleNotFoundError: No module named 'prettytable'

Everything ran fine after installing it 😄
