pathos315 / sciscraper
A bulk academic PDF extractor program, designed specifically for papers about behavioral science and design.
License: MIT License
See the traceback below:
a doi-style search | Total Entries: 127 | Less than 7 minutes remaining from 2021-09-27 10:21:42.66888842.6688888888888
Traceback (most recent call last):
  File "/Users/johnfallot/venv/dim_scraper_classed/program_v008.py", line 97, in <module>
    main()
  File "/Users/johnfallot/venv/dim_scraper_classed/program_v008.py", line 87, in main
    res1 = doi_scrape(path)
  File "/Users/johnfallot/venv/dim_scraper_classed/program_v008.py", line 25, in doi_scrape
    return pd.DataFrame([run_scrape(search_text, search_field='doi', total=numb_files) for search_text in (_search_terms)])
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 570, in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 530, in to_arrays
    return _list_of_dict_to_arrays(
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 643, in _list_of_dict_to_arrays
    columns = lib.fast_unique_multiple_list_gen(gen, sort=sort)
  File "pandas/_libs/lib.pyx", line 353, in pandas._libs.lib.fast_unique_multiple_list_gen
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 641, in <genexpr>
    gen = (list(x.keys()) for x in data)
AttributeError: 'NoneType' object has no attribute 'keys'
johnfallot@Johns-iMac ~ %
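The last frame shows pandas calling x.keys() on every element of the list passed to pd.DataFrame, so at least one element of that list was None rather than a dict. One possible workaround, sketched here under the assumption that run_scrape returns None when a lookup fails, is to drop the None results before building the DataFrame. The stand-in function run_scrape_stub, the helper collect_results, and the shape of the returned dicts are all hypothetical and not the project's actual code:

```python
from typing import List, Optional


def run_scrape_stub(search_text: str) -> Optional[dict]:
    """Hypothetical stand-in for run_scrape: returns None when nothing is found."""
    if not search_text:
        return None
    return {"doi": search_text, "title": f"Paper for {search_text}"}


def collect_results(search_terms: List[str]) -> List[dict]:
    """Scrape every term and keep only the results that are actual dicts."""
    results = (run_scrape_stub(term) for term in search_terms)
    return [result for result in results if result is not None]


# The empty string simulates a search term whose lookup fails.
rows = collect_results(["10.1000/xyz123", "", "10.1000/abc456"])
# rows now holds only dicts, so it could be passed to pd.DataFrame(rows)
```

Whether skipping failed lookups is the right behaviour is a design choice; recording them in a separate list for later inspection may be preferable.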
Here's some feedback after a quick look over the code.
Most of the feedback here relates to style and formatting. Those issues usually won't keep the code from working, but they can hurt readability.
I've linked some sections of PEP 8 and other sources that I usually follow.
One thing I would suggest, though this is mostly personal preference, is to use a code formatter and a linter, such as Black and Flake8. Those, among other tools, will greatly help with keeping a consistent style and making the code easier to read.
If you're interested, here's a nice list of the tools I personally use:
https://github.com/MicaelJarniac/BuildURL/blob/main/CONTRIBUTING.md (expand the "Quick Reference" section)
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L55-L59
I think that this for loop will iterate over the first element only, and immediately return, without iterating through the other elements.
One way of working around that would be to create an empty list before the for loop and, instead of returning inside the loop, append to that list, then return the list after the loop is done.
Another option would be to use yield instead of return, thus turning that function into a generator.
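To make the difference concrete, here is a minimal sketch of the three variants. The function names and the upper-casing step are made up for illustration and are not taken from program_v008.py:

```python
from typing import Iterator, List, Optional


def process_first_only(items: List[str]) -> Optional[str]:
    """Problem pattern: return exits the function during the first iteration."""
    for item in items:
        return item.upper()
    return None


def process_all(items: List[str]) -> List[str]:
    """Fix 1: collect every result in a list and return it after the loop."""
    results = []
    for item in items:
        results.append(item.upper())
    return results


def process_lazily(items: List[str]) -> Iterator[str]:
    """Fix 2: yield each result, which turns the function into a generator."""
    for item in items:
        yield item.upper()


names = ["a.pdf", "b.pdf", "c.pdf"]
only_first = process_first_only(names)        # a single value
as_list = process_all(names)                  # one value per input
from_generator = list(process_lazily(names))  # same values, produced on demand
```

The generator version only does its work when something iterates over it, which can be handy for long file lists, but the caller has to wrap it in list() if it needs all results at once.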
Imports should usually be on separate lines
https://pep8.org/#imports
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L16
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L17
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L26
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L34
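As a small illustration of that PEP 8 recommendation (the modules and the file path here are arbitrary examples, not the ones imported in program_v008.py):

```python
# Discouraged by PEP 8: several modules on a single import line
# import os, sys

# Preferred: one module per import line
import os
import sys

# Importing several names from the same module on one line is fine
from os.path import basename, splitext

stem, extension = splitext(basename("/tmp/example_paper.pdf"))
separator = os.sep            # uses the os import
platform_name = sys.platform  # uses the sys import
```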
Avoid extraneous whitespace in the following situations:
Immediately before the open parenthesis that starts the argument list of a function call
https://pep8.org/#whitespace-in-expressions-and-statements
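For example (total_pages is a made-up function, used only to show the spacing):

```python
from typing import List


def total_pages(page_counts: List[int]) -> int:
    """Add up the page counts of several documents."""
    return sum(page_counts)


# Discouraged: a space between the function name and the parenthesis
# pages = total_pages ([3, 5, 8])

# Preferred: no space before the opening parenthesis
pages = total_pages([3, 5, 8])
```

Both spellings run the same way; the second is just the one PEP 8 recommends, and a formatter like Black will produce it automatically.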
# Add default value for an argument after the type annotation
def f(num1: int, my_float: float = 3.5) -> float:
    return num1 + my_float
https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#functions
Lines 56 to 67 in e9b3711
Always surround these binary operators with a single space on either side: assignment (=) [...]
https://pep8.org/#other-recommendations
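A short sketch of how PEP 8 treats the = sign in different places. The functions scale and shift are invented for this example:

```python
def scale(value: float, factor: float = 2.0) -> float:
    # Annotated parameter with a default: spaces around "="
    return value * factor


def shift(value, offset=1):
    # Unannotated parameter with a default: no spaces around "="
    return value + offset


# Plain assignment: a single space on either side of "="
result = scale(3.0, factor=1.5)  # keyword argument in a call: no spaces
shifted = shift(result)
```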
Lines 84 to 94 in e9b3711
s_bool is a bool, so in your if statement you don't need to compare it to True or False; you can simply do:
if s_bool:
    slookup_code = 'sci'
else:
    slookup_code = 'json'
Notice that I've "inverted" the order in which the tests happen, so as to avoid using if not s_bool: ... else: ... (that could be confusing).
And since you're already specifying that s_bool is supposed to be a bool, I believe it's not necessary to handle cases where it's not a bool, as that's not supposed to happen, and tools like mypy can already do this kind of check.
But if you believe it's necessary, then it could be done like so:
if not isinstance(s_bool, bool):
    raise TypeError
if s_bool:
    slookup_code = 'sci'
else:
    slookup_code = 'json'
Also notice that I'm raising a TypeError instead of a generic Exception, since that's pretty much what TypeError is for.
https://docs.quantifiedcode.com/python-anti-patterns/readability/comparison_to_true.html
https://docs.python.org/3/library/exceptions.html#TypeError
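Putting those pieces together, one possible shape is a small helper like the following. The name lookup_code and the error message are my own invention, and I've swapped the if/else for a conditional expression, which is just a more compact alternative:

```python
def lookup_code(s_bool: bool) -> str:
    """Return 'sci' when s_bool is True and 'json' when it is False."""
    # Optional runtime guard; mypy can catch most misuse statically instead.
    if not isinstance(s_bool, bool):
        raise TypeError(f"s_bool must be a bool, got {type(s_bool).__name__}")
    # Conditional expression: equivalent to the if/else shown earlier.
    return 'sci' if s_bool else 'json'


code_for_true = lookup_code(True)
code_for_false = lookup_code(False)
```

With the guard in place, a call like lookup_code("yes") raises TypeError instead of silently treating a non-empty string as true.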