
sciscraper's People

Contributors: pathos315, trag1c

sciscraper's Issues

New code returns an AttributeError

See below:

Traceback (most recent call last): a doi-style search | Total Entries: 127 | Less than 7 minutes remaining from 2021-09-27 10:21:42.66888842.6688888888888
  File "/Users/johnfallot/venv/dim_scraper_classed/program_v008.py", line 97, in <module>
    main()
  File "/Users/johnfallot/venv/dim_scraper_classed/program_v008.py", line 87, in main
    res1 = doi_scrape(path)
  File "/Users/johnfallot/venv/dim_scraper_classed/program_v008.py", line 25, in doi_scrape
    return pd.DataFrame([run_scrape(search_text, search_field='doi', total=numb_files) for search_text in (_search_terms)])
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 570, in __init__
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 530, in to_arrays
    return _list_of_dict_to_arrays(
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 643, in _list_of_dict_to_arrays
    columns = lib.fast_unique_multiple_list_gen(gen, sort=sort)
  File "pandas/_libs/lib.pyx", line 353, in pandas._libs.lib.fast_unique_multiple_list_gen
  File "/usr/local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 641, in <genexpr>
    gen = (list(x.keys()) for x in data)
AttributeError: 'NoneType' object has no attribute 'keys'
johnfallot@Johns-iMac ~ %
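The last frames show pandas calling .keys() on every element of the list passed to pd.DataFrame, so at least one run_scrape(...) call returned None instead of a dict. A minimal sketch of one workaround, assuming run_scrape returns a dict on success and None on failure (the run_scrape stub and collect_results helper below are hypothetical stand-ins, not the project's code), is to drop the None entries before building the DataFrame:

```python
from typing import Optional


def run_scrape(search_text: str, search_field: str, total: int) -> Optional[dict]:
    # Hypothetical stand-in: pretend that lookups for empty search terms fail.
    if not search_text:
        return None
    return {"search_field": search_field, "query": search_text, "total": total}


def collect_results(search_terms: list) -> list:
    """Run every search and keep only the successful (non-None) results."""
    results = (
        run_scrape(term, search_field="doi", total=len(search_terms))
        for term in search_terms
    )
    return [row for row in results if row is not None]


rows = collect_results(["example-doi-1", "", "example-doi-2"])
print(len(rows))  # 2
# With pandas installed, pd.DataFrame(rows) now receives only dicts.
```

Silently dropping failures hides them, so logging the skipped search terms may be preferable to filtering alone.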

Feedback

Here's some feedback after a quick look over the code.

Much of the feedback here is about style and formatting. That usually won't keep the code from working, but it does affect readability.

I've linked the relevant sections of PEP 8 and of other sources I usually follow.

One suggestion, though it's mostly personal preference, is to use a code formatter and a linter, such as Black and Flake8. Tools like these make it much easier to keep a consistent style and to keep the code readable.

If you're interested, here's a nice list of the tools I personally use:
https://github.com/MicaelJarniac/BuildURL/blob/main/CONTRIBUTING.md (expand the "Quick Reference" section)


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L55-L59
I think this for loop only processes the first element and returns immediately, without ever reaching the other elements.

One workaround is to create an empty list before the loop, append to that list inside the loop instead of returning, and then return the list after the loop has finished.

Another option is to use yield instead of return, which turns the function into a generator.
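The three variants can be compared side by side. This is a sketch with made-up function names, and item.upper() is only a placeholder for whatever the real loop body does:

```python
def first_only(items):
    # Bug: the return inside the loop exits on the first iteration.
    for item in items:
        return item.upper()


def all_as_list(items):
    # Option 1: collect results in a list and return it after the loop.
    results = []
    for item in items:
        results.append(item.upper())
    return results


def all_as_generator(items):
    # Option 2: yield makes this a generator that produces results lazily.
    for item in items:
        yield item.upper()


terms = ["a", "b", "c"]
print(first_only(terms))               # A
print(all_as_list(terms))              # ['A', 'B', 'C']
print(list(all_as_generator(terms)))   # ['A', 'B', 'C']
```

Note that a generator can only be consumed once; if the caller needs len() or a second pass, wrapping it in list() (or using the list version) is simpler.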


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L1

Imports should usually be on separate lines
https://pep8.org/#imports
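For example (the module names here are illustrative, not necessarily the ones on line 1), a combined import becomes one statement per module, while several names from a single module may still share a line:

```python
# Not recommended: several modules on one line
# import os, sys, logging

# Recommended: one module per import statement
import logging
import os
import sys

# Importing several names from one module on one line is fine
from datetime import date, datetime
```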


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L16
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L17
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L26
https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L34

Avoid extraneous whitespace in the following situations: [...] Immediately before the open parenthesis that starts the argument list of a function call
https://pep8.org/#whitespace-in-expressions-and-statements
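A small illustration with made-up values; PEP 8 applies the same rule to the opening bracket of an index or slice:

```python
values = [3, 1, 2]

# Not recommended: a space before the opening parenthesis or bracket
# total = sum (values)
# first = values [0]

# Recommended: no space between the name and "(" or "["
total = sum(values)
ordered = sorted(values)
first = values[0]
```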


https://github.com/Pathos315/pdfcurate/blob/63f41324d71a6e8425dfd8693bad3638237da80f/altscraper/program_v008.py#L33

# Add default value for an argument after the type annotation
def f(num1: int, my_float: float = 3.5) -> float:
    return num1 + my_float

https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#functions
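PEP 8 treats the "=" of a default value differently depending on whether the argument is annotated: with no annotation it takes no spaces, but once an annotation is present it gets one space on each side. A small illustration with made-up function names:

```python
# Unannotated default: no spaces around "="
def scale(value, factor=2):
    return value * factor


# Annotated default: annotation first, then " = " with spaces
def scale_typed(value: float, factor: float = 2.0) -> float:
    return value * factor


print(scale(3))          # 6
print(scale_typed(1.5))  # 3.0
```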

Feedback

sciscraper/main.py

Lines 56 to 67 in e9b3711

now=datetime.datetime.now()
date=now.strftime('%y%m%d')
export_dir=os.path.realpath('PDN Scraper Exports')
msg_error_1='[sciscraper]: HTTP Error Encountered, moving to next available object. Reason Given:'
logging.basicConfig(filename=f'{date}_scraper.log', level=logging.DEBUG,
                    format = '%(asctime)s - %(message)s', datefmt='%d-%b-%y %H:%M:%S')
PRIME_SRC =os.path.realpath('211001_PDN_studies_9.csv')
URL_DMNSNS ='https://app.dimensions.ai/discover/publication/results.json'
RESEARCH_DIR=os.path.realpath(f'{date}_PDN Research Papers From Scrape')
URL_SCIHUB='https://sci-hubtw.hkvisa.net/'

In many places, assignment operators have no spaces around the =.

Always surround these binary operators with a single space on either side: assignment (=) [...]
https://pep8.org/#other-recommendations
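A few of the lines quoted above, reformatted as a sketch. One exception is worth knowing: PEP 8 says keyword arguments take no spaces around the "=", so "format = ..." in the basicConfig call should become "format=...". A logging.Formatter is used below only to show the same format and datefmt keywords without creating a log file:

```python
import datetime
import logging
import os

# Assignments: one space on each side of "="
now = datetime.datetime.now()
date = now.strftime("%y%m%d")
export_dir = os.path.realpath("PDN Scraper Exports")
URL_DMNSNS = "https://app.dimensions.ai/discover/publication/results.json"
RESEARCH_DIR = os.path.realpath(f"{date}_PDN Research Papers From Scrape")

# Keyword arguments are the exception: no spaces around "="
formatter = logging.Formatter(fmt="%(asctime)s - %(message)s", datefmt="%d-%b-%y %H:%M:%S")
```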


sciscraper/main.py

Lines 84 to 94 in e9b3711

def __new__(cls, s_bool: bool):
    '''
    The ScrapeRequest class looks for the boolean value passed to it from the FileRequest class.
    A value of True, or 1, would return a SciHubScrape subclass. Whereas a value of False, of 0, would return a JSONScrape subclass.
    '''
    if s_bool == False:
        slookup_code = 'json'
    elif s_bool == True:
        slookup_code = 'sci'
    else:
        raise Exception('[sciscraper]: Invalid prefix detected.')

You've already annotated s_bool as a bool, so your if statement doesn't need to compare it to True and False. You can simply do:

         if s_bool:
             slookup_code = 'sci'
         else:
             slookup_code = 'json'

Notice that I've "inverted" the order of the tests, so as to avoid using if not s_bool: ... else: ..., which could be confusing.
And since s_bool is already annotated as a bool, I don't think it's necessary to handle values that aren't bools: that isn't supposed to happen, and tools like mypy can already catch it.
But if you think a runtime check is necessary, it could be done like so:

         if not isinstance(s_bool, bool):
             raise TypeError
         if s_bool:
             slookup_code = 'sci'
         else:
             slookup_code = 'json'

Also notice that I'm raising a TypeError instead of a generic Exception, since an argument of the wrong type is exactly what TypeError is for.

https://docs.quantifiedcode.com/python-anti-patterns/readability/comparison_to_true.html
https://docs.python.org/3/library/exceptions.html#TypeError
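Putting the suggestions together, here is a hypothetical sketch of how __new__ could return the matching subclass directly. The class names come from the docstring above, but the slookup_code class attributes and the structure are assumptions, not the project's actual code. Note that the original docstring mentions "True, or 1": the isinstance guard rejects 1 and 0 with a TypeError, so if integers should be accepted, drop the guard.

```python
class ScrapeRequest:
    """Hypothetical sketch: choose a scraper subclass from a bool flag."""

    def __new__(cls, s_bool: bool):
        # Optional runtime guard; mypy can already flag non-bool arguments.
        if not isinstance(s_bool, bool):
            raise TypeError(f"s_bool must be a bool, not {type(s_bool).__name__}")
        subclass = SciHubScrape if s_bool else JSONScrape
        # object.__new__ only needs the class; s_bool is passed on to __init__.
        return super().__new__(subclass)

    def __init__(self, s_bool: bool) -> None:
        self.s_bool = s_bool


class SciHubScrape(ScrapeRequest):
    slookup_code = "sci"


class JSONScrape(ScrapeRequest):
    slookup_code = "json"


print(type(ScrapeRequest(True)).__name__)  # SciHubScrape
print(ScrapeRequest(False).slookup_code)   # json
```

Because the returned object is still an instance of ScrapeRequest, Python calls __init__ on it automatically after __new__ returns.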
