kapt-labs / django-check-seo

Django Check SEO will check the SEO aspects of your site for you, and will provide advice in case of problems. Compatible with Django & Django-CMS!

License: GNU General Public License v3.0

Python 91.06% HTML 2.73% CSS 3.12% Shell 3.10%

seo django-cms django

django-check-seo's Issues

Improve organization of checks

Currently, all checks are in a single class, in the views.py file:

django-check-seo/django-check-seo/views.py

Lines 47 to 589 in 69832da
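
For orientation before the listing: one possible shape for the reorganization this issue asks for is to turn each check into a small standalone function and collect them in a registry. The sketch below is only an illustration under stated assumptions. `PageData`, `register`, `CHECKS` and `run_checks` are hypothetical names that do not exist in django-check-seo, and the two sample checks are simplified stand-ins for the methods in the class.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageData:
    """Hypothetical container for the data the checks need (not part of django-check-seo)."""

    title: str
    full_url: str
    keywords: List[str] = field(default_factory=list)
    problems: List[dict] = field(default_factory=list)
    warnings: List[dict] = field(default_factory=list)


# Registry of check functions; each check could live in its own module.
CHECKS: List[Callable[[PageData], None]] = []


def register(check: Callable[[PageData], None]) -> Callable[[PageData], None]:
    """Decorator that adds a check function to the registry."""
    CHECKS.append(check)
    return check


@register
def check_title_present(page: PageData) -> None:
    # Simplified stand-in for the title presence test.
    if not page.title:
        page.problems.append({"name": "No title tag", "settings": "at least 1"})


@register
def check_keyword_url(page: PageData) -> None:
    # Simplified stand-in for the keyword-in-URL test.
    if not any(keyword in page.full_url for keyword in page.keywords):
        page.problems.append({"name": "No keyword in URL", "settings": "at least 1"})


def run_checks(page: PageData):
    """Run every registered check and return (problems, warnings)."""
    for check in CHECKS:
        check(page)
    return page.problems, page.warnings


page = PageData(
    title="",
    full_url="https://example.com/products/camping/",
    keywords=["camping"],
)
problems, warnings = run_checks(page)
```

With this layout, adding a check means writing one decorated function, and each check can be tested in isolation. The current single-class implementation referenced above follows.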

    
           class DjangoCheckSeo: 
        
               def __init__(self, soup, full_url): 
        
                   """Populate some vars. 
        
                   Arguments: 
        
                       soup {bs4.element} -- beautiful soup content (html) 

                       full_url {str} -- full URL of the page being checked 
        
                   """ 
        
                   self.soup = soup 
        
                   # Get content of the page (exclude header/footer) 
        
                   self.content = self.soup.find("div", {"class": "container"}) 
        
                   # remove ul with nav class from content (<ul class="nav"> is the menu) 
        
                   nav = self.content.find("ul", {"class": "nav"}) 

                   if nav is not None:  # pages without a nav menu would otherwise raise AttributeError 

                       nav.extract() 
        
                   self.full_url = full_url 
        
                   self.keywords = [] 
        
                   self.problems = [] 
        
                   self.warnings = [] 
        
               def check(self): 

                   """Run every check and collect the results. 

                   Returns: 

                       tuple -- (problems, warnings): two lists of dicts of the form {name, settings, description}. 
        
                   """ 
        
                   self.check_keywords() 
        
                   self.check_title() 
        
                   self.check_description() 
        
                   self.check_links() 
        
                   self.check_keyword_occurence() 
        
                   self.check_keyword_url() 
        
                   self.check_h1() 
        
                   self.check_h2() 
        
                   self.check_images() 
        
                   self.check_url() 
        
                   self.keyword_present_in_first_paragraph() 
        
                   self.count_words_without_stopwords() 
        
                   return (self.problems, self.warnings) 
        
               def check_keywords(self): 
        
                   """First check to ensure that keywords are present. 
        
                   """ 
        
                   meta = self.soup.find_all("meta") 
        
                   for tag in meta: 
        
                       # use .get(): many <meta> tags (charset, property=...) have no "name" attribute 

                       if tag.attrs.get("name") == "keywords" and tag.attrs.get("content"): 

                           # get keywords for next checks; split on commas and strip surrounding 

                           # whitespace, so "a,b", "a, b" and "a,  b" are all handled 

                           self.keywords = [ 

                               kw.strip() 

                               for kw in tag.attrs["content"].split(",") 

                               if kw.strip() 

                           ] 
        
                           return 
        
                   self.problems.append( 

                       { 

                           "name": _("No meta keywords"), 

                           "settings": _("at least 1"), 

                           "description": _( 

                               "Meta keywords used to be important for search engines. They no longer are, but django-check-seo uses these keywords for all of its keyword-related checks. You will be flooded with problems and warnings and this SEO tool will not work as well as it should if you don't add some keywords." 
        
                           ), 
        
                       } 
        
                   ) 
        
               def check_title(self): 
        
                   """Check all title-related conditions. 
        
                   """ 
        
                   # title presence 
        
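                   # soup.title is None (not the string "None") when the tag is missing; 

                   # .string is None when the tag is empty 

                   if self.soup.title is None or self.soup.title.string is None: 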
        
                       self.problems.append( 

                           { 

                               "name": _("No title tag"), 

                               "settings": _("at least 1"), 

                               "description": _( 

                                   "Title tags are one of the most important things to add to your pages, since they are the main text displayed on search result pages." 
        
                               ), 
        
                           } 
        
                       ) 
        
                       return 
        
                   # title length too short 
        
                   if len(self.soup.title.string) < settings.SEO_SETTINGS["meta_title_length"][0]: 
        
                       self.problems.append( 

                           { 

                               "name": _("Title tag is too short"), 

                               "settings": "&ge;{}".format( 

                                   settings.SEO_SETTINGS["meta_title_length"][0] 

                               ), 

                               "description": _( 

                                   "Title tags need to describe the content of the page, and need to contain at least a few words." 
        
                               ), 
        
                           } 
        
                       ) 
        
                   # title length too long 
        
                   if len(self.soup.title.string) > settings.SEO_SETTINGS["meta_title_length"][1]: 
        
                       self.warnings.append( 

                           { 

                               "name": _("Title tag is too long"), 

                               "settings": "&le;{}".format( 

                                   settings.SEO_SETTINGS["meta_title_length"][1] 

                               ), 

                               "description": _( 

                                   "Only the first ~55-60 chars are displayed on modern search engine results pages. A longer title is not needed and can cause the user to miss information." 
        
                               ), 
        
                           } 
        
                       ) 
        
                   title_words = self.soup.title.string.split() 
        
                   # title does not contain any keyword 

                   if set(self.keywords).isdisjoint(set(title_words)): 

                       self.problems.append( 

                           { 

                               "name": _("Title does not contain any keyword"), 

                               "settings": _("at least 1"), 

                               "description": _( 

                                   "Title tags need to contain at least one keyword, since they are one of the most important pieces of content on the page for search engines." 
        
                               ), 
        
                           } 
        
                       ) 
        
               def check_description(self): 
        
                   meta = self.soup.find_all("meta") 
        
                   for tag in meta: 
        
                       # use .get(): <meta> tags without "name" or "content" would raise KeyError 

                       if tag.attrs.get("name") == "description" and tag.attrs.get("content"): 
        
                           if ( 
        
                               len(tag.attrs["content"]) 
        
                               < settings.SEO_SETTINGS["meta_description_length"][0] 
        
                           ): 
        
                               self.problems.append( 

                                   { 

                                       "name": _("Meta description is too short"), 

                                       "settings": _("needed"), 

                                       "description": _( 

                                           "Meta description can be displayed below your page title in search results. If Google finds your description too short or not relevant, it will generate its own description, based on your page content. This generated description will be less accurate than a well-written description." 
        
                                       ), 
        
                                   } 
        
                               ) 
        
                           elif ( 
        
                               len(tag.attrs["content"]) 
        
                               > settings.SEO_SETTINGS["meta_description_length"][1] 
        
                           ): 
        
                               self.problems.append( 

                                   { 

                                       "name": _("Meta description is too long"), 

                                       "settings": _("needed"), 

                                       "description": _( 

                                           "Meta description can be displayed below your page title in search results. If Google finds your description too long, it may crop it and your potential visitors will not be able to read all its content. Sometimes, long pertinent meta descriptions will be displayed, but in the vast majority of the results, the description length is 150-170 chars." 
        
                                       ), 
        
                                   } 
        
                               ) 
        
                           occurence = [] 
        
                           for keyword in self.keywords: 
        
                               occurence.append( 
        
                                   sum( 
        
                                       1 
        
                                       for _ in re.finditer( 
        
                                           r"\b%s\b" % re.escape(keyword.lower()), 
        
                                           tag.attrs["content"].lower(), 
        
                                       ) 
        
                                   ) 
        
                               ) 
        
                           # if no keyword is found in the meta description 

                           if not any(i > 0 for i in occurence): 

                               self.warnings.append( 

                                   { 

                                       "name": _("No keyword in meta description"), 

                                       "settings": _("at least 1"), 

                                       "description": _( 

                                           "Meta description is not used by search engines to calculate the rank of the page, but users will read it (if the meta description is selected by Google). The bonus point is that Google will put the keywords searched by the users in bold, so the users can easily verify that the content of your page fits their needs." 
        
                                       ), 
        
                                   } 
        
                               ) 
        
                           return 
        
                   self.problems.append( 
        
                       { 
        
                           "name": _("No meta description"), 
        
                           "settings": _("needed"), 
        
                           "description": _( 
        
                               'Even if search engines state that they don\'t use meta description for ranking (<a href="https://webmasters.googleblog.com/2009/09/google-does-not-use-keywords-meta-tag.html">source</a>), it can be displayed below the title of your page in search results. Since search engines use user clicks to rank your website, an appealing description can make the difference.<br />Google has stated that they display a shorter text (~155 chars) below the title of the page (<a href="https://twitter.com/dannysullivan/status/996065145443893249">source</a>).' 
        
                           ), 
        
                       } 
        
                   ) 
        
               def check_links(self): 
        
                   """Check all link-related conditions 
        
                   """ 
        
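                   # only keep anchors that have an href; link["href"] below would raise KeyError otherwise 

                   links = self.content.find_all("a", href=True) 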
        
                   internal_links = 0 
        
                   external_links = 0 
        
                   for link in links: 
        
                       # internal links = absolute links that contains domain name or relative links 
        
                       if os.environ["DOMAIN_NAME"] in link["href"] or not link["href"].startswith( 
        
                           "http" 
        
                       ): 
        
                           internal_links += 1 
        
                       else: 
        
                           external_links += 1 
        
                   # not enough internal links 
        
                   if internal_links < settings.SEO_SETTINGS["internal_links"][0]: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("Not enough internal links"), 
        
                               "settings": "&ge;{}".format( 
        
                                   settings.SEO_SETTINGS["internal_links"][0] 
        
                               ), 
        
                               "description": _( 
        
                                   "Internal links are useful because they link your content and can give any search engine the structure of your website, so they can create a hierarchy of your pages." 
        
                               ), 
        
                           } 
        
                       ) 
        
                   # too many internal links 
        
                   if internal_links > settings.SEO_SETTINGS["internal_links"][1]: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("Too many internal links"), 
        
                               "settings": "&le;{}".format( 
        
                                   settings.SEO_SETTINGS["internal_links"][1] 
        
                               ), 
        
                               "description": _( 
        
                                   'Google is vague about the max number of internal links on your site. <a href="https://neilpatel.com/blog/commandments-of-internal-linking/">Neil Patel</a> advises 3 to 4 internal links in the content of your page (excluding header/footer), but he says that you can go up to 10-20 links if your content is long enough.' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   # not enough external links 
        
                   if external_links < settings.SEO_SETTINGS["external_links"][0]: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("Not enough external links"), 
        
                               "settings": "&ge;{}".format( 
        
                                   settings.SEO_SETTINGS["external_links"][0] 
        
                               ), 
        
                               "description": _( 
        
                                   'Some recent SEO-related articles advise you to add some external links to help the SEO of other websites (<a href="https://yoast.com/outbound-links/">source</a>), while an older (2015) study found that links to websites with a high authority help increase website ranking (<a href="https://www.rebootonline.com/blog/long-term-outgoing-link-experiment/">source</a>).' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   # too many external links 
        
                   if external_links > settings.SEO_SETTINGS["external_links"][1]: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("Too many external links"), 
        
                               "settings": "&le;{}".format( 
        
                                   settings.SEO_SETTINGS["external_links"][1] 
        
                               ), 
        
                               "description": _( 
        
                                   '"Thanks to updates like Google Penguin, Google now focuses on link quality (not just link quantity)". There\'s no need to have too many external links on your main content, but the reputation of the websites you are linking to is important.' 
        
                               ), 
        
                           } 
        
                       ) 
        
               def check_keyword_occurence(self): 

                   """Check that at least one keyword has a density (occurrences / total words) between keywords_repeat[0] and keywords_repeat[1]. A problem is raised if every keyword is below the lower bound, or if any keyword is above the upper bound. 

                   Not case sensitive (keyword & text are lowercased before comparison). 
        
                   Thx https://stackoverflow.com/a/17268979/6813732 for finditer. 
        
                   """ 
        
                   occurence = [] 
        
                   for keyword in self.keywords: 
        
                       occurence.append( 
        
                           sum( 
        
                               1 
        
                               for _ in re.finditer( 
        
                                   r"\b%s\b" % re.escape(keyword.lower()), 
        
                                   self.content.text.lower(), 
        
                               ) 
        
                           ) 
        
                       ) 
        
                   if not occurence: 
        
                       occurence = [0] 
        
                   content = re.findall(r"\w+", self.content.text.lower()) 
        
                   nb_words = len(content) if len(content) > 0 else 1 
        
                   # if no keyword is repeated more than ["keywords_repeat"][0] % 
        
                   if not any( 
        
                       i / nb_words >= settings.SEO_SETTINGS["keywords_repeat"][0] 
        
                       for i in occurence 
        
                   ): 
        
                       self.problems.append( 

                           { 

                               "name": _("Not enough keyword occurrences"), 

                               "settings": "&ge;{min}%, max found is {actual:.2f}% ({actual_nb} times)".format( 

                                   min=settings.SEO_SETTINGS["keywords_repeat"][0] * 100, 

                                   # convert the ratio to a percentage to match the "%" in the message 

                                   actual=max(occurence) / nb_words * 100, 
        
                                   actual_nb=max(occurence), 
        
                               ), 
        
                               "description": _( 
        
                                   'The presence of keywords is important for search engines like Google, which will "understand" what your content is about, and will better serve your page in answer to structured queries that use your keywords.' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   # there is at least 1 keyword that is repeated > ["keywords_repeat"][0] 
        
                   else: 
        
                       # there is at least 1 keyword that is repeated > ["keywords_repeat"][1] 
        
                       if not all( 
        
                           i / nb_words <= settings.SEO_SETTINGS["keywords_repeat"][1] 
        
                           for i in occurence 
        
                       ): 
        
                           self.problems.append( 

                               { 

                                   "name": _("Too many keyword occurrences"), 
        
                                   # settings: ≤5, found X "keyword" 
        
                                   "settings": '&le;{max}%, found {actual:.2f}% ({actual_nb} times) of "{kw}"'.format( 
        
                                       max=settings.SEO_SETTINGS["keywords_repeat"][1] * 100, 
        
                                       actual=max(occurence) / nb_words * 100, 
        
                                       actual_nb=max(occurence), 
        
                                       kw=self.keywords[occurence.index(max(occurence))], 
        
                                   ), 
        
                                   "description": _( 
        
                                       "Some SEO websites advise you to get 1% of your words to be keywords. For other websites (like Yoast) it's 0.25-0.5%. We use a constant for keywords repetition. Too many keywords on a page will lead search engines to think that you're doing some keyword stuffing (put too many keywords in order to manipulate the page rank)." 
        
                                   ), 
        
                               } 
        
                           ) 
        
               def check_keyword_url(self): 
        
                   """Check presence of keywords in url 
        
                   """ 
        
                   for keyword in self.keywords: 
        
                       if keyword in self.full_url: 
        
                           return 
        
                   self.problems.append( 
        
                       { 
        
                           "name": _("No keyword in URL"), 
        
                           "settings": _("at least 1"), 
        
                           "description": _( 
        
                               'Keywords in the URL are a small ranking factor for Google (<a href="https://twitter.com/JohnMu/status/1070634500022001666">source</a>), but they will help your users understand the organisation of your website (/?product=50 says less than /products/camping/). On the other hand, Bing says: "<i>URL structure and keyword usage - keep it clean and keyword rich when possible</i>" (<a href="https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a">source</a>).' 
        
                           ), 
        
                       } 
        
                   ) 
        
               def check_h1(self): 
        
                   """Check all h1-related conditions 
        
                   """ 
        
                   h1 = self.soup.find_all("h1") 
        
                   if len(h1) > 1: 
        
                       self.problems.append( 

                           { 

                               "name": _("Too many h1 tags"), 

                               "settings": _("exactly 1"), 

                               "description": _( 

                                   'Google has said that they do not consider using multiple h1 a bad thing (<a href="https://www.youtube.com/watch?v=WsgrSxCmMbM">source</a>), but Google is not the only search engine out there. Bing webmaster guidelines say "Use only one <h1> tag per page".' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   elif not h1: 
        
                       self.problems.append( 

                           { 

                               "name": _("No h1 tag"), 

                               "settings": _("exactly 1"), 

                               "description": _( 

                                   "H1 is the most visually notable content of your page for your users, and is one of the most important ranking factors for search engines. Good h1 tag content is required in order to progress in the SERP." 
        
                               ), 
        
                           } 
        
                       ) 
        
                   else: 
        
                       occurence = [] 
        
                       for keyword in self.keywords: 
        
                           for single_h1 in h1: 
        
                               occurence.append( 
        
                                   sum( 
        
                                       1 
        
                                       for _ in re.finditer( 
        
                                           r"\b%s\b" % re.escape(keyword.lower()), 
        
                                           single_h1.text.lower(), 
        
                                       ) 
        
                                   ) 
        
                               ) 
        
                       # if no keyword is found in h1 
        
                       if not any(i > 0 for i in occurence): 
        
                           self.problems.append( 

                               { 

                                   "name": _("No keyword in h1"), 

                                   "settings": _("at least 1"), 

                                   "description": _( 

                                       "H1 tags are crawled by search engines as the title of your page. You should populate them with appropriate content to be sure that search engines correctly understand what your pages are about." 
        
                                   ), 
        
                               } 
        
                           ) 
        
               def check_h2(self): 
        
                   h2 = self.soup.find_all("h2") 
        
                   if not h2: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("No h2 tag"), 
        
                               "settings": _("at least 1"), 
        
                               "description": _( 
        
                                   'H2 tags are useful because they are explored by search engines and can help them understand the subject of your page (<a href="https://robsnell.com/matt-cutts-transcript.html">source</a>). It\'s a "section title", so every time you start talking about a new topic, you can put an h2 tag, which will explain what the content will be about.' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   else: 
        
                       occurence = [] 
        
                       # check if each keyword 
        
                       for keyword in self.keywords: 
        
                           # is present at least 
        
                           for single_h2 in h2: 
        
                               occurence.append( 
        
                                   sum( 
        
                                       1 
        
                                       for _ in re.finditer( 
        
                                           r"\b%s\b" % re.escape(keyword.lower()), 
        
                                           single_h2.text.lower(), 
        
                                       ) 
        
                                   ) 
        
                               ) 
        
                       # if no keyword is found in h2 
        
                       if not any(i > 0 for i in occurence): 
        
                           self.warnings.append( 
        
                               { 
        
                                   "name": _("No keyword in h2"), 
        
                                   "settings": _("at least 1"), 
        
                                   "description": _( 
        
                                       'Matt Cutts (creator of Google SafeSearch) <a href="https://robsnell.com/matt-cutts-transcript.html">stated in 2009</a> that "[...] we use things in the title, things in the URL, even things that are really highlighted, like h2 tags and stuff like that. ". Even if there is not really a more recent acknowledgement, h2 titles are important (but maybe not as important as h1 & title tags).' 
        
                                   ), 
        
                               } 
        
                           ) 
        
               def check_images(self): 
        
                   images = self.content.find_all("img") 
        
                   for image in images: 
        
                       if "alt" not in image.attrs or image.attrs["alt"] == "None": 
        
                           self.problems.append( 

                               { 

                                   "name": _("Image lacks alt tag"), 

                                   "settings": _("all images"), 

                                   # format after translating, so the msgid passed to _() stays constant 

                                   "description": _( 

                                       'Your images should always have an alt tag, because it improves accessibility for visually impaired people.<br />The name of your image is important too, because Google will look at it to know what the picture is about (<a href="https://support.google.com/webmasters/answer/114016">source</a>).<br /><a href="{img_url}">This is the image</a> without alt tag.' 

                                   ).format(img_url=image.attrs["src"]), 
        
                               } 
        
                           ) 
        
               def check_url(self): 
        
                   """All the url-related checks. 
        
                   """ 
        
                   # check url depth 
        
                   # do not count first slash after domain name, nor // in the "http://" 
        
                   url_without_two_points_slash_slash = self.full_url.replace("://", "") 
        
                   number_of_slashes = url_without_two_points_slash_slash.count("/") - 1 
        
                   if number_of_slashes > settings.SEO_SETTINGS["max_link_depth"]: 
        
                       self.problems.append( 
        
                           { 
        
                               "name": _("Too many levels in path"), 
        
                               "settings": "&le;{settings}, found {path_depth}".format( 
        
                                   settings=settings.SEO_SETTINGS["max_link_depth"], 
        
                                   path_depth=number_of_slashes, 
        
                               ), 
        
                               "description": _( 
        
                                   'Google recommends organizing your content by adding depth to your URLs, but advises against using too many directories (<a href="https://support.google.com/webmasters/answer/7451184">source</a>).<br />Yoast says that "In a perfect world, we would place everything in one sublevel at most. Today, many sites use secondary menus to accommodate for additional content" (<a href="https://yoast.com/how-to-clean-site-structure/">source</a>).' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   # check url length 
        
                   url_without_protocol = self.full_url.replace("http://", "").replace( 
        
                       "https://", "" 
        
                   ) 
        
                   if len(url_without_protocol) > settings.SEO_SETTINGS["max_url_length"]: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("URL is too long"), 
        
                               "settings": "&le;{settings}, found {len_url} chars".format( 
        
                                   settings=settings.SEO_SETTINGS["max_url_length"], 
        
                                   len_url=len(url_without_protocol), 
        
                               ), 
        
                               "description": _( 
        
                                   'A study from 2016 found a correlation between URL length & ranking (<a href="https://backlinko.com/search-engine-ranking">source</a>).' 
        
                               ), 
        
                           } 
        
                       ) 
        
               def keyword_present_in_first_paragraph(self): 
        
                   """Get [keywords_in_first_words] first words of the content, and ensure that there is a keyword among them. 
        
                   """ 
        
                   content = self.content.text.lower().split()[ 
        
                       : settings.SEO_SETTINGS["keywords_in_first_words"] 
        
                   ] 
        
                   for keyword in self.keywords: 
        
                       if keyword in content: 
        
                           return 
        
                   self.problems.append( 
        
                       { 
        
                           "name": _("No keyword in first sentence"), 
        
                           "settings": "before {settings} words".format( 
        
                               settings=settings.SEO_SETTINGS["keywords_in_first_words"] 
        
                           ), 
        
                           "description": _( 
        
                               'Yoast advises to put a keyword in the first sentence of your content. The person who reads it will be relieved because he will quickly retrieve the keyword he was looking for (<a href="https://yoast.com/text-structure-important-seo/">source</a>).' 
        
                           ), 
        
                       } 
        
                   ) 
        
               def count_words_without_stopwords(self): 
        
                   """[summary] 
        
                   """ 
        
                   content = re.findall(r"\w+", self.content.text.lower()) 
        
                   nb_words = len(content) 
        
                   # too few words 
        
                   if nb_words < settings.SEO_SETTINGS["content_words_number"][0]: 
        
                       self.problems.append( 
        
                           { 
        
                               "name": _("Content is too short"), 
        
                               "settings": "at least {min} words, more than {min2} if possible, found {nb_words}".format( 
        
                                   min=settings.SEO_SETTINGS["content_words_number"][0], 
        
                                   min2=settings.SEO_SETTINGS["content_words_number"][1], 
        
                                   nb_words=nb_words, 
        
                               ), 
        
                               "description": _( 
        
                                   'Yoast provide us some knowledge : "A blog post should contain at least 300 words in order to rank well in the search engines. Long posts will rank more easily than short posts. However, long posts require strong writing skills" (<a href="https://yoast.com/blog-post-length/">source</a>).<br />An article from Forbes from 2017 says that "<i>content with 1,000 words or more tends to attract significantly more links and shares</i>", and "<i>the average content length for top 3 rankings was about 750 words, while the average content length for position 20 rankings was about 500 words</i>" (<a href="https://web.archive.org/web/20190708230659/http://www.forbes.com/">source</a>).' 
        
                               ), 
        
                           } 
        
                       ) 
        
                   elif nb_words < settings.SEO_SETTINGS["content_words_number"][1]: 
        
                       self.warnings.append( 
        
                           { 
        
                               "name": _("Content is too short"), 
        
                               "settings": "at least {min} words, more than {min2} if possible, found {nb_words}".format( 
        
                                   min=settings.SEO_SETTINGS["content_words_number"][0], 
        
                                   min2=settings.SEO_SETTINGS["content_words_number"][1], 
        
                                   nb_words=nb_words, 
        
                               ), 
        
                               "description": _( 
        
                                   'Yoast provide us some knowledge : "A blog post should contain at least 300 words in order to rank well in the search engines. Long posts will rank more easily than short posts. However, long posts require strong writing skills" (<a href="https://yoast.com/blog-post-length/">source</a>).<br />An article from Forbes from 2017 says that "<i>content with 1,000 words or more tends to attract significantly more links and shares</i>", and "<i>the average content length for top 3 rankings was about 750 words, while the average content length for position 20 rankings was about 500 words</i>" (<a href="https://web.archive.org/web/20190708230659/https://www.forbes.com/sites/jaysondemers/2017/07/18/how-long-should-your-content-be-for-optimal-seo/2">source</a>).' 
        
                               ), 
        
                           } 
        
                       )

The ideal implementation would be a config parameter listing the check files, plus a class exposing a few pieces of information (HTML of the page, content of the page, URL, keywords...).

Each check will be in a separate file, with some information available through the class.
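A minimal sketch of that layout, under stated assumptions: the fields of the `Site` container, the `run_checks` helper, the example check and its `max_url_length=70` default are all hypothetical names chosen for illustration. In a real setup each check function would live in its own file listed in the settings.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical container for the data every check needs."""
    html: str
    content: str
    full_url: str
    keywords: list = field(default_factory=list)
    problems: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def check_url_length(site, max_url_length=70):
    """Example check: warn when the URL (without protocol) is too long."""
    url = site.full_url.replace("http://", "").replace("https://", "")
    if len(url) > max_url_length:
        site.warnings.append({"name": "URL is too long", "found": len(url)})


def run_checks(site, checks):
    """Run every check callable against the shared Site object."""
    for check in checks:
        check(site)
    return site.problems, site.warnings


site = Site(html="<html></html>", content="", full_url="https://example.com/" + "a" * 80)
problems, warnings = run_checks(site, [check_url_length])
```

Passing the same `Site` object to every check keeps the checks independent of each other while still letting them share the parsed page.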

Send DJANGO_CHECK_SEO_AUTH along with the request in case of redirect [3xx]

Hi!

First of all, I would like to thank you for the great work! This is a very helpful package and it saved us a lot of time on enhancing the SEO of our posts.

Recently, we started using the DJANGO_CHECK_SEO_AUTH to use auth credentials along with the request. But for reasons we don't fully control, the DNS server is redirecting us to another URL.

In this scenario, because of how the requests package behaves, the HTTP_AUTHORIZATION header is not sent with the redirected request (after a 3xx) and we end up with a 404 (the view wasn't able to authenticate the request).

I was wondering if the request made by django_check_seo could check for a 3xx status code and follow the new URL location with the DJANGO_CHECK_SEO_AUTH information.

Here is a simple example:

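import requests
from django.conf import settings

# URL is the page to check (placeholder, as in the original example)
auth = (
    settings.DJANGO_CHECK_SEO_AUTH["user"],
    settings.DJANGO_CHECK_SEO_AUTH["pass"],
)

# First request: do not follow redirects automatically
r = requests.get(URL, auth=auth, allow_redirects=False)

# On a 3xx response, follow the Location header with the same credentials
if 300 <= r.status_code < 400:
    r = requests.get(r.headers["location"], auth=auth)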

Versions

Python 3.6.8
django-check-seo==0.3.6
requests==2.18.4

Thank you very much,
[]s

lower all text in tests

A keyword can be "postgresql" while the word actually used in the text is "PostgreSQL".
Check all files and replace soup.element.string with soup.element.string.lower().
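A small sketch of the case-insensitive comparison, assuming a hypothetical `count_keyword` helper; it lowercases both the keyword and the text before matching whole words.

```python
import re


def count_keyword(keyword, text):
    """Count whole-word occurrences of keyword in text, ignoring case."""
    pattern = r"\b%s\b" % re.escape(keyword.lower())
    return len(re.findall(pattern, text.lower()))


hits = count_keyword("postgresql", "We run PostgreSQL in production.")
```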

Max retries exceeded

I deployed a djangocms site on Heroku and ran the SEO check, but it returned this error: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //astaqc-djangocms.herokuapp.com//en/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f45e24480b8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Here is the screenshot
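The host='https' in that message is what you get when a protocol is prepended to a domain that already contains one. Whether that is the actual cause here is an assumption. The sketch below reproduces the symptom and shows a tolerant way to join the parts; `build_full_url` is a hypothetical helper, not part of django-check-seo.

```python
from urllib.parse import urlsplit


def build_full_url(protocol, domain, page):
    """Join protocol, domain and page path, tolerating a domain that
    already includes a scheme or a trailing slash."""
    # keep only the host part if the domain was stored as a full URL
    if "://" in domain:
        domain = urlsplit(domain).netloc
    return protocol + domain.rstrip("/") + "/" + page.lstrip("/")


# naive concatenation reproduces the host seen in the error message
broken = "https://" + "https://astaqc-djangocms.herokuapp.com/" + "/en/"
broken_host = urlsplit(broken).hostname

fixed = build_full_url("https://", "https://astaqc-djangocms.herokuapp.com/", "/en/")
```

If this is the cause, storing the domain without a scheme or trailing slash in the Sites framework would avoid the problem without any code change.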

Add informational pop-ups for technical details

Users may not know where to modify the information.
We could add a button that would open a popup with an instruction manual.

Current look:

Desired result:
Just add a "guide" or "manual" entry in the CustomList object and add a pop-up in the view:

Exclude certain images from checks

We don't want to alert about missing alt tags for certain images (like a tracking pixel, or something like this). We could add a list of urls to ignore in the alt tag check.
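A sketch of such an ignore list, using the standard library HTML parser instead of BeautifulSoup so that it stays self-contained; `images_missing_alt` and its `ignored_urls` parameter are hypothetical names.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def images_missing_alt(html, ignored_urls=()):
    """Return the src of images without alt text, skipping ignored URLs."""
    parser = ImgCollector()
    parser.feed(html)
    return [
        img.get("src")
        for img in parser.images
        if not img.get("alt") and img.get("src") not in ignored_urls
    ]


html = '<img src="/pixel.gif"><img src="/cat.png"><img src="/dog.png" alt="A dog">'
missing = images_missing_alt(html, ignored_urls=["/pixel.gif"])
```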

Change status of internal & external links

Since the number of internal links can vary greatly depending on the subject treated, the default limits are "between 1-15" for internal links, and "between 1-5" for external links.

In addition, not having a number of links in these ranges should not trigger a problem, only a warning.

Running Without CMS or Installation errors

I am not sure whether this will be helpful to others, but this is what I did after running into some issues while installing this.

Follow the installation instructions, then make sure these settings are present:

add 'django.contrib.sites' to INSTALLED_APPS
SITE_ID = 1

This step is important if running without CMS:

Go to the 'default.html' template in the site-packages folder of the installed app
remove 'cms_tag'
save it
access your page's SEO report at base.domain/django-check-seo/?page=/page_extnsn

Hope this helps!

Is it possible to use without djangocms?

Is it possible to use this package without djangocms?
I have a handmade website and wanted to check whether its SEO is correct or has problems. Is there some way to install and use this package on sites that are not built with djangocms?

Accented char in URL throws exception

If the URL contains an accented character, the cms populate function throws an exception.

Add categories to checks

Currently, all checks are processed in the same way and are displayed in a large list.
A cool thing to add could be a category tag for checks, which could add readability to lists.

The category will be accessible via the Site instance in each check, and it will be of the form "retrieve the list according to its name, or create a new list with the new name".
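The "retrieve or create" behaviour could be sketched with `dict.setdefault`; the `categories` attribute and the `get_category` method are hypothetical names.

```python
class Site:
    """Hypothetical holder for check results grouped by category."""

    def __init__(self):
        self.categories = {}

    def get_category(self, name):
        """Return the list for this category, creating it on first use."""
        return self.categories.setdefault(name, [])


site = Site()
site.get_category("links").append("Not enough internal links")
site.get_category("links").append("Too many external links")
site.get_category("title").append("Title tag is too short")
```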

Old display:

New display:

InsecureRequestWarning

When using django-check-seo, we can get logs that say this:

InsecureRequestWarning,
/home/me/projets/projectname/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py:1004:
InsecureRequestWarning: Unverified HTTPS request is being made to host 'my-https-django-website.ext'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

We should redirect http requests to https ones. And django-check-seo is used in order to analyse public content, that is available all over the web.

Change the way keyword occurrence checks work

Current way:

Check if at least one keyword is repeated at least keywords_repeat[0] times.
Check if all keywords are not repeated more than keywords_repeat[1] times.

New way: check occurrences using percentages of keywords found in total number of words of the content.

Check if at least one keyword is repeated more than keywords_repeat[0] (0.25%).
Check if all keywords are not repeated more than keywords_repeat[1] (1%).
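A sketch of the percentage-based check, assuming hypothetical helper names and the 0.25% / 1% bounds quoted above. For multi-word keywords, the count of phrase matches is divided by the total number of single words, which is one possible convention among several.

```python
import re


def keyword_percentages(keywords, text):
    """Return {keyword: share of the words in text, as a percentage}."""
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    result = {}
    for keyword in keywords:
        pattern = r"\b%s\b" % re.escape(keyword.lower())
        hits = len(re.findall(pattern, text.lower()))
        result[keyword] = 100.0 * hits / total if total else 0.0
    return result


def check_keyword_density(keywords, text, keywords_repeat=(0.25, 1.0)):
    """Return (enough, too_many): whether at least one keyword exceeds
    the lower bound, and which keywords exceed the upper bound."""
    percentages = keyword_percentages(keywords, text)
    enough = any(p > keywords_repeat[0] for p in percentages.values())
    too_many = [k for k, p in percentages.items() if p > keywords_repeat[1]]
    return enough, too_many


# 1 occurrence of "django" in 200 words is 0.5%
text = "django " + "filler " * 199
enough, too_many = check_keyword_density(["django", "seo"], text)
```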

Treat missing alt attributes as Warning instead of Problem

Describe the bug
An <img> missing an alt attribute is filed as Problem.

According to WCAG, some images must not have alt attributes:

Sometimes there is non-text content that really is not meant to be seen or understood by the user. Transparent images used to move text over on a page; an invisible image that is used to track usage statistics; and a swirl in the corner that conveys no information but just fills up a blank space to create an aesthetic effect are all examples of this. Putting alternative text on such items just distracts people using screen readers from the content on the page. Not marking the content in any way, though, leaves users guessing what the non-text content is and what information they may have missed (even though they have not missed anything in reality). This type of non-text content, therefore, is marked or implemented in a way that assistive technologies (AT) will ignore it and not present anything to the user.

Expected behavior
File missing alt attributes as warnings and explain why and how to fill alt attributes.
Quotes and links to WCAG (and its French equivalent, the RGAA) should be present.

meta description searched_in error

A meta-description is shown for each keyword present in meta keywords:

We want to show only one meta description "searched_in" content if there is only one meta description:

slugify urls and keywords

Currently only keywords are slugified when checking presence of keywords in urls.

If the URL contains accented characters, they will not be slugified.

Ex:

keyword: keyword éé
slugified keyword: keyword-ee
url: https://dom.ext/fr/my-url-which-contain-keyword-éé/

-> slugified keyword is not in url since keyword-ee ≠ keyword-éé
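One way to make both sides comparable is to slugify the URL path as well. The sketch below uses a rough standard-library approximation of Django's `slugify` (it additionally keeps `/` so the path structure survives); `ascii_slug` and `keyword_in_url` are hypothetical names.

```python
import re
import unicodedata
from urllib.parse import unquote, urlsplit


def ascii_slug(value):
    """Rough stdlib approximation of an ASCII slug (keeps "/")."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s/-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def keyword_in_url(keyword, url):
    """Compare the slugified keyword with the slugified URL path."""
    path = unquote(urlsplit(url).path)
    return ascii_slug(keyword) in ascii_slug(path)


found = keyword_in_url(
    "keyword éé", "https://dom.ext/fr/my-url-which-contain-keyword-éé/"
)
```

Calling `unquote` first means a percent-encoded path (`%C3%A9`) is handled the same way as a literal accented one.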

Absolute links are throwing errors.

Describe the bug
Error occurs when using an absolute link (/link) during check_links check.

Expected behavior
check_links should append absolute url to domain name.

Bonus
check_links should check if the site.full_url contains env var DOMAIN_NAME to create the domain url (so external tests like http://domain/django-check-seo/?page=https://kapt.mobi should not show any errors).
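Resolving links against the URL of the checked page can be sketched with `urllib.parse.urljoin`; `resolve_link` and `is_internal` are hypothetical helpers, and comparing hosts is only one possible definition of "internal".

```python
from urllib.parse import urljoin, urlsplit


def resolve_link(page_url, href):
    """Turn a relative or root-relative href into a full URL."""
    return urljoin(page_url, href)


def is_internal(page_url, href):
    """A link is internal when it resolves to the same host as the page."""
    return urlsplit(resolve_link(page_url, href)).netloc == urlsplit(page_url).netloc


resolved = resolve_link("https://example.com/fr/blog/", "/link")
```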

Add config option to allow request module to access protected websites.

Adding a config option to allow request module to access protected websites can be useful when:

a website is protected with an HTTP Authentication system, and django-check-seo is installed on this website:
the user will fill his/her credentials so the website will become accessible, but django-check-seo will only get the 401 message
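A sketch of how such an option could be turned into arguments for the request, assuming a settings dict with "user" and "pass" keys (the shape used by DJANGO_CHECK_SEO_AUTH elsewhere on this page); `request_kwargs` is a hypothetical helper.

```python
def request_kwargs(auth_settings=None):
    """Build keyword arguments for requests.get from an optional
    {"user": ..., "pass": ...} dict."""
    kwargs = {}
    if auth_settings:
        # requests accepts a (user, password) tuple for HTTP basic auth
        kwargs["auth"] = (auth_settings["user"], auth_settings["pass"])
    return kwargs


kwargs = request_kwargs({"user": "admin", "pass": "secret"})
# usage (not executed here): requests.get(full_url, **kwargs)
```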

Meta description check is not working

When there's no meta description on a page, the check returns "params: only one, found: one, searched in: no data".

DJANGO_CHECK_SEO_EXCLUDE_CONTENT does not exclude content in conditional comments

Describe the bug
The content inside html conditional comments is not excluded even though it is referenced in DJANGO_CHECK_SEO_EXCLUDE_CONTENT setting.

To Reproduce
Steps to reproduce the behavior:

Considering the following template:

<body>
  <!--[if lt IE 8]>
    <p class="catch-me-if-you-can"></p>
  <![endif]-->
</body>

Add DJANGO_CHECK_SEO_EXCLUDE_CONTENT = ".catch-me-if-you-can" in your settings
Click on "Check SEO" toolbar button
Check the "Raw data" section
See that .catch-me-if-you-can HTML tag is not excluded

Expected behavior
.catch-me-if-you-can HTML tag is excluded.
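HTML parsers treat a conditional comment as an ordinary comment, so the markup inside it is not visible to a selector. One possible approach, sketched here with a hypothetical `unwrap_conditional_comments` helper, is to unwrap these comments before parsing so the exclusion can match; replacing them with an empty string instead would drop their content entirely.

```python
import re

# matches <!--[if ...]> ... <![endif]--> and captures the wrapped markup
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if[^\]]*\]>(.*?)<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)


def unwrap_conditional_comments(html):
    """Replace each conditional comment with the markup it wraps."""
    return CONDITIONAL_COMMENT.sub(r"\1", html)


html = """<body>
  <!--[if lt IE 8]>
    <p class="catch-me-if-you-can"></p>
  <![endif]-->
</body>"""
unwrapped = unwrap_conditional_comments(html)
```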

Lack of affordance

Some users do not know that they can click on a check result to show more information:

check result lines have no affordance (no button, no link, no icon)
the mouse cursor shown when hovering over a check result line is not the usual one

Update of class organization

If we move the Site class out of the views.py file and place it in a checks/ folder (after renaming the checks/ folder containing all control files to checks_list/), we could create a new CustomList class, which will be a new way to organize the textual data of the checks.

new organization:

old way to add a new problem/warning:

new way:

(checks code here)

No H1 found when in <nav> tag

Describe the bug
When the H1 is in a <nav> tag, django-check-seo does not find it.

<body>
  <nav>
    <h1>not found</h1>
  </nav>
</body>

Anyone with the URL can see the SEO page

Describe the bug
Only logged-in administrators should be able to see the django-check-seo page.

To Reproduce
Grab the URL (the thing that looks like http://mysite/django-check-seo/?page=/), and open it in a private browsing tab.

Expected behavior
A user who is not logged in should be redirected to a login screen.

Screenshots

bug:

expected behavior:

Make checks really flexible

Currently checks are only searched inside checks/ folder in django-check-seo/ folder (deep inside .venv).

Maybe add a custom path to look for other checks in the project?
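Loading checks from an arbitrary project folder could be sketched with `importlib`. The `load_checks` helper and the convention that each file exposes a `run(site)` callable are assumptions made for this example.

```python
import importlib.util
from pathlib import Path


def load_checks(folder):
    """Import every .py file in folder and collect its run() callable.

    The run(site) entry point is a hypothetical convention for this sketch.
    """
    checks = []
    for path in sorted(Path(folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # files without a run() callable (helpers, __init__.py) are skipped
        if hasattr(module, "run"):
            checks.append(module.run)
    return checks
```

The folder path itself could then come from a setting, so projects can add their own checks without touching the installed package.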

Svg title tags are counted as meta title tags

Problem when using django check seo for django 3+ sites

Here's the problem:

https://github.com/kapt-labs/django-check-seo/blob/master/django_check_seo/urls.py#L11

maybe we should check not version.startswith('1') ?
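Comparing version tuples avoids string-prefix tests altogether. The sketch below assumes the branch in urls.py only needs to know whether Django is 2.0 or later (the linked line is not quoted here, so that is a guess); `is_django_2_or_later` is a hypothetical helper that takes a tuple shaped like `django.VERSION`.

```python
def is_django_2_or_later(version):
    """True when the (major, minor, ...) version tuple is at least 2.0.

    version is shaped like django.VERSION, e.g. (3, 0, 0, "final", 0).
    """
    return tuple(version[:2]) >= (2, 0)


result = is_django_2_or_later((3, 0, 0, "final", 0))
```

Unlike `version.startswith('1')`, a tuple comparison would also classify a future 10.x release correctly.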

Change default values for internal & external links

If you compare our old guideline of 100 links and you look at what the web looks like now, it's quite common to have 200 or 300 or 400 links on a page, as long as the page is long, it has value add, there's substantial amount of substance and real stuff on that page.
So the short answer is really not to worry about it, or not to limit yourself to 100 links anymore.
Matt Cutts - https://www.youtube.com/watch?v=QHG6BkmzDEM

The number of links on a page can vary considerably; we just need to make sure that the page has at least one link.

Django check seo search in wrong content

Currently, django-check-seo is searching the content in the <div class="container"></div> tag.

It could be a problem if the main content of the pages of the crawled website is in another tag (like <main> or <div class="cms_main"> ...).

We should provide a setting where we could select a tag/class/list of tags/classes to search content in.

Errors due to escaped special chars

Say you have this keyword:

that's awesome

Then your meta keywords will maybe contain this:

that&#39;s awesome

But your html content will contain this:

[...] and that's awesome!

Django check seo does not unescape content in keywords or in meta description (and that's cool because there could be an XSS). However, for websites that escape special chars in meta keywords/description tags, maybe we could use a list of authorized chars in the settings, and unescape only the keywords/description tags, like this:

DJANGO_CHECK_SEO_UNESCAPE_AUTHORIZED_CHARS = ["'", "\"", "!", "and", "so", "on", "..."]

I don't really know what to do regarding this issue for now. It's way simpler to just fix the escaping in keywords & description tags.
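For illustration, a selective unescape could look like the sketch below: only entities that decode to a character in the authorized list are converted, everything else is left escaped. `unescape_authorized` is a hypothetical helper.

```python
import html
import re

# named (&quot;), decimal (&#39;) and hexadecimal (&#x27;) entities
ENTITY = re.compile(r"&#?\w+;")


def unescape_authorized(text, authorized_chars):
    """Unescape only the HTML entities that decode to an authorized char."""
    def replace(match):
        decoded = html.unescape(match.group(0))
        return decoded if decoded in authorized_chars else match.group(0)
    return ENTITY.sub(replace, text)


authorized = ["'", '"', "!"]
keyword = unescape_authorized("that&#39;s awesome", authorized)
```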

Add style to results

The goal here is to display data differently than the classic "problem name" - "found X" - "description".

Here are three screenshots. Which one displays a better interface?

django-check-seo

audit page in google chrome dev tools

semrush

Distinguish English terms in the French translation

Sometimes some technical terms are kept in English in the names/descriptions of checks, because their French equivalent does not exist.
It would be better to show that they are written in English.

Stop words are not really necessary

It seems that none of the SEO-related websites that talk about stop words cite a credible source.

We will not use stop words in the count_words function.

Show what's wrong in the "searched in" content

Currently there is not much information about what's wrong directly in the "searched in" content:

It seems to be a good idea to show what's wrong (like in this picture; the URL is too deep):

Failed parse

Describe the bug
Failed requests.exceptions.InvalidURL: Failed to parse: http://127.0.0.1:8000b'/fr/'

To Reproduce
Django<3.1
django-cms>=3.7,<3.8
django-check-seo==0.3.6

Expected behavior
Expected http://127.0.0.1:8000/fr/

Desktop :

Python 3.8.5

Temporary fix
Override IndexView and add at line 30:

        page = self.request.GET.get("page", None)
        # strip the b'...' wrapper left by the bytes-to-str conversion
        if page and page.startswith("b'") and page.endswith("'"):
            page = page[2:-1]
        full_url = protocol + Site.objects.get_current().domain + page

Add the browsed content to the check results

Before:

After:

In-a-perfect-world after:

(because explicit is better than implicit)

Display keywords

Describe the solution you'd like
Currently, keywords are not displayed on the django-check-seo page, and sometimes we need them to check something.

A good improvement could be to display them on the upper part of the page.

Additional context

Footer is present in content

When we have a page like this:

...
<footer>
  <div class="container">
    Footer
  </div>
</footer>

The self.content var in the Site class only retrieves the <div class="container">...</div> block, so it is impossible to tell if this item in self.content:

...
<div class="container">
  Footer
</div>

is a footer that needs to be removed.

nav menu is in content

django-check-seo should not include things like header, menu & footer in the core content of the page, but on some websites that's actually the case, leading to the "no keyword found in first §" problem.

keywords with special chars are not found in h1

If a keyword contains a special char, it will not be found in the h1 tag.

Mention that the check is done on the public version of the page

It may be helpful to add a mention that the current check is done on the public version of the page (and not the draft).

For example:

7 problèmes trouvés, et 2 avertissements levés sur la page publique ("7 problems found, and 2 warnings raised on the public page")

Slugify keywords in checks

The goal here is to standardize the way keywords are processed to facilitate the detection of keywords in content.

Ex:

Find keywords in url -> use slugify to effectively compare keywords and url.

Rework descriptions of checks

Transform something like
"Meta description can be displayed below your page title in search results. If Google find your description too long, it may crop it and your potential visitors will not be able to read all its content. Sometimes, long pertinent meta descriptions will be displayed, but in the vast majority of the results, the description's lengths are 150-170 chars."
to
"The meta description tag can be displayed in search results if it has the right length, and can influence users. And Google classifies sites according to user behaviour."

Alt tags for images in links are not displayed

Ex of links:

Content of the middle empty link:

If content is empty (img with no alt tag), we should display the html tag itself.

Allow the application to be installed from github

Right now, the installation method is to "copy a bunch of files into your project folder, then edit all of them, then spend three days debugging this buggy application".

The new approach should be the same as for all other django applications:

install application using pipenv or pip inside a virtualenv,
update your settings.py with values,
add values into your urls.py,
add a custom file in your project_folder/ to create the button to launch this application.

Add dependencies in setup.cfg

Users currently need to manually install some packages that django-check-seo depends on, or to manually add them to their Pipfile.

But the Setup Configuration File syntax includes the declaration of dependencies:

[options]
include_package_data = true
packages = find:
install_requires =
    packagename>=version
    otherpackagename

I think the right way to treat dependencies is to let pipenv (or pip) do it for us.

Old way:

New way:

h2 wrong number "found"

"trouvé" value should not be "1"

	class DjangoCheckSeo:
	def __init__(self, soup, full_url):
	"""Populate some vars.

	Arguments:
	soup {bs4.element} -- beautiful soup content (html)
	"""
	self.soup = soup
	# Get content of the page (exclude header/footer)
	self.content = self.soup.find("div", {"class": "container"})
	# remove ul with nav class from content (<ul class="nav"> is the menu)
	self.content.find("ul", {"class": "nav"}).extract()
	self.full_url = full_url
	self.keywords = []
	self.problems = []
	self.warnings = []

	def check(self):
	"""Magic happens here.

	Returns:
	tuple -- Two arrays of dict of form {name, settings, description}.
	"""
	self.check_keywords()
	self.check_title()
	self.check_description()
	self.check_links()
	self.check_keyword_occurence()
	self.check_keyword_url()
	self.check_h1()
	self.check_h2()
	self.check_images()
	self.check_url()
	self.keyword_present_in_first_paragraph()
	self.count_words_without_stopwords()

	return (self.problems, self.warnings)

	def check_keywords(self):
	"""First check to ensure that keywords are present.
	"""
	meta = self.soup.find_all("meta")
	for tag in meta:
	if tag.attrs["name"] == "keywords" and tag.attrs["content"] != "":
	# get keywords for next checks
	self.keywords = tag.attrs["content"].split(
	", "
	) # may be dangerous to hard code the case where keywords are separated with a comma and two spaces
	return
	self.problems.append(
	{
	"name": _("No meta keywords"),
	"settings": _("at least 1"),
	"description": _(
	"Meta keywords were important in this meta tag, however django-check-seo uses these keywords to check all other tests related to keywords. You will be flooded with problems and warnings and this SEO tool will not work as well as it should if you don't add some keywords."
	),
	}
	)

	def check_title(self):
	"""Check all title-related conditions.
	"""
	# title presence
	if self.soup.title == "None":
	self.problems.append(
	{
	"name": _("No title tag"),
	"settings": _("at least 1"),
	"description": _(
	"Titles tags are ones of the most important things to add to your pages, sinces they are the main text displayed on result search pages."
	),
	}
	)
	return

	# title length too short
	if len(self.soup.title.string) < settings.SEO_SETTINGS["meta_title_length"][0]:
	self.problems.append(
	{
	"name": _("Title tag is too short"),
	"settings": "≥{}".format(
	settings.SEO_SETTINGS["meta_title_length"][0]
	),
	"description": _(
	"Titles tags need to describe the content of the page, and need to contain at least a few words."
	),
	}
	)

	# title length too long
	if len(self.soup.title.string) > settings.SEO_SETTINGS["meta_title_length"][1]:
	self.warnings.append(
	{
	"name": _("Title tag is too long"),
	"settings": "≤{}".format(
	settings.SEO_SETTINGS["meta_title_length"][1]
	),
	"description": _(
	"Only the first ~55-60 chars are displayed on modern search engines results. Writing a longer title is not really required and can lead to make the user miss informations."
	),
	}
	)

	title_words = self.soup.title.string.split()

	# title do not contain any keyword
	if set(self.keywords).isdisjoint(set(title_words)):
	self.problems.append(
	{
	"name": _("Title do not contain any keyword"),
	"settings": _("at least 1"),
	"description": _(
	"Titles tags need to contain at least one keyword, since they are one of the most important content of the page for search engines."
	),
	}
	)

	def check_description(self):
	meta = self.soup.find_all("meta")
	for tag in meta:
	if tag.attrs["name"] == "description" and tag.attrs["content"] != "":
	if (
	len(tag.attrs["content"])
	< settings.SEO_SETTINGS["meta_description_length"][0]
	):
	self.problems.append(
	{
	"name": _("Meta description is too short"),
	"settings": _("needed"),
	"description": _(
	"Meta description can be displayed below your page title in search results. If Google find your description too short or not relevant, it will generate it's own description, based on your page content. This generated description will be less accurate than a good writen description."
	),
	}
	)
	elif (
	len(tag.attrs["content"])
	> settings.SEO_SETTINGS["meta_description_length"][1]
	):
	self.problems.append(
	{
	"name": _("Meta description is too long"),
	"settings": _("needed"),
	"description": _(
	"Meta description can be displayed below your page title in search results. If Google find your description too long, it may crop it and your potential visitors will not be able to read all its content. Sometimes, long pertinent meta descriptions will be displayed, but in the vast majority of the results, the description's lengths are 150-170 chars."
	),
	}
	)

	occurence = []
	for keyword in self.keywords:
	occurence.append(
	sum(
	1
	for _ in re.finditer(
	r"\b%s\b" % re.escape(keyword.lower()),
	tag.attrs["content"].lower(),
	)
	)
	)
	# if no keyword is found in h1
	if not any(i > 0 for i in occurence):
	self.warnings.append(
	{
	"name": _("No keyword in meta description"),
	"settings": _("at least 1"),
	"description": _(
	"Meta description is not used by search engines to calculate the rank of the page, but users will read it (if the meta description is selected by Google). The bonus point is that Google will put the keywords searched by the users in bold, so the users can eaily verify that the content of your page fit their needs."
	),
	}
	)

	return
	self.problems.append(
	{
	"name": _("No meta description"),
	"settings": _("needed"),
	"description": _(
	'Even if search engines states that they don\'t use meta description for ranking (<a href="https://webmasters.googleblog.com/2009/09/google-does-not-use-keywords-meta-tag.html">source</a>), they can be displayed below the title of your page in search results. Since search engines uses users clics to rank your website, an appealing description can make the difference.<br />Google has affirmed that they display a shorter text (~155 chars) below the title of the page (<a href="https://twitter.com/dannysullivan/status/996065145443893249">source</a>).'
	),
	}
	)

	def check_links(self):
	"""Check all link-related conditions
	"""
	links = self.content.find_all("a")
	internal_links = 0
	external_links = 0

	for link in links:
	# internal links = absolute links that contains domain name or relative links
	if os.environ["DOMAIN_NAME"] in link["href"] or not link["href"].startswith(
	"http"
	):
	internal_links += 1
	else:
	external_links += 1

	# not enough internal links
	if internal_links < settings.SEO_SETTINGS["internal_links"][0]:
	self.warnings.append(
	{
	"name": _("Not enough internal links"),
	"settings": "≥{}".format(
	settings.SEO_SETTINGS["internal_links"][0]
	),
	"description": _(
	"Internal links are useful because they link your content and can give any search engine the structure of your website, so they can create a hierarchy of your pages."
	),
	}
	)

	# too much internal links
	if internal_links > settings.SEO_SETTINGS["internal_links"][1]:
	self.warnings.append(
	{
	"name": _("Too many internal links"),
	"settings": "≤{}".format(
	settings.SEO_SETTINGS["internal_links"][1]
	),
	"description": _(
	'Google is vague about the max number of internal links on your site. <a href="https://neilpatel.com/blog/commandments-of-internal-linking/">Neil Patel</a> advises 3 to 4 internal links in the content of your page (excluding header/footer), but he says that you can go up to 10-20 links if your content is long enough.'
	),
	}
	)

	# not enough external links
	if external_links < settings.SEO_SETTINGS["external_links"][0]:
	self.warnings.append(
	{
	"name": _("Not enough external links"),
	"settings": "≥{}".format(
	settings.SEO_SETTINGS["external_links"][0]
	),
	"description": _(
	'Some recent SEO-related articles advise you to add some external links to help SEO on other websites (<a href="https://yoast.com/outbound-links/">source</a>), while an older (2015) study found that links to websites with a high authority help increase website rankings (<a href="https://www.rebootonline.com/blog/long-term-outgoing-link-experiment/">source</a>).'
	),
	}
	)

	# too many external links
	if external_links > settings.SEO_SETTINGS["external_links"][1]:
	self.warnings.append(
	{
	"name": _("Too many external links"),
	"settings": "≤{}".format(
	settings.SEO_SETTINGS["external_links"][1]
	),
	"description": _(
	'"Thanks to updates like Google Penguin, Google now focuses on link quality (not just link quantity)". There\'s no need to have too many external links on your main content, but the reputation of the websites you are linking to is important.'
	),
	}
	)
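
# Illustrative sketch, not part of the original views.py: one possible way to
# tell internal links from external ones by parsing the URL instead of testing
# substrings. The helper name `is_internal_link` and its `domain` parameter are
# hypothetical; `domain` stands in for os.environ["DOMAIN_NAME"] and is assumed
# to be a bare host name such as "example.com". Assumes Python 3.
from urllib.parse import urlparse


def is_internal_link(href, domain):
    """Return True for relative links and for absolute links hosted on `domain`."""
    parsed = urlparse(href)
    if not parsed.netloc:
        # No host: either a relative link ("/about/", "page.html", "#top")
        # or a non-web scheme such as mailto: or tel:, which leaves the site.
        return parsed.scheme in ("", "http", "https")
    host = parsed.hostname or ""
    # Accept the domain itself and any of its subdomains.
    return host == domain or host.endswith("." + domain)


# Usage: is_internal_link("/blog/", "example.com") is True, whereas a link such
# as "https://other.org/?ref=example.com" is external even though the domain
# name appears somewhere in the href.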

	def check_keyword_occurence(self):
	"""Check that at least one keyword makes up between keywords_repeat[0] and keywords_repeat[1] of the words in the page. If no keyword is in this range, a problem is reported.
	The comparison is case-insensitive (keyword & text are lowercased first).
	Thx https://stackoverflow.com/a/17268979/6813732 for finditer.
	"""
	occurence = []
	for keyword in self.keywords:
	occurence.append(
	sum(
	1
	for _ in re.finditer(
	r"\b%s\b" % re.escape(keyword.lower()),
	self.content.text.lower(),
	)
	)
	)
	if not occurence:
	occurence = [0]

	content = re.findall(r"\w+", self.content.text.lower())
	nb_words = len(content) if len(content) > 0 else 1

	# if no keyword is repeated more than ["keywords_repeat"][0] %
	if not any(
	i / nb_words >= settings.SEO_SETTINGS["keywords_repeat"][0]
	for i in occurence
	):
	self.problems.append(
	{
	"name": _("Not enough keyword occurrences"),
	"settings": "≥{min}%, max found is {actual:.2f}% ({actual_nb} times)".format(
	min=settings.SEO_SETTINGS["keywords_repeat"][0] * 100,
	# multiply by 100 so the ratio is displayed as a percentage, like {min}
	actual=max(occurence) / nb_words * 100,
	actual_nb=max(occurence),
	),
	"description": _(
	'The presence of keywords is important for search engines like Google, which will "understand" what your content is about and will better serve your page in answer to structured queries that use your keywords.'
	),
	}
	)
	# there is at least 1 keyword that is repeated > ["keywords_repeat"][0]
	else:
	# there is at least 1 keyword that is repeated > ["keywords_repeat"][1]
	if not all(
	i / nb_words <= settings.SEO_SETTINGS["keywords_repeat"][1]
	for i in occurence
	):
	self.problems.append(
	{
	"name": _("Too many keyword occurrences"),
	# settings: ≤5, found X "keyword"
	"settings": '≤{max}%, found {actual:.2f}% ({actual_nb} times) of "{kw}"'.format(
	max=settings.SEO_SETTINGS["keywords_repeat"][1] * 100,
	actual=max(occurence) / nb_words * 100,
	actual_nb=max(occurence),
	kw=self.keywords[occurence.index(max(occurence))],
	),
	"description": _(
	"Some SEO websites advise you to get 1% of your words to be keywords. For other websites (like Yoast) it's 0.25-0.5%. We use a constant for keywords repetition. Too many keywords on a page will lead search engines to think that you're doing some keyword stuffing (put too many keywords in order to manipulate the page rank)."
	),
	}
	)
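
# Illustrative sketch, not part of the original views.py: the density
# computation performed by check_keyword_occurence, isolated as a standalone
# function so it can be tried on plain strings. The name `keyword_density` is
# hypothetical. The result is a percentage, whereas the keywords_repeat
# settings above are compared as ratios.
import re


def keyword_density(keyword, text):
    """Return how often `keyword` occurs in `text`, as a percentage of its words."""
    lowered = text.lower()
    words = re.findall(r"\w+", lowered)
    if not words:
        return 0.0
    # The \b anchors keep "cat" from matching inside "category".
    pattern = r"\b%s\b" % re.escape(keyword.lower())
    hits = sum(1 for _match in re.finditer(pattern, lowered))
    return 100.0 * hits / len(words)


# Usage: keyword_density("django cms", "Django CMS is a CMS") counts one hit
# among five words.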

	def check_keyword_url(self):
	"""Check presence of keywords in url
	"""
	for keyword in self.keywords:
	if keyword in self.full_url:
	return
	self.problems.append(
	{
	"name": _("No keyword in URL"),
	"settings": _("at least 1"),
	"description": _(
	'Keywords in the URL are a small ranking factor for Google (<a href="https://twitter.com/JohnMu/status/1070634500022001666">source</a>), but they will help your users understand the organisation of your website (/?product=50 says less than /products/camping/). Bing, on the other hand, says: "<i>URL structure and keyword usage - keep it clean and keyword rich when possible</i>" (<a href="https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a">source</a>).'
	),
	}
	)
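
# Illustrative sketch, not part of the original views.py: check_keyword_url
# compares the raw keyword with the URL, so "Camping Gear" never matches
# ".../camping-gear/". Assuming URLs use lowercase, hyphen-separated slugs
# (an assumption about the site, not something the original code states), a
# keyword can be turned into a slug before the comparison. The name
# `keyword_in_url` is hypothetical.
import re


def keyword_in_url(keyword, url):
    """Return True if the slug form of `keyword` appears in `url`."""
    slug = "-".join(re.findall(r"\w+", keyword.lower()))
    return bool(slug) and slug in url.lower()


# Usage: keyword_in_url("Camping Gear", "https://example.com/products/camping-gear/")
# is True.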

	def check_h1(self):
	"""Check all h1-related conditions
	"""

	h1 = self.soup.find_all("h1")
	if len(h1) > 1:
	self.problems.append(
	{
	"name": _("Too many h1 tags"),
	"settings": _("exactly 1"),
	"description": _(
	'Google has said that they do not consider multiple h1 tags a bad thing (<a href="https://www.youtube.com/watch?v=WsgrSxCmMbM">source</a>), but Google is not the only search engine out there. The Bing webmaster guidelines say "Use only one &lt;h1&gt; tag per page".'
	),
	}
	)

	elif not h1:
	self.problems.append(
	{
	"name": _("No h1 tag"),
	"settings": _("exactly 1"),
	"description": _(
	"H1 is the most visually notable content of your page for your users, and is one of the most important ranking factors for search engines. Good h1 content is required in order to progress in the SERPs."
	),
	}
	)

	else:
	occurence = []
	for keyword in self.keywords:
	for single_h1 in h1:
	occurence.append(
	sum(
	1
	for _ in re.finditer(
	r"\b%s\b" % re.escape(keyword.lower()),
	single_h1.text.lower(),
	)
	)
	)
	# if no keyword is found in h1
	if not any(i > 0 for i in occurence):
	self.problems.append(
	{
	"name": _("No keyword in h1"),
	"settings": _("at least 1"),
	"description": _(
	"H1 tags are crawled by search engines as the title of your page. Populate them with appropriate content to make sure that search engines correctly understand what your pages are about."
	),
	}
	)

	def check_h2(self):
	"""Check all h2-related conditions
	"""
	h2 = self.soup.find_all("h2")
	if not h2:
	self.warnings.append(
	{
	"name": _("No h2 tag"),
	"settings": _("at least 1"),
	"description": _(
	'H2 tags are useful because they are explored by search engines and can help them understand the subject of your page (<a href="https://robsnell.com/matt-cutts-transcript.html">source</a>). It\'s a "section title", so every time you start talking about a new topic, you can put an h2 tag, which will explain what the content will be about.'
	),
	}
	)
	else:
	occurence = []
	# count the occurrences of each keyword
	for keyword in self.keywords:
	# in every h2 tag of the page
	for single_h2 in h2:
	occurence.append(
	sum(
	1
	for _ in re.finditer(
	r"\b%s\b" % re.escape(keyword.lower()),
	single_h2.text.lower(),
	)
	)
	)
	# if no keyword is found in h2
	if not any(i > 0 for i in occurence):
	self.warnings.append(
	{
	"name": _("No keyword in h2"),
	"settings": _("at least 1"),
	"description": _(
	'Matt Cutts (creator of Google SafeSearch) <a href="https://robsnell.com/matt-cutts-transcript.html">stated in 2009</a> that "[...] we use things in the title, things in the URL, even things that are really highlighted, like h2 tags and stuff like that. ". Even if there is not really a more recent acknowledgement, h2 titles are important (but maybe not as important as h1 & title tags).'
	),
	}
	)

	def check_images(self):
	"""Check that every image in the content has an alt attribute
	"""
	images = self.content.find_all("img")

	for image in images:
	if "alt" not in image.attrs or image.attrs["alt"] == "None":
	self.problems.append(
	{
	"name": _("Image lacks alt attribute"),
	"settings": _("all images"),
	# format after translating so the msgid stays constant,
	# and tolerate <img> tags that have no src attribute
	"description": _(
	'Your images should always have an alt attribute, because it improves accessibility for visually impaired people.<br />The name of your image is important too, because Google will look at it to know what the picture is about (<a href="https://support.google.com/webmasters/answer/114016">source</a>).<br /><a href="{img_url}">This is the image</a> without alt attribute.'
	).format(img_url=image.attrs.get("src", "")),
	}
	)

	def check_url(self):
	"""All the url-related checks.
	"""

	# check url depth
	# do not count first slash after domain name, nor // in the "http://"
	url_without_scheme_separator = self.full_url.replace("://", "")
	number_of_slashes = url_without_scheme_separator.count("/") - 1

	if number_of_slashes > settings.SEO_SETTINGS["max_link_depth"]:
	self.problems.append(
	{
	"name": _("Too many levels in path"),
	"settings": "≤{settings}, found {path_depth}".format(
	settings=settings.SEO_SETTINGS["max_link_depth"],
	path_depth=number_of_slashes,
	),
	"description": _(
	'Google recommends organizing your content by adding depth to your URLs, but advises against using too many directories (<a href="https://support.google.com/webmasters/answer/7451184">source</a>).<br />Yoast says that "In a perfect world, we would place everything in one sublevel at most. Today, many sites use secondary menus to accommodate for additional content" (<a href="https://yoast.com/how-to-clean-site-structure/">source</a>).'
	),
	}
	)

	# check url length
	url_without_protocol = self.full_url.replace("http://", "").replace(
	"https://", ""
	)
	if len(url_without_protocol) > settings.SEO_SETTINGS["max_url_length"]:
	self.warnings.append(
	{
	"name": _("URL is too long"),
	"settings": "≤{settings}, found {len_url} chars".format(
	settings=settings.SEO_SETTINGS["max_url_length"],
	len_url=len(url_without_protocol),
	),
	"description": _(
	'A study from 2016 found a correlation between URL length & ranking (<a href="https://backlinko.com/search-engine-ranking">source</a>).'
	),
	}
	)
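
# Illustrative sketch, not part of the original views.py: an alternative way
# to measure URL depth by counting non-empty path segments instead of slashes.
# The name `url_depth` is hypothetical. Unlike the slash count above, this
# ignores slashes in the query string, and it gives the same depth with or
# without a trailing slash, so results can differ from check_url for URLs
# that do not end with "/". Assumes Python 3.
from urllib.parse import urlparse


def url_depth(full_url):
    """Return the number of non-empty path segments in `full_url`."""
    path = urlparse(full_url).path
    return len([segment for segment in path.split("/") if segment])


# Usage: url_depth("https://example.com/a/b/") and
# url_depth("https://example.com/a/b?next=/c/d") both count two segments.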

	def keyword_present_in_first_paragraph(self):
	"""Get [keywords_in_first_words] first words of the content, and ensure that there is a keyword among them.
	"""
	content = self.content.text.lower().split()[
	: settings.SEO_SETTINGS["keywords_in_first_words"]
	]

	for keyword in self.keywords:
	if keyword in content:
	return

	self.problems.append(
	{
	"name": _("No keyword in first sentence"),
	"settings": "before {settings} words".format(
	settings=settings.SEO_SETTINGS["keywords_in_first_words"]
	),
	"description": _(
	'Yoast advises putting a keyword in the first sentence of your content. Readers will be reassured because they will quickly find the keyword they were looking for (<a href="https://yoast.com/text-structure-important-seo/">source</a>).'
	),
	}
	)
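
# Illustrative sketch, not part of the original views.py: the check above
# compares each keyword with single whitespace-separated tokens, so a
# multi-word keyword ("django cms") or a token with punctuation ("seo:")
# cannot match. One possible alternative normalizes both sides to plain words
# and searches the joined text. The names `_normalize` and
# `keyword_in_first_words` are hypothetical; `limit` stands in for
# settings.SEO_SETTINGS["keywords_in_first_words"].
import re


def _normalize(text):
    """Lowercase `text` and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def keyword_in_first_words(keywords, text, limit):
    """Return True if any keyword occurs within the first `limit` words of `text`."""
    first_words = " ".join(_normalize(text).split()[:limit])
    for keyword in keywords:
        needle = _normalize(keyword)
        if needle and re.search(r"\b%s\b" % re.escape(needle), first_words):
            return True
    return False


# Usage: keyword_in_first_words(["django cms"], "Welcome! Django CMS, the best.", 3)
# is True, and becomes False when the limit is lowered to 2.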

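	def count_words_without_stopwords(self):
	"""Count the words in the content and report a problem (or a warning) if the page is too short.
	Note: despite the method name, stop words are currently not removed before counting.
	"""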

	content = re.findall(r"\w+", self.content.text.lower())

	nb_words = len(content)

	# too few words
	if nb_words < settings.SEO_SETTINGS["content_words_number"][0]:
	self.problems.append(
	{
	"name": _("Content is too short"),
	"settings": "at least {min} words, more than {min2} if possible, found {nb_words}".format(
	min=settings.SEO_SETTINGS["content_words_number"][0],
	min2=settings.SEO_SETTINGS["content_words_number"][1],
	nb_words=nb_words,
	),
	"description": _(
	'Yoast says: "A blog post should contain at least 300 words in order to rank well in the search engines. Long posts will rank more easily than short posts. However, long posts require strong writing skills" (<a href="https://yoast.com/blog-post-length/">source</a>).<br />A 2017 Forbes article says that "<i>content with 1,000 words or more tends to attract significantly more links and shares</i>", and "<i>the average content length for top 3 rankings was about 750 words, while the average content length for position 20 rankings was about 500 words</i>" (<a href="https://web.archive.org/web/20190708230659/http://www.forbes.com/">source</a>).'
	),
	}
	)

	elif nb_words < settings.SEO_SETTINGS["content_words_number"][1]:
	self.warnings.append(
	{
	"name": _("Content is too short"),
	"settings": "at least {min} words, more than {min2} if possible, found {nb_words}".format(
	min=settings.SEO_SETTINGS["content_words_number"][0],
	min2=settings.SEO_SETTINGS["content_words_number"][1],
	nb_words=nb_words,
	),
	"description": _(
	'Yoast says: "A blog post should contain at least 300 words in order to rank well in the search engines. Long posts will rank more easily than short posts. However, long posts require strong writing skills" (<a href="https://yoast.com/blog-post-length/">source</a>).<br />A 2017 Forbes article says that "<i>content with 1,000 words or more tends to attract significantly more links and shares</i>", and "<i>the average content length for top 3 rankings was about 750 words, while the average content length for position 20 rankings was about 500 words</i>" (<a href="https://web.archive.org/web/20190708230659/https://www.forbes.com/sites/jaysondemers/2017/07/18/how-long-should-your-content-be-for-optimal-seo/2">source</a>).'
	),
	}
	)
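
# Illustrative sketch, not part of the original views.py: the two branches
# above build the same message and differ only in severity. The tiering can
# be isolated in a small function so the message is built once. The name
# `classify_content_length` is hypothetical; `minimum` and `recommended`
# stand in for the two values of settings.SEO_SETTINGS["content_words_number"].
def classify_content_length(nb_words, minimum, recommended):
    """Return "problem", "warning" or "ok" for a page containing `nb_words` words."""
    if nb_words < minimum:
        return "problem"
    if nb_words < recommended:
        return "warning"
    return "ok"


# Usage, with thresholds of 300 and 600 words chosen only for the example:
# classify_content_length(450, 300, 600) falls in the "warning" tier.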