django-check-seo's People

Contributors

corentinbettiol, dmytrolitvinov, jgadelange, lnxg33k, mbi

django-check-seo's Issues

Improve organization of checks

Currently, all checks are in a single class, in the views.py file:

class DjangoCheckSeo:
    def __init__(self, soup, full_url):
        """Populate some vars.

        Arguments:
            soup {bs4.element} -- beautiful soup content (html)
        """
        self.soup = soup

        # Get content of the page (exclude header/footer)
        self.content = self.soup.find("div", {"class": "container"})

        # remove ul with nav class from content (<ul class="nav"> is the menu)
        self.content.find("ul", {"class": "nav"}).extract()

        self.full_url = full_url
        self.keywords = []
        self.problems = []
        self.warnings = []

    def check(self):
        """Magic happens here.

        Returns:
            tuple -- Two arrays of dict of form {name, settings, description}.
        """
        self.check_keywords()
        self.check_title()
        self.check_description()
        self.check_links()
        self.check_keyword_occurence()
        self.check_keyword_url()
        self.check_h1()
        self.check_h2()
        self.check_images()
        self.check_url()
        self.keyword_present_in_first_paragraph()
        self.count_words_without_stopwords()

        return (self.problems, self.warnings)

    def check_keywords(self):
        """First check to ensure that keywords are present.
        """
        meta = self.soup.find_all("meta")
        for tag in meta:
            # .get() avoids a KeyError on meta tags without a name or content attribute
            if tag.attrs.get("name") == "keywords" and tag.attrs.get("content", "") != "":
                # get keywords for next checks
                self.keywords = tag.attrs["content"].split(
                    ", "
                )  # may be dangerous to hard code the case where keywords are separated with a comma and two spaces
                return

        self.problems.append(
            {
                "name": _("No meta keywords"),
                "settings": _("at least 1"),
                "description": _(
                    "Meta keywords were important in this meta tag, however django-check-seo uses these keywords to check all other tests related to keywords. You will be flooded with problems and warnings and this SEO tool will not work as well as it should if you don't add some keywords."
                ),
            }
        )

    def check_title(self):
        """Check all title-related conditions.
        """
        # title presence (soup.title is None when the page has no <title> tag)
        if self.soup.title is None:
            self.problems.append(
                {
                    "name": _("No title tag"),
                    "settings": _("at least 1"),
                    "description": _(
                        "Titles tags are ones of the most important things to add to your pages, sinces they are the main text displayed on result search pages."
                    ),
                }
            )
            return

        # title length too short
        if len(self.soup.title.string) < settings.SEO_SETTINGS["meta_title_length"][0]:
            self.problems.append(
                {
                    "name": _("Title tag is too short"),
                    "settings": "&ge;{}".format(
                        settings.SEO_SETTINGS["meta_title_length"][0]
                    ),
                    "description": _(
                        "Titles tags need to describe the content of the page, and need to contain at least a few words."
                    ),
                }
            )

        # title length too long
        if len(self.soup.title.string) > settings.SEO_SETTINGS["meta_title_length"][1]:
            self.warnings.append(
                {
                    "name": _("Title tag is too long"),
                    "settings": "&le;{}".format(
                        settings.SEO_SETTINGS["meta_title_length"][1]
                    ),
                    "description": _(
                        "Only the first ~55-60 chars are displayed on modern search engines results. Writing a longer title is not really required and can lead to make the user miss informations."
                    ),
                }
            )

        title_words = self.soup.title.string.split()

        # title do not contain any keyword
        if set(self.keywords).isdisjoint(set(title_words)):
            self.problems.append(
                {
                    "name": _("Title do not contain any keyword"),
                    "settings": _("at least 1"),
                    "description": _(
                        "Titles tags need to contain at least one keyword, since they are one of the most important content of the page for search engines."
                    ),
                }
            )

    def check_description(self):
        meta = self.soup.find_all("meta")
        for tag in meta:
            # .get() avoids a KeyError on meta tags without a name or content attribute
            if tag.attrs.get("name") == "description" and tag.attrs.get("content", "") != "":
                if (
                    len(tag.attrs["content"])
                    < settings.SEO_SETTINGS["meta_description_length"][0]
                ):
                    self.problems.append(
                        {
                            "name": _("Meta description is too short"),
                            "settings": _("needed"),
                            "description": _(
                                "Meta description can be displayed below your page title in search results. If Google find your description too short or not relevant, it will generate it's own description, based on your page content. This generated description will be less accurate than a good writen description."
                            ),
                        }
                    )
                elif (
                    len(tag.attrs["content"])
                    > settings.SEO_SETTINGS["meta_description_length"][1]
                ):
                    self.problems.append(
                        {
                            "name": _("Meta description is too long"),
                            "settings": _("needed"),
                            "description": _(
                                "Meta description can be displayed below your page title in search results. If Google find your description too long, it may crop it and your potential visitors will not be able to read all its content. Sometimes, long pertinent meta descriptions will be displayed, but in the vast majority of the results, the description's lengths are 150-170 chars."
                            ),
                        }
                    )

                occurence = []
                for keyword in self.keywords:
                    occurence.append(
                        sum(
                            1
                            for _ in re.finditer(
                                r"\b%s\b" % re.escape(keyword.lower()),
                                tag.attrs["content"].lower(),
                            )
                        )
                    )
                # if no keyword is found in the meta description
                if not any(i > 0 for i in occurence):
                    self.warnings.append(
                        {
                            "name": _("No keyword in meta description"),
                            "settings": _("at least 1"),
                            "description": _(
                                "Meta description is not used by search engines to calculate the rank of the page, but users will read it (if the meta description is selected by Google). The bonus point is that Google will put the keywords searched by the users in bold, so the users can eaily verify that the content of your page fit their needs."
                            ),
                        }
                    )
                return

        self.problems.append(
            {
                "name": _("No meta description"),
                "settings": _("needed"),
                "description": _(
                    'Even if search engines states that they don\'t use meta description for ranking (<a href="https://webmasters.googleblog.com/2009/09/google-does-not-use-keywords-meta-tag.html">source</a>), they can be displayed below the title of your page in search results. Since search engines uses users clics to rank your website, an appealing description can make the difference.<br />Google has affirmed that they display a shorter text (~155 chars) below the title of the page (<a href="https://twitter.com/dannysullivan/status/996065145443893249">source</a>).'
                ),
            }
        )

    def check_links(self):
        """Check all link-related conditions
        """
        links = self.content.find_all("a")
        internal_links = 0
        external_links = 0

        for link in links:
            # internal links = absolute links that contains domain name or relative links
            if os.environ["DOMAIN_NAME"] in link["href"] or not link["href"].startswith(
                "http"
            ):
                internal_links += 1
            else:
                external_links += 1

        # not enough internal links
        if internal_links < settings.SEO_SETTINGS["internal_links"][0]:
            self.warnings.append(
                {
                    "name": _("Not enough internal links"),
                    "settings": "&ge;{}".format(
                        settings.SEO_SETTINGS["internal_links"][0]
                    ),
                    "description": _(
                        "Internal links are useful because they link your content and can give any search engine the structure of your website, so they can create a hierarchy of your pages."
                    ),
                }
            )

        # too much internal links
        if internal_links > settings.SEO_SETTINGS["internal_links"][1]:
            self.warnings.append(
                {
                    "name": _("Too many internal links"),
                    "settings": "&le;{}".format(
                        settings.SEO_SETTINGS["internal_links"][1]
                    ),
                    "description": _(
                        'Google is vague about the max number of internal links on your site. <a href="https://neilpatel.com/blog/commandments-of-internal-linking/">Neil Patel</a> advises 3 to 4 internal links in the content of your page (excluding header/footer), but he says that you can go up to 10-20 links if your content is long enough.'
                    ),
                }
            )

        # not enough external links
        if external_links < settings.SEO_SETTINGS["external_links"][0]:
            self.warnings.append(
                {
                    "name": _("Not enough external links"),
                    "settings": "&ge;{}".format(
                        settings.SEO_SETTINGS["external_links"][0]
                    ),
                    "description": _(
                        'Some recent SEO-related articles advise you to add some external links to help SEO on other websites (<a href="https://yoast.com/outbound-links/">source</a>) while at the other end an old (2015) study found that links to websites with an high authority help incresing websites ranking (<a href="https://www.rebootonline.com/blog/long-term-outgoing-link-experiment/">source</a>).'
                    ),
                }
            )

        # too much external links
        if external_links > settings.SEO_SETTINGS["external_links"][1]:
            self.warnings.append(
                {
                    "name": _("Too many external links"),
                    "settings": "&le;{}".format(
                        settings.SEO_SETTINGS["external_links"][1]
                    ),
                    "description": _(
                        '"Thanks to updates like Google Penguin, Google now focuses on link quality (not just link quantity)". There\'s no need to have too many external links on your main content, but the reputation of the websites you are linking to is important.'
                    ),
                }
            )

    def check_keyword_occurence(self):
        """Check if one of the keywords is present between keywords_repeat[0] & keywords_repeat[1] in the page. If no keywords is in this range, then will fire a problem.

        no case sensitive (keyword & text are lowered before comparison).

        Thx https://stackoverflow.com/a/17268979/6813732 for finditer.
        """
        occurence = []
        for keyword in self.keywords:
            occurence.append(
                sum(
                    1
                    for _ in re.finditer(
                        r"\b%s\b" % re.escape(keyword.lower()),
                        self.content.text.lower(),
                    )
                )
            )
        if not occurence:
            occurence = [0]

        content = re.findall(r"\w+", self.content.text.lower())
        nb_words = len(content) if len(content) > 0 else 1

        # if no keyword is repeated more than ["keywords_repeat"][0] %
        if not any(
            i / nb_words >= settings.SEO_SETTINGS["keywords_repeat"][0]
            for i in occurence
        ):
            self.problems.append(
                {
                    "name": _("Not enough keyword occurences"),
                    "settings": "&ge;{min}%, max found is {actual:.2f}% ({actual_nb} times)".format(
                        min=settings.SEO_SETTINGS["keywords_repeat"][0] * 100,
                        # multiply by 100 so the value is a percentage, like min
                        actual=max(occurence) / nb_words * 100,
                        actual_nb=max(occurence),
                    ),
                    "description": _(
                        'Presence of keywords are important for search engines like Google, who will "understand" what your content is about, and will better serve your page in answer to structured queries that uses your keywords.'
                    ),
                }
            )
        # there is at least 1 keyword that is repeated > ["keywords_repeat"][0]
        else:
            # there is at least 1 keyword that is repeated > ["keywords_repeat"][1]
            if not all(
                i / nb_words <= settings.SEO_SETTINGS["keywords_repeat"][1]
                for i in occurence
            ):
                self.problems.append(
                    {
                        "name": _("Too many keyword occurences"),
                        # settings: ≤5, found X "keyword"
                        "settings": '&le;{max}%, found {actual:.2f}% ({actual_nb} times) of "{kw}"'.format(
                            max=settings.SEO_SETTINGS["keywords_repeat"][1] * 100,
                            actual=max(occurence) / nb_words * 100,
                            actual_nb=max(occurence),
                            kw=self.keywords[occurence.index(max(occurence))],
                        ),
                        "description": _(
                            "Some SEO websites advise you to get 1% of your words to be keywords. For other websites (like Yoast) it's 0.25-0.5%. We use a constant for keywords repetition. Too many keywords on a page will lead search engines to think that you're doing some keyword stuffing (put too many keywords in order to manipulate the page rank)."
                        ),
                    }
                )

    def check_keyword_url(self):
        """Check presence of keywords in url
        """
        for keyword in self.keywords:
            if keyword in self.full_url:
                return

        self.problems.append(
            {
                "name": _("No keyword in URL"),
                "settings": _("at least 1"),
                "description": _(
                    'Keywords in URL are a small ranking factor for Google (<a href="https://twitter.com/JohnMu/status/1070634500022001666">source</a>), but it will help your users understand the organisation of your website (/?product=50 talk less than /products/camping/). On the other hand Bing says : "<i>URL structure and keyword usage - keep it clean and keyword rich when possible</i>" (<a href="https://www.bing.com/webmaster/help/webmaster-guidelines-30fba23a">source</a>).'
                ),
            }
        )

    def check_h1(self):
        """Check all h1-related conditions
        """
        h1 = self.soup.find_all("h1")
        if len(h1) > 1:
            self.problems.append(
                {
                    "name": _("Too much h1 tags"),
                    "settings": _("exactly 1"),
                    "description": _(
                        'Google has told that they do not consider using multiple h1 a bad thing (<a href="https://www.youtube.com/watch?v=WsgrSxCmMbM">source</a>), but Google is not the unique search engine out there. Bing webmaster guidelines says "Use only one <h1> tag per page".'
                    ),
                }
            )
        elif not h1:
            self.problems.append(
                {
                    "name": _("No h1 tag"),
                    "settings": _("exactly 1"),
                    "description": _(
                        "H1 is the most visually notable content of your page for your users, and is one of the most important ranking factor for search engines. A good h1 tag content is required in order to progress in SERP."
                    ),
                }
            )
        else:
            occurence = []
            for keyword in self.keywords:
                for single_h1 in h1:
                    occurence.append(
                        sum(
                            1
                            for _ in re.finditer(
                                r"\b%s\b" % re.escape(keyword.lower()),
                                single_h1.text.lower(),
                            )
                        )
                    )
            # if no keyword is found in h1
            if not any(i > 0 for i in occurence):
                self.problems.append(
                    {
                        "name": _("No keyword in h1"),
                        "settings": _("at least 1"),
                        "description": _(
                            "H1 are crawled by search engines as the title of your page. You may populate them with appropriate content in order to be sure that search engines correctly understand what your pages are all about."
                        ),
                    }
                )

    def check_h2(self):
        h2 = self.soup.find_all("h2")
        if not h2:
            self.warnings.append(
                {
                    "name": _("No h2 tag"),
                    "settings": _("at least 1"),
                    "description": _(
                        'H2 tags are useful because they are explored by search engines and can help them understand the subject of your page (<a href="https://robsnell.com/matt-cutts-transcript.html">source</a>). It\'s a "section title", so every time you start talking about a new topic, you can put an h2 tag, which will explain what the content will be about.'
                    ),
                }
            )
        else:
            occurence = []
            # check if each keyword
            for keyword in self.keywords:
                # is present at least
                for single_h2 in h2:
                    occurence.append(
                        sum(
                            1
                            for _ in re.finditer(
                                r"\b%s\b" % re.escape(keyword.lower()),
                                single_h2.text.lower(),
                            )
                        )
                    )
            # if no keyword is found in h2
            if not any(i > 0 for i in occurence):
                self.warnings.append(
                    {
                        "name": _("No keyword in h2"),
                        "settings": _("at least 1"),
                        "description": _(
                            'Matt Cutts (creator of Google SafeSearch) <a href="https://robsnell.com/matt-cutts-transcript.html">stated in 2009</a> that "[...] we use things in the title, things in the URL, even things that are really highlighted, like h2 tags and stuff like that. ". Even if there is not really a more recent acknowledgement, h2 titles are important (but maybe not as important as h1 & title tags).'
                        ),
                    }
                )

    def check_images(self):
        images = self.content.find_all("img")
        for image in images:
            if "alt" not in image.attrs or image.attrs["alt"] == "None":
                self.problems.append(
                    {
                        "name": _("Img lack alt tag"),
                        "settings": _("all images"),
                        "description": _(
                            'Your images should always have an alt tag, because it improves accessibility for visually impaired people.<br />The name of your image is important too, because Google will look at it to know what the picture is about (<a href="https://support.google.com/webmasters/answer/114016">source</a>).<br /><a href="{img_url}">This is the image</a> without alt tag.'.format(
                                img_url=image.attrs["src"]
                            )
                        ),
                    }
                )

    def check_url(self):
        """All the url-related checks.
        """
        # check url depth
        # do not count first slash after domain name, nor // in the "http://"
        url_without_two_points_slash_slash = self.full_url.replace("://", "")
        number_of_slashes = url_without_two_points_slash_slash.count("/") - 1

        if number_of_slashes > settings.SEO_SETTINGS["max_link_depth"]:
            self.problems.append(
                {
                    "name": _("Too many levels in path"),
                    "settings": "&le;{settings}, found {path_depth}".format(
                        settings=settings.SEO_SETTINGS["max_link_depth"],
                        path_depth=number_of_slashes,
                    ),
                    "description": _(
                        'Google recommand to organize your content by adding depth in your url, but advises against putting too much repertories (<a href="https://support.google.com/webmasters/answer/7451184">source</a>).<br />Yoast says that "In a perfect world, we would place everything in one sublevel at most. Today, many sites use secondary menus to accommodate for additional content" (<a href="https://yoast.com/how-to-clean-site-structure/">source</a>).'
                    ),
                }
            )

        # check url length
        url_without_protocol = self.full_url.replace("http://", "").replace(
            "https://", ""
        )
        if len(url_without_protocol) > settings.SEO_SETTINGS["max_url_length"]:
            self.warnings.append(
                {
                    "name": _("URL is too long"),
                    "settings": "&le;{settings}, found {len_url} chars".format(
                        settings=settings.SEO_SETTINGS["max_url_length"],
                        len_url=len(url_without_protocol),
                    ),
                    "description": _(
                        'A study from 2016 found a correlation between URL length & ranking (<a href="https://backlinko.com/search-engine-ranking">source</a>).'
                    ),
                }
            )

    def keyword_present_in_first_paragraph(self):
        """Get [keywords_in_first_words] first words of the content, and ensure that there is a keyword among them.
        """
        content = self.content.text.lower().split()[
            : settings.SEO_SETTINGS["keywords_in_first_words"]
        ]

        for keyword in self.keywords:
            if keyword in content:
                return

        self.problems.append(
            {
                "name": _("No keyword in first sentence"),
                "settings": "before {settings} words".format(
                    settings=settings.SEO_SETTINGS["keywords_in_first_words"]
                ),
                "description": _(
                    'Yoast advises to put a keyword in the first sentence of your content. The person who reads it will be relieved because he will quickly retrieve the keyword he was looking for (<a href="https://yoast.com/text-structure-important-seo/">source</a>).'
                ),
            }
        )

    def count_words_without_stopwords(self):
        """[summary]
        """
        content = re.findall(r"\w+", self.content.text.lower())
        nb_words = len(content)

        # too few words
        if nb_words < settings.SEO_SETTINGS["content_words_number"][0]:
            self.problems.append(
                {
                    "name": _("Content is too short"),
                    "settings": "at least {min} words, more than {min2} if possible, found {nb_words}".format(
                        min=settings.SEO_SETTINGS["content_words_number"][0],
                        min2=settings.SEO_SETTINGS["content_words_number"][1],
                        nb_words=nb_words,
                    ),
                    "description": _(
                        'Yoast provide us some knowledge : "A blog post should contain at least 300 words in order to rank well in the search engines. Long posts will rank more easily than short posts. However, long posts require strong writing skills" (<a href="https://yoast.com/blog-post-length/">source</a>).<br />An article from Forbes from 2017 says that "<i>content with 1,000 words or more tends to attract significantly more links and shares</i>", and "<i>the average content length for top 3 rankings was about 750 words, while the average content length for position 20 rankings was about 500 words</i>" (<a href="https://web.archive.org/web/20190708230659/http://www.forbes.com/">source</a>).'
                    ),
                }
            )
        elif nb_words < settings.SEO_SETTINGS["content_words_number"][1]:
            self.warnings.append(
                {
                    "name": _("Content is too short"),
                    "settings": "at least {min} words, more than {min2} if possible, found {nb_words}".format(
                        min=settings.SEO_SETTINGS["content_words_number"][0],
                        min2=settings.SEO_SETTINGS["content_words_number"][1],
                        nb_words=nb_words,
                    ),
                    "description": _(
                        'Yoast provide us some knowledge : "A blog post should contain at least 300 words in order to rank well in the search engines. Long posts will rank more easily than short posts. However, long posts require strong writing skills" (<a href="https://yoast.com/blog-post-length/">source</a>).<br />An article from Forbes from 2017 says that "<i>content with 1,000 words or more tends to attract significantly more links and shares</i>", and "<i>the average content length for top 3 rankings was about 750 words, while the average content length for position 20 rankings was about 500 words</i>" (<a href="https://web.archive.org/web/20190708230659/https://www.forbes.com/sites/jaysondemers/2017/07/18/how-long-should-your-content-be-for-optimal-seo/2">source</a>).'
                    ),
                }
            )

The ideal implementation would be a config parameter listing the check files, plus a class holding some shared information (HTML of the page, content of the page, URL, keywords...).

Each check will be in a separate file, with some information available through the class.
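
A minimal sketch of what that layout could look like, not the project's actual code. The names `Site`, `CHECKS`, `run_checks` and the two check functions are hypothetical, and the checks sit in one file here only to keep the example self-contained; in the proposed layout each would live in its own file.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Shared data handed to every check (hypothetical container)."""

    html: str
    content: str
    full_url: str
    keywords: list = field(default_factory=list)
    problems: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def check_title_present(site):
    # Deliberately naive test, only meant to show the calling convention.
    if "<title>" not in site.html.lower():
        site.problems.append({"name": "No title tag", "settings": "at least 1"})


def check_keywords_present(site):
    if not site.keywords:
        site.problems.append({"name": "No meta keywords", "settings": "at least 1"})


# Stand-in for the config parameter that lists the checks to run.
CHECKS = [check_title_present, check_keywords_present]


def run_checks(site, checks=CHECKS):
    """Run every configured check against the same Site instance."""
    for check in checks:
        check(site)
    return site.problems, site.warnings
```

Adding or removing a check then only means editing the list, without touching the class that carries the page data.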

Send DJANGO_CHECK_SEO_AUTH along with the request in case of redirect [3xx]

Hi!

First of all, I would like to thank you for the great work! This is a very helpful package and it saved us a lot of time on enhancing the SEO of our posts.

Recently, we started using the DJANGO_CHECK_SEO_AUTH to use auth credentials along with the request. But for reasons we don't fully control, the DNS server is redirecting us to another URL.

In this scenario, because of how the requests package behaves, the Authorization header (HTTP_AUTHORIZATION) is not sent with the redirected request (after a 3xx) and we end up with a 404 (the view wasn't able to authenticate the request).

I was wondering if the request made by django_check_seo could check for a 3xx status code and follow the new URL location with the DJANGO_CHECK_SEO_AUTH information.

Here is a simple example:

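import requests
from django.conf import settings

# Credentials configured for django-check-seo
auth = (
    settings.DJANGO_CHECK_SEO_AUTH["user"],
    settings.DJANGO_CHECK_SEO_AUTH["pass"],
)

# First request: do not follow redirects automatically
# (URL is the address of the page being checked)
r = requests.get(URL, auth=auth, allow_redirects=False)

# On a 3xx response, follow the new location and resend the credentials
if 300 <= r.status_code < 400:
    r = requests.get(r.headers["location"], auth=auth)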

Versions

  • Python 3.6.8
  • django-check-seo==0.3.6
  • requests==2.18.4

Thank you very much,
[]s

lower all text in tests

Keywords can be "postgresql", and actual word used in text can be "PostgreSQL".
Check all files to replace soup.element.string with soup.element.string.lower()
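
As an illustration of the idea, a small sketch that lowers both the keyword and the text before matching. The helper name `count_keyword` is hypothetical; the regular expression is the same whole-word pattern the checks already use.

```python
import re


def count_keyword(keyword, text):
    """Count whole-word occurrences of keyword in text, ignoring case."""
    pattern = r"\b%s\b" % re.escape(keyword.lower())
    return len(re.findall(pattern, text.lower()))


print(count_keyword("postgresql", "PostgreSQL is great. I like PostgreSQL."))
```

With both sides lowered, the keyword "postgresql" matches "PostgreSQL" in the text.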

Max retries exceeded

I deployed a djangocms site on Heroku and ran the SEO check, but it returned this error: HTTPSConnectionPool(host='https', port=443): Max retries exceeded with url: //astaqc-djangocms.herokuapp.com//en/ (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f45e24480b8>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Here is the screenshot
image
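
The host='https' part of the message is consistent with a URL that carries the scheme twice. The sketch below assumes (this is not confirmed in the report) that the configured site domain already contained "https://" and a trailing slash; `build_full_url` is a hypothetical helper, not part of django-check-seo.

```python
from urllib.parse import urlsplit


def build_full_url(domain, path, protocol="https://"):
    """Join protocol, domain and path, tolerating a scheme inside domain."""
    # Assumption: the stored domain may look like "https://example.com/".
    domain = domain.split("://", 1)[-1].rstrip("/")
    return protocol + domain + "/" + path.lstrip("/")


# Naive concatenation with such a domain doubles the scheme.
broken = "https://" + "https://astaqc-djangocms.herokuapp.com/" + "/en/"
print(urlsplit(broken).hostname)  # the host that the HTTP client tries to resolve
print(urlsplit(broken).path)
print(build_full_url("https://astaqc-djangocms.herokuapp.com/", "/en/"))
```

Parsing the doubled URL gives a hostname of `https` and a path starting with `//astaqc-djangocms.herokuapp.com`, which matches the error text; storing the domain without a scheme would avoid it.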

Add informational pop-ups for technical details

Users may not know where to modify the information.
We could add a button that would open a popup with an instruction manual.


Current look:
image


Desired result:
Just add a "guide" or "manual" entry in the CustomList object and add a pop-up in the view:
image

Exclude certain images from checks

We don't want to alert about missing alt tags for certain images (a tracking pixel, for example). We could add a list of URLs to ignore in the alt tag check.
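
A rough sketch of such an ignore list. The setting name `IGNORED_IMAGE_URLS`, the example URL and the helper are all hypothetical, and images are represented as plain attribute dicts so the example runs without BeautifulSoup.

```python
# Hypothetical setting: image URLs that the alt check should skip.
IGNORED_IMAGE_URLS = ["https://tracker.example.com/pixel.gif"]


def images_missing_alt(images, ignored_urls=IGNORED_IMAGE_URLS):
    """Return the src of every image lacking alt text, except ignored ones.

    images is a list of attribute dicts, e.g. {"src": "...", "alt": "..."}.
    """
    missing = []
    for attrs in images:
        src = attrs.get("src", "")
        if src in ignored_urls:
            continue  # explicitly excluded, e.g. a tracking pixel
        if not attrs.get("alt"):
            missing.append(src)
    return missing
```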

Change status of internal & external links

Since the number of internal links can vary greatly depending on the topic covered, the default limits are "between 1-15" for internal links, and "between 1-5" for external links.

In addition, not having a number of links in these ranges should not trigger a problem, only a warning.

Running Without CMS or Installation errors

I am not sure whether this will be helpful to others, but this is what I did after running into some issues while installing this.

You can follow the installation instructions, and after installation make sure these settings are present.

add 'django.contrib.sites' to INSTALLED_APPS
SITE_ID = 1

This step is important if running without CMS

Go to the 'default.html' template in the site-packages directory of the installed app
remove 'cms_tag'
save it
access your page's SEO report at:
base.domain/django-check-seo/?page=/page_extnsn



Hope this helps!

Is it possible to use without djangocms?

Is it possible to use this without djangocms?
I have a hand-built website and wanted to check whether its SEO is correct or has problems. Is there a way to install and use this package on sites that are not built with djangocms?

Add categories to checks

Currently, all checks are processed in the same way and are displayed in a large list.
A cool thing to add could be a category tag for checks, which could add readability to lists.

The category will be accessible via the Site instance in each check, and it will be of the form "retrieve the list according to its name, or create a new list with the new name".
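
The "retrieve or create" behaviour can be sketched with a dictionary of lists. This `Site` is only a stand-in for the real instance, and `get_category` is a hypothetical method name.

```python
class Site:
    """Minimal stand-in for the Site instance passed to each check."""

    def __init__(self):
        self.categories = {}

    def get_category(self, name):
        # Retrieve the list registered under this name, or create it.
        return self.categories.setdefault(name, [])


site = Site()
site.get_category("Keywords").append("No keyword in h1")
site.get_category("Links").append("Not enough internal links")
site.get_category("Keywords").append("No keyword in URL")
```

Each check only needs to know the name of its category; the first check to use a name creates the list and later ones append to it.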


Old display:

image


New display:

image

InsecureRequestWarning

When using django-check-seo, we can get log entries that say this:

InsecureRequestWarning,
/home/me/projets/projectname/.venv/lib/python3.7/site-packages/urllib3/connectionpool.py:1004:
InsecureRequestWarning: Unverified HTTPS request is being made to host 'my-https-django-website.ext'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

We should redirect HTTP requests to HTTPS ones. Besides, django-check-seo is used to analyse public content that is available all over the web.

Change the way keywords occurences checks work

Current way:

  • Check if at least one keyword is repeated at least keywords_repeat[0] times.
  • Check if all keywords are not repeated more than keywords_repeat[1] times.

New way: check occurrences using percentages of keywords found in total number of words of the content.

  • Check if at least one keyword is repeated more than keywords_repeat[0] (0.25%).
  • Check if all keywords are not repeated more than keywords_repeat[1] (1%).
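
The percentage-based rule could be sketched as follows. The function names are hypothetical, and the thresholds are written as fractions (0.0025 for 0.25%, 0.01 for 1%), which is an assumption about how the setting would be stored.

```python
import re


def keyword_ratios(keywords, text):
    """Map each keyword to its share of the total word count (0.0 to 1.0)."""
    lowered = text.lower()
    nb_words = len(re.findall(r"\w+", lowered)) or 1  # avoid dividing by zero
    return {
        kw: len(re.findall(r"\b%s\b" % re.escape(kw.lower()), lowered)) / nb_words
        for kw in keywords
    }


def check_keyword_ratios(keywords, text, keywords_repeat=(0.0025, 0.01)):
    """Return (enough, not_too_many) for the two rules listed above."""
    ratios = keyword_ratios(keywords, text)
    low, high = keywords_repeat
    enough = any(r >= low for r in ratios.values())
    not_too_many = all(r <= high for r in ratios.values())
    return enough, not_too_many
```

Using a ratio instead of a raw count means a long article and a short one are judged on the same scale.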

Treat missing alt attributes as Warning instead of Problem

Describe the bug
An <img> missing an alt attribute is filed as Problem.

According to WCAG, some images must not have alt attributes:

Sometimes there is non-text content that really is not meant to be seen or understood by the user. Transparent images used to move text over on a page; an invisible image that is used to track usage statistics; and a swirl in the corner that conveys no information but just fills up a blank space to create an aesthetic effect are all examples of this. Putting alternative text on such items just distracts people using screen readers from the content on the page. Not marking the content in any way, though, leaves users guessing what the non-text content is and what information they may have missed (even though they have not missed anything in reality). This type of non-text content, therefore, is marked or implemented in a way that assistive technologies (AT) will ignore it and not present anything to the user.

Expected behavior
File missing alt attributes as warnings and explain why and how to fill alt attributes.
Quotes and links to WCAG (and the RGAA, its French equivalent) should be present.

meta description searched_in error

A meta-description is shown for each keyword present in meta keywords:
image

We want to show only one meta description "searched_in" content if there is only one meta description:
image

slugify urls and keywords

Currently only keywords are slugified when checking presence of keywords in urls.

If the URL contains accented characters, they will not be slugified.

Ex:

  • keyword: keyword éé
  • slugified keyword: keyword-ee
  • url: https://dom.ext/fr/my-url-which-contain-keyword-éé/

-> slugified keyword is not in url since keyword-ee ≠ keyword-éé
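
One way to compare like with like is to slugify the URL path as well. The sketch below uses a stdlib approximation of a slugify function (it is not Django's own `slugify`), and both helper names are hypothetical.

```python
import re
import unicodedata
from urllib.parse import unquote, urlsplit


def simple_slugify(value):
    """ASCII-only slug; a stdlib approximation, not Django's slugify itself."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii").lower()
    value = re.sub(r"[^\w\s-]", "", value)
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def keyword_in_url(keyword, url):
    """Check the slugified keyword against each slugified path segment."""
    path = unquote(urlsplit(url).path)
    segments = [simple_slugify(part) for part in path.split("/") if part]
    return any(simple_slugify(keyword) in segment for segment in segments)


print(keyword_in_url("keyword éé", "https://dom.ext/fr/my-url-which-contain-keyword-éé/"))
```

With both sides reduced to `keyword-ee`, the example from this issue is reported as present in the URL.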

Absolute links are throwing errors.

Describe the bug
Error occurs when using an absolute link (/link) during check_links check.

Expected behavior
check_links should append absolute url to domain name.

Bonus
check_links should check if the site.full_url contains env var DOMAIN_NAME to create the domain url (so external tests like http://domain/django-check-seo/?page=https://kapt.mobi should not show any errors).
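
A possible approach, sketched with the standard library: resolve every href against the URL of the page being checked, then compare hosts. `classify_link` is a hypothetical helper and the comparison on `netloc` is an assumption about what "internal" should mean.

```python
from urllib.parse import urljoin, urlsplit


def classify_link(href, page_url):
    """Resolve href against page_url and label it internal or external."""
    absolute = urljoin(page_url, href)
    same_host = urlsplit(absolute).netloc == urlsplit(page_url).netloc
    return absolute, ("internal" if same_host else "external")


print(classify_link("/link", "http://domain/fr/page/"))
print(classify_link("https://kapt.mobi", "http://domain/fr/page/"))
```

Because the page URL itself provides the host, this does not depend on a DOMAIN_NAME environment variable.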

Add config option to allow request module to access protected websites.

Adding a config option to allow request module to access protected websites can be useful when:

  • a website is protected with an HTTP Authentication system, and django-check-seo is installed on this website:
    the user will enter their credentials so the website becomes accessible, but django-check-seo will only get the 401 message

DJANGO_CHECK_SEO_EXCLUDE_CONTENT does not exclude content in conditional comments

Describe the bug
The content inside html conditional comments is not excluded even though it is referenced in DJANGO_CHECK_SEO_EXCLUDE_CONTENT setting.

To Reproduce
Steps to reproduce the behavior:

Considering the following template:

<body>
  <!--[if lt IE 8]>
    <p class="catch-me-if-you-can"></p>
  <![endif]-->
</body>
  1. Add DJANGO_CHECK_SEO_EXCLUDE_CONTENT = ".catch-me-if-you-can" in your settings
  2. Click on "Check SEO" toolbar button
  3. Check the "Raw data" section
  4. See that .catch-me-if-you-can HTML tag is not excluded

Expected behavior
.catch-me-if-you-can HTML tag is excluded.

Lack of affordance

Some users do not know that they can click on a check result to show more information:

  • check result lines have no affordance (no button, no link, no icon)
  • the mouse cursor shown when hovering over a check result line is not the usual one

Update of class organization

If we move the Site class out of the views.py file and place it in a checks/ folder (after renaming the checks/ folder containing all control files to checks_list/), we could create a new CustomList class, which will be a new way to organize the textual data of the checks.

new organization:
image


old way to add a new problem/warning:

image


new way:

image
(checks code here)
image

No H1 found when in <nav> tag

Describe the bug
When the H1 is in a <nav> tag, django-check-seo does not find it.

<body>
  <nav>
    <h1>not found</h1>
  </nav>
</body>

All people with the url can see the seo page

Describe the bug
Only logged-in administrators should be able to see the django-check-seo page.

To Reproduce
Grab the url (the thing that looks like http://mysite/django-check-seo/?page=/), and open a private browsing tab.

Expected behavior
A user who is not logged in should be redirected to a login screen.

Screenshots

bug:
image

expected behavior:
image

Make checks really flexible

Currently, checks are only searched for inside the checks/ folder of the django-check-seo/ folder (deep inside .venv).

Maybe add a custom path to look for other checks in the project?
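
One possible shape for this, as a sketch only: a list of dotted module paths that are imported at run time. The `run` entry point and the function name `load_checks` are hypothetical conventions, not something django-check-seo defines.

```python
import importlib


def load_checks(module_paths):
    """Import each dotted module path and collect its `run` callable.

    The `run` entry point is a hypothetical convention for this sketch;
    modules that do not define it are skipped.
    """
    checks = []
    for path in module_paths:
        module = importlib.import_module(path)
        run = getattr(module, "run", None)
        if callable(run):
            checks.append(run)
    return checks
```

A project could then list its own modules next to the bundled ones, for example `load_checks(["my_project.seo_checks.check_opengraph"])`, where that module path is made up for illustration.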

Change default values for internal & external links

If you compare our old guideline of 100 links and you look at what the web looks like now, it's quite common to have 200 or 300 or 400 links on a page, as long as the page is long, it has value add, there's substantial amount of substance and real stuff on that page.
So the short answer is really not to worry about it, or not to limit yourself to 100 links anymore.
Matt Cutts - https://www.youtube.com/watch?v=QHG6BkmzDEM

The number of links on a page can vary considerably; we just need to make sure that the page has at least one link.

Django check seo search in wrong content

Currently, django-check-seo is searching the content in the <div class="container"></div> tag.

It could be a problem if the main content of the pages of the crawled website is in another tag (like <main> or <div class="cms_main"> ...).

We should provide a setting where we could select a tag/class/list of tags/classes to search content in.

Errors due to escaped special chars

Say you have this keyword:

that's awesome

Then your meta keywords will maybe contain this:

that&#39;s awesome

But your html content will contain this:

[...] and that's awesome!

Django check seo does not unescape content in keywords or in meta description (and that's a good thing, because unescaping could open the door to XSS). However, for websites that escape special chars in meta keywords/description tags, maybe we could use a list of authorized chars in the settings, and unescape only the keywords/description tags, like this:

DJANGO_CHECK_SEO_UNESCAPE_AUTHORIZED_CHARS = ["'", "\"", "!", "and", "so", "on", "..."]

I don't really know what to do regarding this issue for now. It's way simpler to just fix the escaping in the keywords & description tags.
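
For illustration, selective unescaping could look like the sketch below. It relies only on html.unescape from the standard library; the names AUTHORIZED_CHARS and unescape_authorized are hypothetical, and the allowlist only holds single characters.

```python
import html
import re

# Hypothetical allowlist, standing in for the proposed setting.
AUTHORIZED_CHARS = {"'", '"', "!"}

# Matches numeric (&#39; / &#x27;) and named (&quot;) character references.
ENTITY_RE = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def unescape_authorized(text, authorized=AUTHORIZED_CHARS):
    """Unescape an entity only when it decodes to an authorized char."""

    def replace(match):
        decoded = html.unescape(match.group(0))
        # Anything outside the allowlist (e.g. "<") stays escaped.
        return decoded if decoded in authorized else match.group(0)

    return ENTITY_RE.sub(replace, text)
```

With this approach, "that&#39;s awesome" becomes "that's awesome", while "&lt;script&gt;" is left untouched.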

Add style to results

The goal here is to display data differently than the classic "problem name" - "found X" - "description".

Here are three screenshots. Which one shows the best interface?

image
django-check-seo


image
audit page in google chrome dev tools


image
semrush


Stop words are not really necessary

It seems that none of the SEO-related websites that talk about stop words cite a credible source.

We will not use stop words in the count_words function.

Show what's wrong in the "searched in" content

Currently, the "searched in" content itself gives little information about what's wrong:
image

It seems to be a good idea to show what's wrong (like in this picture; the URL is too deep):

image

Failed parse

Describe the bug
Failed requests.exceptions.InvalidURL: Failed to parse: http://127.0.0.1:8000b'/fr/'

To Reproduce
Django<3.1
django-cms>=3.7,<3.8
django-check-seo==0.3.6

Expected behavior
Expected http://127.0.0.1:8000/fr/

Desktop:

  • Python 3.8.5

Temporary fix
Override IndexView and add the following at line 30:

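        page = self.request.GET.get("page", None)
        # "page" arrives as the text "b'/fr/'" (the repr of a bytes object):
        # drop the leading "b'" and the trailing "'".
        page = page[2:-1]
        full_url = (
                protocol
                + Site.objects.get_current().domain
                + page
        )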
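
A slightly more defensive variant of the same idea, as a sketch: strip the b'...' wrapper only when it is actually there, so a normal value such as /fr/ passes through unchanged. The helper name normalize_page_param and the "/" fallback for a missing parameter are assumptions.

```python
def normalize_page_param(page):
    """Return a clean path from the "page" query parameter."""
    if page is None:
        # Assumed fallback: analyse the home page.
        return "/"
    if isinstance(page, bytes):
        page = page.decode("utf-8")
    # Undo an accidental str(b"...") conversion such as "b'/fr/'".
    if len(page) >= 3 and page[:2] in ("b'", 'b"') and page[-1] == page[1]:
        page = page[2:-1]
    return page
```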

Display keywords

Describe the solution you'd like
Currently, keywords are not displayed on the django-check-seo page, and sometimes we need them to check something.

A good improvement would be to display them at the top of the page.

Additional context
image

Footer is present in content

When we have a page like this:

...
<footer>
  <div class="container">
    Footer
  </div>
</footer>

The self.content variable in the Site class only retrieves the <div class="container">...</div> block, so it is impossible to tell whether this item in self.content:

...
<div class="container">
  Footer
</div>

is a footer that needs to be removed.

nav menu is in content

django-check-seo should not include things like header, menu & footer in the core content of the page, but on some websites that's actually the case, leading to the "no keyword found in first §" problem.

Slugify keywords in checks

The goal here is to standardize the way keywords are processed, to make it easier to detect keywords in content.

Ex:

Find keywords in url -> use slugify to effectively compare keywords and url.
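
As a rough sketch of that comparison: the simple_slugify helper below is a simplified, standard-library stand-in for Django's django.utils.text.slugify, and keyword_in_url is a hypothetical name.

```python
import re
import unicodedata


def simple_slugify(value):
    """Simplified stand-in for django.utils.text.slugify (ASCII only)."""
    value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Drop everything except word chars, whitespace and hyphens.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def keyword_in_url(keyword, url):
    """True if the slugified keyword appears somewhere in the URL."""
    return simple_slugify(keyword) in url.lower()
```

Note that this is a plain substring match, so a short keyword can also match inside a longer word of the URL.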

Rework descriptions of checks

Transform something like
"Meta description can be displayed below your page title in search results. If Google find your description too long, it may crop it and your potential visitors will not be able to read all its content. Sometimes, long pertinent meta descriptions will be displayed, but in the vast majority of the results, the description's lengths are 150-170 chars."
to
"The meta description tag can be displayed in search results if it has the right length, and can influence users. And Google classifies sites according to user behaviour."

Allow the application to be installed from github

Right now, the installation method is to "copy a bunch of files into your project folder, then edit all of them, then spend three days debugging this buggy application".

The new approach should be the same as for all other Django applications:

  • install application using pipenv or pip inside a virtualenv,
  • update your settings.py with values,
  • add values into your urls.py,
  • add a custom file in your project_folder/ to create the button to launch this application.

Add dependencies in setup.cfg

Users currently need to manually install some packages that django-check-seo depends on, or to manually add them to their Pipfile.

But the Setup Configuration File (setup.cfg) syntax supports declaring dependencies:

[options]
include_package_data = true
packages = find:
install_requires =
    packagename>=version
    otherpackagename

I think the right way to treat dependencies is to let pipenv (or pip) do it for us.

Add importance factor

Currently the only factor is the "first called, first ranked" rule of the problems/warnings.

Another thing to do is add an "importance" to the problems/warnings, to allow them to be ordered by importance and remove the "first called, first ranked" rule.
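
A minimal sketch of such an ordering, assuming each problem/warning dict gains an "importance" key (a hypothetical field; the check names below are made up for the example):

```python
# Example data; "importance" is the hypothetical new field.
problems = [
    {"name": "Too few internal links", "importance": 1},
    {"name": "No meta description", "importance": 3},
    {"name": "No h1 tag", "importance": 3},
]


def order_by_importance(items):
    """Most important first; sorted() is stable, so items with the
    same importance keep the order in which they were added."""
    return sorted(items, key=lambda item: item["importance"], reverse=True)
```

Because the sort is stable, the "first called, first ranked" order only survives as a tie-breaker between items of equal importance.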

Add new check: internal links status

Is your feature request related to a problem? Please describe.
Google does not like dead links on pages. Since checking external links can take time, it is better to test only internal links for now.

Describe the solution you'd like
A solution based on requests and the status_code of each response.
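
The sketch below shows one way the check could work. To stay self-contained it swaps requests for the standard library's urllib; with requests, the status lookup would be requests.head(url).status_code instead. The function names and the "400 and above means broken" rule are assumptions.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlparse


def fetch_status(url, timeout=5):
    """Return the HTTP status of `url`, or None if it is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions by urllib.
        return error.code
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return None


def find_broken_internal_links(page_url, hrefs, get_status=fetch_status):
    """Return (url, status) pairs for internal links that look dead."""
    site = urlparse(page_url).netloc
    broken = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        # Skip external links (and mailto:, tel:, ... which have no host).
        if urlparse(absolute).netloc != site:
            continue
        status = get_status(absolute)
        if status is None or status >= 400:
            broken.append((absolute, status))
    return broken
```

Passing get_status as a parameter keeps the link filtering testable without any network access.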

Add successful checks list

If no check fails, no information is displayed.

It could be a good idea to show a list of successful checks.


Old way:
image


New way:

image
