
tfidfkeywordssuggest's Introduction

TF-IDF KeywordsSuggest - Version Alpha 0.1 Licence GPL 3

Version 0.1: tfidfkeywordssuggest.py was changed to follow a change in the googlesearch library. Minor bug fixed.

Anakeyn TF-IDF Keywords Suggest is a keyword suggestion tool for SEO and web marketing purposes. It searches Google for a given keyword and stores the first x result pages.

Next, the system retrieves the content of those pages in order to find popular and original keywords/expressions in the subject area. The system works with a TF-IDF algorithm.

TF-IDF means term frequency–inverse document frequency. TF-IDF is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.

To obtain a "global" TF-IDF value for each term, we compute two averages across all documents: the plain mean of the term's TF-IDF values, which surfaces popular expressions, and the mean of its non-zero values only, which surfaces original expressions.
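The two averages can be illustrated with a minimal, standard-library sketch. It is not taken from tfidfkeywordssuggest.py: the whitespace tokenizer, the smoothed IDF formula and the function names `tfidf_matrix` and `global_scores` are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (terms, rows), where rows[d][j] is the TF-IDF of terms[j] in document d."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    terms = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        rows.append([
            # Smoothed IDF (assumption) so that a term present in a document
            # always gets a strictly positive score.
            (counts[t] / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1)
            for t in terms
        ])
    return terms, rows


def global_scores(docs):
    """Return two dicts: plain mean ("popular") and non-zero mean ("original")."""
    terms, rows = tfidf_matrix(docs)
    popular, original = {}, {}
    for j, term in enumerate(terms):
        column = [row[j] for row in rows]
        nonzero = [value for value in column if value > 0]
        popular[term] = sum(column) / len(column)     # mean over all documents
        original[term] = sum(nonzero) / len(nonzero)  # mean over documents containing the term
    return popular, original


sample_docs = ["seo tips seo", "seo tools", "rare keyword"]
popular, original = global_scores(sample_docs)
top_popular = sorted(popular, key=popular.get, reverse=True)[:3]
top_original = sorted(original, key=original.get, reverse=True)[:3]
print(top_popular, top_original)
```

The plain mean penalizes terms that are absent from most documents, so it favours widely used expressions. The non-zero mean ignores the documents where a term is absent, so a term that is strong in only a few pages can still rank high.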

The program is developed in Python as a web application using Flask (web framework), Jinja2 (web template engine), SQLAlchemy (object-relational mapping for SQL databases), Bootstrap (front-end framework) ...

STRUCTURE :

KeywordsSuggest
|   database.db
|   favicon.ico
|   tfidfkeywordssuggest.py
|   license.txt
|   myconfig.py
|   requirements.txt
|   __init__.py
|   
+---configdata
|       tldLang.xlsx
|       user_agents-taglang.txt
|       
+---static
|       Anakeyn_Rectangle.jpg
|       tfidfkeywordssuggest.css
|       Oeil_Anakeyn.jpg
|       signin.css
|       starter-template.css
|              
+---templates
|       index.html
|       tfidfkeywordssuggest.html
|       login.html
|       signup.html
|       
+---uploads

By default the system works with a SQLite database called database.db which is created the first time you use the program. The main program is "tfidfkeywordssuggest.py".

Default config variables are in the myconfig.py file, including the 2 default users: admin (pwd "adminpwd") and guest (pwd "guestpwd").

Other configuration data is available in the configdata subdirectory, in 2 files:

tldLang.xlsx: parameters for Google top-level domains and search engine results page languages (358 combinations).

user_agents-taglang.txt: a list of valid user agents to provide to Google randomly to avoid blocking (4281 user agents).
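As a rough sketch of the random user-agent idea, the snippet below picks one entry at random and builds a request header from it. It assumes one user agent per line, which the README does not confirm; the helper names `load_user_agents` and `build_headers` and the sample strings are hypothetical.

```python
import random


def load_user_agents(lines):
    """Keep non-empty, stripped lines as candidate user-agent strings."""
    return [line.strip() for line in lines if line.strip()]


def build_headers(agents, rng=random):
    """Return HTTP headers carrying one randomly chosen user agent."""
    return {"User-Agent": rng.choice(agents)}


# Made-up sample lines; with the real file one could instead write (assumed layout):
#   with open("configdata/user_agents-taglang.txt", encoding="utf-8") as handle:
#       agents = load_user_agents(handle)
sample_lines = [
    "ExampleBrowser/1.0 (Windows)\n",
    "\n",
    "ExampleBrowser/2.0 (Linux)\n",
]
agents = load_user_agents(sample_lines)
print(build_headers(agents))
```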

Static directory contains images and .css files

Templates directory contains .html templates.

The uploads directory is where all generated keyword files are created and saved for download.

The system creates 7 "popular" keywords/expressions files: 1 file with expressions of all sizes, and one file each for expressions of 1, 2, 3, 4, 5 and 6 words. The same goes for the "original" keywords/expressions files. If available, the system provides a maximum of 10,000 expressions per file. This should be enough to get ideas :-)
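The split into 7 groups per ranking can be sketched as follows. This is an illustration rather than the program's actual code: the function name `split_by_size`, the use of key 0 for the "all sizes" group and the in-memory lists (instead of files) are assumptions.

```python
def split_by_size(ranked_expressions, max_words=6, limit=10000):
    """Group ranked expressions by word count.

    Key 0 holds expressions of all sizes; keys 1..max_words hold expressions
    with exactly that many words. Each group is capped at `limit` entries and
    keeps the input ranking order.
    """
    groups = {size: [] for size in range(0, max_words + 1)}
    for expression in ranked_expressions:
        n_words = len(expression.split())
        if not 1 <= n_words <= max_words:
            continue  # ignore empty or overly long expressions
        for key in (0, n_words):
            if len(groups[key]) < limit:
                groups[key].append(expression)
    return groups


ranked = ["seo", "seo tools", "best seo tools", "keyword research"]
groups = split_by_size(ranked)
print(len(groups))  # 7 groups: all sizes, then 1 to 6 words
```

Running the same split once on the "popular" ranking and once on the "original" ranking gives the 2 x 7 = 14 result sets described above.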

How to test the program on your computer :

Download the .zip file of this application https://github.com/Anakeyn/TFIDFKeywordsSuggest/archive/master.zip and unzip it in a directory on your computer.

Download and Install Anaconda https://www.anaconda.com/distribution/#download-section

Anaconda will install tools on your computer :

[Image: Anaconda-Tools]

Open Anaconda Prompt and go to the directory where you unzipped the application previously (for example, on Windows: cd c:\Users\myname\document......\).

Make sure the file "requirements.txt" is in your directory: dir (Windows) or ls (Linux).

Install the library dependencies for the Python code with the following command:

For Linux : while read requirement; do conda install --yes $requirement || pip install $requirement; done < requirements.txt

For Windows : FOR /F "delims=~" %f in (requirements.txt) DO conda install --yes "%f" || pip install "%f"

[Image: AnacondaPrompt]

Next launch Spyder and open the main Python file tfidfkeywordssuggest.py

[Image: spyder-keywordssuggest]

Make sure that you are in the correct directory, then click on the green arrow to run the Python file.

Next, open a browser and go to the address http://127.0.0.1:5000 :

[Image: AKS-Home]

Click on "Keywords Suggest". The system is protected: provide the default admin credentials (admin, adminpwd) or the default guest credentials (guest, guestpwd).

Next, choose an expression and a target country/language.

[Image: AKS-Search]

The system searches Google for pages responding to the keyword, saves the pages, gets their content and calculates a TF-IDF value for each term found in the pages. It then provides 14 files with up to 10,000 popular or original expressions each.
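The README does not say how candidate expressions are extracted from page content. One plausible approach, shown here only as an assumption, is to enumerate word n-grams of 1 to 6 words; the function name `word_ngrams` and the `\w+` tokenization rule are hypothetical.

```python
import re


def word_ngrams(text, min_n=1, max_n=6):
    """Return all contiguous word sequences of min_n to max_n words, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return [
        " ".join(words[start:start + n])
        for n in range(min_n, max_n + 1)
        for start in range(len(words) - n + 1)
    ]


print(word_ngrams("Free SEO tools", max_n=2))
# ['free', 'seo', 'tools', 'free seo', 'seo tools']
```

Each such expression can then be scored with TF-IDF across the saved pages, in the same way as a single term.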

[Image: AKS-Results]

As you can see, not all languages are filtered by Google (see the "lr" parameter here for the list: https://developers.google.com/custom-search/docs/xml_results_appendices#lrsp). However, with the country filter and the language specified in the user agent, the results are often usable.

Here are the results of original 2-word expressions for "SEO" in Swahili in the Democratic Republic of the Congo:
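To make the country and language filters concrete, here is a hedged sketch of how a search URL could combine a Google top-level domain with the "hl" and "lr" parameters. The program itself relies on the googlesearch library, so this is not its code; the function name `build_search_url` and the example values are assumptions.

```python
from urllib.parse import urlencode


def build_search_url(keyword, tld="com", hl="en", lr=None):
    """Build a Google search URL for a given top-level domain and language.

    `hl` sets the interface language. `lr` (for example "lang_fr") restricts
    results to one language and is added only when the caller provides it,
    since not every language has an "lr" value.
    """
    params = {"q": keyword, "hl": hl}
    if lr:
        params["lr"] = lr
    return f"https://www.google.{tld}/search?{urlencode(params)}"


print(build_search_url("seo", tld="fr", hl="fr", lr="lang_fr"))
# https://www.google.fr/search?q=seo&hl=fr&lr=lang_fr
```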

[Image: SEO-Swahili-RDC]
