
Ishan Agarwal's Projects

a-b-testing-a-new-menu-launch

Analyze the results of the experiment to determine whether the menu changes should be applied to all stores. The predicted impact on profitability should be large enough to justify the increased marketing budget: at least an 18% increase in profit growth over the comparative period relative to the control stores, otherwise known as incremental lift.
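As a rough illustration of the decision rule, the sketch below computes incremental lift as the profit growth of the test stores minus the profit growth of the control stores, each measured against the comparative period, and checks it against the 18% target. This definition of lift (in percentage points), the function names, and the profit figures are assumptions for illustration, not the project's actual analysis, which may instead match each test store to specific control stores.

```python
def growth(test_period: float, comparative_period: float) -> float:
    """Percentage change of a metric from the comparative period to the test period."""
    return (test_period - comparative_period) / comparative_period * 100


def incremental_lift(treat_test: float, treat_comp: float,
                     ctrl_test: float, ctrl_comp: float) -> float:
    """Assumed definition: test-store growth minus control-store growth, in percentage points."""
    return growth(treat_test, treat_comp) - growth(ctrl_test, ctrl_comp)


def meets_target(lift: float, threshold: float = 18.0) -> bool:
    """True if the lift clears the 18% threshold stated in the project brief."""
    return lift >= threshold


# Hypothetical aggregate profits, for illustration only.
lift = incremental_lift(treat_test=1250.0, treat_comp=1000.0,
                        ctrl_test=1050.0, ctrl_comp=1000.0)
print(f"lift = {lift:.1f}%, roll out: {meets_target(lift)}")
```

Subtracting the control stores' growth removes changes that would have happened anyway (seasonality, market-wide trends), so only the growth attributable to the new menu is compared with the threshold.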

android-apps-java

This repository contains the Android apps I have developed in Java.

forecasting-video-game-demand

Forecast monthly sales in order to synchronize supply with demand, support decisions that help build a competitive infrastructure, and measure company performance.

information-retrieval

Information retrieval (IR) is concerned with finding material (e.g., documents) of an unstructured nature (usually text) that satisfies an information need (e.g., a query) from within large collections. One approach to identifying relevant documents is to compute scores based on the matches between terms in the query and terms in the documents. For example, a document containing words such as ball, team, score, and championship is likely to be about sports. It is helpful to define a weight for each term in a document that is meaningful for computing such a score. Below we describe popular information retrieval metrics, namely term frequency, inverse document frequency, and their product, term frequency-inverse document frequency (TF-IDF), which are used to define weights for terms.

Term Frequency: Term frequency is the number of times a particular word t occurs in a document d:

$$\mathrm{TF}(t, d) = \text{number of times } t \text{ appears in document } d$$

Since the importance of a word in a document does not necessarily scale linearly with the frequency of its appearance, a common modification is to use the logarithm of the raw term frequency instead:

$$\mathrm{WF}(t, d) = \begin{cases} 1 + \log_{10} \mathrm{TF}(t, d) & \text{if } \mathrm{TF}(t, d) > 0 \\ 0 & \text{otherwise} \end{cases}$$

We will use this logarithmically scaled term frequency in what follows.

Inverse Document Frequency: The inverse document frequency (IDF) is a measure of how common or rare a term is across all documents in the collection. It is the logarithmically scaled inverse fraction of the documents that contain the word, obtained by taking the logarithm of the ratio of the total number of documents to the number of documents containing the term:

$$\mathrm{IDF}(t) = \log_{10}\left(\frac{\text{total number of documents}}{\text{number of documents containing term } t}\right)$$

Under this IDF formula, terms appearing in all documents are assumed to be stopwords and are assigned IDF = 0.
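A quick numeric check of these formulas, using a hypothetical collection of N = 4 documents (the counts are made up for illustration): a term occurring 10 times in a document, a term t1 found in 1 of the 4 documents, and a term t2 found in all 4.

$$
\begin{aligned}
\mathrm{WF}(t, d) &= 1 + \log_{10} 10 = 2 && (\mathrm{TF}(t, d) = 10) \\
\mathrm{IDF}(t_1) &= \log_{10}(4 / 1) \approx 0.602 && (t_1 \text{ in 1 of 4 documents}) \\
\mathrm{IDF}(t_2) &= \log_{10}(4 / 4) = 0 && (t_2 \text{ in all 4 documents})
\end{aligned}
$$

The last line is the stopword behavior described above: a term that appears everywhere contributes nothing to a document's score.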
We will use the smoothed version of this formula:

$$\mathrm{IDF}(t) = \log_{10}\left(1 + \frac{\text{total number of documents}}{\text{number of documents containing term } t}\right)$$

In practice, the smoothed IDF never drops to zero: a term that appears in every document gets $\log_{10} 2 \approx 0.301$ instead of 0. This matters because it is better to return some results to the user than nothing, even if their query matches every single document in the collection.

TF-IDF: Term frequency-inverse document frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus of documents. It is often used as a weighting factor in information retrieval and text mining:

$$\mathrm{TF\text{-}IDF}(t, d) = \mathrm{WF}(t, d) \times \mathrm{IDF}(t)$$
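The definitions above translate directly into code. The sketch below computes WF(t, d) × smoothed IDF(t) for a toy, whitespace-tokenized collection; the function names and sample documents are illustrative assumptions, not the project's implementation.

```python
import math
from collections import Counter


def wf(tf: int) -> float:
    """Logarithmically scaled term frequency: 1 + log10(TF) if TF > 0, else 0."""
    return 1 + math.log10(tf) if tf > 0 else 0.0


def smoothed_idf(term: str, docs: list[list[str]]) -> float:
    """Smoothed IDF: log10(1 + N / df). Assumes the term occurs in at least one document."""
    df = sum(1 for doc in docs if term in doc)
    return math.log10(1 + len(docs) / df)


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: WF * IDF} weight map per pre-tokenized document."""
    weights = []
    for doc in docs:
        counts = Counter(doc)  # raw TF for every term in this document
        weights.append({t: wf(c) * smoothed_idf(t, docs) for t, c in counts.items()})
    return weights


docs = [
    "the team won the championship".split(),
    "the score of the ball game".split(),
    "the market fell today".split(),
]
for weights in tf_idf(docs):
    print({t: round(v, 3) for t, v in sorted(weights.items())})
```

Note that "the" still receives a small positive weight because of the smoothing, while rarer terms such as "team" score higher. A real pipeline would also normalize case and punctuation, and would precompute document frequencies once rather than rescanning the collection for every term.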

page-rank-implementation

The goal of this programming assignment is to compute the PageRank scores of an input set of hyperlinked Wikipedia documents using Hadoop MapReduce. The PageRank score of a web page serves as an indicator of the page's importance. Many web search engines (e.g., Google) use PageRank scores in some form to rank the results of user-submitted queries. The goals of this assignment are to:
1. Understand the PageRank algorithm and how it works in MapReduce.
2. Implement PageRank and execute it on a large corpus of data.
3. Examine the output from running PageRank on Simple English Wikipedia to measure the relative importance of pages in the corpus.
To run your program on the full Simple English Wikipedia archive, you will need to run it on the dsba-hadoop cluster to which you have access.
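The per-iteration logic can be sketched in plain Python, with one map function and one reduce function as in a Hadoop job and a dictionary standing in for the shuffle phase. This is not the assignment's Hadoop code: the damping factor d = 0.85, the normalized update PR(p) = (1 - d)/N + d · Σ PR(q)/L(q) (where L(q) is the out-degree of page q), and all function names are assumptions. Pages without outlinks simply lose their rank mass here, and every link target is assumed to be a page in the input.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor d


def map_phase(page, rank, links):
    """Mapper: re-emit the page's outlinks and send rank / out-degree to each target."""
    yield page, ("links", links)
    for target in links:
        yield target, ("rank", rank / len(links))


def reduce_phase(page, values, n_pages):
    """Reducer: sum incoming rank shares, apply the damping formula, keep the outlinks."""
    links, incoming = [], 0.0
    for kind, value in values:
        if kind == "links":
            links = value
        else:
            incoming += value
    return page, ((1 - DAMPING) / n_pages + DAMPING * incoming, links)


def pagerank_iteration(graph):
    """One MapReduce round over {page: (rank, outlinks)}; grouping by key plays the shuffle."""
    grouped = defaultdict(list)
    for page, (rank, links) in graph.items():
        for key, value in map_phase(page, rank, links):
            grouped[key].append(value)
    n_pages = len(graph)
    return dict(reduce_phase(p, vals, n_pages) for p, vals in grouped.items())


def pagerank(links_by_page, iterations=50):
    """Start from a uniform distribution and run a fixed number of iterations."""
    n = len(links_by_page)
    graph = {p: (1.0 / n, links) for p, links in links_by_page.items()}
    for _ in range(iterations):
        graph = pagerank_iteration(graph)
    return {p: rank for p, (rank, _) in graph.items()}


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page, rank in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(rank, 4))
```

On Hadoop, each call to pagerank_iteration would correspond to one job. The mapper re-emits the link list so that the reducer can write the graph structure back out for the next iteration alongside the updated rank.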

profanity-checker

This application takes a document as input and raises a profanity alert if the document contains any cuss words; otherwise, it displays a message saying that no cuss words were found.
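A minimal sketch of the check, assuming the document is available as a string and using a tiny placeholder word list. BLOCKED_WORDS and both function names are hypothetical; the project's actual word list and matching rules are not given. Matching whole words after lowercasing avoids flagging innocent words that merely contain a blocked substring.

```python
import re

# Hypothetical placeholder word list; a real checker would load a curated lexicon.
BLOCKED_WORDS = {"darn", "heck"}


def find_profanity(text: str, blocked: set[str] = BLOCKED_WORDS) -> list[str]:
    """Return every whole-word match (lowercased) from the blocked list, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w in blocked]


def check_document(text: str) -> str:
    """Produce the alert message, or a confirmation when nothing is found."""
    hits = find_profanity(text)
    if hits:
        return "Profanity alert: " + ", ".join(sorted(set(hits)))
    return "No profanity found."


print(check_document("What the heck, darn it!"))
print(check_document("Hello world"))
```

To check a file instead of a literal string, pass its contents, for example `check_document(open(path, encoding="utf-8").read())`.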
