Code Monkey home page Code Monkey logo

patent_sentiment_analysis's Introduction

PaSa: A Novel Dataset for Patent Sentiment Analysis to Highlight Patent Paragraphs

overview

In the process of manual patent analysis, to attain better readability, identifying the semantic information by marking paragraph is in practice. Oftentimes patent examiners and attorneys highlight the text passages which could be essential entities for key arguments when patent documents are compared to the existing literature. The important entities such as technical advantageous of any invention, problems associated with previous efforts, or other plain descriptive texts are manually highlighted by attorneys to compare and contrast indivdual subject matters. This semantic annotation process is laborious and time consuming, to allevate such problem we proposed a novel dataset. The dataset consist of three classes such as 'positive', 'negative', and 'neutral' corresponding to 'technical advantageous of inventios', 'technical problems', and 'other boilerplate text' from patent documents respectively.

Patents docs, mainly from ASIA expresses the selling points of their invention via section so called " technical advantages", "advantageous effects", "effects of invention", etc. Such section usually have some sentences or a paragraphs which focus on advantages of invention, we call them positive samples in our dataset. Other section just above advantages, usually explains the idea of invetion or how does it work, etc and we call such sentences or paragraphs as neutral samples in our dataset. Other class of our dataset is negative samples such as, some sections/senetnces/paragraphs which explains the problems associated with existing approaches listed in patent doc.

Given a patent document, identifying different annotations (three different classes as above mentioned) would be a very interesting research aspect, which helps the patent practitioners such as examiners, patent attorneys to quickly identify the key arguments of invention, in turn providing a timely marking/highlighting the text of patent.

Patent grants from USPTO in the year 2020 are referred to create this dataset. USPTO full text bulk datasets are formatted in a xml file per week, eaxh xml is again nested in nature and contain thosdands of grants of that particular week.

patent_sentiment_analysis's People

Contributors

renuk9390 avatar vishvapalsinh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

patent_sentiment_analysis's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.