SIPHS

Open online data such as microblogs and discussion board messages has the potential to be an incredibly valuable source of information about health in populations. Such data is growing rapidly, is low-cost and real-time, and seems likely to cover a significant proportion of the population. To take two examples, PatientsLikeMe has enjoyed 10% growth and now has over 200,000 users covering over 1500 health conditions; the generic Twitter service is expanding at a rate of 30% annually with over 200 million active users. Going beyond simple keyword search and harnessing this data for public health represents both an opportunity and a challenge for natural language processing (NLP).

The EPSRC SIPHS project (grant no. EP/M005089/1) is about helping health experts leverage social media for their own clinical and scientific studies through automatic techniques that encode messages according to a machine-understandable semantic representation. There are three major challenges this project seeks to address: (1) knowledge brokering: to develop algorithms that identify informal descriptions of conditions, treatments, medications, behaviours and attitudes and code them to standard ontologies such as the UMLS; (2) knowledge management: to create a structured resource of patient vocabulary used in blog texts and link it to existing coding systems; and (3) adding insight to evidence: to work with domain experts to utilise the coded information to automatically generate meaningful summaries for follow-up investigation.
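To make the knowledge brokering challenge concrete, here is a minimal sketch of mapping an informal phrase to a coded concept. It is not the method used by SIPHS: it swaps in simple fuzzy string matching from Python's standard library, and the lexicon, the `TOY:` identifiers and the `normalise` function are all hypothetical, invented for illustration. The identifiers are placeholders, not real UMLS codes.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: informal phrase -> (placeholder concept ID, preferred term).
# The IDs are made up for illustration and are NOT real UMLS identifiers.
TOY_LEXICON = {
    "cant sleep": ("TOY:0001", "insomnia"),
    "head is pounding": ("TOY:0002", "headache"),
    "throwing up": ("TOY:0003", "vomiting"),
}


def normalise(phrase, lexicon=TOY_LEXICON, threshold=0.8):
    """Return (concept_id, preferred_term) for the closest lexicon entry, or None."""
    # Lowercase, drop apostrophes and collapse runs of whitespace.
    cleaned = " ".join(phrase.lower().replace("'", "").split())
    best_key, best_score = None, 0.0
    for key in lexicon:
        # Character-level similarity in [0, 1] between the message phrase and the entry.
        score = SequenceMatcher(None, cleaned, key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    # Only accept a match that is similar enough; otherwise leave the phrase uncoded.
    if best_key is not None and best_score >= threshold:
        return lexicon[best_key]
    return None


if __name__ == "__main__":
    for text in ["Can't  sleep", "throwin up", "lovely weather today"]:
        print(repr(text), "->", normalise(text))
```

A lookup like this handles small spelling variations but cannot resolve ambiguity or unseen slang, which is the gap the project's machine learning work is meant to address.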

At the technological level SIPHS seeks to pioneer new methods for NLP and machine learning (ML). Social media remains a challenging area for NLP for a variety of reasons: short de-contextualised messages, high levels of ambiguity and out-of-vocabulary words, use of slang and an evolving vocabulary, as well as an inherent bias towards sensational topics. The fellowship seeks to harness the progress made so far in NLP for social media analysis in the commercial domain and develop it further to provide meaningful public health evidence. One key aspect not previously addressed is the clinical coding of patient messages. Although knowledge brokering systems exist for clinical and scientific texts (e.g. MetaMap), their performance on social media messages has been poor. SIPHS aims to utilise the rich availability of ontological resources in biomedicine together with ML on annotated message data to disambiguate informal language. Research will also aim to understand the communicative function of messages, for example whether a message reports direct experience or is related to news, humour or marketing. If these problems are successfully overcome, an important barrier to data integration with other types of clinical data will be removed.

SIPHS is being led by Dr. Nigel Collier, Principal Research Associate and Co-Director of the Language Technology Lab at the Department of Theoretical and Applied Linguistics, University of Cambridge.

## Publications:

[1] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms in social media messages”, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17-21 September 2015, pp. 1675-1680. Available at https://www.repository.cam.ac.uk/handle/1810/249295

[2] Limsopatham, N. and Collier, N. (2015), “Towards the semantic interpretation of personal health messages from social media”, in Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Workshop on Understanding the City with Urban Informatics (UCUI 2015), Melbourne, Australia, 19-23 October 2015. Available at https://www.repository.cam.ac.uk/handle/1810/249275

Reference 2 above is the canonical reference for the SIPHS project.

## Related information:

[1] EPSRC SIPHS project grant details: http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/M005089/1

[2] Nigel Collier's Web site: https://sites.google.com/site/nhcollier/

[3] Nut Limsopatham's Web site: http://www.mml.cam.ac.uk/nl347

[4] The Language Technology Lab Web site: http://ltl.mml.cam.ac.uk/
