Comments (1)

leej35 commented on August 12, 2024

I solved this issue: my error was using the 'POST' method incorrectly. You can use the following code when you have text files in a specific folder (INPUT_DIR) and want to annotate them and export each file as JSON.

import re
import glob
import json
import urllib
import urllib2
from time import strftime

INPUT_DIR = '../data/text/'
OUTPUT_DIR = '../data/json/'

API_KEY = ''  # your BioPortal API key
annotatorUrl = 'http://data.bioontology.org/annotator'

ontology_list = 'ICD9CM,LOINC'
tui_list = 'T017,T029,T023,T030'


def get_json(text):
    params = {
        'text':text, 
        'longest_only':'true',
        'whole_word_only':'true',
        'stop_words':'',
        'ontologies':ontology_list,   
        'ontologiesToKeepInResult':'',   
        'isVirtualOntologyId':'true', 
        'semantic_types':tui_list,
        'apikey':API_KEY
    }
    headers = {'Authorization': 'apikey token=' + API_KEY}
    data = urllib.urlencode(params)
    request = urllib2.Request(annotatorUrl, data, headers)
    # request.add_header('Content-type','text/xml')
    response = urllib2.urlopen(request)
    data_json = json.loads(response.read().decode('utf-8'))
    # print 'http status: '+ str(response.getcode())
    return data_json


def main():
    for filename in glob.glob(INPUT_DIR+'*.txt'):
        # base name without directory or extension; set before the try
        # block so the FAIL message below can always use it
        filename_nodir = filename.split('/')[-1].split('.')[0]
        # load the file and join its lines with spaces so that words on
        # adjacent lines are not glued together
        with open(filename, "r") as infile:
            text = ' '.join(l.rstrip() for l in infile.read().splitlines())
        # replace runs of non-alphanumeric characters with a single space
        text = re.sub('[^A-Za-z0-9]+', ' ', text)
        try:
            # get json
            data = get_json(text)
            # save to json file
            json_fn = filename_nodir + '.json'

            with open(OUTPUT_DIR+json_fn, 'w') as outfile:
                json.dump(data, outfile)
            print strftime("%Y-%m-%d %H:%M:%S") + ' SUCCESS ' + filename_nodir
        except:
            print strftime("%Y-%m-%d %H:%M:%S") + ' FAIL ' + filename_nodir
            raise

if __name__ == "__main__":
    main()
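
The script above targets Python 2 (urllib2 and the print statement). For Python 3, the request part could look roughly like the sketch below. It is only an illustration under a few assumptions: the names clean_text, build_request and ANNOTATOR_URL are hypothetical, only a subset of the original parameters is sent, and the key point is that passing a bytes body to Request is what makes urllib send a POST.

```python
import json
import re
import urllib.parse
import urllib.request

# Same endpoint as in the script above.
ANNOTATOR_URL = 'http://data.bioontology.org/annotator'


def clean_text(raw):
    """Join lines with spaces and collapse non-alphanumeric runs to one space."""
    joined = ' '.join(line.strip() for line in raw.splitlines())
    return re.sub('[^A-Za-z0-9]+', ' ', joined).strip()


def build_request(text, api_key, ontologies='ICD9CM,LOINC'):
    """Build (but do not send) a POST request for the annotator."""
    params = {
        'text': text,
        'longest_only': 'true',
        'whole_word_only': 'true',
        'ontologies': ontologies,
    }
    # urlencode returns str in Python 3; Request needs bytes for the body.
    body = urllib.parse.urlencode(params).encode('utf-8')
    headers = {'Authorization': 'apikey token=' + api_key}
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(ANNOTATOR_URL, data=body, headers=headers)


def get_json(text, api_key):
    """Send the request and decode the JSON reply (needs network and a valid key)."""
    with urllib.request.urlopen(build_request(text, api_key)) as response:
        return json.loads(response.read().decode('utf-8'))
```

Building the request separately from sending it lets you check the method and body (for example with build_request(...).get_method()) without calling the service.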
