
dilated-cnn-ner's People

Contributors

ghaddarabs, sbrugman, strubell


dilated-cnn-ner's Issues

About the paper

Hello,

Is there any suggested reading you'd recommend for understanding your paper? I tried working through the architecture, but ran into some points of confusion with the expression for the final dilation layer of the block.
Please let me know if any additional information is required.
Any help would be much appreciated.

Regards

Getting some permission issues beyond my understanding

Hello,

I have replicated the git directory structure as provided here and I ran the following:

(This has the bin, conf, and src folders with the relevant files)
export DILATED_CNN_NER_ROOT=/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER

(This has the English corpus in the form of train.txt, test.txt, and valid.txt.)
train.txt excerpt:

-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP B-MISC
lamb NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER

BRUSSELS NNP B-NP B-LOC
1996-08-22 CD I-NP O

export DATA_DIR='/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003'

Upon running the following command:
./bin/preprocess.sh conf/conll/dilated-cnn.conf
I am getting a bunch of permission-denied errors, and I am unable to fathom why. Sorry if the question is very basic, but can anyone help me here?

I tried redirecting the trace to a text file, which only captured this:

"Writing extra vocab (cutoff ) to "

Trace thrown at terminal:

find: ‘/proc/4416/map_files’: Permission denied
find: ‘/proc/4416/fdinfo’: Permission denied
find: ‘/proc/4416/ns’: Permission denied
find: ‘/proc/4430/task/4430/fd’: Permission denied
find: ‘/proc/4430/task/4430/fdinfo’: Permission denied
find: ‘/proc/4430/task/4430/ns’: Permission denied
[... several hundred similar "Permission denied" lines for paths under /proc/*/task/*, /run, /sys/kernel/debug, /etc, /root, /var/lib, /var/log, and /var/spool omitted ...]
find: ‘/var/spool/postfix/incoming’: Permission denied
find: ‘/var/spool/postfix/maildrop’: Permission denied

Preprocessing before triggering 'preprocess.sh' for OntoNotes

Hello,
Can anyone advise on the data processing to be done on CoNLL-2012 before calling the following?

./bin/preprocess.sh conf/ontonotes/dilated-cnn.conf
Currently, simply calling the preprocess.sh script as above does not write anything to the file mentioned below and, I suspect, goes into an infinite loop:
data/vocabs/ontonotes_cutoff_4.txt

I've downloaded the train v4, dev v4 and test v9 tarballs from
http://conll.cemantix.org/2012/data.html

Edit:
I was able to convert the OntoNotes files to CoNLL format successfully, but I'm not sure of the directory structure needed to trigger the preprocessing script. Can you help?
The following is my directory structure:

$DILATED_CNN_NER_ROOT/data/conll-formatted-ontonotes-5.0

(Structure for $DILATED_CNN_NER_ROOT/data/conll-formatted-ontonotes-5.0: this directory has all the *_gold_conll files. Take the directory below as an example:
/home/ss06886910/Strubel_IDCNN/data/conll-formatted-ontonotes-5.0/data/train/data/english/annotations/wb/c2e/00/c2e_0028.v4_gold_conll)

conll-formatted-ontonotes-5.0
├── data
│   ├── development
│   │   └── data
│   │       ├── arabic
│   │       │   └── annotations
│   │       ├── chinese
│   │       │   └── annotations
│   │       └── english
│   │           └── annotations
│   ├── test
│   │   └── data
│   │       ├── arabic
│   │       │   └── annotations
│   │       ├── chinese
│   │       │   └── annotations
│   │       └── english
│   │           └── annotations
│   └── train
│       └── data
│           ├── arabic
│           │   └── annotations
│           ├── chinese
│           │   └── annotations
│           └── english
│               └── annotations
└── scripts

I tried running with the following parameter in ontonotes.conf:
export raw_data_dir="$DATA_DIR/conll-formatted-ontonotes-5.0/data"
($DATA_DIR = $DILATED_CNN_NER_ROOT/data)

And I get the following error:

Processing file: data/conll-formatted-ontonotes-5.0/data/development
python /home/ss06886910/Strubel_IDCNN/src/tsv_to_tfrecords.py --in_file data/conll-formatted-ontonotes-5.0/data/development --out_dir /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/development --window_size 3 --update_maps False --dataset ontonotes --update_vocab /home/ss06886910/Strubel_IDCNN/data/vocabs/ontonotes_cutoff_4.txt --vocab /home/ss06886910/Strubel_IDCNN/data/embeddings/lample-embeddings-pre.txt --labels /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/train/label.txt --shapes /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/train/shape.txt --chars /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/train/char.txt
Embeddings coverage: 98.67%
Processing file: data/conll-formatted-ontonotes-5.0/data/test
python /home/ss06886910/Strubel_IDCNN/src/tsv_to_tfrecords.py --in_file data/conll-formatted-ontonotes-5.0/data/test --out_dir /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/test --window_size 3 --update_maps False --dataset ontonotes --update_vocab /home/ss06886910/Strubel_IDCNN/data/vocabs/ontonotes_cutoff_4.txt --vocab /home/ss06886910/Strubel_IDCNN/data/embeddings/lample-embeddings-pre.txt --labels /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/train/label.txt --shapes /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/train/shape.txt --chars /home/ss06886910/Strubel_IDCNN/data/ontonotes-w3-lample/train/char.txt
Traceback (most recent call last):
  File "/home/ss06886910/Strubel_IDCNN/src/tsv_to_tfrecords.py", line 498, in <module>
    tf.app.run()
  File "/home/ss06886910/IDCNN/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/ss06886910/Strubel_IDCNN/src/tsv_to_tfrecords.py", line 494, in main
    tsv_to_examples()
  File "/home/ss06886910/Strubel_IDCNN/src/tsv_to_tfrecords.py", line 487, in tsv_to_examples
    print("Embeddings coverage: %2.2f%%" % ((1-(num_oov/num_tokens)) * 100))
ZeroDivisionError: division by zero
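
The crash itself is only the messenger: num_tokens stayed at zero, meaning no tokens were parsed from the test path. A guard like the sketch below (counter names taken from the traceback, values hypothetical) fails with a clearer message; the real fix is making sure --in_file points at data the parser can read:

num_oov, num_tokens = 0, 0  # counters accumulated by tsv_to_examples (hypothetical values)

# Zero tokens almost always means --in_file was not parsed as expected, so
# fail with a clear message instead of a ZeroDivisionError.
if num_tokens == 0:
    raise ValueError("no tokens parsed from --in_file; check the input path/format")
print("Embeddings coverage: %2.2f%%" % ((1 - num_oov / float(num_tokens)) * 100))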

Regards

Support for Tensorflow 1.13

How difficult would it be to update this code to work with newer versions of TensorFlow (like 1.13)? Does the entire model have to be rewritten (i.e., modifying code in cnn.py/cnn_char.py), or just the training/preprocessing procedure (i.e., train.py and preprocess.py)?

The reason I'm asking is that I'm trying to speed up the inference times of the models. Currently, the preprocessing takes the bulk of the time, and in particular I'm seeing some really weird behavior when running my preprocessing inside a TensorFlow session versus outside.

For example, I have a method, batch_sentence_preprocess, which takes in a bunch of sentences and splits them into preprocessed batches of a certain batch size. While batching, I use another method, single_sentence_preprocess, which does the preprocessing for a single sentence.

def single_sentence_preprocess(sentence, token_map, shape_map, char_map, token_int_str_map, shape_int_str_map, char_int_str_map, multiprocess=False):

def batch_sentence_preprocess(total_sentences, token_map, shape_map, char_map, token_int_str_map, shape_int_str_map, char_int_str_map, batch_size=128):

Interestingly, when I run batch_sentence_preprocess outside of the TensorFlow session, the average time to preprocess a single sentence hovers around 0.0004 seconds. However, when I run batch_sentence_preprocess inside the TensorFlow session, the average time starts at 0.0014 seconds, stays there for about half of the batches, and then decreases to 0.0004 seconds. I attached a log so you can see this behavior.

Doing batching outside of the TensorFlow session

Average sent preprocess time per batch 0.000493312254548
Average sent preprocess time per batch 0.0004897788167
Average sent preprocess time per batch 0.000492854043841
Average sent preprocess time per batch 0.000502722337842
...
Average sent preprocess time per batch 0.000503158196807
Average sent preprocess time per batch 0.000500967726111
Average sent preprocess time per batch 0.000469474121928
Average sent preprocess time per batch 0.000504978001118
Average sent preprocess time per batch 0.000480454415083

Doing batching inside of the TensorFlow session (inside of with sv.managed_session(FLAGS.master, config=config) as sess:)

Average sent preprocess time per batch 0.0021103117615
Average sent preprocess time per batch 0.00134823098779
Average sent preprocess time per batch 0.00142573565245
Average sent preprocess time per batch 0.00144574232399
Average sent preprocess time per batch 0.00143481045961
Average sent preprocess time per batch 0.00143280252814
...
Average sent preprocess time per batch 0.000499838963151
Average sent preprocess time per batch 0.000491224229336
Average sent preprocess time per batch 0.000479275360703
Average sent preprocess time per batch 0.000500436872244
Average sent preprocess time per batch 0.000483065843582
Average sent preprocess time per batch 0.0004892392592

I suspect it is because of how memory is allocated within a TensorFlow session versus outside. Do you have any idea why this could be happening? I'm confused because the batching code doesn't use any of the TensorFlow libraries; it's mostly just loops, list/dict operations, and numpy arrays.

I think this could also be because of a bug in older TensorFlow session managers. I was hoping I could easily upgrade the code to TensorFlow 1.13 to see if this problem gets resolved, and so that I can use some of the multiprocessing features available in the newer tf.data modules.
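
For reference, moving the per-sentence preprocessing into the TF 1.13 input pipeline could look roughly like this; a sketch only, with a stand-in preprocess_fn (the real single_sentence_preprocess takes the various maps as arguments), not the repo's own code:

import numpy as np
import tensorflow as tf  # TF 1.13-era API

def preprocess_fn(sentence):
    # Stand-in for single_sentence_preprocess: raw sentence bytes -> token ids.
    return np.array([hash(tok) % 50000 for tok in sentence.split()], dtype=np.int64)

sentences = tf.constant([b"EU rejects German call", b"Peter Blackburn"])
dataset = (tf.data.Dataset.from_tensor_slices(sentences)
           # tf.py_func runs the Python preprocessing; num_parallel_calls gives
           # the parallelism mentioned above.
           .map(lambda s: tf.reshape(tf.py_func(preprocess_fn, [s], tf.int64), [-1]),
                num_parallel_calls=4)
           .padded_batch(2, padded_shapes=[None])
           .prefetch(1))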

int() argument must be a string, a bytes-like object or a number, not 'map'

I get the following error right away upon running preprocess.sh as './bin/preprocess.sh conf/conll/dilated-cnn.conf':

Processing file: /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data//conll2003//eng.train
python /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py --in_file /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data//conll2003//eng.train --out_dir /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.train --window_size 3 --update_maps True --dataset conll2003 --update_vocab /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/vocabs/conll2003_cutoff_4.txt --vocab /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/embeddings/lample-embeddings-pre.txt
/opt/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 498, in <module>
    tf.app.run()
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 494, in main
    tsv_to_examples()
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 392, in tsv_to_examples
    toks, oov, sent = make_example(writer, line_buf, label_map, token_map, shape_map, char_map, update_vocab, update_chars)
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 237, in make_example
    intmapped_labels[pad_width:pad_width+len(labels)] = map(lambda s: label_map[s], labels)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'map'
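
This looks like a Python 2 vs. Python 3 issue rather than a data problem: the repo targets Python 2, where map() returns a list, but under Python 3 it returns a lazy iterator that numpy refuses to assign into an integer slice, which is exactly this TypeError. A minimal sketch of the failure and the usual fix for a line like the one at tsv_to_tfrecords.py:237 (the label_map here is hypothetical):

import numpy as np

label_map = {'O': 0, 'B-PER': 1, 'I-PER': 2}  # hypothetical label vocabulary
labels = ['B-PER', 'I-PER', 'O']
pad_width = 1
intmapped_labels = np.zeros(2 * pad_width + len(labels), dtype=np.int64)

# Python 2: map() returns a list and the slice assignment works.
# Python 3: map() returns an iterator and numpy raises this TypeError.
# Materializing the mapping restores the old behavior:
intmapped_labels[pad_width:pad_width + len(labels)] = [label_map[s] for s in labels]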

This is then followed by:

Processing file: /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data//conll2003//eng.testa
python /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py --in_file /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data//conll2003//eng.testa --out_dir /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.testa --window_size 3 --update_maps False --dataset conll2003 --update_vocab /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/vocabs/conll2003_cutoff_4.txt --vocab /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/embeddings/lample-embeddings-pre.txt --labels /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.train/label.txt --shapes /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.train/shape.txt --chars /home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.train/char.txt
/opt/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 498, in <module>
    tf.app.run()
  File "/opt/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 494, in main
    tsv_to_examples()
  File "/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/src/tsv_to_tfrecords.py", line 330, in tsv_to_examples
    with open(FLAGS.labels, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ss06886910/NLP/Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.train/label.txt'

I'm guessing it is unable to create the file
'Bilstm_Dilated_CNN_NER/data/conll2003-w3-lample/eng.train/label.txt'
due to the initial TypeError up top.

Version info:

print(tf.__version__)
!python -V

1.10.1
Python 3.6.5 :: Anaconda, Inc.

Can you help here, @strubell, please?

Regards

Make pip-installable & add API for easy cross-proj use

@eddotman I think there's an email about this but let's move the conversation here -- I think the best way for us to integrate this tagger with the main predsynth code is for me to (1) add a nice API which takes tokens and labels them (we should sync on the best way to do this, but I know @MSheshera has already coded something up that does this) and (2) make this project pip-installable so that you can just install it and then use it as a library. How does this solution sound?

Issue in preprocessing ontonotes

Hi,
I am facing an issue with preprocessing OntoNotes.

First I used the CoNLL-2012 shared task script to generate the *_gold_conll files, which contain the annotations. For example:
conll-2012/v4/data/train/data/english/annotations/bn/cnn/03/cnn_0301.v4_gold_conll
In the conll-2012/v4/data directory I have 3 subdirectories: train/dev/test.

My question is: how should I group the *_gold_conll files in order to fit the format required by preprocess.py? Should I concatenate all the *_gold_conll files into one file each for train/dev/test?
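
If it helps, concatenating each split's *_gold_conll files into one file per split is one plausible way to match the one-file-per-split layout of the CoNLL-2003 pipeline; a sketch (the paths follow the layout above, and whether preprocess.py expects exactly this is an assumption worth confirming):

import glob

def concat_split(split_dir, out_path):
    # Concatenate every *_gold_conll under one split directory into a single file.
    paths = sorted(glob.glob(split_dir + "/**/*_gold_conll", recursive=True))
    with open(out_path, "w") as out:
        for p in paths:
            with open(p) as f:
                out.write(f.read())

concat_split("conll-2012/v4/data/train", "ontonotes.train.conll")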

Regards

How to run model with character embeddings?

Hi,
according to Table 1 of your paper, "No models use character embeddings or lexicons." Character embeddings are very useful for sequence labeling, as shown by Lample et al. (2016) and Ma and Hovy (2016). I also found that your code already implements them (bilstm_char.py and cnn_char.py), so I modified lines L13-L15 of global.conf to run the model with character embeddings:

export char_dim=30
export char_tok_dim=30
export char_model="cnn"

Then, I ran the following steps (for the ID-CNN-CRF model):

./bin/preprocess.sh conf/conll/dilated-cnn-viterbi.conf
./bin/train-cnn.sh conf/conll/dilated-cnn-viterbi.conf
./bin/eval-cnn.sh conf/conll/dilated-cnn-viterbi.conf --load_model /home/ljzhao/dilated-cnn-ner-master/models/dilated-cnn-viterbi.tf test

Is this right?

I also found that L12 of eval-cnn.sh needs to be changed to the following line to correctly evaluate on the test set:

if [[ $4 == "test" ]]; then

Lample et al., 2016: Neural Architectures for Named Entity Recognition
Ma and Hovy, 2016: End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

Questions regarding differences between the implementation and the experiment details in the research paper

Are the configuration files the same as in the paper?
https://github.com/iesl/dilated-cnn-ner/blob/master/conf/ontonotes/dilated-cnn.conf
I find that the number of convolutional layers in the configuration files is 3, but the paper mentions that each block contains 4 layers. So are these configuration files different from the ones used in the paper?

Secondly, I find an extra convolutional layer used for the initial projection in this implementation. What is the significance of this extra layer? Nothing of the sort is mentioned in the research paper.

InvalidArgumentError: indices[11,21] = 243838 is not in [0, 243245)

Hello,

I could run the training and evaluation portions of the script.
Next, I converted the OntoNotes 5.0 files into CoNLL format as below (manually):

SOCCER NN I-NP O
- : O O
JAPAN NNP I-NP I-LOC
GET VB I-VP O
LUCKY NNP I-NP O
WIN NNP I-NP O
, , O O
CHINA NNP I-NP I-PER
IN IN I-PP O
SURPRISE DT I-NP O
DEFEAT NN I-NP O

I then tried to run it as I would a CoNLL file (since I couldn't successfully extract the OntoNotes files into CoNLL format using skel2conll.sh).
The file was divided into test and train files and then preprocessed, but I still get an issue here:
InvalidArgumentError (see above for traceback): indices[2,34] = 243461 is not in [0, 243245)

The preprocessing should build the embeddings over the entire corpus, so why does it still complain about word indices beyond the embedding range?
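
An error of the form "indices[2,34] = 243461 is not in [0, 243245)" means the TFRecords contain a token id at least as large as the embedding matrix built at preprocessing time, i.e. the vocab and the records came from different preprocessing runs or different data. A quick consistency check you could run (sketch; the id values are hypothetical):

import numpy as np

vocab_size = 243245                        # rows in the embedding matrix
token_ids = np.array([12, 50959, 243461])  # ids found in the TFRecords
bad = token_ids[token_ids >= vocab_size]
# Any hit here means the vocab and the TFRecords are out of sync: re-run
# preprocessing so both are built from the same data in a single pass.
print(bad)  # -> [243461]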

Regards

OntoNotes train/test/dev split

I am unable to get the results specified in your paper using the CoNLL-2012 v4 train/test/dev split.

Which version did you use?

details on the accuracy

Hello,

Is it feasible to get a per-class classification report from your reported findings, something like the one below? Moreover, the 'O' class is the dominant class: there is a class imbalance in the CoNLL-2003 dataset, due to which the F1 score is skewed towards the dominant class. How was the 'O' class dealt with, given that the paper reports only the 4 classes PER, ORG, LOC, and MISC?

             precision    recall  f1-score   support

      B-LOC      0.802     0.775     0.788      4196
     B-MISC      0.673     0.722     0.697      2026
      B-ORG      0.557     0.442     0.493      2929
      B-PER      0.685     0.674     0.680      2472
      I-LOC      0.779     0.624     0.693      1023
     I-MISC      0.451     0.346     0.392       468
      I-ORG      0.803     0.522     0.633      3271
      I-PER      0.770     0.651     0.706      1543
          O      0.843     0.986     0.909    143315

avg / total      0.829     0.946     0.881    161243
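
For token-level numbers like the table above, one common way to keep the dominant 'O' class from inflating the averages is to exclude it from the label set passed to the report; a sketch with toy inputs (the paper itself reports segment-level F1 via conlleval, where 'O' is not an entity class):

from sklearn.metrics import classification_report

# Toy example: per-class precision/recall/F1 with 'O' excluded from the averages.
y_true = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
y_pred = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
entity_labels = sorted({t for t in y_true + y_pred if t != "O"})
print(classification_report(y_true, y_pred, labels=entity_labels))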

Regards

Training File

The training file it requires should be in this format:
China NNP I-NP I-LOC
but I want to train it on this format:
China NNP I-LOC
Is there any way to do so?
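
One workaround, if the chunk column is only carried along rather than consumed by the model, is to pad your 3-column data to the expected 4-column layout with a dummy chunk tag before preprocessing. A minimal sketch (file names and the dummy tag are hypothetical; confirm that the repo's reader really ignores the chunk column):

def add_dummy_chunk(in_path, out_path, dummy="O"):
    # Pad 3-column lines (token POS label) to 4 columns (token POS chunk label).
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            parts = line.split()
            if len(parts) == 3:
                token, pos, label = parts
                fout.write("%s %s %s %s\n" % (token, pos, dummy, label))
            else:  # blank lines and -DOCSTART- lines pass through unchanged
                fout.write(line)

add_dummy_chunk("train.3col.txt", "train.4col.txt")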

Inconsistent results when predicting a single sentence versus predicting labels for dev set

I'm currently trying to modify the code to predict the labels for an arbitrary sentence (not part of the train, dev, or test set). I was getting weird behavior when looking at the labels; in particular, the padding tokens had their own label and some of the tokens were not being labelled correctly.

This discrepancy even existed when I tried to predict the labels for one of the sentences in my dev set. I was able to get the predicted labels from the dev set using the code from #16. When I compared the predicted labels from the dev set to the same single-sentence prediction, some of the labels were off.

I decided to look at the feature vectors that were generated in the preprocessing stage for both sentences, and surprisingly the vectors were almost exactly the same (except for some padding).

token vector:

sentence from dev set
[     0  18152  18152  18152  18152   1279  12659   3128      4      3
    338      7  18152  18152  18152  18152 243801      6   4809   7282
  50959     11  31481   1020  89238      6  33746      7  10587   2578
    813 243021     40   1020     11     60      7     92      4     40
     10   4704   9889      4    106     21  12404    334      7   7746
      7   1296   3995      4   2486   4816      7  10587      4     11
      3    220      7     10  59651      5      0      0      0      0
      0]

single sentence:
[     0  18152  18152  18152  18152   1279  12659   3128      4      3
    338      7  18152  18152  18152  18152 243801      6   4809   7282
  50959     11  31481   1020  89238      6  33746      7  10587   2578
    813 243021     40   1020     11     60      7     92      4     40
     10   4704   9889      4    106     21  12404    334      7   7746
      7   1296   3995      4   2486   4816      7  10587      4     11
      3    220      7     10  59651      5      0]

char vector:

sentence from dev set
[ 0 29 29 29 31 29 27 28 30 24 24 26 29 27 31 31 28 45  6  7 15  5  7 15
  5  3  4 12 18 15 20  7  4 16 20 15  5 36  4  6  7  3 20  7  3 18 34 29
 31 30 31 23 23 23 24 30 33 26 25 23 25 25 26 12 15  5  7 15  5 12 35 12
 17 12  4 38  4 18  8  3 12 15  5  4  7  3  9 12 17 38  9 12 14 12 15 12
  5  6  7  5  3 15  9  9 12  5  3  8  8  7  3 20  5 40  5  7 15  5 12 35
 12 17 12  4 38  4 18  7 37  4 20  7 14  7  5 18 34  4  7 14  8  7 20  3
  4 16 20  7  3  8  8  7  3 20  5  5 18 18 15 31 24 27 21 27 31 23 21 23
 30 21 31 33 33  3 34  4  7 20 40  3 15  9 17  3  5  4 18 34  3 17 17 36
  3 34  4  7 20  3 32 18 15  5 12  9  7 20  3 35 17  7 12 15  4  7 20 44
  3 17 36  4  6  7 20  7 12  5  5 12 14 16 17  4  3 15  7 18 16  5 20  7
  4 16 20 15 18 34  3  8  8 20  7 32 12  3  4 12 18 15 18 34 17 12 39  6
  4  4 18 16 32  6 36 14 18  9  7 20  3  4  7  9  7 39 20  7  7  5 18 34
  4  7 14  8  7 20  3  4 16 20  7 36  3 15  9  4  6  7  8 18 12 15  4  5
 18 34  3 32 18 14  8  3  5  5 21  0]

single sentence:
[ 0 29 29 29 31 29 27 28 30 24 24 26 29 27 31 31 28 45  6  7 15  5  7 15
  5  3  4 12 18 15 20  7  4 16 20 15  5 36  4  6  7  3 20  7  3 18 34 29
 31 30 31 23 23 23 24 30 33 26 25 23 25 25 26 12 15  5  7 15  5 12 35 12
 17 12  4 38  4 18  8  3 12 15  5  4  7  3  9 12 17 38  9 12 14 12 15 12
  5  6  7  5  3 15  9  9 12  5  3  8  8  7  3 20  5 40  5  7 15  5 12 35
 12 17 12  4 38  4 18  7 37  4 20  7 14  7  5 18 34  4  7 14  8  7 20  3
  4 16 20  7  3  8  8  7  3 20  5  5 18 18 15 31 24 27 21 27 31 23 21 23
 30 21 31 33 33  3 34  4  7 20 40  3 15  9 17  3  5  4 18 34  3 17 17 36
  3 34  4  7 20  3 32 18 15  5 12  9  7 20  3 35 17  7 12 15  4  7 20 44
  3 17 36  4  6  7 20  7 12  5  5 12 14 16 17  4  3 15  7 18 16  5 20  7
  4 16 20 15 18 34  3  8  8 20  7 32 12  3  4 12 18 15 18 34 17 12 39  6
  4  4 18 16 32  6 36 14 18  9  7 20  3  4  7  9  7 39 20  7  7  5 18 34
  4  7 14  8  7 20  3  4 16 20  7 36  3 15  9  4  6  7  8 18 12 15  4  5
 18 34  3 32 18 14  8  3  5  5 21  0]

seq len vector:

sentence from dev set
[65]
single sentence:
[65]

tok len vector:

sentence from dev set
[ 1  4  4  4  4  4  9  7  1  3  4  2  4  4  4  4 13  2  4  8 10  3 10  1
 11  2  8  2 11  7  4 14  5  1  3  4  2  3  1  5  1 12  8  1  5  2 12  6
  2 12  2  5  5  1  8  7  2 11  1  3  3  6  2  1  7  1  1  0  0  0  0]

single sentence:
[ 1  4  4  4  4  4  9  7  1  3  4  2  4  4  4  4 13  2  4  8 10  3 10  1
 11  2  8  2 11  7  4 14  5  1  3  4  2  3  1  5  1 12  8  1  5  2 12  6
  2 12  2  5  5  1  8  7  2 11  1  3  3  6  2  1  7  1  1]

Predictions:

sentence from dev set
[3, 4, 4, 5, 0, 0, 0, 0, 0, 0, 0, 3, 4, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

single sentence (without the padding tokens):
[3, 4, 4, 5, 8, 16, 6, 8, 6, 0, 0, 3, 4, 4, 4, 5, 8, 14, 8, 0, 6, 0, 8, 0, 8, 0, 6, 0, 6, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 8, 8, 8, 6, 6, 6, 6, 8, 0, 8, 0, 6, 8, 6, 6, 6, 6, 0, 0]

You can see that some of the labels are similar (i.e., 3, 4, 5 are a B, I, L group for a certain class and 10 is a U tag for a certain class). However, the single-sentence prediction has a bunch of additional predicted classes for the exact same sentence with virtually the same feature vectors.

I'm wondering if there is some internal state maintained in the model with context, which would explain why the results when running on the dev set are different from a single-sentence prediction. I didn't enable the documents flag, so I thought each sentence would be treated separately. If this is not the case, how can I train the model to treat each sentence separately?

If there is no internal state maintained in the model, could padding (which is the only difference between the feature vectors) play a large role in the prediction of classes? In that case, how should I be padding the feature vectors for single-sentence prediction? What I'm doing right now is just padding to the maximum feature-vector length in the batch, except that for a single sentence the batch size is 1.
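
Regarding the padding question: one way to keep single-sentence inference consistent with batched evaluation is to pad exactly the way the batcher does at evaluation time. A sketch of such a helper (hypothetical, not the repo's own batcher; pad id 0 is assumed to match the pad token visible in the vectors above):

import numpy as np

def pad_batch(seqs, pad_id=0):
    # Pad every token-id sequence to the longest one in the batch, so a
    # batch of size 1 sees the same layout as a full dev-set batch.
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_id, dtype=np.int64)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out

batch = pad_batch([[3, 4, 5], [3, 4]])  # -> shape (2, 3)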

Thanks in advance.

can't replicate the results in this paper

Hi,

I ran this code according to the instructions, but I can't replicate the results in the paper: the F1 score I got is 89.23. Could you tell me how I can replicate the results? The following are the steps I took.

  1. I copied the CoNLL-2003 dataset from https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003 and placed the files in data/conll2003. (screenshot)

  2. Set up environment variables:

export DILATED_CNN_NER_ROOT=`pwd`
export DATA_DIR=data

  3. Got the GloVe word embeddings from http://nlp.stanford.edu/data/glove.6B.zip and used the 100d vectors "glove.6B.100d.txt". The file was renamed "lample-embeddings-pre.txt" and placed in data/embeddings. (screenshot)

  4. Performed all data preprocessing:

./bin/preprocess.sh conf/conll/dilated-cnn.conf

  5. Trained a tagger:

./bin/train-cnn.sh conf/conll/dilated-cnn.conf

  6. The output of the final iteration is: (screenshot)

  7. Evaluated the saved model on the test set:

./bin/eval-cnn.sh conf/conll/dilated-cnn.conf --load_model /home/ljzhao/dilated-cnn-ner-master/models/dilated-cnn.tf test

  8. The output is: (screenshot)

Need some clarification on the settings

In the global.conf file you used:

export char_dim=0

So there is no character-level LSTM or CNN. When you published the results in the paper, did those models contain no character-level model (like an RNN or CNN)?

If you do use a character-level CNN, why did you not include a bias here when doing the convolution?
https://github.com/iesl/dilated-cnn-ner/blob/master/src/cnn_char.py#L63

For the ID-CNN dilation settings in the conf file, if you set 'block_repeat=1', doesn't that mean you only use a single block? Then what's the point of the multi-block training?

Where can I get the config files to reproduce Table 1 in the paper?

shape embedding initialization

            if self.use_shape:
                shape_embeddings_shape = (self.shape_domain_size - 1, self.shape_size)
                w_s = tf_utils.initialize_embeddings(shape_embeddings_shape, name="w_s")
                shape_embeddings = tf.nn.embedding_lookup(w_s, input_x2)
                input_list.append(shape_embeddings)
                input_size += self.shape_size

At line 123 of bilstm.py, w_s will be re-initialized on every forward pass; isn't there something wrong here?

I think w_s should be initialized in the __init__ function:

    if self.use_shape:
        shape_embeddings_shape = (self.shape_domain_size - 1, self.shape_size)
        self.w_s = tf_utils.initialize_embeddings(shape_embeddings_shape, name="w_s")

and the forward function should then be written like this:

    if self.use_shape:
        # shape_embeddings_shape = (self.shape_domain_size - 1, self.shape_size)
        # w_s = tf_utils.initialize_embeddings(shape_embeddings_shape, name="w_s")
        shape_embeddings = tf.nn.embedding_lookup(self.w_s, input_x2)
        input_list.append(shape_embeddings)
        input_size += self.shape_size

Question for the paper (only)

(screenshot of the relevant passage from the paper)
According to the text in this paper:

Can I say that the total number of dilated convolutions is:

total = ( 1 (the initial layer, equation 5) + L_c (the L_c dilated layers) + 1 (the final dilation-1 layer) ) * L_b = L_b (L_c + 2)

One question about "In the L_c layers of dilated convolutions": if L_c = 1, that means in equation 6 we are also applying a dilation-1 convolution, right?

But what confuses me in your config file is that you have a dilation-2 layer, which means L_c = 2. Then there should be 4 convolutions here, but I can see only three:

layers="{'conv1': {'dilation': 1, 'width': $filter_width, 'filters': $num_filters, 'initialization': '$initialization', 'take': false}, \
         'conv2': {'dilation': 2, 'width': $filter_width, 'filters': $num_filters, 'initialization': '$initialization', 'take': false}, \
         'conv3': {'dilation': 1, 'width': $filter_width, 'filters': $num_filters, 'initialization': '$initialization', 'take': true}}"
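
For reference, the span each output token can see grows as the dilated layers stack: each width-w, dilation-d convolution adds (w - 1) * d tokens to the receptive field. A quick sketch (plain Python, not the repo's code) for the three layers in the config above:

def receptive_field(dilations, width=3):
    # Each width-3 convolution with dilation d widens the field by (width - 1) * d.
    r = 1
    for d in dilations:
        r += (width - 1) * d
    return r

print(receptive_field([1, 2, 1]))  # -> 9 tokens per block application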

Validate model on real text data

Hello,

Just curious: did anyone try entering custom sentences and checking whether the model outputs a list of tags? Can this code be used to do that after training the model?

Regards

NaN problem during training on the OntoNotes data set

Hi Strubell, I ran "./bin/train_cnn.sh ./conf/ontonotes/bilstm-viterbi-doc.conf" on the OntoNotes data set. In the tenth iteration there was a NaN problem, along with a runtime warning: "RuntimeWarning: invalid value encountered in reduce; return ufunc.reduce(obj, axis, dtype, out, **passkwargs)". After that, all printed values are "nan". Can you give me some suggestions?
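
Not a fix, but one way to catch the divergence at the first bad batch instead of ten iterations in is to test each training loss for finiteness; a minimal sketch, where loss stands for whatever sess.run returns for the training loss:

import numpy as np

loss = 0.42  # hypothetical value returned by sess.run for the training loss
# Stop at the first non-finite loss; common remedies are gradient clipping,
# a lower learning rate, or restarting from the last good checkpoint.
if not np.isfinite(loss):
    raise RuntimeError("loss went NaN/inf -- clip gradients or lower the learning rate")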
