Creating a Taxonomic eScience (CATE)
CATE is a Java web application that taxonomists and biodiversity scientists can use to manage and publish their data.
Installation
- Embedded mode (for testing): from ${build.root}, execute 'mvn tomcat7:run'
- Webapp mode: deploy the web application using cargo:deploy or cargo:redeploy
- Standard mode: deploy onto a Linux box using yum and Chef or Puppet
- Cloud mode: deploy onto EC2 autoscaling instance(s) and RDS
- HA mode: deploy onto EC2 instances using Kundera, Cassandra, ZooKeeper and SolrCloud
Create an S3 bucket to act as the yum repository, exposed at yum.cate-project.net/repo. Use a Fedora 19-based AMI and Puppet for configuration.
- Search: Amazon CloudSearch
- GeoServer: multiple GeoServer instances behind an ELB (http://geo.cate-project.net)
- Message broker: SNS and SQS using Spring Integration
- Logging: SLF4J, Logstash and SimpleDB
- Alerting: CloudWatch and SNS
History
CATE has its origins in the three-year NERC-funded project of the same name, although very little of the code written during that project survives in this application. It more closely resembles eMonocot, the project that followed on from CATE.
The original CATE application suffered from a number of shortcomings, including (but not limited to) an overly complex data model, difficulty navigating and editing the data, a lack of integration with other applications used by biodiversity scientists, and
TODO
Solr
- add cate schema and config to rpm
- restrict access to localhost or ip range
- upgrade to 4.7.0
- create a custom CoreAdminHandler that, on create, makes the core and conf directories if they do not exist and copies the config into them
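The directory-preparation part of the CoreAdminHandler item could look roughly like the sketch below. It is an illustration under assumptions, not Solr API code. It uses only java.nio. The class CoreDirectoryPreparer, the method prepareCore, the template config directory parameter, and the choice never to overwrite files that already exist are all hypothetical. A real handler would extend Solr's CoreAdminHandler and run a step like this before handing the create action on to Solr.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/**
 * Hypothetical helper: prepares <solrHome>/<coreName>/conf for a new core by
 * copying a template config tree into it. Uses java.nio only, no Solr classes.
 */
final class CoreDirectoryPreparer {

    private CoreDirectoryPreparer() {
    }

    /**
     * Creates the core and conf directories if they are missing, then copies every
     * file under templateConf into conf, keeping relative paths (e.g. lang/stopwords.txt).
     * Files that already exist in conf are left untouched. Returns the conf directory.
     */
    static Path prepareCore(Path solrHome, String coreName, Path templateConf) throws IOException {
        Path confDir = solrHome.resolve(coreName).resolve("conf");
        Files.createDirectories(confDir); // also creates the core directory; no-op if present

        // Files.walk visits a directory before its contents, so parents are created first.
        try (Stream<Path> paths = Files.walk(templateConf)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = confDir.resolve(templateConf.relativize(source).toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else if (Files.notExists(target)) {
                    Files.copy(source, target); // never overwrite an existing config file
                }
            }
        }
        return confDir;
    }
}
```

Skipping existing files makes a repeated create safe. Any hand-edited schema.xml in an existing core is kept rather than reset to the template.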
General
- ensure feature parity with eMonocot
- search
- taxon pages
- key player
- provenance
- refactor job launch / instance / execution controllers
- refactor data layer to use Spring Data JPA and Spring Data Solr
- ensure admin API works
- status endpoint
- export SDD and DELTA
- import and export NEXUS (dataset)
- import phyloXML, Newick, NHX, video, audio (media)
Batch
- Add annotations
Unit testing
- improve!
Replace auto-increment with sequences
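One common reason to prefer sequences over auto-increment is that the application can reserve ids in blocks before inserting rows. Below is a minimal sketch of such a pooled allocator. Several things are assumed. The LongSupplier stands in for a single database round trip returning the next sequence value. The sequence is assumed to increment by the block size. PooledIdAllocator is a hypothetical name. Note that MySQL, which the RDS setup below uses, has no native sequences, so on MySQL the supplier would need to be backed by something else, such as a table.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical pooled id allocator. `sequence` stands in for one database round trip
 * that returns the next sequence value; the sequence is assumed to be created with an
 * increment equal to blockSize, so each value reserves blockSize ids.
 */
final class PooledIdAllocator {
    private final LongSupplier sequence;
    private final int blockSize;
    private long next;  // next id to hand out
    private long limit; // exclusive end of the current block; next == limit means fetch a new block

    PooledIdAllocator(LongSupplier sequence, int blockSize) {
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be at least 1");
        }
        this.sequence = sequence;
        this.blockSize = blockSize;
    }

    /** Returns the next id, calling the sequence only when the current block is used up. */
    synchronized long nextId() {
        if (next == limit) {
            long start = sequence.getAsLong();
            next = start;
            limit = start + blockSize;
        }
        return next++;
    }
}
```

With a block size of 50, allocating 120 ids costs three sequence calls instead of 120 auto-increment inserts.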
Define
-
Services
-
Features
BUILD AMI
Packer
{
  "variables": {
    "aws_access_key": "",
    "aws_secret_key": ""
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "eu-west-1",
    "source_ami": "blah",
    "instance_type": "m3.medium",
    "ssh_username": "fedora",
    "ami_name": "cate {{timestamp}}"
  }],
  "provisioners": [
    {
      "type": "shell",
      "inline": [ "sudo yum -y install puppet" ]
    },
    {
      "type": "file",
      "source": "puppet/",
      "destination": "/puppet"
    },
    {
      "type": "puppet-masterless",
      "manifest_file": "puppet/manifests/site.pp",
      "manifest_dir": "puppet/manifests",
      "module_paths": [ "puppet/modules" ]
    }
  ]
}
Puppet:
- set up firewall
- set up security
- install OpenJDK, MySQL
- install tomcat7-slf4j-logback, geoserver, solr, apache-activemq and cate from local rpms
- configure cate.conf
aws ec2 import-key-pair --key-name ben --public-key-material fileb://.ssh/pk-ec2.pub
aws ec2 create-security-group --group-name web --description 'All public facing web instances'
aws ec2 authorize-security-group-ingress --group-name web --cidr 0.0.0.0/0 --port 22 --protocol tcp
aws ec2 authorize-security-group-ingress --group-name web --cidr 0.0.0.0/0 --port 8080 --protocol tcp
aws ec2 run-instances --image-id ami-29a2595e --count 1 --instance-type m3.medium --key-name ben --security-groups web
ssh -l fedora -i ~/.ec2/pk-ec2.pem ec2-54-72-202-116.eu-west-1.compute.amazonaws.com
aws ec2 create-security-group --group-name database --description 'this RDS is only available on the necessary ports'
aws ec2 authorize-security-group-ingress --group-name database --cidr 0.0.0.0/0 --port 3306 --protocol tcp
aws rds create-db-parameter-group --db-parameter-group-name default --db-parameter-group-family mysql5.1 --description "Default database parameters"
aws rds modify-db-parameter-group --db-parameter-group-name default --parameters "ParameterName=lower_case_table_names,ParameterValue=1,ApplyMethod=pending-reboot"
aws rds modify-db-parameter-group --db-parameter-group-name default --parameters "ParameterName=character_set_server,ParameterValue=utf8,ApplyMethod=immediate"
aws rds modify-db-parameter-group --db-parameter-group-name default --parameters "ParameterName=collation_server,ParameterValue=utf8_general_ci,ApplyMethod=immediate"
aws rds modify-db-parameter-group --db-parameter-group-name default --parameters "ParameterName=max_allowed_packet,ParameterValue=1073741824,ApplyMethod=immediate"
aws rds create-db-instance --db-instance-identifier cate-database --db-parameter-group-name default --engine mysql --engine-version 5.1 --db-instance-class db.m1.small --allocated-storage 5 --master-username <> --master-user-password <> --vpc-security-group-ids sg-9c7a8af9 --backup-retention-period 3
Map *.cate-project.net to 10.0.2.3 inside the VirtualBox VM used for Packer testing (e1000 and PCnet network adapters):
VBoxManage setextradata "packer-test_default_1399911957168_10717" "VBoxInternal/Devices/e1000/0/LUN#0/Config/HostResolverMappings/cateproject/HostIP" 10.0.2.3
VBoxManage setextradata "packer-test_default_1399911957168_10717" "VBoxInternal/Devices/e1000/0/LUN#0/Config/HostResolverMappings/cateproject/HostNamePattern" "*.cate-project.net"
VBoxManage setextradata "packer-test_default_1399911957168_10717" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/HostResolverMappings/cateproject/HostIP" 10.0.2.3
VBoxManage setextradata "packer-test_default_1399911957168_10717" "VBoxInternal/Devices/pcnet/0/LUN#0/Config/HostResolverMappings/cateproject/HostNamePattern" "*.cate-project.net"
SETUP Application Server
sudo yum -y install java
sudo yum -y install ecj apache-commons-collections apache-commons-dbcp apache-commons-pool apache-commons-daemon apache-commons-logging tomcat-servlet-3.0-api tomcat-el-2.2-api tomcat-jsp-2.2-api tomcat-native ImageMagick
sudo rpm -i tomcat-lib-7.0.47-1.fc19.noarch.rpm tomcat-7.0.47-1.fc19.noarch.rpm
SETUP route 53
aws route53 create-hosted-zone --name cate-project.net --caller-reference CATE
Monthly cost: Elastic IP at $0.005/hour × 744 hours (31 days) = $3.72; Route 53 hosted zone = $0.50
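The Elastic IP figure is an hourly rate multiplied by the hours in a 31-day month (31 × 24 = 744). Below is a small Java sketch of that arithmetic. The class and method names are purely illustrative. It uses BigDecimal so that the dollar amounts come out exact.

```java
import java.math.BigDecimal;

/** Illustrative helper reproducing the monthly cost arithmetic for the notes above. */
final class MonthlyCost {
    /** Hours in a 31-day month: 31 * 24 = 744, the figure that turns $0.005/hour into $3.72. */
    static final int HOURS_PER_31_DAY_MONTH = 31 * 24;

    private MonthlyCost() {
    }

    /** Hourly rate multiplied by hours; BigDecimal avoids binary floating-point rounding. */
    static BigDecimal forHours(BigDecimal hourlyRate, int hours) {
        return hourlyRate.multiply(BigDecimal.valueOf(hours));
    }
}
```

Adding the $0.50 Route 53 hosted-zone charge gives $4.22 per month. For a 30-day month (720 hours), the same Elastic IP rate gives $3.60.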
Maven
- make the war file and the rpm
- package the puppet config into an rpm that depends on puppet, iptables-services, ruby-devel, gcc, libxml2, libxml2-devel, libxslt, libxslt-devel
- hiera.yaml: default facts in /etc/hiera
  - yaml backend
  - cloudformation backend
  - facter (? cloud-init facts ?)
- %postinstall:
  gem install hiera-cloudformation
  puppet resource cron puppet-apply ensure=present user=root minute=30 command='/usr/bin/puppet apply
Packer / Vagrant: use cloud-init userdata to:
- add the cate S3 yum repo
- install cate-puppet
Puppet:
- install the yum repo and configure the firewall
- install java, tomcat, tomcat-native, ImageMagick and cate
- manage / start tomcat
Image Handling:
Image upload (front end):
- if the MultipartFile is populated, upload it
- else, if the identifier ends with a suffix and is a local file, then