artefactual-labs / ansible-archivematica-src
An Ansible role for deploying Archivematica from its source code repositories
License: GNU Affero General Public License v3.0
TASK [archivematica-src : Template gunicorn configuration file] ****************
fatal: [am-local]: FAILED! => {"changed": true, "failed": true, "msg": "Destination directory /etc/archivematica does not exist"}
This was introduced in #73: /etc/archivematica needs to be created when deploying the Storage Service (previously it was created only when deploying the pipeline).
Note that gunicorn v19.7.0 has not been released yet, but it will be soon.
gunicorn has added support to master for code reloading using inotify instead of filesystem polling.
This is going to help us save a lot of energy on our laptops! Currently, we're deploying a couple of gunicorn apps with four workers each (probably more workers than needed in development). Each worker polls the filesystem every second, generating a stat() syscall for each file in the source code. After a quick look with strace I see about 600 system calls for each poll.
More details here: https://github.com/benoitc/gunicorn/blob/master/docs/source/settings.rst#L255-L277. We'll only need to add the inotify package to our pip dependencies when we're in dev mode. That's a change we can probably make in the sources of the AM and SS repos, but I thought this was the best place to keep track of it in case it involves configuration changes down the road.
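If the change ended up in this role rather than in the AM and SS repos, it might look roughly like the task below. This is only a sketch: the archivematica_src_dev_mode variable is hypothetical, and the virtualenv path is the Storage Service one used elsewhere in this role.

```yaml
# Sketch only: archivematica_src_dev_mode is a hypothetical variable,
# not something the role is known to define.
- name: "Install inotify so gunicorn can reload code without polling (dev only)"
  pip:
    name: "inotify"
    virtualenv: "/usr/share/python/archivematica-storage-service"
    state: "present"
  when: "archivematica_src_dev_mode | default(false) | bool"
```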
Currently this role adds ppa:archivematica/externals-dev as a source; it should also add packages.archivematica.org. There should probably be an option to use ppa:archivematica/externals instead of externals-dev when doing a production install.
The externals-dev PPA shows a warning on Launchpad, and it appears that it is not working properly. In one test I got siegfried 1.1.0 installed when I should have been able to get 1.4.5 from externals-dev. packages.archivematica.org/1.5.x has siegfried 1.5.0, which is the version that should be installed with Archivematica 1.5.0 and later.
Sometimes /var/archivematica/sharedDirectory is a symlink, in order to use a separate mount for increased processing space. Currently the playbook stops with an error in that case.
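One possible way to tolerate a pre-existing symlink is to check the path first and only create the directory when nothing is there. The following is a sketch under that assumption; the task names and the shared_dir_check register name are illustrative and not taken from the role.

```yaml
# Sketch only: stat does not follow symlinks by default, so an existing
# symlink at this path is reported as existing and left untouched.
- name: "Check whether sharedDirectory already exists (possibly as a symlink)"
  stat:
    path: "/var/archivematica/sharedDirectory"
  register: "shared_dir_check"

- name: "Create sharedDirectory only when it is missing"
  file:
    path: "/var/archivematica/sharedDirectory"
    state: "directory"
    owner: "archivematica"
    group: "archivematica"
  when: "not shared_dir_check.stat.exists"
```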
Two pip tasks use --find-links lib, which breaks if lib/ is missing. This is now occurring in stable/0.9.x and qa/0.x.
- name: "Create virtualenv for archivematica-storage-service, pip install requirements"
  pip:
    chdir: "{{ archivematica_src_dir }}/archivematica-storage-service"
    requirements: "requirements.txt"
    virtualenv: "/usr/share/python/archivematica-storage-service"
    state: "latest"
  tags: "amsrc-ss-pydep"
- name: "Work around to install pip deps commented out in old SS branches"
  pip:
    chdir: "{{ archivematica_src_dir }}/archivematica-storage-service"
    virtualenv: "/usr/share/python/archivematica-storage-service"
    extra_args: "--find-links lib"
    state: "latest"
    name: "{{ item }}"
  with_items:
    - "python-swiftclient"
    - "python-keystoneclient"
    - "sword2"
    - "pyopenssl"
    - "ndg-httpsclient"
    - "pyasn1"
  when: "archivematica_src_ss_pip_missing_deps"
  tags: "amsrc-ss-pydep"
This is the solution I have tested:
- stat:
    path: "{{ archivematica_src_dir }}/archivematica-storage-service/lib"
  register: "ss_lib_dir_check"
- set_fact:
    ss_pip_install_extra_args: ""
- set_fact:
    ss_pip_install_extra_args: "--find-links lib"
  when: "ss_lib_dir_check.stat.isdir is defined and ss_lib_dir_check.stat.isdir"
- name: "Create virtualenv for archivematica-storage-service, pip install requirements"
  pip:
    chdir: "{{ archivematica_src_dir }}/archivematica-storage-service"
    requirements: "requirements.txt"
    virtualenv: "/usr/share/python/archivematica-storage-service"
    extra_args: "{{ ss_pip_install_extra_args }}"
    state: "latest"
  tags: "amsrc-ss-pydep"
- name: "Work around to install pip deps commented out in old SS branches"
  pip:
    chdir: "{{ archivematica_src_dir }}/archivematica-storage-service"
    virtualenv: "/usr/share/python/archivematica-storage-service"
    extra_args: "{{ ss_pip_install_extra_args }}"
    state: "latest"
    name: "{{ item }}"
  with_items:
    - "python-swiftclient"
    - "python-keystoneclient"
    - "sword2"
    - "pyopenssl"
    - "ndg-httpsclient"
    - "pyasn1"
  when: "archivematica_src_ss_pip_missing_deps"
  tags: "amsrc-ss-pydep"
storage.db file ownership is sometimes not correctly set (root:root vs archivematica:archivematica), producing nginx/uwsgi errors.
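A task along these lines could enforce the ownership explicitly after the database is created. It is a sketch, not the role's actual fix: archivematica_src_ss_db_path is a hypothetical variable standing in for wherever storage.db lives on the target host.

```yaml
# Sketch only: archivematica_src_ss_db_path is a hypothetical variable.
# state "file" makes the task fail if the database has not been created yet.
- name: "Ensure storage.db is owned by archivematica"
  file:
    path: "{{ archivematica_src_ss_db_path }}"
    state: "file"
    owner: "archivematica"
    group: "archivematica"
```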
I get the following error when executing a task in common.yml (I used the stable/1.6.x branch of the role, but it should also occur in qa/1.7.x):
TASK [external-roles/artefactual.archivematica-src : Expand archivematica_src_dir] **********************************
task path: /media/sf_vbox-ubuntu14/repos/gitlab.artefactual/ops-deployment/envs/ccarchitecture/external-roles/artefactual.archivematica-src/tasks/common.yml:65
fatal: [VSP-AMSS-01]: FAILED! => {
"failed": true,
"msg": "template error while templating string: no filter named 'expanduser'. String: {{ archivematica_src_dir|expanduser }}"
}
Running ansible from a virtualenv in ubuntu 16, installed using pip. Versions:
$ pip list --format=columns
Package Version
------------- -------
ansible 2.4.0.0
asn1crypto 0.23.0
bcrypt 3.1.4
cffi 1.11.2
cryptography 2.1.1
enum34 1.1.6
idna 2.6
ipaddress 1.0.18
Jinja2 2.9.6
MarkupSafe 1.0
paramiko 2.3.1
pip 9.0.1
pkg-resources 0.0.0
pyasn1 0.3.7
pycparser 2.18
PyNaCl 1.1.2
PyYAML 3.12
setuptools 36.6.0
six 1.11.0
wheel 0.30.0
This isn't needed and needs to be deleted. In AM17 the assets are served directly by Django thanks to WhiteNoise.
For reference: https://github.com/artefactual-labs/am/blob/master/compose/etc/nginx/conf.d/archivematica.conf.
At the moment the repository is a constant:
https://github.com/artefactual-labs/ansible-archivematica-src/blob/qa/1.7.x/tasks/automation-tools.yml#L7
But we use a fork. Can you make it configurable, like this?
repo: "{{ archivematica_src_automationtools_repo }}"
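A sketch of how the request could be implemented is shown below: a default in the role that a playbook can override, and the checkout task reading it. The default URL is assumed to be the upstream repository, and the dest path and version variable are hypothetical.

```yaml
---
# defaults/main.yml (sketch): the URL is assumed to be the upstream
# repository; forks would override this variable in their playbook vars.
archivematica_src_automationtools_repo: "https://github.com/artefactual/automation-tools.git"

---
# tasks/automation-tools.yml (sketch): dest and the version variable are
# hypothetical, shown only to make the task complete.
- name: "Checkout automation-tools"
  git:
    repo: "{{ archivematica_src_automationtools_repo }}"
    dest: "/usr/lib/archivematica/automation-tools"
    version: "{{ archivematica_src_automationtools_version | default('master') }}"
```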
Ghostscript (gs) is used by Archivematica for normalization.
The Ghostscript version (9.10) provided in Ubuntu 14.04 sometimes produces output that causes Archivematica to break (https://projects.artefactual.com/issues/9243).
When trying to deploy the new appraisal tab and transfer browser features, I found an issue when running the npm tasks: they were run as user root, using the /root/.npm directory.
I was able to run these tasks as the archivematica user, although I first had to change the owner of /opt/archivematica.
The following code was used in the tasks/pipeline-instcode.yml file:
#
# front-end
#
# npm needs to be running as user archivematica
# The owner of archivematica_src_dir has to be changed to archivematica
- name: "Change archivematica-source owner to archivematica"
  file:
    dest: "{{ archivematica_src_dir }}"
    state: "directory"
    owner: "archivematica"
    group: "archivematica"
    recurse: "yes"
- name: "Install front-end dependencies"
  become: "yes"
  become_user: "archivematica"
  npm:
    path: "{{ item }}"
    state: "present"
  with_items:
    - "{{ archivematica_src_dir }}/archivematica/src/dashboard/frontend/appraisal-tab"
    - "{{ archivematica_src_dir }}/archivematica/src/dashboard/frontend/transfer-browser"
  when:
    - "ansible_env.USER != 'vagrant'"
I don't know if it is better to change the archivematica_src_dir owner at another point of the deployment.
I'm not sure whether this is due to the 10s timeout in Archivematica, but it works when using
clamav_pass_by_reference = True
Similar issue in am-packbuild (deb/rpm): artefactual-labs/am-packbuild#63. It depends on artefactual/archivematica#782 and artefactual/archivematica-storage-service#256 (not merged yet when this issue was reported).
This is a question related to #104, which is a big PR that introduces several changes to the role. Would it be a good idea to have multiple stable branches in this repository, like the archivematica repository has? For example, creating a stable/1.5.x and a stable/1.6.x branch, so that the Ansible role is paired with the target archivematica branch.
The reason for this is that we may be reaching a point where it is no longer possible or practical to ensure that changes introduced in the role are backward compatible with older branches of archivematica.
(I found this approach is used for example in https://github.com/elastic/ansible-elasticsearch, there is a stable/2.x branch to deploy ES 2.x and a stable/5.x branch to deploy ES 5.x)
I wonder if we can ask GitHub to convert this repo into normal mode instead of fork mode (hakamine's repo is the parent). That would allow us to make searches, which is currently not possible as forks are not searchable?
After the changes in a51f1a8, it's no longer possible to rerun this on a machine that's already been deployed. This occurs when trying to copy storage.ini, like so:
TASK: [archivematica-src | copy archivematica-storage-service source files] ***
failed: [am-local] => (item={'dest': '/etc/uwsgi/apps-available/storage.ini', 'src': '/srv/archivematica-storage-service/install/storage.ini'}) => {"failed": true, "gid": 0, "group": "root", "item": {"dest": "/etc/uwsgi/apps-available/storage.ini", "src": "/srv/archivematica-storage-service/install/storage.ini"}, "mode": "0644", "owner": "root", "path": "/etc/uwsgi/apps-available/storage.ini", "size": 969, "state": "file", "uid": 0}
msg: refusing to convert between file and link for /etc/uwsgi/apps-available/storage.ini
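If the task now creates a symlink where an earlier run left a regular file, the file module's force option is one way to let it replace that file. The snippet below is a sketch under that assumption; the src and dest paths are copied from the error output above, and the task name is illustrative.

```yaml
# Sketch only: with state "link", force lets Ansible unlink an existing
# regular file at dest and create the symlink in its place.
- name: "Link storage.ini into uwsgi apps-available"
  file:
    src: "/srv/archivematica-storage-service/install/storage.ini"
    dest: "/etc/uwsgi/apps-available/storage.ini"
    state: "link"
    force: "yes"
```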
The npm package provided by default by Ubuntu Trusty has broken dependencies and can't be installed:
# apt-get install npm
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 npm : Depends: nodejs but it is not going to be installed
       Depends: node-abbrev (>= 1.0.4) but it is not going to be installed
       Depends: node-ansi but it is not going to be installed
       Depends: node-archy but it is not going to be installed
       Depends: node-block-stream but it is not going to be installed
       Depends: node-fstream (>= 0.1.22) but it is not going to be installed
       Depends: node-fstream-ignore but it is not going to be installed
       Depends: node-github-url-from-git but it is not going to be installed
       Depends: node-glob (>= 3.1.21) but it is not going to be installed
       Depends: node-graceful-fs (>= 2.0.0) but it is not going to be installed
       Depends: node-inherits but it is not going to be installed
       Depends: node-ini (>= 1.1.0) but it is not going to be installed
       Depends: node-lockfile but it is not going to be installed
       Depends: node-lru-cache (>= 2.3.0) but it is not going to be installed
       Depends: node-minimatch (>= 0.2.11) but it is not going to be installed
       Depends: node-mkdirp (>= 0.3.3) but it is not going to be installed
       Depends: node-gyp (>= 0.10.9) but it is not going to be installed
       Depends: node-nopt (>= 2.1.1) but it is not going to be installed
       Depends: node-npmlog but it is not going to be installed
       Depends: node-once but it is not going to be installed
       Depends: node-osenv but it is not going to be installed
       Depends: node-read but it is not going to be installed
       Depends: node-read-package-json (>= 1.1.0) but it is not going to be installed
       Depends: node-request (>= 2.25.0) but it is not going to be installed
       Depends: node-retry but it is not going to be installed
       Depends: node-rimraf (>= 2.2.2) but it is not going to be installed
       Depends: node-semver (>= 2.1.0) but it is not going to be installed
       Depends: node-sha but it is not going to be installed
       Depends: node-slide but it is not going to be installed
       Depends: node-tar (>= 0.1.18) but it is not going to be installed
       Depends: node-which but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
I suggest updating the role to install npm using the recommended procedure at https://nodejs.org/en/download/package-manager/#debian-and-ubuntu-based-linux-distributions instead.
When archivematica_src_install_ss is set to false, parent directories like:
/var/lib/archivematica
https://github.com/artefactual-labs/ansible-archivematica-src/blob/qa/1.7.x/tasks/ss-main.yml#L104
/etc/archivematica
https://github.com/artefactual-labs/ansible-archivematica-src/blob/qa/1.7.x/tasks/ss-main.yml#L127
and
/var/archivematica
https://github.com/artefactual-labs/ansible-archivematica-src/blob/qa/1.7.x/tasks/ss-main.yml#L119
are not created. Hence file operations further down the playbook break, as in:
https://github.com/artefactual-labs/ansible-archivematica-src/blob/qa/1.7.x/tasks/pipeline-websrv-gunicorn.yml#L19
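One way to address this would be to create the shared parent directories in a task that runs regardless of archivematica_src_install_ss. The following is a minimal sketch; where the task would live, and the ownership and mode values, are assumptions rather than what ss-main.yml currently does.

```yaml
# Sketch only: assumes the archivematica user and group already exist at
# this point; owner, group and mode are illustrative values.
- name: "Create parent directories needed by both the pipeline and the Storage Service"
  file:
    path: "{{ item }}"
    state: "directory"
    owner: "archivematica"
    group: "archivematica"
    mode: "0755"
  with_items:
    - "/var/lib/archivematica"
    - "/etc/archivematica"
    - "/var/archivematica"
```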
Related to artefactual/archivematica#556
I think that we should run collectstatic in the dashboard to put all the assets under /static.
This is basically the change I am suggesting:
location /media {
- alias /usr/share/archivematica/dashboard/media;
+ alias /usr/share/archivematica/dashboard/static;
}
The problem with making this change is that we'd need an extra location for development environments pointing to /usr/share/archivematica/dashboard/media. This method, however, has some disadvantages, like not allowing assets to be located in more than one subdirectory (Django apps). Ideally, we would serve the static assets from Django when doing development, but that would require a couple of changes in the code. This is the approach that I'd prefer, but I'm not sure if it's feasible to make such a change across all our active code branches.
Basically we would be making two changes: STATIC_URL = "/static/" in settings/local.py and a change in urls.py, only when settings.DEBUG = True.
In development mode we need to disable the use of sendfile() in Nginx in the static assets locations. This is necessary in both storage.conf and dashboard.conf, e.g.
location /static {
    alias /usr/lib/archivematica/storage-service/assets;
    sendfile off;
}
More details here: https://www.vagrantup.com/docs/synced-folders/virtualbox.html
The libldap2-dev and libsasl2-dev packages are needed when running the dashboard task virtualenv | Install requirements on an Ubuntu Xenial deployment.
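A task like the following, run before the dashboard pip install, would cover the two packages named above. It is a sketch; the task name and the distribution condition are illustrative.

```yaml
# Sketch only: installs the two packages named in this issue before the
# dashboard requirements are installed with pip.
- name: "Install LDAP and SASL development packages for the dashboard"
  apt:
    name: "{{ item }}"
    state: "present"
  with_items:
    - "libldap2-dev"
    - "libsasl2-dev"
  when: "ansible_distribution == 'Ubuntu'"
```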
In the qa/1.7 branch there is no longer a logfile placed in
/var/log/archivematica/MCPServer/
/var/log/archivematica/MCPClient/
For the dashboard this was amendable via:
archivematica_src_am_dashboard_environment:
  SS_GUNICORN_ACCESSLOG: "/var/log/archivematica/storage-service/gunicorn.access_log"
  SS_GUNICORN_ERRORLOG: "/var/log/archivematica/storage-service/gunicorn.error_log"
  SS_GUNICORN_LOGLEVEL: "info"
But how to proceed for the MCP server and client?
I have experienced errors when downloading big SIPs from the Storage Service when sendfile is enabled in gunicorn (the default behaviour). Adding the --no-sendfile option when running the Storage Service's gunicorn fixes it.
Archivematica connects to MySQL using credentials stored in /etc/archivematica/archivematicaCommon/dbsettings.
This Ansible role should provide a way to change the username and/or password, and to update the config file used by Archivematica accordingly.
Using Ansible 2.4.1 and branch stable/1.6.x, init scripts are not installed.
This is due to the condition "ansible_service_manager = upstart" being false, as Ansible reports "ansible_service_mgr": "service".
Instead of copying from the /srv directory, create symlinks to it, so that changes can be easily tested and committed/pushed?
Using the work in artefactual/archivematica#681 and artefactual/archivematica-storage-service#213 the installer scripts could be adapted to automatically configure the storage service and dashboard with users and organisation info, leaving them ready to use (connected with API keys) at the end of the install.
If the archivematica_src_dir variable points to a location outside the Vagrant VM, for example /vagrant/src, and that location already exists and already has an Archivematica source checkout, Vagrant will hang forever while performing the checkout. This may only happen if the checkout in /vagrant/src has local changes.
It needs to be set to
clamav_server = /var/run/clamd.scan/clamd.sock
https://github.com/artefactual-labs/ansible-role-archivematica-src/blob/master/tasks/tear-up.yml#L17
In that line, the externals-dev ppa is added to the list of ubuntu trusted repos.
For qa releases, this is good, but for stable/1.4.x AM, the 1.4 ppa should be added instead.
For stable/1.5.x, the packages.archivematica.org repo for 1.5 should be added, instead of a ppa.
This line should probably change its behaviour depending on which version of AM is being installed.
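One way to make this depend on the release being installed is to move the repository list into a variable that each playbook sets for its target version. The sketch below assumes a hypothetical archivematica_src_apt_repositories variable; only the externals-dev PPA name is taken from this issue, and entries for 1.4 or 1.5 would need to be filled in by the deployer.

```yaml
---
# vars (sketch): hypothetical variable, set per target Archivematica version.
# The value shown is the one the role adds today, suitable for qa releases.
archivematica_src_apt_repositories:
  - "ppa:archivematica/externals-dev"

---
# tasks (sketch): add whatever repositories the playbook asked for.
- name: "Add apt repositories for the targeted Archivematica release"
  apt_repository:
    repo: "{{ item }}"
    state: "present"
  with_items: "{{ archivematica_src_apt_repositories }}"
```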
Define variables in the playbook for this purpose?
version may also use the branch and commit hash as part of it?
In the rpm and deb packages, a /etc/default/archivematica- file is installed, with the environment vars needed by Archivematica to boot.
We should do the same in the Ansible role, using /etc/default/ for Debian-based systems and /etc/sysconfig for RHEL/CentOS.
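A sketch of how the destination could be chosen per OS family follows. The template source path is hypothetical, and archivematica-mcp-client is used only as an example file name.

```yaml
# Sketch only: the src template path is hypothetical; the destination
# directory is picked from the ansible_os_family fact.
- name: "Install the MCPClient environment file"
  template:
    src: "etc/default/archivematica-mcp-client.j2"
    dest: "{{ '/etc/sysconfig' if ansible_os_family == 'RedHat' else '/etc/default' }}/archivematica-mcp-client"
    owner: "root"
    group: "root"
    mode: "0644"
```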
The versions of mediainfo provided by the default Ubuntu repositories are old. Instead, install updated versions from ppa:djcj/mediainfo (which provides newer versions) or by downloading packages from https://mediaarea.net/en/MediaInfo/Download/Ubuntu (which has the latest). In either case, be sure to install not only the mediainfo package but also the required dependencies libmediainfo0 and libzen0 (failing to do this causes mediainfo errors).
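For the PPA route, the tasks could look roughly like this. It is a sketch rather than the role's current behaviour; the PPA and package names are the ones given above.

```yaml
# Sketch only: uses the PPA and the three package names from this issue.
- name: "Add the mediainfo PPA"
  apt_repository:
    repo: "ppa:djcj/mediainfo"
    state: "present"

- name: "Install mediainfo together with its library dependencies"
  apt:
    name: "{{ item }}"
    state: "latest"
    update_cache: "yes"
  with_items:
    - "mediainfo"
    - "libmediainfo0"
    - "libzen0"
```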
On latest AM qa/1.x there are some settings for clamav:
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_SERVER
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_PASS_BY_REFERENCE
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_TIMEOUT
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_CLIENT_BACKEND
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_MAX_FILE_SIZE
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_MAX_SCAN_SIZE
These variables should be added to the templates so that these env vars can then be modified using the vars.yml file.
For example, for RH/CentOS, this could be added to the /etc/sysconfig/archivematica-mcp-client template.
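From the vars.yml side, this could look like the dictionary below. The variable name is an assumption modelled on archivematica_src_am_dashboard_environment, and the two values shown are the ones mentioned in other issues on this page; the remaining settings are left out because their values are not given here.

```yaml
# vars.yml (sketch): the variable name is assumed by analogy with
# archivematica_src_am_dashboard_environment.
archivematica_src_am_mcpclient_environment:
  ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_SERVER: "/var/run/clamd.scan/clamd.sock"
  ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_CLAMAV_PASS_BY_REFERENCE: "True"
```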
It looks like the part of the role that checks for existence of archivematica-storage-service/lib is not working properly:
https://github.com/artefactual-labs/ansible-role-archivematica-src/blob/master/tasks/ss-main.yml#L88-L95
When running the playbook in https://github.com/artefactual/deploy-pub/tree/master/playbooks/archivematica it is adding "--find-links lib" to pip even though the lib directory doesn't exist
In worker_class, use any of the following options: eventlet, gevent, tornado, gthread. I'm not considering asyncio because it's Py3 only.
I think it's preferable to use one of the first two options for various reasons. Between the two, I'm inclined to prefer eventlet because it carries simpler dependencies (eventlet). Ultimately the user could decide which worker class to use.
The reasons for moving to async workers are obvious. One example that makes the underlying issue easier to understand: say you deploy SS with two workers. If one of the workers is busy serving a file to the user, you just compromised 50% of the availability of the server. If the case was that you deployed the application with a single worker, the application will become unavailable to the rest of users. In general, our applications (dashboard and ss) should be able to make use of these worker classes with no changes but some testing is recommended.
upstart script broke ?
Getting an error like this when the role dependency is executing:
TASK [geerlingguy.nodejs : Create npm global directory] ********************************************************
...
"mode": "0755",
"msg": "chown failed: failed to look up user hector",
"owner": "root",
"path": "/usr/local/lib/npm",
"size": 4096,
"state": "directory",
"uid": 0
}
When using the stable/1.6.x ansible branch, I get this error:
virtualbox-ovf: TASK [archivematica-src : include variables from retrieved dependencies files in namespace storage_service] ***
virtualbox-ovf: fatal: [default]: FAILED! => {"failed": true, "msg": "No filename was specified to include.\n\nThe error appears to have been in '/home/username/workspace/deployment/packer/templates/vagrant-box-archivematica/provisioning/roles/archivematica-src/tasks/ss-osdeps.yml': line 35, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n# create a namespace for osdeps file variables\n- name: \"include variables from retrieved dependencies files in namespace storage_service\"\n ^ here\n"}
AM branch is stable/1.6.x and ss is stable/0.10.x. I'm using ansible 2.1.4.0
I found this error when upgrading CentOS using qa/1.7.x, commit faefe63:
fatal: [arch01]: FAILED! => {"changed": false, "cmd": "./manage.py collectstatic --noinput --pythonpath=/usr/lib/archivematica/archivematicaCommon", "failed": true, "msg": "\n:stderr: Traceback (most recent call last):\n File \"./manage.py\", line 10, in <module>\n execute_from_command_line(sys.argv)\n File \"/usr/share/python/archivematica-dashboard/lib/python2.7/site-packages/django/core/management/__init__.py\", line 354, in execute_from_command_line\n utility.execute()\n File \"/usr/share/python/archivematica-dashboard/lib/python2.7/site-packages/django/core/management/__init__.py\", line 303, in execute\n settings.INSTALLED_APPS\n File \"/usr/share/python/archivematica-dashboard/lib/python2.7/site-packages/django/conf/__init__.py\", line 48, in __getattr__\n self._setup(name)\n File \"/usr/share/python/archivematica-dashboard/lib/python2.7/site-packages/django/conf/__init__.py\", line 44, in _setup\n self._wrapped = Settings(settings_module)\n File \"/usr/share/python/archivematica-dashboard/lib/python2.7/site-packages/django/conf/__init__.py\", line 92, in __init__\n mod = importlib.import_module(self.SETTINGS_MODULE)\n File \"/usr/lib64/python2.7/importlib/__init__.py\", line 37, in import_module\n __import__(name)\nImportError: No module named production\n", "path": "/usr/share/python/archivematica-dashboard/bin:/sbin:/bin:/usr/sbin:/usr/bin", "state": "absent", "syspath": ["/tmp/ansible_jtQwyY", "/tmp/ansible_jtQwyY/ansible_modlib.zip", "/tmp/ansible_jtQwyY/ansible_modlib.zip", "/usr/lib64/python27.zip", "/usr/lib64/python2.7", "/usr/lib64/python2.7/plat-linux2", "/usr/lib64/python2.7/lib-tk", "/usr/lib64/python2.7/lib-old", "/usr/lib64/python2.7/lib-dynload", "/usr/lib64/python2.7/site-packages", "/usr/lib/python2.7/site-packages"]}
to retry, use: --limit @/home/maml/artefactual/deploymentlast4/deployment/envs/denver/arch01.retry
PLAY RECAP *********************************************************************
arch01 : ok=73 changed=9 unreachable=0 failed=1
The deploy worked using the commit 79f8a4cab2f0ae53fd5698b739a0338d6b3deb04
The following packages should be installed in order to be able to install dashboard pip dependencies (these are currently installed only with the Storage Service ):
The clamd@scan service in Fedora runs as the clamscan user. We need to add that user to the archivematica group so that it can scan files in /var/archivematica.
Cf. #109, which does this for Ubuntu.
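A minimal sketch of such a task, assuming both the clamscan user and the archivematica group already exist when it runs:

```yaml
# Sketch only: append keeps clamscan's existing supplementary groups and
# adds archivematica to them.
- name: "Add the clamscan user to the archivematica group"
  user:
    name: "clamscan"
    groups: "archivematica"
    append: "yes"
  when: "ansible_os_family == 'RedHat'"
```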
It should be enough to change the default values of a couple of variables (archivematica_src_ss_gunicorn and archivematica_src_am_dashboard_gunicorn), but the README examples also need to be updated.
Related to artefactual/archivematica#604