materialsproject / pymatgen-db

Pymatgen-db provides an add-on to the Python Materials Genomics (pymatgen) library (https://pypi.python.org/pypi/pymatgen) that allows the creation of Materials Project-style databases for management of materials data.

Home Page: https://pypi.python.org/pypi/pymatgen-db

License: MIT License

Python 89.44% Makefile 4.05% HTML 1.97% CSS 0.82% Batchfile 3.71%

pymatgen-db's Introduction

Pymatgen-db is a database add-on for the Python Materials Genomics (pymatgen) materials analysis library. It enables the creation of Materials Project-style MongoDB databases for management of materials data. A query engine is also provided to enable the easy translation of MongoDB docs to useful pymatgen objects for analysis purposes.

Major change

As of v2021.5.13, pymatgen-db is a proper namespace add-on to pymatgen. In other words, you no longer import from matgendb but from pymatgen.db.
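Scripts that must run against both older (matgendb) and newer (pymatgen.db) installations can try the new namespace first and fall back to the old one. The sketch below uses only importlib; the helper name import_first is hypothetical and not part of pymatgen-db.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Return the first module in module_names that imports successfully."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # the module or one of its parent packages is missing
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))


# Usage, assuming pymatgen-db is installed under one of the two names:
# db_pkg = import_first("pymatgen.db", "matgendb")
# QueryEngine = db_pkg.QueryEngine
```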

Getting pymatgen-db

Stable version

The easiest way to install pymatgen-db on any system is to use pip, as follows:

pip install pymatgen-db

Requirements

All required python dependencies should be automatically taken care of if you install pymatgen-db using easy_install or pip. Otherwise, these packages should be available on PyPI.

Python 3.7+ required.
Pymatgen 2022+, including all dependencies associated with it. Please refer to the pymatgen docs for detailed installation instructions.
Pymongo 3.3+: For interfacing with MongoDB.
MongoDB 2.2+: Get it at the MongoDB website.

Usage

A command-line script (mgdb) provides access to most of the features in pymatgen-db, including db initialization, insertion of data, running the materials genomics ui, etc. To see all available options, type:

mgdb --help

Initial setup

The first step is to install and set up MongoDB on a server of your choice. The MongoDB manual is an excellent place to start. For the purposes of testing out the tools here, you may simply download the binary distribution corresponding to your OS from the MongoDB website and then run the following commands:

# For Mac and Linux OS.
mkdir test_db && mongod --dbpath test_db

This will create a test database and start the Mongo daemon. Once you are done with testing, you can simply press Ctrl-C to stop the server and delete the "test_db" folder. Running a Mongo server this way is insecure as Mongo does not enable authentication by default. Please refer to the MongoDB manual when setting up your production database.

After your server is up, you should create a database config file by running the following command:

mgdb init -c db.json

This will prompt you for a few parameters to create a database config file, which will make it much easier to use mgdb in the future. Note that the config file name can be anything of your choice, but using "db.json" will allow you to use mgdb without explicitly specifying the filename. If you are just testing using the test database, simply hit Enter to accept the defaults for all settings.

For more advanced use of the "db.json" config file (e.g., specifying aliases, defaults, etc.), please refer to the following sample.
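For orientation, here is a sketch of a minimal config for the unauthenticated test server started above, written and read back with the standard json module. The key names are an assumption based on the kind of values `mgdb init` asks for (host, port, database, collection, users and passwords); compare them with the db.json that `mgdb init` actually writes before relying on them. The database and collection names are placeholders.

```python
import json
from pathlib import Path

# ASSUMPTION: key names modeled on the `mgdb init` prompts; verify against a
# db.json generated by `mgdb init` on your system.
TEST_DB_CONFIG = {
    "host": "localhost",      # the mongod started above listens locally
    "port": 27017,            # MongoDB's default port
    "database": "vasp",       # placeholder database name
    "collection": "tasks",    # placeholder collection name
    "admin_user": None,       # no authentication on the throwaway test server
    "admin_password": None,
    "readonly_user": None,
    "readonly_password": None,
}


def write_db_config(path: Path, config: dict) -> dict:
    """Write config as pretty-printed JSON and return what was read back."""
    path.write_text(json.dumps(config, indent=4))
    return json.loads(path.read_text())
```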

Inserting calculations

To insert an entire directory of runs (where the topmost directory is "dir_name") into the database, use the following command:

# Note that "-c db.json" may be omitted if the config file is in the
# current directory under the default filename db.json.

mgdb insert -c db.json dir_name

A sample run has been provided for download for testing purposes. Unzip the file and run the above command in the directory.

Querying a database

Sometimes, more fine-grained querying is needed (e.g., for subsequent postprocessing and analysis).

The mgdb script allows you to make simple queries from the command line:

# Query for the task id and energy per atom of all calculations with
# formula Li2O. Note that the criteria must be specified in the form of
# a JSON string. Note that "-c db.json" may be omitted if the config
# file is in the current directory under the default filename db.json.

mgdb query -c db.json --crit '{"pretty_formula": "Li2O"}' --props task_id energy_per_atom
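The --crit value is a MongoDB filter document written as JSON (the single quotes only protect it from the shell), and --props selects which fields to return, much like a projection. The in-memory sketch below mimics those semantics for exact matches on top-level keys only; run_simple_query is a hypothetical name, and mgdb itself sends the query to MongoDB, which also supports operators such as $gt or $in.

```python
import json
from typing import Any, Dict, Iterable, List


def run_simple_query(docs: Iterable[Dict[str, Any]], crit_json: str,
                     props: List[str]) -> List[Dict[str, Any]]:
    """In-memory stand-in for `mgdb query --crit ... --props ...`."""
    crit = json.loads(crit_json)  # --crit must be valid JSON (double-quoted keys)
    results = []
    for doc in docs:
        # Exact-match semantics only: every criterion must equal the doc's value.
        if all(doc.get(k) == v for k, v in crit.items()):
            results.append({p: doc.get(p) for p in props})
    return results
```

Note that JSON requires double quotes, so a criteria string such as "{'pretty_formula': 'Li2O'}" is rejected by json.loads.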

For more advanced queries, you can use the QueryEngine class, which is also importable from the root package. Some examples are as follows:

>>> from pymatgen.db import QueryEngine
# Depending on your db.json, you may need to supply keyword args below
# for `port`, `database`, `collection`, etc.
>>> qe = QueryEngine()

# Print the task id and formula of all entries in the database.
>>> for r in qe.query(properties=["pretty_formula", "task_id"]):
...     print("{task_id} - {pretty_formula}".format(**r))
...
12 - Li2O

# Get a pymatgen Structure from the task_id.
>>> structure = qe.get_structure_from_id(12)

# Get pymatgen ComputedEntries using criteria.
>>> entries = qe.get_entries({})

The query language closely follows pymongo/MongoDB syntax, except that QueryEngine provides useful aliases for commonly used fields as well as translation to commonly used pymatgen objects like Structure and ComputedEntries.
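To make the alias idea concrete, the sketch below rewrites short field names in a criteria dict into full dotted MongoDB paths before a query is sent, recursing into $and/$or/$nor lists. The alias table is hypothetical; the real defaults ship with pymatgen-db and can be customized through the aliases section of db.json.

```python
from typing import Any, Dict

# HYPOTHETICAL alias table for illustration only.
ALIASES = {
    "energy": "output.final_energy",
    "energy_per_atom": "output.final_energy_per_atom",
}


def expand_aliases(criteria: Dict[str, Any], aliases: Dict[str, str]) -> Dict[str, Any]:
    """Rewrite aliased top-level keys to dotted paths; unknown keys pass through."""
    expanded = {}
    for key, value in criteria.items():
        if key in ("$and", "$or", "$nor") and isinstance(value, list):
            # Logical operators hold lists of sub-criteria: expand each one.
            expanded[key] = [expand_aliases(sub, aliases) for sub in value]
        else:
            expanded[aliases.get(key, key)] = value
    return expanded
```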

Extending pymatgen-db

Currently, pymatgen-db is written with standard VASP runs in mind. However, it is perfectly extensible to any kind of data, e.g., other kinds of VASP runs (bandstructure, NEB, etc.) or just any form of data in general. Developers looking to adapt pymatgen-db for other purposes should look at the VaspToDbTaskDrone class as an example and write similar drones for their needs. The QueryEngine can generally be applied to any Mongo collection, with suitable specification of aliases if desired.
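A minimal drone skeleton for non-VASP data might look like the following. It assumes the interface of pymatgen's AbstractDrone (pymatgen.apps.borg.hive), whose two methods are get_valid_paths, receiving one (parent, subdirs, files) tuple from os.walk, and assimilate, receiving a directory; the base class is left out so the sketch runs without pymatgen. JsonResultDrone, the result.json file name and collect are hypothetical.

```python
import json
import os
from typing import Any, Dict, List, Tuple


class JsonResultDrone:
    """Hypothetical drone turning result.json files into task documents.

    In a real extension this would subclass pymatgen's AbstractDrone.
    """

    filename = "result.json"  # hypothetical output file name

    def get_valid_paths(self, path: Tuple[str, List[str], List[str]]) -> List[str]:
        # `path` is one (parent, subdirs, files) tuple as yielded by os.walk.
        parent, _subdirs, files = path
        return [parent] if self.filename in files else []

    def assimilate(self, path: str) -> Dict[str, Any]:
        # One document per valid directory; inserting it into MongoDB is left out.
        with open(os.path.join(path, self.filename)) as f:
            data = json.load(f)
        return {"dir_name": os.path.abspath(path), "data": data}


def collect(top: str, drone: JsonResultDrone) -> List[Dict[str, Any]]:
    """Walk `top` and assimilate every directory the drone accepts."""
    docs = []
    for walk_tuple in os.walk(top):
        for valid in drone.get_valid_paths(walk_tuple):
            docs.append(drone.assimilate(valid))
    return docs
```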

How to cite pymatgen-db

If you use pymatgen and pymatgen-db in your research, please consider citing the following work:

Shyue Ping Ong, William Davidson Richards, Anubhav Jain, Geoffroy Hautier, Michael Kocher, Shreyas Cholia, Dan Gunter, Vincent Chevrier, Kristin A. Persson, Gerbrand Ceder. Python Materials Genomics (pymatgen) : A Robust, Open-Source Python Library for Materials Analysis. Computational Materials Science, 2013, 68, 314-319. doi:10.1016/j.commatsci.2012.10.028


pymatgen-db's Issues

DOS file format

Does anyone have an issue with switching to saving DOS as .npy files in GridFS (as opposed to JSON strings)? They should be about a factor of 2.5 smaller. The DOS collection is currently by far the largest in the database.
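The saving comes from storing each float in 8 binary bytes instead of roughly 18 to 20 characters of decimal text. The stdlib sketch below compares the two encodings; it uses array instead of numpy, but the body of a .npy file for a float64 array is the same raw bytes behind a small header. The actual ratio depends on the data.

```python
import array
import json
from typing import List


def encode_json(values: List[float]) -> bytes:
    """Encode floats as a JSON string (the current storage format)."""
    return json.dumps(values).encode("utf-8")


def encode_binary(values: List[float]) -> bytes:
    """Encode the same floats as raw float64 bytes, 8 bytes per value."""
    return array.array("d", values).tobytes()


def decode_binary(blob: bytes) -> List[float]:
    """Invert encode_binary; float64 round-trips Python floats exactly."""
    out = array.array("d")
    out.frombytes(blob)
    return out.tolist()
```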

Where can I get the data?

Write incremental task builder

Incremental building is needed for scaling the builders to handle regular updates to large databases. The current method of simply rebuilding the entire derived database will simply not scale. Incremental building requires some way of knowing at which point in the source database to "start", and an ordering that can at a minimum break the records into distinct "before" and "after" that point. The proposed method is to use an auxiliary bookkeeping collection to record the last place processing occurred, and by default to use MongoDB object IDs, which are ordered (roughly: see http://docs.mongodb.org/manual/reference/object-id/).
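A minimal in-memory sketch of the proposed bookkeeping, with a dict standing in for the auxiliary collection and ObjectIds handled as 24-character lowercase hex strings (their first 4 bytes are the creation time in Unix seconds). Because ObjectId order is only roughly insertion order (ids created by different clients in the same second can interleave), a real builder might re-scan a small overlap window. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def objectid_time(oid_hex: str) -> datetime:
    """Creation time encoded in a MongoDB ObjectId (first 4 bytes, big-endian seconds)."""
    return datetime.fromtimestamp(int(oid_hex[:8], 16), tz=timezone.utc)


def incremental_batch(source_ids: Iterable[str], bookkeeping: Dict[str, str],
                      builder_name: str) -> List[str]:
    """Return ids newer than the builder's last mark and advance the mark.

    `bookkeeping` maps builder name -> last processed ObjectId (hex string).
    """
    last = bookkeeping.get(builder_name)
    # Equal-length lowercase hex strings sort in the same order as the raw bytes.
    new_ids = sorted(i for i in source_ids if last is None or i > last)
    if new_ids:
        bookkeeping[builder_name] = new_ids[-1]
    return new_ids
```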

Improving schema efficiency

As it currently stands, we store pretty much all the data (except the DOS) in a single tasks document, and when we create derived collections (such as materials) we copy all this data to the new document. As the size of the tasks db increases, it is becoming harder to keep the working set in RAM. At MIT, our tasks and materials collections are each over 150 GB. For PD updates, the working set is the entire tasks and materials collections (way bigger than RAM), even though the amount of data needed from each of these collections to build the PDs is much much smaller. The limitation that we are facing in our PD building is high page faults from mongo - I would imagine that MP will face the same issue, but amplified by creating derived collections for every sandbox. Tweaks to the PD algorithms aren't going to solve these limitations since ~100s of GBs have to be pulled from disk either way (multiple times, and pretty randomly).

The majority (~80%) of the size of the tasks and materials collections is data under the 'calculations' field. AFAIK, this isn't actually used for searches, or displayed to users in lists of query results - all of the necessary data for queries is copied to the top level of the mongo document. By moving this data to a separate collection, we could both remove the duplication of this information between the materials and tasks collections, and shrink the working set for PD building and other derived collections to 1/5 the size. Of course, the other alternative is for everyone to buy more RAM.

I'd also like to consider removing the MIT materials collection entirely, since this data is easily stored with the addition of one or two keys in the tasks collection.
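The proposed split can be sketched as a function that keeps every top-level field except 'calculations' in the searchable document and moves 'calculations' into a separate document joined on task_id. It assumes each task document carries a unique task_id; derived collections such as materials could then reference task_id instead of copying the bulky data.

```python
from typing import Any, Dict, Tuple


def split_task_doc(task: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a task doc into a slim searchable part and a bulky calculations part."""
    slim = {k: v for k, v in task.items() if k != "calculations"}
    # The bulky part would live in its own collection, fetched only when needed.
    bulky = {"task_id": task["task_id"], "calculations": task.get("calculations", [])}
    return slim, bulky
```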

Rename Builder.setup

rename setup() -> get_items()

Getting "Transformations file does not exist" error

Hi, good afternoon,
I downloaded the latest code from git and then tried the following commands.
To start the server:

"c:\Program Files\MongoDB\Server\7.0\bin\mongod.exe" --dbpath test_db

From another command prompt:

python pymatgen-db\scripts\mgdb insert -c db.json test_db

It then shows the following output and fails with an error.

2842 msecs : Scanning for valid paths...
5760 msecs : 6 valid paths found.
C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py:383: UnconvergedVASPWarning: pymatgen-db\test_files\db_test\stopped_mp_aflow\relax1\vasprun.xml is an unconverged VASP run.
Electronic convergence reached: False.
Ionic convergence reached: True.
  warnings.warn(msg, UnconvergedVASPWarning)
Traceback (most recent call last):
  File "C:\python\lib\site-packages\pymatgen\db\creator.py", line 510, in generate_doc
    self.process_vasprun(dir_name, taskname, filename) for taskname, filename in vasprun_files.items()
  File "C:\python\lib\site-packages\pymatgen\db\creator.py", line 510, in <genexpr>
    self.process_vasprun(dir_name, taskname, filename) for taskname, filename in vasprun_files.items()
  File "C:\python\lib\site-packages\pymatgen\db\creator.py", line 478, in process_vasprun
    r = Vasprun(vasprun_file, parse_projected_eigen=parse_projected_eigen)
  File "C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py", line 371, in __init__
    parse_projected_eigen=parse_projected_eigen,
  File "C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py", line 480, in _parse
    raise ex
  File "C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py", line 395, in _parse
    for event, elem in ET.iterparse(stream):
  File "C:\python\lib\xml\etree\ElementTree.py", line 1228, in iterator
    root = pullparser._close_and_return_root()
  File "C:\python\lib\xml\etree\ElementTree.py", line 1275, in _close_and_return_root
    root = self._parser.close()
  File "<string>", line None
xml.etree.ElementTree.ParseError: no element found: line 48888, column 0

Error in C:\work\pymatgen\pymatgen-db\test_files\db_test\killed_mp_aflow.
Traceback (most recent call last):
  File "C:\python\lib\site-packages\pymatgen\db\creator.py", line 510, in generate_doc
    self.process_vasprun(dir_name, taskname, filename) for taskname, filename in vasprun_files.items()
  File "C:\python\lib\site-packages\pymatgen\db\creator.py", line 510, in <genexpr>
    self.process_vasprun(dir_name, taskname, filename) for taskname, filename in vasprun_files.items()
  File "C:\python\lib\site-packages\pymatgen\db\creator.py", line 478, in process_vasprun
    r = Vasprun(vasprun_file, parse_projected_eigen=parse_projected_eigen)
  File "C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py", line 371, in __init__
    parse_projected_eigen=parse_projected_eigen,
  File "C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py", line 480, in _parse
    raise ex
  File "C:\python\lib\site-packages\pymatgen\io\vasp\outputs.py", line 395, in _parse
    for event, elem in ET.iterparse(stream):
  File "C:\python\lib\xml\etree\ElementTree.py", line 1228, in iterator
    root = pullparser._close_and_return_root()
  File "C:\python\lib\xml\etree\ElementTree.py", line 1275, in _close_and_return_root
    root = self._parser.close()
  File "<string>", line None
xml.etree.ElementTree.ParseError: no element found: line 48888, column 0
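"no element found" is what the XML parser reports when a file ends before its closing tags, and the directory names (stopped_mp_aflow, killed_mp_aflow) suggest the test set deliberately contains interrupted runs; whether mgdb is meant to skip them is not stated here. To see which vasprun.xml files are truncated before inserting, a stdlib check could look like this (is_complete_xml is a hypothetical helper, not part of pymatgen-db):

```python
import xml.etree.ElementTree as ET


def is_complete_xml(path: str) -> bool:
    """Return True if the file parses as a complete XML document."""
    try:
        for _event, elem in ET.iterparse(path):  # default events: element "end"
            elem.clear()  # drop each element's content once parsed to limit memory use
    except ET.ParseError:
        return False
    return True
```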

mgdb runserver failing with AppRegistryNotReady error

Using a clean virtualenv with python 2.7 and "pip install -r requirements.txt", when I try to run mgdb runserver it fails with an AppRegistryNotReady error.

(This is usually attributed to calling django.core.handlers.wsgi:WSGIHandler, which was used in django 1.6, instead of django.core.wsgi:get_wsgi_application which is used in 1.7, but webui/wsgi.py is using the latter, and I am using 1.7)

I can reproduce the error in a vanilla django site if I execute runserver using call_command, as in mgdb, instead of the default execute_from_command_line.

The difference seems to be that execute_from_command_line eventually calls django.setup, which populates the AppRegistry, while call_command doesn't.

Indeed, I can get the vanilla django site to work with call_command if I invoke django.setup just before.

Should mgdb call django.setup explicitly? On the other hand, I do wonder why nobody else seems to have this issue...

mimetype

hi,
The latest version of django uses 'content_type' instead of 'mimetype' in HttpResponse.
-Kiran

mgbuild run: find builder based only on class name

Currently 'mgbuild run' takes a full path to a builder: mypackage.mymodule.MyBuilder.

Often the class name alone is unique within a given sys.path. So it would help if you could give just the builder class name and have mgbuild do a package search to find the match (returning an error if there are multiple matches and asking the user to be more specific by using the module.Class form).

Please push new release to PyPI

error occur when inserting example Li2O into db: “pymongo.errors.OperationFailure: Authentication failed”

Describe the bug
After configuring db.json and downloading the Li2O example, I ran:
mgdb insert -c db.json Li2O

Traceback (most recent call last):
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/bin/mgdb", line 442, in <module>
    args.func(args)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/bin/mgdb", line 64, in update_db
    drone = VaspToDbTaskDrone(
  File "/public/home/gaobo/honglin/atomate2/pymatgendb-test/package/pymatgen-db/pymatgen/db/creator.py", line 173, in __init__
    if self.db.counter.count_documents({"_id": "taskid"}) == 0:
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/collection.py", line 1913, in count_documents
    return self._retryable_non_cursor_read(_cmd, session)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/collection.py", line 1923, in _retryable_non_cursor_read
    return client._retryable_read(func, self._read_preference_for(s), s)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1492, in _retryable_read
    return self._retry_internal(
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/_csot.py", line 107, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1462, in _retry_internal
    ).run()
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2315, in run
    return self._read() if self._is_read else self._write()
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/mongo_client.py", line 2439, in _read
    with self._client._conn_from_server(self._read_pref, self._server, self._session) as (
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1330, in _conn_from_server
    with self._checkout(server, session) as conn:
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/mongo_client.py", line 1252, in _checkout
    with server.checkout(handler=err_handler) as conn:
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/pool.py", line 1637, in checkout
    conn = self._get_conn(handler=handler)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/pool.py", line 1756, in _get_conn
    conn = self.connect(handler=handler)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/pool.py", line 1607, in connect
    conn.authenticate()
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/pool.py", line 1078, in authenticate
    auth.authenticate(creds, self, reauthenticate=reauthenticate)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/auth.py", line 625, in authenticate
    auth_func(credentials, conn)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/auth.py", line 530, in _authenticate_default
    return _authenticate_scram(credentials, conn, "SCRAM-SHA-1")
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/auth.py", line 257, in _authenticate_scram
    res = conn.command(source, cmd)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/helpers.py", line 322, in inner
    return func(*args, **kwargs)
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/pool.py", line 968, in command
    return command(
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/network.py", line 192, in command
    helpers._check_command_response(
  File "/public/home/gaobo/lihl/miniconda3/envs/pmdb-test/lib/python3.10/site-packages/pymongo/helpers.py", line 230, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}

To Reproduce
Steps to reproduce the behavior:

Create a new Python environment.
Install pymatgen-db from GitHub by downloading the zip file and running pip install -e . in it.
After filling in db.json, try to insert Li2O into the database; the error occurs.
I have checked the username, password, host, port and database name through the pymongo interface, and it can connect to MongoDB.


Expected behavior
None


Desktop (please complete the following information):

NAME="CentOS Linux"
VERSION="7 (Core)"
Python 3.10.13
pymatgen 2024.2.23
pymatgen-db 2023.7.18
pymongo 4.6.2
MongoDB 7.0.5
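Code 18 (AuthenticationFailed) with credentials that work elsewhere is often a sign that the client authenticates against a different database than the one holding the user: MongoDB checks credentials in the database named by authSource, which defaults to the database in the URI path, or admin when there is none. This is a hypothesis, not a confirmed diagnosis of this report. One way to test it is to build an explicit URI and pass it to pymongo.MongoClient; the sketch below only builds the string, and mongo_uri is a hypothetical helper. Credentials are percent-escaped because characters such as '@' or ':' in a password otherwise break URI parsing.

```python
from urllib.parse import quote_plus


def mongo_uri(host: str, port: int, user: str, password: str,
              database: str, auth_source: str = "admin") -> str:
    """Build a MongoDB URI with escaped credentials and an explicit authSource.

    'admin' is only a default guess: use the database the user was created in.
    """
    return (f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
            f"{database}?authSource={quote_plus(auth_source)}")
```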

Make alt. to using docs for configuration

Allow docstring-based configuration of builders, but also programmatic route

Why do all dependent packages require the exact version?

I found that "requirements.txt" specifies exact versions, e.g. monty==0.7.0. However, pymatgen requires a newer version, i.e., monty==0.8.1. This inconsistency makes PyCharm keep reminding me that the requirement is not satisfied. Is there any reason why a minimum version requirement (e.g. monty>=0.7.0) doesn't work?

Allow extra criteria to be specified for get_entries_in_system

It would be really useful to be able to get_entries_in_system that are on the hull, or have n_sites or less, etc.
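One way to support this is to build the chemical-system filter first and combine it with caller-supplied criteria via $and, so the caller's keys can never overwrite the system filter. The sketch assumes documents store a sorted, dash-joined 'chemsys' string such as "Li-O"; the real QueryEngine may select systems through other fields, and nsites in the test is an illustrative field name.

```python
from itertools import combinations
from typing import Any, Dict, List, Optional


def entries_in_system_criteria(elements: List[str],
                               extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """MongoDB filter for every sub-system of `elements`, plus optional extra criteria."""
    els = sorted(set(elements))
    # All non-empty element subsets, e.g. Li, O and Li-O for ["Li", "O"].
    chemsys = ["-".join(c) for n in range(1, len(els) + 1)
               for c in combinations(els, n)]
    base = {"chemsys": {"$in": chemsys}}
    return {"$and": [base, extra]} if extra else base
```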

New compatibility fields

There is currently no obvious way to add more fields required for compatibility checks. For example, I need to check the setting of LASPH in the INCAR for a new compatibility scheme. Should I add a field at the top level of the mongo document (where is_hubbard and hubbards are currently set), or should we add this (and move is_hubbard and hubbards) under a 'compatibility' key to reduce top level namespace pollution?

I can implement, but wanted some discussion. Using aliases, either should be backwards compatible.
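If the 'compatibility' key option were chosen, migrating a document might look like the sketch below, together with alias entries that map the old top-level names to their new dotted paths so existing queries keep working. The LASPH field name and the helper names are hypothetical, and how the LASPH value is read from the INCAR is left to the caller.

```python
from typing import Any, Dict

COMPAT_FIELDS = ("is_hubbard", "hubbards")  # fields moved under 'compatibility'


def nest_compat_fields(doc: Dict[str, Any], lasph: bool) -> Dict[str, Any]:
    """Return a copy of `doc` with compatibility fields grouped under one key."""
    new_doc = {k: v for k, v in doc.items() if k not in COMPAT_FIELDS}
    compat = {k: doc[k] for k in COMPAT_FIELDS if k in doc}
    compat["LASPH"] = lasph  # hypothetical field for the new compatibility check
    new_doc["compatibility"] = compat
    return new_doc


# Old top-level name -> new dotted path, for backward-compatible aliases.
COMPAT_ALIASES = {name: f"compatibility.{name}" for name in COMPAT_FIELDS}
```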

Make docstring-driven API optional

(Computron): This is a style preference, but I tend to not like code that operates on docs. There are more experienced programmers than me that find this very fun and cool (and it has side benefits like enforcing documentation), but I think it always violates the principle of least astonishment when people tell me that I need to get the docs correct in order to use a code. But perhaps for the Builders it's good to enforce docs?

mgvv diff: reporting bugs

CRITICAL [mg] In 'diff' command: Runtime error: Not a number: collection=new key=3725 e_above_hull='None'
(builder_env)-bash-3.2$ mgvv diff -k "snlgroup_id_final" -p 'icsd_ids,task_id' -m -n "efermi=+-0.01" mprod.json mdev.json
Traceback (most recent call last):
  File "/global/u1/w/weichen1/builder_env/bin/mgvv", line 10, in <module>
    execfile(__file__)
  File "/global/u1/w/weichen1/builder_env/pymatgen-db/scripts/mgvv", line 619, in <module>
    sys.exit(main())
  File "/global/u1/w/weichen1/builder_env/pymatgen-db/scripts/mgvv", line 610, in main
    return args.func(args, *args.func_args)
  File "/global/u1/w/weichen1/builder_env/pymatgen-db/scripts/mgvv", line 264, in command_diff
    text = make_report(args.format or "text")
  File "/global/u1/w/weichen1/builder_env/pymatgen-db/scripts/mgvv", line 250, in <lambda>
    make_report = lambda f: getattr(report, "Diff{}Formatter".format(f.title()))(meta, **fmt_kwargs).format(r)
  File "/global/u1/w/weichen1/builder_env/pymatgen-db/matgendb/vv/report.py", line 709, in format
    self.sort_rows(rows, section)
  File "/global/u1/w/weichen1/builder_env/pymatgen-db/matgendb/vv/report.py", line 482, in sort_rows
    rows.sort(key=itemgetter(sort_key))
KeyError: 'delta'
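The KeyError suggests that some rows have no 'delta' entry, plausibly those where no numeric difference could be computed (such as e_above_hull='None' in the CRITICAL line); this is a guess that has not been checked against report.py. A sort that tolerates such rows, placing them after all numeric ones, could look like this (sort_rows_safely is a hypothetical replacement for the itemgetter sort):

```python
from typing import Any, Dict, List


def sort_rows_safely(rows: List[Dict[str, Any]], sort_key: str) -> None:
    """Sort rows in place by sort_key; rows without a numeric value go last."""
    def key(row: Dict[str, Any]):
        value = row.get(sort_key)  # .get avoids the KeyError of itemgetter
        missing = not isinstance(value, (int, float))
        # Tuples compare the flag first, so numeric values precede missing ones.
        return (missing, 0 if missing else value)

    rows.sort(key=key)  # stable: missing rows keep their original relative order
```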

mgbuild_list more flexible

"mgbuild list" is OK, but if I have to go through the trouble of putting in a long module name, it's easy for me to figure out what builders are inside that module and what parameters they take without calling list. Also I don't like the module approach to discovering builders. I'd rather lookup usage of a specific builder than lookup all the builders inside some module (I have no idea how that module is organized or how the programmer is organizing builders into modules).

runserver command doesn't exist in version installed from pip

I tried to follow the instructions in the README page and could follow them up to mgdb runserver -c db.json, which gives me an error saying this argument is not available. However, I can find this argument in release 0.5.1 (with a minor error from django).

any idea? Thanks :)

mgvv diff: dbconfig not defined

(from Wei)

The code complains "dbconfig" is not defined. Where should I define it? The code works fine if I comment out relevant lines in mgvv.

(builder_env)-bash-3.2$ mgvv diff -k "snlgroup_id_final" -p 'icsd_ids,task_id' -m -n "e_above_hull=+-0.01" mprod.json mdev.json

CRITICAL [mg] In 'diff' command: Runtime error: global name 'dbconfig' is not defined

How to delete an entry from an existing database

Hi all,

I am a new user of pymatgen-db and am trying to understand how to delete entries from a database. Is there a method that implements this? (I expected it to be in queryengine.py but didn't find one.)

Thanks,
Josh
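Absent a dedicated QueryEngine method, a document can be removed with pymongo directly. The sketch assumes you already hold a pymongo Collection for the tasks collection (how you obtain it depends on your db.json); Collection.delete_one returns a DeleteResult whose deleted_count is 0 or 1. delete_task is a hypothetical helper.

```python
from typing import Any


def delete_task(collection: Any, task_id: int) -> int:
    """Delete the document with this task_id; return how many were removed."""
    result = collection.delete_one({"task_id": task_id})
    return result.deleted_count
```

If the runs were inserted with DOS parsing enabled, the corresponding files in the dos_fs GridFS collection (referenced by calculations.dos_fs_id) would presumably need separate removal, e.g. with GridFS.delete(file_id).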

mgvv - add check for changing materials_id

If the materials_id for a given snlgroup changes between releases, we should be notified.

Note that merging materials_id and snlgroup would eliminate the need for this, but that is currently not the case at all and hopefully the check is easy to implement for the short term.
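Given one snlgroup -> materials_id mapping per release, the check reduces to comparing the two dicts on their shared keys. A sketch (changed_materials_ids is hypothetical; groups present in only one release are ignored here):

```python
from typing import Dict, List, Tuple


def changed_materials_ids(old: Dict[int, str],
                          new: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Return (snlgroup_id, old_materials_id, new_materials_id) for every change."""
    shared = old.keys() & new.keys()  # groups present in both releases
    return sorted((g, old[g], new[g]) for g in shared if old[g] != new[g])
```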

No authenticate method for 'Database' object

Describe the bug

The authentication in the query engine:

    if user:
        self.db.authenticate(user, password)

does not work anymore, because pymongo.database.Database.authenticate() and pymongo.database.Database.logout() were removed in PyMongo 4.0 (authenticating multiple users on one client conflicts with the logical sessions of MongoDB 3.6+):
https://pymongo.readthedocs.io/en/stable/migrate-to-pymongo4.html#database-authenticate-and-database-logout-are-removed
Changing it to:

    if connection is None:
        if self.replicaset:
            self.connection = pymongo.MongoClient(self.host, self.port, replicaset=self.replicaset, username=user, password=password)
        else:
            self.connection = pymongo.MongoClient(self.host, self.port, username=user, password=password)

solved the issue for me. I assume this also needs to be changed in /pymatgen/db/creator.py, but I did not test the latter.
If you want to keep the old behavior, it would be great if you could add the PyMongo version constraint (pymongo<4) to the requirements for future users.

ubuntu, db version v6.0.7

access to dos_fs

Hello,

firstly, thanks for making pymatgen-db available, it seems very promising. I've just finished following the documentation and examples, importing some vasp runs and querying via the webui, and it's very useful.

In order to see what it would take to add some features to the webui, I thought about adding a django view that returns a DOS plot.

Looking at VaspToDbTaskDrone in matgendb/creator.py, I see that it already supports adding DOS information to the records, but the call in mgdb has parse_dos hardcoded to False, is there a reason for that?

Setting it to True creates a calculations.dos_fs_id property in the new record. I can then get the dos_fs_id via the QueryEngine, but in order to get the DOS data I had to use pymongo and gridfs as below, evaluate the string as a dictionary and then convert it to a pymatgen Dos object. Is there a better/proper way?

Cheers,
Miguel

from pymatgen.electronic_structure.plotter import DosPlotter
from pymatgen.electronic_structure.dos import Dos
from pymongo import MongoClient
from bson.objectid import ObjectId
from gridfs import GridFS
from ast import literal_eval

def plot_dos(mongo_uri, database, dos_fs_id_string):
    # get the DOS file from the dos_fs GridFS collection
    c = MongoClient(host=mongo_uri)
    dos_fs = GridFS(c[database], collection='dos_fs')
    dos_file = dos_fs.get(ObjectId(dos_fs_id_string))

    # GridFS returns bytes: decode, convert to a dictionary and then to a
    # pymatgen Dos object (better way?)
    dos_dict = literal_eval(dos_file.read().decode())
    dos_obj = Dos.from_dict(dos_dict)

    # plot
    plotter = DosPlotter()
    plotter.add_dos("DOS", dos_obj)
    plotter.save_plot("plot.png", img_format="png")

Continuous integration testing?

Is this set up somewhere? Ideally this would run a mongod instance so all the building can be tested properly.

I just fixed a regression on Python2 compatibility that went unnoticed for a week.
Also some of the builders tests are currently failing

Adjust from matgendb import statements to change in namespace

Describe the bug
When trying to use the mgdb command, the import statements in the mgdb script fail, because (as noted in the README) the namespace changed from matgendb to pymatgen.db, and DBConfig now lives in pymatgen.db.config.

Changing them in scripts/mgdb to the following solves the issues for me.

from pymatgen.db import SETTINGS
from pymatgen.db.query_engine import QueryEngine
from pymatgen.db.creator import VaspToDbTaskDrone
from pymatgen.db.config import DBConfig
from pymatgen.db.util import get_settings, DEFAULT_SETTINGS, MongoJSONEncoder

This bug happened to me when installing the version from github with python 3.10.9

It would be great if this could be fixed so future users can avoid this issue.

Easier to select builder from command-line

In the command line specify the builder via (-b pymatpro.builders.new_builders.ExternalDataBuilder). That is more explicit and clear than giving the module name and have knowledge that the code will magically figure out what you're talking about (sometimes) or needing to specify two parameters (-m and -b).

Example impl:
modname, classname = user_path.rsplit(".", 1)
mod = __import__(modname, globals(), locals(), [classname], 0)
if hasattr(mod, classname):
    cls_ = getattr(mod, classname)
    return cls_.from_dict(obj_dict)

Easier yet: if you can just give a builder Class name and you do a package search to find the match (return error if there are multiple and ask the user to be more specific by using the module.Class language). Then if I want to use a builder I just need to know it's name, not what module it's located in. There is some code in fw_serializers.py that does this kind of thing, although for a restricted number of packages.
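The class-name search could be restricted to one package, much like the restricted search in fw_serializers.py. A sketch using importlib and pkgutil (find_class_by_name is hypothetical, not mgbuild's API): it imports the package and all its submodules, counts a class only in the module that defines it (not where it is re-exported), and refuses ambiguous names. Note that importing every submodule runs their top-level code.

```python
import importlib
import inspect
import pkgutil
from typing import List


def find_class_by_name(package_name: str, class_name: str) -> str:
    """Return "module.ClassName" for the single class with that name in a package."""
    package = importlib.import_module(package_name)
    module_names = [package_name]
    if hasattr(package, "__path__"):  # only packages have submodules to walk
        module_names += [info.name for info in
                         pkgutil.walk_packages(package.__path__, prefix=package_name + ".")]
    matches: List[str] = []
    for mod_name in module_names:
        try:
            module = importlib.import_module(mod_name)
        except Exception:  # skip submodules that cannot be imported here
            continue
        obj = getattr(module, class_name, None)
        # Count the class only in its defining module, not where it is re-exported.
        if inspect.isclass(obj) and obj.__module__ == mod_name:
            matches.append(f"{mod_name}.{class_name}")
    if not matches:
        raise LookupError(f"no class {class_name!r} found in {package_name!r}")
    if len(matches) > 1:
        raise LookupError(f"{class_name!r} is ambiguous {matches}; use module.Class")
    return matches[0]
```

The returned dotted path can then be split with rsplit(".", 1) and imported as in the example implementation above.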

Add validation support for executing arbitrary Python code

Add support in validation "spec" syntax to invoke arbitrary Python code (which will be passed a connection object)

mgvv - add warnings for unindexed fields

To avoid accidental table scans, print a warning if any of the fields used by mgvv is not indexed in the target collection
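pymongo's Collection.index_information() returns a dict of the form {index_name: {"key": [(field, direction), ...], ...}}, so the check can be done without touching the data. The sketch below treats a field as indexed only if it is the leading key of some index, since a query on a non-prefix field of a compound index generally cannot use that index for lookups; unindexed_fields is a hypothetical name.

```python
import warnings
from typing import Any, Dict, Iterable, List


def unindexed_fields(fields: Iterable[str],
                     index_info: Dict[str, Dict[str, Any]]) -> List[str]:
    """Warn about and return fields that are not the leading key of any index."""
    leading = {info["key"][0][0] for info in index_info.values() if info.get("key")}
    missing = [f for f in fields if f not in leading]
    for f in missing:
        warnings.warn(f"field {f!r} is not indexed; queries on it may scan the collection")
    return missing
```

With a live collection this would be called as unindexed_fields(fields, collection.index_information()).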

add Enum's module

Backported from python 3.4, Enum is a useful little class

Optional Sort Method on query_engine.query method

The query method already supports the index, limit, and distinct_key options. It would be convenient as well to have an option to pass in something along the lines of sort={'date_created':1} to enable queries such as "find the 10 most recently processed entries".
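A sort option would map naturally onto pymongo's Cursor.sort(), which accepts a list of (field, direction) pairs with 1 for ascending and -1 for descending. Note that "most recently processed" needs -1 (newest first); sort={'date_created': 1} would return the oldest entries. The conversion helper below is a hypothetical sketch, used e.g. as collection.find(criteria).sort(to_sort_spec({"date_created": -1})).limit(10).

```python
from typing import List, Mapping, Tuple


def to_sort_spec(sort: Mapping[str, int]) -> List[Tuple[str, int]]:
    """Convert {'date_created': -1} into the (field, direction) list form.

    Dicts preserve insertion order, so the first key has the highest priority.
    """
    spec = []
    for field, direction in sort.items():
        if direction not in (1, -1):  # pymongo.ASCENDING == 1, DESCENDING == -1
            raise ValueError(f"sort direction for {field!r} must be 1 or -1")
        spec.append((field, direction))
    return spec
```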

Please tag pypi releases here as with pymatgen repo

The last tagged release here is v0.5.1, but the latest pypi version is v0.6.2.

Files missing in master branch

Describe the bug
Lots of code files like pymatgen/db/util.py seem to be missing from the master branch.
However, these files exist at "92401161b4", which strangely does not show up when searching.

To Reproduce
Steps to reproduce the behavior:

util.py cannot be found in master branch: https://github.com/materialsproject/pymatgen-db/tree/master/pymatgen/db
util.py can be found at "92401161b4": https://github.com/materialsproject/pymatgen-db/tree/92401161b4baedd6813ad44e576aaeb83982af66/pymatgen/db
The branch "92401161b4" seems to have the same last update date as the master branch. I don't know what's wrong with the master branch.

Expected behavior
pymatgen/db/util.py should exist in the master branch.