ESRF and Diamond have been discussing new tables and table changes for cryo-EM to acco

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Ok, for me the sql in your comment looks good! <a class="user-mention notranslate" dat

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Thanks <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-u

WIP: New tables and changes for cryo-EM,about ispyb/ispyb-database-modeling

Comments (39)

stufisher commented on August 24, 2024

Does particle picking happen on a movie, or a motion corrected movie? (presumably the latter). Movie should not contain parameters that come from a processing of a motioncorrection. The relationships look to be the wrong way round and there seems to be an intermediate table missing. I suggest removing the columns from movie, and adding

ParticlePicker
particlepickerid pk ai
autoprocprogramid fk
particlePickingTemplate varchar(255) COMMENT 'Cryolo model',
particleDiameter float COMMENT 'Unit: nm';
numberofparticles int

ParticlePicker_has_MotionCorrection
ParticlePicker_has_MotionCorrection pk ai
particlepickerid fk
motioncorrectionid fk

ParticleClassification
remove motionCorrectionId int unsigned,
particlepickerid fk

i think the json file should go into autoprocprogramattachment?

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @stufisher,

Particle picking is made on a motion corrected micrograph from a movie. You are right, movie shouldn't contain parameters from processing. However, even if we could keep track on which particle that came from which motion-corrected micrograph, I'm not sure that it's useful - I don't think we will upload every single particle to ISPyB.

What is more useful I think is to start at 2D classification an work backwards. The pipeline I'm working on at this moment triggers 2D classification at 5000, 20000, 50000 and 100000 particles (these thresholds may be changed in the future). For each threshold we will obtain a 2D classification and it would then be interesting to link this classification to the set of micrographs used, typically around 1000 micrographs for 50000 particles. I think it would be overkill to link thousands motion-corrected micrographs for each 2D classification, maybe the easies would be to link start and end micrograph. We should also store the set of particles used for the 2D classification as we store results from auto processing today. So the AutoProcProgram table would indeed be useful.

Thoughts?

from ispyb-database-modeling.

stufisher commented on August 24, 2024

I'm not suggesting storing which particles were picked on an individual bases. ParticlePicker is just an instance of a particle picker program that was run

Ok i understand better your workflow, ive updated the above to take into account how the pipelines are launched without muddying the movie table. This allows you to launch multiple ParticlePickers each with their linkages to which movies were used and how many particles were picked

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

This sounds good! However, I'm confused - what does 'updated the above' mean? I guess we would need a new SQL script as a base for further discussions.

from ispyb-database-modeling.

stufisher commented on August 24, 2024

Sorry i meant i had updated the sql in my previous comment

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Ok, for me the sql in your comment looks good! @KarlLevik , any comments?

How easy would it be to visualise this proposed model in a data base diagram? I would like to discuss it with Isai and it's easier to have a graph to show.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Hi @olofsvensson - yes, that probably makes sense.

I've updated my SQL in the initial post above. I'll generate a diagram later today.

Still missing some units in the column comments on ParticleClassification.

Any better ideas for the name of InitialModel?

I would love to keep these and the other EM tables in a separate schema (*) as is done with the SSX tables, since this is an extension / module of ISPyB that only a few facilities will use. That way we won't clutter the core schema. Also, it makes it clear that these tables are EM tables without the need to have 'EM' as a prefix in the table names. However, not sure how you feel about that at the ESRF, Olof?

* i.e. different than the "pydb" or "ispyb" as we call it at Diamond

from ispyb-database-modeling.

stufisher commented on August 24, 2024

This is already core ISPyB, we really do not want to query across multiple databases to retrieve this info.

from ispyb-database-modeling.

antolinos commented on August 24, 2024

I agree Stuart besides we prefer to keep the current ISPyB as it is (EM belongs to the core) until the SSX with a modular architecture is proven as a working solution
We need to test if it will fit our needs and other technical details like how we manage the consistency and how aggregate results.

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Thanks @KarlLevik for updating the SQL! I'll go through it and I let you know if I have any questions / remarks.
I agree with @antolinos and @stufisher that we should keep the EM to the core.

from ispyb-database-modeling.

drnasmith commented on August 24, 2024

If I understand correctly there would not be an impact to queries across schemas running in the same cluster. However, maybe file this under a good idea for the future. As @antolinos says the SSX extension should be a good test case for such an approach.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Yes, I'm aware EM is part of the ISPyB collaboration, but that doesn't mean the EM tables have to be in the same schema as the tables that are used across all disciplines. In that sense the EM tables are not "core" ISPyB. (Neither are MX-specific tables, of course, but moving those seems like a more involved job than the EM tables which after all are currently not in use at Diamond and only used by 1 (?) instrument at the ESRF at the moment.

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Yes @KarlLevik you are right, the EM tables are used only by one instrument (CM01) at the ESRF. However, CM01 has been using the EM tables since 2017 and they are used for every experiments, now up to three experiments per week. Also we have the 2D classification pipeline deployed and we would now like to add the results of 2D classification. More, we will soon also need to add more EM tables related to pre-screening of samples.
I agree with you that the best would be to have separate schema for different techniques and this is the direction where we are going with the results of serial crystallography. However, I don't think it would be a good idea to do this change now for EM.

from ispyb-database-modeling.

antolinos commented on August 24, 2024

Hi @KarlLevik
It does not matter if the number of instruments using EM is 1 or 10000, the work to be done to extract the current EM tables from the core is the same.
As I said, it is something that we are not going to do until we are sure that it is worth. This is why we aim to implement that architecture with SSX and then it will be evaluated. With that feedback we could then decide if techniques should be moved from the core or not considering, of course, among other factors the work involved.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Diagrams:

Just the relationships (includes also DataCollection, DataCollectionGroup, Detector and BeamLineSetup):

Full diagram:

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Thanks @KarlLevik for these diagrams! I have finally come to the point where I will start to implement these tables in our "valid" (test) database. I have two requests / questions:

Would it be possible to rename the "InitialModel" to "3DStartingModel"? I find "InitialModel" too vague...
Sometimes it's useful to provide symmetry information to the 2D classification processing. Could we add a "symmetry" column to the "ParticleClassification" table?

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

@KarlLevik , I need help! I get this error when I try to create the ParticlePicker_has_MotionCorrection table:

#1005 - Can't create table pydb.ParticlePicker_has_MotionCorrection (errno: 150 "Foreign key constraint is incorrectly formed") (Details…)

When I click on "Details" I see this (phpMyAdmin 4.6.6):

Storage Engines
InnoDB Documentation
Percona-XtraDB, Supports transactions, row-level locking, foreign keys and encryption for tables

Can you please help me to find out what's wrong with the following SQL (copied from your initial post):

CREATE TABLE ParticlePicker_has_MotionCorrection (
  particlePickerId int unsigned,
  motionCorrectionId int unsigned,
  PRIMARY KEY (particlePickerId, motionCorrectionId),
  CONSTRAINT `ParticlePicker_has_MotionCorrection_fk_particlePickerId`
    FOREIGN KEY (`particlePickerId`)
      REFERENCES `ParticlePicker` (`particlePickerId`)
        ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `ParticlePicker_has_MotionCorrection_fk_motionCorrectionId`
      FOREIGN KEY (`motionCorrectionId`)
        REFERENCES `MotionCorrection` (`motionCorrectionId`)
          ON DELETE CASCADE ON UPDATE CASCADE
);

I managed to create all other tables including the "ParticleClassification_has_InitialModel" table which has a very similar SQL...

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Hi @olofsvensson !

Re: the problem with the ParticlePicker_has_MotionCorrection table, I suspect it's because of a mismatch between the primary key datatype "display size" of the MotionCorrection table and the default "display size" for ints in your database system. Or potentially that your MotionCorrection PK column is defined without the UNSIGNED keyword. So have a look at your MotionCorrection table and find the motionCorrectionId column and see what it says. You can then try to create ParticlePicker_has_MotionCorrection again, but specify exactly the same datatype as you just saw for motionCorrectionId column, e.g. "motionCorrectionId int(11) unsigned".

Re: your question about renaming InitialModel to 3DStartingModel, that is possible, but then we should of course also rename its primary key column and also rename the ParticleClassification_has_InitialModel table accordingly. Is everyone happy with 3DStartingModel?

Re: adding a "symmetry" column to the "ParticleClassification" table: Yes, certainly. So to be clear, do you mean a column that would allow a value equal to one of: triclinic, monoclinic, orthorhombic, tetragonal, trigonal, hexagonal, cubic?

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @KarlLevik , thanks for feedback!

I have managed to insert data into the ParticlePicker table using (Java) ISPyB web services (ispyb/ISPyB#518).

I have not yet tested what you suggested for the ParticlePicker_has_MotionCorrection table because I have so far just added a simplified table without the constraints for testing. I'm now wondering if the table ParticlePicker_has_MotionCorrection is practical since each ParticlePicker entry could be based on hundreds if not thousands of movies / motion correction entries. I don't think it's useful to enter all this information in the database for these reasons:

It will take time to upload thousands of motion correction ids
I can't see how this information could be displayed in a useful way

Instead of using the table ParticlePicker_has_MotionCorrection I suggest to add two new columns to the ParticlePicker table:

firstMotionCorrectionId
lastMotionCorrectionId

The first link will ensure that the ParticlePicker entry is linked to a proposal / session. Together with the last link they will at least give an idea which movies that were used for the ParticlePicker.

Coming back to the InitialModel table, maybe it's sufficient to give it a prefix: EMInitialModel, or CryoemInitalModel. 3DStartingModel is probably not a good idea since the name starts with a number... My preference would be CryoemInitalModel.

Concerning the symmetry I don't know which symmetries are allowed for CryoEM processing in general and 2D classification in particular, however, we had a case with five-fold symmetry for which the 2D classification would have been both more accurate and faster if we could have provided the symmetry as input. For the database I don't think we should use a enumeration but just a string column.

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi again @KarlLevik , I have discussed the topic concerning the ParticlePicker_has_MotionCorrection table with out CryoEM scientists here and we have come to these conclusions:

It is desirable to keep track on which motion-corrected micro-graphs which were used for a particular particle picking result.
This information is available as a '.star' file from Relion.
Therefore, it should be enough to store the particle picking .star file in the database and link it via an AutoProcProgramAttachment to the ParticlePicker table.
Thus ParticlePicker_has_MotionCorrection table is not needed. However, we must have at least one reference to a motion correction entry in the database, which could be solved by a new column "firstMotionCorrectionId".
Also, it is desirable to have information on how many particles were picked on each micro-graph. Again this fine-grained information does not to be modeled in the database, a file containing this information is enough (csv, text, etc) attached to the ParticlePicker via an AutoProcProgramAttachment.

Would this work for you at the DLS?

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Hi Olof, I've just updated the SQL (see the first post above) based on your suggestions. I hope I've captured everything.

We'll have to discuss on our side whether we think this will work for us.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Updated 4th March 2021:

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @KarlLevik , thanks for these modifications! This will work for me, at least for 2D classification - I have no experience with 3D classification so I guess we might have to do some tweaks when we start to do these.

I have just one semantic issue with the proposed data model... The 'ParticlePicker' table has now a reference to 'particleClassificationProgramId'. This means that this table is more a classification table than a particle picker table. Also, the fact that many individual 'ParticleClassification' entries from one classification are linked to this table makes it, I think, more a classification table than a particle picker table.

There are two possible solution to this semantic issue:

We can introduce a new table between 'ParticlePicker' and 'ParticleClassification'. This would make 'ParticlePicker' only have to deal with particle picking. However, I don't suggest this solution because I see no point in separating the particle picking from the classification.
We rename 'ParticleClassification' to 'ParticleClassificationEntry' (or something like this) and we rename 'ParticlePicker' to 'ParticleClassification'.

I'm open to keep the suggested structure if you feel that this is not sufficiently important semantic problem.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

I think we should not postpone fixing any semantic issues, but rather sort them out now while we can!

I agree, let's not add any extra tables to fix semantic issues, but rather rename the tables we have.

How about renaming the ParticlePicker table to ParticleClassificationGroup?

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @KarlLevik , after having discussed this issue briefly with CryoEM scientists I now think that the best would be to have an intermediate table for the following reason:

Today we have an one-to-one relation between particle picker and 2D classification, however, this might change in the future. Therefore, even if it at this moment is not necessary, I suggest that we keep the ParticlePicker and we introduce a one-to-many relation with a new table ParticleClassificationGroup. Otherwise we will be in trouble if in the future we have more than one program that can do 2D classification from a given set of particles.

This extra table will make the data model slightly more complex but it's a low price to pay in comparison if we have to come back to this data model in the future.

So, my (final) suggestion would be:

ParticlePicker * - 1 ParticleClassificationGroup * - 1 ParticleClassification

The 'particleClassificationProgramId' would then be moved from ParticlePicker to ParticleClassificationGroup. We can then rename 'particlePickerProgramId' and 'particleClassificationProgramId' to simply 'programId' since they will be unique for each table.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

I had a chat with Olof earlier today and we agreed that we do need a new intermediary table between ParticlePicker (pp) and ParticleClassification (pc). Some columns from pp and pc will be moved into the new table, which we will call ParticleClassificationGroup:

pp.particleClassificationProgramId
pc.type
pc.batchNumber
pc.numberOfParticlesPerBatch
pc.numberOfClassesPerBatch
pc.symmertry

I've updated the SQL script in the issue description above with the new table structure.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

We've discovered that ParticleClassification.rotationAccuracy should actually have data type float, not int unsigned.

@olofsvensson , do you agree?

ALTER TABLE `ParticleClassification`
  MODIFY rotationAccuracy float COMMENT '???';

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Yes @KarlLevik , I do agree - rotationAccuracy should be of type float.

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @KarlLevik , we just discovered that there's a column missing in the "ParticleClassification" table: classDistribution. This parameter is important because it's the best parameter for sorting the results of 2D classification in the GUI.
Would you agree to add this column to the data model?

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Maybe with some explanation!

What is the purpose of this column? This would be a varchar, if I understand correctly? What should be the max length? What kind of data / values would go into this column?

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Yes, sorry, I forgot to mention that 'classDistribution' is a float value produced by Relion 2D classification, like rotationAccuracy, translationAccuracy etc.

from ispyb-database-modeling.

d-j-hatton commented on August 24, 2024

There is some discussion of what the class distribution is in this email thread: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ccpem;ace0783d.1803. It provides a figure of merit for the class. People like to view classes ordered by the class distribution from highest to lowest

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

I propose the following SQL to modify the table:

ALTER TABLE ParticleClassification
  ADD classDistribution float NULL DEFAULT NULL COMMENT 'Provides a figure of merit for the class, higher number is better';

I'll update the CREATE TABLE at the top of the thread.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

@olofsvensson - At Diamond we have identified another column that we need in the ParticlePicker table: summaryImageFullPath.

So if you don't mind, we'd like to propose the following change:

ALTER TABLE ParticlePicker
  ADD summaryImageFullPath varchar(255) NULL DEFAULT NULL COMMENT 'Generated summary micrograph image with highlighted particles';

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @KarlLevik , ok for me to add summaryImageFullPath in the ParticlePicker table.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

Excellent. I've updated the CREATE TABLE statements at the top of the thread, and here is the up-to-date diagram - not much has changed since the version from April:

from ispyb-database-modeling.

olofsvensson commented on August 24, 2024

Hi @KarlLevik , thanks for the new diagram!

I just spotted a mistake in the 'PariclePicker' SQL:

  CONSTRAINT `ParticlePicker_fk_particlePickerProgramId`
    FOREIGN KEY (`particlePickerProgramId`)
      REFERENCES `AutoProcProgram` (`autoProcProgramId`)
        ON DELETE NO ACTION ON UPDATE CASCADE,
  CONSTRAINT `ParticlePicker_fk_programId`
    FOREIGN KEY (`programId`)
      REFERENCES `AutoProcProgram` (`autoProcProgramId`)
        ON DELETE NO ACTION ON UPDATE CASCADE,

I guess the first constraint should be deleted since the 'particlePickerProgramId' column was renamed to 'programId'.

from ispyb-database-modeling.

KarlLevik commented on August 24, 2024

@olofsvensson Yes, you're correct - have removed that now.

from ispyb-database-modeling.

WIP: New tables and changes for cryo-EM about ispyb-database-modeling HOT 39 OPEN

Comments (39)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent