johnchase / cual-id
A package for creating and managing sample identifiers in comparative -omics datasets.
License: BSD 3-Clause "New" or "Revised" License
I needed to generate some sample IDs this morning and came across a few issues:
- BC_generator instead of Cual-ID
- Cual-ID create-ids fails with a traceback when called with no options. Is it possible to just display the help text instead? Command-line users shouldn't get a traceback.
- Cual-ID create-ids is confusing, as the positional argument is required but the [ ] imply that it's optional.
- -p, the separator character that's used (:), is non-MEINS-compliant, so Keemei warns about it. Could this be replaced with a period (.)?
When I try creating cual IDs in Terminal on my Mac, it says -bash: cual-id: command not found. On my coworker's Mac it works fine, though. How can this be fixed so that I can create IDs?
The function in fix.py doesn't use the same edit distance calculation as mint.py. It's probably best to harmonize these, but again, given the random nature of the generated IDs, this is likely quite a minor issue in practice. I opened a PR (#27) with one proposed fix. And thanks for the work: the package and paper do a nice job outlining the benefits of the approach!
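One way to harmonize the two modules is to have both fix.py and mint.py import a single distance function. The sketch below assumes plain Levenshtein distance (insertions, deletions, substitutions) is the metric wanted; it is not taken from either module, and `levenshtein` is a hypothetical name.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    # Keep the shorter string as b so the rolling row stays small
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]
```

For example, `levenshtein("ec9b47bff7", "ec9b47bff1")` is 1 (one substitution). Whichever metric is chosen, defining it once means minting and fixing can never disagree about what counts as "too close".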
If the existing list passed to create_ids contains an empty string, the function returns nothing:
import cualid
for id_ in cualid.create_ids(n=7, id_length=4, existing_ids=['']):
    print(id_)
If an empty string is valid, it should not return nothing; if it is not valid, it should raise a warning. I don't feel an empty string should be considered valid.
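A minimal sketch of the "not valid, so warn" option: filter the list before it reaches ID generation. `clean_existing_ids` is a hypothetical helper, not part of cual-id, and it assumes existing IDs are strings.

```python
import warnings


def clean_existing_ids(existing_ids):
    """Return existing_ids without empty or whitespace-only entries.

    Hypothetical helper: each dropped entry triggers a UserWarning, so the
    caller is told about it instead of silently getting no IDs back.
    """
    cleaned = []
    for id_ in existing_ids:
        if not id_.strip():
            warnings.warn("ignoring empty ID in existing_ids", UserWarning)
            continue
        cleaned.append(id_)
    return cleaned
```

The cleaned list would then be what gets passed as `existing_ids`, e.g. `existing_ids=clean_existing_ids([''])`, which yields an empty list plus one warning.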
Cual-id should only support python 3
Hi, I get this error:
cual-id -help
Traceback (most recent call last):
File "/home/jroatkul/anaconda_ete/envs/cual-id/bin/cual-id", line 4, in
__import__('pkg_resources').run_script('cual-id==0.9.1', 'cual-id')
File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 744, in run_script
File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 1506, in run_script
File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/cual_id-0.9.1-py3.5.egg/EGG-INFO/scripts/cual-id", line 68, in
File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/click/core.py", line 716, in __call__
return self.main(*args, **kwargs)
File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/click/core.py", line 675, in main
_verify_python3_env()
File "/home/jroatkul/anaconda_ete/envs/cual-id/lib/python3.5/site-packages/click/_unicodefun.py", line 119, in _verify_python3_env
'mitigation steps.' + extra)
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either run this under Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.
This system lists a couple of UTF-8 supporting locales that
you can pick from. The following suitable locales where
discovered: aa_DJ.utf8, aa_ER.utf8, aa_ET.utf8, af_ZA.utf8, am_ET.utf8, an_ES.utf8, ar_AE.utf8, ar_BH.utf8, ar_DZ.utf8, ar_EG.utf8, ar_IN.utf8, ar_IQ.utf8, ar_JO.utf8, ar_KW.utf8, ar_LB.utf8, ar_LY.utf8, ar_MA.utf8, ar_OM.utf8, ar_QA.utf8, ar_SA.utf8, ar_SD.utf8, ar_SY.utf8, ar_TN.utf8, ar_YE.utf8, as_IN.utf8, ast_ES.utf8, az_AZ.utf8, be_BY.utf8, ber_DZ.utf8, ber_MA.utf8, bg_BG.utf8, bn_BD.utf8, bn_IN.utf8, bo_CN.utf8, bo_IN.utf8, br_FR.utf8, bs_BA.utf8, byn_ER.utf8, ca_AD.utf8, ca_ES.utf8, ca_FR.utf8, ca_IT.utf8, crh_UA.utf8, cs_CZ.utf8, csb_PL.utf8, cv_RU.utf8, cy_GB.utf8, da_DK.utf8, de_AT.utf8, de_BE.utf8, de_CH.utf8, de_DE.utf8, de_LU.utf8, dv_MV.utf8, dz_BT.utf8, el_CY.utf8, el_GR.utf8, en_AG.utf8, en_AU.utf8, en_BW.utf8, en_CA.utf8, en_DK.utf8, en_GB.utf8, en_HK.utf8, en_IE.utf8, en_IN.utf8, en_NG.utf8, en_NZ.utf8, en_PH.utf8, en_SG.utf8, en_US.utf8, en_ZA.utf8, en_ZW.utf8, es_AR.utf8, es_BO.utf8, es_CL.utf8, es_CO.utf8, es_CR.utf8, es_DO.utf8, es_EC.utf8, es_ES.utf8, es_GT.utf8, es_HN.utf8, es_MX.utf8, es_NI.utf8, es_PA.utf8, es_PE.utf8, es_PR.utf8, es_PY.utf8, es_SV.utf8, es_US.utf8, es_UY.utf8, es_VE.utf8, et_EE.utf8, eu_ES.utf8, fa_IR.utf8, fi_FI.utf8, fil_PH.utf8, fo_FO.utf8, fr_BE.utf8, fr_CA.utf8, fr_CH.utf8, fr_FR.utf8, fr_LU.utf8, fur_IT.utf8, fy_DE.utf8, fy_NL.utf8, ga_IE.utf8, gd_GB.utf8, gez_ER.utf8, gez_ET.utf8, gl_ES.utf8, gu_IN.utf8, gv_GB.utf8, ha_NG.utf8, he_IL.utf8, hi_IN.utf8, hne_IN.utf8, hr_HR.utf8, hsb_DE.utf8, ht_HT.utf8, hu_HU.utf8, hy_AM.utf8, id_ID.utf8, ig_NG.utf8, ik_CA.utf8, is_IS.utf8, it_CH.utf8, it_IT.utf8, iu_CA.utf8, iw_IL.utf8, ja_JP.utf8, ka_GE.utf8, kk_KZ.utf8, kl_GL.utf8, km_KH.utf8, kn_IN.utf8, ko_KR.utf8, kok_IN.utf8, ks_IN.utf8, ku_TR.utf8, kw_GB.utf8, ky_KG.utf8, lg_UG.utf8, li_BE.utf8, li_NL.utf8, lo_LA.utf8, lt_LT.utf8, lv_LV.utf8, mai_IN.utf8, mg_MG.utf8, mi_NZ.utf8, mk_MK.utf8, ml_IN.utf8, mn_MN.utf8, mr_IN.utf8, ms_MY.utf8, mt_MT.utf8, my_MM.utf8, nb_NO.utf8, nds_DE.utf8, nds_NL.utf8, ne_NP.utf8, nl_AW.utf8, 
nl_BE.utf8, nl_NL.utf8, nn_NO.utf8, no_NO.utf8, nr_ZA.utf8, nso_ZA.utf8, oc_FR.utf8, om_ET.utf8, om_KE.utf8, or_IN.utf8, pa_IN.utf8, pa_PK.utf8, pap_AN.utf8, pl_PL.utf8, ps_AF.utf8, pt_BR.utf8, pt_PT.utf8, ro_RO.utf8, ru_RU.utf8, ru_UA.utf8, rw_RW.utf8, sa_IN.utf8, sc_IT.utf8, sd_IN.utf8, se_NO.utf8, shs_CA.utf8, si_LK.utf8, sid_ET.utf8, sk_SK.utf8, sl_SI.utf8, so_DJ.utf8, so_ET.utf8, so_KE.utf8, so_SO.utf8, sq_AL.utf8, sq_MK.utf8, sr_ME.utf8, sr_RS.utf8, ss_ZA.utf8, st_ZA.utf8, sv_FI.utf8, sv_SE.utf8, ta_IN.utf8, te_IN.utf8, tg_TJ.utf8, th_TH.utf8, ti_ER.utf8, ti_ET.utf8, tig_ER.utf8, tk_TM.utf8, tl_PH.utf8, tn_ZA.utf8, tr_CY.utf8, tr_TR.utf8, ts_ZA.utf8, tt_RU.utf8, ug_CN.utf8, uk_UA.utf8, ur_PK.utf8, ve_ZA.utf8, vi_VN.utf8, wa_BE.utf8, wo_SN.utf8, xh_ZA.utf8, yi_US.utf8, yo_NG.utf8, zh_CN.utf8, zh_HK.utf8, zh_SG.utf8, zh_TW.utf8, zu_ZA.utf8
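The mitigation Click's documentation describes is to run the command under a UTF-8 locale, for example `export LC_ALL=en_US.utf8` and `export LANG=en_US.utf8` (en_US.utf8 is one of the locales listed above). For scripts that call cual-id as a subprocess, a minimal Python sketch of the same fix follows; both helper names are hypothetical.

```python
import codecs
import locale
import os


def locale_is_ascii(encoding=None):
    """True if the given (or current) locale encoding is plain ASCII,
    the condition that makes older Click versions abort under Python 3."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # codecs.lookup normalizes aliases such as "US-ASCII" to "ascii"
    return codecs.lookup(encoding).name == "ascii"


def utf8_env(base=None, loc="en_US.utf8"):
    """Return a copy of an environment mapping with LC_ALL and LANG set to
    a UTF-8 locale; the original mapping is left untouched."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = loc
    env["LANG"] = loc
    return env
```

Usage would be along the lines of `subprocess.run(["cual-id", "--help"], env=utf8_env())`, assuming the chosen locale is actually installed on the machine.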
My lab has been using the cual-id system for our barcodes on two large projects, and it has worked great for us. Having made over 800 barcodes and working with them on a daily basis, we have noticed two things that we would suggest improving.
@lkursell mentioned that it would be nice to be able to run cualid twice and generate the same set of IDs.
This may need to be taken care of within the UUID call itself
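A minimal sketch of reproducible IDs, assuming the IDs come from random version-4 UUIDs (the third group of each UUID in the sample output further down begins with 4, and each short ID there is the last N hex characters of its UUID). `uuid.uuid4()` draws from `os.urandom` and cannot be seeded, so this hypothetical helper builds the UUIDs from a seeded PRNG instead. The trade-off is that `random.Random` is not cryptographically secure, so seeded IDs are only as unpredictable as the seed.

```python
import random
import uuid


def seeded_uuids(n, seed):
    """Yield n version-4 UUIDs from a seeded PRNG (hypothetical helper).

    The same seed always reproduces the same sequence.
    """
    rng = random.Random(seed)
    for _ in range(n):
        # 128 random bits; version=4 also sets the RFC 4122 variant bits
        yield uuid.UUID(int=rng.getrandbits(128), version=4)


def short_ids(uuids, id_length):
    """Pair each UUID with its last id_length hex characters, mirroring
    the observed cual-id output format (an assumption)."""
    return [(str(u), u.hex[-id_length:]) for u in uuids]
```

Running `short_ids(seeded_uuids(10, seed=42), 10)` twice returns identical lists.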
Currently create_ids will keep trying to add IDs to the list indefinitely. It may become impossible to add any further ID that satisfies the minimum edit distance from the IDs already in the list; if that happens, the loop will run forever.
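A minimal sketch of a bounded loop: give up after a fixed number of consecutive rejected candidates and report how far it got. This is not cual-id's code. `create_ids_bounded`, `max_fails`, and the default Hamming distance (which assumes all IDs have the same length) are illustrative assumptions; any distance function can be passed in.

```python
import uuid


def hamming(a, b):
    """Number of differing positions; assumes equal-length IDs."""
    return sum(x != y for x, y in zip(a, b))


def create_ids_bounded(n, id_length, min_dist=2, existing_ids=(),
                       max_fails=1000, distance=hamming):
    """Hypothetical bounded variant of create_ids.

    Raises RuntimeError after max_fails consecutive rejected candidates
    instead of looping forever.
    """
    accepted = list(existing_ids)
    new = []
    fails = 0
    while len(new) < n:
        u = uuid.uuid4()
        cand = u.hex[-id_length:]
        if all(distance(cand, other) >= min_dist for other in accepted):
            accepted.append(cand)
            new.append((str(u), cand))
            fails = 0  # reset: only consecutive failures count
        else:
            fails += 1
            if fails >= max_fails:
                raise RuntimeError(
                    "gave up after %d consecutive failures; generated %d of %d IDs"
                    % (max_fails, len(new), n))
    return new
```

For example, with `id_length=1` and `min_dist=1` only 16 distinct IDs exist, so asking for 17 raises instead of hanging.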
All of the code that was used to create the figures in the paper submission should be included in this repository
Currently at ~74%.
John and I have been discussing whether this is a worthwhile thing to do. John doesn't want to end up supporting every spreadsheet program's weird data typing as it is read in, but perhaps some general things can be done without going overboard. John was hoping to get input from others.
I noticed that when an ID contains the character "e" followed by digits, or begins with 0 or any other digit, Excel misreads it. Most likely people will create the IDs and put them in their metadata spreadsheet. I was wondering how difficult it would be to add rules to the random generation that disallow a digit directly after "e" and disallow IDs starting with a digit. I am sure this would reduce the number of IDs that can be created for a given length, but it could be an improvement.
There is also the issue of IDs being read in as dates. A simple solution could be that all IDs must contain at least one letter (or, perhaps even better, must begin with a letter).
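The proposed rules are easy to express as a filter applied to each random candidate. A minimal sketch, assuming UUID-derived short IDs as in the sample output; `excel_safe` and `excel_safe_ids` are hypothetical names, and the sketch only applies the filter (duplicate and edit-distance checks would still be needed on top).

```python
import re
import uuid

# An "e" (either case) immediately followed by a digit
E_THEN_DIGIT = re.compile(r"[eE][0-9]")


def excel_safe(id_):
    """True if id_ starts with a letter (no leading 0 or other digit, and
    no date-like IDs) and never has an "e" directly followed by a digit."""
    return bool(id_) and id_[0].isalpha() and E_THEN_DIGIT.search(id_) is None


def excel_safe_ids(n, id_length):
    """Draw UUID-derived short IDs, keeping only those passing excel_safe."""
    out = []
    while len(out) < n:
        u = uuid.uuid4()
        short = u.hex[-id_length:]
        if excel_safe(short):
            out.append((str(u), short))
    return out
```

With hex IDs only 6 of 16 characters are letters, so the leading-letter rule alone rejects roughly 10 of every 16 candidates; that is the reduction in available IDs the suggestion anticipates.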
Thoughts on these ideas?
Arron
Common label sheet formats such as 3x6 should be supported with an option.
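A minimal layout sketch for such an option: split the IDs into pages of a fixed grid, filling row by row and padding the last page with blanks. Reading "3x6" as 3 columns by 6 rows is an assumption, so both dimensions are parameters; `label_pages` is a hypothetical name and the actual rendering to PDF or a printer is out of scope.

```python
def label_pages(ids, cols=3, rows=6):
    """Split ids into pages of rows x cols labels, filled row-major.

    Returns a list of pages; each page is a list of rows, each row a list
    of cols strings. The last page is padded with "" for empty labels.
    """
    per_page = cols * rows
    pages = []
    for start in range(0, len(ids), per_page):
        chunk = list(ids[start:start + per_page])
        chunk += [""] * (per_page - len(chunk))  # blank labels on last page
        pages.append([chunk[r * cols:(r + 1) * cols] for r in range(rows)])
    return pages
```

Twenty IDs on a 3x6 sheet give two pages, the second holding two labels followed by sixteen blanks.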
On the off chance that someone's system doesn't have a clock accurate to the chosen interval, we should create a running validator that checks every ID against its immediate predecessor to make sure the two differ.
Should failure of this condition raise an exception, or should it just retry until it succeeds, failing out after a set number of retries?
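One possible answer combines both options: retry a few times, then raise. A minimal sketch as a generator; `validated_stream` and `max_retries` are hypothetical, and `make_id` stands in for whatever clock-based ID source is in use (for example `uuid.uuid1`).

```python
def validated_stream(make_id, max_retries=5):
    """Yield IDs from make_id(), never one equal to its immediate predecessor.

    A repeat triggers up to max_retries fresh calls; if every retry still
    matches the previous ID, RuntimeError is raised.
    """
    prev = None
    while True:
        cur = make_id()
        retries = 0
        while cur == prev:
            if retries == max_retries:
                raise RuntimeError(
                    "ID %r repeated after %d retries" % (cur, max_retries))
            retries += 1
            cur = make_id()
        prev = cur
        yield cur
```

Usage: `list(itertools.islice(validated_stream(uuid.uuid1), 10))` takes ten checked IDs. Raising after a bounded number of retries keeps a misbehaving clock from producing either duplicates or an endless loop.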
Hi there!
This is such a great tool. After < 24 hours working with this, I can tell that our group is going to get a ton of mileage out of cual-id, so thank you very much!
I'm currently running cual-id on High Sierra (10.13.6) and have only had one issue: the --existing-ids option doesn't seem to work!
I installed cual-id into its own Conda env without issue:
conda create -c https://conda.anaconda.org/johnchase -n cual-id python=3 cual-id
source activate cual-id
I get the following output from cual-id --help:
(cual-id) myID@machine:~$ cual-id --help
Usage: cual-id [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
create Command to create barcode labels or sample...
fix Compare a set of possibly invalid IDs against...
And generating IDs with --length/-l and --fail-threshold/-f works without issue:
(cual-id) myID@machine:~$ time cual-id create ids -l 10 -f 0.99 10
5cfc82d2-041a-41b3-898d-fbec9b47bff7 ec9b47bff7
31abdf17-b0bd-4e5f-b0b0-e254bf8be4a7 54bf8be4a7
1f5483ba-492c-4baa-a489-e5c891db6149 c891db6149
4d07142a-669b-4513-a125-299377ac9be8 9377ac9be8
c4705cd4-623f-44eb-9195-edcff7367e41 cff7367e41
61f01f14-7403-42a7-8355-2ca629729804 a629729804
2fc45e0b-3726-491a-a969-13cf60d1b53d cf60d1b53d
d8218270-0e52-4138-9106-5727766baef3 27766baef3
104cb1dc-24b6-4665-9624-612fb13c26e9 2fb13c26e9
ab02825b-88ed-4b6b-8a99-ac33ae581fbb 33ae581fbb
real 0m0.245s
user 0m0.200s
sys 0m0.041s
(cual-id) myID@machine:~$ time cual-id create ids --length 10 --fail-threshold 0.99 10
22281f49-5be0-4465-a38f-0e0650e47f4b 0650e47f4b
b03e9988-ea9b-42fa-ac6c-eadd0253a688 dd0253a688
6b191d3b-25f5-4d73-809d-4e504bf9f15d 504bf9f15d
6d4d3097-c671-4b1d-818a-75e01d3987d0 e01d3987d0
33cd7df0-9c3b-46a0-98f1-dde555baa009 e555baa009
93fa15f8-ddcc-4611-aaad-0de4ca694c4a e4ca694c4a
08b6c9c2-9375-4ef4-8ab5-182e90cea65d 2e90cea65d
d32d0d73-fcf0-4049-a8da-fa950016ba38 950016ba38
f6fcec23-38a4-4e9a-a256-ac682bfcc036 682bfcc036
46eaefc4-2cc1-4d48-aeb2-cc26a3c195fb 26a3c195fb
real 0m0.246s
user 0m0.201s
sys 0m0.040s
But, when I write to file and try to use a pre-existing file to create IDs, I get the following error:
(cual-id) myID@machine:~$ time cual-id create ids --length 10 --fail-threshold 0.99 10 > test-ids.txt
real 0m0.247s
user 0m0.201s
sys 0m0.042s
(cual-id) myID@machine:~$ head test-ids.txt
dccc9d88-7e36-477a-ac4a-610ada8bc9fe 0ada8bc9fe
c296e7d2-fb98-4e31-a4d8-83e1e2999836 e1e2999836
c825451d-4317-4099-8ae0-7a2e4b8a7ee3 2e4b8a7ee3
ea7e6d72-c82a-4840-ad88-9a3c2c7f5435 3c2c7f5435
4df91764-4d5a-4301-8472-8c78cb3f0853 78cb3f0853
d92744b0-342b-4cdd-a084-1a462c6f39a6 462c6f39a6
555202bf-434d-4d2c-b01e-73aa52ee002d aa52ee002d
75eaa961-9b62-4628-9071-5b02be0260cb 02be0260cb
b3a6ffed-895e-4011-aa58-c2f30ae87cd5 f30ae87cd5
75316bf9-81c7-4ca4-8517-4aefa2e9f8f3 efa2e9f8f3
(cual-id) myID@machine:~$ time cual-id create ids --length 10 --fail-threshold 0.99 --existing-ids test-ids.txt 10
Error: no such option: --existing-ids
real 0m0.242s
user 0m0.198s
sys 0m0.040s
I can see that --existing-ids should be an option, as it's clearly in cual-id/cual-id at master at line 36:
@click.option('-e', '--existing-ids', type=click.File('U'), default=None, required=False)
Am I missing something? Thanks a ton!
Make better names.
This would be a bit weird, but dashes aren't allowed in QIIME 1 sample IDs, so being able to replace them with dots (or remove them) would be really helpful.
The documentation on this website shows that you only need to submit one parameter to create IDs (the number of IDs you want), but when you run it you also need to give it a second parameter, id_length.
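QIIME 1 sample IDs may contain only letters, digits and periods, so a small post-processing step can cover dashes and any other disallowed character (such as the ":" separator). A minimal sketch; `qiime1_safe` is a hypothetical helper, not a cual-id option.

```python
import re

# Anything other than a letter, digit or period is rejected by QIIME 1
_NOT_QIIME1 = re.compile(r"[^A-Za-z0-9.]")


def qiime1_safe(id_, replacement="."):
    """Replace each disallowed character (e.g. "-" or ":") with replacement;
    pass replacement="" to remove them instead."""
    return _NOT_QIIME1.sub(replacement, id_)
```

Because the dashes in a UUID sit at fixed positions, replacing or removing them cannot make two distinct UUIDs collide.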
Mainly the click help text.
Hello,
It appears that when the create-ids command is run, it writes the UUID column first and then the ID column. When this file is passed using --existing-ids, the UUID column is read as the existing IDs. This runs with no error (not that one would be expected) and may lead the user to believe it has been done correctly, which can result in duplicate IDs. If we swap the order of these two columns, the generated ids.txt file could be passed directly as the existing-ids file, with no reworking of ids.txt needed (simple as that reworking may be).
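Alongside swapping the columns, the reading side could take the last whitespace-separated field of each line, which is the short ID in the current "UUID ID" output and also the only field in a one-ID-per-line file. A minimal sketch; `read_existing_ids` is hypothetical and not how cual-id currently parses the file.

```python
def read_existing_ids(lines):
    """Collect the last whitespace-separated field of each non-blank line.

    Accepts any iterable of lines, e.g. an open file handle.
    """
    ids = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.append(fields[-1])
    return ids
```

With an open file this would be `read_existing_ids(open("ids.txt"))`, so a file written by create-ids could be fed straight back in without editing.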
Thanks,
Arron and William
Should be updated to reference the mSystems paper (not the pre-print).