
Comments (15)

cduvallet commented on July 30, 2024

Update, just heard back from the DrugBank people and they said that including the info in the header of the file is fine. Full speed ahead!

@forzavitale can you update us on your progress from the hackathon (if you ended up working on this)?

from drug-spending.

acutrell commented on July 30, 2024

I don't have the coding ability to do this, but I am knowledgeable about the domain as an informatics pharmacist and willing to offer some help from that aspect. Pretty sure the answer to this problem is the Structured Product Labeling (SPL). It is a document markup standard approved by Health Level Seven (HL7) and adopted by FDA as a mechanism for exchanging product and facility information.

Different datasets use different drug identifiers (brand name, generic name, NDA, NDC, etc.), which makes it hard to find the same drug across datasets. OpenFDA harmonizes these drug identifiers, and fields for various pharmacological uses are part of the dataset. Take a look: https://open.fda.gov/drug/label/reference/
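For anyone who wants to poke at this, here is a rough sketch of how the harmonized fields could be looked up. It assumes the public openFDA drug label endpoint and the `openfda` section of a label record; the helper names `build_label_query` and `harmonized_ids` are made up for illustration, and the choice of fields is just a guess at what would be useful here. It only builds the request URL and reads a record that has already been downloaded, so no network call is made.

```python
from urllib.parse import urlencode

# Public openFDA drug label endpoint (see the reference page linked above).
OPENFDA_LABEL_ENDPOINT = "https://api.fda.gov/drug/label.json"


def build_label_query(brand_name: str, limit: int = 1) -> str:
    """Return a URL that searches drug labels by harmonized brand name.

    Hypothetical helper: it only builds the URL, it does not send a request.
    """
    # Quote the name so a multi-word brand name is searched as one phrase.
    search = f'openfda.brand_name:"{brand_name}"'
    return f"{OPENFDA_LABEL_ENDPOINT}?{urlencode({'search': search, 'limit': limit})}"


def harmonized_ids(label_record: dict) -> dict:
    """Pull a few identifier fields out of one label record's `openfda` section.

    Fields that are missing from the record come back as empty lists.
    """
    openfda = label_record.get("openfda", {})
    fields = ("brand_name", "generic_name", "product_ndc", "pharm_class_epc")
    return {field: openfda.get(field, []) for field in fields}
```

Fetching the URL with any HTTP client and passing each entry of the response's `results` list to `harmonized_ids` would give one small dictionary per label, which could then be joined against the other datasets.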


cduvallet commented on July 30, 2024

Thanks for making this issue @jenniferthompson!

  • re: 1. I'll email them today!
  • re: 2. It would be great to have help from someone who's been more involved in the data analysis projects happening here to work with me to figure out what the most interesting/useful part of the data will be. This can probably wait until after we've started poking around and seeing what's available in DrugBank.
  • re: 3. Happy to take the lead on this, looks like it's easy to download and will probably be relatively straightforward to parse.


cduvallet commented on July 30, 2024

They got back to me pretty quickly, and had some questions about data.world that I'm not sure I know the answer to:

Looks like an interesting project, thanks for reaching out!

I checked out your site and noticed a couple of issues:

  1. Data.world looks like a commercial project that requires people have accounts to download data. It doesn't look like they have a good way to post the licenses for datasets? Maybe I am not understanding what data.world is.

  2. I don't see a clear indication of the license for the datasets available through your website, or clear citations to the datasets there?

Your use case looks like a non-commercial use case, so that should be fine but, when our data is shared it has to be shared both with a citation and the license we share our data under.

We also have 2 datasets that are public domain and you can do whatever you want with them, on this page: https://www.drugbank.ca/releases/latest#open-data

They include DrugBank identifiers, names, and synonyms to permit easy linking and integration into any type of project.

Is there any way we can include their license and citation on data.world? I'm pretty sure it will be more characters than are allowed in the "description" on data.world, and I'm not sure where else dataset metadata can be put on data.world (which is pretty surprising...)

Alternatively, should we just stick with the public domain data?


mattgawarecki commented on July 30, 2024


forzavitale commented on July 30, 2024

hi all-- first time jumping in here! at the NYC hackathon rn, seems like this issue is pretty recent and would like to start munging something.... guidance?


cduvallet commented on July 30, 2024

@forzavitale I pinged the DrugBank people again to ask if we could just include the license and citation info in the header of the file, since we can't assign it to the file directly via data.world. They haven't gotten back to me about that though. That said, in my opinion it should be fine so you can probably start working on the data. Let's just make sure to check back in with them before we post the data to data.world.

Alternatively, you can poke around the public domain data and see if that's enough to get us what we want!
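In case it helps whoever picks this up, the "license and citation in the header of the file" idea could look something like the sketch below. This is only one possible layout, assuming a CSV file with `#`-prefixed comment lines on top; the function names and the placeholder license and citation strings are invented for illustration, and the real text has to come from DrugBank.

```python
import csv
import io


def write_with_header(rows, fieldnames, license_text, citation, out):
    """Write `#`-prefixed license and citation lines, then the CSV data."""
    for line in (license_text + "\n" + citation).splitlines():
        out.write(f"# {line}\n")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def read_skipping_header(src):
    """Read the CSV back, ignoring every line that starts with `#`."""
    data_lines = (line for line in src if not line.startswith("#"))
    return list(csv.DictReader(data_lines))


# Placeholder values only; the real license and citation are not reproduced here.
buffer = io.StringIO()
write_with_header(
    rows=[{"drugbank_id": "EXAMPLE-ID-1", "name": "example drug"}],
    fieldnames=["drugbank_id", "name"],
    license_text="LICENSE TEXT GOES HERE",
    citation="CITATION GOES HERE",
    out=buffer,
)
buffer.seek(0)
records = read_skipping_header(buffer)
```

One caveat with this layout: any data row whose first field starts with `#` would also be skipped on read, so it only works if no identifier begins with that character. pandas users could read the same file with `read_csv(..., comment="#")`.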


jenniferthompson commented on July 30, 2024

I think that's a good plan @cduvallet - and at the speed the data.world folks move (read: blazing fast), it's entirely plausible that we might be able to assign a file-specific license by the time we're ready to post it.


jenniferthompson commented on July 30, 2024

Fantastic! Thanks so much for following up, @cduvallet! 🎉


darwinyfu commented on July 30, 2024

Is this still a project that needs help? I see the label but comments are fairly old.

Been lurking on D4D for a while but interested in working on something.


darya-akimova commented on July 30, 2024

Hello! The project has been dormant for a while (hence the old comments), and I'm one of the people trying to get it going again. Any issue with the label status-under-review can be ignored for now; it either can't be tackled yet or may need to be trimmed/reformatted. This is one of the older issues that I thought would be good to try to get through, because the drugbank.ca materials seem very useful for our current goal of matching drugs to therapeutic uses.


darya-akimova commented on July 30, 2024

In PR #83 @proof-by-accident investigated how many of the Medicare drugs can be found in the drugbank.ca data. The results seem similar to matching attempts against other sources: a good number of drugs can be matched easily on the first pass, but about twice as many were not matched, and matching those properly will probably require a non-trivial amount of research.
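To make "first pass" concrete, here is a minimal sketch of exact matching on normalized names. It is not the code from the PR, just an illustration under the assumption that DrugBank names and synonyms are available as a mapping from identifier to a list of names; `normalize` and `first_pass_match` are hypothetical helpers.

```python
import re


def normalize(name: str) -> str:
    """Lowercase a name and collapse punctuation and whitespace to single spaces."""
    name = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(name.split())


def first_pass_match(medicare_names, drugbank_names):
    """Match names exactly after normalization.

    `drugbank_names` maps an identifier to its list of names and synonyms.
    Returns (matched, unmatched): a dict of name -> identifier, and a list
    of the names that still need manual research.
    """
    index = {}
    for db_id, names in drugbank_names.items():
        for candidate in names:
            # Keep the first identifier seen for a given normalized name.
            index.setdefault(normalize(candidate), db_id)

    matched, unmatched = {}, []
    for name in medicare_names:
        db_id = index.get(normalize(name))
        if db_id is None:
            unmatched.append(name)
        else:
            matched[name] = db_id
    return matched, unmatched
```

Anything that lands in `unmatched` (combination products, names with strengths or dosage forms appended, and so on) would need the slower follow-up work described above.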


darya-akimova commented on July 30, 2024

Oh this seems great! The OpenFDA might be just what we need because you're right, we have been running into the issue where not everything is in one dataset and the names can be inconsistent between datasets. Thanks for this suggestion.


veena-v-g commented on July 30, 2024

Can I help?


TBusen commented on July 30, 2024

Is this still active? Can I start this or is this throw away work?

