Personal webpage: https://thevasudevgupta.github.io/
thevasudevgupta / biobigbird
BigBird for bio-medical domain
Home Page: https://huggingface.co/bisectgroup
from transformers import FlaxAutoModelForMaskedLM, AutoTokenizer
import jax.numpy as jnp
from flax.serialization import to_bytes

model_id = 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract'
model = FlaxAutoModelForMaskedLM.from_pretrained(model_id, from_pt=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# tile the position embeddings 8 times along the position axis
params = model.params
pe = params['bert']['embeddings']['position_embeddings']['embedding']
npe = jnp.concatenate([pe] * 8)
params['bert']['embeddings']['position_embeddings']['embedding'] = npe

# serialize the updated parameters; the file must be opened in binary write mode
with open('flax_model.msgpack', "wb") as f:
    f.write(to_bytes(params))
# update `max_position_embeddings` in `config.json` to 4096
# push everything to HF Hub
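The `config.json` step above can also be scripted. The following is a minimal sketch using only the standard library; the helper name `set_max_position_embeddings` and the throwaway demo file are assumptions for illustration, not part of this repository.

```python
import json
import os
import tempfile


def set_max_position_embeddings(config_path, new_length=4096):
    """Rewrite `max_position_embeddings` in a config.json file (hypothetical helper)."""
    with open(config_path) as f:
        config = json.load(f)
    config["max_position_embeddings"] = new_length
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config


# Demo on a throwaway file; a real config.json has many more keys,
# all of which are left untouched by the helper.
demo_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(demo_path, "w") as f:
    json.dump({"model_type": "bert", "max_position_embeddings": 512}, f)

updated = set_max_position_embeddings(demo_path)
```

In practice the path would point at the `config.json` saved next to `flax_model.msgpack`, so that the configured length matches the tiled position embeddings.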
https://www.ncbi.nlm.nih.gov/pmc/tools/ftp/ > https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_bulk/
from datasets import load_dataset
data = load_dataset("pubmed", cache_dir="/outputs/pubmed_data")
print(data)
from transformers import BigBirdForMaskedLM, AutoTokenizer
model = BigBirdForMaskedLM.from_pretrained("ddp-iitm/bigbird_base_v4_pubmed_raw_text_v3", revision="9e687730b2a66985a2c8035c2de48907303c9eac", from_flax=True)
tokenizer = AutoTokenizer.from_pretrained("ddp-iitm/bigbird_base_v4_pubmed_raw_text_v3", revision="9e687730b2a66985a2c8035c2de48907303c9eac")
example = """In mammals, the chromoprotein makes up about 96% of the red blood cells' dry content (by weight), and around 35% of the total content (including water).[5] Hemoglobin has an oxygen-binding capacity of 1.34 mL O2 per gram,[6] which increases the total blood oxygen capacity seventy-fold compared to dissolved oxygen in blood. The mammalian hemoglobin molecule can bind (carry) up to four oxygen molecules.[7] Hemoglobin is involved in the transport of other gases: It carries some of the body's respiratory carbon dioxide (about 20–25% of the total)[8] as carbaminohemoglobin, in which CO2 is bound to the heme protein. The molecule also carries the important regulatory molecule nitric oxide bound to a thiol group in the globin protein, releasing it at the same time as oxygen.[9] Hemoglobin is also found outside red blood cells and their progenitor lines. Other cells that contain hemoglobin include the A9 dopaminergic neurons in the substantia nigra, macrophages, alveolar cells, lungs, retinal pigment epithelium, hepatocytes, mesangial cells in the kidney, endometrial cells, cervical cells and vaginal epithelial cells.[10] In these tissues, hemoglobin has a non-oxygen-carrying function as an antioxidant and a regulator of iron metabolism.[11] Excessive glucose in one's blood can attach to hemoglobin and raise the level of hemoglobin A1c.[12] Hemoglobin and hemoglobin-like molecules are also found in many invertebrates, fungi, and plants.[13] In these organisms, hemoglobins may carry oxygen, or they may act to transport and regulate other small molecules and ions such as carbon dioxide, nitric oxide, hydrogen sulfide and sulfide. A variant of the molecule, called leghemoglobin, is used to scavenge oxygen away from anaerobic systems, such as the nitrogen-fixing nodules of leguminous plants, lest the oxygen poison (deactivate) the system. Hemoglobinemia is a medical condition in which there is an excess of hemoglobin in the blood plasma. 
This is an effect of [MASK] hemolysis, in which hemoglobin separates from red blood cells, a form of anemia. There is more than one hemoglobin gene: in humans, hemoglobin A (the main form of hemoglobin present in adults) is coded for by the genes, HBA1, HBA2, and HBB.[28] The hemoglobin subunit alpha 1 and alpha 2 are coded by the genes HBA1 and HBA2, respectively, which are both on chromosome 16 and are close to each other. The hemoglobin subunit beta is coded by HBB gene which is on chromosome 11 . The amino acid sequences of the globin proteins in hemoglobins usually differ between species. These differences grow with evolutionary distance between species. For example, the most common hemoglobin sequences in humans, bonobos and chimpanzees are completely identical, without even a single amino acid difference in either the alpha or the beta globin protein chains.[29][30][31] Whereas the human and gorilla hemoglobin differ in one amino acid in both alpha and beta chains, these differences grow larger between less closely related species."""
model(**tokenizer(example, return_tensors="pt", max_length=768, padding="max_length"))
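The call above returns raw scores for every position, so reading off the prediction for `[MASK]` takes one more step. Below is a minimal sketch of that step on plain Python lists; the helper name `top_k_at_mask` and the toy inputs are assumptions for illustration, not part of this repository.

```python
def top_k_at_mask(input_ids, logits, mask_token_id, k=5):
    """Return the k highest-scoring vocabulary ids at the first masked position.

    input_ids: list of token ids for one sequence.
    logits: list of per-position score lists, shape [seq_len][vocab_size].
    """
    # list.index raises ValueError if the sequence contains no mask token
    mask_position = input_ids.index(mask_token_id)
    scores = logits[mask_position]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy example: a vocabulary of 4 ids, where id 3 plays the role of [MASK].
toy_ids = [0, 3, 1]
toy_logits = [
    [0.1, 0.2, 0.3, 0.4],
    [2.0, 0.5, 3.0, -1.0],  # scores at the masked position
    [0.0, 0.0, 0.0, 0.0],
]
best = top_k_at_mask(toy_ids, toy_logits, mask_token_id=3, k=2)
```

With the real model, one would presumably keep the tokenizer output in a variable (say `inputs`), pass `inputs["input_ids"][0].tolist()`, `outputs.logits[0].tolist()`, and `tokenizer.mask_token_id` to the helper, and map the resulting ids back to strings with `tokenizer.convert_ids_to_tokens`.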
https://drive.google.com/drive/folders/1XCwCjY0e1b2cnu_SyvFUfUjMfioyX0rJ
https://drive.google.com/a/smail.iitm.ac.in/uc?id=0B_JDnoghFeEKLTlJT09IckMwOFk&export=download