biobigbird's Introduction

biobigbird's People

Contributors

thevasudevgupta

Forkers

techthiyanes

biobigbird's Issues

How did we initialize position embeddings when transitioning from BERT to BigBird?

from transformers import FlaxAutoModelForMaskedLM, AutoTokenizer

model_id = 'microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract'
model = FlaxAutoModelForMaskedLM.from_pretrained(model_id, from_pt=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)

import jax.numpy as jnp
from flax.serialization import to_bytes

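params = model.params
# learned position-embedding table of the BERT checkpoint
pe = params['bert']['embeddings']['position_embeddings']['embedding']
# repeat the table 8 times along the position axis (target length: 4096)
npe = jnp.concatenate([pe] * 8)

params['bert']['embeddings']['position_embeddings']['embedding'] = npe

# serialize the updated params; the file must be opened for binary writing
with open('flax_model.msgpack', "wb") as f:
    f.write(to_bytes(params))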
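
The tiling step can be checked in isolation. The following is a minimal sketch that uses NumPy instead of `jax.numpy` and a random stand-in array instead of the real checkpoint, so nothing is downloaded. The helper name `tile_position_embeddings`, the 512-position input, and the hidden size of 8 are assumptions made for illustration only.

```python
import numpy as np


def tile_position_embeddings(pe: np.ndarray, factor: int) -> np.ndarray:
    """Repeat a (num_positions, hidden) table `factor` times along axis 0."""
    return np.concatenate([pe] * factor, axis=0)


# Stand-in for a 512-position table; the small hidden size keeps the check cheap.
rng = np.random.default_rng(0)
pe = rng.normal(size=(512, 8)).astype(np.float32)

npe = tile_position_embeddings(pe, 8)

# The tiled table covers 8 * 512 = 4096 positions...
assert npe.shape == (4096, 8)
# ...and every 512-row block is an exact copy of the original table.
assert np.array_equal(npe[512:1024], pe)
```

With this initialization, position `i` in the longer model starts from the same vector as position `i % 512` in the original one.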


# update `max_position_embeddings` in `config.json` to 4096
# push everything to HF Hub
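
The `config.json` edit mentioned above could also be scripted rather than done by hand. This is a sketch under stated assumptions: the helper `set_max_positions` is hypothetical, and the demo writes a tiny stand-in config to a temporary directory instead of touching a real checkpoint folder.

```python
import json
import tempfile
from pathlib import Path


def set_max_positions(config_path, max_positions=4096):
    """Rewrite `max_position_embeddings` in a config.json file (hypothetical helper)."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["max_position_embeddings"] = max_positions
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a stand-in config; a real config.json has many more keys.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(json.dumps({"model_type": "bert", "max_position_embeddings": 512}))

    updated = set_max_positions(cfg_path)
    reloaded = json.loads(cfg_path.read_text())
```

All other keys are written back unchanged, so only the position limit differs from the original file.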

Conversion Flax -> PyTorch

from transformers import BigBirdForMaskedLM, AutoTokenizer

model_id = "ddp-iitm/bigbird_base_v4_pubmed_raw_text_v3"
revision = "9e687730b2a66985a2c8035c2de48907303c9eac"

# load the Flax checkpoint into the PyTorch BigBird class
model = BigBirdForMaskedLM.from_pretrained(model_id, revision=revision, from_flax=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

example = """In mammals, the chromoprotein makes up about 96% of the red blood cells' dry content (by weight), and around 35% of the total content (including water).[5] Hemoglobin has an oxygen-binding capacity of 1.34 mL O2 per gram,[6] which increases the total blood oxygen capacity seventy-fold compared to dissolved oxygen in blood. The mammalian hemoglobin molecule can bind (carry) up to four oxygen molecules.[7] Hemoglobin is involved in the transport of other gases: It carries some of the body's respiratory carbon dioxide (about 20–25% of the total)[8] as carbaminohemoglobin, in which CO2 is bound to the heme protein. The molecule also carries the important regulatory molecule nitric oxide bound to a thiol group in the globin protein, releasing it at the same time as oxygen.[9] Hemoglobin is also found outside red blood cells and their progenitor lines. Other cells that contain hemoglobin include the A9 dopaminergic neurons in the substantia nigra, macrophages, alveolar cells, lungs, retinal pigment epithelium, hepatocytes, mesangial cells in the kidney, endometrial cells, cervical cells and vaginal epithelial cells.[10] In these tissues, hemoglobin has a non-oxygen-carrying function as an antioxidant and a regulator of iron metabolism.[11] Excessive glucose in one's blood can attach to hemoglobin and raise the level of hemoglobin A1c.[12] Hemoglobin and hemoglobin-like molecules are also found in many invertebrates, fungi, and plants.[13] In these organisms, hemoglobins may carry oxygen, or they may act to transport and regulate other small molecules and ions such as carbon dioxide, nitric oxide, hydrogen sulfide and sulfide. A variant of the molecule, called leghemoglobin, is used to scavenge oxygen away from anaerobic systems, such as the nitrogen-fixing nodules of leguminous plants, lest the oxygen poison (deactivate) the system. Hemoglobinemia is a medical condition in which there is an excess of hemoglobin in the blood plasma. 
This is an effect of [MASK] hemolysis, in which hemoglobin separates from red blood cells, a form of anemia. There is more than one hemoglobin gene: in humans, hemoglobin A (the main form of hemoglobin present in adults) is coded for by the genes, HBA1, HBA2, and HBB.[28] The hemoglobin subunit alpha 1 and alpha 2 are coded by the genes HBA1 and HBA2, respectively, which are both on chromosome 16 and are close to each other. The hemoglobin subunit beta is coded by HBB gene which is on chromosome 11 . The amino acid sequences of the globin proteins in hemoglobins usually differ between species. These differences grow with evolutionary distance between species. For example, the most common hemoglobin sequences in humans, bonobos and chimpanzees are completely identical, without even a single amino acid difference in either the alpha or the beta globin protein chains.[29][30][31] Whereas the human and gorilla hemoglobin differ in one amino acid in both alpha and beta chains, these differences grow larger between less closely related species."""

# pad the input to a fixed length of 768 tokens and run a forward pass
inputs = tokenizer(example, return_tensors="pt", max_length=768, padding="max_length")
outputs = model(**inputs)
