#biology

waynerad@diasp.org

"BioModels is a repository of mathematical models of biological and biomedical systems. It hosts a vast selection of existing literature-based physiologically and pharmaceutically relevant mechanistic models in standard formats. Our mission is to provide the systems modelling community with reproducible, high-quality, freely-accessible models published in the scientific literature."

Another fascinating thing I just discovered existed. These are mathematical models, not neural network models, which is how we are used to hearing the word "model" used (at least if you follow my stuff). They are things like population models, metabolic networks, Petri nets, ordinary differential equation models, delay differential equation models, stochastic differential equation models, partial differential equation models, differential-algebraic equation models, Boolean models, cellular Potts models, agent-based models, pharmacodynamics models, constraint-based models, steady-state models, rule-based models, protein-protein interaction networks, Markov chains, finite element spatial models, and physiologically based pharmacokinetic models.

All the models are supposed to be in a data format called Systems Biology Markup Language (SBML). In addition to SBML, some are in other formats like MATLAB/Octave, COMBINE archive (whatever that is), MorpheusML (whatever that is), CompuCell3DML (whatever that is), as well as programming languages Python, R, and Mathematica, and you can filter on these when you do searches. They also promise a human readable summary of each model in PDF format.

You can also filter by organism, such as Homo sapiens, Saccharomyces cerevisiae, Mus musculus, Escherichia coli, Drosophila melanogaster (fruit fly), and you can filter on larger categories like Mammalia, Vertebrata, Eukaryota, Chordata, etc.

You can also filter by disease, such as Alzheimer's, Covid-19, Diabetes Mellitus, Cancer, Parkinson's, Osteoarthritis, etc.

They have a filter for "GO", which means Gene Ontology. The Gene Ontology project aims to describe what genes and their products do using the same vocabulary across all species, so genes with the same function get the same or similar labels. Here the "GO" filter has such categories as cellular metabolic processes, cytoplasm, nucleus, translation (going from messenger RNA to proteins), extracellular regions, cytosol, protein catabolic process, cell death, endoplasmic reticulum, mitochondrion, lysosomes, Golgi apparatus, peroxisome, cell growth, blood coagulation, protein phosphorylation, ATP hydrolysis, and so on (very long list actually -- hundreds of items).

You can filter on "UniProt", which is short for "Universal Protein Resource" and is a database of proteins. Proteins have names like Pyruvate kinase PKLR, Cytoplasmic aconitate hydratase, Glucose-6-phosphate isomerase, Hydroxymethylglutaryl-CoA synthase, and Fructose-bisphosphate aldolase A. Well, those all have -ase names, indicating they are enzymes, but not all proteins are enzymes, of course, so the database has names like ATP-binding cassette sub-family A member 1 as well.

You can filter on "ChEBI", which stands for "Chemical Entities of Biological Interest". Here you can search for things like ADP, ATP, glycerol, phosphoenolpyruvate, acetaldehyde, aldehydo-N-acetyl-D-glucosamine, hydrogen peroxide, glycine, ethanol, NADH, etc.

The last thing they have that you can filter on is Ensembl. I actually told you all about Ensembl before, when 2.5 years ago I told you about the "mitochondria calcium channel mystery". Our mitochondria calcium channel is made of 3 proteins, but fungi have only 2 of them. With some clever research it was determined that the ancestor of both us and fungi had all 3 proteins, but fungi lost one of them. The researchers used 1,156 eukaryotic genomes to do this, but they didn't sequence those 1,156 eukaryotic genomes themselves -- they just downloaded them from the Ensembl Project website. I thought it was amazing that something so far in the distant evolutionary past was possible to determine, and all just by downloading data from this already existing database. The Ensembl Project was created in tandem with the Human Genome Project in the 90s. Lookups into this database go by actual gene names, which are those cryptic all-caps-and-numbers designations, such as NFKB2, NANOGP1, POU5F1, QSOX2, CASP8, etc.

BioModels

#solidstatelife #biology #mathematics

digit@iviv.hu
tekaevl@diasp.org

wow

♲ Wayne Radinsky - 2023-03-10 04:19:24 GMT

"Take the DNA Delorean: the promise of large language models in genomics."

I can't improve on just quoting from the article so I'm going to quote a few sentences for each of the major developments highlighted, which will still take a bunch of space. Click through to the full article for details and links to the specific technologies mentioned.

"Genomic instrument companies such as Oxford Nanopore Technologies, PacBio, Singular, and Ultima have publicly announced using graphics processing units inside their sequencing platforms for AI-based base calling. These models span CNN, RNN, and transformer-based AI models, including DeepConsensus in PacBio's instruments which uses gap-aware sequence transformers to correct errors and enable read accuracy."

"AI has helped accelerate variant calling, variant filtering, and base calling in genomic instruments and analysis, but what about in other areas that include predictions? Large language models (LLMs) are AI models built on transformer architecture, and their application to DNA, RNA, and proteins is a burgeoning field in genomics."

"Compared to the vocabulary of 20 amino acids and an average sequence length of 350 amino acids for proteins, genomic LLMs operate on a vocabulary of four nucleotides and very long sequences -- the haploid human genome is three billion nucleotide pairs."

"At this year's SuperComputing conference, we shared the Gordon Bell special award with more than two dozen academic and commercial researchers from Argonne National Laboratory, the University of Chicago, and others. The honored work was a genomic LLM that tracks the genetic mutations and predicts variants of concern in SARS-CoV-2, the virus behind COVID-19. With anywhere from 2.5 to 25 billion trainable parameters, the Genome-Scale language models (GenSLMs) represent some of the first and largest whole genome LLMs trained on over 100 million nucleotide sequences."

"In September of this year, Nature featured a deep generative model focusing on regulatory DNA and predictions of lowest and highest levels of expression in yeast."

"Enformer -- released in 2021 -- is a deep learning model with a transformer architecture for genomic enhancers that predicts gene expression from DNA sequences and can integrate information from long-range interactions in the genome. This model helps scientists understand how noncoding DNA makes decisions about gene expression in different cell types, such as in skin, liver, and heart cells, among others."

"scBERT -- released in September 2022 -- is another groundbreaking genomic LLM that understands gene-gene interactions and is trained on large corpora of unlabeled scRNA-Seq data."

"DNABERT -- released in 2021 -- is another genomic LLM that understands nucleotide sequences and can make downstream predictions of promoters, splice sites, and transcription factor binding sites."

Take the DNA Delorean: the promise of large language models in genomics
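To make the "vocabulary of four nucleotides" point concrete, here is a minimal Python sketch of overlapping k-mer tokenization, which, as I understand it, is the general idea models like DNABERT use to turn DNA into tokens. The function names `kmer_tokens` and `kmer_vocab_size` are made up for illustration and are not taken from any of the models quoted above.

```python
# Toy illustration only: hypothetical helper names, not code from any real genomic LLM.

def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (a sliding window with step 1)."""
    sequence = sequence.upper()
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # A sequence of length n yields n - k + 1 tokens (none if it is shorter than k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_vocab_size(alphabet_size: int, k: int) -> int:
    """Number of distinct k-mers over an alphabet of the given size."""
    return alphabet_size ** k


if __name__ == "__main__":
    print(kmer_tokens("ATGCGT", k=3))
    print(kmer_vocab_size(4, 6))
```

With only four letters, single nucleotides carry very little information per token, so grouping them into k-mers trades a bigger vocabulary (4 to the power k) for more meaningful tokens. The sequences are still enormous compared with the roughly 350 amino acids of an average protein.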

#solidstatelife #ai #nlp #llms #biology #genomics #proteomics

devevo@diasp.org

#biology #humans #chimpanzees #cognitivetest
The Cognitive Tests in Which Humans Lose to Chimpanzees
Though humans share 99% of our DNA with chimpanzees, we regularly shrug off the biological similarity with a haughty air of superiority, confident that our cognitive abilities — endowed by a brain three times larger, with 14 billion more neurons — firmly trounce theirs.
We shouldn’t be so sure.
True, chimpanzees have yet to master flight, manufacture semiconductors, or cure a disease, but there are a number of basic cognitive tasks where, in a battle between human and ape, they come out on top…
https://www.realclearscience.com/blog/2023/03/04/the_cognitive_tests_in_which_humans_lose_to_chimpanzees_885292.html

waynerad@diasp.org

"New blood types are often discovered following medical disasters." Wait, what? I thought there were just 4 blood types: A, B, AB, and O. Well, with the + or -, so multiply by 2 to get 8. Oh me of such little knowledge.

First a review of what the A, B, AB, and O and the +/- represent. It was discovered a little over 100 years ago that when combining blood from different people, sometimes the red blood cells would clump together, and sometimes they wouldn't. The reactions sorted people into 4 groups. Eventually it was figured out that there were two antigens on the red blood cells, which were designated A and B. In immunology, the immune system uses special molecules called antibodies, generally large, Y-shaped proteins, to identify pathogenic agents. A molecule that an antibody binds to is called an antigen, and the antibodies that bind the A and B antigens are called anti-A and anti-B. This is where the A, B, AB, and O designations come from. O simply means neither the A nor the B antigen is present.

The +/- indicates presence or absence of the RhD antigen. "Rh" originally stood for "Rhesus factor", but it was subsequently discovered that the antigen being studied was not the same between humans and rhesus monkeys, so the name doesn't make sense but has stuck anyway. The "Rh" system originally had 5 antigens, one of which was designated "D". This article refers to "the Rh antigen", but the "Rh" system has actually been expanded to 49 antigens.
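If you think of each of the 8 familiar blood types as a set of antigens on the red blood cells (A, B, and RhD), the textbook rule for red cell transfusion is that the donor's cells shouldn't carry any antigen the recipient lacks. Here is a minimal Python sketch of that rule. It is a toy model covering only ABO and RhD, it ignores all the other blood group systems discussed here, and the names `antigens` and `can_receive` are just made up for illustration.

```python
# Toy model, not medical advice: only the A, B, and RhD antigens are considered.

ABO = {"O": set(), "A": {"A"}, "B": {"B"}, "AB": {"A", "B"}}

# The 8 familiar types: O+, O-, A+, A-, B+, B-, AB+, AB-
ALL_TYPES = [abo + rh for abo in ABO for rh in "+-"]


def antigens(blood_type: str) -> frozenset:
    """Parse e.g. 'AB+' into the antigen set {'A', 'B', 'D'}."""
    abo, rh = blood_type[:-1], blood_type[-1:]
    if abo not in ABO or rh not in ("+", "-"):
        raise ValueError(f"not a recognized ABO/RhD type: {blood_type!r}")
    found = set(ABO[abo])
    if rh == "+":
        found.add("D")  # '+' means the RhD antigen is present
    return frozenset(found)


def can_receive(recipient: str, donor: str) -> bool:
    """True if the donor's red cells carry no antigen that the recipient lacks."""
    return antigens(donor) <= antigens(recipient)


if __name__ == "__main__":
    for recipient in ALL_TYPES:
        donors = [d for d in ALL_TYPES if can_receive(recipient, d)]
        print(f"{recipient:>3} can receive red cells from: {', '.join(donors)}")
```

In this simplified picture O- comes out as the "universal donor" (no antigens at all) and AB+ as the "universal recipient" (all three antigens), which matches the usual textbook summary.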

"Scientists discovered the Vel and Langereis blood group systems after patients suffered hemolytic reactions following transfusions."

"In 1953, a child in Venezuela died of hemolytic disease three days after birth." A blood lab identified the antigen but knew of no other person with that antigen, so they named it after the family, "Diego". It was subsequently found that the reason they had no other people with that antigen in the database was because they didn't have native Americans -- 36% of the indigenous people of South America have the antigen.

"There are only ten clinically relevant blood typing systems, and if you were to expand your blood type to include them, it might look like ABO(A+B+), Rh(D+c+e+), MNS(M+N-S+s-), P1+, Lu(a+b+), Kell(K-k+), Le(a+b-), Fy(a+b-), Jk(a+b-). You don't need to remember all this because physicians will test them before you get a transfusion."

"The second reason you shouldn't be concerned about a surprising reaction is that physicians no longer rely entirely on blood types to determine if blood is compatible. Instead, they use a technique called crossmatching, which involves mixing donors' serum with recipients' blood cells in a test tube. If the two are incompatible, the blood will clump."

New blood types are often discovered following medical disasters

#discoveries #biology #immunology

ghostmonkey@sysad.org

Hey everyone, I'm #newhere! I'm studying #holistic #environmental #sciences.

My interests are #earth #environment #climate #cleanliness #art #music #reading...I actually have a lot of interests, and these are just limited to what I usually do.

I'm also interested in #psychology #psychological-memes #cooking-tutorials #make-up #biology #chemistry #film #tv #spiritual-practices #motivational #self-improvement #life-hacks #helping #charity #peace

Thanks @anyspace@sysad.org my love, for the invite.

kennychaffin@diasp.org

Great article, but TLDR: "So if we focus on pathogens that are beginning to take hold in people, such as the dog coronavirus that infected the 5-month-old in 2017, we're not looking at every animal for every possible pathogen. And we can catch these spillover viruses before they fully adapt and become highly transmissible," he says.

https://www.npr.org/sections/goatsandsoda/2023/02/15/1152892721/how-to-stop-pandemics

#viruses #biology #medicine