North Carolina A&T State University and Elon University

Joint REU in Mathematical Biology

**REU Mentors for Summer 2023 Program**

The following faculty will be serving as project mentors for the 2023 Joint REU
program. The general area of each mentor's project is listed, but specific
project topics have not yet been finalized.

-Dr. Nicholas Luke, North Carolina A&T State University: Epidemiological
Modeling or Pharmacokinetics.

-Dr. Karen Yokley, Elon University: Epidemiological Modeling or Pharmacokinetics

-Dr. Ling Xu, North Carolina A&T State University:

**Projects from Summer 2020 REU Program**

**Predicting Phosphorylation Sites Using Gradient Boosting**:
Phosphorylation is a post-translational modification of proteins in which
specialized enzymes called kinases add a phosphate group to serine, threonine,
or tyrosine, which have the highest abundance of phosphorylation sites in the
phosphoproteome. Due to its flexibility and reversibility, phosphorylation is
involved in numerous cellular processes, such as signal transduction, cell
proliferation, apoptosis, gene expression, cell cycle progression, and
cytoskeletal regulation. The prediction of phosphorylation sites can also be
utilized in cancer therapy to treat B-Cell Lymphoma II. Using Phospho.ELM for
training and testing data, we compared six models: support vector machines with
linear and radial kernels, random forest, and gradient boosting with
interaction depths of 1, 2, and 5. There were 514 protein sequence features
defined for a window size of nine. The protein sequence features implemented
into our gradient boosting model were: Shannon Entropy (SE), Relative Entropy
(RE), Information Gain (IG), Average Cumulative Hydrophobicity (ACH),
Composition, Transition and Distribution (CTD), Sequence Order Coupling Numbers
(SOCN), Quasi-Sequence Order (QSO), Sequence Features (SF), Overlapping
Properties (OP), Pseudo-Amino Acid Composition (PseAAC), and Amphiphilic
Pseudo-Amino Acid Composition (APseAAC). The six predictive models were
assessed based on accuracy, sensitivity, specificity, Matthews correlation
coefficient (MCC), and F1 score. Although the random forest model was found to
have higher accuracy and specificity, gradient boosting with an interaction
depth of 5 was found to have higher sensitivity, MCC, and F1. A balanced
dataset was also considered, and the same measures were evaluated for each
model.
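The five assessment measures named above can all be computed from the four
counts in a binary confusion matrix. The following Python sketch illustrates
the standard definitions; the function name and the example counts are
hypothetical and are not taken from the project's results.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, MCC, and F1 from counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true sites that were predicted as sites.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-sites that were predicted as non-sites.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "f1": f1,
    }

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because MCC and F1 depend on all or most of the four counts, they can rank
models differently from accuracy alone, especially on unbalanced data.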

**Incorporation of Body Composition Variations into a Physiologically-Based
Pharmacokinetic Model of Xylene**:
Physiologically based pharmacokinetic (PBPK) models are ordinary differential
equation models that have been used to estimate the internal dosages of
toxicants in the body. One such toxicant is xylene, a hydrocarbon that comes in
the form of a liquid or vapor and has a wide variety of uses, most commonly in
paint and paint thinner in industrial settings. While any long exposure to
xylene is harmful (inhalation, ingestion, dermal contact, etc.), inhalation of
the substance is one of the most dangerous routes. Inhalation of xylene may
cause dizziness, headache, nausea, and vomiting. The current study uses a
pre-existing PBPK model to investigate how xylene is distributed to different
compartments of the body. While the PBPK model was originally parameterized for
the average male body, we account for the variety of human bodies, examining
how different body compositions may affect the concentration of xylene in the
different compartments. To investigate this, body volume parameters are
altered, and the addition of body height parameters is considered. Fat
percentages are also altered to describe different body types.

**A Quantitative Investigation of Preventative Measures for COVID-19**:
Coronaviruses are RNA viruses that cause respiratory infections. In late 2019
and early 2020, a novel coronavirus, SARS-CoV-2, the cause of COVID-19, was
discovered and caused an outbreak in Wuhan, China. Since then, cases of
COVID-19 have spread worldwide, causing a global pandemic. Therefore, it is
important to understand, predict, and prevent the spread of COVID-19. A
mathematical model is used to divide an arbitrary population into susceptible,
exposed, infected, quarantined, and recovered populations. The model shows the
progression of the virus among these populations and can be used to determine
the total infections caused by COVID-19, the rapidity of the outbreak, and the
reproductive number. This model can be further used to investigate the efficacy
of measures taken to prevent the spread of infection, such as vaccination,
social distancing, and masks. To investigate such measures, the model is
modified. For vaccination, a vaccinated subset of the population is added. For
social distancing and masks, the model adds susceptible, exposed, and infected
populations who are social distancing or wearing masks. Investigation of these
preventative measures also included the addition of new parameters or changes
to existing parameter values within the model. For vaccination, parameters
representing vaccination rate and efficacy of the vaccine were added. For
social distancing and mask usage, the contact rate and transmission rate,
respectively, were adjusted. By examining the effect of these parameters on the
total infections, rapidity of the outbreak, and reproductive number, the
efficacy of each preventative measure can be determined. Then, a suggestion can
be made to help mitigate the real-world destruction of COVID-19.
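As an illustration of the kind of compartment model described above, the
following Python sketch integrates a generic
susceptible-exposed-infected-quarantined-recovered system with a fourth-order
Runge-Kutta step. The specific flows between compartments, the parameter names
(`beta`, `sigma`, `gamma`, `q`, `gamma_q`), and all numerical values are
assumptions chosen for illustration; they are not the project's model or
fitted parameters.

```python
def seiqr_rhs(state, beta, sigma, gamma, q, gamma_q):
    """Right-hand side of an assumed SEIQR structure (no births or deaths)."""
    s, e, i, qn, r = state
    n = s + e + i + qn + r
    new_infections = beta * s * i / n      # quarantined people do not transmit
    return (
        -new_infections,                   # dS/dt
        new_infections - sigma * e,        # dE/dt
        sigma * e - (gamma + q) * i,       # dI/dt
        q * i - gamma_q * qn,              # dQ/dt
        gamma * i + gamma_q * qn,          # dR/dt
    )

def rk4_step(f, state, dt, *params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    k1 = f(state, *params)
    k2 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), *params)
    k3 = f(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), *params)
    k4 = f(tuple(x + dt * k for x, k in zip(state, k3)), *params)
    return tuple(
        x + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(state, days, dt, *params):
    """Return the list of states from t = 0 to t = days."""
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        state = rk4_step(seiqr_rhs, state, dt, *params)
        trajectory.append(state)
    return trajectory

# Hypothetical parameter values (per day) and initial population.
beta, sigma, gamma, q, gamma_q = 0.5, 1 / 5, 1 / 10, 1 / 5, 1 / 10
initial = (999.0, 0.0, 1.0, 0.0, 0.0)
trajectory = simulate(initial, 300, 0.1, beta, sigma, gamma, q, gamma_q)

# For this assumed structure, the basic reproductive number is beta / (gamma + q).
r0 = beta / (gamma + q)
total_infections = initial[0] - trajectory[-1][0]
print(f"R0 = {r0:.2f}, total infections = {total_infections:.0f}")
```

In a sketch like this, social distancing could be explored by lowering `beta`,
and a vaccinated compartment could be added as a sixth state variable.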

**Physiologically Based Pharmacokinetic (PBPK) Modeling of Per- and
Polyfluoroalkyl Substances (PFASs)**:
Per- and polyfluoroalkyl substances (PFASs) are a group of persistent
manufacturing byproducts. Studies by several state environmental agencies have
found PFASs in water sources, raising significant concern for human safety.
This is a result of the chemicals’ ability to stay in the body for long periods
of time and their resistance to chemical and thermal breakdown. In lab animals,
PFASs have been shown to cause tumors as well as have reproductive,
developmental, and immunological effects. PFAS research with regard to humans
is incomplete, yet existing findings are notable. In this project, we use
physiologically based pharmacokinetic (PBPK) modeling to represent the flow of
PFASs through the body. PBPK modeling shows the concentration of a toxicant in
different tissues and organs in the body, called compartments. Compartments
include the fat, brain, lungs, gut, and liver. Equation parameters include the
rate of blood flow to and from each compartment; partition coefficients, which
quantify the difficulty of passing from blood to each compartment; and
metabolism coefficients. The final model is a system of differential equations,
each representing the rate of change of the toxicant within one compartment.
Each compartment uses a modified version of a general equation to model the
toxicant, based initially on ‘flow in’ minus ‘flow out’. We first replicate
existing published data using MATLAB to test the general structure and coding
of PFASs in PBPK modeling. Existing parameter values also come primarily from
published data. We then compare our simulated data to experimental
observations, resulting in a qualitative assessment of leading models.
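The ‘flow in’ minus ‘flow out’ idea can be sketched for a single tissue
compartment. The Python example below assumes a flow-limited compartment that
receives blood at a constant arterial concentration, with an optional
first-order metabolism term. The function name, the parameter names, and all
values are hypothetical; the project itself used MATLAB and published
parameter values.

```python
def simulate_tissue(q_flow, volume, partition, c_arterial,
                    k_met=0.0, hours=80.0, dt=0.01):
    """Forward-Euler solution of one flow-limited compartment.

    Assumed balance: V * dC/dt = Q * C_art - Q * C / P - k_met * V * C
    q_flow      blood flow to the tissue (L/h)
    volume      tissue volume (L)
    partition   tissue:blood partition coefficient (dimensionless)
    c_arterial  arterial blood concentration, held constant here (mg/L)
    k_met       first-order metabolism rate constant (1/h), 0 for no metabolism
    """
    conc = 0.0
    history = [conc]
    for _ in range(int(round(hours / dt))):
        flow_in = q_flow * c_arterial           # mass entering per hour
        flow_out = q_flow * conc / partition    # mass leaving in venous blood
        metabolized = k_met * volume * conc     # mass removed by metabolism
        conc += dt * (flow_in - flow_out - metabolized) / volume
        history.append(conc)
    return history

# Hypothetical values for illustration only.
no_metabolism = simulate_tissue(q_flow=1.0, volume=2.0, partition=4.0,
                                c_arterial=0.5)
with_metabolism = simulate_tissue(q_flow=1.0, volume=2.0, partition=4.0,
                                  c_arterial=0.5, k_met=0.1)
print(f"steady state without metabolism: {no_metabolism[-1]:.3f} mg/L")
print(f"steady state with metabolism:    {with_metabolism[-1]:.3f} mg/L")
```

Without metabolism, the tissue concentration in this sketch approaches the
partition coefficient times the arterial concentration. A full PBPK model
couples several such equations through the blood compartment.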

**Projects from Summer 2021 REU Program**

**Bursting Oscillations**:
Certain neurons and other excitable cells exhibit much more
complicated firing patterns than simple repetitive firing. A common mode is
bursting, a dynamic state where a neuron repeatedly fires bursts of spikes, each
of which is followed by a period of quiescence before the next burst occurs.
Neuronal bursting can play important roles in communication between neurons. In
particular, bursting neurons are important for pattern generation and
synchronization. Bursting is well documented in the R-15 neuron of Aplysia, for
example; bursting activity in pancreatic beta cells is implicated in insulin
secretion, whereas bursting in certain thalamic cells is implicated in the
generation of sleep rhythms. Patients with parkinsonian tremor exhibit
increased bursting activity in neurons within the basal ganglia, and cells
involved in the generation of respiratory rhythms within the pre-Bötzinger
complex also display bursting oscillations.

In this project, we consider three fundamental types of bursting oscillations:
square-wave bursting, elliptic bursting, and parabolic bursting (G. B.
Ermentrout and D. H. Terman, Mathematical Foundations of Neuroscience, Springer
IAM 35, 2010). Models for bursting typically involve multiple timescales. We
use geometric dynamical systems methods to determine what sorts of solutions
may arise and how the solutions depend on parameters.
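To make the idea of multiple timescales concrete, the following Python sketch
simulates the Hindmarsh-Rose equations, a three-variable model that is often
used as a textbook example of bursting. This model is swapped in purely for
illustration and is not necessarily one studied in the project. The parameter
values are commonly quoted ones, and the forward-Euler scheme and step size are
assumptions made for brevity.

```python
def hindmarsh_rose(current=2.0, r=0.006, t_end=1000.0, dt=0.01):
    """Forward-Euler integration of the Hindmarsh-Rose model.

    x and y are fast variables (membrane potential and recovery);
    z is a slow adaptation variable whose timescale is set by the small r.
    """
    a, b, c, d, s, x_rest = 1.0, 3.0, 1.0, 5.0, 4.0, -1.6
    x, y, z = x_rest, c - d * x_rest ** 2, 0.0
    xs = [x]
    for _ in range(int(round(t_end / dt))):
        dx = y - a * x ** 3 + b * x ** 2 - z + current
        dy = c - d * x ** 2 - y
        dz = r * (s * (x - x_rest) - z)     # slow: r is much smaller than 1
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs.append(x)
    return xs

def count_spikes(xs, threshold=0.5):
    """Count upward crossings of the threshold in the voltage-like trace."""
    return sum(1 for prev, cur in zip(xs, xs[1:]) if prev < threshold <= cur)

trace = hindmarsh_rose()
print("number of spikes:", count_spikes(trace))
print("range of x:", min(trace), max(trace))
```

Plotting `trace` against time is one way to look for groups of spikes separated
by quiet intervals. Treating the slow variable `z` as a parameter of the fast
`(x, y)` subsystem is the starting point for the geometric analysis mentioned
above.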

**Using Smoothing Splines to Investigate
Dendrochronology**: Dendrochronologists use tree-rings to
understand past climate conditions. Because each tree-ring corresponds to one
year of the tree’s life, the width of the ring should vary based on
environmental factors like precipitation or soil moisture from that particular
year. Naturally, there is a lot of “noise” in this data. One of the more popular
techniques to remove this high-frequency information is based on cubic smoothing
splines, which can be thought of as a more flexible regression. The flexibility
of the spline is controlled by a value called the smoothing parameter. The
popular dendrochronological implementation of smoothing splines, though, is
theoretically complex, which makes choosing a smoothing parameter difficult. Its
approach to selecting a smoothing parameter differs from the standard spline
procedures in other disciplines such as statistics. There is a vast wealth of
knowledge on smoothing splines in statistics (e.g., using cross-validation to
find smoothing parameters), and this information is not currently
being utilized in dendrochronology. Our primary goal will be to find a
connection (either theoretical or numerical) between the common smoothing spline
methods and the dendrochronology method. If successful, we will be able to
propose to the dendrochronology field a simpler, better understood methodology
that still achieves the same goals. This will bypass the difficulty that the
current tree-ring procedure presents. If we cannot find a connection, we will
explore why a connection does not exist and some ways to simplify the current
approach.
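The role of the smoothing parameter and its selection by cross-validation can
be illustrated with a simplified stand-in. The Python sketch below uses a
discrete penalized least-squares smoother (a second-difference penalty on
equally spaced data) rather than a true cubic smoothing spline, and it is not
the dendrochronological implementation. The ring-width data are synthetic, and
the function names and grid of candidate values are hypothetical. NumPy is
assumed to be available.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Hat matrix H = (I + lam * D'D)^(-1), with D the second-difference operator."""
    d2 = np.diff(np.eye(n), n=2, axis=0)     # shape (n - 2, n)
    return np.linalg.inv(np.eye(n) + lam * d2.T @ d2)

def loo_cv_score(y, lam):
    """Leave-one-out CV score via the shortcut for linear smoothers."""
    h = smoother_matrix(len(y), lam)
    fitted = h @ y
    loo_residuals = (y - fitted) / (1.0 - np.diag(h))
    return float(np.mean(loo_residuals ** 2))

def choose_lambda(y, grid):
    """Return the grid value with the smallest leave-one-out CV score."""
    scores = [loo_cv_score(y, lam) for lam in grid]
    return grid[int(np.argmin(scores))], scores

# Synthetic "ring widths": a slow decline plus year-to-year noise.
rng = np.random.default_rng(0)
years = np.arange(80)
widths = 2.0 * np.exp(-years / 30.0) + 0.5 + rng.normal(0.0, 0.2, size=80)

grid = [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
best_lam, cv_scores = choose_lambda(widths, grid)
trend = smoother_matrix(len(widths), best_lam) @ widths
print("selected smoothing parameter:", best_lam)
```

Larger values of the parameter give a stiffer curve that approaches a straight
line, while smaller values follow the data more closely.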

**Quantitative Investigations of the COVID-19
Pandemic**: In 2019, a novel coronavirus (Covid-19) was
discovered in Wuhan, China. Covid-19 has been declared a global pandemic by the
World Health Organization and has affected nearly every country in the world.
In this project, students will study the basic model structure that is used for
predictions in such epidemics. Students will modify and apply mathematical
models to investigate different aspects of the current Covid-19 pandemic or
other potential outbreaks. Project investigations may include vaccination
efficacy requirements, parameter identification (based on available data),
comparisons of Covid-19 responses of different countries, comparisons of
different diseases, or intervention strategies.

**Mathematical Modeling of Gender Differences in ADME
properties (absorption, distribution, metabolism, excretion) of emerging PFASs**:
Per- and polyfluoroalkyl substances (PFASs) are a global research priority
because they have been detected in the drinking water of millions of people and
are widespread in the blood of the general human population. Experimental data
indicate that there is a difference in how PFASs are distributed throughout the
body of exposed rodents, based on the gender of the animal. In this project,
students will utilize Physiologically-Based Pharmacokinetic (PBPK) or
Physiologically-Based Pharmacodynamic (PBPD) models to investigate and
potentially predict how PFASs are distributed through the body. Models will be
investigated to determine which parameters would lead to the observed gender
differences.

**Impacts of Variable Selection on Predictive Modeling**:
Statistical or machine learning algorithms are commonly used to build
predictive models. In the era of Big Data, predictive models often encounter
the large p, small n problem, in which the number of predictors (p) is greater
than the number of observations (n). High-dimensional predictors cause
significant problems in predictive modeling. First, model fitting can be
computationally infeasible because a required matrix inverse does not exist,
which may prevent numerical optimization. Second, the inclusion of
non-informative predictors undermines prediction performance. Machine learning
algorithms also suffer from the large p, small n problem, so they incorporate
variable selection (or feature selection) techniques. The large p, small n
problem has been successfully addressed using penalized regression methods such
as LASSO, elastic net (ENET), SCAD, and MCP. These regression methods impose
various mathematical constraints on the coefficients of the predictors to
ensure that the number of selected predictors is less than the number of
observations. This project aims to adopt the penalized
regression methods for variable selection in predictive modeling. These methods
will be compared to random forest-based variable selection methods using
simulation studies and empirical data analysis. The empirical studies include
the prediction of a disease status and cyber-bullying status.
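As a small illustration of penalized regression in the large p, small n
setting, the following Python sketch implements LASSO by cyclic coordinate
descent with soft-thresholding. The simulated data (20 observations, 50
predictors, two of them informative), the function names, and the penalty
values are all hypothetical; the project may use different software and
tuning.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(x, y, lam, n_iter=200):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    resid = list(y)                                   # residual y - X b
    col_sq = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (j removed).
            rho = sum(x[i][j] * (resid[i] + x[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i][j] * delta
                beta[j] = new_beta
    return beta

# Simulated data with more predictors (p = 50) than observations (n = 20).
random.seed(1)
n_obs, n_pred = 20, 50
x = [[random.gauss(0.0, 1.0) for _ in range(n_pred)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.1) for row in x]

# Smallest penalty at which every coefficient is zero.
lam_max = max(abs(sum(x[i][j] * y[i] for i in range(n_obs))) / n_obs
              for j in range(n_pred))
beta_sparse = lasso_cd(x, y, 0.1 * lam_max)
selected = [j for j, b in enumerate(beta_sparse) if b != 0.0]
print("selected predictors:", selected)
```

The soft-thresholding step is what sets coefficients exactly to zero, so the
penalty performs variable selection and estimation at the same time.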

**Projects from the 2022 Summer REU Program**

**#1 Epidemiological Models Applied to Multi-Pen Pig Farms**:
Pig farms have played a central role in historical outbreaks and form an
important part of our food landscape. Recent reports indicate concern for the
way pigs are kept in factory pig farms. In this project, we will review
necessary literature related to the pig farming industry as well as the
prevalence of infectious disease in pig farms [2]. We will use a MATLAB ode45
model with discrete-time rearrangement to model a multi-pen pig farm. Our
approach is inspired by the article [3], where a multi-pen farm was used for
gilts, gestating sows, and nursing sows with piglets. We begin by investigating
results from a three-part SIR model for two pens. The model incorporates pigs
in the main pen, nursing sows in the piglet pen, and piglets.

We will begin with an SIR model and then alter it to an SEIR model. Finally,
depending upon our investigation and time considerations, we will consider other
configurations, reinfection, vaccination strategies, parameter estimation, and
sensitivity analysis.

References

[1] Dobie, A. P., Demirci, A., Bilge, A. H., & Ahmetolan, S. (2019). On the time
shift phenomena in epidemic models. arXiv preprint arXiv:1909.11317.

[2] Lowe, J., & Johnson, E. (2007). Alternative flow strategies in sow farms.

[3] Reynolds JJH, Torremorell M, Craft ME (2014) Mathematical Modeling of
Influenza A Virus Dynamics within Swine Farms and the Effects of Vaccination.
PLoS ONE 9(8): e106177. doi:10.1371/journal.pone.0106177

**#2 A study of the motion of the Jellyfish and material transport in their
vicinity**: Jellyfish swim by inflating and deflating their bells. This unique
swimming style not only helps them move around in the ocean but also plays a
crucial role in helping them obtain food and other resources.

As a Jellyfish inflates, its motion is slow, fluid is drawn into the bell, and
an axisymmetric vortex ring forms at the edge of the bell, where the tentacles
are present. Next, as the Jellyfish deflates, fluid is pushed out of the bell
promptly, a new, similar axisymmetric vortex ring of opposite sign forms, and
the Jellyfish is lifted upward. How Jellyfish maneuver in the ocean with such a
simple, repeated motion has been explored by many researchers in the fluid
dynamics community.

In this project, we will study the mechanism of Jellyfish swimming numerically.
We will start with a two-dimensional model. The rotating fluid flow near the
edge of the bell is modeled using point vortices. The evolution of these point
vortices is used to elucidate the role of vortex interactions in the Jellyfish
motion. Various locations and strengths of these point vortices will be
examined. In addition, passive particles that travel at the fluid velocity are
placed in the flow to track the transport. Our study will explore the
propulsive advantage of Jellyfish swimming and its important implications for
bio-engineered propulsion systems.
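A minimal two-dimensional point-vortex sketch in Python is given below. It
assumes an unbounded ideal fluid, a single counter-rotating vortex pair, and
forward-Euler time stepping. The positions, strengths, and function names are
hypothetical, and the bell itself is not modeled. Each vortex moves with the
velocity induced by the other vortices, and each passive particle moves with
the velocity induced by all of them.

```python
import math

def induced_velocity(px, py, vortices, skip=None):
    """Velocity at (px, py) induced by point vortices given as (x, y, strength)."""
    u = v = 0.0
    for k, (vx, vy, gamma) in enumerate(vortices):
        if k == skip:
            continue                      # a vortex does not advect itself
        dx, dy = px - vx, py - vy
        r2 = dx * dx + dy * dy
        if r2 == 0.0:
            continue
        u += -gamma * dy / (2.0 * math.pi * r2)
        v += gamma * dx / (2.0 * math.pi * r2)
    return u, v

def advance(vortices, tracers, dt, n_steps):
    """Move vortices and passive particles with forward-Euler steps."""
    for _ in range(n_steps):
        vortex_vel = [induced_velocity(x, y, vortices, skip=k)
                      for k, (x, y, _) in enumerate(vortices)]
        tracer_vel = [induced_velocity(x, y, vortices) for (x, y) in tracers]
        vortices = [(x + dt * u, y + dt * v, g)
                    for (x, y, g), (u, v) in zip(vortices, vortex_vel)]
        tracers = [(x + dt * u, y + dt * v)
                   for (x, y), (u, v) in zip(tracers, tracer_vel)]
    return vortices, tracers

# Counter-rotating pair one unit apart; positive strength is counterclockwise.
pair = [(0.5, 0.0, -1.0), (-0.5, 0.0, 1.0)]
particles = [(0.0, -0.5)]                 # one passive particle on the axis
final_vortices, final_particles = advance(pair, particles, dt=0.01, n_steps=200)
print("vortex positions:", [(x, y) for x, y, _ in final_vortices])
print("particle position:", final_particles[0])
```

With this sign choice, the pair translates upward at the classical speed of
strength divided by 2π times the separation, which gives a simple check on the
code before more vortices are added.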

**#3: Quantitative Investigations of the Covid-19 Pandemic
and Other Infectious Diseases**: In 2019, a novel coronavirus (Covid-19)
was discovered in Wuhan, China. Covid-19 was declared a global pandemic by the
World Health Organization, and many interventions were used to try to control
the spread of the virus. Strategies to control the outbreak included mask
wearing, social distancing, and increased use of disinfectants. These
strategies likely also affected the spread of other infectious diseases, such as
influenza. In this project, students will study the basic model structure that
is used for predictions of the spread of infectious diseases. Students will
modify and apply mathematical models to investigate how Covid-19
interventions may have also impacted the spread of other diseases.

**#4: A Quantitative Investigation of the Effect of Body
Disposition on the ADME properties (absorption, distribution, metabolism,
excretion) of emerging PFASs**: Per- and polyfluoroalkyl substances
(PFASs) are a global research priority because they have been detected in the
drinking water of millions of people and are widespread in the blood of the
general human population. Recent studies indicate that there is a
difference in how PFASs are distributed throughout an animal based on its body
condition. In this project, students will utilize Physiologically-Based
Pharmacokinetic (PBPK) or Physiologically-Based Pharmacodynamic (PBPD) models to
investigate and potentially predict how PFASs are distributed through the body
of exposed animals. Models will be investigated to determine which parameters
would lead to the observed differences based on body condition.

**#5: Mathematical Modeling of Bipolar Disorder**:
Bipolar II disorder is characterized by alternating hypomanic and
depressive episodes and afflicts about 1% of the United States adult population.
In this project, participants will investigate and compare different
mathematical models of bipolar disorder. After establishing the baseline
model(s), we will investigate the sensitivity of the parameters and the
stability of the system, and possibly modify the models to incorporate treatment.