
North Carolina A&T State University and Elon University 
Joint REU in Mathematical Biology

REU Mentors for Summer 2023 Program

The following faculty will be serving as project mentors for the 2023 Joint REU program. The general area of each mentor's project is listed, but specific project topics have not yet been finalized.

-Dr. Nicholas Luke, North Carolina A&T State University: Epidemiological Modeling or Pharmacokinetics.
-Dr. Karen Yokley, Elon University: Epidemiological Modeling or Pharmacokinetics
-Dr. Ling Xu, North Carolina A&T State University:

Projects from Summer 2020 REU Program

Predicting Phosphorylation Sites Using Gradient Boosting: Phosphorylation is a post-translational modification of proteins in which specialized enzymes called kinases add a phosphate group to serine, threonine, or tyrosine, which have the highest abundance of phosphorylation sites in the phosphoproteome. Due to its flexibility and reversibility, phosphorylation is involved in numerous cellular processes, such as signal transduction, cell proliferation, apoptosis, gene expression, cell cycle progression, and cytoskeletal regulation. The prediction of phosphorylation sites can also be utilized in cancer therapy to treat B-Cell Lymphoma II. Using Phospho.ELM for training and testing data, we compared six models: support vector machines with linear and radial kernels, random forest, and gradient boosting with interaction depths of 1, 2, and 5. There were 514 protein sequence features defined for a window size of nine. The protein sequence features implemented in our gradient boosting model were: Shannon Entropy (SE), Relative Entropy (RE), Information Gain (IG), Average Cumulative Hydrophobicity (ACH), Composition, Transition and Distribution (CTD), Sequence Order Coupling Numbers (SOCN), Quasi Sequence Order (QSO), Sequence Features (SF), Overlapping Properties (OP), Pseudo-Amino Acid Composition (PseAAC), and Amphiphilic Pseudo-Amino Acid Composition (APseAAC). The six predictive models were assessed based on accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and F1 score. Although the random forest model was found to have higher accuracy and specificity, gradient boosting with an interaction depth of 5 was found to have higher sensitivity, MCC, and F1. A balanced dataset was also considered, and the measures were evaluated for each model.

Incorporation of Body Composition Variations into a Physiologically-Based Pharmacokinetic Model of Xylene: Physiologically based pharmacokinetic (PBPK) models are ordinary differential equation models that have been used to estimate the internal dosages of toxicants in the body. One such toxicant is xylene, a hydrocarbon that comes in the form of a liquid or vapor and has a wide variety of uses, most commonly in paint and paint thinner in industrial settings. While any long exposure to xylene is harmful (inhalation, ingestion, dermal contact, etc.), inhalation of the substance is one of the most dangerous. Inhalation of xylene may cause dizziness, headache, nausea, and vomiting. The current study uses a pre-existing PBPK model to investigate how xylene is distributed to different compartments of the body. While the PBPK model was originally parameterized for the average male body, we account for the variety of human bodies, examining how different body compositions may affect the concentration of xylene in the different compartments. To investigate this, body volume parameters are altered and the addition of body height parameters is considered. Fat percentages are altered as well to describe different body types.

A Quantitative Investigation of Preventative Measures for COVID-19: Coronaviruses are RNA viruses that cause respiratory infections. In late 2019 and early 2020, a novel coronavirus, SARS-CoV-2, which causes COVID-19, was discovered and caused an outbreak in Wuhan, China. Since then, cases of COVID-19 have spread worldwide, causing a global pandemic. Therefore, it is important to understand, predict, and prevent the spread of COVID-19. A mathematical model is used to divide an arbitrary population into susceptible, exposed, infected, quarantined, and recovered populations. The model shows the progression of the virus among these populations and can be used to determine the total infections caused by COVID-19, the rapidity of the outbreak, and the reproductive number. This model can be further used to investigate the efficacy of measures taken to prevent the spread of infection, such as vaccination, social distancing, and masks. To investigate such measures, the model is modified. For vaccination, a vaccinated subset of the population is added. For social distancing and masks, the model adds susceptible, exposed, and infected populations who are social distancing or wearing masks. Investigation of these preventative measures also included the addition of new parameters or changes to existing parameter values within the model. For vaccination, parameters representing vaccination rate and efficacy of the vaccine were added. For social distancing and mask usage, the contact rate and transmission rate, respectively, were adjusted. By examining the effect of these parameters on the total infections, rapidity of the outbreak, and reproductive number, the efficacy of each preventive measure can be determined. Then, a suggestion can be made to help mitigate the real-world destruction of COVID-19.
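To make the compartment structure concrete, here is a minimal Python sketch of one possible susceptible-exposed-infected-quarantined-recovered model. The specific equations, parameter names (beta, sigma, gamma, q, gamma_q), and values are assumptions chosen for illustration; the project's actual model and parameters are not given in the abstract. The sketch integrates the system with a fixed-step fourth-order Runge-Kutta method.

```python
def seiqr_rhs(state, beta, sigma, gamma, q, gamma_q):
    """Right-hand side of an assumed SEIQR model.

    beta: transmission rate, sigma: rate of leaving the exposed class,
    gamma: recovery rate of infected, q: quarantine rate,
    gamma_q: recovery rate of quarantined (all hypothetical).
    """
    s, e, i, quar, r = state
    n = s + e + i + quar + r
    new_infections = beta * s * i / n
    return (
        -new_infections,
        new_infections - sigma * e,
        sigma * e - (gamma + q) * i,
        q * i - gamma_q * quar,
        gamma * i + gamma_q * quar,
    )


def rk4_step(state, dt, params):
    """Advance the state by one classical Runge-Kutta step."""
    def shifted(k, scale):
        return tuple(x + scale * dt * dx for x, dx in zip(state, k))

    k1 = seiqr_rhs(state, *params)
    k2 = seiqr_rhs(shifted(k1, 0.5), *params)
    k3 = seiqr_rhs(shifted(k2, 0.5), *params)
    k4 = seiqr_rhs(shifted(k3, 1.0), *params)
    return tuple(
        x + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, params, days, dt=0.1):
    """Return the list of states from day 0 to the final day."""
    steps = int(round(days / dt))
    trajectory = [tuple(initial_state)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, params))
    return trajectory


def reproductive_number(beta, gamma, q):
    """Basic reproductive number of this sketch, assuming quarantined
    individuals do not transmit."""
    return beta / (gamma + q)


# Hypothetical parameter values, for illustration only.
params = (0.5, 0.2, 0.1, 0.05, 0.1)   # beta, sigma, gamma, q, gamma_q
initial = (9999.0, 0.0, 1.0, 0.0, 0.0)
trajectory = simulate(initial, params, days=300)
print("R0:", reproductive_number(0.5, 0.1, 0.05))
print("peak infected:", max(state[2] for state in trajectory))
print("total ever infected:", initial[0] - trajectory[-1][0])
```

In a sketch like this, social distancing or mask usage could be explored by lowering beta and rerunning the simulation, then comparing the peak, the total infections, and the reproductive number.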

Physiologically Based Pharmacokinetic (PBPK) Modeling of Per- and Polyfluoroalkyl Substances (PFASs): Per- and polyfluoroalkyl substances (PFASs) are a group of persistent manufacturing byproducts. Studies by several state environmental agencies have found PFASs in water sources, raising significant concern for human safety. This concern is a result of the chemicals' ability to stay in the body for long periods of time and their resistance to chemical and thermal breakdown. In lab animals, PFASs have been shown to cause tumors as well as have reproductive, developmental, and immunological effects. PFAS research with regard to humans is incomplete, yet existing findings are notable. In this project, we use physiologically-based pharmacokinetic (PBPK) modeling to represent the flow of PFASs through the body. PBPK modeling shows the concentration of a toxicant in different tissues and organs in the body, called compartments. Compartments include the fat, brain, lungs, gut, and liver. Equation parameters include the rate of blood flow to and from each compartment; partition coefficients, which quantify the difficulty of passing from blood to each compartment; and metabolism coefficients. The final model is a system of differential equations, each representing the rate of change of the toxicant within one compartment. Each compartment uses a modified version of a general equation to model the toxicant, based initially on "flow in" minus "flow out". We first replicate existing published data using MATLAB to test the general structure and coding of PFASs in PBPK modeling. Existing parameter values also come primarily from published data. We then compare our simulated data to experimental observations, resulting in a qualitative assessment of leading models.
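The "flow in" minus "flow out" idea can be illustrated with a single compartment. The Python sketch below assumes the common flow-limited form, in which blood leaving a tissue carries the tissue concentration divided by a tissue:blood partition coefficient; this form, the function names, and all parameter values are assumptions for illustration and are not the project's MATLAB code. With a constant arterial concentration and no metabolism the equation has a closed-form solution, which the sketch uses as a check on the numerical integration.

```python
import math


def tissue_rate(c_tissue, c_arterial, q, v, p, k_met=0.0):
    """Rate of change of concentration in one flow-limited compartment.

    q: blood flow (L/h), v: compartment volume (L),
    p: tissue:blood partition coefficient,
    k_met: first-order metabolism rate constant (1/h), zero by default.
    """
    flow_in = q * c_arterial           # amount delivered by arterial blood
    flow_out = q * c_tissue / p        # amount carried away by venous blood
    metabolized = k_met * v * c_tissue
    return (flow_in - flow_out - metabolized) / v


def simulate_tissue(c_arterial, q, v, p, t_end, dt=0.001, k_met=0.0):
    """Integrate one compartment with forward Euler from zero concentration."""
    c = 0.0
    for _ in range(int(round(t_end / dt))):
        c += dt * tissue_rate(c, c_arterial, q, v, p, k_met)
    return c


def analytic_tissue(c_arterial, q, v, p, t):
    """Closed-form solution for constant arterial input and no metabolism."""
    return p * c_arterial * (1.0 - math.exp(-q * t / (v * p)))


# Hypothetical values: 1 mg/L arterial concentration, 1 L/h flow,
# 2 L volume, partition coefficient 4, simulated for 8 hours.
numeric = simulate_tissue(1.0, 1.0, 2.0, 4.0, t_end=8.0)
exact = analytic_tissue(1.0, 1.0, 2.0, 4.0, t=8.0)
print("numeric:", numeric, "exact:", exact)
```

A full PBPK model couples several such equations through the blood, so that the venous outflow of every compartment feeds back into the arterial concentration.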

Projects from Summer 2021 REU Program

Bursting Oscillations: Certain neurons and other excitable cells exhibit much more complicated firing patterns than simple repetitive firing. A common mode is bursting, a dynamic state in which a neuron repeatedly fires bursts of spikes, each followed by a period of quiescence before the next burst occurs. Neuronal bursting can play important roles in communication between neurons. In particular, bursting neurons are important for pattern generation and synchronization. Bursting activity has been studied in the Aplysia R-15 neuron; in pancreatic beta cells, for example, it is implicated in insulin secretion, whereas in certain thalamic cells it is implicated in the generation of sleep rhythms. Patients with parkinsonian tremor exhibit increased bursting activity in neurons within the basal ganglia, and cells involved in the generation of respiratory rhythms within the pre-Bötzinger complex also display bursting oscillations.

In this project, we consider three fundamental types of bursting oscillations: square-wave bursting, elliptic bursting, and parabolic bursting (G.B. Ermentrout and D. H. Terman, Mathematical Foundations of Neuroscience, Springer IAM 35, 2010). Models for bursting typically involve multiple timescales. We use geometric dynamical systems methods to determine what sorts of solutions may arise and how the solutions depend on parameters.

Using Smoothing Splines to Investigate Dendrochronology: Dendrochronologists use tree rings to understand past climate conditions. Because each tree ring corresponds to one year of the tree's life, the width of the ring should vary based on environmental factors like precipitation or soil moisture from that particular year. Naturally, there is a lot of "noise" in this data. One of the more popular techniques to remove this high-frequency information is based on cubic smoothing splines, which can be thought of as a more flexible regression. The flexibility of the spline is controlled by a value called the smoothing parameter. The popular dendrochronological implementation of smoothing splines, though, is theoretically complex, which makes it difficult to choose a smoothing parameter. Its approach to selecting a smoothing parameter differs from the standard spline procedures in other disciplines such as statistics. There is a vast wealth of knowledge on smoothing splines in statistics (e.g., using cross validation to find smoothing parameters), and this information is not currently being utilized in dendrochronology. Our primary goal will be to find a connection (either theoretical or numerical) between the common smoothing spline methods and the dendrochronology method. If successful, we will be able to propose to the dendrochronology field a simpler, better understood methodology that still achieves the same goals. This will bypass the difficulty that the current tree-ring procedure presents. If we cannot find a connection, we will explore why a connection does not exist and some ways to simplify the current approach.
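For reference, the standard statistical formulation of a cubic smoothing spline can be written as follows. This is the textbook form, not the dendrochronological implementation discussed above; the symbols (ring widths y_i observed in years x_i, smoothing parameter \lambda, smoother matrix S_\lambda) are notation introduced here for illustration.

```latex
% Penalized least-squares criterion: the fitted curve trades off
% closeness to the data against roughness of the curve.
\hat{f}_\lambda = \arg\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
    + \lambda \int \bigl( f''(x) \bigr)^2 \, dx

% Because the fit is linear in the data, \hat{y} = S_\lambda y, the
% leave-one-out cross-validation score has a closed form:
\mathrm{CV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n}
    \left( \frac{ y_i - \hat{f}_\lambda(x_i) }{ 1 - (S_\lambda)_{ii} } \right)^2
```

As \lambda approaches 0 the spline interpolates the data, and as \lambda grows the fit approaches the least-squares straight line; cross validation chooses the \lambda that minimizes CV(\lambda).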

Quantitative Investigations of the COVID-19 Pandemic: In 2019, a novel coronavirus (Covid-19) was discovered in Wuhan, China. Covid-19 has been declared a global pandemic by the World Health Organization and has affected nearly every country in the world. In this project, students will study the basic model structure that is used for predictions in such epidemics. Students will modify and apply mathematical models to investigate different aspects of the current Covid-19 pandemic or other potential outbreaks. Project investigations may include vaccination efficacy requirements, parameter identification (based on available data), comparisons of Covid-19 responses of different countries, comparisons of different diseases, or intervention strategies.

Mathematical Modeling of Gender Differences in ADME properties (absorption, distribution, metabolism, excretion) of emerging PFASs: Per- and polyfluoroalkyl substances (PFASs) are a global research priority because they have been detected in the drinking water of millions of people and are widespread in the blood of the general human population.  Experimental data indicate that there is a difference in how PFASs are distributed throughout the body of exposed rodents, based on the gender of the animal.  In this project, students will utilize Physiologically-Based Pharmacokinetic (PBPK) or Physiologically-Based Pharmacodynamic (PBPD) models to investigate and potentially predict how PFASs are distributed through the body.  Models will be investigated to determine which parameters would lead to the observed gender differences.

Impacts of Variable Selection on Predictive Modeling: Statistical or machine learning algorithms are commonly used to build predictive models. In the era of Big Data, predictive models often encounter the large p, small n problem, in which the number of predictors (p) is greater than the number of observations (n). High-dimensional predictors cause significant problems in predictive modeling. First, model fitting can become computationally infeasible because a required matrix inverse does not exist, which may prevent numerical optimization. Second, the inclusion of non-informative predictors undermines prediction performance. Because machine learning algorithms suffer from this large p, small n problem, they incorporate variable selection (or feature selection) techniques. The problem has been successfully addressed using penalized regression methods such as LASSO, ENET, SCAD, and MCP. These regression methods impose various mathematical constraints on the coefficients of the predictors to ensure that the number of selected predictors is less than the number of observations. This project aims to adopt the penalized regression methods for variable selection in predictive modeling. These methods will be compared to random forest-based variable selection methods using simulation studies and empirical data analysis. The empirical studies include the prediction of disease status and cyber-bullying status.
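As an illustration of how a penalized regression performs variable selection, the following is a minimal Python sketch of LASSO fitted by cyclic coordinate descent. It minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept; the function names and the toy data are hypothetical, and the project itself may use established packages rather than code like this.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Fit LASSO coefficients by cyclic coordinate descent.

    X is a list of n rows with p entries each, y is a list of n responses,
    and lam is the L1 penalty weight.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)                 # equals y - X @ beta for beta = 0
    columns = [[row[j] for row in X] for j in range(p)]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue               # an all-zero predictor stays at 0
            # Correlation of predictor j with the residual that excludes
            # predictor j's own current contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


# Toy large p, small n data (p = 5 predictors, n = 3 observations);
# the response depends only on the first predictor.
X = [
    [1.0, 0.5, -1.0, 2.0, 0.0],
    [2.0, -1.0, 0.5, 0.0, 1.0],
    [3.0, 0.0, 1.0, -1.0, 0.5],
]
y = [2.0, 4.0, 6.0]
print(lasso_coordinate_descent(X, y, lam=0.5))
```

The soft-thresholding step is what sets coefficients exactly to zero, so increasing lam removes more predictors from the model.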

Projects from the 2022 Summer REU Program

#1 Epidemiological Models Applied to Multi-Pen Pig Farms: Pig farms have played a central role in historical outbreaks and form an important part of our food landscape. Recent reports indicate concern for the way pigs are kept in factory pig farms. In this project, we will review necessary literature related to the pig farming industry as well as the prevalence of infectious disease in pig farms [2]. We will use a MATLAB ode45 model with discrete time rearrangement to model a multi-pen pig farm. Our approach is inspired by the article [3], where a multi-pen farm was used for gilts, gestating sows, and nursing sows with piglets. We begin by investigating results from a three-part SIR model for two pens. The model incorporates pigs in the main pen, nursing sows in the piglet pen, and piglets.

We will begin with an SIR model and then alter it to an SEIR model.  Finally, depending upon our investigation and time considerations, we will consider other configurations, reinfection, vaccination strategies, parameter estimation, and sensitivity analysis.
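The idea of continuous disease dynamics combined with discrete time rearrangement can be sketched as follows. This Python example is an assumption-laden illustration, not the project's MATLAB model: it uses forward Euler in place of ode45, a plain SIR model in each of two pens with no transmission between pens, and a hypothetical rule that moves a fixed fraction of every class from the first pen to the second at regular intervals.

```python
def sir_euler_step(pen, beta, gamma, dt):
    """Advance one pen's [S, I, R] counts by one forward Euler step."""
    s, i, r = pen
    n = s + i + r
    new_infections = beta * s * i / n * dt if n > 0 else 0.0
    new_recoveries = gamma * i * dt
    return [s - new_infections, i + new_infections - new_recoveries,
            r + new_recoveries]


def move_animals(source, target, fraction):
    """Discrete rearrangement: move a fraction of each class between pens."""
    for k in range(3):
        moved = fraction * source[k]
        source[k] -= moved
        target[k] += moved


def simulate_two_pens(pens, beta, gamma, fraction, move_every, days, dt=0.01):
    """Simulate two pens, rearranging animals every move_every days."""
    pens = [list(pen) for pen in pens]
    steps_per_move = int(round(move_every / dt))
    total_steps = int(round(days / dt))
    for step in range(1, total_steps + 1):
        pens = [sir_euler_step(pen, beta, gamma, dt) for pen in pens]
        if step % steps_per_move == 0:
            move_animals(pens[0], pens[1], fraction)
    return pens


# Hypothetical setup: one infected pig in the first pen, none in the second.
final = simulate_two_pens(
    pens=[[99.0, 1.0, 0.0], [50.0, 0.0, 0.0]],
    beta=0.4, gamma=0.1, fraction=0.2, move_every=7.0, days=60.0,
)
print("pen 1 [S, I, R]:", final[0])
print("pen 2 [S, I, R]:", final[1])
```

In this sketch the second pen can only become infected through the weekly moves, which makes the effect of the rearrangement schedule easy to isolate.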


[1] Dobie, A. P., Demirci, A., Bilge, A. H., & Ahmetolan, S. (2019). On the time shift phenomena in epidemic models. arXiv preprint arXiv:1909.11317.

[2] Lowe, J., & Johnson, E. (2007). Alternative flow strategies in sow farms.

[3] Reynolds JJH, Torremorell M, Craft ME (2014) Mathematical Modeling of Influenza A Virus Dynamics within Swine Farms and the Effects of Vaccination. PLoS ONE 9(8): e106177. doi:10.1371/journal.pone.0106177

#2 A study of the motion of the Jellyfish and material transport in their vicinity: Jellyfish swim by inflating and deflating their bells. This unique swimming style not only helps them move around in the ocean but also plays a crucial role in obtaining food and other resources.
   As a Jellyfish inflates, its motion is slow, fluid is drawn into the bell, and an axisymmetric vortex ring forms at the edge of the bell where the tentacles are present. Next, as the Jellyfish deflates, fluid is pushed out of the bell promptly, a new and similar axisymmetric vortex ring of opposite sign forms, and the Jellyfish is lifted upward. How Jellyfish maneuver in the ocean with such a simple, repeating motion has been explored by many researchers in the fluid dynamics community.
   In this project, we are going to study the mechanism of Jellyfish swimming numerically. We will start with a two-dimensional model. The rotating fluid flow near the edge of the bell is modeled using point vortices. The evolution of these point vortices is shown to elucidate the role of the vortex interactions in the Jellyfish motion. Various locations and strengths of these point vortices will be examined. In addition, passive particles that travel at the fluid velocity are placed in the flow to track the transport. Our study will explore the propulsive advantage of Jellyfish in swimming and its important implications for bio-engineered propulsion systems.
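A two-dimensional point-vortex computation of this kind can be sketched in a few lines of Python. The example below is a generic illustration under stated assumptions, not the project's model: it uses the standard velocity field of a point vortex in an unbounded ideal fluid, forward Euler time stepping, and hypothetical positions and strengths. A pair of vortices with opposite sign, which is the two-dimensional cross-section of a vortex ring, translates steadily at speed strength / (2 * pi * separation), and a passive particle is carried along by the flow.

```python
import math


def induced_velocity(x, y, vortices, skip=None):
    """Fluid velocity at (x, y) induced by point vortices.

    vortices is a list of (x, y, strength) tuples, with positive strength
    meaning counterclockwise rotation. skip is the index of a vortex to
    leave out, since a point vortex does not move itself.
    """
    u = v = 0.0
    for k, (xk, yk, strength) in enumerate(vortices):
        if k == skip:
            continue
        dx, dy = x - xk, y - yk
        r2 = dx * dx + dy * dy
        if r2 == 0.0:
            continue                   # avoid the singularity at the vortex
        u += -strength * dy / (2.0 * math.pi * r2)
        v += strength * dx / (2.0 * math.pi * r2)
    return u, v


def euler_step(vortices, particles, dt):
    """Move vortices and passive particles with the local fluid velocity."""
    vortex_vel = [induced_velocity(x, y, vortices, skip=k)
                  for k, (x, y, _) in enumerate(vortices)]
    particle_vel = [induced_velocity(x, y, vortices) for x, y in particles]
    new_vortices = [(x + dt * u, y + dt * v, g)
                    for (x, y, g), (u, v) in zip(vortices, vortex_vel)]
    new_particles = [(x + dt * u, y + dt * v)
                     for (x, y), (u, v) in zip(particles, particle_vel)]
    return new_vortices, new_particles


# Hypothetical setup: a counter-rotating pair one unit apart and a
# passive tracer placed off to the side.
vortices = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]
particles = [(0.5, -1.0)]
for _ in range(100):
    vortices, particles = euler_step(vortices, particles, dt=0.01)
print("vortex positions:", [(x, y) for x, y, _ in vortices])
print("tracer position:", particles[0])
```

Varying the positions and strengths in the setup, or adding more vortices per stroke, changes both the self-propulsion of the vortex system and the paths of the tracers.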

#3: Quantitative Investigations of the Covid-19 Pandemic and Other Infectious Diseases: In 2019, a novel coronavirus (Covid-19) was discovered in Wuhan, China. Covid-19 was declared a global pandemic by the World Health Organization, and many interventions were used to try to control the spread of the virus. Strategies to control the outbreak included mask wearing, social distancing, and increased use of disinfectants. These strategies likely also affected the spread of other infectious diseases, such as influenza. In this project, students will study the basic model structure that is used for predictions of the spread of infectious diseases. Students will modify and apply mathematical models to investigate how Covid-19 interventions may have also impacted the spread of other diseases.

#4: A Quantitative Investigation of the Effect of Body Disposition on the ADME properties (absorption, distribution, metabolism, excretion) of emerging PFASs: Per- and polyfluoroalkyl substances (PFASs) are a global research priority because they have been detected in the drinking water of millions of people and are widespread in the blood of the general human population. Recent data indicate that there is a difference in how PFASs are distributed throughout an animal based on its body condition. In this project, students will utilize Physiologically-Based Pharmacokinetic (PBPK) or Physiologically-Based Pharmacodynamic (PBPD) models to investigate and potentially predict how PFASs are distributed through the body of exposed animals. Models will be investigated to determine which parameters would lead to the observed differences based on body condition.

#5: Mathematical Modeling of Bipolar Disorder: Bipolar II disorder is characterized by alternating hypomanic and depressive episodes and afflicts about 1% of the United States adult population. In this project, participants will investigate and compare different mathematical models of bipolar disorder. After establishing the baseline model(s), we will investigate the sensitivity of the parameters and the stability of the system, and possibly modify the models to incorporate treatment.