Network Analysis
The Two Dimensional Anisotropic Ising Model as Paradigm for Repeat Protein Folding Behavior
Yale Univ., Fall - Spring 2007, Physics Intensive Senior Thesis,
advised by Simon Mochrie (Group Information)
The structure of a TPR (tetratricopeptide repeat) protein depends
only on nearest-neighbor helix interactions. Accordingly, we look
to the analogous nearest-neighbor Ising model to quantify the
folding behavior of TPR proteins. Previous research has
demonstrated the efficacy of a one-dimensional model in which
the spin state (±1) of each site represents the folding state
of one helix: spin-up corresponds to a folded helix and
spin-down to an unfolded helix.
This paper explores the possibility of extending previous
work into a second dimension. Currently, the one-dimensional
model cannot account for the fact that a helix in the repeat
structure can be partly folded and partly unfolded at the same
time. Allowing each amino acid in a helix to assume its own
spin state makes this easy to account for. A two-dimensional
model would also extend the biological relevance of this
computational paradigm for studying protein
folding.
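As a rough illustration of the one-dimensional picture described above, the sketch below enumerates every configuration of a short open Ising chain and computes the Boltzmann-averaged fraction of folded (spin-up) helices. The coupling J, the field h, the inverse temperature beta, and both function names are illustrative assumptions, not parameters or code from the thesis.

```python
import itertools
import math


def chain_energy(spins, J, h):
    """Energy of an open 1D Ising chain: -J*sum(s_i*s_{i+1}) - h*sum(s_i)."""
    coupling = sum(a * b for a, b in zip(spins, spins[1:]))
    return -J * coupling - h * sum(spins)


def mean_folded_fraction(n, J, h, beta=1.0):
    """Boltzmann-averaged fraction of spin-up (folded) helices.

    Brute-force enumeration of all 2**n configurations, so this is only
    practical for short chains. J, h, and beta are assumed values.
    """
    z = 0.0          # partition function
    weighted = 0.0   # sum of (Boltzmann weight * folded fraction)
    for spins in itertools.product((-1, 1), repeat=n):
        w = math.exp(-beta * chain_energy(spins, J, h))
        z += w
        weighted += w * spins.count(1) / n
    return weighted / z
```

With h = 0 the chain is symmetric under flipping every spin, so the folded fraction is one half; a positive h favors the folded state. A two-dimensional version would index spins by both helix and residue and would need separate couplings along and across helices.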
Statistical Prediction
What Makes a Nobel Prize Winner in Physics? Classification on the Citation Network:
Yale Univ., Spring 2007
The Physics citation network indicates the relationship
between scientific publications. The bibliographic data can
be viewed as a graph, with papers as nodes, and citations as
directed edges from one paper to another. Given a portion of
the citation network, we use measurements (number of
publications, number of citations, h-index, etc.) to quantify an
author's standing in the scientific community. Classification
techniques are then used to try to identify the Nobel
Prize-winning authors. Out-of-sample cross-validation error
indicates that the most successful learning method, a linear
kernel support vector machine, will misclassify approximately
37% of the authors.
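The author-level measurements named above can be computed from a list of per-paper citation counts. The following is a minimal sketch; the function names and the feature dictionary layout are assumptions for illustration, not the feature set actually used in the project.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def author_features(citation_counts):
    """Hypothetical feature vector for one author, keyed by measurement name."""
    return {
        "n_papers": len(citation_counts),
        "n_citations": sum(citation_counts),
        "h_index": h_index(citation_counts),
    }
```

A feature vector of this kind, one per author, is what a classifier such as a linear support vector machine would take as input.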
Cluster Analysis
Skeletonization Via a Biologically Motivated Data-Driven Process in Digital Binary Shapes:
Yale Univ., Spring 2007
The recognition of object skeletons allows complicated
object recognition algorithms to work on smaller input data.
This paper proposes a novel technique that uses a
self-organizing feature map in order to find these object
skeletons. Because self-organizing feature maps preserve
topology, we train the network with object coordinates. By
imposing links between neurons, which we selectively delete
over the network convergence phase, we show how to devise a
skeleton from a self-organizing feature map. This technique
requires no a priori object identification and may be
performed on noisy image data. This new model is both
biologically relevant and computationally efficient.
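To make the training step concrete, here is a minimal sketch of a standard Kohonen self-organizing map whose neurons form a one-dimensional chain, trained on the (x, y) coordinates of object pixels. It does not reproduce the link-deletion procedure described above; the chain topology, the learning-rate and neighborhood schedules, and the function name are all assumptions made for illustration.

```python
import math
import random


def train_chain_som(points, n_neurons=10, epochs=50, seed=0):
    """Fit a chain of neurons to 2D points with the basic Kohonen update.

    points: list of (x, y) tuples, e.g. coordinates of foreground pixels.
    Returns a list of [x, y] neuron positions ordered along the chain.
    """
    rng = random.Random(seed)
    # Initialize each neuron at a randomly chosen data point.
    weights = [list(rng.choice(points)) for _ in range(n_neurons)]
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for x, y in order:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)  # learning rate decays linearly to 0
            sigma = max(0.5, (n_neurons / 2.0) * (1.0 - frac))  # neighborhood width
            # Best-matching unit: the neuron closest to the sample.
            best = min(
                range(n_neurons),
                key=lambda i: (weights[i][0] - x) ** 2 + (weights[i][1] - y) ** 2,
            )
            # Pull every neuron toward the sample, weighted by chain distance.
            for i in range(n_neurons):
                h = math.exp(-((i - best) ** 2) / (2.0 * sigma ** 2))
                weights[i][0] += lr * h * (x - weights[i][0])
                weights[i][1] += lr * h * (y - weights[i][1])
            step += 1
    return weights
```

Because each update moves a neuron part of the way toward a data point, the neurons never leave the bounding box of the shape. Joining consecutive neurons gives a crude medial curve for elongated shapes; branching skeletons would need a richer link structure than a single chain.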
Sociological Modeling
Urban Sprawl - Modeling the Morphology of US Cities:
Yale Univ., Spring 2007, Applied Mathematics Senior Thesis,
advised by
Daniel Spielman
This thesis extends a long tradition of research within the
urban studies and economics communities. The belief that some
simple forms may underlie the very nature of a city's
structure inherits directly from this discourse, whose
standard urban model suggests that cities are generally radial
with an exponential decay in population density from the city
center. Though this model is not always accurate, it has been
shown to hold for many cities. More importantly, even the
existence of this model allows for investigations into which
kinds of cities follow the observed patterns (and which do
not) and why.
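The standard urban model's exponential decay can be written as density(r) = rho0 * exp(-gamma * r), where r is the distance from the city center. Under that assumption, the sketch below estimates rho0 and gamma by ordinary least squares on the logarithm of density. The symbols rho0 and gamma and the function name are illustrative choices, not notation taken from the thesis.

```python
import math


def fit_exponential_density(radii, densities):
    """Least-squares fit of log(density) = log(rho0) - gamma * r.

    radii: distances from the city center.
    densities: strictly positive population densities at those distances.
    Returns (rho0, gamma).
    """
    logs = [math.log(d) for d in densities]
    n = len(radii)
    mean_r = sum(radii) / n
    mean_l = sum(logs) / n
    sxx = sum((r - mean_r) ** 2 for r in radii)
    sxy = sum((r - mean_r) * (l - mean_l) for r, l in zip(radii, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_r
    return math.exp(intercept), -slope
```

The size of the residuals from such a fit is one simple way to ask whether a particular city follows the radial pattern.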
We explore the vast effort that has been put forth both to
challenge and to defend the standard urban model, and
reconstitute the work of some other theorists to put forth
alternative measures that might characterize city
morphology. From these concepts we derive a set of metrics
for a given city and use them to analyze more than 150 of
the most populous cities in the United States, with
geodata provided by the 2000 Census.
Three stochastic models of city morphology are discussed at
length and then analyzed according to the defined metrics.
The first two originate in the literature: diffusion-limited
aggregation assumes that households settle at the location
where a Brownian walk started from a distance first reaches
the city's frontier; correlated site percolation, meanwhile,
assumes that households deposit according to fluid flow on a
regular lattice with some autocorrelation. The third model,
proposed by the authors, defines a city as the connected
component of a slightly sub-critical bond percolation process.
What may be the most important feature of our bond percolation
model is the ease with which it may be altered to incorporate
geographical constraints upon the city generation process, a
consideration not well explored previously.
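The bond percolation idea can be sketched as follows, assuming a square lattice, where the critical bond probability is 1/2, so "slightly sub-critical" means p a little below 0.5. The lattice choice, the default p, the bounding box, and the `blocked` parameter (a stand-in for a geographical constraint such as water) are assumptions for illustration rather than details from the thesis.

```python
import random
from collections import deque


def percolation_city(p=0.48, half_width=30, seed=0, blocked=frozenset()):
    """Connected cluster of the origin under bond percolation on a square lattice.

    Each bond between neighboring sites is open independently with
    probability p. Sites listed in `blocked` can never join the city.
    Returns the set of (x, y) sites in the cluster containing (0, 0).
    """
    rng = random.Random(seed)
    bond_open = {}  # cache so that each bond is sampled only once

    def is_open(a, b):
        key = (a, b) if a <= b else (b, a)
        if key not in bond_open:
            bond_open[key] = rng.random() < p
        return bond_open[key]

    origin = (0, 0)
    city = {origin}
    queue = deque([origin])
    while queue:
        x, y = queue.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if max(abs(nxt[0]), abs(nxt[1])) > half_width:
                continue  # stay inside the simulation box
            if nxt in blocked or nxt in city:
                continue
            if is_open((x, y), nxt):
                city.add(nxt)
                queue.append(nxt)
    return city
```

Calling `percolation_city(blocked={(1, 0), (1, 1)})` shows how a constraint enters: the blocked sites are simply excluded during cluster growth, with no other change to the process.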
To answer the question of which model is best, we take as
training observations the simulated cities that each model has
produced, and we perform a classification exercise on the
test set of cities given by US Census data. If a sizeable
majority of actual cities are classified into the group of
cities constructed by a given model, then we can say (albeit
with some reservations) that this model is better at producing
realistic results.