Abstract:
Time series modeling frequently involves economic data, which tend to exhibit abrupt changes in trend. These changes may arise from economic recessions, epidemic outbreaks, people power revolutions, or other political, social, economic, or natural events. An important tool for evaluating these changes is Spline Regression Analysis, a regression method in which the range of the independent variable is partitioned into intervals and separate regression lines are fit within the intervals (segments), joining at the knots. The knots (breakpoints) can be interpreted as critical, safe, or threshold values beyond or below which undesired effects occur, and the breakpoint can be important in decision making.
This study investigates the efficiency of Spline Regression as a method for time series modeling when there is an abrupt change in the trend of the data. It should be useful to researchers, especially those dealing with economic studies, in their analysis and forecasting. The study shows that the RMSE and AIC values of the Linear Spline Regression models for the original Peso per U.S. Dollar rate data decrease as the number of knots increases; that is, Linear Spline models with more knots fit the data better than those with few knots. However, as more knots are added, the model becomes more complicated because of the additional terms. On the other hand, the RMSE and AIC values of the Quadratic Spline Regression for the original data decrease up to three knots and increase thereafter, so the Quadratic Spline model with three knots has the best fit to the data. Spline Regression methods applied to smoothed data have smaller RMSE and AIC values than those for the original data. The asymptotic efficiency of the Spline Regression models was also investigated using the Block Bootstrapping method.
Based on the findings of the study, it is recommended that Spline Regression modeling be investigated for time series data with seasonal variations. Moreover, the efficiency of higher-order Polynomial Spline Regression is also worth studying for time series data exhibiting a trend. A Monte Carlo simulation of the Peso per U.S. Dollar rate is also recommended for analyzing the asymptotic behavior of the efficiency of the estimates.
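
As a rough illustration of the mechanics described above (not the study's actual data, knots, or results), the sketch below fits a linear spline by least squares using a truncated power basis and reports the RMSE and a Gaussian AIC; the simulated series and knot locations are hypothetical.

```python
# Minimal linear spline regression sketch with a truncated power basis.
import numpy as np

def linear_spline_design(t, knots):
    """Design matrix [1, t, (t-k1)_+, ..., (t-kK)_+]."""
    cols = [np.ones_like(t), t]
    cols += [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)                   # monthly time index
y = 40 + 0.05 * t - 0.3 * np.maximum(t - 60, 0) + rng.normal(0, 1, t.size)

knots = [40.0, 60.0, 80.0]                        # hypothetical breakpoints
X = linear_spline_design(t, knots)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
n, p = X.shape
rmse = np.sqrt(np.mean(resid ** 2))
aic = n * np.log(np.sum(resid ** 2) / n) + 2 * p  # Gaussian AIC up to a constant
print(f"RMSE = {rmse:.3f}, AIC = {aic:.1f}")
```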

Abstract:
We explore variable selection for partially linear models when the covariates are measured with additive errors.
We propose two classes of variable selection procedures, penalized least squares and penalized quantile regression,
based on the nonconvex penalization principle. The first procedure corrects the bias in the loss function caused by
the measurement error by applying the so-called correction-for-attenuation approach, whereas the second procedure
corrects the bias by using orthogonal residuals. The sampling properties for the two procedures are investigated.
The rate of convergence and the asymptotic normality of the resulting estimates are established. We further demonstrate that,
with proper choices of the penalty functions and the regularization parameter,
the resulting estimates perform asymptotically as well as an oracle procedure.
Choice of smoothing parameters is also discussed. Finite sample performance of the proposed
variable selection procedures is assessed by Monte Carlo simulation studies.
We further illustrate the proposed procedures by an application.
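
For intuition, here is a minimal sketch of the correction-for-attenuation step in ordinary linear regression, assuming the measurement-error covariance Sigma_uu is known; the paper's partially linear structure, quantile loss, and nonconvex penalties are not shown.

```python
# Naive least squares on error-contaminated covariates is attenuated;
# subtracting Sigma_uu from the empirical covariance corrects the bias.
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
beta = np.array([1.5, 0.0, -2.0])
X = rng.normal(size=(n, p))                      # true covariates (unobserved)
Sigma_uu = 0.25 * np.eye(p)                      # additive error covariance
W = X + rng.multivariate_normal(np.zeros(p), Sigma_uu, size=n)
y = X @ beta + rng.normal(0, 1, n)

naive = np.linalg.solve(W.T @ W / n, W.T @ y / n)             # attenuated
corrected = np.linalg.solve(W.T @ W / n - Sigma_uu, W.T @ y / n)
print("naive:    ", naive.round(2))
print("corrected:", corrected.round(2))
```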

Abstract:
Improved procedures, in terms of smaller missed discovery rates
(MDR), for performing multiple hypotheses testing with weak and
strong control of the family-wise error rate (FWER) or the false
discovery rate (FDR) will be presented in this talk. Improvement over
existing procedures, such as the Sidak procedure for FWER control and
the Benjamini-Hochberg (BH) procedure for FDR control, is achieved
by exploiting differences in powers of individual tests. Results signal
the need to take into account the powers of the individual tests and to
have multiple hypotheses decision functions which are not limited to
simply using the individual p-values, as is the case for example with
the Sidak, Bonferroni, or BH procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures
could be used with discrete or mixed-type data or with rank-based
nonparametric tests. This is in contrast to existing p-value-based procedures whose theoretical validity is contingent on the uniformity of
the p-value statistic under the null hypothesis. Proposed procedures
are relevant in the analysis of high-dimensional "large M, small n"
data sets arising in the natural, physical, medical, economic, and social
sciences, whose generation and creation are accelerated by advances
in high-throughput technology, notably, but not limited to, microarray
technology. (This is joint work with Joshua Habiger and Wensong
Wu.)
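
For reference, here is a minimal sketch of the two baseline p-value procedures named above, the Sidak adjustment for FWER control and the Benjamini-Hochberg step-up for FDR control, applied to a hypothetical vector of p-values; the talk's power-weighted decision functions are not reproduced here.

```python
import numpy as np

def sidak_reject(pvals, alpha=0.05):
    """Reject H_i if p_i <= 1 - (1 - alpha)^(1/M)."""
    m = len(pvals)
    return pvals <= 1 - (1 - alpha) ** (1 / m)

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(pvals[order] <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed.max() + 1]] = True  # reject all ranks up to the largest passing one
    return reject

pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])
print("Sidak:", sidak_reject(pvals))
print("BH:   ", bh_reject(pvals))
```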

Abstract:
In longitudinal studies with a potentially large number of
covariates, investigators are often interested in identifying
important variables that are predictive of the response. Suppose we
can a priori divide the covariates into two groups: one for which
parametric effects are adequate and the other for which
nonparametric modeling is required. In this research, we propose a new
method to simultaneously select important parametric covariate effects and
nonparametric covariate effects in partial additive mixed models for
longitudinal data. The proposed method takes advantage of the mixed-effects
representation of the smoothing spline estimates of the nonparametric
covariate effects and treats the inverse of a smoothing parameter as a
variance component in an induced working linear mixed model. The selection
of fixed parameter effects and nonparametric effects is achieved by
shrinking negligible fixed effects and induced variance components to zero.
Simulation studies are conducted to evaluate the performance
of the new method and a real data analysis is used to illustrate its
application.

Abstract:
In this paper, I build a dynamic trade-off model of financing with differences in beliefs between the manager and investors. In the model, investors update more readily on earnings announcements than the manager does. The model offers a parsimonious treatment of endogenous financing, payout, and cash policies. The model generates a broad set of well-documented
empirical facts that are difficult to explain using standard theories. In particular, the model
predicts: 1) high stock returns predicting equity issuance, 2) the low debt ratios of firms in cross-section, 3) the substantial presence of firms with no debt or negative net debt and the fact that
zero-debt firms are more profitable, pay larger dividends, and keep higher cash balances than
other firms, and 4) the negative relationship between profitability and both book and market
leverage ratios. If investors overextrapolate trends in earnings growth, the model also predicts
the negative/positive long-run abnormal returns following stock issuances/repurchases.

Abstract:
Suppose we have N individuals, and for each individual we observe a response vector (Yi1, ..., Yiq) and p-dimensional categorical-valued covariates (Xi1, ..., Xip). Our goal is to discover which subset of the response
variables is influenced by which subset of the covariates. Although the
problem is similar to the multiple-response regression, our goal is much
more ambitious than just finding certain linear relationships.
I will present a novel Bayesian partition model through the use of
a set of latent indicator vectors, together with a Markov chain Monte
Carlo algorithm, to tackle the problem. I will illustrate the power of
the method mainly using examples in genome-wide genetic association studies
and in studies of expression quantitative trait loci.

Abstract:
In this talk we consider a problem from bone marrow transplant (BMT)
studies where there is interest in assessing the effect of haplotype
match for donor and patient on the overall survival. The BMT study we
consider is based on donors and patients that are genotype matched, and
this therefore leads to a missing data problem. We show how Aalen's
additive risk model can be applied in this setting with the benefit that
the time-varying haplo-match effect can be easily studied. This problem
has not been considered before, and the standard approach where one
would use the EM-algorithm cannot be applied for this model because the
likelihood is hard to evaluate without additional assumptions. We
suggest an approach based on multivariate estimating equations that are
solved using a recursive structure. This approach leads to an estimator
where the large sample properties can be developed using
product-integration theory. Small sample properties are investigated
using simulations in a setting that mimics the motivating haplo-match
problem.

Abstract:
A penalized polynomial spline method will be introduced for
simultaneous model estimation and variable selection in additive models.
The proposed method approximates the nonparametric functions by polynomial
splines, and minimizes the sum of squared errors subject to an additive
penalty on norms of spline functions. This approach sets estimators of
certain function components to exactly zero, thus performing variable
selection. Under mild regularity conditions, I show that the proposed
method estimates the non-zero function components in the model with the
same optimal mean square convergence rate as the standard polynomial
spline estimators, and correctly sets the zero function components to zero with probability approaching one as the sample size goes to infinity. The
theoretical results are well supported by simulation studies. The proposed
method is also applied to two real data examples for illustration.

Abstract:
With competing risks failure time data, one often needs to assess the covariate
effects on the cumulative incidence probabilities. Fine and Gray proposed a proportional regression model to directly model the subdistribution of a competing risk
and developed some estimating procedures based on inverse probability of censoring
weighting for competing risks data subject only to right censoring. Right censored and left truncated competing risks data sometimes occur in biomedical research. In this paper,
we study the proportional hazards regression model for the subdistribution of a competing risk with right censored and left truncated data. We adopt a new weighting
technique to estimate the parameters in this model. We derive the large-sample properties of the proposed estimators. To illustrate the application of the new method, we analyze failure time data for children with acute leukemia. In
this example, the failure times for the children who had bone marrow transplants
were left truncated.

Abstract:
In family studies, canonical discriminant analysis can be
used to find linear combinations of phenotypes that exhibit high ratios of between-family to within-family variability. But with large numbers of phenotypes, canonical discriminant analysis may overfit. To estimate the predicted ratios associated with the coefficients obtained from canonical discriminant analysis, two methods are developed: one based on bias correction and the other on cross-validation. Because cross-validation is computationally intensive, an approximation to it is also developed. Furthermore, these methods can be applied to perform variable selection in canonical discriminant analysis.
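
A small sketch of the quantity canonical discriminant analysis maximizes, the between-family to within-family variance ratio, computed via a generalized eigenproblem on simulated phenotypes; the bias-correction and cross-validation estimates of the predicted ratio are not shown.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
families = np.repeat(np.arange(30), 4)            # 30 families of size 4
effects = rng.normal(size=(30, 5))
Y = effects[families] + rng.normal(size=(120, 5)) # shared family effect + noise

grand = Y.mean(axis=0)
Sb = np.zeros((5, 5))                             # between-family scatter
Sw = np.zeros((5, 5))                             # within-family scatter
for f in np.unique(families):
    Yf = Y[families == f]
    d = (Yf.mean(axis=0) - grand)[:, None]
    Sb += Yf.shape[0] * d @ d.T
    Sw += (Yf - Yf.mean(axis=0)).T @ (Yf - Yf.mean(axis=0))

vals, vecs = eigh(Sb, Sw)                         # generalized eigenproblem Sb v = lambda Sw v
print("max between/within ratio:", vals[-1].round(3))
print("leading coefficients:   ", vecs[:, -1].round(3))
```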

Abstract:
Longevity risk constitutes an important risk factor for insurance companies and pension plans. For its analysis, but also for evaluating mortality-contingent structured financial products, modeling approaches allowing for uncertainties in mortality projections are needed.
One model class that has attracted interest in applied research as well as among practitioners is that of forward mortality models, which are defined based on forecasts of survival probabilities, as can be found in generation life tables, and infer dynamics on the entire age/term structure (or forward surface) of mortality. However, thus far there has been little guidance on identifying suitable specifications and their properties.
The current paper provides a detailed analysis of forward mortality models driven by a finite-dimensional Brownian motion. In particular, after discussing basic properties, we present an infinite-dimensional formulation, and we examine the existence of finite-dimensional realizations for time-homogeneous Gaussian forward models, which are shown to possess important advantages for practical applications.

Abstract:
If viewed realistically, models under consideration are always false. A consequence of model falseness is that for every data generating mechanism, there exists a sample size at which the model failure will become obvious. There are occasions when one will still want to use a false model, provided that it gives a parsimonious and powerful description of the generating mechanism. We introduce a model credibility index from the point of view that the model is false. The model credibility index is defined as the maximum sample size at which samples from the model and those from the true data generating mechanism are nearly indistinguishable. The index is estimated under the framework of subsampling, where a large data set is treated as the population, and subsamples generated from the population are compared with the model at various sample sizes. Exploring the asymptotic properties of the model credibility index leads to the problem of estimating the variance of U-statistics. An unbiased estimator and a simple fix-up are proposed to estimate the U-statistic variance.
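
A schematic of the credibility-index idea under stated stand-ins: a normal model is fit to heavy-tailed data, and subsamples of increasing size are compared with model samples via a two-sample Kolmogorov-Smirnov test; the largest size at which the two remain nearly indistinguishable serves as a rough index. The cutoffs and test choice here are illustrative only.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
data = rng.standard_t(df=5, size=100_000)        # "true" mechanism: t(5)
mu, sigma = data.mean(), data.std()              # fitted (false) normal model

credibility = 0
for n in [50, 100, 200, 500, 1000, 2000, 5000, 10_000]:
    rejections = 0
    for _ in range(200):                         # repeated subsampling
        sub = rng.choice(data, size=n, replace=False)
        model = rng.normal(mu, sigma, size=n)
        if ks_2samp(sub, model).pvalue < 0.05:
            rejections += 1
    if rejections / 200 > 0.10:                  # model failure now obvious
        break
    credibility = n                              # still near-indistinguishable
print("approximate credibility index:", credibility)
```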

Abstract: Recently, functional data analysis has received considerable attention in statistics research, and a number of successful applications have been reported, but there have been no results on inference for the global shape of the mean regression curve. In this paper, an asymptotic simultaneous confidence band is obtained for the mean trajectory curve based on sparse longitudinal data, using piecewise constant spline estimation. Simulation experiments corroborate the asymptotic theory.

Abstract:
In classification, semi-supervised learning occurs when a large amount of unlabeled data
is available with only a small number of labeled observations. This imposes a great challenge in that it
is difficult to achieve good classification performance through labeled data alone. To
leverage unlabeled data for enhancing classification, we introduce a margin-based semi-supervised learning
method within the framework of regularization, based on an efficient margin loss for unlabeled data, which seeks
efficient extraction of the information in unlabeled data for estimating the Bayes rule for classification.
In particular, I will discuss three aspects: (1) the idea and methodology development; (2)
computational tools; (3) a statistical learning theory.
Numerical examples will be provided to demonstrate the advantage of our proposed methodology
against other existing competitors. An application to gene function prediction will be discussed.

Abstract:
The intraclass correlation coefficient (ICC) rho is widely used to measure the degree of
family resemblance with respect to characteristics such as blood pressure, weight, and height.
In this talk the author will discuss several statistical problems regarding ICCs. In particular,
the author will present several resampling methods for computing the confidence intervals for
the common ICC and testing the homogeneity of ICCs for several populations.
The author will also propose a few research topics regarding ICCs.
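
As a concrete example of one resampling approach, the sketch below computes a one-way ANOVA estimate of the ICC and a percentile bootstrap confidence interval obtained by resampling whole families; balanced family sizes are assumed, and this is not necessarily the method proposed in the talk.

```python
import numpy as np

def icc_oneway(Y):
    """Y: (families, members). One-way random-effects ICC estimate."""
    k, n = Y.shape
    fam_means = Y.mean(axis=1)
    msb = n * np.var(fam_means, ddof=1)           # between-family mean square
    msw = np.mean(np.var(Y, axis=1, ddof=1))      # within-family mean square
    return (msb - msw) / (msb + (n - 1) * msw)

rng = np.random.default_rng(4)
k, n, rho = 60, 3, 0.4
fam = rng.normal(0, np.sqrt(rho), size=(k, 1))    # shared family effect
Y = fam + rng.normal(0, np.sqrt(1 - rho), size=(k, n))

# bootstrap by resampling whole families, preserving within-family dependence
boot = np.array([icc_oneway(Y[rng.integers(0, k, k)]) for _ in range(2000)])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(f"ICC = {icc_oneway(Y):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```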

Abstract:
In medical imaging analysis and computer vision, there is a growing interest in analyzing various manifold-valued data, including 3D rotations, planar shapes, oriented or directed directions, the Grassmann manifold, deformation fields, symmetric positive definite (SPD) matrices, and medial shape representations (m-rep) of subcortical structures. In particular, the scientific interest of most population studies centers on establishing associations between a set of covariates (e.g., diagnostic status, age, and gender) and manifold-valued data for characterizing brain structure and shape differences, thus requiring a regression modeling framework for manifold-valued data. The aim of this talk is to develop an intrinsic regression model for the analysis of manifold-valued data as responses in a Riemannian manifold and their associations with a set of covariates in Euclidean space. Because manifold-valued data do not form a vector space, directly applying classical multivariate regression may be inadequate for establishing the relationship between manifold-valued data and covariates of interest in real applications. Our intrinsic regression model, which is a semiparametric model, uses a link function to map from the Euclidean space of covariates to the Riemannian manifold of the data. We develop an estimation procedure to calculate an intrinsic least squares estimator and establish its limiting distribution, and we develop score statistics to test linear hypotheses on the unknown parameters. We apply our methods to the detection of differences in the morphological changes of the left and right hippocampi between schizophrenia patients and healthy controls using medial shape descriptions.

Abstract:
Value-at-Risk is a simple, but useful measure in risk management.
When some volatility model is employed, conditional Value-at-Risk is of importance.
As ARCH/GARCH models are widely used in modeling volatilities, in this talk,
we first propose empirical likelihood methods to construct confidence intervals for
the conditional Value-at-Risk with the volatility model being an ARCH/GARCH model.
We further consider an empirical likelihood-based estimation
of the conditional Value-at-Risk in the nonparametric regression model.
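
A crude illustration of conditional Value-at-Risk of the form sigma_t * q_alpha, with a rolling-window volatility standing in for a fitted ARCH/GARCH model and an empirical residual quantile standing in for the empirical likelihood machinery of the talk.

```python
import numpy as np

rng = np.random.default_rng(5)
# simulated returns with slowly varying volatility (placeholder data)
ret = rng.normal(0, 0.01, 1500) * (1 + 0.5 * np.sin(np.arange(1500) / 80))

window, alpha = 250, 0.05
sigma_t = np.array([ret[i - window:i].std() for i in range(window, len(ret))])
std_resid = ret[window:] / sigma_t                # standardized returns
q = np.quantile(std_resid, alpha)                 # empirical residual quantile
var_next = sigma_t[-1] * q                        # one-step conditional VaR
print(f"{alpha:.0%} one-step conditional VaR: {var_next:.4f}")
```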

Abstract:
Most association study methods become either ineffective or inefficient when dealing with increasing numbers of SNPs. Suggested by the block-like
structure of the human genome, a popular strategy is to use haplotypes to try to capture the correlation structure of SNPs in regions of little recombination.
Such a haplotype-based association study would have significantly reduced degrees of freedom and be able to capture the combined effects of tightly linked causal variants. An efficient rule-based algorithm is presented for haplotype inference from pedigree genotype data under the assumption of no recombination.
This zero-recombination haplotyping algorithm is extended to a maximum-parsimony haplotyping algorithm for a whole genome scan that minimizes the total number of breakpoint sites. We show that such a whole-genome-scan haplotyping algorithm can be implemented in O(m^3 n^3) time in a novel incremental fashion, where m denotes the total number of SNP loci on the chromosome. Extensive simulation experiments using eight pedigree structures that were used previously for association studies showed that the haplotype allele sharing status among the members can be determined deterministically, efficiently, and accurately, even for very small pedigrees.

Abstract:
For model selection in mixed effects models, Vaida and Blanchard (2005) demonstrated that the marginal Akaike
information criterion is appropriate for questions regarding the population, while the conditional Akaike information criterion is appropriate for questions regarding the particular clusters in the data. This paper
shows that the marginal Akaike information criterion is asymptotically equivalent to the leave-one-cluster-out
cross-validation and the conditional Akaike information criterion is asymptotically equivalent to the
leave-one-observation-out cross-validation.
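
The two cross-validation schemes in the equivalence can be sketched as follows, with plain pooled least squares standing in for the mixed-model fit (the equivalence itself concerns mixed models and AIC, which this toy code does not verify).

```python
import numpy as np

rng = np.random.default_rng(6)
clusters = np.repeat(np.arange(20), 5)            # 20 clusters of 5 observations
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=100)

def fit(Xtr, ytr):                                # stand-in for the mixed-model fit
    return np.linalg.lstsq(Xtr, ytr, rcond=None)[0]

# leave-one-cluster-out CV (the marginal AIC analogue)
sse = 0.0
for c in np.unique(clusters):
    tr = clusters != c
    b = fit(X[tr], y[tr])
    sse += np.sum((y[~tr] - X[~tr] @ b) ** 2)
loco = sse / len(y)

# leave-one-observation-out CV (the conditional AIC analogue)
sse = 0.0
for i in range(len(y)):
    tr = np.arange(len(y)) != i
    b = fit(X[tr], y[tr])
    sse += (y[i] - X[i] @ b) ** 2
loo = sse / len(y)
print(f"LOCO-CV = {loco:.3f}, LOO-CV = {loo:.3f}")
```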

Abstract:
Discussed are exact one-stage and group-sequential sample size determination methods for one- and two-sample binomial proportions testing problems, methods for the corresponding finite population tests, and simultaneous tests for correlated binomial proportions. Design properties are discussed and new/unpublished results are described. The exact group sequential methods allow early stops only for efficacy or only for futility or for either efficacy or futility. Sample sizes, levels of significance and power at fixed points in the research hypothesis parameter space are compared among competing designs including those derived using asymptotic normal theory methods. Documents provided will include
a description of how sample points are placed in the rejection region,
simple proofs for each of the three one-sample theorems,
tables demonstrating the efficiency of the two-sample designs,
a table showing how close the one-sample designs can get to the one-stage uniformly most powerful test in terms of significance and power,
a table demonstrating the remarkable sample size savings if two or more binomial endpoints are tested simultaneously.
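
As a minimal illustration of exact (rather than asymptotic-normal) design, the sketch below finds the smallest one-stage sample size for a one-sample binomial test by computing exact size and power from the binomial distribution; the design parameters are hypothetical and the group-sequential extensions are not shown.

```python
from scipy.stats import binom

def exact_sample_size(p0, p1, alpha=0.05, power=0.80, n_max=500):
    """Smallest n (with cutoff c) for testing H0: p = p0 vs H1: p = p1 > p0."""
    for n in range(1, n_max + 1):
        # smallest c with exact size P(X >= c | p0) <= alpha
        c = next(c for c in range(n + 2) if binom.sf(c - 1, n, p0) <= alpha)
        if binom.sf(c - 1, n, p1) >= power:      # exact power at p1
            return n, c
    return None

n, c = exact_sample_size(p0=0.20, p1=0.40)
print(f"n = {n}, reject if X >= {c}")
```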

Abstract:
Quantile regression is a very useful statistical tool to learn the
relationship between the response variable and covariates. For many
applications, one often needs to estimate multiple conditional quantile
functions of the response variable given covariates. Although one can
estimate multiple quantiles separately, it is of great interest
to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among themselves to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating noncrossing constraints on quantile regression
functions. In this talk, I will present a new multiple noncrossing
quantile regression estimation technique. Both asymptotic properties and
finite sample performance will be presented to illustrate usefulness of
the proposed method.
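
To illustrate the crossing problem, the sketch below fits several quantile levels separately and then enforces noncrossing by monotone rearrangement (sorting fitted values across levels); this simple device is a stand-in for the talk's joint estimation with built-in noncrossing constraints.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 300)
y = 1 + 0.5 * x + (0.2 + 0.1 * x) * rng.normal(size=300)  # heteroscedastic data
X = sm.add_constant(x)

taus = [0.1, 0.3, 0.5, 0.7, 0.9]
preds = np.column_stack([sm.QuantReg(y, X).fit(q=t).predict(X) for t in taus])

# sorting fitted values across tau at each x removes any quantile crossings
noncrossing = np.sort(preds, axis=1)
print("crossings before:", int((np.diff(preds, axis=1) < 0).sum()))
print("crossings after: ", int((np.diff(noncrossing, axis=1) < 0).sum()))
```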

Abstract:
We propose a penalized orthogonal-components regression
(POCRE) for large p small n data. Orthogonal components are sequentially constructed to maximize, upon standardization, their correlation with the response residuals. A new penalization framework, implemented via empirical Bayes thresholding, is presented to effectively identify sparse predictors of each component. POCRE is computationally efficient owing to its sequential construction of leading sparse principal components. In addition, such construction offers other properties such as grouping highly correlated predictors and allowing for collinear or nearly collinear predictors. With multivariate responses, POCRE can construct common components and thus build up latent-variable models for large p small n data. This is joint work with Yanzhu Lin and Min Zhang.
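
A schematic of the sequential construction described above, with plain soft-thresholding standing in for POCRE's empirical Bayes thresholding; it illustrates the idea of sparse, correlation-maximizing components with residual deflation rather than the actual algorithm.

```python
import numpy as np

def soft(v, lam):
    """Soft-thresholding operator (stand-in for empirical Bayes thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(8)
n, p = 50, 200                                    # large p, small n
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(size=n)

r = y - y.mean()
loadings = []
for _ in range(3):                                # a few leading components
    w = soft(X.T @ r / n, lam=0.1)                # sparse loading direction
    if not w.any():
        break
    t = X @ w
    t /= np.linalg.norm(t)
    loadings.append(w)
    r = r - t * (t @ r)                           # deflate the residuals
print("nonzero loadings per component:", [int((w != 0).sum()) for w in loadings])
```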

Abstract:
Penalized regression has been widely used in high-dimensional data
analysis. Much recent work has been
done on the study of penalized least squares methods. In this talk, I
will first introduce the application of penalized LAD
methods in detecting copy number variations. I will then discuss some
theoretical properties of penalized LAD methods in high-dimensional
settings. The finite sample performance of the proposed methods is
demonstrated by simulation studies.
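
As a simple instance of penalized LAD, the sketch below fits an L1-penalized median regression (LAD is quantile regression at tau = 0.5) with scikit-learn's QuantileRegressor on simulated sparse, heavy-tailed data; the talk's estimators and the copy-number application are not reproduced.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(9)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                       # sparse truth
y = X @ beta + rng.standard_t(df=3, size=n)       # heavy-tailed noise

# alpha scales the L1 penalty on the coefficients
fit = QuantileRegressor(quantile=0.5, alpha=0.05).fit(X, y)
print("selected coefficients:", np.flatnonzero(np.abs(fit.coef_) > 1e-8))
```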