SEMIPARAMETRIC MODELS
In the framework of supervised classification and regression problems, semiparametric models are among the most common approaches. They are standard regression tools formed by a parametric and a nonparametric component. A new class of models, named Generalized Additive Multi-Model (GAM-M), has been introduced to integrate different approaches (parametric as well as semiparametric), and it consistently improved the goodness of fit of the model. It extends the Generalized Additive Model (GAM) framework of Hastie and Tibshirani by combining, in a unified formulation, smoothing functions with alternative approaches such as decision trees. The results showed that this approach outperforms GAM, and the benefits of using this procedure instead of alternative approaches (multiple regression, decision trees, etc.) have also been discussed.
The GAM-M estimation procedure is adaptive and is based on two steps. In the first, the most appropriate model/smoother is assigned to each predictor according to a goodness-of-fit criterion. In the second, an iterative backfitting-like procedure updates the partial residuals until convergence. The case of a categorical response is also covered by an outer loop that relates the additive combination of models/smoothers to the response classes through a link function, which introduces a system of weights. Concerning the stability of the results, experimental studies based on bootstrap and cross-validation have also been provided.
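The two-step estimation procedure for a continuous response can be sketched as follows. This is a minimal illustration under stated assumptions, not the published GAM-M implementation. The candidate set is a hypothetical choice: a least-squares line (parametric), a running-mean smoother (nonparametric) and a one-split regression tree (a "stump", standing in for a decision tree). The goodness-of-fit criterion is assumed to be the in-sample residual sum of squares (RSS). All function names (`gam_m_fit`, `fit_stump`, etc.) are hypothetical. The categorical-response outer loop is not shown.

```python
"""Minimal sketch of a GAM-M-style fit for a continuous response.

Assumptions (not from the source): the candidate models are a linear fit,
a running-mean smoother and a one-split regression tree ("stump"), and
goodness of fit is measured by the in-sample residual sum of squares (RSS).
"""
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def rss(y: Sequence[float], f: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y, f))


def fit_linear(x: Sequence[float], r: Sequence[float]) -> Vector:
    """Least-squares line r ~ a + b*x (parametric candidate)."""
    mx, mr = mean(x), mean(r)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx if sxx > 0 else 0.0
    return [mr + b * (xi - mx) for xi in x]


def fit_running_mean(x: Sequence[float], r: Sequence[float], k: int = 2) -> Vector:
    """Running-mean smoother: average r over up to 2k+1 neighbouring ranks of x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    out = [0.0] * len(x)
    for pos, i in enumerate(order):
        window = order[max(0, pos - k): pos + k + 1]
        out[i] = mean([r[j] for j in window])
    return out


def fit_stump(x: Sequence[float], r: Sequence[float]) -> Vector:
    """One-split regression tree: best threshold among the distinct x values."""
    best_rss, best_fit = float("inf"), [mean(r)] * len(r)
    for t in sorted(set(x))[1:]:  # split x < t vs x >= t; both sides non-empty
        left = [ri for xi, ri in zip(x, r) if xi < t]
        right = [ri for xi, ri in zip(x, r) if xi >= t]
        ml, mr = mean(left), mean(right)
        fit = [ml if xi < t else mr for xi in x]
        cur = rss(r, fit)
        if cur < best_rss:
            best_rss, best_fit = cur, fit
    return best_fit


CANDIDATES: Dict[str, Callable[[Sequence[float], Sequence[float]], Vector]] = {
    "linear": fit_linear,
    "running_mean": fit_running_mean,
    "stump": fit_stump,
}


def gam_m_fit(X: List[Sequence[float]], y: Sequence[float],
              max_iter: int = 50, tol: float = 1e-8) -> Tuple[float, List[str], List[Vector]]:
    """X holds one column (list of values) per predictor."""
    n, p = len(y), len(X)
    alpha = mean(y)
    yc = [yi - alpha for yi in y]
    # Step 1: assign to each predictor the candidate with the smallest RSS
    # against the centred response.
    models = [min(CANDIDATES, key=lambda name: rss(yc, CANDIDATES[name](xj, yc)))
              for xj in X]
    # Step 2: backfitting-like loop updating the partial residuals.
    f = [[0.0] * n for _ in range(p)]
    for _ in range(max_iter):
        delta = 0.0
        for j in range(p):
            # Partial residuals: remove the intercept and all other components.
            partial = [y[i] - alpha - sum(f[k][i] for k in range(p) if k != j)
                       for i in range(n)]
            new = CANDIDATES[models[j]](X[j], partial)
            c = mean(new)
            new = [v - c for v in new]  # centre each component for identifiability
            delta = max(delta, max(abs(a - b) for a, b in zip(new, f[j])))
            f[j] = new
        if delta < tol:  # stop when no component changes appreciably
            break
    return alpha, models, f
```

Centring each component keeps the intercept `alpha` identifiable, as in standard backfitting. Selecting candidates by in-sample RSS tends to favour the most flexible model, so a penalized or cross-validated criterion would probably be preferable in practice.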
Apart from the classical definition, GAM-M has been formulated in two other ways. First, we integrated the standard CART-like recursive partitioning procedure of Breiman et al. with GAM. This made it possible to solve one of the main problems concerning the correct identifiability of a semiparametric model: the definition of an order of entry for the predictors in the model, as well as of the optimal smoothing parameters, taking into account, for the latter, the dependence relationship between the observations. The results confirmed the effectiveness of this integrated strategy. The second formulation of GAM-M is based on the correct identification of semiparametric models for homogeneous sub-populations. It is part of a well-known research framework (the definition of nested sequences of models) that proves very effective when dealing with huge datasets.