MISSING DATA IMPUTATION
For missing data imputation in large databases, we propose the iterative use of tree-based classifiers.
The imputation method rests on a suitable pre-processing of the original dataset, which uses a lexicographic order to rank the missing values occurring in different variables. Once the data have been ranked and coded, the imputation process handles the missing values incrementally, i.e., it augments the data with the previously filled-in records according to the defined order.
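As an illustration, the ranking step can be sketched as follows. The text does not spell out the exact ordering, so this is one plausible reading and should be taken as an assumption: columns are ranked by ascending number of missing values, and records are then sorted lexicographically by their missingness pattern read in that column order, so complete records come first. The function name `lexicographic_order` is hypothetical, and `None` marks a missing cell.

```python
from typing import List, Optional, Sequence, Tuple

# A record is a list of categorical codes; None marks a missing cell.
Row = List[Optional[str]]


def lexicographic_order(data: Sequence[Row]) -> Tuple[List[int], List[int]]:
    """Return (column order, row order) for a data matrix with missing cells.

    Assumed ranking: columns by ascending number of missing values (ties
    broken by position), then rows lexicographically by their 0/1
    missingness pattern read in that column order.
    """
    n_cols = len(data[0])
    missing_per_col = [sum(row[j] is None for row in data) for j in range(n_cols)]
    col_order = sorted(range(n_cols), key=lambda j: (missing_per_col[j], j))

    def pattern(i: int) -> Tuple[int, ...]:
        # 1 where record i is missing, 0 where it is observed, in ranked column order.
        return tuple(int(data[i][j] is None) for j in col_order)

    row_order = sorted(range(len(data)), key=lambda i: (pattern(i), i))
    return col_order, row_order


example = [
    ["a", None, "x"],
    ["b", "p", "y"],
    [None, None, "x"],
    ["a", "q", None],
]
print(lexicographic_order(example))  # → ([0, 2, 1], [1, 0, 3, 2])
```

Here column 1 has two gaps and is ranked last; the complete record 1 comes first, and record 2, missing in the top-ranked column, comes last.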
This imputation process is incremental because, as it proceeds, more and more information is added to the data matrix, with respect to both its rows and its columns. Furthermore, the imputation is also conditional because, in the joint imputation of multiple inputs, each subsequent imputation is conditioned on the previously filled-in inputs.
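The incremental and conditional loop can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the tree-based classifier is swapped for a small hand-written CART-style tree (binary equality splits chosen by Gini impurity), values are categorical codes with `None` marking a missing cell, and incomplete columns are visited in ascending order of missing count. The names `grow_tree`, `predict`, `incremental_impute` and the `max_depth` parameter are hypothetical. At each step the predictors are only the columns complete so far, so every imputed column enlarges the predictor set and every filled cell completes more records for the next step.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

# A record is a list of categorical codes; None marks a missing cell.
Row = List[Optional[str]]
# A tree is either a leaf label or a node (feature, value, left, right).
Tree = Union[str, Tuple[int, str, "Tree", "Tree"]]


def gini(labels: Sequence[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X: Sequence[Row], y: Sequence[str], features: Sequence[int], depth: int) -> Tree:
    """Grow a small CART-style tree: binary splits x[f] == v chosen by Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    best, best_score = None, gini(y)
    for f in features:
        for v in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] == v]
            right = [lab for x, lab in zip(X, y) if x[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, v), score
    if best is None:  # no split reduces impurity: make a leaf
        return majority
    f, v = best
    li = [i for i, x in enumerate(X) if x[f] == v]
    ri = [i for i, x in enumerate(X) if x[f] != v]
    return (f, v,
            grow_tree([X[i] for i in li], [y[i] for i in li], features, depth - 1),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], features, depth - 1))


def predict(tree: Tree, x: Row) -> str:
    while isinstance(tree, tuple):
        f, v, left, right = tree
        tree = left if x[f] == v else right
    return tree


def incremental_impute(data: Sequence[Row], max_depth: int = 3) -> List[Row]:
    rows = [list(r) for r in data]  # work on a copy; the input is left untouched
    n_cols = len(rows[0])
    missing = [sum(r[j] is None for r in rows) for j in range(n_cols)]
    complete = [j for j in range(n_cols) if missing[j] == 0]
    # Assumed order: incomplete columns from fewest to most missing values.
    incomplete = [j for j in range(n_cols) if missing[j] > 0]
    for j in sorted(incomplete, key=lambda j: (missing[j], j)):
        train = [r for r in rows if r[j] is not None]
        if not train:  # nothing observed in this column: leave it missing
            continue
        # Predictors are the columns complete so far, including earlier imputations.
        tree = grow_tree(train, [r[j] for r in train], complete, max_depth)
        for r in rows:
            if r[j] is None:
                r[j] = predict(tree, r)
        complete.append(j)  # column j now conditions every later imputation
    return rows


sample = [
    ["sunny", "hot", "no"],
    ["sunny", "mild", "no"],
    ["rain", "mild", "yes"],
    ["rain", "cool", "yes"],
    ["sunny", None, "no"],
    ["rain", "cool", None],
]
for r in incremental_impute(sample)[4:]:
    print(r)
# → ['sunny', 'hot', 'no']
# → ['rain', 'cool', 'yes']
```

In the sample, column 1 is imputed first from column 0 alone; column 2 is then imputed from columns 0 and 1, the latter now including the value just filled in. Restricting predictors to complete columns keeps every training and prediction record fully observed, at the cost of ignoring partially observed variables until their own turn comes; a library tree classifier could replace `grow_tree` without changing the loop.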