Simulation of data
To illustrate the usage of Zebu, we use data simulated from a Bayesian network. The example presented in figure 1 is borrowed from Pearl [25]. Five thousand cases with 5% of missing data were simulated using SamIam. To illustrate how Zebu handles continuous variables, calcium concentrations were also simulated. “Not increased” cases were randomly sampled from a normal distribution with mean 2.4 and standard deviation 0.05. “Increased” cases were randomly sampled from a negative exponential distribution with rate 10, to which the value 2.6 was added. An identification variable was also added for each case.
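The sampling scheme for the calcium variable can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to build the dataset: in the original simulation the “increased” status comes from the Bayesian network, whereas here it is drawn with a hypothetical probability `p_increased`. The function names, the seed and the choice to apply the 5% of missing data to the calcium variable are likewise assumptions made for the example.

```python
import random


def simulate_calcium(increased, rng):
    """Draw one calcium concentration for a single case."""
    if increased:
        # Negative exponential with rate 10, shifted by 2.6
        return 2.6 + rng.expovariate(10)
    # Normal distribution with mean 2.4 and standard deviation 0.05
    return rng.gauss(2.4, 0.05)


def simulate_cases(n_cases=5000, p_increased=0.2, p_missing=0.05, seed=1):
    """Simulate cases with an identifier, a status and a calcium value.

    p_increased is a hypothetical placeholder: the source draws the
    status from a Bayesian network, which is not reproduced here.
    """
    rng = random.Random(seed)
    cases = []
    for case_id in range(1, n_cases + 1):
        increased = rng.random() < p_increased
        calcium = simulate_calcium(increased, rng)
        # Assumption: missing values are introduced completely at random
        if rng.random() < p_missing:
            calcium = None
        cases.append({"id": case_id, "increased": increased, "calcium": calcium})
    return cases


if __name__ == "__main__":
    data = simulate_cases()
    print(data[:3])
```

Because the exponential draw is never negative, every “increased” value is at least 2.6, which keeps the two groups well separated and makes the relationship with the status variable discontinuous.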
Analysis of data
The first step is importing the dataset in the “Import file” panel. Once this has been achieved, …
Zebu provides a user-friendly way to compute both global and local multivariate association measures. The software can be used by scientists and clinicians with no prior knowledge of programming. Furthermore, we have presented a method to define subgroups, which makes it possible to characterize the population in which the global association exists.
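To make the distinction between local and global measures concrete, the following Python sketch computes a local value for each pair of modalities and a single global value for two categorical variables. It uses pointwise mutual information and mutual information as stand-ins, chosen only because the global measure is the probability-weighted mean of the local ones; it is not claimed to be the measure or the code used in Zebu, it is restricted to the bivariate case, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def local_and_global_association(pairs):
    """Return (pmi, mi) for a list of (a, b) observations.

    pmi maps each observed pair of modalities to its pointwise mutual
    information (in bits); mi is the mutual information, that is the
    mean of the local values weighted by the joint probabilities.
    """
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)

    pmi = {}
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Positive when the pair co-occurs more often than under independence
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))

    mi = sum((joint[key] / n) * value for key, value in pmi.items())
    return pmi, mi


if __name__ == "__main__":
    # Hypothetical toy data: two binary variables
    toy = [("a", "u")] * 4 + [("a", "v")] + [("b", "u")] * 2 + [("b", "v")] * 3
    local_values, global_value = local_and_global_association(toy)
    for key in sorted(local_values):
        print(key, round(local_values[key], 3))
    print("global:", round(global_value, 3))
```

In the toy data the local values take opposite signs for different pairs of modalities, while the global value summarizes them in one non-negative number.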
Local association measures offer a different view of what association is by breaking the myth that it has the same strength for all modalities of variables. This makes these measures particularly well suited to describing and modeling discontinuous relationships, which are far from being an exception, notably in biology [26]. Their use in the scientific literature is, however, still limited. Nonetheless, the diversity of fields interested in these measures shows that they are of interest. Indeed, applications are found in computational linguistics, image processing, cardiology, pharmacokinetics and …
Although these have proven their value in diverse applications, theoretical studies of their mathematical properties are sparse. For example, only Monte Carlo simulations of the behavior of Ducher’s Z are available [27]. A more theoretical approach to these measures could be of interest. Moreover, improvements in Zebu are also possible. The first concerns discretization, a necessary step for continuous variables. We have restricted ourselves to very simple discretization methods: equal-width and user-defined. Other discretization algorithms exist [28] and may be better suited to the computation of association measures. These will have to be considered in future versions of Zebu. Furthermore, the bootstrap function in Zebu is based on an iterative procedure, and such procedures are particularly slow in R. To speed this up, writing the bootstrap function in C or Fortran and calling it from R could be a reliable solution. Finally, Zebu has been conceived for people with no programming knowledge. However, an R package for more experienced users could be of use for more sophisticated