ATTENTION: This is an early version of the Blankets Joint Posterior (BJP)
scoring function for Markov network structure learning. Please do not hesitate
to contact Federico Schluter if you encounter any difficulties using the
software or find any errors.

This package contains a Java implementation of the Blankets Joint Posterior
scoring function for Markov network structure learning, together with the
hill-climbing search methods used in its paper to optimize it.

For further details on BJP, please refer to:

F. Schluter, Y. Strappa, F. Bromberg, and D. H. Milone. "Blankets Joint
Posterior score for learning irregular Markov network structures". Technical
Report, 2016. http://192.168.16.89/papers/bjp/fschluterEtAl.MBJP.pdf

--------------------------------------------------------------------------------
CONTENT
--------------------------------------------------------------------------------

I  - The paper manuscript "Blankets Joint Posterior score for learning
     irregular Markov network structures".

II - Source code and executable files for reproducing the experiments.
     The BJP-source-code-schluter-et.al-.zip file contains:

     * source/
       The source code of BJP's Java implementation.

     * libraries/
       The libraries used by BJP's Java implementation.

     * build.xml
       An Ant build file that provides tasks to compile the Java source code
       and to generate two executable JAR files from the resulting class
       files.

     * generate-synthetic-data.zip
       A Bash script to generate the synthetic data used in our experiments.

     * hill-climbing.jar
       The algorithm for optimizing scoring functions for learning Markov
       networks (the Blankets Joint Posterior, IBMAP, and Marginal
       Pseudo-likelihood functions are available).

     * mpl-interIAMB-and-hill-climbing.jar
       The algorithm for optimizing the Marginal Pseudo-likelihood scoring
       function for Markov network structure discovery, as proposed in
       (Pensar et al., 2014), Section 5.
     * independence-based-algorithms.jar
       The competitor independence-based algorithms used in our experiments
       (the PC, HITON-PC, and GSMN algorithms).

--------------------------------------------------------------------------------
MINIMUM REQUIREMENTS
--------------------------------------------------------------------------------

* Java 1.7 JRE or newer to run the executable JAR files.
* Java 1.7 JDK or newer to compile the source code.
* Apache Ant 1.7.0 or newer to execute the build file.

--------------------------------------------------------------------------------
SETUP
--------------------------------------------------------------------------------

Compile the Java source code in source/ as follows (from the root directory):

    ant jar

This command creates the executable JAR files.

--------------------------------------------------------------------------------
USAGE
--------------------------------------------------------------------------------

1. To generate the synthetic datasets used in the paper, first unzip the
   generate-synthetic-data.zip file. Then run the Bash script generatedata.sh,
   which generates the synthetic Markov network distributions and calls the
   Libra toolkit (libra.cs.uoregon.edu) to run a Gibbs sampler.

2. Run hill-climbing.jar to learn the structure of a Markov network from data
   using one of the scoring functions and the proposed heuristic hill-climbing
   search method. Example:

   java -jar hill-climbing.jar -sf bjp -dir data -dataset msnbc \
        -numTests 2000 -resultsDir results -testSufix .test.data \
        -trainSufix .ts.data -D 100 -useADTree true

   This command runs a heuristic hill-climbing search over the space of
   possible structures, maximizing the Blankets Joint Posterior scoring
   function. The Marginal Pseudo-likelihood and IBMAP scoring functions are
   also available via -sf mpl and -sf ibscore, respectively.

   The parameters are:

   -dir:        Path to the datasets folder.
   -sf:         Scoring function used. Available values are: {bjp, ibscore,
                mpl}.
   -D:          Number of data points to read from the data.
   -dataset:    Name of the dataset, without suffixes.
   -trainSufix: Suffix of the training dataset. The structure will be learned
                from the file $dir/$dataset$trainSufix.
   -testSufix:  Suffix of the test dataset. The structure will be evaluated on
                the file $dir/$dataset$testSufix.
   -numTests:   Number of tests used to compute the 'accuracy' quality
                measure.
   -resultsDir: Folder where the results files will be stored.
   -useADTree:  Boolean value specifying whether an ADTree cache is used to
                speed up the construction of contingency tables.

   Three files will be stored in the results folder:

   I)   rawResults_generatingModel_msnbc_sf_bjp.csv, with accuracy, runtime,
        and date of execution.
   II)  rawResults_generatingModel_msnbc_sf_bjp.graph, with the adjacency
        matrix of the learned structure.
   III) rawResults_generatingModel_msnbc_sf_bjp.mn, for learning the
        parameters and computing the Conditional Marginal Log-likelihood with
        the Libra toolkit (libra.cs.uoregon.edu). See step 5 for a script that
        does this.

3. Run mpl-interIAMB-and-hill-climbing.jar to learn the structure of a Markov
   network from data using the method proposed to optimize the Marginal
   Pseudo-likelihood scoring function in (Pensar et al., 2014), Section 5.
   Example:

   java -jar mpl-interIAMB-and-hill-climbing.jar -dir data -dataset msnbc \
        -numTests 2000 -resultsDir results -testSufix .test.data \
        -trainSufix .ts.data -D 100 -useADTree true

   The parameters are:

   -dir:        Path to the datasets folder.
   -D:          Number of data points to read from the data.
   -dataset:    Name of the dataset, without suffixes.
   -trainSufix: Suffix of the training dataset. The structure will be learned
                from the file $dir/$dataset$trainSufix.
   -testSufix:  Suffix of the test dataset. The structure will be evaluated on
                the file $dir/$dataset$testSufix.
   -numTests:   Number of tests used to compute the 'accuracy' quality
                measure.
   -resultsDir: Folder where the results files will be stored.
   -useADTree:  Boolean value specifying whether an ADTree cache is used to
                speed up the construction of contingency tables.
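Both hill-climbing JARs above follow the same greedy pattern: start from an
initial structure, score every single-edge change, move to the best-scoring
neighbor, and stop when no change improves the score. The sketch below
illustrates only that generic loop; the class name and the score are
hypothetical stand-ins (a toy score rewarding agreement with a fixed target),
not the package's BJP, MPL, or IB score computed from data:

```java
import java.util.Arrays;

// Generic greedy hill climbing over undirected structures, represented as
// symmetric boolean adjacency matrices. This is an illustrative toy, not
// the package's implementation: the real search would score candidate
// structures with BJP, MPL, or the IB score estimated from the dataset.
public class HillClimbingSketch {

    // Hypothetical 3-variable "true" structure: a chain 0 - 1 - 2.
    static final boolean[][] TARGET = {
        {false, true,  false},
        {true,  false, true },
        {false, true,  false},
    };

    // Toy score: number of upper-triangle entries agreeing with TARGET.
    static int score(boolean[][] adj) {
        int s = 0;
        for (int i = 0; i < adj.length; i++)
            for (int j = i + 1; j < adj.length; j++)
                if (adj[i][j] == TARGET[i][j]) s++;
        return s;
    }

    static boolean[][] climb(int n) {
        boolean[][] adj = new boolean[n][n]; // start from the empty graph
        boolean improved = true;
        while (improved) {
            improved = false;
            int best = score(adj), bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    adj[i][j] = adj[j][i] = !adj[i][j]; // flip one edge
                    int s = score(adj);
                    adj[i][j] = adj[j][i] = !adj[i][j]; // undo the flip
                    if (s > best) { best = s; bi = i; bj = j; }
                }
            }
            if (bi >= 0) { // take the best improving move, if any
                adj[bi][bj] = adj[bj][bi] = !adj[bi][bj];
                improved = true;
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepEquals(climb(3), TARGET)); // true
    }
}
```

With this toy score the climber recovers TARGET exactly; with a real score
the loop structure is the same, only the neighbor evaluation is data-driven.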
   Three files will be stored in the results folder:

   I)   rawResults_generatingModel_msnbc_sf_mplIAMBHC.csv, with accuracy,
        runtime, and date of execution.
   II)  rawResults_generatingModel_msnbc_sf_mplIAMBHC.graph, with the
        adjacency matrix of the learned structure.
   III) rawResults_generatingModel_msnbc_sf_mplIAMBHC.mn, for learning the
        parameters and computing the Conditional Marginal Log-likelihood with
        the Libra toolkit (libra.cs.uoregon.edu). See step 5 for a script that
        does this.

4. Run independence-based-algorithms.jar to learn the structure of a Markov
   network from data using the independence-based algorithms selected as
   competitors: the PC, HITON-PC, and GSMN algorithms. Example:

   java -jar independence-based-algorithms.jar \
        -dir data \
        -dataset msnbc \
        -alg pc \
        -D 1000 \
        -resultsDir results \
        -testSufix .test.data -trainSufix .ts.data \
        -numTests 2000 \
        -useADTree true

   The parameters are:

   -dir:        Path to the datasets folder.
   -dataset:    Name of the dataset, without suffixes.
   -alg:        The independence-based algorithm to use. Available algorithms
                in this version are: {pc, hiton, gsmn}.
   -D:          Number of data points to read from the data.
   -resultsDir: Folder where the results files will be stored.
   -trainSufix: Suffix of the training dataset. The structure will be learned
                from the file $dir/$dataset$trainSufix.
   -testSufix:  Suffix of the test dataset. The structure will be evaluated on
                the file $dir/$dataset$testSufix.
   -numTests:   Number of tests used to compute the 'accuracy' quality
                measure.
   -useADTree:  Boolean value specifying whether an ADTree cache is used to
                speed up the construction of contingency tables.

   Three files will be stored in the results folder:

   I)   rawResults_generatingModel_msnbc_alg_pc.csv, with accuracy, runtime,
        and date of execution.
   II)  rawResults_generatingModel_msnbc_alg_pc.graph, with the adjacency
        matrix of the learned structure.
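The .graph files mentioned above store the adjacency matrix of the learned
structure. Their exact on-disk format is not specified in this README; the
sketch below ASSUMES a plain-text matrix of whitespace-separated 0/1 entries,
one row per variable (the class name and parsing are hypothetical — adjust
them if the real files differ):

```java
import java.util.Scanner;

// Hypothetical reader for a .graph adjacency matrix, assuming plain text
// with whitespace-separated 0/1 entries. Undirected structures should
// yield a symmetric matrix, which is a cheap sanity check after parsing.
public class GraphFileSketch {

    // Parse an n x n adjacency matrix from text.
    static int[][] readAdjacency(String text, int n) {
        int[][] adj = new int[n][n];
        Scanner sc = new Scanner(text);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                adj[i][j] = sc.nextInt();
        return adj;
    }

    // An undirected structure must have a symmetric adjacency matrix.
    static boolean isSymmetric(int[][] adj) {
        for (int i = 0; i < adj.length; i++)
            for (int j = 0; j < i; j++)
                if (adj[i][j] != adj[j][i]) return false;
        return true;
    }

    public static void main(String[] args) {
        String example = "0 1 0\n1 0 1\n0 1 0\n"; // a 3-variable chain
        int[][] adj = readAdjacency(example, 3);
        System.out.println(isSymmetric(adj)); // true
    }
}
```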
   III) rawResults_generatingModel_msnbc_alg_pc.mn, for learning the
        parameters and computing the Conditional Marginal Log-likelihood with
        the Libra toolkit (libra.cs.uoregon.edu). See step 5 for a script that
        does this.

--------------------------------------------------------------------------------
LICENSE
--------------------------------------------------------------------------------

The modified BSD license, which is included in LICENSE, applies to all source
code and other files in this package.

--------------------------------------------------------------------------------
AUTHORS
--------------------------------------------------------------------------------

* Federico Schluter
* Yanela Strappa
* Facundo Bromberg
* Diego H. Milone

The content of this package was last modified in March 2016.