ATTENTION: This is an early version of the Blankets Joint Posterior (BJP)
scoring function for Markov network structure learning. Please do not hesitate
to contact Federico Schluter if you encounter any difficulties using the
software or find any errors.

This package contains a Java implementation of the Blankets Joint Posterior
scoring function for Markov network structure learning, together with the
hill-climbing search methods used in its paper to optimize it.

For further details on BJP, please refer to:

F. Schluter, Y. Strappa, F. Bromberg, and D. H. Milone. "Blankets Joint
Posterior score for learning irregular Markov network structures". Technical
Report, 2016. http://192.168.16.89/papers/bjp/fschluterEtAl.MBJP.pdf

--------------------------------------------------------------------------------
CONTENT
--------------------------------------------------------------------------------

I  - The paper manuscript "Blankets Joint Posterior score for learning
     irregular Markov network structures".

II - Source code and executable files for reproducing the experiments.
     The BJP-source-code-schluter-et.al-.zip file contains:

     * source/
       The source code of BJP's Java implementation.

     * libraries/
       The libraries used by BJP's Java implementation.

     * build.xml
       An Ant build file that provides tasks to compile the Java source code
       and to generate two executable JAR files from the resulting class
       files.

     * generate-synthetic-data.zip
       A Bash script to generate the synthetic data used in our experiments.

     * hill-climbing.jar
       The algorithm for optimizing scoring functions for learning Markov
       networks (the Blankets Joint Posterior, IBMAP, and Marginal
       Pseudo-likelihood functions are available).

     * mpl-interIAMB-and-hill-climbing.jar
       The algorithm for optimizing the Marginal Pseudo-likelihood scoring
       function for Markov network structure discovery, as proposed in
       (Pensar et al., 2014), Section 5.
     * independence-based-algorithms.jar
       The competitor independence-based algorithms used in our experiments
       (the PC, HITON-PC, and GSMN algorithms).

--------------------------------------------------------------------------------
MINIMUM REQUIREMENTS
--------------------------------------------------------------------------------

* Java 1.7 JRE or newer to run the executable JAR files.
* Java 1.7 JDK or newer to compile the source code.
* Apache Ant 1.7.0 or newer to execute the build file.

--------------------------------------------------------------------------------
SETUP
--------------------------------------------------------------------------------

Compile the Java source code in source/ as follows (from the root directory):

    ant jar

This command creates the executable JAR files.

--------------------------------------------------------------------------------
USAGE
--------------------------------------------------------------------------------

1. To generate the synthetic datasets used in the paper, first unzip the
   generate-synthetic-data.zip file. Then run the Bash script generatedata.sh,
   which generates the synthetic Markov network distributions and calls the
   Libra toolkit (libra.cs.uoregon.edu) to run a Gibbs sampler.

2. Run hill-climbing.jar to learn the structure of a Markov network from data
   using one of the scoring functions and the proposed heuristic hill-climbing
   search method. Example:

   java -jar hill-climbing.jar -sf bjp -dir data -dataset msnbc \
        -numTests 2000 -resultsDir results -testSufix .test.data \
        -trainSufix .ts.data -D 100 -useADTree true

   This command runs a heuristic hill-climbing search over the space of
   possible structures, maximizing the Blankets Joint Posterior scoring
   function. The Marginal Pseudo-likelihood and IBMAP scoring functions are
   also available via -sf mpl and -sf ibscore, respectively.

   The parameters are:

   -dir:        Path to the datasets folder.
   -sf:         Scoring function used. Available values are: {bjp, ibscore,
                mpl}.
   -D:          Number of data points to read from the data.
   -dataset:    Name of the dataset, without suffixes.
   -trainSufix: Suffix of the training dataset. The structure will be learned
                from the file $dir/$dataset$trainSufix.
   -testSufix:  Suffix of the test dataset. The structure will be evaluated on
                the file $dir/$dataset$testSufix.
   -numTests:   Number of tests used to compute the 'accuracy' quality
                measure.
   -resultsDir: Folder where the results files will be stored.
   -useADTree:  Boolean value specifying whether an ADTree cache is used to
                speed up the construction of contingency tables.

   Three files will be stored in the results folder:

   I)   rawResults_generatingModel_msnbc_sf_bjp.csv, with accuracy, runtime,
        and date of execution.
   II)  rawResults_generatingModel_msnbc_sf_bjp.graph, with the adjacency
        matrix of the learned structure.
   III) rawResults_generatingModel_msnbc_sf_bjp.mn, for learning the
        parameters and computing the Conditional Marginal Log-likelihood with
        the Libra toolkit (libra.cs.uoregon.edu). See step 5 for a script that
        does this.

3. Run mpl-interIAMB-and-hill-climbing.jar to learn the structure of a Markov
   network from data using the method proposed to optimize the Marginal
   Pseudo-likelihood scoring function in (Pensar et al., 2014), Section 5.
   Example:

   java -jar mpl-interIAMB-and-hill-climbing.jar -dir data -dataset msnbc \
        -numTests 2000 -resultsDir results -testSufix .test.data \
        -trainSufix .ts.data -D 100 -useADTree true

   The parameters are:

   -dir:        Path to the datasets folder.
   -D:          Number of data points to read from the data.
   -dataset:    Name of the dataset, without suffixes.
   -trainSufix: Suffix of the training dataset. The structure will be learned
                from the file $dir/$dataset$trainSufix.
   -testSufix:  Suffix of the test dataset. The structure will be evaluated on
                the file $dir/$dataset$testSufix.
   -numTests:   Number of tests used to compute the 'accuracy' quality
                measure.
   -resultsDir: Folder where the results files will be stored.
   -useADTree:  Boolean value specifying whether an ADTree cache is used to
                speed up the construction of contingency tables.
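Both hill-climbing JARs above follow the same greedy pattern: start from an
initial structure, score every single-edge change, move to the best-scoring
neighbor, and stop when no change improves the score. The sketch below
illustrates only that generic loop; the class name and the score are
hypothetical stand-ins (a toy score rewarding agreement with a fixed target),
not the package's BJP, MPL, or IB score computed from data:

```java
import java.util.Arrays;

// Generic greedy hill climbing over undirected structures, represented as
// symmetric boolean adjacency matrices. This is an illustrative toy, not
// the package's implementation: the real search would score candidate
// structures with BJP, MPL, or the IB score estimated from the dataset.
public class HillClimbingSketch {

    // Hypothetical 3-variable "true" structure: a chain 0 - 1 - 2.
    static final boolean[][] TARGET = {
        {false, true,  false},
        {true,  false, true },
        {false, true,  false},
    };

    // Toy score: number of upper-triangle entries agreeing with TARGET.
    static int score(boolean[][] adj) {
        int s = 0;
        for (int i = 0; i < adj.length; i++)
            for (int j = i + 1; j < adj.length; j++)
                if (adj[i][j] == TARGET[i][j]) s++;
        return s;
    }

    static boolean[][] climb(int n) {
        boolean[][] adj = new boolean[n][n]; // start from the empty graph
        boolean improved = true;
        while (improved) {
            improved = false;
            int best = score(adj), bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    adj[i][j] = adj[j][i] = !adj[i][j]; // flip one edge
                    int s = score(adj);
                    adj[i][j] = adj[j][i] = !adj[i][j]; // undo the flip
                    if (s > best) { best = s; bi = i; bj = j; }
                }
            }
            if (bi >= 0) { // take the best improving move, if any
                adj[bi][bj] = adj[bj][bi] = !adj[bi][bj];
                improved = true;
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepEquals(climb(3), TARGET)); // true
    }
}
```

With this toy score the climber recovers TARGET exactly; with a real score
the loop structure is the same, only the neighbor evaluation is data-driven.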
   Three files will be stored in the results folder:

   I)   rawResults_generatingModel_msnbc_sf_mplIAMBHC.csv, with accuracy,
        runtime, and date of execution.
   II)  rawResults_generatingModel_msnbc_sf_mplIAMBHC.graph, with the
        adjacency matrix of the learned structure.
   III) rawResults_generatingModel_msnbc_sf_mplIAMBHC.mn, for learning the
        parameters and computing the Conditional Marginal Log-likelihood with
        the Libra toolkit (libra.cs.uoregon.edu). See step 5 for a script that
        does this.

4. Run independence-based-algorithms.jar to learn the structure of a Markov
   network from data using the independence-based algorithms selected as
   competitors: the PC, HITON-PC, and GSMN algorithms. Example:

   java -jar independence-based-algorithms.jar \
        -dir data \
        -dataset msnbc \
        -alg pc \
        -D 1000 \
        -resultsDir results \
        -testSufix .test.data -trainSufix .ts.data \
        -numTests 2000 \
        -useADTree true

   The parameters are:

   -dir:        Path to the datasets folder.
   -dataset:    Name of the dataset, without suffixes.
   -alg:        The independence-based algorithm to use. Available algorithms
                in this version are: {pc, hiton, gsmn}.
   -D:          Number of data points to read from the data.
   -resultsDir: Folder where the results files will be stored.
   -trainSufix: Suffix of the training dataset. The structure will be learned
                from the file $dir/$dataset$trainSufix.
   -testSufix:  Suffix of the test dataset. The structure will be evaluated on
                the file $dir/$dataset$testSufix.
   -numTests:   Number of tests used to compute the 'accuracy' quality
                measure.
   -useADTree:  Boolean value specifying whether an ADTree cache is used to
                speed up the construction of contingency tables.

   Three files will be stored in the results folder:

   I)   rawResults_generatingModel_msnbc_alg_pc.csv, with accuracy, runtime,
        and date of execution.
   II)  rawResults_generatingModel_msnbc_alg_pc.graph, with the adjacency
        matrix of the learned structure.
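The .graph files mentioned above store the adjacency matrix of the learned
structure. Their exact on-disk format is not specified in this README; the
sketch below ASSUMES a plain-text matrix of whitespace-separated 0/1 entries,
one row per variable (the class name and parsing are hypothetical — adjust
them if the real files differ):

```java
import java.util.Scanner;

// Hypothetical reader for a .graph adjacency matrix, assuming plain text
// with whitespace-separated 0/1 entries. Undirected structures should
// yield a symmetric matrix, which is a cheap sanity check after parsing.
public class GraphFileSketch {

    // Parse an n x n adjacency matrix from text.
    static int[][] readAdjacency(String text, int n) {
        int[][] adj = new int[n][n];
        Scanner sc = new Scanner(text);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                adj[i][j] = sc.nextInt();
        return adj;
    }

    // An undirected structure must have a symmetric adjacency matrix.
    static boolean isSymmetric(int[][] adj) {
        for (int i = 0; i < adj.length; i++)
            for (int j = 0; j < i; j++)
                if (adj[i][j] != adj[j][i]) return false;
        return true;
    }

    public static void main(String[] args) {
        String example = "0 1 0\n1 0 1\n0 1 0\n"; // a 3-variable chain
        int[][] adj = readAdjacency(example, 3);
        System.out.println(isSymmetric(adj)); // true
    }
}
```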
   III) rawResults_generatingModel_msnbc_alg_pc.mn, for learning the
        parameters and computing the Conditional Marginal Log-likelihood with
        the Libra toolkit (libra.cs.uoregon.edu). See step 5 for a script that
        does this.

--------------------------------------------------------------------------------
LICENSE
--------------------------------------------------------------------------------

The modified BSD license, which is included in LICENSE, applies to all source
code and other files in this package.

--------------------------------------------------------------------------------
AUTHORS
--------------------------------------------------------------------------------

* Federico Schluter
* Yanela Strappa
* Facundo Bromberg
* Diego H. Milone

The content of this package was last modified in March 2016.