GraDiaS is a software to approximate the diameter of large graphs.

The software is used for the experiments in the following papers:

Building

This project uses sbt to manage the build process. To generate a jar file suitable for deployment issue the following command

sbt assembly

This will generate a single jar file containing all the dependencies needed by the project, with the exclusion of spark and hadoop.

Running

The program is meant to be run using the standard spark-submit script. The main class to pass to the script is it.unipd.dei.gradias.Tool.

The application command line interface documentation can be accessed from the command line itself using the --help flag for the command.

user@host> spark-submit --properties-file default.conf --class it.unipd.dei.gradias.Tool sg.jar --help
gradias version 0.11.0 - (c) 2013-2015 Matteo Ceccarello

Usage: spark-submit [spark-opts] --class it.unipd.dei.gradias.Tool gradias-0.11.0.jar <subcommand> [options]

Options:

      --help      Show help message
      --version   Show version of this program

Subcommands:
  delta-cluster
  delta-sssp
  cluster
  mpx13
  hadi
  bfs
  convert
  generate

Each subcommand has its own help message

user@host> spark-submit --properties-file default.conf --class it.unipd.dei.gradias.Tool sg.jar cluster --help
gradias version 0.11.0 - (c) 2013-2014 Matteo Ceccarello
  -d, --diameter-algorithm  <arg>    (default = Distributed)
  -i, --input  <arg>                the input graph
  -o, --output  <arg>               the output file. If not given, no output is
                                    written
  -s, --skip-diameter
      --noskip-diameter
  -t, --target-fraction  <arg>
      --help                        Show help message
      --version                     Show version of this program

A note about the input format

All the algorithms implemented in this program accept the same graph input format, that is a textual adjacency list:

node-0  neighbour-0A neighbour-0B neighbour-0C neighbour-0D
node-1  neighbour-1A neighbour-1B neighbour-1C
......  .....

Moreover, since the algorithms work on undirected graphs, the input must be symmetric: if 1 is in the adjacency list of 0, then 0 must be in the adjacency list of 1 as well.

Several conversion tools are provided by the program, for more information see

convert --help

Changelog

0.11.0

  • Change the distance datatype from Int to Float

  • Add support for disconnected graphs

  • Add the TEPS metric

  • Optimize graph data structures

  • Block representation

  • Bitmasks to store flags in place of booleans

0.10.1

This release features mainly optimizations for the weighted implementation of the clustering algorithm

0.10.0

This release introduces the extension of the algorithm to weighted graphs.

  • FEATURE Support weighted graphs

  • REFACTOR Generic coder refactoring to reduce duplication

0.9.2

  • BUGFIX Sometimes computing the diameter of the quotient graph with Dijkstra resulted in errors like the following

    java.lang.IllegalArgumentException: Cannot decrease priority of an element not in the queue: 149
    This was due to an integer overflow that is now fixed. Runs that
    were not crashing for the above error should not have been affected
    by this bug.

0.9.1

  • BUILD Skip git properties injection in manifest if there is no git repository, i.e. in source tarballs

0.9.0

This is a breaking release.

This is a cleanup release, with dead code removal.

  • REFACTOR Rename the project from spark-graph to GraDiaS: "Graph Diameter on Spark"

  • FEATURE Upgrade to spark-1.2.0

  • FEATURE Colors! Now the output is colored. Moreover the logging configuration has been changed in order to be more clear.

  • REFACTOR Remove old implementations of CLUSTER and MPX13. They are replaced by more efficient implementations of based on a common cluster growing procedure (see next point).

  • REFACTOR Add class ClusterExpander that centralizes the implementation of cluster growing logic. Now all the algorithms using cluster-growing strategies (BFS, MPX13 and CLUSTER) can use the same implementation, thus reducing code duplication. Being no longer necessary, class VertexGraphFunctions has been removed, along with all the hierarchy of Vertex classes.

0.8.1

This is the release for SPAA15. It mainly features a leaner decomposition strategy, and a BFS implementation to set a baseline.

0.7.0

Improve performance of last step of the algorithm: weighted diameter computation. Now the user can select the algorithm to use between

  • Floyd Warshall: sequential, very fast on small (quotient) graphs

  • Dijkstra: sequential, slightly slower than Floyd Warshall

  • Distributed: runs Dijkstra’s algorithm distributed on the cluster. On big quotient graphs (~8000 nodes) allows for a speedup of up to 40x.

0.6.3

General performance improvements.

  • Upgrade to spark-1.0.0

  • Dead code cleanup

  • Reimplement graph shrinking procedure

  • Reduce communication in weighted edges creation

  • Report timing percentages

0.6.2

Featuring new fast decomposition

0.6.1

Implement Miller, Peng and Xu’s algorithm for graph decomposition.

0.6.0

  • Upgrade to spark-0.9.0

  • Dead code cleanup

  • New version of ball decomposition

0.5.5

Implement new contracting ball decomposition

0.5.4

  • Implement new decomposition strategy based on sample and prune

  • Refactor the Tool class

  • Make partitioning optional

0.5.3

Use Elias Fano codes to compress the color lists

0.5.2

  • Upgrade to spark 0.8.1

0.5.1

Add more information to the output

0.5.0

Complete rewrite of the tool

0.4.3

Ball decomposition strategy using limited-length lists of colors

0.4.2

Probability of selection depends on ball cardinality

0.4.1

Rewrite flood ball decomposition

0.4.0

Rewrite HyperANF spark implementation

0.3.2

  • Use codahale-metrics for timing

  • Print more information to standard output

0.3.1

Refactor flood ball decomposition

0.3.0

Add flood ball decomposition

0.2.0

Switch to Spark 0.8.0-incubating

0.1.0

Implement ball decomposition (deterministic and randomized) and the distributed version of HyperANF