GDlog: A GPU-accelerated deductive engine (arxiv.org)
127 points by PaulHoule on Dec 3, 2023 | 13 comments


The paper claims it builds upon the concepts in HashGraph, an efficient CUDA hashtable implementation.

HashGraph (2019) https://arxiv.org/abs/1907.02900

Anyone know what the most performant CUDA hash table implementations are these days?


Very strange / coincidental to see this up here today, when I came across this myself just two days ago and have been busy studying the code all weekend.

I don't have an NVIDIA GPU though, so instead I've just been trying to understand their core hash-indexed sorted array datastructure by porting it to Rust, hopefully using SIMD intrinsics rather than the GPU.
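
For anyone curious, here's a rough sketch of how I currently understand the layout. This is my own guess at the idea, not code from the paper, so details (tuple arity, hashing, the exact sortedness invariant) may well differ:

    use std::collections::HashMap;

    /// Guessed HISA-style layout: tuples live in one flat array sorted by the
    /// indexed (join) column, and a hash table maps each key to the contiguous
    /// range of tuples sharing that key. A join probe is then one hash lookup
    /// plus a linear, SIMD-friendly scan over a range.
    struct Hisa {
        tuples: Vec<(u32, u32)>,             // sorted by the first column
        index: HashMap<u32, (usize, usize)>, // key -> (start, len) into `tuples`
    }

    impl Hisa {
        fn build(mut tuples: Vec<(u32, u32)>) -> Self {
            tuples.sort_unstable();
            let mut index = HashMap::new();
            let mut start = 0;
            while start < tuples.len() {
                let key = tuples[start].0;
                let mut end = start;
                while end < tuples.len() && tuples[end].0 == key {
                    end += 1;
                }
                index.insert(key, (start, end - start));
                start = end;
            }
            Hisa { tuples, index }
        }

        /// All tuples whose indexed column equals `key`.
        fn range(&self, key: u32) -> &[(u32, u32)] {
            match self.index.get(&key) {
                Some(&(start, len)) => &self.tuples[start..start + len],
                None => &[],
            }
        }
    }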


Sounds awesome--feel free to get in touch with us (the authors of this paper) and share your progress. We have a similar single-node Datalog engine in Rust; it would be cool to benchmark your results against parallel Ascent (https://github.com/s-arash/ascent).


Cool, I will try to carve out some time to look at that, too!



Datalog-related things really seem to have momentum these days.

Interesting that the application areas seem so different. This talks mostly about specialized source code analysis applications, whereas in Clojure circles, for example, it's used in normal application databases. I wonder if there'd be a way to use this as a backend for e.g. XTDB or Datalevin.


Yeah, there are a ton of substantively different approaches to modern Datalogs, targeting different applications.

To start off: Datalog is distinguished from traditional SQL in its focus on heavily-recursive reachability-based reasoning. With respect to expressivity, you can see Datalog as CDCL/DPLL restricted to boolean constraint propagation (i.e., Horn clauses). Operationally, you can think of it as tight range-indexed loops performing insertion/deduplication into an (indexed) relation-backing data structure (a BTree, trie, etc.). In SQL, you don't know the query a priori, so you can't just index everything--but in Datalog, you know all of the rules up-front and can generate indices for everything. This ubiquitous indexing enables the state-of-the-art work we see with Datalog in static analysis (DOOP, cclyzer), security (ddisasm), etc.
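
To make that operational picture concrete, here is a tiny single-threaded sketch (ordinary Rust, not GDlog or Souffle code) of semi-naive evaluation for the classic reachability rules path(x, y) <- edge(x, y) and path(x, z) <- edge(x, y), path(y, z). Because the rules are fixed, the one index the join needs can be built up-front:

    use std::collections::{BTreeSet, HashMap};

    type Fact = (u32, u32);

    // Semi-naive evaluation: only newly-derived facts (the delta) are joined
    // against edge each round, with deduplication into the full relation.
    fn transitive_closure(edges: &[Fact]) -> BTreeSet<Fact> {
        // Index edge on its second column, the join column of the recursive rule.
        let mut edge_by_dst: HashMap<u32, Vec<u32>> = HashMap::new();
        for &(x, y) in edges {
            edge_by_dst.entry(y).or_default().push(x);
        }

        let mut path: BTreeSet<Fact> = edges.iter().copied().collect();
        let mut delta: Vec<Fact> = path.iter().copied().collect();

        while !delta.is_empty() {
            let mut next_delta = Vec::new();
            for &(y, z) in &delta {
                // Range-indexed lookup on the join column...
                if let Some(xs) = edge_by_dst.get(&y) {
                    for &x in xs {
                        // ...followed by insertion/deduplication into the
                        // relation-backing data structure.
                        if path.insert((x, z)) {
                            next_delta.push((x, z));
                        }
                    }
                }
            }
            delta = next_delta;
        }
        path
    }

GDlog's contribution is, roughly, doing that inner join/dedup loop massively in parallel on the GPU over its hash-indexed sorted arrays.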

Our group targets tasks like code analysis and these big graph problems because we think they represent the most computationally complex, hard problems we are capable of tackling. The next step here is to scale our prototypes (a handful of rules) to large, realistic systems. Some potential applications are raw feature extraction from binaries on the GPU when doing ML over binary corpora (which today requires running, e.g., IDA on the CPU), medical reasoning (accelerating MediKanren), and (hopefully) probabilistic programming (neuro-symbolic applications).

By contrast, I think work that takes a more traditional databases approach (CodeQL, RDFox, ...) focuses a little less on ubiquitous high-performance range-indexed insertion in a tight loop, and a little more on supporting robust querying and especially operating on streams. There is some very cool related work there in differential dataflow (upon which differential Datalog is built); DDlog is a solver (written in Rust) which takes that approach. Our in-house experiments show that DDlog is often a constant factor slower than Souffle, and we did not directly compare against DDlog in this paper--I expect the results would be roughly similar to Souffle.


> To start off: Datalog is distinguished from traditional SQL in its focus on heavily-recursive reachability-based reasoning.

This was historically true, but SQL has CTEs and recursive CTEs these days, and even some extra syntactic sugar for reachability queries. And of course (given the former) "inference" and "deduction" over data are just a CREATE VIEW statement away. This is the "you know all of the rules up-front" part; a VIEW is just a query that you do know up-front. Most uses of "deductive" databases are just for querying within some in-memory database, which is not really playing in the same league as a fully general RDBMS. This is not intended to dismiss the work in the OP, of course; if anything, its applicability need not be restricted to a tiny niche of software for specific application areas, and can be quite a bit broader than that.


Right--but CTEs are orders of magnitude slower than Datalog, to the point that they are not seriously worth considering for any modern application where Datalog would be used. As you said, Datalog gives up SQL's ability to create new queries in an ad-hoc way: knowing all queries within the fixed point enables much more efficient compilation.


"Introduction to Datalog" re: Linked Data https://news.ycombinator.com/context?id=34808887

pyDatalog/examples/SQLAlchemy.py: https://github.com/baojie/pydatalog/blob/master/pyDatalog/ex...

GH topics > datalog: https://github.com/topics/datalog

datalog?l=rust: https://github.com/topics/datalog?l=rust ... Cozo, Crepe

Crepe: https://github.com/ekzhang/crepe :

> Crepe is a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. It provides a procedural macro that generates efficient, safe code and interoperates seamlessly with Rust programs.
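
A minimal example in that style (adapted from memory of Crepe's README, so treat the exact API details as approximate):

    use crepe::crepe;

    crepe! {
        @input
        struct Edge(i32, i32);

        @output
        struct Reachable(i32, i32);

        Reachable(x, y) <- Edge(x, y);
        Reachable(x, z) <- Edge(x, y), Reachable(y, z);
    }

    fn main() {
        let mut runtime = Crepe::new();
        runtime.extend([Edge(1, 2), Edge(2, 3), Edge(3, 4)]);

        let (reachable,) = runtime.run();
        for Reachable(x, y) in reachable {
            println!("{} can reach {}", x, y);
        }
    }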

Looks like there's not yet a Python grammar for the treeedb tree-sitter: https://github.com/langston-barrett/treeedb :

> Generate Soufflé Datalog types, relations, and facts that represent ASTs from a variety of programming languages.

Looks like roxi supports n3, which adds `=>` "implies" to the Turtle lightweight RDF representation: https://github.com/pbonte/roxi

FWIW rdflib/owl-rl: https://owl-rl.readthedocs.io/en/latest/owlrl.html :

> simple forward chaining rules are used to extend (recursively) the incoming graph with all triples that the rule sets permit (ie, the “deductive closure” of the graph is computed).

ForwardChainingStore and BackwardChainingStore implementations w/ rdflib in Python: https://github.com/RDFLib/FuXi/issues/15

Fast CUDA hashmaps

GDlog is built on cuCollections.

GPU HashMap libs to benchmark: Warpcore, cuCollections

https://github.com/NVIDIA/cuCollections

https://github.com/NVIDIA/cccl

https://github.com/sleeepyjack/warpcore

/? Rocm HashMap

DeMoriarty/DOKsparse: https://github.com/DeMoriarty/DOKSparse

/? SIMD hashmap

Google's SwissTable: https://github.com/topics/swisstable

rust-lang/hashbrown: https://github.com/rust-lang/hashbrown

CuPy has arrays but not yet hashmaps or (GPU) SIMD, FWICS?

NumPy does SIMD: https://numpy.org/doc/stable/reference/simd/

google/highway: https://github.com/google/highway

xtensor-stack/xsimd: https://github.com/xtensor-stack/xsimd

GH topics > HashMap: https://github.com/topics/hashmap


Yeah, IncA (compiling to Souffle) and Ascent (I believe Crepe is not parallel, though also a good engine) are two other relevant cites here. Apropos linked data, our group has an MPI-based engine which is built around linked facts (subsequently enabling defunctionalization, ad-hoc polymorphism, etc..), which is very reminiscent of the discussion in the first link of yours: https://arxiv.org/abs/2211.11573


TIL about Approximate Reasoning.

"Approximate Reasoning for Large-Scale ABox in OWL DL Based on Neural-Symbolic Learning" (2023) > Parameter Settings of the CFR [2023 ChunfyReasoner] and NMT4RDFS [2018] in the Experiments. https://www.researchgate.net/figure/Parameter-Settings-of-th...

"Deep learning for noise-tolerant RDFS reasoning" (2018) > NMT4RDFS: http://www.semantic-web-journal.net/content/deep-learning-no... :

> This paper documents a novel approach that extends noise-tolerance in the SW to full RDFS reasoning. Our embedding technique— that is tailored for RDFS reasoning— consists of layering RDF graphs and encoding them in the form of 3D adjacency matrices where each layer layout forms a graph word. Each input graph and its entailments are then represented as sequences of graph words, and RDFS inference can be formulated as translation of these graph words sequences, achieved through neural machine translation. Our evaluation on LUBM1 synthetic dataset shows 97% validation accuracy and 87.76% on a subset of DBpedia while demonstrating a noise-tolerance unavailable with rule-based reasoners.

NMT4RDFS: https://github.com/Bassem-Makni/NMT4RDFS

...

A human-generated review article with an emphasis on standards; with citations to summarize:

"Why do we need SWRL and RIF in an OWL2 world?" [with SPARQL CONSTRUCT, SPIN, and now SHACL] https://answers.knowledgegraph.tech/t/why-do-we-need-swrl-an...

https://spinrdf.org/spin-shacl.html :

> From SPIN to SHACL: In July 2017, the W3C ratified the Shapes Constraint Language (SHACL) as an official W3C Recommendation. SHACL was strongly influenced by SPIN and can be regarded as its legitimate successor. This document explains how the two languages relate and shows how basically every SPIN feature has a direct equivalent in SHACL, while SHACL improves over the features explored by SPIN.

/? Shacl datalog https://www.google.com/search?q=%22shacl%22+%22datalog%22

"Reconciling SHACL and Ontologies: Semantics and Validation via Rewriting" (2023) https://scholar.google.com/scholar?q=Reconciling+SHACL+and+O... :

> SHACL is used for expressing integrity constraints on complete data, while OWL allows inferring implicit facts from incomplete data; SHACL reasoners perform validation, while OWL reasoners do logical inference. Integrating these two tasks into one uniform approach is a relevant but challenging problem.

"Well-founded Semantics for Recursive SHACL" (2022) [and datalog] https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A169...

"SHACL Constraints with Inference Rules" (2019) https://arxiv.org/abs/1911.00598 https://scholar.google.com/scholar?cites=1685576975485159766...

Datalog > Evaluation: https://en.wikipedia.org/wiki/Datalog#Evaluation

...

VMware/ddlog: Differential datalog

> Bottom-up: DDlog starts from a set of input facts and computes all possible derived facts by following user-defined rules, in a bottom-up fashion. In contrast, top-down engines are optimized to answer individual user queries without computing all possible facts ahead of time. For example, given a Datalog program that computes pairs of connected vertices in a graph, a bottom-up engine maintains the set of all such pairs. A top-down engine, on the other hand, is triggered by a user query to determine whether a pair of vertices is connected and handles the query by searching for a derivation chain back to ground facts. The bottom-up approach is preferable in applications where all derived facts must be computed ahead of time and in applications where the cost of initial computation is amortized across a large number of queries.
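
A toy contrast of the two styles (my own sketch in Rust, not DDlog code): the top-down version answers one connectivity query on demand instead of materializing all reachable pairs.

    use std::collections::{HashMap, HashSet};

    // Top-down / query-driven: answer "can `from` reach `to`?" by searching
    // from the query, visiting only the facts the derivation needs, instead
    // of materializing the full set of reachable pairs up-front.
    fn connected(edges: &HashMap<u32, Vec<u32>>, from: u32, to: u32) -> bool {
        let mut seen = HashSet::new();
        let mut stack = vec![from];
        while let Some(node) = stack.pop() {
            if node == to {
                return true;
            }
            if seen.insert(node) {
                if let Some(succs) = edges.get(&node) {
                    stack.extend(succs.iter().copied());
                }
            }
        }
        false
    }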

From https://community.openlinksw.com/t/virtuoso-openlink-reasoni... https://github.com/openlink/virtuoso-opensource/issues/660 :

> The Virtuoso built-in (rule sets) and custom inferencing and reasoning is backward chaining, where the inferred results are materialised at query runtime. This results in fewer physical triples having to exist in the database, saving space and ultimately cost of ownership, i.e., less physical resources are required, compared to forward chaining where the inferred data is pre-generated as physical triples, requiring more physical resources for hosting the data.

FWIU it's called ShaclSail, and there's a NotifyingSail: org.eclipse.rdf4j.sail.shacl.ShaclSail: https://rdf4j.org/javadoc/3.2.0/org/eclipse/rdf4j/sail/shacl...


"GDlog: A GPU-Accelerated Deductive Engine" (2023) https://arxiv.org/abs/2311.02206 :

> Abstract: Modern deductive database engines (e.g., LogicBlox and Soufflé) enable their users to write declarative queries which compute recursive deductions over extensional data, leaving their high-performance operationalization (query planning, semi-naïve evaluation, and parallelization) to the engine. Such engines form the backbone of modern high-throughput applications in static analysis, security auditing, social-media mining, and business analytics. State-of-the-art engines are built upon nested loop joins over explicit representations (e.g., BTrees and tries) and ubiquitously employ range indexing to accelerate iterated joins. In this work, we present GDlog: a GPU-based deductive analytics engine (implemented as a CUDA library) which achieves significant performance improvements (5--10x or more) versus prior systems. GDlog is powered by a novel range-indexed SIMD datastructure: the hash-indexed sorted array (HISA). We perform extensive evaluation on GDlog, comparing it against both CPU and GPU-based hash tables and Datalog engines, and using it to support a range of large-scale deductive queries including reachability, same generation, and context-sensitive program analysis. Our experiments show that GDlog achieves performance competitive with modern SIMD hash tables and beats prior work by an order of magnitude in runtime while offering more favorable memory footprint.



