Worked on GemFire prior to Pivotal acquisition (before Geode) and currently work for SnappyData.
You can imagine GemFire/GridGain as an apples-to-apples comparison. Both are "enterprise" in-memory data grids originally intended for managing data in low-latency OLTP applications, which later added analytics/OLAP features. Geode/Ignite are the open source options for these two IMDGs and also a good apples-to-apples comparison. (Hazelcast also has enterprise and OSS versions I would compare accordingly.)
I can't speak to the current comparison between these systems, but I can compare them to SnappyData. SnappyData deeply integrates GemFire with Spark to bring high concurrency, high availability, and mutability to Spark applications. In the world of combining Spark with a datastore over a connector (Cassandra, Hive, MySQL, Mongo, etc.) to enable "database-like" features in Spark, SnappyData has taken the next step of integration. In Snappy, the database (GemFire) and the Spark executors share the same block manager and JVM, so the systems no longer communicate over a "connector." This, along with our database optimizations, provides the best performance for Spark applications in what I like to call the "Spark Database Ecosystem."
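To make the "connector" pattern concrete, here is a rough sketch of what connector-based access looks like from Spark, using the DataStax spark-cassandra-connector as the example (the keyspace and table names are made up for illustration):

```scala
// Sketch of the "connector" pattern described above: Spark reading
// from an external store through a DataSource connector. Assumes the
// spark-cassandra-connector is on the classpath; "shop"/"orders" are
// hypothetical names.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "shop", "table" -> "orders"))
  .load()

df.filter("total > 100").show()
```

Every query here ships rows over the network from the store's processes into Spark's executors. An embedded store that shares the executor JVM and block manager avoids that hop entirely, which is the design point being made above.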
As such, comparing SnappyData to GemFire/Hazelcast/Gridgain does not make much sense unless you are trying to use Spark in conjunction with these systems. In that case, the main difference I would point out is that SnappyData will necessarily perform better as any of them would need to use a connector to interact with Spark. The better comparison would be between SnappyData and Ignite, as Ignite contains a direct Spark abstraction called "IgniteRDD." That said, the majority of the comparisons/benchmarks we've run have been against MemSQL+Spark and Cassandra+Spark, so I don't have much to say about Ignite vs SnappyData.
User manigandham mentions SnappyData's Approximate Query Processing feature (called the Synopses Data Engine), which is unique within this space, but a discussion of it would take us too far afield.
SnappyData employee here -- This is essentially what we did. The main difference is that we already had a decade-old transactional K/V store that, over time, morphed into a more full-fledged in-memory database. That is what we integrated with Spark, versus rolling a new database. The SQL layer in this database (GemFire/Geode) already had a number of optimizations we could use to speed up Spark SQL queries, even over the native Spark cache.
Like some of the other comments in this thread, the idea was to provide all the guarantees of an OLTP store (HA, ACID, scalability, mutations, etc.) with the powerful analytic capabilities of Spark.
Appreciate these comments; the site did not go through much testing before being deployed. The overflow behavior was modified to eliminate horizontal scroll on mobile, but it looks like there were some vertical issues as well. We will get this fixed.
Our impression was that when Databricks released the billion-rows-in-one-second-on-a-laptop benchmark, readers were pretty awed by that result. We wanted to show that when you combine an in-memory database with Spark so it shares the same JVM/block manager, you can squeeze even more performance out of Spark workloads (over and above Spark's internal columnar storage). Any analytics that require multiple trips to a database will be impacted by this design. E.g., workloads on a Spark + Cassandra analytics cluster will be significantly slower, barring some fundamental changes to Cassandra.
A web design QA note for all: thin fonts (e.g., 300-400 weight) used as a body font may look fine on macOS due to its better font rendering, but they do not work well on Windows.
It takes about as long as it takes to start up a Spark cluster, and you can interact with it entirely through Spark APIs if that's what you're comfortable with. It can also be used in "Split Cluster Mode" so you can use your existing Spark build instead of what's embedded within SnappyData if you prefer.
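As an illustration of "interact with it entirely through Spark APIs," here is a rough sketch based on my reading of SnappyData's docs, where a SnappySession wraps an ordinary SparkContext; treat the names (app name, table, columns) as hypothetical rather than canonical usage:

```scala
// Sketch: driving SnappyData through ordinary Spark APIs.
// Assumes SnappyData's Spark extension on the classpath, which exposes
// SnappySession as a drop-in analogue of SparkSession. Table and
// column names below are made up.
import org.apache.spark.sql.{SparkSession, SnappySession}

object SnappySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("snappy-sketch")
      .master("local[*]")
      .getOrCreate()

    // SnappySession wraps the same SparkContext, so existing Spark
    // code and SnappyData tables run in one cluster and one JVM.
    val snappy = new SnappySession(spark.sparkContext)

    // Create a mutable column table and query it with plain Spark SQL.
    snappy.sql("CREATE TABLE quotes (symbol STRING, price DOUBLE) USING column")
    snappy.sql("INSERT INTO quotes VALUES ('ACME', 42.0)")
    snappy.table("quotes").show()
  }
}
```

The point of the sketch is that nothing here departs from the familiar SparkSession/DataFrame programming model; the store is just embedded in the same process.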
So, you guys forked gemfire, spark and spark-jobserver. Are you planning on going back into mainline or will you be maintaining the forks to stay current at some interval?
Just to be clear, we do support SnappyData as a library (unfortunately, the docs don't make this clear) that can work with your distribution (up to Spark 1.6 today). Our releases will support the latest Spark version in a staggered manner.
Everything we have done is an extension to Spark. None of the existing functionality is lost. In a sense, we have turned Spark to work more like a database. So, don't think we can turn it back into the "mainline" (assuming that is what you meant).
SnappyData: https://www.snappydata.io, MemSQL: https://www.memsql.com/, Splice Machine: https://www.splicemachine.com/, SAP HANA: https://www.sap.com/products/hana.html, GridGain: https://www.gridgain.com/ are some of the technologies in this space.