

We wrote a few views on these source foreign tables that wrangle the data and clean it up. Splitgraph stores data in a columnar format (usingĬstore_fdw), so we'll be able to query it much faster. If performance becomes a concern, we'll be able to snapshot these tables as Splitgraph images. Is data virtualization always the right solution? No, but it should be a starting point. This idea is called "data virtualization" or a "data fabric". With Splitgraph and PostgreSQL, we can query the data at source. Normally, this would require a data warehouse and a few separate ingestion pipelines. Here's an example:įilter: (age((CURRENT_DATE)::timestamp without time zone, visit_last_action_time) Foreign Scan on sql_api_queriesįilter: ("sg.sql.used_images" IS NOT NULL) We made it available as an sgr mount subcommand. To query Elasticsearch, we used a fork of postgres-elasticsearch-fdw with the ability to push down qualifiers.
#Elasticsearch metabase full
The full Matomo schema spec for Splitgraph is available here. In this, we just pull out interesting tables and columns from Matomo. "visit_last_action_time": "timestamp without time zone", "visit_first_action_time": "timestamp without time zone", We have a full set of commands on our GitHub. Here's an abridged version of how we mount Matomo data on a Splitgraph instance. You can then query the data directly or snapshot it.Ĭould we use a Splitgraph instance and add a MySQL FDW to it to query Matomo data?Īnd if we did, could we use an Elasticsearch FDW to proxy to our events data?Īnd if we did that, could we use something like Metabase and point it at Splitgraph, letting it query data across all our data silos? Splitgraph's sgr mount lets you instantiate an FDW with a single command. One of its features is making PostgreSQL foreign data wrappers more user-friendly. Splitgraph itself is built on top of PostgreSQL. Pictured: five different visualization engines that Kibana lets you useīut then we thought about it some more. Basic functionality like plotting sums is only available through scripted Elasticsearch fields. However, we were already using Kibana to visualize Elasticsearch data and its visualizations were sometimes frustrating to use. The schema, albeit complex, is well documented on their website. We wondered if we could query the data from Matomo's MySQL database directly. Also, data we'd see in Matomo didn't include anything we store in Elasticsearch. While it is pretty powerful, it's limited in the kinds of reports it can produce. The idea for this setup came to us when we were trying to get some data from the Matomo Web UI. For example: a user's primary e-mail address or their GitHub ID. Some of it could be useful to know in an analytics context. We use Elasticsearch to log these and other interesting events.įinally, we have a PostgreSQL database that stores actual user data. Checking for updates: we use this to estimate the number of active sgr users.Pushing and pulling data images to/from Splitgraph.Querying the Splitgraph Data Delivery Network from an SQL client.Starring Splitgraph on GitHub or downloading a release.However, unlike GA, it stores all data locally in a MySQL database.īesides visiting the website, there's a lot of other ways users can interact with Splitgraph. It offers a similar interface and feature set to Google Analytics. Matomo is an open-source web analytics platform. HTTP server logs won't show the full story about website visitors. In the age of CDNs, a visit to a website might never reach the origin server. At the same time, we would like to know what's happening on the website and across the company in general. Finally, we'll talk about how we use Metabase to get a clear view of the business.Īrchitecture diagram of our analytics setup. We'll show a sample SQL query that runs a federated JOIN between these three databases. We'll discuss how we use Splitgraph's sgr mount command to proxy to data from Matomo, Elasticsearch and PostgreSQL. In this post, we'll talk about our analytics stack. This works without any ETL pipelines or a data warehouse. We can find out how many people queried the Splitgraph Data Delivery Network on a given week, how they found Splitgraph, or if they ever pulled a data image. These dashboards reference our Web analytics data, user data and all events happening across the estate. We use Metabase to build BI dashboards that can answer questions about how people interact with us. Note (December 2021): We updated our data stack to use Airbyte and dbt
