April 15, 2024
Pinterest Engineering
Pinterest Engineering Blog

Monil Mukesh Sanghavi | Software Engineer, Real Time Analytics Team; Xiao Li | Software Engineer, Real Time Analytics Team; Ming-May Hu | Software Engineer, Real Time Analytics Team; Zhenxiao Luo | Software Engineer, Real Time Analytics Team; Kapil Bajaj | Manager, Real Time Analytics Team


At Pinterest, one of the pillars of the observability stack provides internal engineering teams (our users) the ability to monitor their services using metrics data and set up alerting on it. Goku is our in-house time series database providing cost efficient and low latency storage for metrics data. Under the hood, Goku is not a single cluster but a collection of sub-service components including:

  • Goku Short Term (in-memory storage for the last 24 hours of data, referred to as GokuS)
  • Goku Long Term (SSD and HDD based storage for older data, referred to as GokuL)
  • Goku Compactor (time series data aggregation and conversion engine)
  • Goku Root (smart query routing)

You can read more about these components in the blog posts on GokuS Storage, GokuL (long term) storage, and Cost Savings on Goku, but a lot has changed in Goku since those were written. We have implemented several features that increased the efficiency of Goku and improved the user experience. In this 3 part blog post series, we will cover the efficiency improvements in 3 major areas:

  1. Improving recovery time of both GokuS and GokuL (this is the total time a single host or cluster in Goku takes to come up and start serving time series queries)
  2. Improving query experience in Goku by lowering latencies of expensive and high cardinality queries
  3. Reducing the overall cost of Goku at Pinterest

We will also share some learnings and takeaways from using Goku for storing metrics at Pinterest.

This 2nd blog post focuses on how Goku time series queries were improved. We will provide a brief overview of Goku's time series data model, query model, and architecture, and then follow up with the improvement features we added, including rollup, pre-aggregation, and pagination.

The data model of a time series in Goku is very similar to OpenTSDB's (which Goku replaced) data model. You can find more details here. Here's a quick overview of the Goku TimeSeries data model.

A time series metadata, or key, consists of the following:

Metric Name: proc.stat.cpu; Tag Value Combination 1: host=abc; Tag Value Combination 2: cluster=goku; Tag Value Combination 3: az=us-east-1a;  Tag Value Combination n: os=ubuntu-1

The data part of a time series, which we refer to as the time series stream, consists of data points that are time-value pairs, where time is in unix time and value is a numerical value.

Data point 1 — Timestamp: 16:00, Value: 3.0; Data point 2 — Timestamp: 16:01, Value: 4.2; Data point 3 — Timestamp: 16:02, Value: 5.2; Data point n — Timestamp: 16:59, Value: 4.0

Multiple hosts can emit time series for a unique metric name, for example cpu, memory, or disk usage, or some application metric. The host-specific information is part of one of the tags mentioned above (for example, tag key == host and value == host name).
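Putting the key and the stream together, here is a minimal sketch of the data model in Python. It is an illustration only, with assumed names; Goku itself is not implemented this way.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class TimeSeriesKey:
    """Time series metadata: metric name plus tag key/value combinations."""
    metric_name: str                    # e.g. "proc.stat.cpu"
    tags: Tuple[Tuple[str, str], ...]   # e.g. (("host", "abc"), ("cluster", "goku"), ("az", "us-east-1a"))

@dataclass
class TimeSeriesStream:
    """The data part of a time series: (unix time, value) pairs."""
    points: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, unix_time: int, value: float) -> None:
        self.points.append((unix_time, value))

# One series emitted by host "abc":
key = TimeSeriesKey("proc.stat.cpu", (("host", "abc"), ("cluster", "goku"), ("az", "us-east-1a")))
stream = TimeSeriesStream()
stream.append(1713168000, 3.0)
stream.append(1713168060, 4.2)
```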

Multicolor table showing TimeSeries number, Metric Name, Tag Value 1, Tag Value 2, Tag Value 3, and Tag Value n

The cardinality of a metric (i.e. metric name) is defined as the total number of unique time series for that metric name. A unique time series has a unique combination of tag keys and values. You can read more about cardinality here.

For example, the cardinality of the metric name "proc.stat.cpu" in the above table is 5, because the combination of tag value pairs along with the metric name does not repeat across those 5 timeseries. Similarly, the cardinality of the metric name "proc.stat.mem" is 3. Note how we represent a particular string (be it metric name or tag value) as a unique color. This is to show that a certain tag value pair can be present in multiple time series, but it is the combination of such strings that makes a time series unique.

Goku uses Apache Thrift for Query RPC. The query model of Goku is very similar to OpenTSDB's query model specified here. To summarize, a query to Goku Root is similar to the request specified below:
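The request structure appears as an image in the original post; as a rough stand-in, the sketch below models the same fields in Python. The field names are assumptions based on the list that follows, not the actual Thrift definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Filter:
    """A filter on tag values, e.g. include/exclude, wildcard, or pattern match."""
    tag_key: str
    filter_type: str                          # e.g. "literal_or", "not_literal_or", "wildcard"
    tag_values: List[str] = field(default_factory=list)

@dataclass
class QueryRequest:
    metric_name: str                          # metric name without the tag combinations
    filters: List[Filter]                     # filters on tag values
    aggregator: str                           # sum / max / min / p99 / count / mean / median ...
    downsample: Optional[str] = None          # user specified granularity of returned results, e.g. "1m-avg"
    rollup_aggregator: Optional[str] = None   # rollup aggregation (per time series downsampling)
    rollup_interval: Optional[int] = None     # rollup interval in seconds
    start_time: int = 0                       # query range start (unix time)
    end_time: int = 0                         # query range end (unix time)
```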

Let's go over the important fields in the request structure above:

  • metricName — metric name without the tag combinations
  • list<Filter> — filters on tag values, such as pattern match, wildcard, include/exclude tag value (can be multiple), etc.
  • Aggregator — sum/max/min/p99/count/mean/median etc. applied to the group of timeseries
  • Downsample — user specified time granularity of the returned results
  • Rollup aggregation/interval — downsampling at a time series level. This option becomes mandatory for long range queries (you will see the reason below in Rollup).
  • startTime, endTime — range of the query

The query response looks as follows:
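The response structure is also shown as an image in the original post; conceptually it is a list of grouped time series, roughly like the following sketch (again with assumed field names):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TimeSeriesResult:
    metric_name: str
    tags: Dict[str, str]                                          # tags shared by the grouped series
    aggregated_tags: List[str] = field(default_factory=list)      # tag keys that were aggregated away
    dps: List[Tuple[int, float]] = field(default_factory=list)    # (unix time, value) data points

@dataclass
class QueryResponse:
    results: List[TimeSeriesResult] = field(default_factory=list)
```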

The monitoring and alerting framework at Pinterest (internally called statsboard) has a query client that sends a QueryRequest to Goku Root, which forwards it to the leaf clusters (GokuS and/or GokuL) based on the query time range and the shards they host. The leaf clusters do the necessary grouping (filtering), interpolation, aggregation, and downsampling as needed and reply to Goku Root with a QueryResponse. The Root again does aggregation if necessary and replies to the statsboard query client with the final QueryResponse.
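As a rough illustration of that flow (ignoring sharding, interpolation, and transport details, and using hypothetical leaf objects), the root fans the request out to the leaves and then merges their partial results:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = List[Tuple[int, float]]  # (unix time, value) pairs

def aggregate(series_group: List[Series], aggregator: str) -> Series:
    """Merge data points that share a timestamp across series (sum/min/max shown here)."""
    fns = {"sum": sum, "min": min, "max": max}
    buckets: Dict[int, List[float]] = defaultdict(list)
    for series in series_group:
        for ts, value in series:
            buckets[ts].append(value)
    return [(ts, fns[aggregator](vals)) for ts, vals in sorted(buckets.items())]

def query_root(request, leaf_clusters, aggregator: str = "sum") -> Series:
    """Hypothetical root-side flow: fan out to leaf clusters, then re-aggregate their replies."""
    partial_results = [leaf.query(request) for leaf in leaf_clusters]  # each leaf filters, interpolates,
                                                                       # aggregates, and downsamples locally
    return aggregate(partial_results, aggregator)
```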

Let's now take a look at how we improved the query experience.

Goku supports a base time granularity of one second in the time series stream. However, having such fine granularity can impact query performance for the following reasons:

  • Too much data (too many data points) over the network for a non-downsampled raw query
  • Expensive computation, and hence CPU cost, while aggregating because of too many data points
  • Time consuming data fetch, especially for GokuL (which uses SSD and HDD for data storage)

For older metric data residing in GokuL, we decided to also store rolled up data to improve query latency. Rolling up means reducing the granularity of the time series data points by storing aggregated values over a chosen interval. For example, a raw time series stream

when aggregated using a rollup interval of 5 and rollup aggregators of sum, min, max, count, and average will produce 5 shorter time series streams, as sketched below:
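The original illustrations of the raw and rolled up streams are images; the small sketch below shows the same idea on a made-up 1-second stream:

```python
from typing import Dict, List, Tuple

def rollup(raw: List[Tuple[int, float]], interval: int) -> Dict[str, List[Tuple[int, float]]]:
    """Collapse a raw stream into one point per interval for each rollup aggregator."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in raw:
        buckets.setdefault(ts - ts % interval, []).append(value)
    rolled = {"sum": [], "min": [], "max": [], "count": [], "avg": []}
    for start in sorted(buckets):
        vals = buckets[start]
        rolled["sum"].append((start, sum(vals)))
        rolled["min"].append((start, min(vals)))
        rolled["max"].append((start, max(vals)))
        rolled["count"].append((start, float(len(vals))))
        rolled["avg"].append((start, sum(vals) / len(vals)))
    return rolled

# A 10-point raw stream at 1-second granularity becomes five 2-point streams with interval=5.
raw_stream = [(t, float(v)) for t, v in enumerate([3, 4, 5, 2, 1, 6, 7, 2, 3, 4])]
print(rollup(raw_stream, interval=5))
```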

The following table explains the tiering and rollup strategy:

Rollup benefitted the GokuL service in 3 ways:

  • Reduced the storage cost of the abundant raw data
  • Reduced the data fetch cost from SSD and the CPU aggregation cost, and thus reduced the query latency
  • Some queries that would time out on the HBase clusters backing OpenTSDB returned successful results from GokuL

The rollup aggregation is done in the Goku Compactor (explained here) before it creates the SST files containing the time series data to be stored in the RocksDB based GokuL instances.

In production, we observe that the p99 latency of queries using rolled up data is almost 1000x lower than that of queries using raw data.

P99 latency for GokuL queries using raw data is close to several seconds.
GokuL queries using rollup data have a p99 latency in milliseconds.

At query time, Goku responds with an exception stating "cardinality limit exceeded" if the number of time series the query would select/read after filtering exceeds a pre-configured limit. This protects the Goku system resources from noisy, expensive queries. We observed queries for high cardinality metrics hitting timeouts, chewing up system resources, and affecting the otherwise low latency queries. Often, after analyzing the high cardinality or timing out queries, we found that the tag(s) that contributed to the high cardinality of the metric were not even needed by the user in the final query result.

The pre-aggregation feature was introduced with the aim of removing these unwanted tags in the pre-aggregated metrics, thereby reducing the original cardinality, lowering the query latency, and successfully serving query results to the user without timing out or consuming a lot of system resources. The feature creates and stores aggregated time series by removing the unnecessary tags that the user specifies. The aggregated time series keeps only the tags the user has specifically asked to preserve. For example:

If the user asks to enable pre-aggregation for the metric "app.some_stat" and wants to preserve only the cluster and az information, the pre-aggregated time series will look like this:

Note how the cardinality of the pre-aggregated metric is reduced from 5 to 3.
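The before/after tables in the original post are images; the sketch below reproduces the idea with made-up series and a sum aggregate, where keeping only the grouping tags (cluster, az) collapses 5 raw series into 3 pre-aggregated ones:

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Five hypothetical raw series for "app.some_stat": (tags, value at some timestamp).
raw_series = [
    ({"host": "h1", "cluster": "goku", "az": "us-east-1a"}, 1.0),
    ({"host": "h2", "cluster": "goku", "az": "us-east-1a"}, 2.0),
    ({"host": "h3", "cluster": "goku", "az": "us-east-1b"}, 3.0),
    ({"host": "h4", "cluster": "gokul", "az": "us-east-1a"}, 4.0),
    ({"host": "h5", "cluster": "gokul", "az": "us-east-1a"}, 5.0),
]

def pre_aggregate(series: List[Tuple[Dict[str, str], float]],
                  grouping_tags: List[str]) -> Dict[FrozenSet, float]:
    """Sum the values of all series that share the same grouping-tag values; other tags are dropped."""
    grouped: Dict[FrozenSet, float] = defaultdict(float)
    for tags, value in series:
        key = frozenset((k, tags[k]) for k in grouping_tags)
        grouped[key] += value
    return grouped

pre_aggregated = pre_aggregate(raw_series, grouping_tags=["cluster", "az"])
print(len(raw_series), "->", len(pre_aggregated))   # 5 -> 3 unique (cluster, az) combinations
```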

The pre-aggregated metrics are new time series created within Goku that do not alter the original raw time series. Also, for the sake of simplicity, we decided not to introduce these metrics back into the usual ingestion pipeline that we emit to Kafka.

Here is the flow of how enabling pre-aggregation works:

  1. Users experiencing high latency queries, or queries hitting the cardinality limit exceeded error, decide to enable pre-aggregation for the metric.
  2. The Goku team provides the tag combination distribution of the metric to the user. For example:

3. Users decide on the tags they want to preserve in the pre-aggregated time series. The "to be preserved" tags are called grouping tags. There is also an optional provision to select a particular tag key == tag value combination to be preserved and discard all other tag value combinations for that tag key. These are called conditional tags (see the configuration sketch after this list).

4. The user is notified of the reduced cardinality and finalizes the set of tags, and pre-aggregation is enabled for the metric.
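Conceptually, the outcome of this flow is a small per-metric configuration. The shape below is purely illustrative; the actual Goku configuration format is not shown in this post.

```python
# Hypothetical pre-aggregation configuration produced by steps 1-4 above.
pre_aggregation_config = {
    "metric": "app.some_stat",
    # Grouping tags: tag keys preserved in the pre-aggregated series; all other tags are dropped.
    "grouping_tags": ["cluster", "az"],
    # Conditional tags (optional): keep only series where this tag key has this exact value,
    # discarding all other values for that key.
    "conditional_tags": {"os": "ubuntu-1"},
}
```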

Write path change:

After consuming a data point for a metric from Kafka, the Goku Short Term host checks whether the time series qualifies to be pre-aggregated. If it qualifies, the value of the data point is added to an in-memory data structure that records the sum, max, min, count, and mean of the data seen so far. Every minute, the data structure emits 5 aggregated data points (the aggregations mentioned above) for the time series under internally modified Goku metric names.
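A minimal sketch of such a running aggregate is shown below. The class and the metric name suffixes are assumptions for illustration; Goku's actual in-memory structure and naming scheme are not described in this post.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RunningAggregate:
    """Tracks sum, max, min, and count (and hence mean) of the values seen in the current minute."""
    total: float = 0.0
    maximum: float = float("-inf")
    minimum: float = float("inf")
    count: int = 0

    def add(self, value: float) -> None:
        self.total += value
        self.maximum = max(self.maximum, value)
        self.minimum = min(self.minimum, value)
        self.count += 1

    def flush(self, metric_name: str, minute_ts: int) -> List[Tuple[str, int, float]]:
        """Emit the 5 aggregated data points under modified metric names, then reset for the next minute."""
        mean = self.total / self.count if self.count else 0.0
        points = [
            (f"{metric_name}.agg.sum",   minute_ts, self.total),
            (f"{metric_name}.agg.max",   minute_ts, self.maximum),
            (f"{metric_name}.agg.min",   minute_ts, self.minimum),
            (f"{metric_name}.agg.count", minute_ts, float(self.count)),
            (f"{metric_name}.agg.mean",  minute_ts, mean),
        ]
        self.total, self.maximum, self.minimum, self.count = 0.0, float("-inf"), float("inf"), 0
        return points
```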

Read path change:

In the query request to Goku Root, the observability statsboard client sends a boolean that determines whether the pre-aggregated version of the metric should be queried. Goku Root makes the corresponding metric name change to query the right time series.

Success story: one production metric (the one in the example above) stored in Goku, with alerts set on it, was seeing high cardinality exceptions (cardinality ~32M during peak hours).

We reached out to the user to understand the use case and suggested enabling pre-aggregation for their metric. Once we enabled pre-aggregation, the queries completed successfully with latencies under 100ms.

We have onboarded more than 50 use cases for pre-aggregation.

During the launch to production, a query timeout feature had to be implemented in Goku Long Term to prevent an expensive query from consuming server resources for a long time. This, however, meant that users issuing expensive queries saw timeouts, and server resources were still wasted even if only for a short period (i.e. the configured query timeout). To address this, the pagination feature was introduced. It promises a non timed out result to the end user of an expensive query, even though it may take longer than usual, and it breaks up/plans the query in such a way that resource utilization on the server is controlled.

The workflow of the pagination feature is:

  1. The query client sends a PagedQueryRequest to Goku Root if the metric is in the list of pagination supported metrics.
  2. Goku Root plans the query based on time slicing.
  3. Goku Root and the query client then have a series of request-response exchanges. Each response gives the query client a hint of the next start and end time range to query, along with the root server's own IP address, so that the traffic managing envoy can route the follow-up query to the right server (see the sketch below).
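A client-side sketch of that exchange might look like the following. The field names (next_start_time, server_address, etc.) are assumptions; the actual Thrift PagedQueryRequest and response are not reproduced here.

```python
def paged_query(client, metric_name, filters, start_time, end_time):
    """Repeatedly query time slices until the root signals that the requested range is exhausted."""
    results = []
    next_start = start_time
    target_host = None  # root server's IP hint, so envoy keeps routing follow-ups to the same server
    while next_start < end_time:
        response = client.paged_query(
            metric_name=metric_name,
            filters=filters,
            start_time=next_start,
            end_time=end_time,
            target_host=target_host,
        )
        results.extend(response.results)
        if response.next_start_time is None:   # no more time slices to fetch
            break
        next_start = response.next_start_time  # hint for the next slice's start time
        target_host = response.server_address  # keep talking to the same root server
    return results
```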

We have included ~10 use cases in production.

The following are ideas we have to further improve the query experience in Goku:

Tag-based aggregation in Goku

During compaction, generate pre-aggregated time series by aggregating on the high cardinality contributing tags like host, etc. We will work with the client team to determine such tags. This will generate more time series and increase the storage cost, but not by much. At query time, if the high cardinality tags are not present in the query, the leaf server will automatically serve it using the pre-aggregated time series.

Currently, the client observability team already has a feature in place to remove the high cardinality contributing host tag from a set of long term metrics. In the future, this can make use of the tag-based aggregation support in Goku, or Goku can provide recommendations to the observability team, based on the query analysis above, to include more long term metrics in their list.

Post-query processing support in Goku

Many users of statsboard use tscript post-query processing to further process their results. Pushing this processing layer into Goku can provide the following benefits:

  1. Leverages additional compute resources available on the Goku Root and Goku Leaf (GokuS and GokuL) clusters
  2. Sends less data over the network, leading to possibly lower query latencies

Some examples of post-query processing support include finding the top N time series, summing the time series, etc.

Backfilling support in pre-aggregation

We currently do not support pre-aggregated queries for a metric over a time range that falls before the time the metric was configured for pre-aggregation. For example, if a metric was enabled for pre-aggregation on 1st Jan 2022 00:00:00, users won't be able to query pre-aggregated data for times before 31st Dec 2021 23:59:59. By supporting pre-aggregation during compaction, we can remove this limit, and slowly but steadily (as larger tier buckets start forming), users will start seeing pre-aggregated data for older time ranges.

SQL support

Currently, Goku is queryable only through a thrift RPC interface. SQL is widely used as a querying framework for data, and having SQL support in Goku would significantly help analytical use cases. We are starting to see an increasing demand for this and are exploring solutions.

Read from S3

The ability to store and read data from S3 would help Goku extend the TTL of raw data, and even extend the TTL of queryable metrics data. It could also prove cost beneficial for storing metrics that are used infrequently.

Special thanks to Rui Zhang, Hao Jiang, and Miao Wang for their efforts in supporting the above features. A huge thanks to the Observability team for their help and support for these features on the user facing side.

In the next blog post, we will focus on how we brought down the cost of the Goku service(s).

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.