September 7, 2024

All software is built atop a core set of assumptions. As new code is added and new use cases emerge, software can become unmoored from those assumptions. When this happens, a fundamental tension arises between revisiting those foundational assumptions, which usually entails a lot of work, and trying to support new behavior atop the existing architecture. The latter approach is usually advised, to save time and reduce risk.

However, there are times when it’s worth revising the core architecture of a large software application. Recently at Slack we did just that, taking a step back to change how our backend and clients (the desktop and mobile applications) work on a foundational level.

Slack launched in 2013 with a simple architecture: each user belonged to a single workspace, where they joined channels and sent messages. To view messages from a different workspace (that you were also logged in to), you needed to click into that workspace.

This model held until 2017, when we launched Enterprise Grid, which lets Slack’s largest customers divide their organizations into multiple workspaces, each with a particular focus. At first Enterprise Grid users were usually in just a single workspace, but over time usage patterns changed, and today these users often belong to several workspaces. At the same time, we’ve built ways for Slack clients to share data across multiple workspaces on the same Grid, such as the Threads and Unreads views and cross-workspace channels.

This led to a natural question: if data is shared between multiple workspaces on the same Grid, and users need to switch between these workspaces to do their jobs, why not instead provide a single, unified view of all the data a user can access within their Grid? Not only would this provide a superior user experience, it would eliminate a class of bugs caused by syncing org-wide data across multiple workspaces. And it would improve performance, since data for multiple workspaces could be loaded in a single API request.

With this insight, the Unified Grid project was born. But because Slack was architected with the assumption that most data is particular to a single workspace, it was initially unclear whether Unified Grid was even feasible. Nevertheless, we decided that because the product continued to push against the limits of a workspace-centric architecture, we had to try.

Unified Grid provides powerful organizational concepts like the DMs tab, the Activity tab, and Save it for Later, while still allowing users to filter by workspace.

Enterprise Grid: The evolution of Slack’s architecture

To understand what made Unified Grid such an ambitious project, it’s worth zooming out to look at Slack’s architecture and how it has evolved over time.

In 2013, Slack launched with a relatively simple model. Users belonged to workspaces, within which they joined channels and sent messages. Each workspace represented a customer, and all the data for a particular workspace was stored on a single database server, or “shard.” Slack clients authenticated their API requests using session tokens containing the user ID and workspace ID (called “workspace tokens”); the backend then parsed the workspace ID and used it to associate each API request with a workspace, route queries to that workspace’s database shard, and perform access control. This model also extended to the client, where the data for each workspace was stored in a separate repository with distinct login sessions.

The original Slack data model routed all queries to a database shard identified by the workspace ID in the session token.
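
The routing described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the `user_id:workspace_id` token format, the shard names, and every identifier below are hypothetical and are not Slack's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceToken:
    """Toy stand-in for a session token carrying a user ID and a workspace ID."""
    user_id: str
    workspace_id: str


# Hypothetical mapping from workspace ID to the single shard holding its data.
SHARD_FOR_WORKSPACE = {"T1": "shard-a", "T2": "shard-b"}


def parse_token(raw: str) -> WorkspaceToken:
    # Assumed wire format for this sketch only: "user_id:workspace_id".
    user_id, workspace_id = raw.split(":", 1)
    return WorkspaceToken(user_id, workspace_id)


def route_request(raw_token: str) -> str:
    """Return the shard that should serve a request made with this token."""
    token = parse_token(raw_token)
    # The workspace ID alone decides which shard is queried.
    return SHARD_FOR_WORKSPACE[token.workspace_id]
```

In this toy model every request carries exactly one workspace ID, which is why a view spanning several workspaces does not fit naturally.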

As Slack grew, we noticed that individual divisions within the same company often created separate Slack workspaces. We wanted to give companies a simple way to administer these workspaces via a single UI, where they could enforce security policies and handle billing across their entire organization. Thus, Enterprise Grid, our solution for our largest and most complex customers, was born.

To support Enterprise Grid, we introduced the concept of an “org” that effectively served as a “parent” to multiple workspaces. Users still navigated Slack from the perspective of an individual workspace, but now it was also possible for data to be stored at the org level. For example, customers could create cross-workspace (XWS) channels, which were stored on the org’s database shard and visible across multiple workspaces. This meant that the Slack backend had to query data on the workspace shard and, if it was absent there, on the org shard (for workspaces that are part of an Enterprise Grid). Because Enterprise Grid users could be assigned permissions at the level of the workspace and/or org, the backend also had to check permissions at both the workspace and org level.

In Enterprise Grid, the backend queries both the workspace and org shard to resolve data stored at the org level (and therefore available to all workspaces on the Grid).
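
The workspace-then-org lookup can be sketched as follows. This is a minimal illustration, assuming in-memory dictionaries in place of real database shards; the shard keys, IDs, and function name are all hypothetical.

```python
from typing import Optional

# Toy shards: each maps a channel ID to a channel name (hypothetical data).
SHARDS = {
    "ws-T1": {"C1": "team-one-general"},
    "org-E1": {"C9": "company-announcements"},
}

# Hypothetical workspace -> parent org mapping. T2 is a standalone workspace.
ORG_FOR_WORKSPACE = {"T1": "E1"}


def find_channel(workspace_id: str, channel_id: str) -> Optional[str]:
    """Look on the workspace shard first, then fall back to the org shard."""
    hit = SHARDS.get(f"ws-{workspace_id}", {}).get(channel_id)
    if hit is not None:
        return hit
    org_id = ORG_FOR_WORKSPACE.get(workspace_id)
    if org_id is None:
        # Not part of an Enterprise Grid, so there is no org shard to consult.
        return None
    return SHARDS.get(f"org-{org_id}", {}).get(channel_id)
```

The second lookup is the extra cost that org-level data adds to every request that misses on the workspace shard.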

The changing landscape

Initially, since end users were usually in a single workspace, their experience didn’t change much in Enterprise Grid. However, over time the way customers use Slack has evolved. Now, a significant portion of users do belong to multiple workspaces on the Grid, which leads to context switching and missed activity.

We wanted to address these problems, and several infrastructure-level changes we’d made suggested a way forward. With the Vitess migration, we began sharding data along axes other than workspace or org ID, meaning that the workspace or org was no longer required to route queries to the appropriate database shard for our most important tables. We also enhanced our real-time messaging (RTM) stack to remove the need to fan out org-wide data to every workspace on the Grid (and some of our largest customers have hundreds of workspaces!). Finally, we updated clients to share org-wide data across all workspaces within their Grid. Leveraging these infrastructure investments, we built views that aggregated content from multiple workspaces, like our Threads and Unreads views.

However, even with these improvements, our workspace-centric architecture still caused significant frustration. We knew that to truly solve the problem, we’d need to move to an org-wide architecture, though this would entail updating hundreds of APIs, database queries, and permissions checks.

Prototyping the path

Execs, not to mention engineers, were understandably concerned about the cost of Unified Grid, and not convinced that the payoff would be worth the effort. Therefore, rather than start by tackling what were potentially hundreds of broken APIs, we decided to build a proof of concept to better understand the benefits of Unified Grid and the work that would be required to ship it end-to-end.

At Slack, we call this prototyping the path: building incrementally, proving out and refining our ideas as we go. Because we’re some of the heaviest users of Slack, we knew that if we could use Unified Grid in our day-to-day work, we’d start getting good signals about what did and didn’t work. And as the project grew in maturity, we could opt in more of our peers, gathering valuable feedback from them.

First, we needed to be able to boot the Slack client in Unified Grid mode, with an org-wide view of all the user’s channels rather than a workspace-scoped view. To this end, we built a new boot API that returns data for all the workspaces and channels the user belongs to across the entire Grid. We updated clients to store this boot data at the org level, since users in Unified Grid no longer navigate from the perspective of a single Grid workspace at a time.

Once the client could boot, we updated our homegrown API framework so that an API could be marked compatible with the new Unified Grid client. We then began fixing APIs and client-side checks as we encountered issues, prioritizing those that impacted our day-to-day work. We had several primary strategies for fixing broken APIs:

  1. If an API didn’t rely on workspace context for routing (perhaps because it had been migrated to a new sharding scheme during the Vitess migration), we allowed it to be called in Unified Grid and confirmed that the query still behaved correctly. For example, because the messages table is now sharded by channel ID, we could efficiently fetch messages for a channel without significant changes.
  2. If an API acted directly on a workspace, we could often prompt users to select a workspace and then pass that workspace to the API. For example, we updated the channel creation flow so that the user must select the workspace in which the channel should be created, since the workspace can no longer be inferred from the state of the client.
  3. Finally, if all else failed, we could iterate over the user’s relevant workspaces, attempting to resolve the query against each workspace’s shard. Because most users are in only a handful of workspaces, this approach is surprisingly performant. However, there is a long tail of users in hundreds of workspaces. Because such users are generally administrators who don’t interact with all these workspaces, we decided to cap the number of “relevant” workspaces at 50 and allow users to manually configure this list. Limiting the relevant workspaces for each user ensures reasonable performance and makes Slack usable for these outliers.

With Unified Grid, in the worst case the Slack backend queries the shard for every Enterprise Grid workspace the user belongs to when loading workspace-level data.
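
The third strategy, iterating over a capped list of relevant workspaces, might look roughly like the sketch below. The cap of 50 comes from the text; the function names, the optional user-configured list, and the `query_shard` callback are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")

# Cap on "relevant" workspaces, as described in the text.
MAX_RELEVANT_WORKSPACES = 50


def relevant_workspaces(
    all_workspaces: Sequence[str],
    configured: Optional[Sequence[str]] = None,
) -> List[str]:
    """Prefer the user's own list if present (hypothetical), then apply the cap."""
    chosen = configured if configured is not None else all_workspaces
    return list(chosen[:MAX_RELEVANT_WORKSPACES])


def resolve_across_workspaces(
    workspaces: Sequence[str],
    query_shard: Callable[[str], Optional[T]],
) -> Optional[T]:
    """Try each workspace's shard in order and return the first hit, if any."""
    for workspace_id in workspaces:
        result = query_shard(workspace_id)
        if result is not None:
            return result
    return None
```

With the cap in place, the worst case is bounded at 50 shard queries per lookup, regardless of how many workspaces an administrator belongs to.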

Although our prototype had a lot of rough edges, we felt the benefit of reduced context switching and a simpler UX. From there, we started opting in more coworkers, eventually inviting execs like our then-CEO Stewart Butterfield to try the new client. His feedback summed up how we felt: “This is obviously better.”

From prototype to production

As mentioned above, Unified Grid potentially impacted every API and permission check invoked by the Slack client. It would require significant effort from scores of engineers across most of Slack’s product engineering teams to ensure these APIs and permission checks continued to behave correctly. At the same time, we were building IA4, a redesign of the Slack client which introduced our Activity, DMs, and Later tabs. To avoid subjecting customers to separate large changes at the same time, Unified Grid became a foundational component of IA4, and with it a top company priority.

We began with spreadsheets listing all APIs that were invoked by Slack clients as well as all permission checks performed by clients and the backend, dividing the work among various related product teams. In keeping with prototyping the path, we asked engineers to take two passes over each API: a first pass to make the API work well enough for internal usage, and then, perhaps weeks later, a second pass to ensure the integration tests, permissions checks, and other edge cases behaved correctly. This two-phase approach allowed us to manually verify and get a feel for functionality that was not entirely ready for primetime.

The core team now pivoted away from prototyping to more scalably support the migration effort with tools and frameworks:

Docs: Most importantly, we put together a detailed guide with step-by-step instructions for ensuring that an API behaves correctly in Unified Grid, including the strategies for fixing broken APIs listed earlier.

Tests: We created a parallel integration test suite which ran all our existing integration tests using org context instead of workspace context. This let us reuse hundreds of tests rather than rewriting them from the ground up. As expected, hundreds of test suites were broken initially, providing us with a concrete list of test suites to fix as part of marking an API compatible with Unified Grid.

Helpers: We added a number of convenience helpers to correctly fetch channels and perform permissions checks across all of a user’s workspaces on their Enterprise Grid, on both clients and the backend. For example, to check whether a user can act as an admin within a cross-workspace channel, these helpers check whether the user is a workspace admin in any of the workspaces with which the channel is shared or is an admin at the org level.

Client Infrastructure: In addition to the work needed to support these permissions checks, clients also required new infrastructure to migrate workspace-scoped repositories to the new data model. The clients solved this problem in different ways: some clients added an org-level data store but continued to save data in workspace-scoped repositories, while other clients moved everything to an org-wide store. These data migrations could be completed and shipped in parallel with the overall Unified Grid project, which allowed us to de-risk the project itself.
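
The admin check described under Helpers above could be sketched like this. The data classes and field names are hypothetical and chosen for illustration; only the rule itself (org admin, or workspace admin in any workspace the channel is shared with) comes from the text.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    user_id: str
    # Workspaces in which this user is a workspace admin (hypothetical field).
    admin_workspace_ids: Set[str] = field(default_factory=set)
    is_org_admin: bool = False


@dataclass
class Channel:
    channel_id: str
    # Workspaces with which this cross-workspace channel is shared.
    shared_workspace_ids: Set[str] = field(default_factory=set)


def can_admin_channel(user: User, channel: Channel) -> bool:
    """True if the user is an org admin, or a workspace admin in at least
    one workspace the channel is shared with."""
    if user.is_org_admin:
        return True
    return bool(user.admin_workspace_ids & channel.shared_workspace_ids)
```

Centralizing the rule in one helper means individual API handlers do not need to know which workspace a request came from.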

We created a spreadsheet to track the number of APIs and permission checks we needed to fix, and enjoyed watching the graph trend towards zero.
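
The parallel test suite idea, running the same tests under org context instead of workspace context to get a concrete list of what to fix, can be illustrated with a toy runner. Everything here is a hypothetical sketch: the context dictionaries, the check functions, and `run_suite` are not Slack's test framework.

```python
from typing import Callable, Dict, List

Context = Dict[str, str]


def check_fetch_messages(ctx: Context) -> None:
    # Needs only a user, so it passes under either context.
    assert "user_id" in ctx


def check_create_channel(ctx: Context) -> None:
    # Assumes a workspace can be inferred, so it breaks under org context.
    assert "workspace_id" in ctx


CHECKS: Dict[str, Callable[[Context], None]] = {
    "fetch_messages": check_fetch_messages,
    "create_channel": check_create_channel,
}

WORKSPACE_CONTEXT: Context = {"user_id": "U1", "workspace_id": "T1"}
ORG_CONTEXT: Context = {"user_id": "U1", "org_id": "E1"}


def run_suite(checks: Dict[str, Callable[[Context], None]], ctx: Context) -> List[str]:
    """Run every check under the given context and return the names that fail."""
    failures = []
    for name, check in checks.items():
        try:
            check(ctx)
        except AssertionError:
            failures.append(name)
    return sorted(failures)
```

Running the unchanged checks under `ORG_CONTEXT` yields the list of failures to work through, which is the kind of number a tracking spreadsheet can count down.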

Conclusion

By Summer 2023, Unified Grid was in a place where much of the company was using it for their day-to-day work. We began rolling out to customers in Fall 2023 and completed the rollout in March 2024. What had begun as a barely functional prototype was, almost two years later, a core component of our redesigned client and a solid foundation atop which to keep innovating.

It’s a truism that you shouldn’t attempt large rewrites of existing software applications. But like all truisms, it’s only almost always true. Sometimes, when the architecture of an application drifts far enough from how that application is used, prototyping a path towards rewriting the core foundation is actually the best way to achieve your goals.

Now that Unified Grid is live, we’re excited to see what’s next. What else can be built atop a more flexible information architecture? Whatever it is, we know that we’ll be prototyping the path to new, intuitive product experiences well into the future. If that’s something that excites you too, come join us.