June 21, 2024

How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts

By: Joy Jing and Jing Xia

At Airbnb, we endeavor to build a world where anyone can belong anywhere. We strive to understand what our guests care about and match them with Hosts who can provide what they're looking for. What better source for guest preferences than the guests themselves?

We built a system called the Attribute Prioritization System (APS) to listen to our guests' needs in a home: What are they requesting in messages to Hosts? What are they commenting on in reviews? What are common requests when calling customer support? And how do these differ by the home's location, property type, and price, as well as by guests' travel needs?

With this personalized understanding of which home amenities, facilities, and location features (i.e. "home attributes") matter most to our guests, we advise Hosts on which home attributes to acquire, merchandise, and verify. We can also show guests the home attributes that are most relevant to their destination and needs.

We do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.

What do guests care about?

First, to determine what matters most to our guests in a home, we look at what guests most often request, comment on, and contact customer support about. Are they asking a Host whether they have wifi, free parking, a private hot tub, or access to the beach?

To parse this unstructured data at scale, Airbnb built LATEX (Listing ATtribute EXtraction), a machine learning system that extracts home attributes from unstructured text data such as guest messages and reviews, customer support tickets, and listing descriptions. LATEX accomplishes this in two steps:

The named entity recognition (NER) module uses textCNN (a convolutional neural network for text) and is trained and fine-tuned on human-labeled text data from various data sources within Airbnb. In the training dataset, we label each phrase that falls into one of five categories: Amenity, Activity, Event, Specific POI (e.g. "Lake Tahoe"), or Generic POI (e.g. "post office").

The entity mapping module uses an unsupervised learning approach to map these phrases to home attributes. To do this, we compute the cosine distance between the candidate phrase and each attribute label in the fine-tuned word embedding space. We take the closest attribute to be the one referenced, and can also calculate a confidence score for the mapping.
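The two LATEX steps can be illustrated with a minimal Python sketch. It assumes the NER module emits token-level BIO tags (a common convention; the post does not say which tagging scheme LATEX uses) and substitutes toy three-dimensional vectors for the fine-tuned embeddings. The names `decode_bio`, `map_to_attribute`, `PHRASE_VECS`, and `ATTRIBUTE_VECS` are hypothetical, and using the best match's cosine similarity as the confidence score is an illustrative choice, not the documented one.

```python
import math

# Hypothetical NER output for a guest message, using BIO tags
# (B- begins a phrase, I- continues it, O is outside any phrase).
TOKENS = ["Does", "the", "place", "have", "a", "jacuzzi", "and", "free", "parking", "?"]
TAGS = ["O", "O", "O", "O", "O", "B-Amenity", "O", "B-Amenity", "I-Amenity", "O"]

# Toy stand-ins for the fine-tuned embedding space (hypothetical values).
PHRASE_VECS = {"jacuzzi": [0.8, 0.2, 0.1], "free parking": [0.1, 0.9, 0.2]}
ATTRIBUTE_VECS = {
    "hot tub": [0.9, 0.1, 0.0],
    "parking": [0.0, 1.0, 0.1],
    "wifi": [0.1, 0.0, 1.0],
}


def decode_bio(tokens, tags):
    """Step 1 output: group BIO-tagged tokens into (phrase, category) pairs."""
    spans, current, category = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != category)
        if starts_new:
            if current:
                spans.append((" ".join(current), category))
            current, category = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # an "O" tag closes any open phrase
            if current:
                spans.append((" ".join(current), category))
            current, category = [], None
    if current:
        spans.append((" ".join(current), category))
    return spans


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_to_attribute(phrase_vec, attribute_vecs):
    """Step 2: pick the attribute with the smallest cosine distance
    (equivalently, the largest cosine similarity) and return that
    similarity as a simple confidence score."""
    scores = {name: cosine_similarity(phrase_vec, vec) for name, vec in attribute_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


for phrase, category in decode_bio(TOKENS, TAGS):
    attribute, confidence = map_to_attribute(PHRASE_VECS[phrase], ATTRIBUTE_VECS)
    print(f"{phrase!r} ({category}) -> {attribute} (confidence {confidence:.2f})")
```

Because the mapping step only needs an embedding per phrase and per attribute label, new attributes can be added by embedding their labels, with no additional labeled mapping data.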

We then calculate how frequently each entity is referenced in each text source (e.g. messages, reviews, customer service tickets) and aggregate the normalized frequencies across sources. Home attributes with many mentions are considered more important.
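A minimal sketch of this aggregation, assuming that "normalized" means mentions divided by the number of documents in each source and that sources are averaged with equal weights unless told otherwise. The counts are made up for illustration, and `aggregate_importance` is a hypothetical helper rather than Airbnb's code.

```python
# Hypothetical mention counts per text source (illustrative numbers only).
MENTIONS = {
    "messages": {"wifi": 120, "parking": 300, "hot tub": 30},
    "reviews": {"wifi": 400, "parking": 150, "hot tub": 50},
    "support_tickets": {"wifi": 20, "parking": 5, "hot tub": 1},
}
# Hypothetical document counts per source, used to normalize the mentions
# so that a large source (e.g. reviews) does not dominate a small one.
DOC_COUNTS = {"messages": 10_000, "reviews": 20_000, "support_tickets": 1_000}


def aggregate_importance(mentions, doc_counts, weights=None):
    """Return attributes sorted by weighted mean normalized frequency."""
    weights = weights or {source: 1.0 for source in mentions}
    total_weight = sum(weights[source] for source in mentions)
    scores = {}
    for source, counts in mentions.items():
        for attribute, n in counts.items():
            freq = n / doc_counts[source]  # mentions per document in this source
            scores[attribute] = scores.get(attribute, 0.0) + weights[source] * freq / total_weight
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


print(aggregate_importance(MENTIONS, DOC_COUNTS))
```

Normalizing per source before averaging means a rare but telling source such as support tickets can still move the ranking, which raw counts would hide.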

With this method, we are able to gain insight into what guests are interested in, even surfacing new entities that we may not yet support. Because the engineering system is scalable, we can also improve the model by onboarding additional data sources and languages.

An example of a listing’s description with keywords highlighted and labeled by the Latex NER model.

What do guests care about for different types of homes?

What guests look for in a mountain cabin is different from what they look for in an urban apartment. A more complete understanding of guests' needs in an Airbnb home enables us to provide more personalized guidance to Hosts.

To achieve this, we calculate a unique ranking of attributes for each home. Based on the characteristics of a home (location, property type, capacity, luxury level, etc.), we predict how frequently each attribute will be mentioned in messages, reviews, and customer service tickets. We then use these predicted frequencies to calculate a personalized importance score that ranks all possible attributes of a home.

For example, consider a mountain cabin that can host six people at an average daily price of $50. To determine what matters most to potential guests, we learn from what is mentioned most for other homes that share these same characteristics. The result: hot tub, fire pit, lake view, mountain view, grill, and kayak. In contrast, what matters for an urban apartment is parking, restaurants, grocery stores, and subway stations.
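Given predicted per-source frequencies for a home, ranking reduces to combining them into a single score and sorting. The sketch below assumes a simple weighted sum; the predicted numbers, the source weights, and `importance_ranking` are hypothetical, chosen only to mirror the cabin-versus-apartment contrast above.

```python
# Hypothetical predicted mentions per 1,000 documents, as an upstream
# model might output them for two homes (illustrative numbers only).
PREDICTED = {
    "mountain cabin": {
        "messages": {"hot tub": 9.0, "fire pit": 6.0, "kayak": 3.0, "parking": 2.0},
        "reviews": {"hot tub": 12.0, "fire pit": 8.0, "kayak": 4.0, "parking": 1.0},
    },
    "urban apartment": {
        "messages": {"hot tub": 0.5, "fire pit": 0.1, "kayak": 0.0, "parking": 11.0},
        "reviews": {"hot tub": 0.3, "fire pit": 0.1, "kayak": 0.0, "parking": 7.0},
    },
}
SOURCE_WEIGHTS = {"messages": 0.5, "reviews": 0.5}  # hypothetical weights


def importance_ranking(predicted_by_source, weights):
    """Combine predicted per-source frequencies into one score and rank."""
    scores = {}
    for source, freqs in predicted_by_source.items():
        for attribute, freq in freqs.items():
            scores[attribute] = scores.get(attribute, 0.0) + weights[source] * freq
    return sorted(scores, key=scores.get, reverse=True)


for home, predicted in PREDICTED.items():
    print(home, importance_ranking(predicted, SOURCE_WEIGHTS))
```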

Image: An example of a mountain cabin home
An example of home attributes ranked for a mountain cabin vs. an urban apartment.
Image: An example of an urban apartment home

We could directly aggregate keyword frequencies among similar homes. However, this approach runs into problems at scale: the number of home segments can grow exponentially, leaving very specific segments with sparse data. Instead, we built an inference model that uses the raw keyword frequency data to infer the expected frequency for a segment. This inference approach scales as we use more, and finer-grained, dimensions to characterize our homes, which allows us to help our Hosts best highlight their unique and diverse collection of homes.
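The post does not describe the inference model itself. As one illustrative stand-in, the sketch below uses a multiplicative factor model: each home dimension gets a rate multiplier estimated from its marginal counts and shrunk toward the global rate with a pseudo-count `alpha`, so that a segment never observed as a combination still receives an estimate. This assumes dimensions act independently, a strong simplification (correlated dimensions such as "cabin" and "mountain" get double-counted). The data, `fit_factor_model`, and `predict_rate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: (home features, stays, stays whose text mentioned "hot tub").
OBSERVATIONS = [
    ({"property_type": "cabin", "region": "mountain"}, 200, 40),
    ({"property_type": "cabin", "region": "lake"}, 100, 15),
    ({"property_type": "apartment", "region": "city"}, 500, 5),
    ({"property_type": "apartment", "region": "mountain"}, 50, 4),
]


def fit_factor_model(observations, alpha=10.0):
    """Estimate a global mention rate plus one shrunken multiplier per
    (dimension, value), assuming dimensions act independently."""
    total_stays = sum(stays for _, stays, _ in observations)
    total_mentions = sum(mentions for _, _, mentions in observations)
    base = total_mentions / total_stays
    stays_by_key = defaultdict(float)
    mentions_by_key = defaultdict(float)
    for features, stays, mentions in observations:
        for dim, value in features.items():
            stays_by_key[(dim, value)] += stays
            mentions_by_key[(dim, value)] += mentions
    # alpha acts as pseudo-stays at the base rate, pulling sparse values toward 1.0.
    multipliers = {
        key: (mentions_by_key[key] + alpha * base) / (stays_by_key[key] + alpha) / base
        for key in stays_by_key
    }
    return base, multipliers


def predict_rate(features, base, multipliers):
    """Expected mention rate for any segment, including unseen combinations."""
    rate = base
    for dim, value in features.items():
        rate *= multipliers.get((dim, value), 1.0)  # unknown value: neutral
    return rate


base, multipliers = fit_factor_model(OBSERVATIONS)
# "cabin" + "city" never appears together in OBSERVATIONS, yet gets an estimate.
print(predict_rate({"property_type": "cabin", "region": "city"}, base, multipliers))
```

The number of parameters grows with the number of dimension values rather than with the number of their combinations, which is the scaling property the paragraph above describes; a production model would also need to capture interactions between dimensions.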

How can guests' preferences help Hosts improve?

Now that we have a granular understanding of what guests want, we can help Hosts showcase what guests are looking for by suggesting which home attributes to acquire, merchandise, and clarify.

But to make these recommendations relevant, it's not enough to know what guests want. We also need to be sure about what's already in the home. This turns out to be trickier than simply asking the Host, because we collect 800+ home attributes. Most Hosts can't immediately and accurately add every attribute their home has, especially since amenities like a crib mean different things to different people. To fill in some of the gaps, we leverage guest feedback on amenities and facilities they've seen or used. In addition, some home attributes are available from trusted third parties, such as real estate or geolocation databases that can provide square footage, bedroom count, or whether the home overlooks a lake or beach. By combining data from our Hosts, guests, and trusted third parties, we can build a truly complete picture of a home.

We use several different models, including a Bayesian inference model whose confidence increases as more guests confirm that the home has an attribute. We also use WiDeText, a supervised neural network model that uses features of the home to predict the likelihood that the next guest will confirm the attribute's existence.
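A Beta-Binomial model is one standard way to obtain a belief whose confidence grows as more guests confirm an attribute: each confirmation or denial updates a Beta distribution, and its variance shrinks as evidence accumulates. The sketch assumes this conjugate form (the post only says "Bayesian inference model"). Seeding the prior from a supervised model's predicted probability, as the hypothetical `from_prediction` does, is one illustrative way to combine the two models, not a documented one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeBelief:
    """Beta(alpha, beta) belief that a home has one attribute."""
    alpha: float = 1.0  # prior pseudo-count of guest confirmations
    beta: float = 1.0   # prior pseudo-count of guest denials

    @classmethod
    def from_prediction(cls, p_exists: float, strength: float = 5.0) -> "AttributeBelief":
        # Hypothetical: seed the prior from a supervised model's probability,
        # counted as `strength` pseudo-observations.
        return cls(strength * p_exists, strength * (1.0 - p_exists))

    def update(self, confirmed: int, denied: int) -> "AttributeBelief":
        # Conjugate update: add observed guest answers to the pseudo-counts.
        return AttributeBelief(self.alpha + confirmed, self.beta + denied)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))


prior = AttributeBelief.from_prediction(0.6)
posterior = prior.update(confirmed=8, denied=1)
print(f"P(has attribute): {prior.mean:.2f} -> {posterior.mean:.2f}")
print(f"variance: {prior.variance:.4f} -> {posterior.variance:.4f}")
```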

By combining our estimate of how important each attribute is for a home with the likelihood that the attribute already exists or needs clarification, we can give Hosts personalized, relevant recommendations on what to acquire, merchandise, and clarify when promoting their home on Airbnb.
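The combination step can be sketched as a small decision rule. The thresholds, the mapping of outcomes to "acquire", "merchandise", and "clarify", and the `recommend` function are all hypothetical; the post does not state how the two scores are combined.

```python
from typing import Optional


def recommend(importance: float, p_exists: float,
              important_at: float = 0.5, confident_at: float = 0.8) -> Optional[str]:
    """Hypothetical rule mapping (importance, existence likelihood) to a Host suggestion."""
    if importance < important_at:
        return None  # guests rarely care: not worth a suggestion
    if p_exists >= confident_at:
        return "merchandise"  # probably present: highlight it in the listing
    if p_exists <= 1.0 - confident_at:
        return "acquire"  # probably absent: suggest adding it
    return "clarify"  # uncertain: ask the Host to confirm


# Hypothetical (attribute, importance, likelihood the home has it) for one listing.
for attribute, importance, p_exists in [
    ("hot tub", 0.9, 0.95),
    ("fire pit", 0.8, 0.10),
    ("kayak", 0.7, 0.50),
    ("crib", 0.2, 0.50),
]:
    print(attribute, recommend(importance, p_exists))
```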

Cards shown to Hosts to better promote their listings.

What's next?

For the first time, we know which attributes our guests want, down to the level of an individual home. What's important varies greatly with home location and trip type.

This full-stack prioritization system has allowed us to provide more relevant and personalized advice to Hosts, to merchandise what guests are looking for, and to accurately represent popular and contentious attributes. When Hosts accurately describe their homes and highlight what guests care about, guests can find their perfect vacation home more easily.

We're currently experimenting with highlighting the amenities that are most important for each type of home (e.g. a kayak for a mountain cabin, parking for an urban apartment) on the home's product description page. We believe we can leverage the knowledge gained to improve search and to determine which home attributes are most important for different categories of homes.

On the Host side, we're expanding this prioritization framework to include more tips and insights into how Hosts can make their listings even more desirable, such as freeing up popular nights, offering discounts, and adjusting settings. By leveraging unstructured text data to help guests connect with their perfect Host and home, we hope to foster a world where anyone can belong anywhere.

If this type of work interests you, check out some of our related positions at Careers at Airbnb!

It takes a village to build such a robust full-stack platform. Special thanks to (alphabetical by last name) Usman Abbasi, Dean Chen, Guillaume Guy, Noah Hendrix, Hongwei Li, Xiao Li, Sara Liu, Qianru Ma, Dan Nguyen, Martin Nguyen, Brennan Polley, Federico Ponte, Jose Rodriguez, Peng Wang, Rongru Yan, Meng Yu, and Lu Zhang for their contributions, dedication, expertise, and thoughtfulness!