Airbnb Seattle Dataset from Kaggle

How can the effectiveness of marketing be improved?

This blog post shows how the marketing effectiveness of Airbnb can be enhanced by analyzing a dataset from 2016. To improve marketing, the four Ps of the marketing mix should be addressed. The dataset contains listings of rented apartments and their attributes.

Marketing Mix — The four P’s


Marketing is about fulfilling customer needs and expectations. Within product policy, the aim is to understand one’s market and to figure out which needs and wants the customers have. In general, the main need of travelers is to find accommodation, but nowadays it is less about finding any accommodation and more about discovering the right one. It is not only about having a nice and clean room and bathroom with clean white towels and bedsheets.


The price is a very important aspect of marketing a product. If customers buy a product at a certain price level, their willingness to pay is at least as high as that price. This means they expect the value they obtain from the purchase to exceed its cost, and therefore they accept the price. So for a given price level, the customer expects a certain value, which immediately raises expectations the product needs to fulfill in order to satisfy the customer.


For promotion policy, it is important to find out when a marketing campaign should run and which content it should feature. The aim is clear: I need to be able to suggest which advertising content will appeal to future visitors and when it will reach them most effectively. So far, my suggestions have mainly focused on how (former) lessors can improve the renting of their apartments.


The place policy considers where customers get in touch with the product and consume it, in order to find a suitable retail location that is accessible to customers. The first contact between lessor and renter happens on Airbnb, but the final ‘purchase’ of the product takes place in Seattle. As Airbnb is a platform that acts as an agent between the lessor in Seattle and the renter, I do not need to consider this in the data analysis, as it is fixed and cannot be changed.

Executive Summary of the Dataset

The aim of this report is to find out how the effectiveness of marketing ‘Airbnb Seattle’ can be improved. The report is oriented toward the four P’s of the Marketing Mix, which are Product, Price, Place, and Promotion.
The data originally consisted of three data files, Listings, Reviews, and Calendar, and had to be prepared for the analysis.

The analysis aims to answer the following three questions:

  • Which facts do lessors need to address in their description to raise the interest of the potential customers?
  • Which price can be charged for an apartment with certain characteristics?
  • When can lessors increase the price per night for their apartment and when should they lower it?

The analysis reveals which keywords lessors should put into their accommodation description, such as parks, shops, and restaurants.
Furthermore, the number of positive reviews is vital for potential renters to become interested in a listing. How long the lessor has been active is also an important factor.
Host_response_time, host_response_rate, and host_acceptance_rate are the most important indicators of whether and at which price the accommodation gets booked. The number of bathrooms, bedrooms, and beds together influence pricing as well.


The dataset consists of three CSV files: listings.csv, calendar.csv, and reviews.csv.
The listings file consists of 92 attributes with 3,818 data entries. Every row represents one apartment in Seattle that has been offered for rent via the Airbnb platform.
The calendar consists of 1,393,570 data entries and 4 columns. It connects a certain time period with an apartment and indicates whether it was rented out or available during that period. The last file, reviews.csv, contains all reviews that former visitors have submitted for an apartment. It contains the reviewer’s ID, name, comments, and the date as well as the ID of the apartment. So, if the files are combined, the following information can be deduced: the key facts about an apartment (like the size, price, number of beds, usable facilities, …), a general idea of what the apartment looks like from the description (written by the lessor) and from the reviews (written by former guests), and when the apartment has been available or rented.
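As a rough illustration of how the three files can be combined per apartment, the following sketch joins tiny in-memory stand-ins for listings.csv, calendar.csv, and reviews.csv. The column names (`id`, `listing_id`, `available`, and so on) and all values are assumptions made for this example, not taken from the real files.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for the three files; column names and values are assumptions.
LISTINGS_CSV = """id,neighbourhood,price
1,Ballard,85
2,Belltown,150
"""
CALENDAR_CSV = """listing_id,date,available
1,2016-01-04,t
1,2016-01-05,f
2,2016-01-04,f
"""
REVIEWS_CSV = """listing_id,reviewer_id,comments
1,11,Short walk to the park
1,12,Great restaurants nearby
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_listing(rows):
    """Collect rows under the listing they refer to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["listing_id"]].append(row)
    return grouped


calendar_by_listing = group_by_listing(read_rows(CALENDAR_CSV))
reviews_by_listing = group_by_listing(read_rows(REVIEWS_CSV))

# One combined record per apartment: key facts, availability, and reviews.
combined = {}
for listing in read_rows(LISTINGS_CSV):
    key = listing["id"]
    combined[key] = {
        "listing": listing,
        "calendar": calendar_by_listing.get(key, []),
        "reviews": reviews_by_listing.get(key, []),
    }
```

With the real files, `read_rows` would be fed the file contents instead of the inline strings. A listing without reviews simply ends up with an empty review list.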

Further investigation of the three datasets


The listing dataset consists of 92 columns with attributes that describe different characteristics of the rented apartments. It describes 3,818 apartments located in 79 neighborhoods. Combined with the review dataset, it can be used to find the attributes that customers value most. Combined with the calendar dataset, data analysis can help to determine the factors that are important not only to attract visitors but also to finally rent the apartment out successfully. In particular, I will use this combination to find out which attributes of the apartments influenced the charged price the most. The following image shows the districts with the most offers, split by room type.

Seven districts with most accommodations split by room-type


To see whether the reviews were written by satisfied or rather unsatisfied visitors, I created a bar chart that shows how often each review score occurs.

Occurrences of review scores

The chart shows that most reviews give the highest possible score (10) and hardly any give a score lower than 8. So I assume that the reviews are written in a rather positive tone, and we need to keep in mind that they have been written by overall satisfied tourists.
Another comparison shows the number of reviews per neighborhood. We can see a direct correlation between the number of reviews and the number of accommodations. Apparently, no district gets fewer reviews for any special reason.

Compare review and accommodation count


The calendar data shows when certain apartments are available and when they are rented out. In my analysis, I assume that an apartment listed as not available in calendar.csv is occupied by a visitor. If it is available, I assume that the owners would have liked to rent the apartment out but no visitor rented it.

Occupation ratio of apartments

While only half of the apartments were occupied in January, the number of rented apartments rose until April, when a sudden drop appeared. The same pattern repeats in July. After that second drop, the number of occupied accommodations rises again. Nevertheless, the percentage of free apartments never exceeds 40% after February, but it also never drops below 20%. This raises the question of whether the degree of booking correlates with a well-written description, and therefore whether the percentage of free apartments can be decreased by mentioning the right aspects in the description.
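The monthly occupation ratio follows directly from the assumption that "not available" means "rented out". The sketch below computes it for a handful of made-up calendar rows. The `"t"`/`"f"` encoding of the availability flag and the row layout are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical calendar rows: (listing_id, ISO date, available flag).
# The "t"/"f" encoding of the flag is an assumption about calendar.csv.
calendar_rows = [
    (1, "2016-01-04", "t"),
    (1, "2016-01-05", "f"),
    (2, "2016-01-04", "f"),
    (2, "2016-02-01", "t"),
    (1, "2016-02-01", "f"),
    (1, "2016-02-02", "f"),
    (2, "2016-02-02", "f"),
]


def occupation_ratio_by_month(rows):
    """Return {"YYYY-MM": share of listing-days marked as not available}."""
    occupied = defaultdict(int)
    total = defaultdict(int)
    for _listing_id, date, available in rows:
        month = date[:7]  # "2016-01-04" -> "2016-01"
        total[month] += 1
        if available == "f":  # not available is treated as rented out
            occupied[month] += 1
    return {month: occupied[month] / total[month] for month in sorted(total)}


ratios = occupation_ratio_by_month(calendar_rows)
```

The ratio counts listing-days rather than listings, so an apartment that is booked for half a month contributes half of its days to the occupied share.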

Business Questions

Which facts do lessors need to address in their description to raise the interest of the potential customers?

To find out which words lessors should use in the description, I looked at the reviews with a wordcloud. The bigger a word appears in the wordcloud, the more often it was used. For example, if customers often write the word “walk” in their reviews to describe what was within walking distance of the Airbnb, the word “walk” will be big in the wordcloud.

To answer this question, the six most reviewed neighborhoods, Capitol Hill, Ballard, Queen Anne, Belltown, Minor, and Wallingford, were analyzed. First, all neighborhoods have the words “walk”, “park”, and “restaurant” written in big. The word “downtown” is also used often. For Minor, the word “lake” is very important. Beyond that, words like “minute”, “shop”, and “market” are used often by customers. Interestingly, the word “Washington” is important for Minor and Capitol Hill.
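The word sizes in a wordcloud come down to word counts. As a minimal sketch of that counting step, the following example uses Python's `collections.Counter` instead of a wordcloud library. The review snippets and the small stop-word list are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical review snippets; the stop-word list is a small illustrative set.
reviews = [
    "Short walk to the park and a great restaurant.",
    "You can walk downtown in ten minutes.",
    "Lovely park, and the restaurant next door was great.",
]
STOP_WORDS = {"a", "and", "the", "to", "in", "was", "you", "can"}


def word_frequencies(texts, top_n=50):
    """Count lower-cased words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


top_words = dict(word_frequencies(reviews))
```

Running the same count separately on the reviews of each neighborhood would give one frequency table per neighborhood, which is what a per-neighborhood wordcloud visualizes.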

Which price can be charged for an apartment with certain characteristics?

First of all, I created a Linear Regression model to predict the price from all other attributes. This prediction achieved a Mean Absolute Error of 44.28, which is low considering that the rent per night ranges from a minimum of 10 dollars to a maximum of 1,650 dollars. Therefore, I assume that the listed attributes have a significant impact on the price of an accommodation and that the rental price can be estimated from them. It therefore makes sense to use a classification tree that takes these attributes into account.

The second most important attribute has a correlation coefficient of 0.112. It indicates how long the lessor has been registered as a host on Airbnb.

The third most important factor, with a correlation coefficient of a little less than 0.1, is the number of extra people. Other attributes that influence the price are, for example, the number of listings a host has or had, the number of beds, and the number of bathrooms and bedrooms. The least significant attribute in the visualization is whether the guest needs a profile picture (with a coefficient of about 0.01).

When can lessors increase the price per night for their apartment and when should they lower it?

To answer this question, I want to predict the price changes for 2017. I started by checking the change in mean flat prices in 2016.

Mean flat prices

Prices increase from January to July from about 120 to 150 dollars and then decrease to 135 dollars by November. During the last month, prices start to increase slightly again. The correlation between charged prices and the occupation ratio appears rather low, as the first drop of the occupation ratio in April is not reflected in a price drop. The second decrease in the number of rented apartments, however, is reflected in the charged prices, as the price curve also decreases after July. In December, prices increase while the number of free apartments decreases. This irregularity might be due to the fact that many people travel during Christmas time, so their willingness to pay is slightly higher.
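The mean price per month can be derived from the calendar data with a simple group-and-average step. The sketch below assumes that prices are stored as strings such as `"$1,250.00"` and that days without a listed price are empty. Both are assumptions about calendar.csv, and the rows are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical calendar rows: (ISO date, price string). The "$1,250.00"
# style of the price column is an assumption about calendar.csv.
rows = [
    ("2016-01-04", "$100.00"),
    ("2016-01-05", "$140.00"),
    ("2016-07-10", "$150.00"),
    ("2016-07-11", "$1,250.00"),
    ("2016-07-12", ""),  # no price listed for this day
]


def parse_price(text):
    """Turn "$1,250.00" into 1250.0; return None for empty values."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def mean_price_by_month(rows):
    """Return {"YYYY-MM": mean listed price}, ignoring days without a price."""
    by_month = defaultdict(list)
    for date, price_text in rows:
        price = parse_price(price_text)
        if price is not None:
            by_month[date[:7]].append(price)
    return {month: mean(prices) for month, prices in sorted(by_month.items())}


monthly = mean_price_by_month(rows)
```

Days without a price are skipped rather than counted as zero, so they do not pull the monthly mean down.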


The main goal of this blog post is to give some suggestions on how to apply marketing-mix strategies effectively. The focus is on the Product, Price, and Promotion policies.
Regarding the Product policy, it is of interest which characteristics of the accommodations visitors value highly. To obtain a clearer result, the six neighborhoods with the highest number of reviews were analyzed. For each neighborhood, a word cloud displays the 50 most used words within all reviews, with the size of each word depending on how often it occurs in the reviews for that neighborhood. The analysis shows that words like “walk”, “park”, “lake”, “downtown”, “minute”, “shop” and “market” occur especially often. Lessors should include these words in their description to address the right people, who will enjoy their stay in the apartment and the surrounding area. This has the positive effect that a satisfied customer will also use Airbnb for the next booking, which increases the revenue of Airbnb.
As the analysis shows, most of the reviews are very positive. So there might also be a correlation between the satisfaction of a customer and whether they write a review. If this is the case, it can be assumed that a satisfied customer will write a positive review, which benefits the lessor: the number of reviews increases, which in turn increases the price the lessor can charge. Whether this correlation really exists, and if so why, is a question for further investigation.



Tim Löhr

Machine Learning Engineer @Siemens and Computer Science Master Student @University of Erlangen-Nürnberg.