The IMPORTXML function in Google Sheets is useful if you want to import data from structured data types such as HTML and XML.
Meaning, the IMPORTXML function allows you to import data from websites for data scraping purposes.
Table of Contents
The rules for using the IMPORTXML function in Google Sheets are as follows:
- Both arguments of the IMPORTXML function need to be enclosed with quotation marks or can be a cell reference.
- The higher the number of results from the website, the longer it may take to finish the data import.
- The IMPORTXML function returns arrays of information.
- The IMPORTXML function will return a #REF! Error if the expected range of results isn’t clear for any values.
- To fully utilize the use of IMPORTXML function, it requires basic knowledge of HTML or XML.
Let’s take an example.
Rein is tasked to identify all the cities in the United Kingdom and their regions. Her supervisor explicitly instructed her to get this information from the Wikipedia website.
While she can type them in Google Sheets manually, there are 69 cities listed in Wikipedia and it may take her a while to pull the list.
Rein remembered that there’s a function in Google Sheets that does exactly what she needs to do. So, she created the file and put the function to work.
See below her file:
The IMPORTXML function saved her from doing the manual task.
Let’s have another example!
Robin’s supervisor asked him to create a Google Sheet file that monitors the top 15 most popular books published in 2021 on the website www.goodreads.com.
Instead of going on the website to check the ranking daily, Robin used the IMPORTXML function to import the ranking made by the website.
See the file he created below:
The good thing about using the IMPORTXML function is that the information being imported is exactly what’s available on the source website.
Watch out for a more advanced tutorial and examples on how you can use the IMPORTXML function in the coming weeks. Be sure to subscribe to be notified.
Perfect! Let’s begin getting to know more about our IMPORTXML function in Google Sheets.
The Anatomy of the IMPORTXML Function
So the syntax (the way we write) the IMPORTXML function is as follows:
Let’s dissect this thing and understand what each of these terms means:
- = the equal sign is just how we start any function in Google Sheets. It is how Google Sheets understand that we are asking it to either do computation or use a function.
- IMPORTXML() this is our IMPORTXML function. The IMPORTXML function import datas from any web page.
- url is the address (uniform resource locator) to the structured page on the web, from which you’re importing data.
- xpath_query is the XPath query to run on the structured data. This tells the function what kind of information we are trying to import.
A Real Example of Using IMPORTXML Function
Take a look at the UK cities file below to see how the IMPORTXML function is used in Google Sheets.
Note that there are just two arguments in the IMPORTXML function. One answers where to get and the other one answers what to get.
The first argument is the website address where you want the function to search for information. It should be the exact website page and is enclosed with quotation marks.
Rein could have used a cell reference where the website address is located. In which case, she no longer needs to enclose the argument with quotation marks:
Now, let’s proceed to talk about the second argument in the IMPORTXML function above:
This is the xpath_query, which tells the function of what information to pull from the website you provided in the first argument.
You can learn more about the wide range of options to be used for this argument in this article.
In the meantime, let’s continue to discuss the xpath_query used in our example.
Visiting the website URL provided as the first argument, you’ll notice that the information we need is located in columns of a table:
In HTML, rows correspond to TR tags and columns correspond to TD tags. Take a look at the location of the city name in the first column. The city name is always the first hyperlink in the first column of each row.
Translating it to Xpath_query, it should be:
//each row/first column/first hyperlink
Furthermore, it translates to:
The “” means you instructed the function to give you the first instance of something. In this case, the “td” means you’re asking for the first instance of the column, which is where the city is located.
Now, inside the first column of every row, some values don’t necessarily mean the city.
So, we have to instruct the function to pull only the first instance of a hyperlink in every row in the first column. In HTML, the tags for hyperlinks are <a></a>. Hence, the “a”.
The return values should be the city name in the table:
Now, let’s try another example!
On the same website source, we will try to pull the nation or region of each UK city.
The information is no longer located in the first column of each row. Instead, it is in the fifth column.
Furthermore, the nation/region is the only information in this column and is in hyperlink format.
So, to translate that:
//each row/fifth column/hyperlink
The xpath_query should be:
We are instructing the function to pull the data which contains hyperlinks in the fifth column of each row. Please note that there’s no need to put “” after the “a” tag since there’s only 1 hyperlink value inside the fifth column of each row.
So, the result should be the following:
You may make a copy of the spreadsheet using the link I have attached below.
How to Use IMPORTXML Function in Google Sheets
- Click on any cell to make it the active cell. For this guide, I will be selecting A2, where I want to show the resulting array of information.
- Next, type the equal sign ‘=‘ to begin the function and then follow it with the name of the function, which is our ‘IMPORTXML‘ (or ‘IMPORTXML‘, not case sensitive like our other functions).
- Type open parenthesis ‘(‘ or simply hit Tab key to let you use that function.
- Now the exciting part! Let’s give our function its first argument, the url. Type in the quotation mark (“) and follow it with the website address. In this case, I will be using ‘https://en.wikipedia.org/wiki/List_of_cities_in_the_United_Kingdom‘. Don’t forget to close it with another quotation mark (“).
- To let the Google sheet know that we’re done typing our first argument, we should now type in the delimiter or the character that separates each argument on a function. In this case, type comma ‘,’.
- Type in our second argument, which is the xpath_query. Type in quotation mark (“) and follow it with ‘//tr/td/a‘.
- Finally, hit your Enter or Tab key. Cell A2:A70 will now show you the resulting string or the return value of the IMPORTXML function, which are the cities from your website source.
- Now, let’s extract the other information, the Nation/Region, from the website.
- On cell B2, repeat steps 1 to 5.
- Type in our second argument, which is the xpath_query to extract the nation/region from our source website. Type in quotation mark (“) and follow it with ‘//tr/td/a‘. Don’t forget to close it with another quotation mark (“).
- Finally, hit your Enter or Tab key. Cell B2:B70 will now show you the resulting string or the return value of the IMPORTXML function, which are the regions from your website source.
That’s pretty much it. You can now use the IMPORTXML function in Google Sheets together with the other numerous Google Sheets formulas to create even more powerful formulas that can make your life much easier.