{"id":2562,"date":"2020-06-19T22:50:27","date_gmt":"2020-06-19T15:50:27","guid":{"rendered":"https:\/\/www.indowhiz.com\/articles\/?p=2562"},"modified":"2022-10-15T13:58:31","modified_gmt":"2022-10-15T06:58:31","slug":"sitemap-couldnt-be-read","status":"publish","type":"post","link":"https:\/\/www.indowhiz.com\/articles\/en\/sitemap-couldnt-be-read\/","title":{"rendered":"Fixing Issue on Access to Sitemap"},"content":{"rendered":"\n<p>A sitemap is an important tool to aid search engines (e.g., Google, Bing, Yandex) in indexing your web pages. Generally, search engines recognize several common sitemap formats, including XML, RSS, Atom, and TXT <span id=\"a9631802-64f8-4b15-8f4f-8b1f18b7a457\" data-items=\"[&quot;2300740570&quot;]\" class=\"abt-citation\" contenteditable=\"false\">\u200b[1]\u200b<\/span>. Usually, you first need to register your sitemap in a webmaster tool (e.g., Google Search Console) so that the search engine can start indexing your web pages.<\/p>\n\n\n\n<p>However, registering a sitemap is not 100% error-free. Website administrators may run into unexpected problems. For example, the webmaster tools of some search engines may be unable to access or read the sitemap. Many factors can cause this issue, including a problem with the sitemap itself or a 403 error (access denied) that makes it unreadable.<\/p>\n\n\n\n<p><em>Note: a 403 error (access denied) means that the search engine does not have permission to access your sitemap page.<\/em><\/p>\n\n\n\n<!--more-->\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" width=\"798\" height=\"407\" src=\"https:\/\/www.indowhiz.com\/articles\/wp-content\/uploads\/2020\/06\/sitemap-error-403-google-console.jpg\" alt=\"Google Search Console cannot read the sitemap\" class=\"wp-image-2541\"\/><figcaption>Figure 1. Google Search Console cannot read the sitemap<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">1. 
Checking access to the sitemap<\/h2>\n\n\n\n<p>There are several ways to check whether a search engine bot can access and crawl your sitemap. You can use a website that offers a bot-checking service, or use Chrome with its user agent changed. Generally, these services are designed for sitemaps in XML format, but it never hurts to try them on other sitemap formats.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A. Using a sitemap checker site<\/h3>\n\n\n\n<p>Many sites offer services to check access to a sitemap, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/www.xml-sitemaps.com\/http-headers-viewer.html\" target=\"_blank\" rel=\"noreferrer noopener\">XML-Sitemaps.com<\/a> (select <code>Google<\/code> in <code>User Agent<\/code>)<\/li><li>Google <a href=\"https:\/\/www.google.com\/webmasters\/tools\/robots-testing-tool\" target=\"_blank\" rel=\"noreferrer noopener\">Robots.txt Tester<\/a><\/li><li>Google <a href=\"https:\/\/search.google.com\/search-console?action=inspect\" target=\"_blank\" rel=\"noreferrer noopener\">URL Inspection Tool<\/a><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">B. Using the Chrome browser<\/h3>\n\n\n\n<p>Alternatively, we can use the Chrome browser to check access to the sitemap. 
To do so, change the user agent to <code>Googlebot<\/code>, which is Google&#8217;s main crawler<span data-items=\"[&quot;1135713584&quot;]\" id=\"c73af996-a57e-41e7-8e29-6a92d08b6b82\" class=\"abt-citation\"> [2]<\/span>.<\/p>\n\n\n\n<p>First, open the <code>developer tools<\/code> as follows:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>in the Chrome browser, click the menu or <code>\u22ee<\/code> button in the upper-right corner of your browser,<\/li><li>select <code>More tools<\/code> &gt; <code>Developer tools<\/code>.<\/li><\/ol>\n\n\n\n<p>Then you will see a screen similar to Figure 2 in your Chrome browser.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img decoding=\"async\" width=\"537\" height=\"436\" src=\"https:\/\/www.indowhiz.com\/articles\/wp-content\/uploads\/2020\/06\/chrome-developer-tools.png\" alt=\"Developer tools on the Chrome browser\" class=\"wp-image-2549\"\/><figcaption>Figure 2. Developer tools on the Chrome browser<\/figcaption><\/figure>\n\n\n\n<p>Second, change the <code>user agent<\/code> as follows:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>click the <code>\u22ee<\/code> button in the lower-left of the developer tools (in the bottom menu near the <code>Console<\/code> tab),<\/li><li>click <code>Network conditions<\/code>,<\/li><li>in the <code>user agent<\/code> box, uncheck <code>Select automatically<\/code>,<\/li><li>then select <code>Googlebot<\/code> from the dropdown list.<\/li><\/ol>\n\n\n\n<p>After changing the <code>user agent<\/code>, check whether you can access your sitemap, as illustrated in Figure 3.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"799\" height=\"471\" src=\"https:\/\/www.indowhiz.com\/articles\/wp-content\/uploads\/2020\/06\/googlebot-chrome-access-403.jpg\" alt=\"Checking Googlebot access using Chrome\" class=\"wp-image-2553\"\/><figcaption>Figure 3. 
Checking Googlebot access using Chrome<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">C. Waiting for the next crawl<\/h3>\n\n\n\n<p>If you have tried the checks in Subsections 1-A and 1-B above and found no problems accessing the sitemap, wait for the search engine crawlers to read the sitemap over the next few days.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"764\" height=\"520\" src=\"https:\/\/www.indowhiz.com\/articles\/wp-content\/uploads\/2020\/06\/check-sitemap-website.png\" alt=\"Access to the sitemap does not show any problems (response code 200)\" class=\"wp-image-2607\"\/><figcaption>Figure 4. Access to the sitemap does not show any problems (response code 200)<\/figcaption><\/figure>\n\n\n\n<p>For example, resubmitting a sitemap in Google Search Console may show a &#8220;<code>couldn't fetch<\/code>&#8221; error even though the checks in Subsections 1-A and 1-B above show no problem with your sitemap access. In that case, wait for the next &#8220;<code>last read<\/code>&#8221; date, as Google reads the sitemap only once every several days. Generally, the error clears after that.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/www.indowhiz.com\/articles\/wp-content\/uploads\/2022\/06\/wait-for-next-last-read-sitemap-google-search-console.jpg\" alt=\"Wait for the next &quot;last read&quot; date\" class=\"wp-image-5817\" width=\"840\" height=\"600\"\/><figcaption>Figure 5. Wait for the next &#8220;last read&#8221; date<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
Fixing access to the sitemap<\/h2>\n\n\n\n<p>If the search engine cannot access or read your sitemap (e.g., because of a 404 or 403 error), the following methods may be worth trying.<\/p>\n\n\n\n<p class=\"has-ast-global-color-6-background-color has-background\"><strong>Before trying to solve the issue, create a complete backup of your website!<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A. Make sure the sitemap address is correct<\/h3>\n\n\n\n<p>Before trying more technical fixes, try opening your sitemap URL directly in the browser. If you get a 404 error, there are several possibilities, including:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Incorrect sitemap URL<\/strong>: double-check your sitemap URL.<\/li><li><strong>Your site failed to generate a sitemap<\/strong>: if you&#8217;re using a CMS (e.g., WordPress) and a sitemap generator plugin, there might be an issue with the plugin settings.<\/li><\/ul>\n\n\n\n<p><em>Note: if you changed the user agent in Chrome as in Subsection 1-B above, first reset it by checking <code>Select automatically<\/code>.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">B. Use only one sitemap generator<\/h3>\n\n\n\n<p>On WordPress, many people install both an SEO plugin (e.g., Yoast, RankMath) and a separate sitemap generator (e.g., Google XML Sitemaps). Because both may generate sitemaps independently, they may interfere with each other.<\/p>\n\n\n\n<p>Therefore, make sure only one plugin is allowed to generate the sitemap. After that, re-check the access to the sitemap using the sitemap checker or Chrome, as in Section 1.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">C. 
Check <code>robots.txt<\/code> and <code>.htaccess<\/code> files<\/h3>\n\n\n\n<p>Either the <code>robots.txt<\/code> or the <code>.htaccess<\/code> file can block crawlers (e.g., <code>Googlebot<\/code>) from reading the sitemap <span id=\"fcc379e3-dc2c-4b56-9c2c-69d40199a3e6\" data-items=\"[&quot;2727978432&quot;]\" class=\"abt-citation\" contenteditable=\"false\">\u200b[3]\u200b<\/span>. An example of a <code>robots.txt<\/code> that blocks <code>Googlebot<\/code> <span id=\"380bf640-c854-4edb-b6f8-2d6a23f32dd9\" data-items=\"[&quot;3206051884&quot;]\" class=\"abt-citation\" contenteditable=\"false\">\u200b[4]\u200b<\/span>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"markdown\" class=\"language-markdown line-numbers\">User-agent: Googlebot\nDisallow: \/<\/code><\/pre>\n\n\n\n<p>The code above means that <code>Googlebot<\/code> is not allowed to access any part of the site. If you want to allow <code>Googlebot<\/code> to access your site, delete these two consecutive lines.<\/p>\n\n\n\n<p>In addition, check the <code>.htaccess<\/code> file for the following code (or similar):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code lang=\"apacheconf\" class=\"language-apacheconf line-numbers\">RewriteEngine on\nRewriteCond %{HTTP_USER_AGENT} Googlebot [OR]\nRewriteCond %{HTTP_USER_AGENT} msnbot [OR]\nRewriteCond %{HTTP_USER_AGENT} yandexbot\nRewriteRule ^.*$ \"https\\:\\\/\\\/www\\.indowhiz\\.com\" [R=301,L]<\/code><\/pre>\n\n\n\n<p>A rule like this redirects the listed crawlers away from every URL they request, including the sitemap. If it exists, try removing it temporarily. Then, re-check the access to the sitemap using the sitemap checker or Chrome, as in Section 1.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">D. Check CDN settings<\/h3>\n\n\n\n<p>Many well-known providers, including Google Cloud CDN, AWS, Cloudflare, and QUIC.cloud, offer Content Delivery Network (CDN) services. Sometimes, problems arise from the cache, firewall, or other settings on the CDN.<\/p>\n\n\n\n<p>To check whether you have a CDN issue, try disabling the CDN. 
Then, re-check the access to the sitemap using the sitemap checker or Chrome, as in Section 1. There are two possibilities after disabling the CDN:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>The sitemap is accessible<\/strong>. This means that the CDN was blocking access to the sitemap. You may need to review and adjust the cache, firewall, and other CDN settings that may cause the issue. If you have trouble adjusting them, ask the CDN provider for help.<\/li><li><strong>The sitemap is still not accessible<\/strong>. The CDN may or may not be the cause. Because something other than the CDN may also be blocking access, we cannot yet be sure that your CDN settings are fine. In this case, we suggest you keep the CDN disabled until you solve the access issue using other methods.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">E. Check CMS security plugins<\/h3>\n\n\n\n<p>A CMS (e.g., WordPress) is usually used along with a security plugin (e.g., Wordfence, Sucuri, or iThemes). Yet, this is a double-edged sword for website administrators: security plugins can be helpful, but they can also get in the way, for example by blocking legitimate crawlers.<\/p>\n\n\n\n<p>To check whether the CMS security plugin causes the issue, try disabling it. Then, re-check the access to the sitemap using the sitemap checker or Chrome, as in Section 1. There are two possibilities after disabling the CMS security plugin:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>The sitemap is accessible<\/strong>. This means that the CMS security plugin was blocking access to the sitemap. You may need to review and adjust the settings of the security plugin. If you have trouble adjusting them, ask the plugin&#8217;s developer for help.<\/li><li><strong>The sitemap is still not accessible<\/strong>. The CMS security plugin may or may not be the cause. Something other than the CMS security plugin may also be blocking access. 
Therefore, we cannot yet be sure that your CMS security plugin settings are fine. In this case, we suggest you keep the CMS security plugin disabled until you solve the access issue using other methods.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">F. Check WAF settings<\/h3>\n\n\n\n<p>A web control panel (e.g., cPanel, Plesk, or WHM) usually comes with a Web Application Firewall (WAF) (e.g., ModSecurity or Imunify360). In rare cases, a WAF such as ModSecurity blocks bot access (even <code>Googlebot<\/code>) because it considers the bot to be spam or dangerous.<\/p>\n\n\n\n<p>To check whether the WAF settings cause the issue, try disabling the WAF temporarily. Then, re-check the access to the sitemap using the sitemap checker or Chrome, as in Section 1.<\/p>\n\n\n\n<p>If the sitemap is accessible after disabling the WAF, the WAF settings were blocking access to the sitemap. You may need to review and adjust the WAF settings. If you have trouble adjusting them, ask your hosting provider for help.<\/p>\n\n\n\n<p>Many CDNs also offer online protection through their own WAF. Therefore, check whether your CDN includes a WAF. If it does, it is generally safe to disable the WAF on your server, provided that traffic reaches your server only through the CDN.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">G. Ask for help<\/h3>\n\n\n\n<p>If you have tried all of the methods above to no avail, you may need help from your hosting provider to solve the issue. Alternatively, you can ask a professional to solve your problem.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<section aria-label=\"Bibliography\" class=\"wp-block-abt-bibliography abt-bibliography\" role=\"region\"><ol class=\"abt-bibliography__body\" data-maxoffset=\"3\" data-linespacing=\"1\" data-second-field-align=\"flush\"><li id=\"2300740570\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">[1]<\/div><div class=\"csl-right-inline\">Google, \u201cManage your sitemaps: Sitemaps report,\u201d <i>Search Console Help<\/i>. 
<a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/support.google.com\/webmasters\/answer\/7451001?hl=en\">https:\/\/support.google.com\/webmasters\/answer\/7451001?hl=en<\/a> (accessed Jun. 19, 2020).<\/div>\n  <\/div>\n<\/li><li id=\"1135713584\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">[2]<\/div><div class=\"csl-right-inline\">Google, \u201cOverview of Google crawlers (user agents),\u201d <i>Search Console Help<\/i>. <a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/support.google.com\/webmasters\/answer\/1061943?hl=en\">https:\/\/support.google.com\/webmasters\/answer\/1061943?hl=en<\/a> (accessed Jun. 19, 2020).<\/div>\n  <\/div>\n<\/li><li id=\"2727978432\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">[3]<\/div><div class=\"csl-right-inline\">A. Gent, \u201cHow to Check XML Sitemaps are Valid,\u201d <i>DeepCrawl<\/i>, Apr. 10, 2019. <a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/www.deepcrawl.com\/knowledge\/guides\/check-xml-sitemaps-are-valid\/\">https:\/\/www.deepcrawl.com\/knowledge\/guides\/check-xml-sitemaps-are-valid\/<\/a> (accessed Jun. 19, 2020).<\/div>\n  <\/div>\n<\/li><li id=\"3206051884\">  <div class=\"csl-entry\">\n    <div class=\"csl-left-margin\">[4]<\/div><div class=\"csl-right-inline\">Remiz, \u201cBlock Google and bots using htaccess and robots.txt,\u201d <i>HTML Remix<\/i>, May 03, 2011. <a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/www.htmlremix.com\/seo\/block-google-and-bots-using-htaccess-and-robots-txt\">https:\/\/www.htmlremix.com\/seo\/block-google-and-bots-using-htaccess-and-robots-txt<\/a> (accessed Jun. 19, 2020).<\/div>\n  <\/div>\n<\/li><\/ol><\/section>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A sitemap is an important tool to aid search engines (e.g., Google, Bing, Yandex) in indexing your web pages. 
Generally, search engines recognize several common sitemap formats, including XML, RSS, Atom, or TXT \u200b[1]\u200b. Usually, you need to prior register your sitemap in the Webmaster Tool (e.g., Google Search Console), before they can start indexing [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2543,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"no","_lmt_disable":"no","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[504],"tags":[436,437,435,440,439,433,438,389],"class_list":["post-2562","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cyberspace","tag-436","tag-bot","tag-error","tag-googlebot","tag-search-engine","tag-sitemap","tag-webmaster-tool","tag-wordpress"],"modified_by":"Philip F. E. 
Adipraja","_links":{"self":[{"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/posts\/2562","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/comments?post=2562"}],"version-history":[{"count":2,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/posts\/2562\/revisions"}],"predecessor-version":[{"id":6512,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/posts\/2562\/revisions\/6512"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/media\/2543"}],"wp:attachment":[{"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/media?parent=2562"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/categories?post=2562"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.indowhiz.com\/articles\/wp-json\/wp\/v2\/tags?post=2562"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}