Showing posts with label intermediate. Show all posts
Showing posts with label intermediate. Show all posts

Thursday, February 26, 2015

Finding more mobile-friendly search results

Finding more mobile-friendly search results

Webmaster level: all

When it comes to search on mobile devices, users should get the most relevant and timely results, no matter if the information lives on mobile-friendly web pages or apps. As more people use mobile devices to access the internet, our algorithms have to adapt to these usage patterns. In the past, we’ve made updates to ensure a site is configured properly and viewable on modern devices. We’ve made it easier for users to find mobile-friendly web pages and we’ve introduced App Indexing to surface useful content from apps. Today, we’re announcing two important changes to help users discover more mobile-friendly content:

1. More mobile-friendly websites in search results

Starting April 21, we will be expanding our use of mobile-friendliness as a ranking signal. This change will affect mobile searches in all languages worldwide and will have a significant impact in our search results. Consequently, users will find it easier to get relevant, high quality search results that are optimized for their devices.

To get help with making a mobile-friendly site, check out our guide to mobile-friendly sites. If you’re a webmaster, you can get ready for this change by using the following tools to see how Googlebot views your pages:

  • If you want to test a few pages, you can use the Mobile-Friendly Test.
  • If you have a site, you can use your Webmaster Tools account to get a full list of mobile usability issues across your site using the Mobile Usability Report.

2. More relevant app content in search results

Starting today, we will begin to use information from indexed apps as a factor in ranking for signed-in users who have the app installed. As a result, we may now surface content from indexed apps more prominently in search. To find out how to implement App Indexing, which allows us to surface this information in search results, have a look at our step-by-step guide on the developer site.

If you have questions about either mobile-friendly websites or app indexing, we’re always happy to chat in our Webmaster Help Forum.


Wednesday, February 18, 2015

Case Studies: Fixing Hacked Sites

Case Studies: Fixing Hacked Sites

Webmaster Level: All

Every day, thousands of websites get hacked. Hacked sites can harm users by serving malicious software, collecting personal information, or redirecting them to sites they didn't intend to visit. Webmasters want to fix hacked sites quickly, but unfortunately recovering from a hack can be a complicated process.

We're trying to make the process of recovering from a hack easier for webmasters with features like Security Issues, Help for Hacked Sites, and a section of our forum just for hacked sites. Recently we talked to two webmasters with hacked sites to learn more about how they were able to fix their sites. We're sharing their stories with the hope that they might provide ideas to other webmasters who have been victims of hacking. We're also using these stories and other feedback for improving our documentation for hacked sites to make the process easier for everyone going forward.

Case Study #1: Restaurant website with multiple hack-injected scripts

A restaurant website using Wordpress received a message from Google in their Webmaster Tools account, alerting them that their site had been altered by hackers. To protect Google users, the website was labelled as hacked in Google's search results. The webmaster of the site, Sam, looked at the source code and noticed many unfamiliar links on the site with pharmaceuticals terms such as "viagra" and "cialis." She also noticed many pages where the meta description tags (in the HTML) had added content such as "buy valtrex in florida." There were also hidden div tags (also in the HTML) of many pages that linked to many sites. None of these links were added by Sam.

Sam removed all of the hacked content she found and filed a reconsideration request. The request was rejected but in the message she received from Google, she was advised to check for any unfamiliar scripts in the any PHP files (or any other server files), as well as changes to the .htaccess file. These files are likely to have scripts added by the hackers that modify the site. These scripts typically only show the hacked content to search engines, while hiding the content from a normal user. Sam checked out all of the .php files and compared them to the clean copies she had in her backup. She found new content added to her footer.php, index.php, and functions.php. When she replaced those files with the clean backups, she could no longer find any hacked content on her site. When she filed another reconsideration request, she got a response from Google notifying her that her site was free from hacked content!

Even though Sam had cleaned up the hacked content on her site, she knew that she would need to continue to secure her site against future attacks. She followed the steps below to keep her site safe in the future:

  • Keep the CMS (content management system like WordPress, Joomla, Drupal, etc) up to date with the most current version. Make sure plugins are up to date as well.
  • Make sure the account used to access the administrative features of the CMS uses a difficult and unique password.
  • If the CMS supports it, enable 2-step verification for login. (This might also be called two factor authentication or two step authentication.) This is recommended for the account being used for password recovery as well. Most email providers, like Google, Microsoft, Yahoo all support this!
  • Make sure the plugins and themes installed are from a reputable source - pirated plugins or themes can often contain code that makes it even easier for hackers to get in!

Case Study #2: Professional website with lots of hard to find hacked pages

A small business owner named Maria who also manages her own website received a message in her Webmaster Tools that her site was hacked. The message provided an example of a page added by hackers: http://example.com/where-to-buy-cialis-over-the-counter/. She talked to her hosting provider who looked at the source code on the homepage but could not find any pharmaceutical keywords. When the hosting provider visited http://example.com/where-to-buy-cialis-over-the-counter/, it returned an error page. Maria also bought a malware scanning service but the service was not able to find any malicious content on her site.

Maria then went to Webmaster Tools and used the Fetch as Google tool on the example URL Google had provided (http://example.com/where-to-buy-cialis-over-the-counter/) which returned no content. Confused, she filed a reconsideration request and received a rejection message which advised her to do two things:

  1. Verify the non-www version of her site as hackers often try to hide content in folders that may be overlooked by the webmaster.

    While it may seem like http://example.com and http://www.example.com are the same site, Google actually treats these as different sites. http://example.com is referred to as the "root domain" while http://www.example.com is called the subdomain. Maria had http://www.example.com verified but not http://example.com verified which is important because the pages added by hackers were non-www pages like http://example.com/where-to-buy-cialis-over-the-counter/. Once she verified http://example.com she was able to successfully see the hacked content on the provided URL with the Fetch as Google tool in Webmaster Tools.

  2. Check her .htaccess file for new rules.

    Maria talked to her hosting provider who showed her how to access her .htaccess file. She noticed right away that her .htaccess file had some strange content that she had not added:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (google|yahoo|msn|aol|bing) [OR]
    RewriteCond %{HTTP_REFERER} (google|yahoo|msn|aol|bing)
    RewriteRule ^([^/]*)/$ /main.php?p=$1 [L]
    </IfModule>

    The mod_rewrite rule you see above was inserted by the hacker and redirects anyone coming from certain search engines, as well as search engine crawlers, to main.php, which generates all of the hacked content. It's also possible that these rules can redirect users accessing the site on a mobile device. On the same day, she also saw that a recent malware scan found suspicious content on the main.php file. One top of that, she also noticed an unknown user in the ftp users area of her website development software.

She removed the main.php file, the .htaccess file, and removed the unknown user from her FTP users area and her site was no longer hacked!

Steps to prevent getting hacked in the future

  • Avoid using FTP when transferring files to your servers. FTP does not encrypt any traffic, including passwords. Instead, use SFTP, which will encrypt everything, including your password, as a protection against eavesdroppers examining network traffic.
  • Check the permissions on sensitive files like .htaccess. Your hosting provider may be able to assist you if you need help. The .htaccess file can be used to improve and protect your site, but it can also be used for malicious hacks if they are able to gain access to it.
  • Be vigilant and look for new and unfamiliar users in your administrative panel and any other place where there may be users that can modify your site.

We hope your site never gets hacked, but if it does, we have many resources for hacked webmasters on our Help for Hacked Sites page. If you need more help or would like to share your own tips, you can post in our Webmaster Help Forum. If you do post to the forum or submit a reconsideration request for your site, please include #NoHacked.

Tuesday, December 9, 2014

The four steps to appiness

Webmaster Level: intermediate to advanced

App deep links are the new kid on the block in organic search, and they’re picking up speed faster than you can say “schema.org ViewAction”! For signed-in users, 15% of Google searches on Android now return deep links to apps through App Indexing. And over just the past quarter, we've seen the number of clicks on app deep links jump by 10x.

We’ve gotten a lot of feedback from developers and seen a lot of implementations gone right and others that were good learning experiences since we opened up App Indexing back in June. We’d like to share with you four key steps to monitor app performance and drive user engagement:

1. Give your app developer access to Webmaster Tools

App indexing is a team effort between you (as a webmaster) and your app development team. We show information in Webmaster Tools that is key for your app developers to do their job well. Here’s what’s available right now:

  • Errors in indexed pages within apps
  • Weekly clicks and impressions from app deep link via Google search
  • Stats on your sitemap (if that’s how you implemented the app deep links)
...and we plan to add a lot more in the coming months!

We’ve noticed that very few developers have access to Webmaster Tools. So if you want your app development team to get all of the information they need to fix app-related issues, it’s essential for them to have access to Webmaster Tools.

Any verified site owner can add a new user. Pick restricted or full permissions, depending on the level of access you’d like to give:

2. Understand how your app is doing in search results

How are users engaging with your app from search results? We’ve introduced two new ways for you to track performance for your app deep links:

  • We now send a weekly clicks and impressions update to the Message center in your Webmaster Tools account.
  • You can now track how much traffic app deep links drive to your app using referrer information - specifically, the referrer extra in the ACTION_VIEW intent. We're working to integrate this information with Google Analytics for even easier access. Learn how to track referrer information on our Developer site.

3. Make sure key app resources can be crawled

Blocked resources are one of the top reasons for the “content mismatch” errors you see in Webmaster Tools’ Crawl Errors report. We need access to all the resources necessary to render your app page. This allows us to assess whether your associated web page has the same content as your app page.

To help you find and fix these issues, we now show you the specific resources we can’t access that are critical for rendering your app page. If you see a content mismatch error for your app, look out for the list of blocked resources in “Step 5” of the details dialog:

4. Watch out for Android App errors

To help you identify errors when indexing your app, we’ll send you messages for all app errors we detect, and will also display most of them in the “Android apps” tab of the Crawl errors report.

In addition to the currently available “Content mismatch” and “Intent URI not supported” error alerts, we’re introducing three new error types:

  • APK not found: we can’t find the package corresponding to the app.
  • No first-click free: the link to your app does not lead directly to the content, but requires login to access.
  • Back button violation: after following the link to your app, the back button did not return to search results.

In our experience, the majority of errors are usually caused by a general setting in your app (e.g. a blocked resource, or a region picker that pops up when the user tries to open the app from search). Taking care of that generally resolves it for all involved URIs.

Good luck in the pursuit of appiness! As always, if you have questions, feel free to drop by our Webmaster help forum.

Wednesday, December 3, 2014

Are you a robot? Introducing “No CAPTCHA reCAPTCHA”

reCAPTCHA protects the websites you love from spam and abuse. So, when you go online—say, for some last-minute holiday shopping—you won't be competing with robots and abusive scripts to access sites. For years, we’ve prompted users to confirm they aren’t robots by asking them to read distorted text and type it into a box, like this:
But, we figured it would be easier to just directly ask our users whether or not they are robots—so, we did! We’ve begun rolling out a new API that radically simplifies the reCAPTCHA experience. We’re calling it the “No CAPTCHA reCAPTCHA” and this is how it looks:
On websites using this new API, a significant number of users will be able to securely and easily verify they’re human without actually having to solve a CAPTCHA. Instead, with just a single click, they’ll confirm they are not a robot.
A brief history of CAPTCHAs 

While the new reCAPTCHA API may sound simple, there is a high degree of sophistication behind that modest checkbox. CAPTCHAs have long relied on the inability of robots to solve distorted text. However, our research recently showed that today’s Artificial Intelligence technology can solve even the most difficult variant of distorted text at 99.8% accuracy. Thus distorted text, on its own, is no longer a dependable test.

To counter this, last year we developed an Advanced Risk Analysis backend for reCAPTCHA that actively considers a user’s entire engagement with the CAPTCHA—before, during, and after—to determine whether that user is a human. This enables us to rely less on typing distorted text and, in turn, offer a better experience for users.  We talked about this in our Valentine’s Day post earlier this year.

The new API is the next step in this steady evolution. Now, humans can just check the box and in most cases, they’re through the challenge.

Are you sure you’re not a robot?

However, CAPTCHAs aren't going away just yet. In cases when the risk analysis engine can't confidently predict whether a user is a human or an abusive agent, it will prompt a CAPTCHA to elicit more cues, increasing the number of security checkpoints to confirm the user is valid.
Making reCAPTCHAs mobile-friendly

This new API also lets us experiment with new types of challenges that are easier for us humans to use, particularly on mobile devices. In the example below, you can see a CAPTCHA based on a classic Computer Vision problem of image labeling. In this version of the CAPTCHA challenge, you’re asked to select all of the images that correspond with the clue. It's much easier to tap photos of cats or turkeys than to tediously type a line of distorted text on your phone.
Adopting the new API on your site

As more websites adopt the new API, more people will see "No CAPTCHA reCAPTCHAs".  Early adopters, like Snapchat, WordPress, Humble Bundle, and several others are already seeing great results with this new API. For example, in the last week, more than 60% of WordPress’ traffic and more than 80% of Humble Bundle’s traffic on reCAPTCHA encountered the No CAPTCHA experience—users got to these sites faster. To adopt the new reCAPTCHA for your website, visit our site to learn more.

Humans, we'll continue our work to keep the Internet safe and easy to use. Abusive bots and scripts, it’ll only get worse—sorry we’re (still) not sorry.


Wednesday, October 29, 2014

Tracking mobile usability in Webmaster Tools

Webmaster Level: intermediate

Mobile is growing at a fantastic pace - in usage, not just in screen size. To keep you informed of issues mobile users might be seeing across your website, we've added the Mobile Usability feature to Webmaster Tools.

The new feature shows mobile usability issues we’ve identified across your website, complete with graphs over time so that you see the progress that you've made.

A mobile-friendly site is one that you can easily read & use on a smartphone, by only having to scroll up or down. Swiping left/right to search for content, zooming to read text and use UI elements, or not being able to see the content at all make a site harder to use for users on mobile phones. To help, the Mobile Usability reports show the following issues: Flash content, missing viewport (a critical meta-tag for mobile pages), tiny fonts, fixed-width viewports, content not sized to viewport, and clickable links/buttons too close to each other.

We strongly recommend you take a look at these issues in Webmaster Tools, and think about how they might be resolved; sometimes it's just a matter of tweaking your site's template! More information on how to make a great mobile-friendly website can be found in our Web Fundamentals website (with more information to come soon).

If you have any questions, feel free to join us in our webmaster help forums (on your phone too)!

Thursday, October 16, 2014

Best practices for XML sitemaps & RSS/Atom feeds

Best practices for XML sitemaps & RSS/Atom feeds

Webmaster level: intermediate-advanced

Submitting sitemaps can be an important part of optimizing websites. Sitemaps enable search engines to discover all pages on a site and to download them quickly when they change. This blog post explains which fields in sitemaps are important, when to use XML sitemaps and RSS/Atom feeds, and how to optimize them for Google.

Sitemaps and feeds

Sitemaps can be in XML sitemap, RSS, or Atom formats. The important difference between these formats is that XML sitemaps describe the whole set of URLs within a site, while RSS/Atom feeds describe recent changes. This has important implications:

  • XML sitemaps are usually large; RSS/Atom feeds are small, containing only the most recent updates to your site.
  • XML sitemaps are downloaded less frequently than RSS/Atom feeds.

For optimal crawling, we recommend using both XML sitemaps and RSS/Atom feeds. XML sitemaps will give Google information about all of the pages on your site. RSS/Atom feeds will provide all updates on your site, helping Google to keep your content fresher in its index. Note that submitting sitemaps or feeds does not guarantee the indexing of those URLs.

Example of an XML sitemap:

<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <url>
   <loc>http://example.com/mypage</loc>
   <lastmod>2011-06-27T19:34:00+01:00</lastmod>
   <!-- optional additional tags -->
 </url>
 <url>
   ...
 </url>
</urlset>

Example of an RSS feed:

<?xml version="1.0" encoding="utf-8"?>
<rss>
 <channel>
   <!-- other tags -->
   <item>
     <!-- other tags -->
     <link>http://example.com/mypage</link>
     <pubDate>Mon, 27 Jun 2011 19:34:00 +0100</pubDate>
   </item>
   <item>
     ...
   </item>
 </channel>
</rss>

Example of an Atom feed:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 <!-- other tags -->
 <entry>
   <link href="http://example.com/mypage" />
   <updated>2011-06-27T19:34:00+01:00</updated>
   <!-- other tags -->
 </entry>
 <entry>
   ...
 </entry>
</feed>

“other tags” refer to both optional and required tags by their respective standards. We recommend that you specify the required tags for Atom/RSS as they will help you to appear on other properties that might use these feeds, in addition to Google Search.

Best practices

Important fields

XML sitemaps and RSS/Atom feeds, in their core, are lists of URLs with metadata attached to them. The two most important pieces of information for Google are the URL itself and its last modification time:

URLs

URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:

  • Only include URLs that can be fetched by Googlebot. A common mistake is including URLs disallowed by robots.txt — which cannot be fetched by Googlebot, or including URLs of pages that don't exist.
  • Only include canonical URLs. A common mistake is to include URLs of duplicate pages. This increases the load on your server without improving indexing.
Last modification time

Specify a last modification time for each URL in an XML sitemap and RSS/Atom feed. The last modification time should be the last time the content of the page changed meaningfully. If a change is meant to be visible in the search results, then the last modification time should be the time of this change.

  • XML sitemap uses  <lastmod>
  • RSS uses <pubDate>
  • Atom uses <updated>

Be sure to set or update last modification time correctly:

  • Specify the time in the correct format: W3C Datetime for XML sitemaps, RFC3339 for Atom and RFC822 for RSS.
  • Only update modification time when the content changed meaningfully.
  • Don’t set the last modification time to the current time whenever the sitemap or feed is served.

XML sitemaps

XML sitemaps should contain URLs of all pages on your site. They are often large and update infrequently. Follow these guidelines:

  • For a single XML sitemap: update it at least once a day (if your site changes regularly) and ping Google after you update it.
  • For a set of XML sitemaps: maximize the number of URLs in each XML sitemap. The limit is 50,000 URLs or a maximum size of 10MB uncompressed, whichever is reached first. Ping Google for each updated XML sitemap (or once for the sitemap index, if that's used) every time it is updated. A common mistake is to put only a handful of URLs into each XML sitemap file, which usually makes it harder for Google to download all of these XML sitemaps in a reasonable time.

RSS/Atom

RSS/Atom feeds should convey recent updates of your site. They are usually small and updated frequently. For these feeds, we recommend:

  • When a new page is added or an existing page meaningfully changed, add the URL and the modification time to the feed.
  • In order for Google to not miss updates, the RSS/Atom feed should have all updates in it since at least the last time Google downloaded it. The best way to achieve this is by using PubSubHubbub. The hub will propagate the content of your feed to all interested parties (RSS readers, search engines, etc.) in the fastest and most efficient way possible.

Generating both XML sitemaps and Atom/RSS feeds is a great way to optimize crawling of a site for Google and other search engines. The key information in these files is the canonical URL and the time of the last modification of pages within the website. Setting these properly, and notifying Google and other search engines through sitemaps pings and PubSubHubbub, will allow your website to be crawled optimally, and represented accordingly in search results.

If you have any questions, feel free to post them here, or to join other webmasters in the webmaster help forum section on sitemaps.

Monday, August 25, 2014

#NoHacked: a global campaign to spread hacking awareness

Webmaster level: All

This June, we introduced a weeklong social campaign called #NoHacked. The goals for #NoHacked are to bring awareness to hacking attacks and offer tips on how to keep your sites safe from hackers.

We held the campaign in 11 languages on multiple channels including Google+, Twitter and Weibo. About 1 million people viewed our tips and hundreds of users used the hashtag #NoHacked to spread awareness and to share their own tips. Check them out below!

Posts we shared during the campaign:


Some of the many tips shared by users across the globe:
  • Pablo Silvio Esquivel from Brazil recommends users not to use pirated software (source)
  • Rens Blom from the Netherlands suggests using different passwords for your accounts, changing them regularly, and using an extra layer of security such as two-step authentication (source)
  • Дмитрий Комягин from Russia says to regularly monitor traffic sources, search queries and landing pages, and to look out for spikes in traffic (source)
  • 工務店コンサルタント from Japan advises everyone to choose a good hosting company that's knowledgeable in hacking issues and to set email forwarding in Webmaster Tools (source)
  • Kamil Guzdek from Poland advocates changing the default table prefix in wp-config to a custom one when installing a new WordPress to lower the risk of the database from being hacked (source)

Hacking is still a surprisingly common issue around the world so we highly encourage all webmasters to follow these useful tips. Feel free to continue using the hashtag #NoHacked to share your own tips or experiences around hacking prevention and awareness. Thanks for supporting the #NoHacked campaign!

And in the unfortunate event that your site gets hacked, we’ll help you toward a speedy and thorough recovery:

Wednesday, August 6, 2014

HTTPS as a ranking signal

Webmaster level: all

Security is a top priority for Google. We invest a lot in making sure that our services use industry-leading security, like strong HTTPS encryption by default. That means that people using Search, Gmail and Google Drive, for example, automatically have a secure connection to Google.

Beyond our own stuff, we’re also working to make the Internet safer more broadly. A big part of that is making sure that websites people access from Google are secure. For instance, we have created resources to help webmasters prevent and fix security breaches on their sites.

We want to go even further. At Google I/O a few months ago, we called for “HTTPS everywhere” on the web.

We’ve also seen more and more webmasters adopting HTTPS (also known as HTTP over TLS, or Transport Layer Security), on their website, which is encouraging.

For these reasons, over the past few months we’ve been running tests taking into account whether sites use secure, encrypted connections as a signal in our search ranking algorithms. We've seen positive results, so we're starting to use HTTPS as a ranking signal. For now it's only a very lightweight signal — affecting fewer than 1% of global queries, and carrying less weight than other signals such as high-quality content — while we give webmasters time to switch to HTTPS. But over time, we may decide to strengthen it, because we’d like to encourage all website owners to switch from HTTP to HTTPS to keep everyone safe on the web.


Lock


In the coming weeks, we’ll publish detailed best practices (it's in our help center now) to make TLS adoption easier, and to avoid common mistakes. Here are some basic tips to get started:

  • Decide the kind of certificate you need: single, multi-domain, or wildcard certificate
  • Use 2048-bit key certificates
  • Use relative URLs for resources that reside on the same secure domain
  • Use protocol relative URLs for all other domains
  • Check out our Site move article for more guidelines on how to change your website’s address
  • Don’t block your HTTPS site from crawling using robots.txt
  • Allow indexing of your pages by search engines where possible. Avoid the noindex robots meta tag.

If your website is already serving on HTTPS, you can test its security level and configuration with the Qualys Lab tool. If you are concerned about TLS and your site’s performance, have a look at Is TLS fast yet?. And of course, if you have any questions or concerns, please feel free to post in our Webmaster Help Forums.

We hope to see more websites using HTTPS in the future. Let’s all make the web more secure!

Monday, August 4, 2014

Introducing the Google News Publisher Center

Introducing the Google News Publisher Center

(Cross-posted on the Google News Blog)

Webmaster level: All

UPDATE: Great News -- The Publisher Center is now available in all countries where Google News has an edition.

If you're a news publisher, your website has probably evolved and changed over time -- just like your stories. But in the past, when you made changes to the structure of your site, we might not have discovered your new content. That meant a lost opportunity for your readers, and for you. Unless you regularly checked Webmaster Tools, you might not even have realized that your new content wasn’t showing up in Google News. To prevent this from happening, we are letting you make changes to our record of your news site using the just launched Google News Publisher Center.

With the Publisher Center, your potential readers can be more informed about the articles they’re clicking on and you benefit from better discovery and classification of your news content. After verifying ownership of your site using Google Webmaster Tools, you can use the Publisher Center to directly make the following changes:

  • Update your news site details, including changing your site name and labeling your publication with any relevant source labels (e.g., “Blog”, “Satire” or “Opinion”)
  • Update your section URLs when you change your site structure (e.g., when you add a new section such as http://example.com/2014commonwealthgames or http://example.com/elections2014)
  • Label your sections with a specific topic (e.g., “Technology” or “Politics”)

Whenever you make changes to your site, we’d recommend also checking our record of it in the Publisher Center and updating it if necessary.

Try it out, or learn more about how to get started.

At the moment the tool is only available to publishers in the U.S. but we plan to introduce it in other countries soon and add more features.  In the meantime, we’d love to hear from you about what works well and what doesn’t. Ultimately, our goal is to make this a platform where news publishers and Google News can work together to provide readers with the best, most diverse news on the web.

Wednesday, July 16, 2014

Testing robots.txt files made easier

Webmaster level: intermediate-advanced

To crawl, or not to crawl, that is the robots.txt question.

Making and maintaining correct robots.txt files can sometimes be difficult. While most sites have it easy (tip: they often don't even need a robots.txt file!), finding the directives within a large robots.txt file that are or were blocking individual URLs can be quite tricky. To make that easier, we're now announcing an updated robots.txt testing tool in Webmaster Tools.

You can find the updated testing tool in Webmaster Tools within the Crawl section:

Here you'll see the current robots.txt file, and can test new URLs to see whether they're disallowed for crawling. To guide your way through complicated directives, it will highlight the specific one that led to the final decision. You can make changes in the file and test those too, you'll just need to upload the new version of the file to your server afterwards to make the changes take effect. Our developers site has more about robots.txt directives and how the files are processed.

Additionally, you'll be able to review older versions of your robots.txt file, and see when access issues block us from crawling. For example, if Googlebot sees a 500 server error for the robots.txt file, we'll generally pause further crawling of the website.

Since there may be some errors or warnings shown for your existing sites, we recommend double-checking their robots.txt files. You can also combine it with other parts of Webmaster Tools: for example, you might use the updated Fetch as Google tool to render important pages on your website. If any blocked URLs are reported, you can use this robots.txt tester to find the directive that's blocking them, and, of course, then improve that. A common problem we've seen comes from old robots.txt files that block CSS, JavaScript, or mobile content — fixing that is often trivial once you've seen it.

We hope this updated tool makes it easier for you to test & maintain the robots.txt file. Should you have any questions, or need help with crafting a good set of directives, feel free to drop by our webmaster's help forum!

Tuesday, May 27, 2014

Rendering pages with Fetch as Google

Webmaster level: all

The Fetch as Google feature in Webmaster Tools provides webmasters with the results of Googlebot attempting to fetch their pages. The server headers and HTML shown are useful to diagnose technical problems and hacking side-effects, but sometimes make double-checking the response hard: Help! What do all of these codes mean? Is this really the same page as I see it in my browser? Where shall we have lunch? We can't help with that last one, but for the rest, we've recently expanded this tool to also show how Googlebot would be able to render the page.

Viewing the rendered page

In order to render the page, Googlebot will try to find all the external files involved, and fetch them as well. Those files frequently include images, CSS and JavaScript files, as well as other files that might be indirectly embedded through the CSS or JavaScript. These are then used to render a preview image that shows Googlebot's view of the page.

You can find the Fetch as Google feature in the Crawl section of Google Webmaster Tools. After submitting a URL with "Fetch and render," wait for it to be processed (this might take a moment for some pages). Once it's ready, just click on the response row to see the results.

Fetch as Google

Handling resources blocked by robots.txt

Googlebot follows the robots.txt directives for all files that it fetches. If you are disallowing crawling of some of these files (or if they are embedded from a third-party server that's disallowing Googlebot's crawling of them), we won't be able to show them to you in the rendered view. Similarly, if the server fails to respond or returns errors, then we won't be able to use those either (you can find similar issues in the Crawl Errors section of Webmaster Tools). If we run across either of these issues, we'll show them below the preview image.

We recommend making sure Googlebot can access any embedded resource that meaningfully contributes to your site's visible content, or to its layout. That will make Fetch as Google easier for you to use, and will make it possible for Googlebot to find and index that content as well. Some types of content – such as social media buttons, fonts or website-analytics scripts – tend not to meaningfully contribute to the visible content or layout, and can be left disallowed from crawling. For more information, please see our previous blog post on how Google is working to understand the web better.

We hope this update makes it easier for you to diagnose these kinds of issues, and to discover content that's accidentally blocked from crawling. If you have any comments or questions, let us know here or drop by in the webmaster help forum.

Monday, May 19, 2014

Making your site more mobile-friendly with PageSpeed Insights

Webmaster level: all


To help developers and webmasters make their pages mobile-friendly, we recently updated PageSpeed Insights with additional recommendations on mobile usability.




Poor usability can diminish the benefits of a fast page load. We know the average mobile page takes more than 7 seconds to load, and by using the PageSpeed Insights tool and following its speed recommendations, you can make your page load much faster. But suppose your fast mobile site loads in just 2 seconds instead of 7 seconds. If mobile users still have to spend another 5 seconds once the page loads to pinch-zoom and scroll the screen before they can start reading the text and interacting with the page, then that site isn’t really fast to use after all. PageSpeed Insights’ new User Experience rules can help you find and fix these usability issues.

These new recommendations currently cover the following areas:
  • Configure the viewport: Without a meta-viewport tag, modern mobile browsers will assume your page is not mobile-friendly, and will fall back to a desktop viewport and possibly apply font-boosting, interfering with your intended page layout. Configuring the viewport to width=device-width should be your first step in mobilizing your site.

  • Size content to the viewport: Users expect mobile sites to scroll vertically, not horizontally. Once you’ve configured your viewport, make sure your page content fits the width of that viewport, keeping in mind that not all mobile devices are the same width.

  • Use legible font sizes: If users have to zoom in just to be able read your article text on their smartphone screen, then your site isn’t mobile-friendly. PageSpeed Insights checks that your site’s text is large enough for most users to read comfortably.
  • Size tap targets appropriately: Nothing’s more frustrating than trying to tap a button or link on a phone or tablet touchscreen, and accidentally hitting the wrong one because your finger pad is much bigger than a desktop mouse cursor. Make sure that your mobile site’s touchscreen tap targets are large enough to press easily.
  • Avoid plugins: Most smartphones don’t support Flash or other browser plugins, so make sure your mobile site doesn't rely on plugins.
These rules are described in more detail in our help pages. When you’re ready, you can test your pages and the improvements you make using the PageSpeed Insights tool. We’ve also updated PageSpeed Insights to use a mobile friendly design, and we’ve translated our documents into additional languages.

As always, if you have any questions or feedback, please post in our discussion group.

Wednesday, April 30, 2014

Webmaster Guidelines for sneaky redirects updated

Webmaster Guidelines for sneaky redirects updated

Webmaster Level: All

Redirects are often used by webmasters to help forward visitors from one page to another. They are a normal part of how the web operates, and are very valuable when well used. However, some redirects are designed to manipulate or deceive search engines or to display different content to human users than to search engines. Our quality guidelines strictly forbid these kinds of redirects.

For example, desktop users might receive a normal page, while hackers might redirect all mobile users to a completely different spam domain. To help webmasters better recognize problematic redirects, we have updated our quality guidelines for sneaky redirects with examples that illustrate redirect-related violations.

We have also updated the hacked content guidelines to include redirects on compromised websites. If you believe your site has been compromised, follow these instructions to identify the issues on your site and fix them.

As with any violation of our quality guidelines, we may take manual action, including removal from our index, in order to maintain the quality of the search results. If you have any questions about our guidelines, feel free to ask in our Webmaster Help Forum.


Tuesday, April 22, 2014

Introducing our global Google+ page for webmasters

Webmaster Level: All

We’ve recently launched our global Google Webmasters Google+ page. Have you checked it out yet? Our page covers a plethora of topics:
Follow us at google.com/+GoogleWebmasters and let us know in the comments what else you’d like to see on our page! If you speak Italian, Japanese, Russian or Spanish, be sure to also join one of our webmaster communities to stay up-to-date on language and region-specific news.

Monday, March 31, 2014

More Precise Index Status Data for Your Site Variations

Webmaster Level: Intermediate

The Google Webmaster Tools Index Status feature reports how many pages on your site are indexed by Google. In the past, we didn’t show index status data for HTTPS websites independently, but rather we included everything in the HTTP site’s report. In the last months, we’ve heard from you that you’d like to use Webmaster Tools to track your indexed URLs for sections of your website, including the parts that use HTTPS.

We’ve seen that nearly 10% of all URLs already use a secure connection to transfer data via HTTPS, and we hope to see more webmasters move their websites from HTTP to HTTPS in the future. We’re happy to announce a refinement in the way your site’s index status data is displayed in Webmaster Tools: the Index Status feature now tracks your site’s indexed URLs for each protocol (HTTP and HTTPS) as well as for verified subdirectories.

This makes it easy for you to monitor different sections of your site. For example, the following URLs each show their own data in Webmaster Tools Index Status report, provided they are verified separately:

The refined data will be visible for webmasters whose site's URLs are on HTTPS or who have subdirectories verified, such as https://example.com/folder/. Data for subdirectories will be included in the higher-level verified sites on the same hostname and protocol.

If you have a website on HTTPS or if some of your content is indexed under different subdomains, you will see a change in the corresponding Index Status reports. The screenshots below illustrate the changes that you may see on your HTTP and HTTPS sites’ Index Status graphs for instance:

HTTP site’s Index Status showing drop

HTTPS site’s Index Status showing increase

An “Update” annotation has been added to the Index Status graph for March 9th, showing when we started collecting this data. This change does not affect the way we index your URLs, nor does it have an impact on the overall number of URLs indexed on your domain. It is a change that only affects the reporting of data in Webmaster Tools user interface.

In order to see your data correctly, you will need to verify all existing variants of your site (www., non-www., HTTPS, subdirectories, subdomains) in Google Webmaster Tools. We recommend that your preferred domains and canonical URLs are configured accordingly.

Note that if you wish to submit a Sitemap, you will need to do so for the preferred variant of your website, using the corresponding URLs. Robots.txt files are also read separately for each protocol and hostname.

We hope that you’ll find this update useful, and that it’ll help you monitor, identify and fix indexing problems with your website. You can find additional details in our Index Status Help Center article. As usual, if you have any questions, don’t hesitate to ask in our webmaster Help Forum.


Wednesday, March 12, 2014

Musical artists: your official tour dates in the Knowledge Graph

Webmaster level: all

tour dates online

When music lovers search for their favorite band on Google, we often show them a Knowledge Graph panel with lots of information about the band, including the band’s upcoming concert schedule. It’s important to fans and artists alike that this schedule be accurate and complete. That’s why we’re trying a new approach to concert listings. In our new approach, all concert information for an artist comes directly from that artist’s official website when they add structured data markup.

If you’re the webmaster for a musical artist’s official website, you have several choices for how to participate:

  1. You can implement schema.org markup on your site. That’s easier than ever, since we’re supporting the new JSON-LD format (alongside RDFa and microdata) for this feature.
  2. Even easier, you can install an events widget that has structured data markup built in, such as Bandsintown, BandPage, ReverbNation, Songkick, or GigPress.
  3. You can label the site’s events with your mouse using Google’s point-and-click webmaster tool: Data Highlighter.

All these options are explained in detail in our Help Center. If you have any questions, feel free to ask in our Webmaster Help forums. So don’t you worry `bout a schema.org/Thing ... just mark up your site’s events and let the good schema.org/Times roll!


Thursday, February 27, 2014

3 tips to find hacking on your site, and ways to prevent and fix it



Google shows this message in search results for sites that we believe may have been compromised.You might not think your site is a target for hackers, but it's surprisingly common. Hackers target large numbers of sites all over the web in order to exploit the sites' users or reputation.

One common way hackers take advantage of vulnerable sites is by adding spammy pages. These spammy pages are then used for various purposes, such as redirecting users to undesired or harmful destinations. For example, we’ve recently seen an increase in hacked sites redirecting users to fake online shopping sites.

Once you recognize that your website may have been hacked, it’s important to diagnose and fix the problem as soon as possible. We want webmasters to keep their sites secure in order to protect users from spammy or harmful content.

3 tips to help you find hacked content on your site

  1. Check your site for suspicious URLs or directories
    Keep an eye out for any suspicious activity on your site by performing a “site:” search of your site in Google, such as [site:example.com]. Are there any suspicious URLs or directories that you do not recognize?

    You can also set up a Google Alert for your site. For example, if you set a Google Alert for [site:example.com (viagra|cialis|casino|payday loans)], you’ll receive an email when these keywords are detected on your site.

  2. Look for unnatural queries on the Search Queries page in Webmaster Tools
    The Search Queries page shows Google Web Search queries that have returned URLs from your site. Look for unexpected queries as it can be an indication of hacked content on your site.

    Don’t be quick to dismiss queries in different languages. This may be the result of spammy pages in other languages placed on your website.


    Example of an English site hacked with Japanese content.
  3. Enable email forwarding in Webmaster Tools
    Google will send you a message if we detect that your site may be compromised. Messages appear in Webmaster Tools’ Message Center but it's a best practice to also forward these messages to your email. Keep in mind that Google won’t be able to detect all kinds of hacked content, but we hope our notifications will help you catch things you may have missed.

Tips to fix and prevent hacking

  • Stay informed
    The Security Issues section in Webmaster Tools will show you hacked pages that we detected on your site. We also provide detailed information to help you fix your hacked site. Make sure to read through this documentation so you can quickly and effectively fix your site.

  • Protect your site from potential attacks
    It's better to prevent sites from being hacked than to clean up hacked content. Hackers will often take advantage of security vulnerabilities on commonly used website management software. Here are some tips to keep your site safe from hackers:

    • Always keep the software that runs your website up-to-date.
    • If your website management software tools offer security announcements, sign up to get the latest updates.
    • If the software for your website is managed by your hosting provider, try to choose a provider that you can trust to maintain the security of your site.

We hope this post makes it easier for you to identify, fix, and prevent hacked spam on your site. If you have any questions, feel free to post in the comments, or drop by the Google Webmaster Help Forum.

If you find suspicious sites in Google search results, please report them using the Spam Report tool.

Monday, January 27, 2014

Affiliate programs and added value

Affiliate programs and added value

Webmaster level: All

Our quality guidelines warn against running a site with thin or scraped content without adding substantial added value to the user. Recently, we’ve seen this behavior on many video sites, particularly in the adult industry, but also elsewhere. These sites display content provided by an affiliate program—the same content that is available across hundreds or even thousands of other sites.

If your site syndicates content that’s available elsewhere, a good question to ask is: “Does this site provide significant added benefits that would make a user want to visit this site in search results instead of the original source of the content?” If the answer is “No,” the site may frustrate searchers and violate our quality guidelines. As with any violation of our quality guidelines, we may take action, including removal from our index, in order to maintain the quality of our users’ search results. If you have any questions about our guidelines, you can ask them in our Webmaster Help Forum.

Wednesday, January 22, 2014

Changes in crawl error reporting for redirects

Webmaster level: intermediate-advanced

In the past, we have seen occasional confusion by webmasters regarding how crawl errors on redirecting pages were shown in Webmaster Tools. It's time to make this a bit clearer and easier to diagnose! While it used to be that we would report the error on the original - redirecting - URL, we'll now show the error on the final URL - the one that actually returns the error code.


Let's look at an example:



URL A redirects to URL B, which in turn returns an error. The type of redirect, and type of error is unimportant here.

In the past, we would have reported the error observed at the end under URL A. Now, we'll instead report it as URL B. This makes it much easier to diagnose the crawl errors as they're shown in Webmaster Tools. Using tools like cURL or your favorite online server header checker, you can now easily confirm that this error is actually taking place on URL B.

This change may also be visible in the total error counts for some websites. For example, if your site is moving to a new domain, you'll only see these errors for the new domain (assuming the old domain redirects correctly), which might result in noticeable changes in the total error counts for those sites.

Note that this change only affects how these crawl errors are shown in Webmaster Tools. Also, remember that having crawl errors for URLs that should be returning errors (e.g. they don't exist) does not negatively affect the rest of the website's indexing or ranking (also as discussed on Google+).

We hope this change makes it a bit easier to track down crawl errors, and to clean up the accidental ones that you weren't aware of! If you have any questions, feel free to post here, or drop by in the Google Webmaster Help Forum.


Tuesday, January 7, 2014

More detailed search queries in Webmaster Tools

Webmaster level: intermediate

To help jump-start your year and make metrics for your site more actionable, we've updated one of the most popular features in Webmaster Tools: data in the search queries feature will no longer be rounded / bucketed. This change will become visible over the next few days.

The search queries feature gives insights into the searches that have at least one page from your website shown in the search results. It collects these "impressions" together with the times when users visited your site - the "clicks" - and displays these for the last 90 days.

Before and after:


We hope this makes it easier for you to see the finer details of how users are finding your website, and when they're clicking through. Should you have any questions, feel free to visit our help forum.