English Google Webmaster Central office-hours hangout

JOHN MUELLER: OK. Welcome, everyone, to today's Google Webmaster Central Office Hours Hangout. My name is John Mueller. I'm a webmaster trends analyst here at Google in Switzerland, and part of what I do is talk with webmasters like the ones here in the Hangout, the ones who submitted lots of questions, or the ones on the help forums, and try to get you the information you need to make great websites that work well in search. We have a handful of people here today, so if you're watching this live, feel free to jump on in. There's still room. And, as always, do any of you have any questions that we should start off with?

MALE SPEAKER: So, John, I have a question. I was wondering whether you could look up my website and tell us if there is something holding the site back, whether it is content, quality, or links. I know you've done that in the past in the English chats, and that would be extremely helpful for us.

JOHN MUELLER: I can take a quick look, but it's probably not something I can give you, like, a short, five-second answer on. So, yeah. I know you sent me an email as well. I need to take a look at the details there, too. But–

MALE SPEAKER: The only thing is that it's been 10 or 11 months since we came to believe that we're affected by a Panda penalty, and we would just like, at a minimum, a confirmation of whether that's the case or not. If that's the case, quite honestly, we would love an example of something that you, as a human, would find worthy of a sitewide Panda penalty. If it's not, that would also be very helpful, so we don't think it's something that we can no longer fix. There were problems with the site a few years back, and we took a lot of corrective action. So we are also concerned that maybe, to a certain extent, Google cannot tell that the past few years have been very different from before, and that it's still looking at those old signals as well. So could you at least tell us whether it is a Panda penalty? It definitely feels like one, but maybe it's not.

JOHN MUELLER: I guess the good part is that we are rolling out an update to the Panda algorithm at the moment. It's more of a slow rollout, for technical reasons, but there is an update happening there. So if there's something from Panda that's affecting your site based on how we evaluate higher- or lower-quality content, then that should be reflected going forward as well. So, at least from that point of view, that's happening.

MALE SPEAKER: Good. So is it fair to say that it's Panda, given that with the Panda update we are actually seeing a small hit?

JOHN MUELLER: As far as I can tell, our algorithms did at least think that the site was not as high-quality as it could be. So that will probably be updated as Panda rolls out, but I don't know which direction it's headed for your site specifically at the moment, since this is such a slow rollout.

MALE SPEAKER: Fair enough. And then, maybe we could, offline, look at an example that shows, from a human standpoint, the issue with the quality? It's just that we're getting the exact opposite feedback when humans look at the site. And I'm open to the group also chiming in and saying, "Oh, you know, the content is not original." I mean, we have been copied and scraped like crazy in the past few years. Plus, a few years back we had a history of doing certain things that were not right. So maybe the combination of those two things is throwing it off. But it definitely seems like there's nothing worthy of a sitewide penalty. But maybe we're wrong.

JOHN MUELLER: I don't know if I can give you specific examples there. We tend not to do that when algorithms review a site, unless it's something really obvious that we could point at, where we can say, well, your whole page is hidden text or something like that, or unless it's for technical reasons. That's where we might be able to give examples. But when it comes to quality algorithms, usually we can't give any specific examples from our side. I think you're taking the right approach in taking a lot of this feedback in and trying to collect as much feedback as possible. So I'll definitely see if there's something specific and easy to recognize that we can point at, which might not be specific to just this quality algorithm. But I can't guarantee that I'll have something specific there.

MALE SPEAKER: I think in the past you had suggested, in these chats, the idea of doing site audits where webmasters volunteer their sites. And I'm wondering, is this one where it would be helpful for the community to understand other things that are important and maybe not so obvious? Because, and tell me if you agree with that, you definitely don't land on the site and think, oh, this is a spammy site, right?

JOHN MUELLER: At least from my clicking around on your site, I'd agree with you on that. But it's been a while since I've taken a look at the details there, so it's hard for me to say, from my first impressions, that our algorithm should be doing this or that, because we take into account a lot of different factors in these algorithms. I have passed it on to the team to take a look. But I don't know if I really have anything specific to point at. Maybe if we do a site-clinic-type hangout, this could be one where we pull out some examples. In the past we've been able to do that. So I'll definitely keep that in mind.

MALE SPEAKER: OK, thank you very much.

JOHN MUELLER: Sure.

MALE SPEAKER: John, can we step in with another question, or do you–

JOHN MUELLER: Sure.

MALE SPEAKER: –want to take the– OK. On the 13th of July, Google started sending out emails about hreflang implementation. I've looked at one specific website I've been working on, which I cleaned up about seven months ago, and all the error examples I find in Webmaster Tools turn out to be wrong. I mean, I have about 4,000 hreflang tags with errors, and I went through them one by one, and they are all just fine. But Google, for some reason, doesn't seem to pick them up. Should I send you an email with specifics about this?

JOHN MUELLER: It might be useful to have the details. One thing that I've seen on a number of sites is that they implement hreflang the right way, in that it links to the other pages, but they implement it in a place where we can't pick it up. For example, if you have a head section on a page, and within the head you have a JavaScript that writes something to the document, then theoretically the head section closes when that JavaScript is processed–

MALE SPEAKER: Oh, I see. No, I've implemented it in–

JOHN MUELLER: –and anything below it will get ignored. So if you have a noindex or hreflang below that, we would ignore it because it's not in the head. It would almost be a security issue if we were to pick up meta tags from the content of your page.

MALE SPEAKER: Oh, I see. No, I've implemented it in the XML sitemaps, all of it. So I'll send you the details later.

JOHN MUELLER: OK, sure.

MALE SPEAKER: So my question was, now that Google sends out these emails, do you know if it takes more time until Google completely recrawls all the websites with the implementation?
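The caller mentions putting the hreflang annotations in XML sitemaps rather than in the page head. For reference, here is a minimal Python sketch of that sitemap format, assuming the standard sitemap namespace and the `xhtml:link` alternate element; the function name and the example.com URLs are hypothetical placeholders. Each `<url>` entry lists every language version, itself included, because hreflang annotations are expected to be reciprocal.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the sitemap namespace as the default and XHTML as "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(alternates: dict[str, str]) -> str:
    """Return sitemap XML for one set of language versions.

    `alternates` maps an hreflang value (e.g. "en-us") to that version's URL.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Each <url> lists every version, itself included, so the
        # annotations are reciprocal.
        for lang, href in alternates.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


print(build_hreflang_sitemap({
    "en-us": "https://example.com/en-us/",
    "de-de": "https://example.com/de-de/",
}))
```

Generating every entry from one mapping keeps the annotations consistent; a missing return link is one of the simpler mistakes to rule out before suspecting how they are being read.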
JOHN MUELLER: No. It's picked up essentially immediately when we crawl and index those pages.

MALE SPEAKER: OK.

JOHN MUELLER: So as we crawl and index the pages and see that the implementation is correct, we'll reflect that in Webmaster Tools. It might take a few days for the data to be visible in Webmaster Tools, but that's where we show it.

MALE SPEAKER: So seven months is out of the question.

JOHN MUELLER: Seven months sounds like there's something that we're still not picking up properly.

MALE SPEAKER: Yeah, all right. OK. I'll get back to you with an email later. Thank you.

JOHN MUELLER: OK. All right, let's run through some of the questions, and then we'll open it up again for more questions from all of you afterwards.

My site got a message in Search Console about having JavaScript and CSS files blocked. I use a CDN for all my files. I've read that Googlebot doesn't crawl these. How can I allow Googlebot to crawl these files and still have them on a CDN?

By default, we do crawl files on a CDN, so it's not that we would block crawling of them in any way. They're essentially crawled like any other content. But I've seen some CDNs that have a robots.txt file and block crawling with that robots.txt file on the CDN. So that's something you might want to take a look at. I think in your case, the CDN is actually using your main site's robots.txt file, so you might want to double-check that it actually allows crawling of those files. But there's nothing that blocks us by default from crawling files that are hosted on a CDN.

I wanted to know if there's a specific way to declare that we're doing aggregation on some pages. I know about the googleon/googleoff tags.

So first of all, the googleon/googleoff tags are for the Google Search Appliance. They don't have any effect at all in web search. They only have an effect if you run your own search engine with the Search Appliance. And in general, what happens with aggregation is that we recognize it as aggregated content and try to ignore it in search. It's mostly just a problem if a large part of your site is actually just aggregated content, with no unique value of your own. Marking up the aggregated content won't make any difference there, because if users come to your pages and see all of this aggregated content and no unique content of your own, they'll still feel that this is a lower-quality, scraped, aggregated site. So from that point of view, you don't need to mark up aggregated content. You just need to make sure that your website has great content of its own and doesn't just rely on aggregated, scraped, pulled-together, rewritten content.

Can you crawl, fetch, and index protected pages so that a general user can find information when they use our Google custom search on our site, and only display the results on our results pages?
I took a quick look at that, and Google's custom search engine crawls the way normal web search crawls. So we have to be able to see that content publicly in order to index it for this custom search engine. If you want to keep your content behind authentication or otherwise protected, you'd almost need to use something like a Google Search Appliance, which you can run on your site to index the content within your website, your intranet, or wherever you have it.

We discovered about 10 sites that have completely copied our site's content, images and text. What do you suggest for something like this? If they link to all of our pages as the original source, is that fine, or do we need to take them down via the DMCA?

The DMCA is a legal process, and I can't give you legal advice about what you should or shouldn't do there. I imagine it could be relevant here, so you might want to talk with someone who can give you legal help in that regard. In general, if we recognize that sites are copying content, we try to just ignore that in search and focus on the actual unique content on those sites, not on the copied content. So that's something we should be able to get right in search. But if you feel that there are legal reasons to block that content from being published or shown in search, then maybe get legal advice to see if that makes sense in your case.

MALE SPEAKER: John, can I just ask a follow-up on that?
JOHN MUELLER: Sure.

MALE SPEAKER: I've been advised in the past that one thing that might work in that case is a canonical tag that points to the page itself. So rather than saying the real page is over there, you set the page's canonical to itself, saying the real page is this one. Then, if someone scrapes the entire code, they take the canonical tag as well, and it basically tells you exactly what you need to know.

JOHN MUELLER: If they scrape the page like that, sure, that's definitely an option. But I see a lot of scrapers who change the URLs they find anyway, so they turn the canonical into their own canonical, which doesn't really help you that much. And if it's someone who's copying the content out of your site and just placing it on their own site, they're not going to copy any of the metadata with the page. So sometimes it can help, but it probably only works against the really, let's say, incompetent scrapers, whom we would recognize as irrelevant websites anyway.

We recently received many messages in Google Webmaster Tools/Search Console that Google can't access CSS and JavaScript files. Why does Googlebot need to access these files? What kind of data is Google looking for in these files?

So, we're not looking for anything crazy in the JavaScript files. We essentially just want to render the pages like they would render in a browser, so we want to see what users are seeing. We want to make sure that we can also see any content you're pulling in with JavaScript, so that we can rank those pages appropriately in search. To some extent, that also ties in with the mobile-friendly guidelines that we have: if we can recognize that a page is mobile-friendly, then we can show that appropriately in search. We'll show the mobile-friendly label, we can give it a slight ranking boost, those kinds of things. And we can only recognize that a page is mobile-friendly if we can look at it the way a browser would, so for that we need access to the CSS and JavaScript files.

This isn't something that's really new, so there's no algorithm or ranking change happening here. It's something we've been recommending for, I don't know, maybe half a year or a year now: letting us crawl the JavaScript and CSS files for these reasons. Obviously, if we can't access the JavaScript and CSS files, then we can't tell that you've done a great job of making your website mobile-friendly, and we can't tell if there's content that you're pulling in with JavaScript that we should rank your website for. So we've given out a lot of information on what you could be doing to make this better, we've added tools for it to Search Console, and now we've sent out an email blast to let people know that this is really something where you could improve your website, by letting us crawl these files so that we can better recognize what we should rank your website for.

I think maybe the messaging was a bit confusing, and I saw that a lot of the sites that were flagged or had this message sent have plug-ins that pull in small things with JavaScript, and those were blocked. I've noticed that a lot of those plug-ins have actually been updated in the meantime, and if you have WordPress or your other CMS set up to auto-update plug-ins, then this issue probably doesn't play a big role anymore. But I'd definitely check that we're at least able to recognize that your site is mobile-friendly. You can use the Mobile-Friendly Test for that. It also gives you information about blocked URLs.

Yeah, question? Nothing? OK. Maybe we can get back to this at the end as well, I guess.

In order to benefit from the small SSL ranking factor, would you put the certificate on the entire site, or just on the checkout process?

From my point of view, you might as well put it everywhere. We look at the URLs that are actually indexed, in the way that they're indexed, and we check to see if SSL is working there (or TLS; the new name is TLS) to determine whether or not we should apply this small ranking factor. And this is per URL. So it's not that we're specifically looking for a checkout process and saying, well, only this needs to be secure. It's really per URL, and if these are landing pages on your website where you're getting traffic through search, then we will use it there. So it's definitely something I'd put across the whole website. I don't see much of a reason to hold back on that. Obviously, there's a lot of technical work in implementing HTTPS across a website. You have to make sure that all the embedded content is also HTTPS, and all of that. So it's not something where I'd just switch it over to HTTPS and hope that everything works. I'd really go through it and make sure that you're doing everything right.

How can affiliate sites rank well? Does Google trust them? Are there tips? What should we do?
So, of course, affiliate sites can be really useful. They can have a lot of great information on them, and we like showing them in search. But at the same time, we see a lot of affiliates who are basically just lazy and copy and paste the feeds that they get and publish them on their websites. This kind of lower-quality, thin content is really hard for us to show in search, because we see maybe hundreds of people taking the same affiliate feed and republishing it on their websites, and of course they're not all going to rank number one for those terms. We really want to see unique, high-quality content, and that's something we do like to rank. So it's not that we per se say that an affiliate site is bad. We just see a lot of bad affiliate sites, because there are people who try to find a fast way to publish content on the web. Sometimes being fast is good, but you also need to make sure that you have a really great website, one people want to go to in order to get information they can't find anywhere else. And if you have affiliate links on a website, that's great. That's not something we would count against the website.

Case-sensitive question: Our company is shifting from uppercase or mixed-case EN-US to lowercase in the URLs. The canonicals are currently lowercase. The internal links are mixed. What should we do?
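A migration like this usually includes a permanent (301) redirect from any mixed-case path to its lowercase form. Below is a minimal sketch of that redirect as Python WSGI middleware. The wrapper name is hypothetical, and it assumes every path on the site should be lowercase (it would be wrong for sites with case-sensitive IDs in the path). It lowercases only ASCII letters, so non-ASCII bytes pass through untouched. The redirect is only one piece; canonical links and internal links would also need to point at the same lowercase URLs.

```python
import string
from typing import Callable, Iterable
from urllib.parse import quote

# Map only ASCII A-Z to a-z; every other character is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lowercase_redirect_middleware(app: Callable) -> Callable:
    """Hypothetical WSGI wrapper: 301-redirect any path containing
    uppercase ASCII letters to the same path in lowercase."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        lowered = path.translate(_ASCII_LOWER)
        if lowered == path:
            # Already lowercase: hand the request to the real app.
            return app(environ, start_response)

        # PEP 3333 delivers PATH_INFO as bytes decoded with latin-1, so
        # re-encode before percent-quoting to preserve the original bytes.
        script = environ.get("SCRIPT_NAME", "")
        location = quote(script.encode("latin-1")) + quote(lowered.encode("latin-1"))
        query = environ.get("QUERY_STRING", "")
        if query:
            location += "?" + query  # the query string is kept unchanged
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    return wrapped
```

Wrapping an existing application is then `application = lowercase_redirect_middleware(application)`. A request for `/EN-US/Products?page=2` gets a 301 to `/en-us/products?page=2`, and lowercase requests reach the app unchanged.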
In a case like this, the thing to keep in mind is that any kind of URL change is a URL change. We have to recrawl and reindex the pages and forward all the signals we have to the new canonical URLs, and that can take a bit of time. If this is all within your website, it's less of a problem, because maybe we'll still have the old version indexed, maybe we'll have the new version indexed, but regardless of which one we send people to, they'll still make it to your website. So that's generally less of a problem. What I'd really recommend in a case like this is making sure that you're as consistent as possible with these URLs: ideally, you have a redirect set up from one version to the other, you use the canonical link consistently across the website, if you use hreflang you use it consistently as well, and all internal links point to the same version. That way, when we look at your website, we have a clear signal saying, well, these are the old URLs, but everything is focusing on these new URLs, so Google should just focus on the new URLs as much as possible. That really helps us follow your preference. It's not that you're going to see any kind of ranking change because of a URL change like this, but if you have a strong preference for one of these versions, then let us know about it so that we can help you get that implemented.

Does Google use Analytics information to decide on a website's authority? If not, how does Google measure a site's user experience factors?
We don't use any Analytics information at all in web search, whether for crawling, indexing, or ranking, so that's not something that we take into account there. I do think some parts of Google sometimes use aggregated information from Analytics to understand what's happening more broadly. That's usually tied to, I think, a setting in Analytics where you can say that you allow your data to be shared. But that's not something we'd use directly at the site level for crawling, indexing, or ranking.

I noticed Googlebot crawls nonexistent pages after the end of a pagination series and throws server errors. This happens from the last page in a series, for example, URL?page=34, and then Google goes to URL?page=100, which doesn't exist.

So one thing here is that this is a really common failure mode on a lot of websites: they have pagination set up for pages that don't actually exist. Especially on long lists, we've sometimes seen that you can go to page 100 and it'll have a Next and Previous button. You can change the URL to 5,000, and it'll still have a Next and Previous button. You could just keep clicking the Next button until you're at page 9 million, or whatever. Googlebot is happy to click on links on your website, and it'll keep clicking on those links until it finds something interesting to index. So if there's really no next page in a series, make sure you don't have that Next button, so that we actually stop crawling there. It's not that we recognize there's a number here and just try random things out. We really clicked on that at some point, found those URLs, and are trying to recrawl them. So just make sure that you don't have Next buttons that lead to sections that don't have any data. That happens in lists, and sometimes it also happens in calendars, where maybe you go to the year 9 million and still get a calendar page saying, hey, there are no events here, but maybe tomorrow. Those are what we call infinite spaces, where Googlebot needs to recognize that there's actually nothing interesting to find.

My question is Panda-related. First of all, why have you decided to roll out this update so slowly? And second, can you tell us how this update is different from the other updates? Is it targeting more than just content?

This is actually pretty much a similar update to before. We're mostly rolling it out a bit slower for technical reasons. It's not that we're trying to, I don't know, confuse people with this. It's really just for technical reasons.

Here's the same question, I think. We have a lot of content, a lot of on-page content, at the bottom of our page. We use Click to Expand links in order to improve the site design. Do we need to remove those links in order to improve how Google sees our pages? Does this affect our rankings?

So this is the old question about hidden content on a page, and what to do there.
So what generally happens when we recognize that content is hidden on a page like this is, we just don’t see it as being primarily important content for this page So we’ll crawl and index the page We’ll find this content If someone is searching specifically for it, we’ll still show that in search But if someone is searching for something more general, and we know this is actually not directly visible on a page, then we assume that this is not the most relevant content for this specific page And we won’t focus on that so much So with regards to the question of should I leave it or remove it, that’s essentially up to you If you think that this is interesting content for users to additionally have when they go to your pages, then maybe that’s fine to keep It doesn’t cause any problems for your page like this But if you think that this is content which is really critical for your website, which you want to rank for, then that’s something that you might want to place in a way that it’s actually visible directly on the page Or maybe you want to kind of move that content into a separate page, where you say, well, this is a lot of information– additional information It’s critical for some users, though, so I put it on a separate page and let people get to like that– let Google index it like that so that it’s directly visible Could you tell the difference between a 302 and a 303 redirect, and if there would be a case where you would use a 303 redirect So these kind of specific redirect questions come up from time to time From our side it’s actually pretty simple in that we only differentiate between permanent and temporary redirects And we try to recognize which type of redirect matches best So if you’re looking at some less common type of redirect, and you’re wondering how Google treats this– well, it’s basically just either a permanent redirect or a temporary redirect With a temporary redirect, we try to keep the redirecting page in our index as a canonical With a 
permanent redirect, we try to keep the redirect target in our index as a canonical And what can also happen is if we recognize that a temporary redirect is actually pretty permanent, then we’ll treat that as a permanent redirect as well And there’s no fancy, let’s say, page rank passing problem there with temporary redirects You can have temporary redirects on your website Sometimes there are good reasons to have temporary redirects It doesn’t cause any problems for your website It’s not that the page rank disappears or your ranking drops It’s really just a question of, should we index this URL or should we index the other URL? After getting hacked, we had to take down the site for a day About 200-plus 500 errors showed in Search Console, now completely fixed and running smooth How long will it take to regain the rankings we had before we had to take the servers offline for a bit? So I guess first of all, if you recognize that this is happening, and you have the ability to do this, I strongly recommend using a 503 result code for all requests that come in from search engines

or from users With a 503, we know that this is a temporary situation and you’re working on resolving this as quickly as possible And Google will say, well, fine, we’ll just come back and look at it again in a couple of days Whereas if you take the server down completely, or if you serve 404s, for example, then Googlebot will come and say, oh, looks like this website doesn’t exist anymore These pages don’t exist anymore I’ll drop them from my index So if there’s a way that you can serve a 503 when your site is hacked and you’re taking it down for maintenance, that’s really the best thing to do That way you shouldn’t really have any impact with regards to crawling– your rankings, at least, or your indexing Of course, if you leave the 503 for a longer period of time, if you take your website offline for a couple of weeks, then we’ll assume that this is a permanent state and it’s not just something temporary that we can look over With regards to this situation, where maybe 500 errors were showing, or the server was down, that’s something where once we recrawl those pages, we’ll be able to take that into account again and index them– rank them as we did before So it’s not something where we kind of artificially hold a website back, but it’s more of a technical issue that we have to recrawl those pages, recognize they’re OK, and put them back in our index together with the old signals that we had To some extent, we try to recognize this kind of failure when we see it happening, and keep those pages in our index anyway, just because we think maybe this is temporary and the website will be back soon So some of that might have actually worked here Some of that might be that we recrawl those pages a bunch of times and they drop out and we don’t have them for ranking anymore The good part here is that if we recognize that a page is kind of important for your website, we’ll generally crawl a bit more frequently So if it drops out of the index because of failure like this, then 
we’ll generally recrawl it a bit more frequently and bring it back in a little bit faster than we would with some random page on your website that never changed for the last 10 years So my guess is something like this, where if you had to take the server down for a day, you might see, maybe a week, two weeks– at most, maybe three weeks of time where things are kind of still in flux and settling down again But it shouldn’t take much longer that that I think you’re muted if you’re saying something MALE SPEAKER: You got me? JOHN MUELLER: Yeah MALE SPEAKER: OK Sorry about that Anyways, you know my site, and we got hacked, and it’s been the worst week ever But the thing about it is, is we were able to finally figure out that we were hacked, because we didn’t even realize that it was in MySQL There was like a billion database entries and it was killing the server when we’d have, like 500, 600 users at any given time– server would just [SNAPS FINGERS] drop out Anyways, it happened, and we couldn’t get in to make it a 503 We were just in hurry mode Anyways, everything’s fixed We use SiteBlock now to try and help deter that We’re also using CloudFlare kind of stuff to stop that stuff from happening But the pages are still there The inner pages that I’m talking about, they’re still indexed They just went from, like, number one to number– page four or five, or bottom of page one So they’re still indexed, it’s just like all the inner pages have just, after this happened– another note, though, is we noticed that there was a security hole in our whole theme So we have also updated our theme to a brand new theme, so I was wondering if maybe design maybe is playing a role in us dropping rankings basically on every inner page that we have Like I said, a lot of those were sitting right at number one And now they’re either page two, three, or four, or the bottom of page one That’s my question They’re still in the index JOHN MUELLER: Now, it’s hard to say because it kind of depends on 
how the pages were hacked and what was happening there So, especially if they were still indexed like this, and if someone went through the database and injected a bunch of– I don’t know– hidden links,

or spammy links to other sites, or spammy content, even, like a parasite hosting on your website through the database, then that’s something that can theoretically affect the ranking of those pages, because suddenly they look very spammy, and then we think, well, oh, something’s crazy here But usually that settles down very quickly as well But– so what I’d recommend there is if you notice that this is still happening, maybe next week or something like that, then feel free to send me some URLs where you know that they used to be a lot different in search, and they’re not showing up at all anymore So I can kind of take a look there to see what happened, or if there is something even still happening there So sometimes, for example, we’ll see people hack a website and cloak it so that only Google sees this problematic content, which is really hard to diagnose on your side, because you don’t see it So that’s something to kind of watch out for Maybe something like that happened and you fix it in the meantime Those might be things where maybe we point you at something technical that happened or that’s still happening, to kind of see what’s happening there, or maybe it’s just a case that our algorithms need a bit of time to settle down again MALE SPEAKER: OK OK All right, thanks JOHN MUELLER: Sorry to hear about your hack MALE SPEAKER: Oh man, it was the worst thing I mean, we got over 1,000 people on the site because something went viral, and all of a sudden, the server just says nope, I’m done And then we’re all freaking out, hey, we’re going viral, but now the servers are all down JOHN MUELLER: Oh Terrible timing I mean, getting hacked is never fun Finding it quickly really helps I remember my site, way in the beginning, used to get hacked every now and then, and it’s– I always found it worse when you look at it after a month and you realize, oh, it’s been hacked the whole time and I didn’t realize it But it’s always frustrating MALE SPEAKER: Mm-hmm JOHN MUELLER: Panda 
has been run many times over the past few years. This Panda run has been said to be a crawl lasting several months. Does that mean it’s moving slowly for one pass over the internet, or what’s happening here? So it’s not that we’re crawling slowly. We’re crawling and indexing normally, and we’re using that content as well to recognize higher-quality and lower-quality sites. But we’re rolling out this information a little bit more slowly, mostly for technical reasons. Do absolute URLs in website navigation improve the crawl rate by lowering the Google crawl budget? No. Absolute or relative URLs don’t affect crawling, as long as we find the same URLs. Sometimes it makes sense to use absolute URLs. For example, if you know that your server automatically serves different variations of your website, like www and non-www, then having absolute URLs for the internal links within your website really helps to make sure that we pick up the right version to show in search. But it doesn’t affect the crawl budget. It’s not something where I’d say you’d see a big bump from going from absolute to relative or back. If we really don’t– JOSHUA BERG: [INAUDIBLE] JOHN MUELLER: Yeah? JOSHUA BERG: On the question before last: do the technical reasons for Panda doing a slower rollout include receiving feedback, quite likely aggregate data feedback, during that process, which takes time? JOHN MUELLER: I don’t think so. So you mean, like, webmasters giving us feedback about their sites?
JOSHUA BERG: No. Like, feedback directly through Chrome, Analytics, or any other aggregate data feedback that is coming through. Sometimes you mentioned we might look at a change and see, well, more people are clicking here or there, so our results must be getting better or worse. JOHN MUELLER: No. This is really just an internal, technical problem that we’re rolling this out slower for. So it’s not that we’re making this process slower by design. It’s really an internal issue on our side.
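To illustrate John's earlier point about absolute versus relative internal links, here is a minimal Python sketch using the standard library's `urllib.parse.urljoin`. It is not anything Google has published, and the hosts `example.com` and `www.example.com` are placeholders. It shows that a relative link resolves against whichever host variant served the page. So if a server answers on both www and non-www, relative links can lead a crawler to the non-preferred variant, while an absolute link always names the preferred host. In both cases the crawler ends up with one fully resolved URL per link, which fits John's answer that the choice by itself does not change crawl budget.

```python
from urllib.parse import urljoin

# Hypothetical preferred (canonical) host for the site.
PREFERRED = "https://www.example.com/"


def resolve(page_url: str, href: str) -> str:
    """Resolve an href relative to the page it appears on, as a browser or crawler would."""
    return urljoin(page_url, href)


# The same page served on two host variants (placeholder URLs).
www_page = "https://www.example.com/shop/index.html"
bare_page = "https://example.com/shop/index.html"

# A relative link inherits whichever host served the page.
print(resolve(www_page, "/products"))   # https://www.example.com/products
print(resolve(bare_page, "/products"))  # https://example.com/products

# An absolute link always points at the preferred host, whichever variant was crawled.
absolute = urljoin(PREFERRED, "/products")
print(resolve(www_page, absolute))   # https://www.example.com/products
print(resolve(bare_page, absolute))  # https://www.example.com/products
```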

Thanks. According to the RSS Advisory Board, the elements link, title, and description are mandatory in an RSS sitemap, whereas according to Google, link and pubDate are enough. Will there be crawling issues without title and description? From our point of view, you can focus on the URL and the date if that’s all you want to provide, but in general I’d recommend, if you’re going to provide an RSS feed, making sure it’s a well-formed RSS feed that can be reused for other things as well. If you’re already going to do the work to create this feed on your website, then you might as well do it in a way that works for all the other systems that process RSS feeds. So from Google’s side, of course, you can limit it to the minimum that we recommend. From a general web platform point of view, I’d really recommend making sure that it works as a normal RSS feed for anything else. How long does it generally take for Knowledge Graph markup to be reflected in the KG? We corrected the KG logo and contact information on our client’s website three weeks ago, but the change hasn’t been reflected. The markup tester recognizes the markup properly. I can imagine that there might be situations where it takes a couple of weeks for this to be processed. Three weeks, I’d say, is probably around the borderline where sometimes it can take this long. Sometimes it can take a little bit longer; sometimes it takes a bit shorter. I’d recommend giving it another week or two, and if you still don’t see any changes there, send us some examples, some of the URLs where you made these changes, so that we can double-check that our processes are actually picking this up properly. We saw that the cached page URL for our website showed a 404 for a day, and then it showed up again the next day. I asked other experts about this. Can this be a temporary issue, or is it something I need to check on my website in terms of indexing?
If these pages are still indexed properly (one way to check that is to do an info: query or a site: query in search), then that would be fine. Sometimes it’s just cached pages that disappear on our side, where we don’t show them at the moment. That’s completely normal; it can happen from time to time. If you’re sure that your pages are still indexed properly, and that there’s no technical problem on your website, then that’s something I wouldn’t worry about. Is there any update on Google blocking the spam traffic that can mess up your analytics data? I haven’t heard a lot about this recently. On the one hand, I know the Analytics team is working on resolving that. On the other hand, I haven’t heard a lot of complaints about it recently, but I’ve also been on vacation, so that might have something to do with that. So my guess is the Google Analytics team has been working on this to improve the situation, and I know they’re working on cleaning that up in some ways. I don’t know the details of what specifically they’re going to be doing, though. Panda: is Panda running on a page-by-page basis or on a sitewide basis? We do try to run it on a sitewide basis, to recognize lower-quality and higher-quality websites. Which also means that if your website has a lot of lower-quality content and some really great content, you should make sure that the lower-quality content isn’t that much, or that you block it from indexing if you know about it. I noticed recently that there are many hackers who intentionally hack to get a backlink from the gov– oh, probably from government websites. Can you explain?
So I guess this points at hackers that are hacking government websites, or any other kind of websites, to try to get a link there. And that’s, of course, not really a good practice, because if you’re hacking your government’s website, chances are they’ll get kind of upset about that. I don’t really know what else to add there. Essentially, they’re hacking a website and leaving unnatural links, and both of those things are bad, not something we’d recommend doing. And if this is something maybe a previous SEO did,
then I’d definitely work on getting that cleaned up. Let’s see, a bunch more questions here. Is there any way to track traffic on an internal page of a mobile app? I don’t know, but I think there are analytics solutions that help you track usage within a mobile app. But I’m not really the expert there. MALE SPEAKER: There’s also an app– an app indexing Hangout in a couple of weeks. JOHN MUELLER: App indexing Hangout, yeah. I think that’s something we probably wouldn’t cover from an app indexing point of view, because there we’re looking more at how you can make pages from your app visible in search. The tracking side is usually something that you’d implement within your app, to see where people are coming from and what they’re doing within your app. How do I report spammy SEO campaigns? You can use the spam report form, as always. If it’s something bigger, where you think it doesn’t really fit into a spam report form, you can also email one of us directly or send us a note on Google+. We can’t promise to respond to everything that we get, but we do review them and pass them on to the spam team. We have had a huge drop in indexed pages within the last 60 days. I started with SEO work, submitting a sitemap, and using Fetch as Google, but there’s no difference yet. Can I get a hint on how to fix this?
Usually– so I guess there are a few things involved here. On the one hand, if you’re looking at the indexed pages with a site: query, that’s probably a really bad way to get a count of the indexed pages, so I wouldn’t recommend doing that. If you’re looking at the Index Status information in Search Console, then you need to keep in mind that it’s specific to the variation that you have verified. So if your website moves from www to non-www, or has some pages on HTTP and some on HTTPS, or if you have multiple domains with the same content, then the Index Status information will be specific to the version that you have verified there. So you need to make sure that canonicalization works, so that one version has your content indexed, and also double-check the other variations to make sure you’re not missing part of a bigger count. The sitemaps indexed count works in the same way, in that it focuses specifically on the URLs that you have listed in your sitemap file. Usually, if there’s a drop like this, it’s hard to know offhand whether it’s a technical issue (maybe your website is serving noindex on some of these pages) or whether it’s actually expected. Maybe we crawled your website with a bunch of different session IDs and indexed thousands of pages, but the content on your website is actually just 100 pages. In that case you would expect the indexed count to go down over time, as we recognize that we basically got lost on your website and accidentally indexed a bunch of pages that don’t need to be indexed. So it’s hard to just look at that number and say, oh, you need to do this. Rather, you need to look at a bunch of things, from technical issues to canonicalization, to make sure that you’re really doing things right there. MALE SPEAKER: So John, submitting to index in Fetch as Google, is that not guaranteed that you’ll– I assume it’s guaranteed that you will index it, but
not that you’ll keep it for very long? JOHN MUELLER: Exactly. Yeah. MALE SPEAKER: All right. JOHN MUELLER: So it’s not guaranteed that we’ll index it. I think in maybe 99% of cases we do index it directly. But we also need to be able to keep it for the long term, and indexing it once doesn’t necessarily guarantee that we’ll keep it. MALE SPEAKER: John, is it possible that if you have too many comments, and you have them paginated and fully indexable, it might backfire? I know that Barry says the only change he made when he recovered from the latest Panda update was that he made the comments JavaScript-only. Do you think that could be somehow linked? Because he didn’t make any other changes, he’s claiming. JOHN MUELLER: I don’t know specifically what happened with Barry’s site, so it’s hard to say, but in general, we do process JavaScript nowadays.
So if the comments are in the HTML or in JavaScript, it might be that we’re just treating them exactly the same. With regards to paginated comments or not, you have to take a look at the quality of the comments there and make a call based on that. I wouldn’t say that a large number of comments is always great, or that a small number of comments is always great. But you really need to keep in mind that we see this as a part of your page. If this is high-quality content, if this is something that helps a user make a decision one way or another (for example, if these are reviews where people are saying, well, this product is great, or this other product is great, and this is why, and here’s some detailed information on how I use it), then that’s something that can be really useful. On the other hand, if these are just comments saying, hey, visit my site here, or visit my site here, cheap shoes, then we’d say, well, this is kind of low-quality content. When we index it for your website, it’s probably not what you want to stand for. MALE SPEAKER: What if it is somewhere in the middle?
So it’s not very insightful. It’s definitely not spammy. They’re legitimate comments, but not the most insightful comments either. A lot of them, you won’t read them and be like, oh, wow, I had never thought about that. It’s just normal discussion happening, with most of them, I would say, since most people are not experts; nothing earth-shattering being revealed, or some great insight. Or people sharing their personal experience. What would you say? And they’re not reviews. JOHN MUELLER: It could be either way, so it’s really hard to say. What I would try to do in a case like this, where you have a lot of comments– I think in general having people engage with your content is a good thing. But what you might want to think about is finding a way to recognize the really great comments and highlight those on the page, and make it possible for people to click through to the rest of the comments, but maybe, depending on the situation, block those from being indexed, for example. So kind of like when you go to a product page on– I think Amazon does this– you scroll to the bottom, they have some reviews there, but these are usually the reviews that, for one reason or another, they recognize as important. Or they give a balanced view, maybe some positive and some negative reviews. And all the other reviews are available from there as well, but you have to click through to see those. So maybe that’s something that would make sense there, so that when you look at your pages, you can say, well, all of the content I’m providing on this page, including the user comments, is really exactly what I want to stand for. This is high-quality stuff. This is something I feel proud to provide to anyone who wants to visit my website. MALE SPEAKER: OK, that’s very helpful. And is it fair to say that Panda doesn’t have a link element? Or is there a link element to Panda as well?
JOHN MUELLER: So if people place, like, spammy links on a website or something like that? MALE SPEAKER: No, the inbound links to the site. JOHN MUELLER: I don’t think so. MALE SPEAKER: OK. So it’s all about the quality of the content of the site? JOHN MUELLER: Yeah. MALE SPEAKER: OK. JOHN MUELLER: I mean, we have other algorithms that look at the quality and the quantity of links going to a website, and sometimes that plays a role with other things, but I don’t think that for recognizing the quality of the content we would need to look at external links. MALE SPEAKER: OK. Helpful. Thank you very much. JOHN MUELLER: All right. We just have a few minutes left, so I’ll just open it up for all of you. MALE SPEAKER: Can I ask another question, John? JOHN MUELLER: All right. MALE SPEAKER: I don’t know if you remember, I emailed you a while ago about the number of links going from our old site to the new one within Webmaster Tools. That was around the 100,000 mark, and it’s now gone up to 200,000, when there are no links at all between the two sites. And within Webmaster Tools, when you click through to see what they are, every single one of those links is via an intermediate link. So I’m thinking it must be from the time when we had the old site 301ed to the new one. So is it possible that Google can still be seeing a 301? That you store the fact that there was a 301 in your system, and even though we’ve updated
the site months and months ago, somewhere in your memory you’re saying, well, I know that there’s a 301 in place to that page, so I’m still going to consider there to be a link there, even though there isn’t and hasn’t been for months? JOHN MUELLER: I would tend to see that more as a reporting problem in Webmaster Tools, or in Search Console, where maybe we’re not showing the newest data that we should be showing there, and– MALE SPEAKER: But then why does it update with more information that’s even more wrong than before? JOHN MUELLER: Don’t know. [LAUGHTER] It is confusing. We shouldn’t be doing that. Yeah, we shouldn’t be reporting something like that. I’ll double-check with the team on that later today. MALE SPEAKER: OK, thanks. JOSHUA BERG: John, regarding Webmaster Tools– actually, Search Console– have you given, or has there been, further consideration recently to extending the three months of data to a longer time period? JOHN MUELLER: We discuss that all the time. I think they’re getting tired of us pushing for this. But we do look at it with the team to see what we can do there. To some extent, it’s balancing different priorities. On the one hand, we’d like to offer new features as well. On the other hand, people like to have the old features expanded a bit, with more storage. Adding more storage there isn’t that trivial, in the sense that we can’t just say, oh, well, we’ll just double the amount of storage. We really need to rethink how Search Console uses this data, to make sure that it’s still snappy to use and doesn’t get bogged down and turn into something really slow. So that’s something where– JOSHUA BERG: It’s not more problematic than Analytics? JOHN MUELLER: It’s– JOSHUA BERG: –considering the amount of data it has to store?
JOHN MUELLER: It’s not that it’s impossible. It’s not that the engineers can’t do that, but it’s a nontrivial amount of work to actually implement. It’s not just a dial where we can twist and say, oh, well, we’ll just add 100% more, and that’ll be OK. It’s kind of like, if you need storage within your database and you go from 1,000 users to 1 million users, then that’s a big step. You can’t just use the same setup you have and say, well, it’ll work fine with twice as much or 10 times as much storage. You really have to rethink your design when you do that. And Analytics has done that earlier, or maybe from the start, I don’t know, and they’ve planned for that. On our side, we made a product decision in the beginning to say, well, we have our three months, and people can download the data if they want to keep it longer. But it’s focused on those three months. So when we talk with the engineers and the product managers, they’re saying, well, should we put more storage here, or should we add this really awesome feature? Tell us what we should do. And of course, our preferred answer is, do everything. But there’s a limit to what can be done. So we have to focus on individual parts and decide between adding more storage, adding this feature, building out this other feature, or working on an API. All of these things have to be balanced somehow. JOSHUA BERG: Is how people might misuse or abuse that data also a significant consideration there? In other words, they may see, or think that they see, more about the algorithm data or different updates, et cetera?
JOHN MUELLER: For some kinds of data, that could play a role, like if we added a lot more information for links, for example. With regards to how we treat those links internally, that might be something where we fear that people might misinterpret or misuse that information. With regards to the search queries specifically, I don’t see that as a problem. I mean, with the search query data, if you download it now, you’ll still have it next year. And just because you’ve downloaded it now doesn’t necessarily tell you more about our algorithms than if you didn’t. JOSHUA BERG: All right, thanks. JOHN MUELLER: One thing I suspect will happen is, when we finalize the API for search analytics, for the search query information, that some people will offer tools to kind
of download this on a regular basis, so that you can look at it offline on your side, or keep it on their server or on your server, and have an aggregated collection of search analytics information over a longer period of time. With the new API, that’s a lot easier than it is with just downloading the CSV files now. So I suspect some of that might happen. And if you’re a programmer and you have access to the API beta, then that might be something to try out. But I don’t think that, from our side, we’d be offering a significantly longer period of data for the search analytics information, at least not in the initial drop. JOSHUA BERG: You know, you can load the data into Google Docs and such quite easily and use it like that. I mean, if there were even some way that we could, from within Search Console, automate a couple of steps of anything like that– JOHN MUELLER: Yeah, I mean, I know. JOSHUA BERG: I mean, we can do that with an API already. JOHN MUELLER: Yeah, yeah. JOSHUA BERG: But if something like that might be available within Search Console– JOHN MUELLER: I mean, you can do that currently with the download links. Just push that into your spreadsheets; that would work. But with the API, you could, I don’t know, have a web app where you push the button once a month, or that automatically does it once a month and pushes this data into your Google Spreadsheets, or whatever you’re using. The tricky part there is, of course, that it pushes the data in a list, or in whatever format you give it. It’s not that you’d have the same UI, where you can filter and pull things out and compare them easily. But I’m sure there’ll be people with innovative solutions to make that possible. JOSHUA BERG: Yes, certainly. JOHN MUELLER: All right. Let’s take one more question. We’re a bit out of time, but let’s– MIHAI APERGHIS: Hey John. Since I jumped in late, I hope I can get the last question. Actually, it’s two. One, just to ask you if you got
my email regarding the spam report. I sent it to you, I think, a week ago. Just to know if it got to you, and maybe you forwarded it to the [INAUDIBLE]. JOHN MUELLER: It’s probably on my vacation list somewhere. [LAUGHTER] MIHAI APERGHIS: OK. If you can confirm whether you read it, and when you read it, that would be fine. The other thing was regarding– my client had a breadcrumbs issue regarding the home element on the displaying [INAUDIBLE]. I actually managed to solve that by using JSON-LD. So this is maybe something for people who are struggling with this issue, where the home element isn’t fully visible on the page, or [INAUDIBLE] or something like that: JSON-LD seems to work fine as long as I also mark up the home element. I tried without marking up the home element and it didn’t work, so I did it with the home element. But one issue I saw was that if you have a multilanguage website, and your home link is, like, /EN for the English version, the breadcrumbs will show, like, www.site.com, brand name, then category, because it thinks that /EN is a subcategory of some sort. So [INAUDIBLE] JOHN MUELLER: Yeah, I’ve seen something similar in other cases where you have that kind of structure within a website. Sometimes with sitelinks as well. So I suspect it’s something that will take a bit of time to get ironed out completely in our search results. MIHAI APERGHIS: Yeah, well, I used the home element just as a normal domain name, and that seems to work fine. JOHN MUELLER: OK. Good. Good to know, yeah. I think more and more of the structured data elements will be possible with JSON-LD, and that makes it possible to do some more, I guess, advanced markup, like you’re doing there, so I’m looking forward to seeing how that rolls out. MIHAI APERGHIS: By the way, regarding the sitelinks search box: if a website already has a sitelinks search box, but with no markup, so it sends you to a site:domainname search, how long from when you
implement the JSON-LD markup does it take to get picked up and send users to the site’s search results page? JOHN MUELLER: That can sometimes take a couple of weeks. MIHAI APERGHIS: OK.
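Mihai's two JSON-LD points can be sketched in Python with only the standard `json` module. This is a minimal illustration, not Google's reference markup. It uses schema.org's `BreadcrumbList`/`ListItem` and `WebSite`/`SearchAction` types, and the domain `www.example.com`, the `/en` language prefix, the category names, and the `/search?q=` endpoint are all hypothetical. Following what Mihai reported working, the home crumb is marked up explicitly and points at the bare domain rather than the `/en` path. The output of each function would go inside a `<script type="application/ld+json">` element on the page.

```python
import json

# Hypothetical site details; replace with your own.
SITE = "https://www.example.com"
LANG_PREFIX = "/en"  # hypothetical language folder for the English version


def breadcrumb_jsonld(trail):
    """Build a schema.org BreadcrumbList from (name, url) pairs.

    A "Home" crumb pointing at the bare domain is prepended, mirroring what
    Mihai reported working (an assumption drawn from his report, not a rule).
    """
    crumbs = [("Home", SITE + "/")] + list(trail)
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }


def search_box_jsonld(search_path="/search?q="):
    """Build WebSite markup with a SearchAction for the sitelinks search box.

    search_path is a hypothetical internal search URL; the
    {search_term_string} placeholder stands for the user's query.
    """
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "url": SITE + "/",
        "potentialAction": {
            "@type": "SearchAction",
            "target": SITE + search_path + "{search_term_string}",
            "query-input": "required name=search_term_string",
        },
    }


def as_script_tag(data):
    """Serialize the markup into the script element that embeds JSON-LD in HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"


if __name__ == "__main__":
    trail = [
        ("Shoes", SITE + LANG_PREFIX + "/shoes/"),
        ("Running", SITE + LANG_PREFIX + "/shoes/running/"),
    ]
    print(as_script_tag(breadcrumb_jsonld(trail)))
    print(as_script_tag(search_box_jsonld()))
```

The `/EN` behavior Mihai describes was, per John, something still being ironed out on Google's side, so this only shows one way to structure the markup, not a guaranteed fix.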

JOHN MUELLER: So it’s not as fast as just recrawling the homepage. We really have to reprocess it specifically for that, and that whole process takes a bit longer. MIHAI APERGHIS: And if the site is using a Google Custom Search thingie inside its website, would that be a problem, or should– JOHN MUELLER: No. That’s fine, yeah. MIHAI APERGHIS: OK. JOHN MUELLER: All right. With that, let’s take a break here. Thank you all for the questions you submitted. It’s been really interesting to get so much feedback here, and to have so many live questions too. I’ll set up the next couple of Hangouts. I think the next one currently is for app indexing, so if you have an Android app and you’re implementing app indexing, by all means jump in there. For more webmaster and web search Hangouts, I’ll set those up maybe later today so that you can start adding questions there too. With that, thanks again, I hope you guys have a great weekend, and maybe see you in one of the future Hangouts. Bye, everyone.