Some sort of "pollution" between two Virtual Hosts on the same machine, causes Google to look on site A for files on site B

Dr. David Kirkby
I'm running Apache on a Debian 9 system.

root@localhost:~# apache2ctl -v
Server version: Apache/2.4.25 (Debian)
Server built:   2018-03-31T08:47:16

on a virtual private server, with one IP address. I have about 6 virtual
hosts on there. One is

https://www.g8wrb.org/

which has a directory "data", with valve data sheets in it.

So for example, there's a file
https://www.g8wrb.org/data/Eimac/4CX10000D.pdf

If Googlebot looks for that file, it will find it. The problem is that
Googlebot is looking on another domain

https://www.kirkbymicrowave.co.uk/

for the same files. For example, you can see from the last line of the logs
below that Googlebot is looking for

/data/Eimac/4CX10000D.pdf

on the https://www.kirkbymicrowave.co.uk/ domain, despite the fact that the
file has never been on that website. It seems as though Google is mixing
the two sites up in some way, and hunting on one domain for files that
should be (and are) on another domain hosted on the same server.

Needless to say, when I look with Google Analytics, I see a ton of 404
errors, as Google can't find the files it is looking for on
https://www.kirkbymicrowave.co.uk/, which is hardly surprising, as they
were never there.

Can anyone explain what might be happening? I have posted the four
VirtualHosts related to the https://www.kirkbymicrowave.co.uk/ domain
below. There are four, to cover the four possibilities: going to the
domain with or without the www, and to the non-secure version on port 80
or the secure version on port 443.

access-kirkbymicrowave.co.uk.log.6:66.249.66.66 - - [16/Jun/2018:06:11:01 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/3CX10000H3.pdf HTTP/1.1" 404 575 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk.log.6:66.249.66.68 - - [16/Jun/2018:06:14:45 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/AB5.pdf HTTP/1.1" 404 568 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk.log.6:66.249.66.70 - - [16/Jun/2018:06:22:27 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/4CX5000R.pdf HTTP/1.1" 404 573 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.64 - - [28/Jun/2018:22:32:18 +0000] "GET /data/Eimac/4-125A.pdf HTTP/1.1" 404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.67 - - [28/Jun/2018:22:45:01 +0000] "GET /data/Eimac/4CX10000D.pdf HTTP/1.1" 404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
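
To see how widespread these requests are, the access logs can be summarised with a short script. The following is only a minimal sketch, assuming the logs use Apache's "combined" format as in the entries above. The names LINE_RE and googlebot_404s are hypothetical and chosen for illustration. The pattern tolerates the "filename:" prefix that grep adds to each line, because the first field is matched as any run of non-space characters.

```python
import re
from collections import Counter

# Combined log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# The first field is matched loosely so a grep "filename:" prefix is accepted.
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count the request paths that Googlebot asked for and received a 404."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a combined-format line, skip it
        if match.group("status") == "404" and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    # One entry copied from the logs above, used as sample input.
    sample = [
        'access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.67 - - '
        '[28/Jun/2018:22:45:01 +0000] "GET /data/Eimac/4CX10000D.pdf HTTP/1.1" '
        '404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
        '+http://www.google.com/bot.html)"',
    ]
    for path, count in googlebot_404s(sample).most_common():
        print(count, path)
```

Matching on the user-agent string only shows what the client claims to be; it does not prove the request came from Google.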


<VirtualHost *:443>
    # The ServerName directive sets the request scheme, hostname and port that
    # the server uses to identify itself. This is used when creating
    # redirection URLs. In the context of virtual hosts, the ServerName
    # specifies what hostname must appear in the request's Host: header to
    # match this virtual host. For the default virtual host (this file) this
    # value is not decisive as it is used as a last resort host regardless.
    # However, you must set it for any further virtual host explicitly.
    ServerName www.kirkbymicrowave.co.uk

    ServerAdmin [hidden email]
    DocumentRoot /var/www/html/kirkbymicrowave.co.uk

    SetOutputFilter DEFLATE
    SetEnvIfNoCase Request_URI "\.(?:gif|jpe?g|png)$" no-gzip

    # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
    # error, crit, alert, emerg.
    # It is also possible to configure the loglevel for particular
    # modules, e.g.
    #LogLevel info ssl:warn

    ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-SSL.log
    CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-SSL.log combined

    SSLEngine on
    SSLCertificateKeyFile /etc/ssl/private/www_kirkbymicrowave_co_uk.key
    SSLCertificateFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.crt
    SSLCertificateChainFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.ca-bundle

    # For most configuration files from conf-available/, which are
    # enabled or disabled at a global level, it is possible to
    # include a line for only one particular virtual host. For example the
    # following line enables the CGI configuration for this host only
    # after it has been globally disabled with "a2disconf".
    #Include conf-available/serve-cgi-bin.conf

    ErrorDocument 404 /error-pages/404.html
    ErrorDocument 410 /error-pages/410.html
    ErrorDocument 500 /error-pages/500.html
    ErrorDocument 503 /error-pages/503.html
</VirtualHost>

<VirtualHost *:80>
    # Redirect www.kirkbymicrowave.co.uk on port 80 to the https site.
    ServerName www.kirkbymicrowave.co.uk
    ServerAdmin [hidden email]
    ErrorLog ${APACHE_LOG_DIR}/error-www.kirkbymicrowave.co.uk-port-80.log
    CustomLog ${APACHE_LOG_DIR}/access-www.kirkbymicrowave.co.uk-port-80.log combined
    Redirect "/" "https://www.kirkbymicrowave.co.uk/"
</VirtualHost>

<VirtualHost *:80>
    # Redirect kirkbymicrowave.co.uk on port 80 to the https site.
    ServerName kirkbymicrowave.co.uk
    ServerAdmin [hidden email]
    ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-port-80.log
    CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-port-80.log combined
    Redirect "/" "https://www.kirkbymicrowave.co.uk/"
</VirtualHost>


<VirtualHost *:443>
    # Redirect kirkbymicrowave.co.uk on port 443 to the www. site.
    ServerName kirkbymicrowave.co.uk
    SSLEngine on
    SSLCertificateKeyFile /etc/ssl/private/www_kirkbymicrowave_co_uk.key
    SSLCertificateFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.crt
    SSLCertificateChainFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.ca-bundle
    ServerAdmin [hidden email]
    ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-port-443.log
    CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-port-443.log combined
    Redirect "/" "https://www.kirkbymicrowave.co.uk/"
</VirtualHost>
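
The combined effect of the four blocks above can be modelled in a few lines. This is a simplified sketch, not a description of Apache's full matching logic: it only covers the four ServerName values shown, and ignores the fallback to the first virtual host on a port when no name matches. The names VHOSTS and resolve are hypothetical. It relies on the documented mod_alias behaviour that Redirect with a URL-path of "/" appends the remainder of the requested path to the target.

```python
# Simplified model of the four VirtualHost blocks quoted above.
# Assumption: only the four (port, ServerName) pairs are modelled.
CANONICAL = "https://www.kirkbymicrowave.co.uk"
DOCROOT = "/var/www/html/kirkbymicrowave.co.uk"

VHOSTS = {
    (443, "www.kirkbymicrowave.co.uk"): "serve",
    (80, "www.kirkbymicrowave.co.uk"): "redirect",
    (80, "kirkbymicrowave.co.uk"): "redirect",
    (443, "kirkbymicrowave.co.uk"): "redirect",
}


def resolve(port, host, path):
    """Return (action, target) for a request, per the quoted configuration."""
    action = VHOSTS.get((port, host))
    if action == "serve":
        # Served from the DocumentRoot of the www host on port 443.
        return ("serve", DOCROOT + path)
    if action == "redirect":
        # Redirect "/" keeps the rest of the path on the target URL.
        return ("redirect", CANONICAL + path)
    return ("unmodelled", None)


if __name__ == "__main__":
    for port, host in VHOSTS:
        print(port, host, resolve(port, host, "/data/Eimac/4CX10000D.pdf"))
```

Under this model every variant of the domain ends up at the same path under the kirkbymicrowave.co.uk DocumentRoot, so a request for /data/Eimac/4CX10000D.pdf returns a 404 whichever of the four hosts receives it.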

Re: Some sort of "pollution" between two Virtual Hosts on the same machine, causes Google to look on site A for files on site B

Matt Sicker
I believe you have the wrong mailing list. Take a look at
<http://httpd.apache.org/lists.html> for the proper user list for Apache
HTTP Server.

On Tue, 3 Jul 2018 at 07:24, Dr. David Kirkby <[hidden email]> wrote:

> [...]


--
Matt Sicker <[hidden email]>