Tuesday, September 6, 2016

"Disallow ia_archiver" does not remove pages from archive.org

The experiment (September 2016):

- A site has existed since 2008 and has many snapshots on archive.org.

- The FAQ says:
You can exclude your site from display in the Wayback Machine by placing a robots.txt file on your web server that is set to disallow User-Agent: ia_archiver.
- Created a robots.txt as follows:

User-agent: ia_archiver
Disallow: /

- A couple of days later, indeed, instead of the site snapshots, archive.org showed a message saying that the archive is not available because of the robots.txt instruction. Success!

- Removed the robots.txt.

My expectations:

The old pages have disappeared from the archive completely, but the current site is now being crawled and included in the archive.

The reality:

All the old pages are back!

The bottom line:

Using the robots.txt, one can instruct the Wayback Machine to stop displaying the site (which is exactly what the FAQ says!) but not to remove the pages from the archive.
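
For anyone repeating the experiment, here is a small Python sketch that checks locally what the two-line robots.txt above means for a given crawler, without waiting a couple of days. It uses the standard urllib.robotparser module. The helper name is_allowed and the example.com URL are made up for illustration, and the check only interprets the rules; it says nothing about what the Wayback Machine actually does with them.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the experiment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder, not the site from the experiment.
print(is_allowed(ROBOTS_TXT, "ia_archiver", "http://example.com/"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/"))    # True
```

Only ia_archiver is shut out; every other crawler is unaffected, because the file has no rule for it.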

Thursday, July 14, 2016

Cygwin and ANSI terminal

Problem

I see ANSI color sequences on Cygwin.

Explanation

The default Cygwin.bat runs bash in the Windows console, which does not support ANSI sequences, so you'll see the color codes instead of actual colors:

$ codecept selfupdate

[37;41m                                                                   [39;49m
[37;41m  [Symfony\Component\Console\Exception\CommandNotFoundException]   [39;49m

Solution

Edit Cygwin.bat so that it looks like this:

@echo off

C:
chdir C:\cygwin64\bin

mintty -

Now, it will run "mintty", which is a different terminal, and it will show the colors correctly.
(Be prepared for copy/paste with mouse clicks to behave differently.)
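
For reference, the "[37;41m" and "[39;49m" fragments in the output above are SGR color sequences shown without their leading ESC byte: 37 is white foreground, 41 is red background, and 39/49 reset both to the defaults. A minimal Python sketch that builds the same kind of sequence follows. The helper names sgr and error_banner are made up for illustration. Running it in the old console and then in mintty is a quick way to check the fix.

```python
ESC = "\x1b"  # the escape byte that starts every ANSI sequence


def sgr(*codes: int) -> str:
    """Build a 'Select Graphic Rendition' sequence, e.g. ESC[37;41m."""
    return f"{ESC}[{';'.join(str(code) for code in codes)}m"


def error_banner(text: str) -> str:
    """White-on-red text, then back to the default colors (hypothetical helper)."""
    return f"{sgr(37, 41)}{text}{sgr(39, 49)}"


# An ANSI-capable terminal shows a red banner; others show the raw codes.
print(error_banner("  [CommandNotFoundException]  "))
print(repr(error_banner("oops")))  # '\x1b[37;41moops\x1b[39;49m'
```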

Tuesday, May 31, 2016

A curious case of "you are not allowed" in WordPress

Register a custom taxonomy. Everything looks great, except for one "tiny" problem: any attempt to Add New results in an immediate "You are not allowed" message!

Blame Yoast, of course! :)
He suggested "is_admin, but not DOING_AJAX". So, the "register taxonomy" was hooked on "not AJAX", and Add New goes through AJAX, where the taxonomy was therefore never registered. Oooooopsi!

Re-hooked. Ta-da!

Thursday, May 19, 2016

debug_backtrace and call_user_func in PHP 7

Before PHP 7, the functions call_user_func() and call_user_func_array() appeared in debug_backtrace() as separate entries. Since version 7, they do not.

Demonstration code Gist

Friday, October 16, 2015

WordPress security: disallow author query

Friday afternoon. Looking at the access log... here are some "nice" requests. They all arrived within the same second and look very "hack-ish" to me.

54.80.2.64 - - [16/Oct/2015:18:46:25 +0000] "HEAD /?author=5 HTTP/1.1" 404 159 "-" "-"
54.80.2.64 - - [16/Oct/2015:18:46:25 +0000] "HEAD /?author=1 HTTP/1.1" 404 159 "-" "-"
54.80.2.64 - - [16/Oct/2015:18:46:25 +0000] "HEAD /?author=3 HTTP/1.1" 404 159 "-" "-"
54.80.2.64 - - [16/Oct/2015:18:46:25 +0000] "HEAD /?author=2 HTTP/1.1" 404 159 "-" "-"
54.80.2.64 - - [16/Oct/2015:18:46:25 +0000] "HEAD /?author=4 HTTP/1.1" 404 159 "-" "-"


Well, they all resulted in 404 Page not found...

...because I have this in .htaccess:

# - Do not allow author query to avoid real names exposure
RewriteCond %{QUERY_STRING} ^author=\d+
RewriteRule ^ - [R=404,L]

It's that simple. :)
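
To see which query strings that RewriteCond pattern catches, here is a rough Python sketch using the re module. It is only an approximation: Apache uses PCRE, and re.ASCII is used to keep \d limited to 0-9. The helper name is_blocked and the sample query strings are made up for illustration. %{QUERY_STRING} does not include the leading "?", so the strings below are written without it.

```python
import re

# Same pattern as in the RewriteCond above.
AUTHOR_QUERY = re.compile(r"^author=\d+", re.ASCII)


def is_blocked(query_string: str) -> bool:
    """Return True if the rule above would answer this query string with 404."""
    return AUTHOR_QUERY.search(query_string) is not None


# Hypothetical query strings, without the leading "?".
samples = ["author=5", "author=12&paged=2", "author=admin", "p=1&author=2", ""]
for qs in samples:
    print(f"{qs!r:22} blocked={is_blocked(qs)}")
```

Note that the pattern is anchored with ^, so it only fires when author is the first parameter; a request like /?p=1&author=2 would slip past it.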