Sep 01 2017

Decomissioning my Drupal blog

If you are looking at this blog post right now... my live Drupal site has finally been decommissioned.. or not .. well these pages are served statically but the content is still generated by an ancient aging Drupal 6, which is hiding somewhere in a container that I only start when I need it.

Given my current low blog volume .. and the lack of time to actually migrate all the content to something like Jekyll or Webby I took the middle road and pulled the internet facing Drupal offline. My main concern was that I want to keep a number of articles that people frequently point to in the exact same location as before. So that was my main requirement, but with no more public facing drupal I have no more worrying about the fact that it really needed updating, no more worrying about potential issues on wednesday evenings etc

My first couple of experiments were with wget / curl but I bumped into. Sending a Drupal site into retirement which pointed me to httrack which was a new tool for me ..

As documented there
httrack -O . -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0
creates a usuable tree but root page ends up in blog/blog which is not really handy.
So the quick hack for that is to go into the blog/blog subdir and regexp the hell out of all those files generated there direction one level below :)
for file in `ls`; do cat $file | sed -e "s/\.\.\//\/blog\//g" > ../$file ; done

httrack however has one annoying default in which it puts metatdata in the footer of a page it mirrors, where it comes from and when it was generated thats very usefull for some use cases, but not for mine as it means that every time I regenerate the site it actually generates slightly different content rather than identical pages. Luckily I found the -%F "" param to keep that footerstring empty

And that is what you are looking at right now ...

There are still a bunch of articles I have in draft .. so maybe now that I don't have to worry about the Drupal part of things I might blog more frequent again, or not..