Here is a memo on backing up a MediaWiki instance, say one deployed as part of a Web site mywebsite.com.
The concrete steps are as follows:
Change into the backup root directory on the local file system:
cd /Volumes/BACKUP/mywebsite.com
Back up using the backup_mediawiki.sh backup script
- Log in to the web server
- Update the VCS repository https://github.com/lumeng/MediaWiki_Backup
- Back up using MediaWiki_Backup/backup_mediawiki.sh (a manual sketch of the equivalent steps follows this list):
# assuming the web directory is ~/mywebsite.com/wiki
WIKI_PATH="mywebsite.com/wiki"
# assuming the backup subdirectory backup_YYYYMMDD created by the script is saved under path/to/backup/mywebsite.com/wiki
WIKI_BACKUP_PATH="path/to/backup/mywebsite.com/wiki"
# get to the home path before starting
cd
# Start the backup. This will create path/to/backup/mywebsite.com/wiki/backup_YYYYMMDD.
path/to/backup_mediawiki.sh -d $WIKI_BACKUP_PATH -w $WIKI_PATH
- Rsync the backup to a local hard drive:
cd /Volumes/BACKUP/mywebsite.com
Back up the whole web site user's home directory, which includes the backup files created above, using rsync:
rsync --exclude-from rsync_backup_exclusion.txt -thrivpbl user@webhost.com:/home/websiteuser rsync_backup/
- Ideally, also upload the backup to cloud storage such as Dropbox.
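For reference, the core of what such a backup script automates can also be done by hand with mysqldump, MediaWiki's dumpBackup.php maintenance script, and tar (see the MediaWiki manual pages listed in the references below). The following is only a minimal sketch of those manual steps, not the actual contents of MediaWiki_Backup/backup_mediawiki.sh; the database name, database user, and paths are placeholders to adapt:
# Minimal manual-backup sketch; database name, user, and paths are placeholders.
DATE=$(date +%Y%m%d)
WIKI_PATH="$HOME/mywebsite.com/wiki"
BACKUP_DIR="$HOME/path/to/backup/mywebsite.com/wiki/backup_$DATE"
mkdir -p "$BACKUP_DIR"
# 1. Dump the database; take the real database name and credentials from LocalSettings.php.
mysqldump --user=wikidbuser --password --default-character-set=binary wikidb | gzip > "$BACKUP_DIR/wikidb_$DATE.sql.gz"
# 2. Additionally export the page content as XML.
php "$WIKI_PATH/maintenance/dumpBackup.php" --full --quiet | gzip > "$BACKUP_DIR/pages_full_$DATE.xml.gz"
# 3. Archive the file system part (LocalSettings.php, extensions, uploaded images).
tar -czf "$BACKUP_DIR/wiki_files_$DATE.tar.gz" -C "$HOME/mywebsite.com" wiki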
HTML backup using wget for immediate reading
Optionally, one can also keep a crawled version of the MediaWiki instance, since a copy of the HTML files can be useful for immediate offline reading.
cd /Volumes/BACKUP/mywebsite.com/wget_backup
mkdir mywebsite.com-wiki__wget_backup_YYYYMMDD
cd mywebsite.com-wiki__wget_backup_YYYYMMDD
# crawl the whole Web site
# wget -k -p -r -E http://www.mywebsite.com/
# crawl the pages of the MediaWiki instance excluding the Help and Special pages
wget -k -p -r --user-agent='Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36' -R '*Special*' -R '*Help*' -E http://www.mywebsite.com/wiki/
cd ..
7z a -mx=9 mywebsite.com-wiki__wget_backup_YYYYMMDD.7z mywebsite.com-wiki__wget_backup_YYYYMMDD
Remarks:
- -k: convert links to suit local viewing
- -p: download page requisites/dependencies
- -r: download recursively
- --user-agent: set a "fake" user agent to emulate regular browsing, since some sites check the user agent; see useragentstring.com for example user agent strings.
For reference, creating the wget-crawled backup of a small MediaWiki installation with hundreds of user-created pages took about 30 minutes in an experiment I did.
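The date stamp above can be generated rather than typed by hand; here is a small convenience wrapper around the same wget and 7z commands, assuming the backup root /Volumes/BACKUP/mywebsite.com/wget_backup as above:
#!/bin/bash
# Convenience wrapper around the HTML backup steps above; the date stamp is generated automatically.
DATE=$(date +%Y%m%d)
BACKUP_ROOT="/Volumes/BACKUP/mywebsite.com/wget_backup"
SNAPSHOT="mywebsite.com-wiki__wget_backup_$DATE"
cd "$BACKUP_ROOT"
mkdir "$SNAPSHOT"
cd "$SNAPSHOT"
# Crawl the wiki, skipping Special and Help pages, with a browser-like user agent.
wget -k -p -r -E --user-agent='Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36' -R '*Special*' -R '*Help*' http://www.mywebsite.com/wiki/
cd ..
# Compress the snapshot; the uncompressed directory can be removed afterwards.
7z a -mx=9 "$SNAPSHOT.7z" "$SNAPSHOT"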
If there is only a small set of pages that you need to back up, curl may be used instead, for example:
# download multiple pages
curl -O 'http://mywebsite.com/wiki/Foo_Bar[01-10]'
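If the goal is to keep the page source rather than rendered HTML, curl can also pull wikitext or an XML export of individual pages; both action=raw and Special:Export are standard MediaWiki features, though the exact script path (/w/index.php versus /wiki/ here) depends on the instance's URL configuration, and Foo_Bar is a placeholder page title:
# fetch the raw wikitext of one page (adjust the script path to the instance's configuration)
curl -o Foo_Bar.wikitext 'http://mywebsite.com/w/index.php?title=Foo_Bar&action=raw'
# fetch an XML export of one page via Special:Export
curl -o Foo_Bar.xml 'http://mywebsite.com/wiki/Special:Export/Foo_Bar'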
References
- https://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki
- https://www.mediawiki.org/wiki/Fullsitebackup
- https://www.mediawiki.org/wiki/Manual:DumpBackup.php
- https://wikitech.wikimedia.org/wiki/Category:Dumps