<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blue Cog Blog &#187; sh</title>
	<atom:link href="http://www.bluecog.com/blog/tag/sh/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bluecog.com/blog</link>
	<description>It's just a freaking blue cog...</description>
	<lastBuildDate>Tue, 03 Aug 2010 19:32:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Pair Networks Database Backup Automation</title>
		<link>http://www.bluecog.com/blog/2009/11/10/pair-networks-database-backup/</link>
		<comments>http://www.bluecog.com/blog/2009/11/10/pair-networks-database-backup/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 22:00:47 +0000</pubDate>
		<dc:creator>Bill Melvin</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Automation]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[FTP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[sh]]></category>
		<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://www.bluecog.com/blog/?p=403</guid>
		<description><![CDATA[I have a couple WordPress blogs, this being one of them, hosted at Pair Networks. I also have another non-blog site that uses a MySQL database. I have been doing backups of the databases manually through Pair&#8217;s Account Control Center (ACC) web interface on a somewhat regular basis, but it was bugging me that I [...]]]></description>
			<content:encoded><![CDATA[<p>I have a couple WordPress blogs, this being one of them, hosted at <a href="http://www.pair.com/" target="_blank">Pair Networks</a>. I also have another non-blog site that uses a MySQL database. I have been doing backups of the databases manually through Pair&#8217;s Account Control Center (ACC) web interface on a somewhat regular basis, but it was bugging me that I hadn&#8217;t automated it. I finally got around to doing so.</p>
<p>A search led to this <a href="http://www.bradtrupp.com/mysql-backup-cron.html" target="_blank">blog post</a> by Brad Trupp. He describes how to set up an automated database backup on a Pair Networks host. I used &#8220;technique 2&#8221; from his post as the basis for the script I wrote.</p>
<h3>Automating the Backup on the Pair Networks Host</h3>
<p>First I connected to my assigned server at Pair Networks using SSH (I use <a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/" target="_blank">PuTTY</a> for that). There was already a directory named <strong>backup</strong> in my home directory where the backups done through the ACC were written. I decided to use that directory for the scripted backups as well.</p>
<p>In my home directory I created a shell script named <strong>dbbak.sh</strong>.</p>
<p><code>touch dbbak.sh</code></p>
<p>The script should have permissions set to make it private (it will contain database passwords) and executable.</p>
<p><code>chmod 700 dbbak.sh</code></p>
<p>I used the nano editor to write the script.</p>
<p><code>nano -w dbbak.sh</code></p>
<p>The script stores the current date and time (formatted as YYYYmmdd_HHMM) in a variable and then runs the mysqldump utility to create the database backups. The resulting backup files are plain SQL text that will recreate the objects in a MySQL database and insert the data. My script backs up three MySQL databases, so the following example does the same.</p>
<pre class="brush: bash">
#!/bin/sh

dt=`/bin/date +%Y%m%d_%H%M`

/usr/local/bin/mysqldump -hDBHOST1 -uDBUSERNAME1 -pDBPASSWORD1 USERNAME_DBNAME1 &gt; /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME1.sql

/usr/local/bin/mysqldump -hDBHOST2 -uDBUSERNAME2 -pDBPASSWORD2 USERNAME_DBNAME2 &gt; /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME2.sql

/usr/local/bin/mysqldump -hDBHOST3 -uDBUSERNAME3 -pDBPASSWORD3 USERNAME_DBNAME3 &gt; /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME3.sql
</pre>
<p>Substitute these tags in the above example with your database and account details:</p>
<ul>
<li><strong>DBHOST</strong><em>n</em> is the database server, such as db24.pair.com.</li>
<li><strong>DBUSERNAME</strong><em>n</em> is the full access username for the database.</li>
<li><strong>DBPASSWORD</strong><em>n</em> is the password for that database user.</li>
<li><strong>USERNAME_DBNAME</strong><em>n</em> is the full database name that has the account user name as the prefix. </li>
<li><strong>USERNAME</strong> is the Pair Networks account user name.</li>
<li><strong>DBNAME</strong><em>n</em> is the database name without the account user name prefix.</li>
</ul>
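<p>Because the retrieval script described below reads the date back out of each file name, the exact name layout matters. The following is an illustrative Python sketch, not part of the original setup, and the helper names <strong>backup_filename</strong> and <strong>date_from_filename</strong> are hypothetical. It builds a name with the same YYYYmmdd_HHMM stamp that <code>date +%Y%m%d_%H%M</code> produces and shows that the date starts right after the six-character <code>dbbak_</code> prefix. It uses <code>datetime.strptime</code> for brevity rather than the manual slicing used later in DeleteList.py.</p>
<pre class="brush: python">
from datetime import datetime

# Assumed layout, matching dbbak.sh: dbbak_ + YYYYmmdd_HHMM + _DBNAME.sql
PREFIX = "dbbak_"

def backup_filename(dbname, when):
    """Build a dump file name the same way the shell script does."""
    return "%s%s_%s.sql" % (PREFIX, when.strftime("%Y%m%d_%H%M"), dbname)

def date_from_filename(filename, date_pos=len(PREFIX)):
    """Parse the 8-character YYYYmmdd date that starts at index date_pos."""
    return datetime.strptime(filename[date_pos:date_pos + 8], "%Y%m%d")

name = backup_filename("DBNAME1", datetime(2009, 11, 10, 4, 1))
print(name)                      # dbbak_20091110_0401_DBNAME1.sql
print(date_from_filename(name))  # 2009-11-10 00:00:00
</pre>
<p>The prefix length is where a date position of 6 comes from: if the prefix in dbbak.sh ever changes, the position passed to the deletion logic has to change with it.</p>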
<p>Once the script was written and tested manually on the host, I used the ACC (Advanced Features / Manage Cron jobs) to set up a cron job to run the script daily at 4:01 AM.</p>
<h3>Automating Retrieval of the Backup Files</h3>
<p>It was nice having the backups run daily without any further work on my part, but if I wanted a local copy of the backups, I still had to download them manually. Though <a href="http://filezilla-project.org/" target="_blank">FileZilla</a> is easy to use, downloading files via FTP seemed like a prime candidate for automation as well. I turned to Python for that. Actually I turned to an excellent book that has been on my shelf for a few years now, <a href="http://www.amazon.com/gp/product/1590593715?ie=UTF8&#038;tag=bluecog-20&#038;linkCode=as2&#038;camp=1789&#038;creative=9325&#038;creativeASIN=1590593715">Foundations of Python Network Programming</a> by John Goerzen. Using the <strong>ftplib</strong> examples in the book as a foundation, I created a Python script named <strong>getdbbak.py</strong> to download the backup files automatically.</p>
<pre class="brush: python">
#!/usr/bin/env python
# getdbbak.py

from ftplib import FTP
from datetime import datetime
from DeleteList import GetDeleteList
import os, sys
import getdbbak_email

logfilename = &#039;getdbbak-log.txt&#039;
msglist = []

def writelog(msg):
    scriptdir = os.path.dirname(sys.argv[0])
    filename = os.path.join(scriptdir, logfilename)
    logfile = open(filename, &#039;a&#039;)
    logfile.write(&quot;%s\n&quot; % msg)
    logfile.close()

def say(what):
    print(what)
    msglist.append(what)
    writelog(what)

def retrieve_db_backups():
    host = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    local_backup_dir = sys.argv[4]

    say(&quot;START %s&quot; % datetime.now().strftime(&#039;%Y-%m-%d %H:%M&#039;))
    say(&quot;Connect to %s as %s&quot; % (host, username))

    f = FTP(host)
    f.login(username, password)

    ls = f.nlst(&quot;dbbak_*.sql&quot;)
    ls.sort()
    say(&quot;items = %d&quot; % len(ls))
    for filename in ls:
        local_filename = os.path.join(local_backup_dir, filename)
        if os.path.exists(local_filename):
            say(&quot;(skip) %s&quot; % local_filename)
        else:
            say(&quot;(RETR) %s&quot; % local_filename)
            local_file = open(local_filename, &#039;wb&#039;)
            f.retrbinary(&quot;RETR %s&quot; % filename, local_file.write)
            local_file.close()

    date_pos = 6
    keep_days = 5
    keep_weeks = 6
    keep_months = 4
    del_list = GetDeleteList(ls, date_pos, keep_days, keep_weeks, keep_months)
    if len(del_list) &gt; 0:
        if len(ls) - len(del_list) &gt;= keep_days:
            for del_filename in del_list:
                say(&quot;DELETE %s&quot; % del_filename)
                f.delete(del_filename)
        else:
            say(&quot;WARNING: GetDeleteList failed sanity check. No files deleted.&quot;)

    f.quit()
    say(&quot;FINISH %s&quot; % datetime.now().strftime(&#039;%Y-%m-%d %H:%M&#039;))
    getdbbak_email.SendLogMessage(msglist)

if len(sys.argv) == 5:
    retrieve_db_backups()
else:
    print(&#039;USAGE: getdbbak.py Host User Password LocalBackupDirectory&#039;)
</pre>
<p>This script runs via cron on a PC running Ubuntu 8.04 LTS that I use as a local file/subversion/trac server. The script does a bit more than just download the files. It deletes older files from the host based on rules for the number of days, weeks, and months of backups to keep, and as a sanity check it skips the deletion entirely if fewer than <strong>keep_days</strong> files would remain. It also writes messages to a log file and sends an email with the current session&#8217;s log entries.</p>
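<p>One weakness of the skip-if-present check: if a transfer dies partway, the partial file stays in the local directory and every later run skips it. A possible refinement, sketched here and not part of the original script, downloads to a temporary file and renames it into place only after <code>retrbinary</code> returns. It assumes Python 3.3 or later (for <code>os.replace</code>); the function name <strong>download_if_missing</strong> is hypothetical, and <code>ftp</code> is an already logged-in <code>ftplib.FTP</code> object such as <code>f</code> in the script.</p>
<pre class="brush: python">
import os
import tempfile

def download_if_missing(ftp, remote_name, local_dir):
    """Fetch remote_name into local_dir unless a local copy already exists.

    Data is written to a temporary file in the same directory and renamed
    only after the transfer completes, so an interrupted download does not
    leave a truncated file that later runs would skip.
    Returns True if the file was downloaded, False if it was skipped.
    """
    local_path = os.path.join(local_dir, remote_name)
    if os.path.exists(local_path):
        return False
    fd, tmp_path = tempfile.mkstemp(suffix=".part", dir=local_dir)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            ftp.retrbinary("RETR " + remote_name, tmp_file.write)
        os.replace(tmp_path, local_path)  # rename into place (Python 3.3+)
    except BaseException:
        os.remove(tmp_path)  # discard the partial download, then re-raise
        raise
    return True
</pre>
<p>In the download loop, the open/retrbinary/close lines in the <code>else</code> branch could then become a single call such as <code>download_if_missing(f, filename, local_backup_dir)</code>.</p>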
<p>To set up the cron job in Ubuntu I opened a terminal and ran the following command to edit the crontab file:</p>
<p><code>crontab -e</code></p>
<p>The crontab file specifies commands to run automatically at scheduled times. I added an entry to the crontab file that runs a script named <strong>getdbbak.sh</strong> at 6 AM every day. Here is the crontab file:</p>
<pre class="brush: bash">
MAILTO=&quot;&quot; 

# m h dom mon dow command 

0 6 * * * /home/bill/GetDbBak/getdbbak.sh
</pre>
<p>The first line prevents cron from sending an email listing the output of any commands cron runs. The getdbbak.py script will send its own email so I don&#8217;t need one from cron. I can always enable the cron email later if I want to see that output to debug a failure in a script cron runs.</p>
<p>Here is the getdbbak.sh shell script that is executed by cron:</p>
<pre class="brush: bash">
#!/bin/bash 

/home/bill/GetDbBak/getdbbak.py FTP.EXAMPLE.COM USERNAME PASSWORD /mnt/data2/files/Backup/PairNetworksDb
</pre>
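<p>This shell script runs the getdbbak.py Python script and passes the FTP login credentials and the destination directory for the backup files as command line arguments. Because cron runs getdbbak.sh directly, and getdbbak.sh in turn runs getdbbak.py directly, both files need execute permission. Since getdbbak.sh contains the FTP password, it makes sense to keep it private with <code>chmod 700</code>, as with dbbak.sh.</p>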
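<p>The argument-count check at the bottom of getdbbak.py works fine. As an alternative, the standard <code>argparse</code> module (Python 2.7 and 3.2 or later) names each argument and generates a <code>--help</code> message and clear error text. This is a sketch, not the original code; the helper <strong>parse_args</strong> and the attribute names are hypothetical.</p>
<pre class="brush: python">
import argparse

def parse_args(argv=None):
    """Parse the same four positional arguments that getdbbak.py expects.

    argv=None makes argparse read sys.argv[1:]; passing a list is handy
    for testing.
    """
    parser = argparse.ArgumentParser(
        prog="getdbbak.py",
        description="Download MySQL dump files from an FTP host.")
    parser.add_argument("host", help="FTP host name")
    parser.add_argument("user", help="FTP user name")
    parser.add_argument("password", help="FTP password")
    parser.add_argument("local_backup_dir",
                        help="directory that receives the downloaded files")
    return parser.parse_args(argv)
</pre>
<p>With this in place, retrieve_db_backups could read <code>args.host</code>, <code>args.user</code>, and so on instead of indexing <code>sys.argv</code>; a missing argument makes argparse print a usage message and exit with status 2.</p>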
<p>As I mentioned, the getdbbak.py script deletes older files from the host based on rules. The call to <strong>GetDeleteList</strong> returns a list of files to delete from the host. That function is implemented in a separate module, <strong>DeleteList.py</strong>:</p>
<pre class="brush: python">
#!/usr/bin/env python
# DeleteList.py

from datetime import datetime
import KeepDateList

def GetDateFromFileName(filename, datePos):
    &quot;&quot;&quot;Expects filename to contain a date in the format YYYYMMDD starting
       at position datePos.
    &quot;&quot;&quot;
    try:
        yr = int(filename[datePos : datePos + 4])
        mo = int(filename[datePos + 4 : datePos + 6])
        dy = int(filename[datePos + 6 : datePos + 8])
        dt = datetime(yr, mo, dy)
        return dt
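    except ValueError:
        # int() raises ValueError for non-numeric text; datetime() for impossible dates.
        return None

def GetDeleteList(fileList, datePos, keepDays, keepWeeks, keepMonths):
    &quot;&quot;&quot;Returns the names in fileList whose embedded date is not one of
       the dates GetDatesToKeep selects. Names without a parseable date
       are never deleted.
    &quot;&quot;&quot;
    dates = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if dt is not None:
            dates.append(dt)
    keep_dates = KeepDateList.GetDatesToKeep(dates, keepDays, keepWeeks, keepMonths)
    del_list = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if (dt is not None) and (dt not in keep_dates):
            del_list.append(filename)
    return del_list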
</pre>
<p>That module in turn uses the function <strong>GetDatesToKeep</strong>, defined in the module <strong>KeepDateList.py</strong>, to decide which dates to keep in order to maintain the desired days, weeks, and months of backup history. If a file&#8217;s name contains a date that&#8217;s not in the list of dates to keep, then the file goes in the list of files to delete.</p>
<pre class="brush: python">
#!/usr/bin/env python
# KeepDateList.py

from datetime import datetime

def ListHasOnlyDates(listOfDates):
    for item in listOfDates:
        if not isinstance(item, datetime):
            return False
    return True

def GetUniqueSortedDateList(listOfDates):
    if len(listOfDates) &lt; 2:
        return listOfDates
    listOfDates.sort()
    result = [listOfDates[0]]
    last_date = listOfDates[0].date()
    for i in range(1, len(listOfDates)):
        if listOfDates[i].date() != last_date:
            last_date = listOfDates[i].date()
            result.append(listOfDates[i])
    return result

def GetDatesToKeep(listOfDates, daysToKeep, weeksToKeep, monthsToKeep):
    if daysToKeep &lt; 1:
        raise ValueError(&quot;daysToKeep must be greater than zero.&quot;)
    if weeksToKeep &lt; 0:
        raise ValueError(&quot;weeksToKeep must not be less than zero.&quot;)
    if monthsToKeep &lt; 0:
        raise ValueError(&quot;monthsToKeep must not be less than zero.&quot;)

    if not ListHasOnlyDates(listOfDates):
        raise TypeError(&quot;List must only contain items of type &#039;datetime.datetime&#039;.&quot;)

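    dates = GetUniqueSortedDateList(listOfDates)
    if not dates:
        # No dated files at all: nothing to keep, and dates[tail] below
        # would otherwise raise IndexError.
        return []

    tail = len(dates) - 1
    keep = [dates[tail]]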
    days_left = daysToKeep - 1
    while (days_left &gt; 0) and (tail &gt; 0):
        tail -= 1
        days_left -= 1
        keep.append(dates[tail])

    year, week_number, weekday = dates[tail].isocalendar()
    weeks_left = weeksToKeep
    while (weeks_left &gt; 0) and (tail &gt; 0):
        tail -= 1
        yr, wn, wd = dates[tail].isocalendar()
        if (wn != week_number) or (yr != year):
            weeks_left -= 1
            year, week_number, weekday = dates[tail].isocalendar()
            keep.append(dates[tail])

    month = dates[tail].month
    year = dates[tail].year
    months_left = monthsToKeep
    while (months_left &gt; 0) and (tail &gt; 0):
        tail -= 1
        if (dates[tail].month != month) or (dates[tail].year != year):
            months_left -= 1
            month = dates[tail].month
            year = dates[tail].year
            keep.append(dates[tail])

    return keep
</pre>
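<p>The weekly pass compares the ISO year as well as the ISO week number returned by <code>isocalendar()</code>. ISO week 1 is the week containing the year&#8217;s first Thursday, so a few days near New Year can belong to a week of the neighboring ISO year. The sketch below, with hypothetical helper names, prints the (ISO year, week) key for dates around New Year 2010 and keeps the latest date in each week. That is a simplified form of what the weekly loop does as it walks backward through the sorted dates, leaving out the day and month passes.</p>
<pre class="brush: python">
from datetime import date

def iso_week_key(d):
    """Return (ISO year, ISO week number), the pair the weekly loop compares."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

def latest_per_week(dates):
    """Keep only the latest date in each ISO week (simplified illustration)."""
    latest = {}
    for d in sorted(dates):
        latest[iso_week_key(d)] = d  # a later date in the same week replaces an earlier one
    return sorted(latest.values())

samples = [date(2009, 12, 31), date(2010, 1, 3), date(2010, 1, 4)]
for d in samples:
    print(d, iso_week_key(d))
# 2009-12-31 (2009, 53)
# 2010-01-03 (2009, 53)
# 2010-01-04 (2010, 1)
print(latest_per_week(samples))
# [datetime.date(2010, 1, 3), datetime.date(2010, 1, 4)]
</pre>
<p>Comparing only the week number would treat week 1 of 2009 and week 1 of 2010 as the same week, which is why the original code checks the year too.</p>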
<p>I also put the function <strong>SendLogMessage</strong> that sends the session log via email in a separate module, <strong>getdbbak_email.py</strong>:</p>
<pre class="brush: python">
#!/usr/bin/env python
# getdbbak_email.py

# The lowercase module names exist in Python 2.5+ and are the only ones in Python 3.
from email.mime.text import MIMEText
from email import utils
import smtplib

def SendLogMessage(msgList):
    from_addr = &#039;atest@bogusoft.com&#039;
    to_addr = &#039;wm.melvin@gmail.com&#039;
    smtp_server = &#039;localhost&#039;

    message = &quot;&quot;
    for s in msgList:
        message += s + &quot;\n&quot;

    msg = MIMEText(message)
    msg[&#039;To&#039;] = to_addr
    msg[&#039;From&#039;] = from_addr
    msg[&#039;Subject&#039;] = &#039;Download results&#039;
    msg[&#039;Date&#039;] = utils.formatdate(localtime = True)
    msg[&#039;Message-ID&#039;] = utils.make_msgid()

    smtp = smtplib.SMTP(smtp_server)
    smtp.sendmail(from_addr, [to_addr], msg.as_string())
    smtp.quit()  # close the SMTP session cleanly
</pre>
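<p>On Python 3.6 or later, the same message can also be built with the <code>email.message.EmailMessage</code> class and sent with <code>SMTP.send_message</code>. This is a sketch rather than the original module: the function names and parameters are hypothetical, and, like the original, it assumes an SMTP server accepting mail on localhost. Building the message in a separate function makes it possible to check the message without sending anything.</p>
<pre class="brush: python">
from email.message import EmailMessage
from email import utils
import smtplib

def build_log_message(msg_list, from_addr, to_addr):
    """Build the session-log email using the EmailMessage API."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = "Download results"
    msg["Date"] = utils.formatdate(localtime=True)
    msg["Message-ID"] = utils.make_msgid()
    msg.set_content("\n".join(msg_list))  # one log entry per line
    return msg

def send_log_message(msg_list, from_addr, to_addr, smtp_server="localhost"):
    """Send the log; the with block calls quit() on the connection."""
    msg = build_log_message(msg_list, from_addr, to_addr)
    with smtplib.SMTP(smtp_server) as smtp:
        smtp.send_message(msg)
</pre>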
<p>Here is a ZIP file containing the set of Python scripts, including some unit tests (such as they are) for the file deletion logic: <a href="http://www.bogusoft.com/files/public/GetDbBak.zip">GetDbBak.zip</a></p>
<p>I hope this may be useful to others with a similar desire to automate MySQL database backups and FTP transfers who haven&#8217;t come up with their own solution yet. Even if you don&#8217;t use Pair Networks as your hosting provider some of the techniques may still apply. I&#8217;m still learning too so if you find mistakes or come up with improvements to this solution, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bluecog.com/blog/2009/11/10/pair-networks-database-backup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
