Blue Cog Blog Rotating Header Image

bash

COHPy Meeting – October 2010

Here are some links from last night’s meeting of the Central Ohio Python Users Group.

Austin Godber talked about virtualenv. Materials from Austin’s presentation are on GitHub.

Eric Floehr, of Intellovations, presented Building a Small Business/Personal Website With Django. He discussed some Pythonic choices for building web sites such as Blogofile for generating sites that are static content, and Plone for enterprise-scale content management. Django falls somewhere in the middle as a good choice for small business or personal blogging sites.

Other links from Eric’s talk:

Also (FWIW), here’s a bit of .bash_history from my following along with part of Eric’s presentation on a VM running Ubuntu 10.10:

sudo apt-get install python-virtualenv python-pip
mkdir dev
cd dev
mkdir oct
cd oct
virtualenv --no-site-packages pyenv
source pyenv/bin/activate
sudo apt-get install mercurial
pip install -e hg+http://bitbucket.org/stephenmcd/mezzanine#egg=mezzanine
mezzanine-project sample
cd sample
python manage.py syncdb
python manage.py runserver
pip install django-debug-toolbar
python manage.py runserver
pip install django-extensions
python manage.py graph_models blog>blog.dot
sudo apt-get install graphviz
dotty blog.dot

I’m not presenting this as a how-to or a tutorial, just some notes. If you don’t know what the above commands will do then I’d recommend not running them.

Tweaking the Bash Prompt

A little Saturday morning tweaking.

Based on this post at railstips.org, I decided to adjust my Bash prompt by appending the following to my ~/.bashrc file:

#...

function parse_git_branch {
  ref=$(git symbolic-ref HEAD 2> /dev/null) || return
  echo "("${ref#refs/heads/}")"
}

BLACK="\[\033[0;30m\]"
BLUE="\[\033[0;34m\]"
VIOLET="\[\033[1;35m\]"
CYAN="\[\033[0;36m\]"

PS1="\n[$CYAN\u@\h:$BLUE\w$VIOLET \$(parse_git_branch)$BLACK]\n\$ "

The prompt will now show the name of the branch I am working in when the current directory is part of a Git repository. The original code used yellow, red, and green to highlight parts of the prompt. That messed with my mind when I ran RSpec and saw yellow and red when I was expecting all green. Rather than get used to it, I changed the colors. I also added some newlines to perhaps keep the command line neater when deep in a directory tree.

Terminal screen shot

 

[Update 2010-07-23]

After running with the above settings for a while I decided I don’t care for the colors in the prompt. Don’t need the square brackets either. I do like seeing the current git branch. That simplifies things a bit.

#...

function parse_git_branch {
  ref=$(git symbolic-ref HEAD 2> /dev/null) || return
  echo "("${ref#refs/heads/}")"
}

PS1="\n\u@\h:\w  \$(parse_git_branch)\n\$ "

 

[Update 2010-09-25]

Okay, maybe a little color…

#...

function parse_git_branch {
  ref=$(git symbolic-ref HEAD 2> /dev/null) || return
  echo "("${ref#refs/heads/}")"
}

VIOLET="\[\033[1;35m\]"
NO_COLOR="\[\033[0;0m\]"  

PS1="\n$VIOLET\u@\h:\w  \$(parse_git_branch)$NO_COLOR\n\$ "

See. I told you it was "tweaking."

Pair Networks Database Backup Automation

I have a couple WordPress blogs, this being one of them, hosted at Pair Networks. I also have another non-blog site that uses a MySQL database. I have been doing backups of the databases manually through Pair’s Account Control Center (ACC) web interface on a somewhat regular basis, but it was bugging me that I hadn’t automated it. I finally got around to doing so.

A search led to this blog post by Brad Trupp. He describes how to set up an automated database backup on a Pair Networks host. I used “technique 2” from his post as the basis for the script I wrote.

Automating the Backup on the Pair Networks Host

First I connected to my assigned server at Pair Networks using SSH (I use PuTTY for that). There was already a directory named backup in my home directory where the backups done through the ACC were written. I decided to use that directory for the scripted backups as well.

In my home directory I created a shell script named dbbak.sh.

touch dbbak.sh

The script should have permissions set to make it private (it will contain database passwords) and executable.

chmod 700 dbbak.sh

I used the nano editor to write the script.

nano -w dbbak.sh

The script stores the current date and time (formatted as YYYYmmdd_HHMM) in a variable and then runs the mysqldump utility that creates the database backups. The resulting backup files are simply SQL text that will recreate the objects in a MySQL database and insert the data. The shell script I use backs up three different MySQL databases so the following example shows the same.

#!/bin/sh

dt=`/bin/date +%Y%m%d_%H%M`

/usr/local/bin/mysqldump -hDBHOST1 -uDBUSERNAME1 -pDBPASSWORD1 USERNAME_DBNAME1 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME1.sql

/usr/local/bin/mysqldump -hDBHOST2 -uDBUSERNAME2 -pDBPASSWORD2 USERNAME_DBNAME2 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME2.sql

/usr/local/bin/mysqldump -hDBHOST3 -uDBUSERNAME3 -pDBPASSWORD3 USERNAME_DBNAME3 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME3.sql

Substitute these tags in the above example with your database and account details:

  • DBHOST is the database server, such as db24.pair.com.
  • DBUSERNAMEn is the full access username for the database.
  • DBPASSWORDn is the password for that database user.
  • USERNAME_DBNAMEn is the full database name that has the account user name as the prefix.
  • USERNAME is the Pair Networks account user name.
  • DBNAMEn is the database name without the account user name prefix.

Once the script was written and tested manually on the host, I used the ACC (Advanced Features / Manage Cron jobs) to set up a cron job to run the script daily at 4:01 AM.

Automating Retrieval of the Backup Files

It was nice having the backups running daily without any further work on my part but, if I wanted a local copy of the backups, I still had to download them manually. Though FileZilla is easy to use, downloading files via FTP seemed like a prime candidate for automation as well. I turned to Python for that. Actually I turned to an excellent book that has been on my shelf for a few years now, Foundations of Python Network Programming by John Goerzen. Using the ftplib examples in the book as a foundation, I created a Python script named getdbbak.py to download the backup files automatically.

#!/usr/bin/env python
# getdbbak.py

from ftplib import FTP
from datetime import datetime
from DeleteList import GetDeleteList
import os, sys
import getdbbak_email

logfilename = 'getdbbak-log.txt'
msglist = []

def writelog(msg):
    scriptdir = os.path.dirname(sys.argv[0])
    filename = os.path.join(scriptdir, logfilename)
    logfile = open(filename, 'a')
    logfile.writelines("%s\n" % msg)
    logfile.close()

def say(what):
    print what
    msglist.append(what)
    writelog(what)

def retrieve_db_backups():
    host = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    local_backup_dir = sys.argv[4]
    
    say("START %s" % datetime.now().strftime('%Y-%m-%d %H:%M'))
    say("Connect to %s as %s" % (host, username))

    f = FTP(host)
    f.login(username, password)

    ls = f.nlst("dbbak_*.sql")
    ls.sort()
    say("items = %d" % len(ls))
    for filename in ls:
        local_filename = os.path.join(local_backup_dir, filename)
        if os.path.exists(local_filename):
            say("(skip) %s" % local_filename)
        else:
            say("(RETR) %s" % local_filename)
            local_file = open(local_filename, 'wb')
            f.retrbinary("RETR %s" % filename, local_file.write)
            local_file.close()
            
    date_pos = 6
    keep_days = 5
    keep_weeks = 6
    keep_months = 4    
    del_list = GetDeleteList(ls, date_pos, keep_days, keep_weeks, keep_months)
    if len(del_list) > 0:
        if len(ls) - len(del_list) >= keep_days:
            for del_filename in del_list:
                say("DELETE %s" % del_filename)
                f.delete(del_filename)
        else:
            say("WARNING: GetDeleteList failed sanity check. No files deleted.")
    
    f.quit()
    say("FINISH %s" % datetime.now().strftime('%Y-%m-%d %H:%M'))
    getdbbak_email.SendLogMessage(msglist)


if len(sys.argv) == 5:
    retrieve_db_backups()
else:
    print 'USAGE: getdbbak.py Host User Password LocalBackupDirectory'

This script runs via cron on a PC running Ubuntu 8.04 LTS that I use as a local file/subversion/trac server. The script does a bit more than just download the files. It deletes older files from the host based on rules for number of days, weeks, and months to keep. It also writes some messages to a log file and sends an email with the current session’s log entries.

To set up the cron job in Ubuntu I opened a terminal and ran the following command to edit the crontab file:

crontab -e

The crontab file specifies commands to run automatically at scheduled times. I added an entry to the crontab file that runs a script named getdbbak.sh at 6 AM every day. Here is the crontab file:

 
MAILTO="" 

# m h dom mon dow command 

0 6 * * * /home/bill/GetDbBak/getdbbak.sh 

The first line prevents cron from sending an email listing the output of any commands cron runs. The getdbbak.py script will send its own email so I don’t need one from cron. I can always enable the cron email later if I want to see that output to debug a failure in a script cron runs.

Here is the getdbbak.sh shell script that is executed by cron:

 
#!/bin/bash 

/home/bill/GetDbBak/getdbbak.py FTP.EXAMPLE.COM USERNAME PASSWORD /mnt/data2/files/Backup/PairNetworksDb 

This shell script runs the getdbbak.py Python script and passes the FTP login credentials and the destination directory for the backup files as command line arguments.

As I mentioned, the getdbbak.py script deletes older files from the host based on rules. The call to GetDeleteList returns a list of files to delete from the host. That function is implemented in a separate module, DeleteList.py:

#!/usr/bin/env python
# DeleteList.py

from datetime import datetime
import KeepDateList


def GetDateFromFileName(filename, datePos):
    """Expects filename to contain a date in the format YYYYMMDD starting 
       at position datePos.
    """   
    try:
        yr = int(filename[datePos : datePos + 4])
        mo = int(filename[datePos + 4 : datePos + 6])
        dy = int(filename[datePos + 6 : datePos + 8])
        dt = datetime(yr, mo, dy)
        return dt
    except:
        return None
 

def GetDeleteList(fileList, datePos, keepDays, keepWeeks, keepMonths):
    dates = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if dt != None:
            dates.append(dt)
    keep_dates = KeepDateList.GetDatesToKeep(dates, keepDays, keepWeeks, keepMonths)        
    del_list = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if (dt != None) and (not dt in keep_dates):
                del_list.append(filename)    
    return del_list

That module in turn uses the function GetDatesToKeep defined in the module KeepDateList.py to decide which files to keep on order to maintain the desired days, weeks, and months of backup history. If a file’s name contains a date that’s not in the list of dates to keep then it goes in the list of files to delete.

#!/usr/bin/env python
# KeepDateList.py

from datetime import datetime


def ListHasOnlyDates(listOfDates):
    dt_type = type(datetime(2009, 11, 10))
    for item in listOfDates:
        if type(item) != dt_type:
            return False
    return True
    

def GetUniqueSortedDateList(listOfDates):
    if len(listOfDates) < 2:
        return listOfDates
    listOfDates.sort()
    result = &#91;listOfDates&#91;0&#93;&#93;
    last_date = listOfDates&#91;0&#93;.date()
    for i in range(1, len(listOfDates)):
        if listOfDates&#91;i&#93;.date() != last_date:
            last_date = listOfDates&#91;i&#93;.date()
            result.append(listOfDates&#91;i&#93;)
    return result
    
    
def GetDatesToKeep(listOfDates, daysToKeep, weeksToKeep, monthsToKeep):
    if daysToKeep < 1:
        raise ValueError("daysToKeep must be greater than zero.")
    if weeksToKeep < 0:
        raise ValueError("weeksToKeep must not be less than zero.")
    if monthsToKeep < 0:
        raise ValueError("monthsToKeep must not be less than zero.")

    if not ListHasOnlyDates(listOfDates):
        raise TypeError("List must only contain items of type 'datetime.datetime'.")
        
    dates = GetUniqueSortedDateList(listOfDates)    
    
    tail = len(dates) - 1
    keep = &#91;dates&#91;tail&#93;&#93;
    days_left = daysToKeep - 1
    while (days_left > 0) and (tail > 0):
        tail -= 1
        days_left -= 1
        keep.append(dates[tail])
        
    year, week_number, weekday = dates[tail].isocalendar()
    weeks_left = weeksToKeep
    while (weeks_left > 0) and (tail > 0):
        tail -= 1
        yr, wn, wd = dates[tail].isocalendar()
        if (wn <> week_number) or (yr <> year):
            weeks_left -= 1
            year, week_number, weekday = dates[tail].isocalendar()
            keep.append(dates[tail])
        
    month = dates[tail].month
    year = dates[tail].year
    months_left = monthsToKeep
    while (months_left > 0) and (tail > 0):
        tail -= 1
        if (dates[tail].month <> month) or (dates[tail].year <> year):
            months_left -= 1
            month = dates[tail].month
            year = dates[tail].year
            keep.append(dates[tail])
        
    return keep

I also put the function SendLogMessage that sends the session log via email in a separate module, getdbbak_email.py:

#!/usr/bin/env python
# getdbbak_email.py

from email.MIMEText import MIMEText
from email import Utils
import smtplib

def SendLogMessage(msgList):
    from_addr = 'atest@bogusoft.com'
    to_addr = 'wm.melvin@gmail.com'
    smtp_server = 'localhost'
    
    message = ""
    for s in msgList:
        message += s + "\n"

    msg = MIMEText(message)
    msg['To'] = to_addr 
    msg['From'] = from_addr 
    msg['Subject'] = 'Download results'
    msg['Date'] = Utils.formatdate(localtime = 1)
    msg['Message-ID'] = Utils.make_msgid()

    smtp = smtplib.SMTP(smtp_server)
    smtp.sendmail(from_addr, to_addr, msg.as_string())

Here is a ZIP file containing the set of Python scripts, including some unit tests (such as they are) for the file deletion logic: GetDbBak.zip

I hope this may be useful to others with a similar desire to automate MySQL database backups and FTP transfers who haven’t come up with their own solution yet. Even if you don’t use Pair Networks as your hosting provider some of the techniques may still apply. I’m still learning too so if you find mistakes or come up with improvements to this solution, please let me know.

How I Split Podcast Files

Update 2011-01-18: The Sansa m250 player finally died, and I now have a newer mp3 player that fast-forwards nicely, so I no longer do this goofy podcast splitting stuff.

Note: This is a “How-I” (works for me) not a “How-to” (do as I say) post.

I do goofy stuff sometimes. For example, I use Linux to download a couple podcasts targeted to Microsoft Windows developers. Specifically, I use µTorrent (that’s the “Micro” symbol so the name is pronounced “MicroTorrent”), a Windows BitTorrent client, running in Wine on Ubuntu to download the .NetRocks and Hanselminutes podcasts. I’ve had no problems running µTorrent in Wine. I got started doing this because my mp3 player was awkward to work with in Windows XP.

When I connected my Sansa m250 mp3 player to a Windows XP box, the driver software XP loaded wanted me to interact with the mp3 player as a media device. It has been a while, and I can’t recall exactly what it did, but I do recall it wanted me to use a media library application (one that would probably try to enforce DRM restrictions) and did not give me direct access to the file system on the player. There is probably a way around that, but I didn’t find it quickly at the time. What I did find was that when I connected the mp3 player to my old PC running Ubuntu it detected it and mounted it as a file system device that I could happily copy mp3 files to as I pleased. Good enough for me.

At first I was using the Azureus BitTorrent client, which is a Java app and runs on Ubuntu, to download the podcasts (and an occasional distro to play with). That application seemed to get more bloated with each release. It started displaying a bunch of flashy stuff and promoting things that you probably shouldn’t be downloading (but it’s okay if you don’t believe in copyright). I read about µTorrent and tried it on a Windows XP PC. It’s a lightweight program that does BitTorrent well without promoting piracy (personally, I do think copyright, with limits, is a good thing). While this worked well for downloading, I didn’t like the extra step of copying files from the PC running Windows to the other running Ubuntu to load them onto my mp3 player. After reading a timely article about Wine (the source of the article escapes me now), I decided to try running µTorrent using Wine. I don’t recall having any problems setting it up, it just worked. I did have to fiddle with my router to set up port forwarding but that’s not related to Wine or Ubuntu, just something you may have to do for BitTorrent to work.

This method of downloading the podcasts works well, but that’s not the end of the story. Occasionally I would be part way through a podcast and, for some reason (maybe I was trying to rewind a little bit within the file but my finger slipped and it went back to the beginning of the file), I would have to fast-forward to where I left off. Hour-long podcasts in a single mp3 file are not easy to fast forward with the Sansa player I have. It doesn’t forward faster the longer you hold the button like some devices do, it just goes at the same (painfully slow for a large file) pace. It seemed like splitting the mp3 files into sections would make that sort of thing easier. Bet there’s an app for that.

A search of the Ubuntu application repository turned up mp3splt. It has a GUI but I only wanted the command line executable which is available in the repository and can be installed from the command line (note that there’s no “i” in mp3splt):

sudo apt-get install mp3splt

After a couple trips to the man page to sort out which command line arguments to use, I had it splitting big mp3 files into sections in smaller mp3 files. That worked for splitting the files but I found that the player didn’t put those files in order when playing back. That’s not acceptable. I probably could just make a playlist file and use that to get the sections to play in order. I wondered if setting the ID3 tags in a way that numbered the tracks would make the player play them in order. Turns out it would. A search for “ID3” in the Ubuntu repository led to id3tool, a simple utility for editing the ID3 tags in mp3 files. I installed it too:

sudo apt-get install id3tool

I wrote a shell script named podsplit.sh to put this splitting apart all together. I use a specific directory to hold the mp3 files I want to split (but I’ll call it a “folder” since that’s the GUI metaphor, and I use the GNOME GUI to move the files around). I manually copy the downloaded mp3 files into the 2Split folder and then open a terminal and run the script. The script creates a sub-folder for each mp3 file that is split. When the script is finished I copy the sub-folders containing the resulting smaller mp3 files to the Sansa mp3 player.

Here’s the shell script:

#!/bin/bash

#------------------------------------------------------------
# podsplit.sh
#
# by Bill Melvin (bogusoft.com)
#
# BASH script for splitting mp3 podcasts into smaller pieces.
# I want to do this because it takes "forever" to fast-
# forward or rewind in a huge mp3 on my Sansa player.
#
# This script requires mp3splt and id3tool.
#
# This script, being a personal-use one-off utility, also 
# assumes some things:
# 1. mp3 files to be split are placed in ~/2Split
# 2. The file names are in the format showname_0001.mp3
#    or showname_0001_morestuff.mp3 where 0001 is the 
#    episode number.
# 
# I'm no nix wiz and I don't write many shell scripts so 
# this script also echoes a bunch of stuff so I can see 
# what's going on. 
#
#------------------------------------------------------------
# [2009-01-18] First version. 
#
# [2009-01-24] Use abbreviated show name for Artist.
#
# [2009-02-12] Changed split time from 3.0 to 5.0.   
#
# [2009-02-16] Use track number instead of end-time in track 
# title.
#
# [2009-02-19] Redirect some output to log file.
#------------------------------------------------------------

split_home=~/2Split
logfn="${split_home}/podsplit-log.txt"

ChangeID3() {
  filepath=$1
  filename=$2

  # Get track number from ID3.
  temp=`id3tool "$filepath" | grep Track: | cut -c9-`
  
  # Zero-pad to length of 3 characters.
  track=`printf "%03d" $temp`
    
  # Extract the name of the show and the episode number from 
  # the file name. This only works if the file naming follows 
  # the convention showname_0001_morestuff.mp3 where 0001 
  # is the episode number. The file name is split into fields 
  # delimited by the underscore character.
  show=`echo $filename | cut -d'_' -f1`
  episode=`echo $filename | cut -d'_' -f2`
  abbr="${show:0:6}"
  album="${abbr}_${episode}"
  title="${abbr}_${episode}_${track}"

  echo "ChangeID3"
  echo "filepath = $filepath" >> $logfn
  echo "filename = $filename" >> $logfn
  echo "show = $show" >> $logfn
  echo "abbr = $abbr" >> $logfn
  echo "episode = $episode" >> $logfn
  echo "album = $album" >> $logfn
  echo "title = $title" >> $logfn
  echo "track = $track" >> $logfn
  echo "BEFORE" >> $logfn
  id3tool "$filepath" >> $logfn
  
  id3tool --set-album="$album" --set-artist="$abbr" --set-title="$title" "$1"
  
  echo "AFTER" >> $logfn
  id3tool "$filepath" >> $logfn
}

SplitMP3() {  
  echo "SplitMP3"
  name1=$1
  echo "name1 = $name1"
  
  # Get file name and extension without directory path.
  name2=${name1#$split_home/}
  echo "name2 = $name2"
  
  # Get just the file name without the extension.
  name3=${name2%.mp3}
  echo "name3 = $name3"

  outdir=$split_home/$name3.split
  echo "Create $outdir"
  mkdir "$outdir"

  mp3splt -a -t 5.0 -d "$outdir" -o @t_@n $1

  for MP3 in $outdir/*.mp3
  do
    ChangeID3 "$MP3" "$name3"
  done   
}

for FN in $split_home/*.mp3
do
  SplitMP3 "$FN"
done

echo "Done."

This is not a flexible script as my folder for splitting files is hard-coded and it assumes a file naming convention for the mp3 files being split. If you’re an experienced shell scripter I’m sure you can do better. I still consider myself a Linux “noob” (and offer proof as well), intermediate in some areas at best. I am posting this because someone else may be trying to solve a similar problem and this can serve as an example of what worked for one person, in one situation, to work around the limitations of one particular mp3 player. Someone less goofy would probably just buy an iPod and use iTunes to handle the podcast files.