XML + SQL + Python:
Here’s a quick example showing how powerful these elements are when we put them together – we can use Python to read data from an XML file, extract data elements we’re interested in, create an SQL database and upload the various data values into the database. We can then query and return various data selects direct from Python (although still have the option to view/query the database through the SQLite web browser as well).
Here I’ve taken the recipe XML data format and saved as a file, which looks like this:
Here’s the example Python code which: (1) creates the recipe database following exactly the same schema as I used in the previous post; (2) opens and reads the XML file; (3) extracts the relevant data values such as title, summary (description) and iterates through the lines of instructions to extract the instruction steps; (4) inserts the various data values into Table: Recipe in the newly-created database; (5) iterates through all the recipe ingredients, and each time inserting the ingredient into Table: Ingr, then inserting the ingredient id value into the Table: RecIngr along with the other recipe ingredient attributes (amount, unit); (6) runs a select query using JOIN .. ON to return all the ingredients for the recipe and list them out.
# import libraries
import xml.etree.ElementTree as ET
import sqlite3
# create database file and connection
conn = sqlite3.connect('recipe1.sqlite')
curs = conn.cursor()
# drop database tables, OPTIONAL (useful while building/testing)
curs.executescript('''
DROP TABLE IF EXISTS Ingr;
DROP TABLE IF EXISTS RecIngr;
DROP TABLE IF EXISTS Recipe;
DROP TABLE IF EXISTS Type
''')
# create database tables/fields if not already created
curs.executescript('''
CREATE TABLE IF NOT EXISTS Type (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
type TEXT UNIQUE,
notes TEXT
);
CREATE TABLE IF NOT EXISTS Ingr (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
ingr TEXT UNIQUE,
notes TEXT
);
CREATE TABLE IF NOT EXISTS RecIngr (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
recipe_id INTEGER,
ingr_id INTEGER,
amnt TEXT, unit TEXT, notes TEXT
);
CREATE TABLE IF NOT EXISTS Recipe (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
title TEXT UNIQUE,
type_id INTEGER,
descr TEXT,
prt INTEGER, ckt INTEGER, serv INTEGER,
prins TEXT, ckins TEXT, notes TEXT
);
''')
# open and read XML file
fname = raw_input('Enter file name: ')
if ( len(fname) < 1 ) : fname = 'recipe.xml'
print 'Reading XML from:', fname, '\n'
root = ET.parse(fname)
# extract title and summary (description)
title = root.find('title').text
descr = root.find('summary').text
# iterate through instructions and extract steps
for instr in root.findall('instructions'):
if instr.get('type') == 'preparation' :
prins = ''; n = 1
for step in instr :
prins = prins + '\n' + str(n) + '. ' + step.text
n = n + 1
elif instr.get('type') == 'cooking' :
ckins = ''; n = 1
for step in instr :
ckins = ckins + '\n' + str(n) + '. ' + step.text
n = n + 1
# insert title, descr, prins, ckins into Table: Recipe
curs.execute('''INSERT OR IGNORE INTO Recipe (title, descr, prins, ckins)
VALUES ( ?, ?, ?, ? )''', ( title, descr, prins, ckins ) )
curs.execute('SELECT id FROM Recipe WHERE title = ? ', (title, ))
recipe_id = curs.fetchone()[0]
print '\n\nADDED Recipe ID (', title, '):', recipe_id
# iterate through ingredients; insert ingr into Table: Ingr;
# insert recipe_id, ingr_id, amnt, unit into Table: RecIngr
inglist = root.findall('ingredient')
print '\n\nNumber of ingredients:', len(inglist)
for item in inglist :
ingr = item.text
amnt = item.get('amount')
unit = item.get('unit')
curs.execute('''INSERT OR IGNORE INTO Ingr (ingr)
VALUES ( ? )''', ( ingr, ) )
curs.execute('SELECT id FROM Ingr WHERE ingr = ? ', (ingr, ))
ingr_id = curs.fetchone()[0]
curs.execute('''INSERT OR IGNORE INTO RecIngr (recipe_id, ingr_id, amnt, unit)
VALUES ( ?, ?, ?, ? )''', ( recipe_id, ingr_id, amnt, unit ) )
conn.commit()
# run a select query on resulting database (ingredients list)
print '\nINGREDIENTS:'
curs.execute('''
SELECT Recipe.title, RecIngr.amnt, RecIngr.unit, Ingr.ingr
FROM RecIngr JOIN Ingr JOIN Recipe
ON RecIngr.recipe_id = Recipe.id and RecIngr.ingr_id = Ingr.id
ORDER BY Recipe.title LIMIT 10
''')
select = curs.fetchall()
print '\n', select[0][0]
print '=' * len(select[0][0])
for item in select :
print item[1], item[2], item[3]
A few notes on the above:
- We don’t need to drop pre-existing tables at the start of the program (especially if we use CREATE TABLE IF NOT EXISTS when creating a new table), but dropping tables at the start is useful when we’re in a building and testing environment (although does make the program run a little slower);
- Setting the Logical Key field to UNIQUE ensures that a value can only be entered into it once, avoiding duplicate entries;
- You can choose to commit changes to the database at stages through the program, or once at the end. The command to use is: conn.commit();
- Depending on how the XML is formatted, i.e. how the data appears within it, will affect what code is required to extract the relevant data. This code was developed using trial and error. There are some deficiencies within it – I currently can’t easily work out how to access the prep_time, cook_time and servings attributes within the <recipe>..</recipe> tag. I’ll either need to go away and read some documentation, or – perhaps a better option – is to redesign the XML format to make for easier parsing in Python;
- Recipe instruction steps are formatted as lines within the XML file; this code translates them into one string, but prefixed with the step number and each separated by a new line.;
- The section “# insert title, descr, prins, ckins into Table: Recipe” inserts various data values into their correct fields in the Recipe table, using the same SQL syntax met saw in the previous post. (As an error-handler, and to avoid duplications, the exact command used here only inserts if the record does not already exist.) Then it fetches back the integer value of the Primary Key id field in Recipe table and stores it in variable recipe_id;
- The recipe_id value is now available to be inserted into the Foreign Key recipe_id in the RecIngr table in the following section “# iterate through ingredients; insert ingr into Table: Ingr; # insert recipe_id, ingr_id, amnt, unit into Table: RecIngr”.
The above example is a very simple one which demonstrates a few of the ideas; but currently only runs for one recipe (since that’s what I have in the source XML file). For uploading more than one recipe, an additional loop would need to be added to the program, to loop through all recipes in the file and deal with each one in turn (as in the example above).
So there’s more work still to do here. Also, the XML format needs to be looked at since that can certainly be improved on too. But, as is my way, I’ll have to put this project to aside again for the time being.
Too many projects; too little time…