coding101

D3.js and Data Visualisation

11/07/2016 By debkr

Data analysis process:
When we encountered the data analysis process earlier in the year, we saw the basic process consists of: gather; clean; analyse (including checking for accuracy); and finally, visualise/present. We’ve been doing lots of Python programming coupled with creating SQL databases to extract data from some source (web pages, files, XML or JSON files) and sort or store it in a database.

The process we’ve been using during the capstone course – and in line with the original Page/Brin search engine process – is to first collect the raw data and store it – unprocessed – into a holding database. From here we’ve gone on to clean up the data and save it in a more structured way in a new, relational database. This results in a smaller database which is quicker to search and retrieve data from. As I found when writing my own search engine application, these first two databases take a long time to retrieve the data, especially when the search engine’s reach is set widely. [Read more…] about D3.js and Data Visualisation
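The two-stage storage described above can be sketched roughly as follows. This is only an illustration, not the capstone code: the table names (RawPages, Pages), the sample pages and the title-extraction step are all assumptions made up for the example, and both databases live in memory rather than in files.

```python
import re
import sqlite3

# Two separate databases: one holding raw data, one holding cleaned data.
# Both are in-memory here so the sketch leaves no files behind.
raw_db = sqlite3.connect(":memory:")
clean_db = sqlite3.connect(":memory:")

# Hypothetical schemas, invented for this example.
raw_db.execute("CREATE TABLE RawPages (url TEXT UNIQUE, html TEXT)")
clean_db.execute(
    "CREATE TABLE Pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)"
)

# Gather: made-up pages standing in for crawled data, stored unprocessed.
sample_pages = [
    ("http://example.com/a", "<html><title> First page </title><body>...</body></html>"),
    ("http://example.com/b", "<html><body>no title here</body></html>"),
]
raw_db.executemany("INSERT INTO RawPages (url, html) VALUES (?, ?)", sample_pages)
raw_db.commit()

# Clean: keep only the page title, tidy its whitespace, skip pages without one.
for url, html in raw_db.execute("SELECT url, html FROM RawPages").fetchall():
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    if match is None:
        continue
    clean_db.execute(
        "INSERT INTO Pages (url, title) VALUES (?, ?)",
        (url, match.group(1).strip()),
    )
clean_db.commit()

# Analyse / present: later queries only need to touch the smaller, cleaned database.
cleaned = clean_db.execute("SELECT url, title FROM Pages").fetchall()
print(cleaned)
```

Keeping the raw data untouched means the cleaning step can be re-run with different rules without having to gather everything again.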

Simple Search Engine in Python

28/06/2016 By debkr

Part of the Python specialisation capstone (see Refs below) is to recreate a simple web search engine, modelled on the original Google search ranking algorithm (you can read the short version of Page and Brin’s 1998 Stanford paper here). The Google algorithm placed emphasis on information obtained from the HTML “link structure and link text” of all links found in all indexed web pages, and used this information “for making relevance judgments and quality filtering”.

Google search algorithm:
The basic premise of the algorithm is a probability measure, expressed in layman’s terms as: “how likely is it that a random surfer would alight on this particular web page if they just randomly surfed through all links on all pages on the web until they got bored and gave up”. The algorithm itself includes a measure of all incoming links to a web page (i.e. the number of “citations or backlinks” to that page), enhanced by the quality-ranking of each of those incoming citation links. In this way, the search algorithm defines an objective page rank or search ranking for each web page. [Read more…] about Simple Search Engine in Python
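To make the random-surfer idea a bit more concrete, here is a minimal sketch of a page rank calculation. It is a simplified illustration, not the capstone code or Google's actual implementation: the four-page link graph is made up, the function name page_rank is my own, and the 0.85 damping factor (the chance the surfer keeps clicking rather than giving up) and the normalisation of ranks so they sum to 1 are assumptions of this sketch.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Return a rank for each page.

    links maps each page to the list of pages it links out to.
    Every link target is assumed to also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    # Start with every page equally likely.
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # The bored surfer who gives up restarts on a random page.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page shares its rank equally among the pages it links to.
                share = damping * ranks[page] / len(outgoing)
                for target in outgoing:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links shares its rank with everyone.
                share = damping * ranks[page] / n
                for target in pages:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Made-up link graph: C has the most incoming links, D has none.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Notice that a page's rank depends not just on how many pages link to it, but on the rank of those pages and how many other links they give out.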

Coding 101 (part 15) Databases and beyond

09/06/2016 By debkr

SQL-databases-with-many-to-many-relationships

Getting even more complex – many-to-many relationships:
We’ve learnt that there are several different kinds of relationships in a database (dependent on what data we’re trying to model). When we map out our data model, we should try to capture the relationships between each of the tables in the model. In database terminology, the nature of each table-to-table relationship is referred to as the cardinality of the relationship.

Previously we looked at databases with one-to-many (and their converse, many-to-one) relationships using a Primary Key as the unique, auto-incrementing ID number in the One-Table (e.g. Recipe Type) and linking this through to the Foreign Key ID number in the Many-Table (e.g. Recipes). [Read more…] about Coding 101 (part 15) Databases and beyond
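As a reminder of what that one-to-many set-up looks like, here is a small sketch using Python's sqlite3 module. The table and column names (RecipeType, Recipe, recipe_type_id) and the sample recipes are assumptions chosen for this example rather than the exact schema used in the earlier post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing saved to disk
cur = conn.cursor()

# One-Table: each recipe type gets a unique, auto-incrementing Primary Key.
# Many-Table: each recipe stores the ID of its type as a Foreign Key.
cur.executescript("""
CREATE TABLE RecipeType (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE
);
CREATE TABLE Recipe (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    title          TEXT,
    recipe_type_id INTEGER REFERENCES RecipeType(id)
);
""")

# lastrowid gives us the Primary Key that SQLite just generated.
cur.execute("INSERT INTO RecipeType (name) VALUES (?)", ("Dessert",))
dessert_id = cur.lastrowid
cur.execute("INSERT INTO RecipeType (name) VALUES (?)", ("Soup",))
soup_id = cur.lastrowid

# Many recipes can point at the same recipe type.
cur.executemany(
    "INSERT INTO Recipe (title, recipe_type_id) VALUES (?, ?)",
    [
        ("Apple crumble", dessert_id),
        ("Lemon tart", dessert_id),
        ("Leek and potato soup", soup_id),
    ],
)
conn.commit()

# JOIN follows the Foreign Key back to the matching row in the One-Table.
rows = cur.execute("""
    SELECT Recipe.title, RecipeType.name
    FROM Recipe JOIN RecipeType ON Recipe.recipe_type_id = RecipeType.id
    ORDER BY Recipe.id
""").fetchall()
print(rows)
```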

Python + SQL: example database

05/06/2016 By debkr

XML + SQL + Python:
Here’s a quick example showing how powerful these elements are when we put them together – we can use Python to read data from an XML file, extract data elements we’re interested in, create an SQL database and upload the various data values into the database. We can then query and return various data selects direct from Python (although we still have the option to view/query the database through the SQLite web browser as well).
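In outline, the read-extract-store-query steps might look something like the sketch below. The XML here is a tiny made-up stand-in held in a string (not the recipe file from the full post), and the element names, the Recipe table and its columns are all assumptions for the sake of illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML, invented for this sketch.
xml_data = """
<recipes>
  <recipe>
    <title>Apple crumble</title>
    <serves>4</serves>
  </recipe>
  <recipe>
    <title>Leek and potato soup</title>
    <serves>6</serves>
  </recipe>
</recipes>
"""

# Read: parse the XML text into a tree of elements.
root = ET.fromstring(xml_data.strip())

# Create: an in-memory SQLite database with one simple table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Recipe (title TEXT UNIQUE, serves INTEGER)")

# Extract and upload: pull the values we care about out of each <recipe>.
for recipe in root.findall("recipe"):
    title = recipe.find("title").text
    serves = int(recipe.find("serves").text)
    # OR IGNORE skips a recipe whose title is already in the table.
    cur.execute(
        "INSERT OR IGNORE INTO Recipe (title, serves) VALUES (?, ?)",
        (title, serves),
    )
conn.commit()

# Query: run a select direct from Python.
big = cur.execute(
    "SELECT title FROM Recipe WHERE serves > ? ORDER BY title", (4,)
).fetchall()
print(big)
```

With a real file, ET.parse('filename.xml').getroot() would take the place of ET.fromstring, and passing a file name to sqlite3.connect would save the database to disk.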

Here I’ve taken the recipe XML data format and saved as a file, which looks like this: [Read more…] about Python + SQL: example database

Coding 101 (part 14) Relational databases

04/06/2016 By debkr

Starting out with databases:
This section moves on to working with SQL databases (focussing on SQLite3) as well as delving into some data gathering, analysis and visualisation in Python. Why store the data? Well, we probably want to build up data over time – maybe it’s coming from reviews of customer activity, or PR hits, or perhaps we’re scraping data from the web and the web crawler is continually replenishing its list of target URLs (and hence going off to crawl some more). Or maybe we’re getting our data from an API which restricts our access on a rate-limiting basis so we can only run x queries today, then have to wait a while before we can make more requests. [Read more…]

Coding 101 (part 13) Object-oriented programming

30/05/2016 By debkr

These postings document my responses to and learnings from the Python for Everybody specialisation on Coursera (links below) which I highly recommend to anyone wanting to learn Python programming in a well-structured and fun way. Earlier posts in the series are bookmarked here.

Data types, objects and methods:
Python has two data structures which we met before – lists and dictionaries – both of which are known in programming terms as objects. Other bits of data we’ve been using – string, integer, float, Boolean – are also objects. Objects are often described as being code + data. They consist of values (the data contained in the object, sometimes also including attributes, the equivalent of a field in a database or column-header in an Excel spreadsheet), together with methods (the code denoting the functions or procedures that can be applied to the object). Different methods may be applied to different object types, for example append is a method applicable to lists, but not to dictionaries. [Read more…] about Coding 101 (part 13) Object-oriented programming
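A few lines of Python show the idea of objects carrying their own methods. The variable names and values below are made up for illustration.

```python
shopping = ["eggs", "flour"]   # a list object
prices = {"eggs": 1.50}        # a dictionary object

# type() reports what kind of object each value is.
print(type(shopping), type(prices), type("text"), type(3), type(2.5), type(True))

# append is a method that belongs to lists...
shopping.append("butter")
print(shopping)

# ...but dictionaries do not have it.
list_has_append = hasattr(shopping, "append")
dict_has_append = hasattr(prices, "append")
print(list_has_append, dict_has_append)

# Calling a method an object does not have raises an AttributeError.
try:
    prices.append("butter")
except AttributeError as err:
    print("Dictionaries cannot append:", err)

# Strings are objects too, with their own set of methods.
shout = "hello".upper()
print(shout)
```

Running dir(shopping) or dir(prices) in the interpreter lists every method available on that object.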

Coding 101 (part 12) Extracting data with JSON

28/05/2016 By debkr

Recap:
As ever, these postings document my responses to and learnings from the much-recommended Python for Everybody specialisation on Coursera (links below) and follow on from earlier posts in the series (bookmarked here).

The previous post was an intro to the XML language used as a data exchange protocol between different applications, especially where data is being read in the form of documents which humans need to read as well as machines. XML is particularly useful where the data has a branching tree structure (parents and children) with lots of nested elements.
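As a quick reminder of that tree structure, here is a small sketch using Python's built-in xml.etree.ElementTree module. The family XML is made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up XML: a root element, parent elements and nested child elements.
xml_data = """<family>
  <parent name="Ann">
    <child>Ben</child>
    <child>Cat</child>
  </parent>
  <parent name="Dev">
    <child>Eli</child>
  </parent>
</family>"""

tree = ET.fromstring(xml_data)

# Walk the branches: for each <parent>, collect the text of its <child> elements.
children = {}
for parent in tree.findall("parent"):
    name = parent.get("name")  # read the 'name' attribute
    children[name] = [child.text for child in parent.findall("child")]

print(children)
```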

Introduction to JSON:
Now we go on to look at JSON, the JavaScript Object Notation. JSON is another data exchange protocol that works [Read more…] about Coding 101 (part 12) Extracting data with JSON
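As a small taster, here is a sketch of reading JSON with Python's built-in json module. The recipe data is made up for the example.

```python
import json

# Made-up JSON text: an object holding a string, a number and a list.
text = '{"name": "Apple crumble", "serves": 4, "ingredients": ["apples", "flour", "butter"]}'

# json.loads turns JSON text into ordinary Python dictionaries and lists.
recipe = json.loads(text)
print(recipe["name"], "serves", recipe["serves"])

first = recipe["ingredients"][0]
print("First ingredient:", first)

# json.dumps goes the other way, from Python objects back to JSON text.
round_trip = json.dumps(recipe, sort_keys=True)
print(round_trip)
```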