Data analysis process:
When we encountered the data analysis process earlier in the year, we saw that the basic process consists of: gather; clean; analyse (including checking for accuracy); and finally, visualise/present. We’ve been doing lots of Python programming, coupled with creating SQL databases, to extract data from some source (web pages and files, including XML or JSON files) and sort or store it in a database.
The process we’ve been using during the capstone course, in line with the original Page/Brin search engine process, is to first collect the raw data and store it, unprocessed, in a holding database. From there we’ve gone on to clean up the data and save it in a more structured way in a new, relational database. This results in a smaller database which is quicker to search and retrieve data from. As I found when writing my own search engine application, building these first two databases takes a long time, especially when the search engine’s reach is set widely.
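To make the two-database idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the capstone course's actual code: the table names (raw_pages, pages, words, page_words), the function names and the very crude word-splitting are all my own assumptions, chosen only to illustrate the pattern of storing raw text first and then building a cleaner relational database from it.

```python
import sqlite3


def store_raw(raw_conn, records):
    """Stage 1: save each (url, body) pair exactly as received."""
    raw_conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    raw_conn.executemany(
        "INSERT OR REPLACE INTO raw_pages (url, body) VALUES (?, ?)", records
    )
    raw_conn.commit()


def build_clean(raw_conn, clean_conn):
    """Stage 2: read the raw rows, tidy them, and store them relationally."""
    clean_conn.executescript("""
        CREATE TABLE IF NOT EXISTS pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS page_words (
            page_id INTEGER, word_id INTEGER, occurrences INTEGER,
            PRIMARY KEY (page_id, word_id));
    """)
    for url, body in raw_conn.execute("SELECT url, body FROM raw_pages"):
        clean_conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
        page_id = clean_conn.execute(
            "SELECT id FROM pages WHERE url = ?", (url,)
        ).fetchone()[0]

        # Crude cleaning (an assumption): lower-case and strip punctuation.
        counts = {}
        for token in body.lower().split():
            word = token.strip(".,;:!?()\"'")
            if word:
                counts[word] = counts.get(word, 0) + 1

        for word, n in counts.items():
            clean_conn.execute(
                "INSERT OR IGNORE INTO words (word) VALUES (?)", (word,)
            )
            word_id = clean_conn.execute(
                "SELECT id FROM words WHERE word = ?", (word,)
            ).fetchone()[0]
            clean_conn.execute(
                "INSERT OR REPLACE INTO page_words VALUES (?, ?, ?)",
                (page_id, word_id, n),
            )
    clean_conn.commit()


def pages_containing(clean_conn, word):
    """Search the cleaned database for pages that contain a word."""
    rows = clean_conn.execute(
        """SELECT pages.url FROM pages
           JOIN page_words ON page_words.page_id = pages.id
           JOIN words ON words.id = page_words.word_id
           WHERE words.word = ? ORDER BY pages.url""",
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

To try it, open two connections (for example with sqlite3.connect(":memory:")), pass some (url, text) pairs to store_raw, call build_clean, and then query with pages_containing. Keeping the raw database separate means the cleaning step can be rerun or changed later without fetching everything again.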