DebKR

To the Stars

Web Data

Data Literacy Skills

01/04/2017 By debkr

I was excited to find the following Data Literacy Skills self-assessment framework, designed for use as part of a continuing professional development programme and available via the Open University’s library services (OU login required). This URL links to the Master's-level skill set; lower levels are available on the same page, along with examples and links to further resources.

While this framework is geared towards data literacy for research students, it’s still highly relevant and maps readily onto finance- or IT-based roles. It’s a shame it appeared too late to include in my last assignment (on Data Literacy for Accounting Professionals), but it will prove useful for the projects that come out of that piece of assessed work.

The data literacy link was buried within the continuing personal and professional development section of the post-grad preparatory material (MD500, OU login required). This free module covers important study and research skills, gives advice on becoming an active learner, and includes longer-term careers resources too. (Yet more things for me to go through when I find a spare block of time!)

Filed Under: Blog, Data Analytics, Finance & Economics, Web Data

D3.js and Data Visualisation

11/07/2016 By debkr

Data analysis process:
When we encountered the data analysis process earlier in the year, we saw that the basic process consists of: gather; clean; analyse (including checking for accuracy); and finally, visualise/present. We’ve been doing lots of Python programming, coupled with creating SQL databases, to extract data from a source (web pages, files, XML or JSON files) and sort or store it in a database.
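To make those four stages concrete, here is a minimal Python sketch that runs a hard-coded JSON string through gather, clean, analyse and present functions. The sample data stands in for a web page or file, and the function names and plain-text bar chart are illustrative assumptions, not code from the course.

```python
import json
from collections import Counter

# Hypothetical sample data standing in for the "gather" source
# (a web page, a file, or an API response).
RAW_JSON = '[{"tag": " Python "}, {"tag": "sql"}, {"tag": ""}, {"tag": "python"}, {"tag": "JSON"}]'


def gather(raw: str) -> list:
    """Gather: load the raw records exactly as received."""
    return json.loads(raw)


def clean(records: list) -> list:
    """Clean: normalise case and whitespace, and drop empty values."""
    tags = (r.get("tag", "").strip().lower() for r in records)
    return [t for t in tags if t]


def analyse(tags: list) -> Counter:
    """Analyse: count how often each cleaned value occurs."""
    return Counter(tags)


def present(counts: Counter) -> str:
    """Visualise/present: a plain-text bar chart, most common first."""
    return "\n".join(f"{tag:<8}{'#' * n}" for tag, n in counts.most_common())


if __name__ == "__main__":
    print(present(analyse(clean(gather(RAW_JSON)))))
```

Keeping each stage as its own function means a stage (say, swapping the JSON string for a real download) can be changed without touching the others.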

The process we’ve been using during the capstone course – and in line with the original Page/Brin search engine process – is to first collect the raw data and store it, unprocessed, in a holding database. From here, we’ve gone on to clean up the data and save it in a more structured way in a new, relational database. This results in a smaller database that is quicker to search and retrieve data from. As I found when writing my own search engine application, populating those first two databases takes a long time, especially when the search engine’s reach is set widely.
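The two-stage idea can be sketched with Python's built-in sqlite3 module: raw pages go unprocessed into a holding database, then a second pass extracts links into a new, smaller relational database of integer page ids. The table names, the sample pages and the naive regex link extraction are assumptions for illustration; a real crawler would fetch pages with urllib and parse the HTML properly.

```python
import re
import sqlite3

# Hypothetical crawl output: (url, raw_html) pairs, hard-coded so this runs offline.
CRAWLED = [
    ("http://example.com/", '<a href="http://example.com/a">A</a>'),
    ("http://example.com/a", "<a href='http://example.com/'>Home</a>"),
    ("http://example.com/a", "<a href='http://example.com/'>Home</a>"),  # duplicate fetch
]

# Deliberately naive: only catches quoted href attributes.
HREF = re.compile(r"""href=["']([^"']+)["']""")


def load_raw(crawled):
    """Stage 1: store every page unprocessed in a holding database."""
    raw = sqlite3.connect(":memory:")
    raw.execute("CREATE TABLE raw_pages (url TEXT UNIQUE, html TEXT)")
    raw.executemany("INSERT OR IGNORE INTO raw_pages VALUES (?, ?)", crawled)
    raw.commit()
    return raw


def build_clean(raw):
    """Stage 2: pages and links as integer ids in a new relational database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE links (from_id INTEGER, to_id INTEGER, UNIQUE (from_id, to_id));
    """)

    def page_id(url):
        # Insert the URL if new, then look up its integer id.
        db.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
        return db.execute("SELECT id FROM pages WHERE url = ?", (url,)).fetchone()[0]

    for url, html in raw.execute("SELECT url, html FROM raw_pages"):
        src = page_id(url)
        for target in HREF.findall(html):
            db.execute("INSERT OR IGNORE INTO links VALUES (?, ?)", (src, page_id(target)))
    db.commit()
    return db


clean_db = build_clean(load_raw(CRAWLED))
print(clean_db.execute("SELECT COUNT(*) FROM pages").fetchone())  # (2,)
print(clean_db.execute("SELECT COUNT(*) FROM links").fetchone())  # (2,)
```

Storing links as pairs of integers rather than repeated URL strings is what keeps the second database small and quick to query.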

Filed Under: Blog, Data Analytics, Data Analytics Projects, Personalised Training Plan, Programming, Programming Projects, Web Data Tagged With: coding101

Simple Search Engine in Python

28/06/2016 By debkr

Part of the Python specialisation capstone (see Refs below) is to recreate a simple web search engine, modelled on the original Google search ranking algorithm (you can read the short version of Page and Brin’s 1998 Stanford paper here). The Google algorithm placed emphasis on information obtained from the HTML “link structure and link text” of all links found in all indexed web pages, and used this information “for making relevance judgments and quality filtering”.
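To see what "link structure and link text" looks like in practice, here is a small sketch using Python's standard-library html.parser that collects each link's target (its href) together with its anchor text. The class name and sample HTML are illustrative assumptions; third-party libraries such as BeautifulSoup are a common alternative for this job.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


parser = LinkTextParser()
parser.feed('<p>See <a href="/papers">the <b>original</b> paper</a> and <a href="/about">About</a>.</p>')
print(parser.links)  # [('/papers', 'the original paper'), ('/about', 'About')]
```

The hrefs give the link structure (who points at whom), while the anchor text describes the target page in the linking author's own words.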

Google search algorithm:
The basic premise of the algorithm is a probability measure, expressed in layman’s terms as: “how likely is it that a random surfer would alight on this particular web page if they just randomly surfed through all links on all pages on the web until they got bored and gave up”. The algorithm itself includes a measure of all incoming links to a web page (i.e. the number of “citations or backlinks” to that page), weighted by the quality ranking of each of those incoming citation links. In this way, the search algorithm defines an objective page rank or search ranking for each web page.
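The random-surfer idea can be sketched as a short power iteration in Python. Assumptions: the web is a small hand-made dict of page to outgoing links; the damping factor is 0.85 (the value the 1998 paper says it usually uses); ranks are normalised to sum to 1 (the paper's own formula writes (1 - d) rather than (1 - d)/N); and pages with no outgoing links share their rank evenly, which is one common convention rather than the paper's exact treatment.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal probability everywhere
    for _ in range(iterations):
        # (1 - d)/n: chance the bored surfer jumps to a random page.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no links out): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C comes out on top because three pages cite it, and D, which nobody links to, keeps only the random-jump minimum.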

Filed Under: Blog, Personalised Training Plan, Programming, Programming Projects, Web Data Tagged With: coding101, programming, projects

Coding 101 (part 11) XML and data serialisation

22/05/2016 By debkr

Quick recap:
In part 10 of this series, I learnt a bit about using both the socket library and the urllib library to browse a web page or some other file on a web server, read it and return its contents as text (including any HTML tags). I put together two little programs that help me to (a) scrape data or a web page from the ‘net (based on a specified URL) and save it to a text file, and (b) handle the most common HTML tags in that text file. The tags handled so far are as follows:
  • <h1>..</h1> tags: cleaned and saved, labelled as ‘Title’;
  • <h2>..</h2> tags: cleaned and saved, labelled as ‘Header’;
  • <h3>..</h3> to <h6>..</h6> tags: cleaned and saved, labelled as ‘Sub-header’;
  • <em>..</em> tags (italics): cleaned and saved, labelled as ‘Para-header’ (Paragraph header);
  • <p>..</p> tags: indicate text paragraphs, cleaned and saved only (no additional labels added);
  • all other tags: ignored.
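The tag rules listed above can be sketched as a table-driven parser built on Python's html.parser. This is an illustration, not the program described in this post: the class and variable names are hypothetical, and dropping any text that sits outside the handled tags is an assumption about what "ignored" means.

```python
from html.parser import HTMLParser

# Tag -> label, following the list above; None means "clean and save only".
# Tags not in this table are ignored (assumption: so is text outside them).
LABELS = {"h1": "Title", "h2": "Header", "em": "Para-header", "p": None}
LABELS.update({f"h{i}": "Sub-header" for i in range(3, 7)})


class TagLabeller(HTMLParser):
    """Turn HTML into cleaned text lines labelled by their enclosing tag."""

    def __init__(self):
        super().__init__()
        self.lines = []   # cleaned, labelled output lines
        self._stack = []  # handled tags currently open, innermost last
        self._buf = []    # text gathered for the innermost handled tag

    def _flush(self):
        # Collapse whitespace and emit one line for the buffered text.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text and self._stack:
            label = LABELS[self._stack[-1]]
            self.lines.append(f"{label}: {text}" if label else text)

    def handle_starttag(self, tag, attrs):
        if tag in LABELS:
            self._flush()  # finish any enclosing text before nesting
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:  # only handled tags are ever on the stack
            self._flush()
            while self._stack.pop() != tag:  # tolerate unclosed inner tags
                pass

    def handle_data(self, data):
        if self._stack:
            self._buf.append(data)


doc = ("<h1>My Page</h1><h2>Intro</h2><p>Hello <b>web</b> data.</p>"
       "<h4>Detail</h4><p><em>Note</em> on tags</p><div>skipped</div>")
labeller = TagLabeller()
labeller.feed(doc)
labeller.close()
print("\n".join(labeller.lines))
```

Keeping the rules in the `LABELS` table means adding a new tag is a one-line change rather than another `if` branch.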

Filed Under: Blog, Personalised Training Plan, Programming, Web Data Tagged With: coding101

Coding 101 (part 10) More on Web Data

09/05/2016 By debkr

Python gets networked:
Analysing data from files we already hold on the hard drive is great, but so much data is being created out there on the internet (especially on social media websites), data that we can use for a whole variety of purposes – I’m itching to get my hands on some web data to play with. First I need to learn how web browsers talk to websites: that is, how my query (view this website URL, download that document, search for such-and-such a search term) gets communicated across the ‘net, and how it gets translated into an instruction the website at the other end can understand (whatever server-side language it might be using: PHP, JavaScript or something else).
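Whatever language the server runs, the browser's side of the conversation is a plain-text HTTP request sent over a TCP socket. Here is a minimal Python sketch of that exchange using the standard socket module. The function names are illustrative assumptions, and HTTP/1.0 is used so the server closes the connection after replying, which makes "read until the socket closes" a workable way to find the end of the response.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """The plain-text HTTP/1.0 request a client sends for one page."""
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "\r\n").encode("ascii")  # a blank line ends the headers


def split_response(raw: bytes):
    """Split a raw HTTP response into status line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a TCP socket and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

With network access, `split_response(fetch("example.com"))` would return the status line (such as `HTTP/1.0 200 OK`), a dict of headers, and the HTML body as bytes. The urllib library does all of this, plus redirects and HTTPS, behind a single `urlopen` call.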

Filed Under: Blog, Personalised Training Plan, Programming, Web Data Tagged With: coding101

Coding 101 (part 9) Python and the Web

07/05/2016 By debkr

Python and the internet data playground:
The internet is a giant data playground just waiting for us to explore it. This part of the Coursera course I’m studying (see refs below) covers how to collect data from the web so we can easily record, manipulate and analyse it.

This is where the increasingly common terms web scraping and parsing come in. (Scraping refers to collecting data from the ‘net, while parsing refers to reading and analysing strings of data/info from the web, just like our previous examples of reading text from files.)
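Once a page has been scraped, parsing works on the resulting string just as it would on text read from a file. Here is a minimal Python sketch that pulls email addresses and their domains out of some text with a regular expression. The sample text is hypothetical, and the pattern is deliberately simple rather than a complete email-address validator.

```python
import re

# Hypothetical scraped text; in practice this would be the decoded
# body of a page fetched from the web.
scraped = """Contact: alice@example.com (sales)
Support desk: bob.smith@help.example.org
No address on this line."""

# Local part, then @, then a dotted domain (captured as group 1).
EMAIL = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def parse_emails(text):
    """Return (address, domain) pairs found in the text, in order."""
    return [(m.group(0), m.group(1)) for m in EMAIL.finditer(text)]


for address, domain in parse_emails(scraped):
    print(address, "->", domain)
```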

We’ll get to access data using web APIs (Application Programming Interfaces), and learn how to handle data in different technical formats like HTML, XML and JSON. So this is definitely where things will start to get exciting.
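To show how the same information can arrive in different formats, here is a sketch that reads one hypothetical weather-station record serialised as XML (via the standard xml.etree.ElementTree module) and as JSON (via the json module) into the same Python dict. The record, field names and the `fetch_json` helper are assumptions for illustration; `fetch_json` needs network access and a real API URL, so it is not called here.

```python
import json
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# The same hypothetical record, serialised two ways.
XML_DATA = "<station><name>Kew</name><reading unit='C'>14.5</reading></station>"
JSON_DATA = '{"name": "Kew", "reading": {"unit": "C", "value": 14.5}}'


def from_xml(text):
    """Parse the XML form: child elements for data, an attribute for the unit."""
    root = ET.fromstring(text)
    reading = root.find("reading")
    return {"name": root.find("name").text,
            "unit": reading.get("unit"),
            "value": float(reading.text)}  # XML text is always a string


def from_json(text):
    """Parse the JSON form: nested objects map straight onto dicts."""
    data = json.loads(text)
    return {"name": data["name"],
            "unit": data["reading"]["unit"],
            "value": float(data["reading"]["value"])}


def fetch_json(url):
    """Fetch and decode a JSON API response (needs network; not called here)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(from_xml(XML_DATA) == from_json(JSON_DATA))
```

Note the difference in effort: JSON arrives with numbers already typed, while every value in XML comes back as text that has to be converted.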

Filed Under: Blog, Personalised Training Plan, Programming, Web Data Tagged With: coding101

Copyright © 2016–2025
