Starting out with databases:
This section moves on to working with SQL databases (focussing on SQLite3) as well as delving into some data gathering, analysis and visualisation in Python. Why store the data? Well, we probably want to build up data over time – maybe it’s coming from reviews of customer activity, or PR hits, or perhaps we’re scraping data from the web and the web crawler is continually replenishing its list of target URLs (so it can go crawling some more). Or maybe we’re getting our data from a rate-limited API, so we can only run x queries today and then have to wait a while before we can make more requests. [Read more…] about Coding 101 (part 14) Relational databases
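As a rough illustration of building up data over time, here is a minimal sketch using Python's built-in sqlite3 module. The table name Pages, its columns and the example URLs are all made up for the illustration, and an in-memory database is used so nothing is left on disk.

```python
import sqlite3

# In-memory database for the sketch; a real crawler would pass a filename
# (e.g. 'crawler.sqlite', a hypothetical name) so the data survives between runs.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Hypothetical table: one row per target URL, plus a flag for "already fetched"
cur.execute('CREATE TABLE IF NOT EXISTS Pages (url TEXT UNIQUE, fetched INTEGER)')

def add_urls(urls):
    # INSERT OR IGNORE skips any URL that is already stored (url is UNIQUE)
    for url in urls:
        cur.execute('INSERT OR IGNORE INTO Pages (url, fetched) VALUES (?, 0)', (url,))
    conn.commit()

# Two "runs" of the crawler; the second one repeats a URL from the first
add_urls(['http://example.com/a', 'http://example.com/b'])
add_urls(['http://example.com/b', 'http://example.com/c'])

cur.execute('SELECT COUNT(*) FROM Pages')
total = cur.fetchone()[0]
print(total)
```

Because the url column is declared UNIQUE, the repeated URL is stored only once, so the list of targets can keep growing across runs without duplicates.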
Coding 101 (part 13) Object-oriented programming
These postings document my responses to and learnings from the Python for Everybody specialisation on Coursera (links below), which I highly recommend to anyone wanting to learn Python programming in a well-structured and fun way. Earlier posts in the series are bookmarked here.
Data types, objects and methods:
Python has two data structures which we met before – lists and dictionaries – both of which are known in programming terms as objects. Other bits of data we’ve been using – string, integer, float, Boolean – are also objects. Objects are often described as being code + data. They consist of values (the data contained in the object, sometimes also including attributes, the equivalent of a field in a database or column-header in an Excel spreadsheet), together with methods (the code denoting the functions or procedures that can be applied to the object). Different methods may be applied to different object types, for example append is a method applicable to lists, but not to dictionaries. [Read more…] about Coding 101 (part 13) Object-oriented programming
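To make the point about methods belonging to object types concrete, here is a small sketch. The variable names and values are invented for the example.

```python
items = ['R', 'Python']          # a list object
counts = {'R': 1, 'Python': 2}   # a dictionary object

# append is a method of list objects
items.append('SQL')
print(items)

# hasattr checks whether an object has a given method or attribute
print(hasattr(items, 'append'))
print(hasattr(counts, 'append'))

# Calling a list method on a dictionary raises an AttributeError
try:
    counts.append('SQL')
except AttributeError as err:
    print('Not a dictionary method:', err)

# Strings, integers, floats and Booleans are objects too, each with its own type
print(type('hello'), type(42), type(3.14), type(True))
```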
Coding 101 (part 12) Extracting data with JSON
Recap:
As ever, these postings document my responses to and learnings from the much-recommended Python for Everybody specialisation on Coursera (links below) and follow on from earlier posts in the series (bookmarked here).
The previous post was an intro to XML, a markup language used as a data exchange format between different applications, especially where the data takes the form of documents which humans need to read as well as machines. XML is particularly useful where the data forms a branching tree (parents and children) with lots of nested elements.
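As a quick reminder of what that tree structure looks like in practice, here is a minimal sketch using the standard library's xml.etree.ElementTree. The sample document (a person with nested phone numbers) is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: <person> is the parent, <phones> holds nested children
data = '''<person>
  <name>Sam</name>
  <phones>
    <phone type="mobile">555 0100</phone>
    <phone type="home">555 0101</phone>
  </phones>
</person>'''

root = ET.fromstring(data)

# find returns the first matching child element; .text is the text inside it
name = root.find('name').text

# findall accepts a simple path, here "every <phone> inside <phones>"
phones = [(p.get('type'), p.text) for p in root.findall('phones/phone')]

print(name)
print(phones)
```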
Introduction to JSON:
Now we go on to look at JSON, the JavaScript Object Notation. JSON is another data exchange protocol that works [Read more…] about Coding 101 (part 12) Extracting data with JSON
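For a first taste, here is a minimal sketch of reading JSON with Python's built-in json module. The sample text is invented; the point is only that json.loads turns JSON into ordinary Python dictionaries and lists.

```python
import json

# Made-up JSON text: an object containing a list of objects
text = '{"name": "Sam", "phones": [{"type": "mobile", "number": "555 0100"}]}'

# json.loads parses a JSON string into Python dictionaries and lists
info = json.loads(text)

first_number = info['phones'][0]['number']
print(info['name'], first_number)
```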
Why This, Why Now…
goal = 'career-change'
reason = 'https://twitter.com/Qwiery/status/727849124138192896'
# PTP: Programming Stream (url = 'http://deborahroberts.info/2016/02/diving-into-data-syllabus-2/')
longlist = ['Java', 'Python', 'SQL', 'VBA', 'R']
start_with_end_in_mind = {'datascience': ['R', 'Python'], 'machinelearning': ['Python', 'R']}
choice1 = start_with_end_in_mind.get('datascience')
choice2 = start_with_end_in_mind.get('machinelearning')
shortlist = list()
for item in choice2 :
    if item in choice1 :
        shortlist.append(item)
quit()  # Due to unorthodoxy I executed the last line just after line 02
Coding 101 (part 11) XML and data serialisation
Quick recap:
In part 10 of this series I learnt a bit about using both the socket library and the urllib library to fetch a web page or some other file from a web server, read it and save it as text (including any HTML tags it contained). I put together two little programs that help me to (a) scrape data or a web page from the ’net (based on a specified URL) and save it to a text file, and (b) handle the most common HTML tags in that text file. The tags handled so far are as follows:
- <h1>..</h1> tags: cleaned and saved, labelled as ‘Title’;
- <h2>..</h2> tags: cleaned and saved, labelled as ‘Header’;
- <h3>..</h3> to <h6>..</h6> tags: cleaned and saved, labelled as ‘Sub-header’;
- <em>..</em> tags (italics): cleaned and saved, labelled as ‘Para-header’ (Paragraph header);
- <p>..</p> tags: indicate text paragraphs, cleaned and saved only (no additional labels added);
- all other tags: ignored.
[Read more…] about Coding 101 (part 11) XML and data serialisation
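The tag handling listed above could be sketched roughly as follows. This is not the original program from part 10, just a simplified regular-expression version: the names LABELS, TAG_PATTERN and label_html are hypothetical, it only matches tags without attributes, and a tag nested inside another handled tag (say an <em> inside a <p>) is cleaned out rather than labelled separately.

```python
import re

# Labels taken from the list above; None means "save with no extra label"
LABELS = {
    'h1': 'Title',
    'h2': 'Header',
    'h3': 'Sub-header', 'h4': 'Sub-header', 'h5': 'Sub-header', 'h6': 'Sub-header',
    'em': 'Para-header',
    'p': None,
}

# Matches <tag>...</tag> for the handled tags only (no attributes supported)
TAG_PATTERN = re.compile(r'<(h[1-6]|em|p)>(.*?)</\1>', re.DOTALL)

def label_html(html):
    results = []
    for tag, content in TAG_PATTERN.findall(html):
        # "Cleaning": strip any remaining tags nested inside the content
        text = re.sub(r'<[^>]+>', '', content).strip()
        results.append((LABELS[tag], text))
    return results

# Made-up sample page; the <div> is one of the "all other tags" and is ignored
sample = ('<h1>My Page</h1><h2>Intro</h2><p>Some <b>bold</b> text.</p>'
          '<div>ignored</div><h4>Details</h4>')
results = label_html(sample)
for label, text in results:
    print(label, '|', text)
```

Regular expressions are fine for a quick sketch like this, but they struggle with messy real-world HTML, so a proper HTML parser would be the safer choice for anything serious.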
Coding 101 (part 10) More on Web Data
Python gets networked:
Analysing data from files we already hold on the hard-drive is great, but so much data’s being created out there on the internet (especially on social media websites) that we can use for a whole variety of purposes – I’m itching to get my hands on some web data to play with. First I need to learn about how web browsers talk to websites – that is, how my query (view this website URL, download that document, search for such-and-such a search term) gets communicated across the ’net, and how it gets translated into an instruction the website at the other end can understand (in whatever server-side language it might be using: PHP, JavaScript, or whatever). [Read more…] about Coding 101 (part 10) More on Web Data
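To give a feel for what a browser actually sends, here is a minimal sketch that turns a URL into the text of a basic HTTP GET request. Nothing is sent over the network here; build_get_request is a hypothetical helper name and example.com is just a placeholder address.

```python
from urllib.parse import urlparse

def build_get_request(url):
    # Split the URL into its parts: host, path, query string, etc.
    parts = urlparse(url)
    path = parts.path or '/'
    if parts.query:
        path += '?' + parts.query
    # An HTTP request is plain text: a request line, some headers,
    # then a blank line; each line ends with \r\n
    return (f'GET {path} HTTP/1.1\r\n'
            f'Host: {parts.netloc}\r\n'
            'Connection: close\r\n'
            '\r\n')

request = build_get_request('http://example.com/search?q=python')
print(request)
```

With the socket library, text like this would be encoded to bytes and sent to the web server on port 80; urllib does the equivalent work behind the scenes.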
Coding 101 (part 9) Python and the Web
Python and the internet data playground:
The internet is a giant data playground just waiting for us to explore it. This part of the Coursera course I’m studying (see refs below) covers how to collect data from the web so we can easily record, manipulate and analyse it.
This is where the increasingly common terms web scraping and parsing come in. (Scraping refers to collecting data from the ’net, while parsing refers to reading and analysing strings of data/info from the web, just like our previous examples of reading text from files.)
We’ll get to access data using web APIs (Application Programming Interfaces), and learn how to handle data in different technical formats like HTML, XML and JSON. So this is definitely where things will start to get exciting. [Read more…] about Coding 101 (part 9) Python and the Web