An Unstructured Glossary (work in progress)

Object-oriented design (software engineering principle)
Decomposition (software engineering principle)
Encapsulation (software engineering principle)
Abstraction (software engineering principle)
Testing (software engineering principle)
Strings
Cyphers
Arrays
Random numbers
Random walk
Variables
Values
Types
Problem-solving
Algorithms
Control statements
Boolean expressions
Arithmetic expressions
Data structure
Programming
Control structures
Java
Classes
Objects
Class hierarchy
Parameters
Graphics
Event-driven programming
Character data
String manipulation
Problem-solving with strings
Objects and memory
Parameter passing
File processing
Exception handling
Debugging
Java collections framework
The HashMap class
Iterators
Object-oriented design
Swing interactors
JComponent hierarchy
Graphical user interface
Interactors
Large-scale data structures
Data-driven programs
Sorting
Searching
Algorithmic analysis
Programming in the large
Software engineering
Programming patterns
Variable / Variable name / Identifier
Operator
Constant
Reserved words
Syntax / valid syntax patterns
Line
Assignment
Assignment with Expression
Statement
Script
Program
Program steps / Program flow (path)
Step or Group of steps
Sequential steps / Sequential execution
Conditional steps / Conditional execution
Conditional (or Comparison) operators
Assignment operator
Boolean expressions
Boolean operators

Iterative steps
Loops
Iteration variables
Inputs
Outputs
Syntax Error = a mistake in the program’s language or grammar
Logic Error = a mistake in the order of the statements, or in how they relate to one another, in a program whose syntax is valid
Semantic Error = a statement that is written correctly and so executes, but does not perform the task it was designed to do
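
A minimal Python sketch of the three error types, assuming that a logic error means valid statements in the wrong order and a semantic error means valid, well-ordered statements that still do the wrong thing. All function names and the sample data are hypothetical, chosen only for illustration.

```python
# Syntax error: the text is not valid Python, so it cannot even be parsed.
broken_source = "total = (1 + 2"   # missing closing parenthesis
try:
    compile(broken_source, "<example>", "exec")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False


# Logic error: valid syntax, but the statements are in the wrong order.
def average_wrong_order(values):
    total = 0
    average = total / len(values)   # runs too early: total is still 0
    for v in values:
        total += v
    return average


# Semantic error: valid syntax and sensible order, but not the intended task.
def average_wrong_meaning(values):
    total = 0
    for v in values:
        total += v
    return total / 2                # intended: total / len(values)


# The version that does what was intended.
def average(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)


data = [2, 4, 6]
print(syntax_ok)                    # False
print(average_wrong_order(data))    # 0.0
print(average_wrong_meaning(data))  # 6.0
print(average(data))                # 4.0
```

Only the syntax error is reported by Python itself; the other two run without complaint and have to be caught by checking the output against what was expected.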

Data science
Data analytics
Big data
Raw data
Tidy data (processed data)
Web scraping
Cleaning data
Merging data
Reshaping data
Summarising data
Find/replace

Exploratory data analysis
– Structure
– Organisation
Analytic graphics
– Comparisons
– Causality
– Multivariate data
– Multiple modes of evidence
Exploratory graphs
– Scatter plots
Hierarchical clustering
K-means clustering
Dimension reduction
Evidence-based data analysis
Reproducible research

Descriptive analysis
Exploratory analysis
Correlation
Confounding
Confounder
Temporal antecedence
Inferential analysis
Extrapolation
Interpolation
Population
Sub-set
Sample
Data set
Sampling methodology
Statistical significance
Predictive analysis
Data
Objects
Values
Prediction modelling
Causal analysis
Randomisation
Control
Control trials
Mechanistic analysis

Data
Data point
Data set
Population
Variable
Discrete
Continuous
Dependent
Independent
Qualitative
Quantitative
Ordered
Unordered
Raw data
File format
FASTQ file format
Next generation sequencing (machine)
Text file
Comma-delimited
Tab-delimited
OLAP cube
Parsing a file
API
Structured data
Unstructured data
Records
Fields
Tables
Databases
Data warehouses
Data mining
Web scraping
Visual data (images, videos)
Audio data (speech, music)

Probability
Probability argument

Information retrieval
Internet of Things
Hardware systems
Software defined networks
Secure distributed systems

Database research
Network analysis
Graph mining

Artificial intelligence (AI)
Computational biology (areas in AI)
Computer vision (areas in AI)
Deep learning (areas in AI)
Natural language processing (areas in AI)
Robotics (areas in AI)

Computer algorithms
Complexity theory
Cryptography
Algorithmic game theory
Geometric algorithms
Algorithmic techniques for internet applications
Combinatorial, logical, and algebraic mathematics

.R
markdown = a lightweight plain-text formatting syntax; a markdown text file can be read and rendered as formatted output by programming-language tools (e.g. in R) and platforms (e.g. GitHub)
.md = extension for a markdown text file
Rmarkdown
.Rmd = extension for an Rmarkdown file
LaTeX
RPubs
knit
R = a free/open-source programming language used extensively for statistical analysis and graphical output
RStudio = a free/open-source integrated development environment (IDE) for R including console and syntax-highlighting editor supporting direct code execution
IDE = integrated development environment
Git (http://git-scm.com) = a free/open-source version control system available for Windows, macOS and Linux; the most popular version control system in use; operated from the command line; all versions are stored in local repositories on your computer
repos = short for repositories (here, the local repositories on your computer)
version control system = a system that records changes to a file or folder over time, allowing you to easily access previous versions; very useful in data science, where you want ready access to different versions of a program or script as you test and refine it (e.g. to recover or reinstate a previous version), or when working collaboratively with others
Git bash = Windows Command Line Interface for interacting with Git
GitHub
Data types
Subsetting
Reading data
Writing data
Control structures
Functions
Arguments
Scoping
Vectorised operations
Debugging
Simulation
Optimisation
R plotting systems (base, lattice, ggplot2)
R scripts
R packages = code that a developer has built and shared for others to use, for standard or specific problems
devtools
roxygen
testthat
rCharts = used for making interactive graphics/charts for R
Slidify = used for making HTML5 slide presentations from R Markdown
Shiny = used for making interactive web apps for R (allows interaction with data and analysis)
CRAN = Comprehensive R Archive Network (URL: cran.r-project.org)

Excel
XML
JSON
MySQL
HDF5

Statistical inference
Probability
Probabilistic models
Likelihood
Distributions
Normal/Gaussian distribution
Continuous distribution
Density
Asymptotics
Confidence intervals
Hypothesis testing
Power
Bootstrapping
Non-parametric testing
Permutation testing
Bayesian statistics

Statistical regression
Linear regression (simple linear regression, SLR)
Multiple/multivariable regression
Confounding
Residuals
Diagnostics
Linear predictive models
Model misspecification
Scatterplot smoothing
Scatterplot splines
Resampling inference
Weighted regression
Mixed models / random intercepts
“Regression to the mean”
Estimation tools
Least squares
Dependent variable
Independent variable
Sensitivity
Specificity
Positive = identified
Negative = rejected
True positives/negatives
False positives/negatives
Correlated predictors
Classifiers
Classification functions
Iteration
Weighting
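
A short sketch of how the sensitivity and specificity entries above relate to true/false positives and negatives, using the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP). The function names and the small labelled data set are hypothetical, made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives from paired boolean labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1          # identified, and really positive
        elif p and not a:
            fp += 1          # identified, but really negative
        elif not p and a:
            fn += 1          # rejected, but really positive
        else:
            tn += 1          # rejected, and really negative
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of real positives that were identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of real negatives that were rejected."""
    return tn / (tn + fp)


# Hypothetical data: 4 real positives followed by 5 real negatives.
actual    = [True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False]

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)         # 3 1 4 1
print(sensitivity(tp, fn))    # 0.75
print(specificity(tn, fp))    # 0.8
```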

Predictive analytics
– Supervised predictive
– Unsupervised predictive
Prescriptive analytics

Machine learning
Predictive machine learning
Error types
Cross validation
caret (R package)
Predictive plotting
Pre-processing
Regression (prediction with)
Trees (prediction with)
Boosting
Bagging
Model blending
Forecasting

Hardware architecture
Central processing unit (CPU) = very fast, runs the program (using data from RAM)
Memory
Random access memory (RAM) = fast, temporary memory storage; requires a power source; holds the data the program needs in order to run
Motherboard
Chip
Input devices (e.g. keyboard, mouse, touchscreen, CD-ROM, CD/DVD-reader)
Output devices (e.g. monitor, printer, speakers, CD/DVD-writer)
USB
HDMI
Ethernet
SOC = System on a chip
Secondary memory = slower, permanent memory storage; does not require a power source; retains data through e.g. magnetic storage
– Fixed: hard disk / disk drive
– Removable: SD cards, flash drives

Software
Machine language / machine code
Binary code
Programming language / source code

Command prompt
Command line interface (CLI)
Directories
Sub-directories
Root directory (indicated by slash: /) = very top of tree structure, contains all other directories
Home directory (indicated by tilde: ~) = where you go when you log in, contains all your personal folders and files; within Windows this is usually indicated by your user/login name
Working directory = your currently active directory; e.g. the Git bash CLI opens in the Home directory, which is therefore also the Working directory at that point; when you move up or down a level into another directory, that becomes the new Working directory
Path / directory path / file path = the full address of a given directory or file location including all directories, sub-directories etc.
pwd (print working directory) = in Git bash / CLI, returns path of current working directory
command = in the CLI, the command you want to run
flags = parameters to be passed to the command to specify certain behaviour
arguments = either other options for the command, or what the command needs to modify (e.g. the file to look at)
workspace = place on computer where you’re currently working
index (Git) = the list of files Git should be tracking under version control
local repository (local repo) = files stored/version controlled on your local computer
remote repository (remote repo) = files stored/version controlled remotely (e.g. on GitHub)
adding (Git) = adding a newly-created file to the Git index so it will be monitored and all changes kept up-to-date
committing (Git) = committing the file to the local repo for storage and version control tracking
push (Git) = updating latest version/changes from local repo to remote repo
fetch (Git) = updating latest version/changes from remote repo to local repo
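
The directory and command entries above can be sketched in Python with the standard pathlib and shlex modules. This is an illustration only: the file projects/notes.txt and the command line "ls -l -a projects" are hypothetical, and treating every part that starts with "-" as a flag is a simplifying assumption.

```python
import shlex
from pathlib import Path

home = Path.home()            # home directory (what ~ stands for)
working = Path.cwd()          # working directory (what pwd prints)
root = Path(working.anchor)   # root: "/" on Unix-like systems, a drive such as "C:\" on Windows

# A path is the full address of a file: every directory from the top down.
# Hypothetical file; it does not need to exist for this to work.
example = home / "projects" / "notes.txt"
print(example)                # full path to the file
print(example.parent)         # the directory that contains it
print(example.name)           # notes.txt

# Splitting a command line into command, flags and arguments.
command_line = "ls -l -a projects"
parts = shlex.split(command_line)
command = parts[0]
flags = [p for p in parts[1:] if p.startswith("-")]
arguments = [p for p in parts[1:] if not p.startswith("-")]
print(command)                # ls
print(flags)                  # ['-l', '-a']
print(arguments)              # ['projects']
```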