Big Data is becoming a fashionable phrase in the world of IT. But what does it mean, and is it useful?
For some people, Big Data means more data than can be processed on a single machine. For others, it is the combination of data from more than one source.
bestCourse4Me.com is an example of the second type of Big Data. It combines data from HESA, the official agency holding data on all university entrants (their entry qualifications and their outcomes on leaving university), with the ONS Quarterly Labour Force Survey, so that sixth-form pupils can make informed decisions about university courses.
This was the first time these data sources had been combined, so one question that had to be answered was whether the combination was valid. Fortunately, the two organisations used the same standards to classify degree subjects and to classify the occupations taken in later life. A little work was needed to aggregate the data from the two sets to the same levels and to match different versions of the standards. I then looked for areas where the data sets overlapped, to see whether they produced comparable results.
I selected employed graduates aged 21 to 23 from the ONS surveys over a three-year period, and only employed graduates in the same age range from the HESA leavers data. I then grouped both sets by degree subject and career taken, and compared the salaries in the two data sets. This showed that the data sets did not match exactly, but were close enough to accept the data combination.
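The comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`age`, `employed`, `subject`, `occupation`, `salary`), both helper functions, and the toy records are hypothetical, and the choice of the median as the salary statistic is an assumption, since the exact measure compared is not stated here.

```python
from collections import defaultdict
from statistics import median


def median_salary_by_group(records, min_age=21, max_age=23):
    """Median salary per (subject, occupation) for employed graduates in the age band."""
    groups = defaultdict(list)
    for r in records:
        # Keep only employed graduates inside the age range being compared.
        if r["employed"] and min_age <= r["age"] <= max_age:
            groups[(r["subject"], r["occupation"])].append(r["salary"])
    return {key: median(values) for key, values in groups.items()}


def compare_sources(a, b):
    """Relative salary gap for each group that appears in both sources."""
    shared = a.keys() & b.keys()
    return {key: abs(a[key] - b[key]) / ((a[key] + b[key]) / 2) for key in shared}


# Toy records invented purely for illustration (not real HESA or ONS figures).
ons_sample = [
    {"age": 22, "employed": True, "subject": "Physics", "occupation": "Engineer", "salary": 24000},
    {"age": 23, "employed": True, "subject": "Physics", "occupation": "Engineer", "salary": 26000},
    {"age": 30, "employed": True, "subject": "Physics", "occupation": "Engineer", "salary": 40000},
    {"age": 22, "employed": False, "subject": "History", "occupation": "Teacher", "salary": 0},
]
hesa_sample = [
    {"age": 21, "employed": True, "subject": "Physics", "occupation": "Engineer", "salary": 25000},
    {"age": 22, "employed": True, "subject": "History", "occupation": "Teacher", "salary": 21000},
]

ons_medians = median_salary_by_group(ons_sample)
hesa_medians = median_salary_by_group(hesa_sample)

# Only groups present in both sources can be compared.
gaps = compare_sources(ons_medians, hesa_medians)
```

Groups that appear in only one source drop out of the comparison, so in practice it would also be worth checking how much of each data set the shared groups cover before trusting the result.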
The combined data therefore seemed clean enough to power the website and to give students and their advisers reliable information.
The Daily Telegraph presented an extract of this information as part of their university advice.