Key Information Sets (UNISTATS)

Raw datasets that provide comparable sets of information about full-time or part-time undergraduate courses.

Licence and access

The data is made available under a worldwide, royalty-free, perpetual, non-exclusive licence to use the Unistats Dataset, subject to some conditions. The licence and conditions must be accepted before the download will begin.

Data Content

Format

A single zip download contains a folder of csv files and an xml file containing the data sets.
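As a minimal sketch of working with the download, the following Python reads every CSV file in a zip archive into lists of row dictionaries keyed by table name. The function name `load_csv_tables` is hypothetical, and the sketch assumes the files are UTF-8 encoded with a header row; the encoding and folder layout of the actual download should be checked first.

```python
import csv
import io
import zipfile


def load_csv_tables(zip_source):
    """Return {table_name: [row_dict, ...]} for every .csv file in the archive.

    zip_source may be a path or a file-like object. Assumes UTF-8 (with or
    without a byte order mark) and a header row in each file.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip the XML file and any folder entries
            # Bare file name without folder or extension, e.g. "KISCOURSE"
            table_name = name.rsplit("/", 1)[-1][:-4].upper()
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[table_name] = list(csv.DictReader(text))
    return tables
```

All values are returned as strings, which avoids silently dropping leading zeroes from identifier fields.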

Structure

Key entities and code lists (integration and data-linking potential)

ACCREDITATION.csv, Accreditation entity, Contains information about course accreditation, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

ACCREDITATIONTABLE.csv, Accreditation lookup table, Contains the accrediting body text and accreditation url for each ACCTYPE, Lookup table (This lookup table may be linked to the ACCREDITATION entity using ACCTYPE)

COMMON.csv, Common job types entity, Contains information relating to common job types obtained by students taking the course, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE, Linked to JOBLIST entity using PUBUKPRN, KISCOURSEID, KISMODE and COMSBJ, (Note COMSBJ may contain nulls)

CONTINUATION.csv, Continuation entity, Contains continuation information for students on the course, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

COURSELOCATION.csv, Course location entity, Contains details of the KIS course location, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE, Linked to UCASCOURSEID entity using PUBUKPRN, KISCOURSEID, KISMODE and LOCID, (Note LOCID may contain nulls)

COURSESTAGE.csv, Course stage entity, Contains details of the learning and teaching and assessment methods for the CourseStage, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

DEGREECLASS.csv, Degree classification entity, Contains information relating to the degree classifications obtained by students, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

EMPLOYMENT.csv, Employment statistics entity, Contains information relating to student employment outcomes, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

ENTRY.csv, Entry qualifications entity, Contains information relating to the entry qualifications of students, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

INSTITUTION.csv, Institution table, This entity describes the reporting institution, Linked to KISCOURSE entity using PUBUKPRN

JOBLIST.csv, Job list entity, Contains information about common job types obtained by students, Linked to COMMON entity using PUBUKPRN, KISCOURSEID, KISMODE and COMSBJ, (Note COMSBJ may contain nulls)

JOBTYPE.csv, Job type entity, Contains information relating to the types of profession entered by students, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

KISAIM.csv, KIS Aim lookup table, Contains the code and label for each KISAIM, Lookup table, (This lookup table may be linked to the KISCOURSE entity using KISAIMCODE)

KISCOURSE.csv, KIS course entity, This entity records details of KIS courses, Linked to INSTITUTION entity using PUBUKPRN, Linked to child entities using PUBUKPRN, KISCOURSEID and KISMODE

LOCATION.csv, Location lookup table, Contains details for each teaching location, Lookup table, (This lookup table may be linked to the COURSELOCATION entity using UKPRN and LOCID)

NHSNSS.csv, NHS NSS entity, Contains the results for the questions on the NSS for students on NHS funded courses, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

NSS.csv, NSS entity, Contains the National Student Survey (NSS) results, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

SALARY.csv, Salary entity, Contains salary information of students, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

SBJ.csv, Subject entity, Contains JACS level subject codes for each KISCourse, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

TARIFF.csv, Tariff entity, Contains information relating to the entry tariff points of students, Linked to KISCOURSE entity using PUBUKPRN, KISCOURSEID and KISMODE

UCASCOURSEID.csv, UCASCOURSEID entity, Contains UCAS course identifiers for each COURSELOCATION, Linked to COURSELOCATION entity using PUBUKPRN, KISCOURSEID, KISMODE and LOCID

ACCREDITATIONBYHEP.csv, Table showing usage of accreditation types by Higher Education Provider. This file enables Accrediting Bodies to determine which HE providers are using which accreditation types, to support their quality assurance and audit functions.

Data issues

Joining up the data

The dataset consists of the collection of tables that make up the relational database of KIS data. For analysis purposes, it's necessary to create the joins specified in the data dictionary to reconstruct the database. However, because of the 1-to-n relationship between courses and subjects, every extra child table joined on multiplies the rows for a course, and this can result in a very large, sparse dataset unless you carefully flatten the structure using a set of aggregation rules.
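The row multiplication can be sketched in Python. The sample rows and the `join_on_course_key` helper are hypothetical; only the key fields PUBUKPRN, KISCOURSEID and KISMODE come from the table listing, and the other column names in the sample are illustrative assumptions rather than real Unistats fields.

```python
from collections import defaultdict

COURSE_KEY = ("PUBUKPRN", "KISCOURSEID", "KISMODE")


def join_on_course_key(parents, children):
    """Inner-join child rows onto parent rows using the composite course key."""
    by_key = defaultdict(list)
    for child in children:
        by_key[tuple(child[k] for k in COURSE_KEY)].append(child)
    joined = []
    for parent in parents:
        for child in by_key.get(tuple(parent[k] for k in COURSE_KEY), []):
            joined.append({**parent, **child})
    return joined


# Hypothetical sample: one course with two subject rows and two survey rows.
key = {"PUBUKPRN": "10000001", "KISCOURSEID": "C1", "KISMODE": "1"}
courses = [{**key, "TITLE": "Example course"}]
subjects = [{**key, "SUBJECT": "A"}, {**key, "SUBJECT": "B"}]
survey = [{**key, "SURVEY_SUBJECT": "A", "SCORE": "80"},
          {**key, "SURVEY_SUBJECT": "B", "SCORE": "40"}]

with_subjects = join_on_course_key(courses, subjects)
with_survey = join_on_course_key(with_subjects, survey)
print(len(with_subjects), len(with_survey))  # 2 4
```

One course becomes two rows after the first join and four after the second, because the composite key alone does not pair each subject row with its own survey row.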

SQL scripts for joining and flattening the data can be found on GitHub at:

https://github.com/Cetis/heidilabs-wrangling-support/tree/master/scripts/unistats

Note that these scripts make a lot of assumptions about how the data can be flattened using aggregate measures. As the populations in different subjects for a course can vary, some caution needs to be exercised; if you look at the coefficient of variation for some of the measures being merged, you'll see that there are significant variations between subjects for a course.
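One way to check whether an aggregate is reasonable is to compute the coefficient of variation (standard deviation divided by mean) of a measure across the subject rows of a single course before merging them. The sketch below uses hypothetical values and a hypothetical helper name; it is an illustration only, not the method used by the scripts linked above.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; None if undefined."""
    values = [float(v) for v in values]
    if len(values) < 2:
        return None  # a single subject row has no spread to measure
    mean = statistics.fmean(values)
    if mean == 0:
        return None  # ratio is undefined for a zero mean
    return statistics.pstdev(values) / mean


# Hypothetical per-subject values of one measure for a single course.
per_subject = [80, 40]
cv = coefficient_of_variation(per_subject)
```

A value close to zero suggests the subject rows agree and an average loses little; a large value suggests the merged figure hides real differences between subjects.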

Course identifiers

In some tables in the dataset the KISCOURSEID field contains leading zeroes, while in others it does not; this can cause confusion when joining the data, because the same course may fail to match across tables.
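A minimal sketch of one workaround is to normalise the identifier on both sides before joining. The helper name is hypothetical, and stripping leading zeroes assumes that no two distinct courses at the same provider differ only by leading zeroes, which should be verified against the data before relying on it.

```python
def normalise_course_id(course_id):
    """Strip surrounding whitespace and leading zeroes from a KISCOURSEID value."""
    text = course_id.strip()
    if not text:
        return text  # leave empty identifiers empty
    # Keep a single zero if the identifier consisted only of zeroes.
    return text.lstrip("0") or "0"


# The same course written with and without leading zeroes now compares equal.
same_course = normalise_course_id("00123") == normalise_course_id("123")
```

This only works if the field is read as text; loading it as a number would already have discarded the zeroes, along with any non-numeric identifiers.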

Subjects

Subjects in the tables refer to NSS subject codes rather than JACS subject codes; there is no 1:1 mapping between the two. Hopefully, future data could be coded against HECoS.

Tariffs

Tariffs are presented as the proportion of students falling within each tariff band, which makes calculating a "mean tariff" for a course somewhat problematic. Because of rounding, the band values for a course do not always sum to 100. The tariff data is also not normally distributed: some courses are heavily skewed towards the lowest or highest bands, so it's not safe to exclude the highest and lowest tariff bands when calculating the mean, median, mode or range. Finally, the bands are fine-grained at the low end but coarse at the high end; again, this makes it difficult to produce a meaningful average.
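One possible alternative to a mean is to report band-level statistics, which avoids guessing a representative value for each band. The sketch below is an assumption-laden illustration: the band labels and percentages are hypothetical, not real Unistats field names or values, and `band_summary` is a hypothetical helper. It measures the median against the actual total rather than assuming the bands sum to 100.

```python
def band_summary(band_percentages):
    """Summarise an ordered list of (band_label, percentage) pairs.

    Returns the total of the percentages, the modal band, and the band
    containing the median student by cumulative share of that total.
    """
    total = sum(pct for _, pct in band_percentages)
    if total == 0:
        return 0, None, None  # no data reported for this course
    modal_band = max(band_percentages, key=lambda pair: pair[1])[0]
    running = 0
    median_band = None
    for label, pct in band_percentages:
        running += pct
        if running * 2 >= total:  # reached half of the reported total
            median_band = label
            break
    return total, modal_band, median_band


# Hypothetical bands: narrow at the low end, open-ended at the top.
bands = [("<48", 5), ("48-63", 10), ("64-79", 15), ("80-239", 40), ("240+", 29)]
total, modal_band, median_band = band_summary(bands)
```

Here the percentages sum to 99 rather than 100, and both the modal and median bands fall in the wide upper band, which shows how little a single summary figure can say about such a course.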

Potential applications

The dataset can be used to analyse provision of courses, and to identify gaps or overlaps in coverage as part of curriculum review.

Possible Actions in the Context of the Jisc Agile BI Projects

A more "analysis-ready" version of the Unistats data would be helpful to analysts in the sector; it's quite likely that the challenges in preparing the data put analysts off using the dataset in a meaningful way.

Additional Info

Source: https://www.hesa.ac.uk/unistatsdata
Last Updated: 29 March 2017, 08:52 (UTC)
Created: 20 May 2016, 12:31 (UTC)