This is the reference page for the Bac Data project containing links to the data itself, analyses that use this data and relevant discussions on the topic. Feel free to comment on the topic or to download and use any of the data provided. If you do find the data useful, please cite this page as the source, so that others can also find and use the resources here.
The “Bac” (short for Bacalaureat) is a yearly national exam for Romanian high school graduates. Although the data should be publicly available for any use or purpose, the Romanian Ministry of Education and its contractor, the SIVECO company, actively hinder any use of such data by publishing it in formats that do not allow direct access to the complete data set. Moreover, the formats used for publishing the data change from year to year, and there is no support for analysts who need access to a clean and complete data set.
Given the above issues, I’ve written code to mine the data from the Ministry’s website and I’ve made all the Bac data (from 2006 to 2015) available in a useful format for anybody interested. Moreover, I’ve run a few analyses and published the results on my blog (in Romanian).
The results data contain the list of candidates for each year (first exam call) and are provided in .zip archives with the MD5 checksum provided below for each file. Currently the data covers years 2006 – 2015:
bac data_2006: 49d7a8f28507ff6e81cc4bb116e874d4
bac data_2007: aca4382b9189f104d43745b32c11746d
bac data_2008: f5846b68b27de00f068b78aa43be54ec
bac data_2009: e4457f1403a9ba0bfa1bbdba6e48b049
bac data_2010: fc4774a8f3df1357a6000b7c3bbc3b62
bac data_2011: 57a971f735e62b7525c40fb80c20559a
bac data_2012: df026420e3851fc672dfcc4d11c54f87
Updated (thanks to Andrei Filip for running the scripts for 2014 and sending me the file):
Updated 05/03/2015 (upon request for data from the 2nd exam session in autumn):
Updated 31/07/2015 Thanks to Gabriel Kreindler who sent me the cleaned data for 2015:
As others contacted me asking for additional data sets related to the main Bac data, I’ve also mined and published data related to the grading centres for the exam, from 2006 to 2015 (thanks, Gabriel Kreindler!):
centres 2015: 4e96a9b94b3d8f24d6f44d8d0c8a0ed7
cleaned dataset centres 2006-2014: eb0996d48005b905859e6d9ee62fa19d
centre 2006: c622dd2437e8dc50cad2c61a465d79e0
centre 2007: b3adae3fc41c07759a651a5060e8c68a
centre 2008: 12f472cc43af776cdfe1f7252ae144dc
centre 2009: 2d524bc5cbc1308c6889df37ecffb430
centre 2010: 2b2a432bc53921f4479ded67ed366ddf
centre 2011: cc81b27dde5736b786fb3a0600169240
centre 2012: cd6f1e768153efb1c658a6e145309758
Please note that these data sets were cleaned only superficially and hence errors or mismatches might still be present. (I cleaned basically what could be cleaned automatically, but a thorough cleaning would require some manual input too.)
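If you want to check a downloaded archive against the MD5 values listed above, something along these lines should work. This is only a minimal Python sketch, not part of the mining scripts themselves. The local file name bac_data_2006.zip is an assumption (use whatever name your download has); the checksum is the one listed above for the 2006 results.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's MD5 equals the published value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local file name; the checksum is the one listed for 2006.
    archive = Path("bac_data_2006.zip")
    expected = "49d7a8f28507ff6e81cc4bb116e874d4"
    if archive.exists():
        print("OK" if matches_checksum(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch most likely means the download was incomplete or corrupted, so fetching the file again is the first thing to try.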
In 2011, when I first started mining this data, I ran a few analyses of it and published the results on my blog (in Romanian).
Several other people used the data I provided and published their own analyses and results. I’ve published (in Romanian) a summary of the main analyses of this data set, indicating also the people involved.
More recently, I found out about a project by researchers from the University of Gothenburg who use the Bac data to investigate “the consequences of monitoring on corruption and the education opportunities for the poor” (Oana Borcan, Andreea Mitrut and Mikael Lindahl).
If you use the above data, please reference this page as the source. Here is an example:
Coman, D. (2013). Bac Data [online] Available at: <http://dianacoman.com/bac-data>
If you use or plan to use Bac data, I am interested in hearing about your project. If there is some additional data you need, feel free to leave a message below and I might be able to help you.
Any other related comments and suggestions are welcome.