Bac Data

This is the reference page for the Bac Data project containing links to the data itself, analyses that use this data and relevant discussions on the topic. Feel free to comment on the topic or to download and use any of the data provided. If you do find the data useful, please cite this page as the source, so that others can also find and use the resources here.

Background

The “Bac” (short for Bacalaureat) is a yearly national exam for Romanian high school graduates. Although the data should be publicly available for any use or purpose, the Romanian Ministry of Education and its contractor, the SIVECO company, actively hinder any use of such data by publishing it in formats that do not allow direct access to the complete data set. Moreover, the formats used for publishing the data change from year to year and there is no support for analysts who need access to a clean and complete data set.

Given the above issues, I’ve written code to mine the data from the Ministry’s website and I’ve made all the Bac data (from 2006 to 2014) available in a useful format for anybody interested. Moreover, I’ve run a few analyses and published the results on my blog (in Romanian).

Update 6 October 2017: an updated archive containing all data from 2004 to 2017 (links below to download it) was kindly provided by Costin Bleotu, programmer member of code4romania. His crawler’s source code can be found at his GitHub repository.

Update 12 October 2017: a new archive is available with the data from “capacitate”, the national exam that younger children take before starting high school. It was kindly provided by the same Costin Bleotu, programmer member of code4romania. The corresponding crawler is also available at his GitHub repository.

Update 19 October 2017: a more comprehensive set of “capacitate” data (see update above) for years 2004 to 2017, kindly provided by the very helpful Costin Bleotu. This data set is split into the three archives below and it is scraped from admitere.ro, hence from a different website than the other capacitate data sets. For this reason I’ll leave all the archives as they are rather than replacing any of them: you are warmly invited to compare the data from the different sources for the same year. Let me know of any interesting differences you find and I’ll update this page with your contribution!
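For anyone taking up the comparison invited above, here is a minimal sketch in Python of one way to do it. It assumes that each source has already been loaded into a list of dictionaries and that both share an identifier field. The function name `compare_sources`, the field names `id` and `grade`, and the sample rows are all hypothetical, made up for illustration only; the real archives may use a different layout.

```python
def compare_sources(records_a, records_b, key="id"):
    """Compare two lists of dict records that share an identifier field.

    Returns the identifiers found only in the first source, only in the
    second source, and those present in both but with different content.
    """
    by_key_a = {row[key]: row for row in records_a}
    by_key_b = {row[key]: row for row in records_b}

    common = by_key_a.keys() & by_key_b.keys()
    return {
        "only_in_a": sorted(by_key_a.keys() - by_key_b.keys()),
        "only_in_b": sorted(by_key_b.keys() - by_key_a.keys()),
        "differing": sorted(k for k in common if by_key_a[k] != by_key_b[k]),
    }


# Made-up rows, NOT taken from the real data; "id" and "grade" are
# hypothetical field names.
source_a = [{"id": "A1", "grade": "8.50"}, {"id": "A2", "grade": "7.25"}]
source_b = [{"id": "A2", "grade": "7.30"}, {"id": "A3", "grade": "9.00"}]

report = compare_sources(source_a, source_b)
print(report)
# {'only_in_a': ['A1'], 'only_in_b': ['A3'], 'differing': ['A2']}
```

Matching on a single identifier is the simplest choice; if the two websites do not expose a common identifier, a combination of fields would have to serve as the key instead.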

Data

capacitate_data_admitere_2004_2008.zip – sha512 for the .zip file is:

33dd2d4702fe4989bccbc9165bb5774df68757cfac96759e32336cd8487820ee18f6dcd206c38ff4ce32ae7919b0ec4ee6189783e669a24381a52e3cc67e163e

capacitate_data_admitere_2009_2013.zip – sha512 for the .zip file is:

a094dadff8ccb832c73a6bd8bef4e6544b548d3d9cddd7db93e5e6ba709e3227c2148a9a5a3d434d0bf965704efaf5fa5e657e4800875526f7804a6de6f637b1

capacitate_data_admitere_2014_2017.zip – sha512 for the .zip file is:

62e4bf5ba04b791f2cecab1223170328b67c343c4de3355aea1f5a2005a0130ffc72457df84b51197d8dedd4e535052fa91e076b2c4b25d21101a5cae9659749

capacitate_data_2015-2017.zip (~17MB) – sha512 for the .zip file is:

cc90ead23f8640dd6258f1999bcff3920468e3027cf335b57ace7a9f02296b758607450375f9ab4629d4cd0cb2e8a6700eb9980c3feccd74ab5d86b79258596a

The results data contain the list of candidates for each year (first exam call) and are provided in .zip archives, with a checksum given below for each file (sha512 for the newer archives, MD5 for the original yearly files). As of 6 October 2017, there is a new set of data (thanks, Costin Bleotu!) with all data from 2004 to 2017, in addition to the original data for every year from 2006 to 2015:

bac_data_2004-2010_summer (~86MB) – sha512 for the .zip file is:

1322f0ae2438ae411c45a283520509087b97e149e800c6c0285b6c22f210baf3a24d0273c8b066a59ec89fef85015a0bcf4ee8d8cfa7e7e3606d2197699108e7

bac_data_2011_2017_summer (~74MB) – sha512 for the .zip file is:

b0ba66fc8c713b7ee8f9f425531ab7edaf78fa5389c09535a71c4ad7f328001f069cc0e3497bc7c6b27c4884dd48add8a153e161680a05e1b7478099dd7f1747

bac_data_2004-2017_fall (~52MB): sha512 for the .zip file is:

25510f4a695c4225c17a34876d765f362aa08d032a08b759732027aeee89adc1b23f5ba82f19fc731cf9a685f188bc4cc66596b0dae9120365342c71597976c1

bac data_2006: 49d7a8f28507ff6e81cc4bb116e874d4

bac data_2007: aca4382b9189f104d43745b32c11746d

bac data_2008: f5846b68b27de00f068b78aa43be54ec

bac data_2009: e4457f1403a9ba0bfa1bbdba6e48b049

bac data_2010: fc4774a8f3df1357a6000b7c3bbc3b62

bac data_2011: 57a971f735e62b7525c40fb80c20559a

bac data_2012: df026420e3851fc672dfcc4d11c54f87

bac_data_2013: bbeb10637f5a6183f90214c65c039500

Updated (thanks to Andrei Filip for running the scripts for 2014 and sending me the file):
bac_data_2014: 653cb7accb52f5393382158f03567ca1

Updated 05/03/2015 (upon request for data from the 2nd exam session in autumn):

bac_data_2014_autumn_exams: 6F0BE97368E5B6F393CFA0BF3055D6E7

Updated 31/07/2015 Thanks to Gabriel Kreindler who sent me the cleaned data for 2015:

bac_data_2015: da17176824e849776d2f15e90d2ebdef

 

As others contacted me asking for additional data sets related to the main Bac data, I’ve also mined and published data related to the grading centres for the exam, from 2006 to 2015 (thanks, Gabriel Kreindler!):

centres 2015:  4e96a9b94b3d8f24d6f44d8d0c8a0ed7

cleaned dataset centres 2006-2014: eb0996d48005b905859e6d9ee62fa19d

centre 2006 : c622dd2437e8dc50cad2c61a465d79e0

centre 2007 : b3adae3fc41c07759a651a5060e8c68a

centre 2008 : 12f472cc43af776cdfe1f7252ae144dc

centre 2009 : 2d524bc5cbc1308c6889df37ecffb430

centre 2010 : 2b2a432bc53921f4479ded67ed366ddf

centre 2011 : cc81b27dde5736b786fb3a0600169240

centre 2012 : cd6f1e768153efb1c658a6e145309758

Please note that these data sets were cleaned only superficially and hence errors or mismatches might still be present. (I cleaned basically what could be cleaned automatically, but a thorough cleaning would require some manual input too.)
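The checksums listed above can be checked after download with a few lines of Python. This is only a sketch using the standard `hashlib` and `zipfile` modules; the helper names `file_digest`, `matches_checksum` and `list_archive` are my own choices for illustration, and the file name in the usage comment is an assumed local name rather than a guaranteed one.

```python
import hashlib
import zipfile


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected, algorithm="sha512"):
    """Compare ignoring case, since one checksum on this page is in upper case."""
    return file_digest(path, algorithm) == expected.strip().lower()


def list_archive(path):
    """Return (file name, uncompressed size in bytes) for each entry in a .zip."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Usage sketch (the local file name is an assumption):
#   ok = matches_checksum("bac_data_2015.zip",
#                         "da17176824e849776d2f15e90d2ebdef", "md5")
#   print(ok, list_archive("bac_data_2015.zip"))
```

Pass `"md5"` as the algorithm for the older yearly files and keep the default `"sha512"` for the newer archives. Listing the archive contents first is a cheap way to see what is inside before extracting anything.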

Analyses

In 2011, when I first started mining this data, I ran a few analyses of it and published the results on my blog (in Romanian).

Several other people used the data I provided and published their own analyses and results. I’ve published (in Romanian) a summary of the main analyses of this data set, indicating also the people involved.

More recently, I found out about a project by researchers from the University of Gothenburg who use the Bac data to investigate “the consequences of monitoring on corruption and the education opportunities for the poor” (Oana Borcan, Andreea Mitrut and Mikael Lindahl).

Citation:

If you use the above data, please reference this page as the source. Here is an example:

Coman, D. (2013). Bac Data [online] Available at: <http://dianacoman.com/bac-data>

Your Input

If you use or plan to use Bac data, I am interested in hearing about your project. If there is some additional data you need, feel free to leave a message below and I might be able to help you.

Any other related comments and suggestions are welcome.
