The R Project - The Use of R in Official Statistics - uRos2014

R Project

- Romania Team

Presentations

Duncan E.

1. Use of R in the UK Office for National Statistics
Duncan ELLIOTT, UK Office for National Statistics
Abstract: The UK Office for National Statistics (ONS) use R predominantly as a research tool. This paper gives some examples of areas where it is used in analysis and the one area where it is used in the production of a National Statistic. Examples of its use in analysis include the dlm package (Petris, 2010) for developing a state space model for unemployment estimates, and the spatstat package (Baddeley and Turner, 2005) for visualisation of crime data. The one area where R is used for publishing a National Statistic is mortality rates. An overview of how the MortalitySmooth package (Camarda, 2012) is used to produce smoothed mortality rates will be presented. There is a small group within ONS interested in exploring the use of R more widely in production systems, and we conclude with some information on the sorts of issues we face.

Matthias T., Alexander K., Bernhard M.

2. Development and Current Practice in Using R at Statistics Austria
Matthias TEMPL, Statistik Austria & Vienna University of Technology
Alexander KOWARIK, Statistik Austria
Bernhard MEINDL, Statistik Austria
Abstract: R is becoming more and more popular in national statistical offices, and not only for simulation tasks. Nowadays R is also often used in the production process. Many new features for common tasks required in official statistics have been developed over the last years and are freely available in the form of add-on packages. These packages, among thousands of others, can easily be used. In this contribution we first show the use of R at Statistics Austria. This topic includes the infrastructure, the teaching of employees and various concepts to support and help staff who want to use R in their daily work. In the second part, the R developments from the methods unit at Statistics Austria are summarised. These range from packages for data pre-processing (e.g. imputation) to packages for the final dissemination of data, including packages for statistical disclosure control, estimation of indicators and the visualisation of results.

Gergely D.

3. Creating statistical reports in the past, present and future
Gergely DAROCZI, Easystats Ltd
Abstract: The talk starts by giving a historical overview of a variety of statistical and reporting tools actively used by practicing data analysts in the past 100 years. That period saw many changes both in methodology and in tools: existing methods were improved and new ones were discovered, while mainframes, personal computers and nowadays cloud computing replaced statistical tables and the slide rule as the standard source of processed data.
This also changed the way statistics and data analysis are used by an ever-growing number of experts, laymen and industries: it is no surprise nowadays to do, for example, customer segmentation without any deep theoretical knowledge of how k-means clustering or latent class analysis really works.
And this is just fine. Doing data analysis means something different from what Karl Pearson did in the past: today, statistical wizards and data-driven decision-making tools can help us run valid analyses on live databases, even on mobile platforms.
This talk will concentrate on how to create annotated, reproducible statistical templates and reports in R, building on the “rapport” and “pander” packages (maintained by the presenter).

Mark VAN DER LOO

4. Data cleaning for official statistics with R
Mark VAN DER LOO, Statistics Netherlands
Abstract: Preparing data for statistical analyses, often referred to as data cleaning or data editing, is one of the most resource-consuming activities of national statistical institutes. Indeed, De Waal et al. (2011) report that up to 40% of resources devoted to a statistic may be spent on data cleaning activities. For this reason it is desirable to automate the process when possible.
Statistics Netherlands adopted R as a strategic tool in 2010, and since then a number of R-based tools aimed at facilitating data cleaning have been developed. These tools include a package for performing and manipulating record-wise (possibly multivariate) restrictions, tools for error localisation based on the minimal change principle of Fellegi and Holt (1976), adjustment methods for numerical variables, and several deterministic and deductive imputation or correction methods. A recent overview of these tools was given in de Jonge and van der Loo (2013).
Here, we present an overview of these tools using a recently developed production system for statistics on child care centers as an example [Pannekoek et al. (2013)]. We show how the tools can be combined into a production system that improves data in several consecutive data cleaning steps. Because of the highly modular approach, the effect of each substep in the chain of data cleaning operations can be traced. We show several useful process indicators that allow us to follow the progress of a data set as it gets cleaned. Finally, we will provide an outlook on our future plans for the toolset.
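The minimal change principle mentioned above can be illustrated with a small sketch. It is written in Python rather than R, and it uses a brute-force search instead of whatever algorithm the Statistics Netherlands tools use. The record, the value domains, the edit rules and the function name `localise_errors` are all hypothetical and serve only as an illustration.

```python
from itertools import combinations, product

def localise_errors(record, domains, rules):
    """Return a smallest set of fields that can be changed to satisfy all rules.

    Brute force: try field subsets in order of increasing size and, for each,
    every combination of replacement values from `domains`.
    """
    fields = list(record)
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(rule(candidate) for rule in rules):
                    return set(subset)
    return None  # no assignment satisfies the rules

# Hypothetical categorical record and edit rules, for illustration only.
domains = {
    "age_group": ["child", "adult"],
    "marital_status": ["single", "married"],
    "employed": ["yes", "no"],
}
rules = [
    lambda r: not (r["age_group"] == "child" and r["marital_status"] == "married"),
    lambda r: not (r["age_group"] == "child" and r["employed"] == "yes"),
]
record = {"age_group": "child", "marital_status": "married", "employed": "yes"}
fields_to_change = localise_errors(record, domains, rules)
```

Here the record violates both rules, and changing the single field `age_group` is enough to satisfy them, so that field is flagged as the most likely error. Exhaustive search like this is only workable for a handful of fields with small domains.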

Ana Maria D., Cecilia-Roxana A.

5. The progress of R in Romanian Official Statistics
Ana Maria DOBRE, National Institute of Statistics, Romania
Cecilia-Roxana ADAM, National Institute of Economic Research
Abstract: The present paper gives an overview of the state of the art of the R statistical software in official statistics in Romania, predominantly in social statistics. Examples of data analysis and of econometric models for Small Area Estimation that have been successfully completed are given.
The paper also includes a summary of the applications of R in other statistical offices around the world. Other countries, such as the United Kingdom or the Netherlands, are truly experienced in the use of R. We conclude with a series of proposals on future research opportunities and other potential analysis procedures using R in social statistics.

Sofija S.

6. Estimation procedure in Monthly retail trade survey in Serbia using R software
Sofija SUVOCAREV, Statistical Office of the Republic of Serbia
Abstract: The objective of the Monthly retail trade survey (MRTS), which is based on a sample and on the VAT reports received from the Tax Administration, is to provide data on the turnover of goods in retail trade in order to measure monthly changes in turnover. Indices, totals and standard errors are calculated for the territory of the Republic of Serbia and for its territorial units (NUTS 2). For the Republic of Serbia, these parameters are also calculated by two groups and eight classes of NACE Rev. 2. The calculation is based on a stratified simple random sample. This paper shows how the estimation procedure for these parameters is implemented in R.
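As a rough illustration of estimation under stratified simple random sampling, the sketch below computes a population total and its standard error with the textbook expansion estimator. It is written in Python rather than R and is not the authors' implementation. The strata, the population sizes and the turnover values are invented for the example.

```python
from math import sqrt
from statistics import mean, variance

def stratified_total(strata):
    """Estimate a population total and its standard error.

    `strata` maps a stratum label to (N_h, sample values), where N_h is the
    number of units in the stratum population.
    """
    total = 0.0
    var_total = 0.0
    for N_h, sample in strata.values():
        n_h = len(sample)
        total += N_h * mean(sample)        # expansion estimator N_h * ybar_h
        fpc = 1 - n_h / N_h                # finite population correction
        var_total += N_h ** 2 * fpc * variance(sample) / n_h
    return total, sqrt(var_total)

# Hypothetical turnover data for two strata (illustrative numbers only).
strata = {
    "small": (100, [10.0, 12.0, 14.0, 16.0]),
    "large": (20, [100.0, 110.0, 120.0, 130.0]),
}
estimate, std_error = stratified_total(strata)
```

A monthly index could then be formed as the ratio of two such totals for consecutive months, although its standard error would need a separate derivation.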

Stefan C.

7. Statistical data analysis via R and PHP: A case study of the relationship between GDP and foreign direct investments for the Republic of Moldova
Stefan CIUCU, Cybernetics and Statistics Doctoral School, Bucharest University of Economic Studies
Abstract: This paper provides an overview of a way of integrating R with the PHP scripting language in order to analyze statistical data (time series). We analyze the relationship between foreign direct investments and the GDP of the Republic of Moldova over the 1992-2012 period.

Nicolae-Marius J.

8. Multilevel model analysis using R
Nicolae-Marius JULA, Nicolae Titulescu University
Abstract: Complex datasets cannot be analyzed using only simple regressions. Multilevel models (also known as hierarchical linear models, nested models, mixed models, random coefficient models, random-effects models, random parameter models, or split-plot designs) are statistical models of parameters that vary at more than one level. Multilevel models can be used on data with many levels, although 2-level models are the most common.
Multilevel models, or mixed effects models, can be estimated in R. Several packages are available on CRAN. In this paper we present some common methods for analyzing these models.

Eliza-Olivia L., Ana-Maria Z., Cristina M.

9. A study on intragenerational mobility with R
Eliza-Olivia LUNGU, National Research Institute for Labour and Social Protection
Ana-Maria ZAMFIR, National Research Institute for Labour and Social Protection
Cristina MOCANU, National Research Institute for Labour and Social Protection
Abstract: We explore the early career mobility of Romanian higher education graduates using several R packages for social networks: igraph, sna, network, statnet. The nodes are represented by occupations, while the links represent movements of individuals from one job to another. A job change is defined as an experience of inter-organisational mobility. The network is constructed as a weighted and directed one with self-loops. Considering that the occupations are related to each other via transferable skills, we visualize paths of mobility and calculate network indicators in order to understand models of connectivity between occupations.
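The construction of a weighted, directed network with self-loops can be sketched as follows. This is a Python illustration using only the standard library, not the igraph, sna, network or statnet code used by the authors. The occupations and the moves are made up for the example.

```python
from collections import Counter

# Hypothetical job-to-job moves as (previous occupation, new occupation).
moves = [
    ("clerk", "analyst"),
    ("clerk", "analyst"),
    ("analyst", "manager"),
    ("clerk", "clerk"),      # new employer, same occupation: a self-loop
    ("manager", "analyst"),
]

# Edge weight = number of individuals who made that move.
edges = Counter(moves)

# Weighted out- and in-degree (strength) of each occupation.
out_strength = Counter()
in_strength = Counter()
for (source, target), weight in edges.items():
    out_strength[source] += weight
    in_strength[target] += weight
```

The self-loop arises because a job change is defined by a change of organisation, so a person can move between employers while staying in the same occupation.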

Monica Mihaela M.-M., Eliza-Olivia L.

10. Nonparametric regression models estimation in R
Monica Mihaela MAER MATEI, Bucharest University of Economic Studies, National Scientific Research Institute for Labour and Social Protection
Eliza-Olivia LUNGU, National Research Institute for Labour and Social Protection

Mihaela Janina M.

11. Demographic research on the socio economic background of students of the Ecological University of Bucharest
Mihaela Janina MIHAILA, Ecological University of Bucharest
Abstract: The paper reports on a socio-demographic and economic study of first-year students at the Ecological University of Bucharest, in which we focus on understanding and investigating the conditions within the families and the social environment in the home towns of these students. This research is key to understanding the correlations between the socio-economic conditions within the family and the geographical area, on the one hand, and the actual career options and decisions of the students newly admitted to our faculties, on the other.

Bogdan O., Raluca Mariana D.

12. Integrating R and Hadoop for big data analysis
Bogdan OANCEA, Nicolae Titulescu University of Bucharest
Raluca Mariana DRAGOESCU, Bucharest University of Economic Studies
Abstract: Analyzing and working with big data could be very difficult using classical means like relational database management systems or desktop software packages for statistics and visualization. Instead, big data requires large clusters with tens, hundreds or even thousands of computing nodes. Official statistics is increasingly considering big data for deriving new statistics because big data sources could produce more relevant and timely statistics than traditional sources. One of the software tools successfully used for storage and large-scale processing of big data sets on clusters of commodity hardware is Hadoop. The Hadoop framework contains libraries, a distributed file system (HDFS) and a resource-management platform, and implements a version of the MapReduce programming model for large-scale data processing. In this paper we investigate the possibilities of integrating Hadoop with R, a popular software environment for statistical computing and data visualization. We present three ways of integrating them: R with Streaming, Rhipe and RHadoop, and we emphasize the advantages and disadvantages of each solution.
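For readers unfamiliar with the MapReduce programming model, the following single-process Python sketch mimics its three stages: map, shuffle and reduce. It does not use Hadoop, R with Streaming, Rhipe or RHadoop. The function names and the sample records are hypothetical and only show the shape of the computation.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, value) pairs; here one pair per (region, turnover) record."""
    for region, turnover in records:
        yield region, turnover

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values to a single result; here a sum."""
    return {key: sum(values) for key, values in groups.items()}

# Illustrative input records only.
records = [("north", 10), ("south", 5), ("north", 7), ("south", 1)]
totals = reduce_phase(shuffle(map_phase(records)))
```

On a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the grouping step, but the division of work is the same.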

Marius Florin R., Ioana M., Razvan N.

13. Using R to Get Value Out of Public Data
Marius Florin RADU, Babes-Bolyai University
Ioana MURESAN, Babes-Bolyai University
Razvan NISTOR, Babes-Bolyai University
Abstract: Public sector information contains great value for citizens in general. Data stored on computers of public institutions doesn’t have value on its own. It has to be processed and analyzed to obtain information, and that information should then be made available as a public good in order to facilitate its transformation into knowledge. R is a free software programming language, an environment and a toolkit of modules addressed to anyone working with statistics. R can ease the road from public data to civic wisdom. This article is a brief review of R's capabilities to extract, transform, analyze, and visualize public data. The second part of the article presents an example of a full-fledged web application written entirely in R. The application uses loosely structured government data about Romanian Auto Park in order to present it in a friendly dashboard.

Elena R.

14. Data Editing and Imputation in Business Surveys
Elena ROMASCANU, National Institute of Statistics
Abstract: .

Muhammad S., Vasile P.

15. On Consistency of Parameter estimates and forecasting Volatility with GARCH Models
Muhammad SHERAZ, Faculty of Mathematics and Computer Science, University of Bucharest
Vasile PREDA, Faculty of Mathematics and Computer Science, University of Bucharest
Abstract: The GARCH models [1986] were designed to capture certain characteristics of time series. In financial time series in particular, these models play a very important role in studying stylized facts such as volatility clustering, fat tails and the leverage effect. Interest in methods of estimation and forecasting has grown rapidly since the introduction of these models. In this article we study real financial time series data to present the performance of GARCH models, use the distributions of simulated parameters to gain insight into the consistency of the parameter estimates, and discuss the forecasting performance of selected models.
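To make the volatility recursion concrete, here is a minimal Python sketch of the standard GARCH(1,1) conditional variance and its multi-step forecast. The abstract does not say which GARCH specifications were used, so GARCH(1,1) is an assumption. The parameter values and the returns are hypothetical, and parameter estimation is left out.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances sigma2[t] for t = 0..len(returns).

    sigma2[0] is set to the unconditional variance omega / (1 - alpha - beta);
    the last element is the one-step-ahead forecast.
    """
    if alpha + beta >= 1:
        raise ValueError("need alpha + beta < 1 for a finite unconditional variance")
    sigma2 = [omega / (1 - alpha - beta)]
    for r in returns:
        sigma2.append(omega + alpha * r ** 2 + beta * sigma2[-1])
    return sigma2

def forecast(sigma2_next, omega, alpha, beta, horizon):
    """h-step-ahead variance forecast, reverting to the long-run level."""
    long_run = omega / (1 - alpha - beta)
    return long_run + (alpha + beta) ** (horizon - 1) * (sigma2_next - long_run)

# Hypothetical parameters and zero-mean returns, for illustration only.
omega, alpha, beta = 0.1, 0.1, 0.8
sigma2 = garch11_variances([1.0, -2.0, 0.5], omega, alpha, beta)
two_step = forecast(sigma2[-1], omega, alpha, beta, horizon=2)
```

The large return of -2.0 raises the next conditional variance, which then decays back towards the long-run level. This is the volatility clustering the abstract refers to.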

Mihaela S.

16. On Consistency of Parameter estimates and forecasting Volatility with GARCH Models
Mihaela SIMIONESCU (Bratu), Institute for Economic Forecasting of the Romanian Academy
Abstract: Bayesian econometrics has seen a considerable increase in popularity in recent years, attracting the interest of various groups of researchers in the economic sciences and related fields, such as specialists in econometrics, commerce, industry, marketing, finance, microeconomics, macroeconomics and other domains. The purpose of this research is to provide an introduction to the Bayesian approach applied in economics, starting with Bayes' theorem. The estimation methodology for Bayesian linear regression models is presented, together with two empirical studies on data taken from the Romanian economy. Thus, an autoregressive model of order 2 and a multiple regression model were built for the consumer price index. The Gibbs sampling algorithm was used for estimation in R, computing the posterior means and the standard deviations. The parameters’ stability proved to be greater than in the case of estimations based on the methods of classical econometrics.

Florin P.

17. Methodological considerations on the size of Coefficient of Intensity of Structural Changes (CISS)
Florin PAVELESCU, Institute of National Economy of the Romanian Academy
Abstract: The paper argues in favour of emphasizing the modeling factors of the Coefficient of Intensity of Structural Changes (CISS) in order to obtain a better interpretation of the significance of this statistic. It also highlights the impact of the characteristic features of structural changes on the correlation between the CISS computed at the economic branch level and at the sectoral level. The paper ends with a numerical example that reviews all the steps necessary to identify the CISS modeling factors. Potentially, these steps could be carried out using R.

Carmen U.

18. Using R as an alternative teaching tool in the Ecological University of Bucharest
Carmen UNGUREANU, Ecological University of Bucharest
Abstract: In a globalised world, universities want to offer the best education to their students so that they can be competitive on the labour market both in the country where they studied and beyond its borders.
The Romanian education system - currently undergoing reform - attaches great importance to the use of traditional efficient teaching tools, along with new alternative ones.
The R data analysis system is one such alternative method, which the Ecological University of Bucharest uses in order to stimulate students’ creativity in problem solving.

Nicoleta C., Ciprian-Alexandru A., Ana Maria D.

19. R – a Global Sensation in Data Science
Nicoleta CARAGEA, Ecological University of Bucharest
Ciprian-Alexandru ALEXANDRU, Ecological University of Bucharest
Ana Maria DOBRE, National Institute of Statistics, Romania
Abstract: The main intention of this paper is to present the evolution of R as the most used data analysis tool among statisticians. Its flexibility and complexity have won over statisticians and data scientists.
The paper examines some of the reasons behind the popularity of R, using tools like SWOT analysis.
The R software environment offers integrated tools for all kinds of data analysis, from computations and data mining to high-effects visualization. As an example, the paper includes a 3D plot.
