ON THE USE OF R PROGRAMMING LANGUAGE IN THE ANALYSES OF SPATIAL DATA

R is a powerful and increasingly popular programming language with strong graphical and presentation features and great extensibility. Although primarily intended for statistical computing, R has made its way into the field of GIS through the development of specialized extension packages. It offers a wide range of functions at all GIS levels: data acquisition, data manipulation, graphical representation and quantitative analysis. The paper presents R as an open-source alternative to existing commercial GIS software. It performs especially well when advanced quantitative methods on spatial data are needed (e.g. spatial modelling). We demonstrate R's capabilities through a spatial analysis of a forest area in Snežnik (south Slovenia), covering data import, conversion and export into various GIS formats, as well as geostatistics, spatial modelling and spatial visualization.


INTRODUCTION
R is a free (open-source) software environment for statistical computing and graphics (http://cran.r-project.org/). It is a versatile, object-oriented programming language with strong graphics and presentation capabilities. The R community continues to evolve and expand, actively developing new extensions known as libraries. There are currently close to 4,000 such libraries, which enable a variety of specialized data analyses and data management tasks, as well as data visualization. R is a command-line program that generally does not have a graphical interface, which may prove particularly challenging for a new user. However, it is precisely this code-based way of working that allows commands to be repeated and lets users create their own methods for data processing and visualization. This requires the user to understand the data being analysed. From this perspective, R greatly exceeds the so-called 'black-box' programs.
Current academic research on the use of the R programming environment suggests that R is predominantly ...
Original scientific paper
Although R is primarily intended for statistical computing (e.g. linear and nonlinear models, regression, multivariate statistics, data mining and machine learning) and for the presentation of results in the form of graphs, its relatively intuitive programming enables integration with other research methods, including GIS. The R programming environment offers a wide range of functions at all three levels of GIS: (1) data collection, (2) data manipulation and (3) data presentation. The complexity of spatiotemporal data analysis has necessitated the development of the 'sp' library, which sets a framework for structuring and storing spatial data. The 'sp' library defines different classes of spatial data, thereby determining how spatial data are structured and organized, and provides methods adapted to the individual classes (Bivand et al., 2013). It includes methods for processing point, line, polygon and raster data, which gives a wide variety of spatial data processing options within the R environment. It is possible to convert between different classes and to save the objects in standard GIS formats and spatial reference systems.
In addition to the 'sp' library, the following libraries are relevant for spatial data processing:
• rgdal: to read and write a variety of established raster formats (e.g. GeoTIFF, ERDAS Imagine, SDTS, ECW, MrSID, JPEG2000, DTED, NITF) and vector formats (SHP, ESRI ArcSDE, MapInfo (tab and mid/mif), GML, KML, PostGIS, Oracle Spatial), and to define and transform projections;
• maptools: to read, write and display the most common vector data, in addition to many other features;
• raster: special functions for handling raster data;
• gstat: geostatistical methods (e.g. (co)kriging, variograms);
• geoR: spatial modelling (e.g. frequentist and Bayesian inference); and
• rgeos: an interface for topology operations (e.g. intersection, union, buffer zone).
The aim of this article is to demonstrate R's potential for spatial data processing and presentation. To demonstrate the suitability of R for GIS analyses, we performed a spatiotemporal analysis of actual measured data. Our main focus here is to demonstrate the programming environment; the spatial analysis itself may not be meaningful (i.e. species distribution modelling at a small scale). We illustrate working with vector data, the spatial interpolation of point data and working with raster data. We also give an overview of processing large volumes of data (e.g. LiDAR) and of spatial modelling. At each step, the data are also presented graphically to show R's capability as a visualization tool.

MATERIALS AND METHODS
All of the data in this paper were obtained from a forest in Snežnik (south Slovenia). The study area measures 20 ha and ranges in altitude from 820 m to 880 m. Silver fir and European beech are the dominant tree species. The terrain is characterized by abundant sinkholes. The following data are available for the study area:
Air temperature: air temperature was measured every 10 minutes from May 1, 2008, to February 28, 2009, at 65 locations at the intersections of a 50 × 50 m grid, using DL-120 TH temperature loggers (SHT 11 sensor, accuracy ± 0.5 °C).
LiDAR data: 3D point cloud measurements using a Riegl LM5600 laser scanner mounted on a helicopter, with a relative horizontal accuracy of 10 cm, a relative vertical accuracy of 3 cm and a 180 kHz laser pulse frequency. The density of points is 30 points/m², with a footprint of 30 cm.
Vegetation data: a summer survey of shrubs, herbs and mosses at 65 locations, at the intersections of the 50 × 50 m grid, in accordance with the Central European method (Braun-Blanquet, 1964).
For a detailed description of the materials and methods, see Kobal (2011).

RESULTS AND DISCUSSION

3.1 Manipulation of vector data
At all 65 locations, coordinates were recorded using GPS devices and exported to a text file. This text file was imported into the R environment with the function 'read.table', and a shapefile was created using the function 'writePointsShape' from the library 'maptools' (Lewin-Koh and Bivand, 2012). At this stage, the attribute table only contained a plot name, to which minimum temperature values for an arbitrarily chosen date and time (e.g. the temperature on May 8, 2008, at 8:00 am) were then ascribed using the function 'match'. Figure 1 shows the locations of the temperature loggers, with the symbol size representing the temperature value.
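The import and attribute-joining step described above can be sketched in base R. This is only an illustration: the plot names, coordinates and temperature values below are made up, and the shapefile export is left out.

```r
# Hypothetical GPS export: plot name and coordinates (all values are made up).
gps_txt <- "plot x y
P01 455000 5052000
P02 455050 5052000
P03 455000 5052050"
plots <- read.table(text = gps_txt, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical logger readings for one date and time, stored in a different row order.
temps <- data.frame(plot = c("P03", "P01", "P02"),
                    temp = c(6.1, 4.8, 5.5),
                    stringsAsFactors = FALSE)

# match() returns, for each plot in 'plots', the row position of that plot in 'temps'.
idx <- match(plots$plot, temps$plot)
plots$temp <- temps$temp[idx]

print(plots)
```

Because 'match' works on the plot names and not on the row order, the temperatures end up on the correct plots even when the two tables are sorted differently.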

3.2 Geostatistical spatial interpolation
The next step was to perform a spatial interpolation (kriging) of the temperature throughout the research area using the library gstat (Pebesma, 2012). During the spatial interpolation procedure, it is first necessary to select the variogram model (functions 'variogram' and 'fit.variogram'), which describes the spatial dependence of a random variable as a function of distance. There are different models of variograms. In our study, we selected a circular model with a sill of 2.67 °C² (this represents the spatial variance), a variogram range of 164 m (i.e. the maximum distance between two points at which values of air temperature are still related) and a nugget of 0 (the nugget represents measurement error or variability at the local level). The point measurements were then used to create a continuous temperature field in raster format using the function 'krige' (Figure 1).
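To make the three variogram parameters more tangible, the following base R sketch evaluates a circular variogram model with the values reported above. It assumes the circular model in its commonly used textbook form and does not call gstat; the function name 'circular_variogram' is hypothetical.

```r
# Circular variogram model: semivariance as a function of lag distance h (in metres).
# Default parameters are those reported in the text; 'sill' is the partial sill,
# which equals the total sill here because the nugget is 0.
circular_variogram <- function(h, sill = 2.67, range = 164, nugget = 0) {
  r <- pmin(h / range, 1)            # beyond the range the model stays at the sill
  shape <- (2 / pi) * (asin(r) + r * sqrt(1 - r^2))
  gamma <- nugget + sill * shape
  gamma[h == 0] <- 0                 # semivariance is zero at zero distance
  gamma
}

h <- c(0, 50, 100, 164, 300)
data.frame(distance_m = h, semivariance = circular_variogram(h))
```

The semivariance rises from 0 at zero distance and levels off at the sill once the distance reaches the range, which is the behaviour the fitted parameters describe.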

3.3 LiDAR data processing
In this part, we present the power of R as a tool for processing large volumes of data, for programming and for adapting basic functions. The raw LiDAR data for 1 km² has a size of 539,468 KB (539 MB) and contains 20,736,221 rows (points) and 62,208,663 values (three coordinates per point).
In R, we wrote an algorithm that eliminates the points representing forest trees from the whole point cloud, leaving only the terrain points. The algorithm is based on a point classification that incorporates the distance and angle between the lowest point and its neighbouring points within a certain area. The algorithm also provides a visual check for removing certain terrain points (Figure 2). A digital elevation model (DEM) was produced from these classified points. We used functions from the library raster for this stage (Hijmans and van Etten, 2012). Within each of the 2 × 2 m raster cells, we calculated an average altitude value (the z value). Where a raster cell contained no points (e.g. under large trees with a dense canopy), its value was interpolated from the neighbouring cells. The DEM was displayed as a 3D image (Figure 3).
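The gridding step can be sketched in base R as follows. The points and the extent are made up, the raster library is not used, and filling empty cells with the mean of the surrounding cells is an assumption, since the text does not specify the interpolation method. The helper 'fill_gaps' is hypothetical.

```r
# Hypothetical ground points (x, y in metres, z = elevation); all values are made up.
pts <- data.frame(
  x = c(0.5, 1.5, 2.5, 3.1, 0.7, 5.5),
  y = c(0.5, 1.0, 0.5, 1.9, 2.5, 3.0),
  z = c(820, 822, 824, 826, 830, 840)
)
cell <- 2                        # cell size in metres, as in the text
nrow_dem <- 2; ncol_dem <- 3     # hypothetical 6 x 4 m extent

# Column and row index of the cell each point falls into.
ci <- floor(pts$x / cell) + 1
ri <- floor(pts$y / cell) + 1

# Mean z per cell; cells without points stay NA.
dem <- matrix(NA_real_, nrow = nrow_dem, ncol = ncol_dem)
key <- (ci - 1) * nrow_dem + ri              # column-major position in 'dem'
means <- tapply(pts$z, key, mean)
dem[as.integer(names(means))] <- as.vector(means)

# Assumed gap filling: mean of the available cells in the 3 x 3 neighbourhood.
fill_gaps <- function(m) {
  out <- m
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (is.na(m[i, j])) {
        rows <- max(1, i - 1):min(nrow(m), i + 1)
        cols <- max(1, j - 1):min(ncol(m), j + 1)
        out[i, j] <- mean(m[rows, cols], na.rm = TRUE)
      }
    }
  }
  out
}
dem_filled <- fill_gaps(dem)
print(dem_filled)
```

In this sketch, row 1 of the matrix holds the cells with the smallest y values, so the matrix is not yet flipped into map orientation.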
In addition, we calculated the topographic position index (TPI) as an example of processing raster data in R. The TPI (Weiss, 2001) is based on neighbourhood cell statistics (Jenness, 2006). It is defined as the difference between the elevation of a chosen cell and the average elevation of the neighbouring cells around it (Figure 4). A positive value indicates that the chosen cell is at a higher elevation than its surroundings, whereas a negative value indicates that the cell is lower. If the chosen cell is significantly higher than the surrounding neighbourhood, it may be at or near the top of a hill or ridge (TPI > 0). Significantly low values suggest that the cell is at or near the bottom of a valley (TPI < 0). TPI values close to zero (TPI ≈ 0) could mean either a flat area or a mid-slope area, so the cell slope can be used to distinguish the two. The function for the topographic position index is part of the library raster.
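The definition above can be illustrated with a few lines of base R. This is a minimal sketch that assumes a 3 × 3 neighbourhood (the eight surrounding cells) and a made-up elevation matrix. It is not the implementation from the library raster, and the function name 'tpi' is hypothetical.

```r
# Hypothetical 5 x 5 DEM (elevations in metres) with a sinkhole in the centre.
dem <- matrix(850, nrow = 5, ncol = 5)
dem[3, 3] <- 842

# TPI = elevation of the cell minus the mean elevation of its eight neighbours.
# Edge cells have an incomplete neighbourhood and are left as NA.
tpi <- function(m) {
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in 2:(nrow(m) - 1)) {
    for (j in 2:(ncol(m) - 1)) {
      window <- m[(i - 1):(i + 1), (j - 1):(j + 1)]
      neighbours <- window[-5]        # drop the centre cell of the 3 x 3 window
      out[i, j] <- m[i, j] - mean(neighbours)
    }
  }
  out
}

tpi_map <- tpi(dem)
print(tpi_map)
```

The sinkhole cell receives a clearly negative TPI, while the cells next to it receive a slightly positive value because the sinkhole lowers the mean of their neighbourhood.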

3.4 Spatial modelling
The purpose of this section is to demonstrate the potential of R for use in spatial modelling. We used data on the presence of plant species in our study area. The authors are aware that modelling occurrences of species in such a small area is not a meaningful study. In our spatial model (logistic regression), the occurrence of a selected species, Doronicum austriacum Jacq., was used as the dependent variable, while the independent variables were topographic and climatic characteristics of the site. The fitted GLM was then used to predict the reaction of the plant species to changes in temperature. The logistic regression analysis (base R) was used to predict the probability of the presence of Doronicum austriacum. Air temperature was chosen as one explanatory variable (presented in detail in section 3.2), and the topographic position index was chosen as another. The latter was calculated using the digital elevation model described in section 3.3.
Temperature and the topographic position index proved to be statistically significant variables (p = 0.0116 for temperature; p = 0.0390 for the topographic position index). Both of the logistic regression coefficients are negative, meaning that the probability of species occurrence decreases with increasing temperature and topographic position index (Figure 5). This result reflects the ecological niche of Doronicum austriacum: it occurs mainly in the colder sinkholes, where the value of the topographic position index is negative and the air temperature is lower.
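A logistic regression of this kind can be fitted in base R with the function 'glm'. The sketch below uses synthetic, made-up data generated from an assumed response curve, not the survey data of the study. To keep the example deterministic, it also groups the observations by predictor combination, whereas the study used presence and absence at individual plots.

```r
# Synthetic, made-up survey: for each combination of temperature (deg C) and TPI
# there are 20 hypothetical plots; 'present' counts the plots with the species.
d <- expand.grid(temp = c(8, 9, 10, 11), tpi = c(-2, 0, 2))
p_assumed <- plogis(12 - 1.2 * d$temp - 0.5 * d$tpi)   # assumed response, for illustration only
d$present <- round(20 * p_assumed)
d$absent  <- 20 - d$present

# Logistic regression (binomial GLM with the default logit link).
fit <- glm(cbind(present, absent) ~ temp + tpi, family = binomial, data = d)

# Coefficients with standard errors and p-values.
print(summary(fit)$coefficients)

# Predicted probability of occurrence for a cold sinkhole and a warm ridge.
new_sites <- data.frame(temp = c(8.5, 10.5), tpi = c(-2, 2))
p_hat <- predict(fit, newdata = new_sites, type = "response")
print(p_hat)
```

Negative coefficients for both predictors translate into a lower predicted probability at the warmer, more exposed site, which is the pattern described in the text.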

3.5 Visualization of spatial data
As a demonstration of R's potential for producing spatial animations, we modelled the probability of Doronicum austriacum occurrence under a potential temperature increase of 1 °C by the year 2040. For each decade, we produced a map of species distribution and calculated the area in which the species persisted. To define the presence and absence of the species, a probability of p = 0.5 was used as a threshold. From the initial surface area of 2.1 ha in 2010, the area of species distribution will be reduced to 0.02 ha by 2040 (Figure 6).
Finally, we used R to visualize the 3D LiDAR point cloud data with the library 'rgl' (Adler and Murdoch, 2012). The library allows 3D real-time visualization, including a variety of animations (Figure 7). The points are coloured according to their z-value, which represents elevation in this study.
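The thresholding and area calculation used for the warming scenario can be sketched in base R. Everything numeric in this example is hypothetical: the cell values, the cell size, the regression coefficients and the assumption that warming proceeds linearly between 2010 and 2040. It therefore does not reproduce the areas reported above.

```r
# Hypothetical raster cells: current temperature and TPI per cell (made-up values).
cells <- expand.grid(temp = seq(8, 11, by = 0.25), tpi = c(-2, -1, 0, 1, 2))
cell_area_ha <- 0.25                       # hypothetical cell size (50 x 50 m)

# Hypothetical logistic regression coefficients (not the fitted values of the study).
b0 <- 12; b_temp <- -1.2; b_tpi <- -0.5

years   <- c(2010, 2020, 2030, 2040)
warming <- seq(0, 1, length.out = length(years))   # assumed linear rise to +1 deg C

# For each decade: predict the probability, apply the 0.5 threshold, sum the area.
area_ha <- sapply(warming, function(dt) {
  p <- plogis(b0 + b_temp * (cells$temp + dt) + b_tpi * cells$tpi)
  sum(p > 0.5) * cell_area_ha
})
names(area_ha) <- years
print(area_ha)
```

Writing the scenario as a function applied to each decade is what makes it straightforward to produce one map per time step and to combine the maps into an animation.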

CONCLUSIONS
R has become not only a high-quality open-source software environment for statistical computing and graphics but also a high-performance geographic information system tool that can be used for geospatial data production, analysis and mapping. A number of studies (Iranpanah et al., 2009; van Etten and Hijmans, 2010; Bojanowski et al., 2013; Zwertvaegher et al., 2013) have demonstrated that R, in combination with spatial libraries, is a powerful tool for many research fields and scientific tasks within the domain of environmental science.
Using the R software environment for spatiotemporal analysis provides important opportunities for the research community to understand the local, regional and global dynamics of spatiotemporal processes. R allows the implementation of various algorithms, such as those used in this study. Within one programming environment, R provides a very wide range of possibilities for analysing and processing spatial data using advanced quantitative methods. This is particularly significant when attempting to solve complex research questions. R supports control flow statements, loops and user-defined functions, as well as multiple input and output data formats, and it allows existing data and functions to be adapted. The entire process of analysing data within R is driven by a written script, which means that it is simple to rerun the analyses if needed. The fact that R is open-source software is also a significant advantage: apart from the time needed to write and run scripts, the software is free of charge to the user. Further, R may benefit many sectors, not just research.
However, there are some disadvantages to analysing spatial data within the R environment. Unlike desktop GIS tools, R requires scripting for any interaction with the map, which can be complex. This disadvantage can be reduced with the use of libraries such as RSAGA (R + SAGA), spgrass6 (R + GRASS), RgoogleMaps (R + Google Maps) and RPyGeo (R + ArcGIS), which make R more competitive with traditional GIS tools.
Because spatial analysis often involves processing vast amounts of data, it is important that R is capable of implementing highly complex processes. Although such processing on a single computer may overburden the available processor core, this limitation can be overcome by distributing the work across the servers of a computer cluster (Schmidberger et al., 2009).
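As a small-scale illustration of distributing work, the following sketch uses the 'parallel' package that ships with current versions of R to process tiles of data on two local worker processes. The use of this package, the tiling scheme and the simulated elevation values are all assumptions made for the example; the text does not say which parallel tools were considered.

```r
library(parallel)

# Hypothetical workload: simulated elevation values split into four tiles.
set.seed(1)
z <- runif(1e5, min = 820, max = 880)
tiles <- split(z, rep(1:4, length.out = length(z)))

cl <- makeCluster(2)                       # two local worker processes
tile_means <- parLapply(cl, tiles, mean)   # each tile is summarised on a worker
stopCluster(cl)

print(unlist(tile_means))
```

The same pattern extends from local worker processes to the nodes of a cluster, because the code that processes a tile does not need to know where it runs.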

ACKNOWLEDGEMENT
The work was carried out in the context of the EU regional funding initiative INTERREG IV Alpine Space programme, project 'NewFor' (NEW technologies for a better mountain FORest timber mobilization), and project V4-1141. The data were collected in the previous research project V4-0541, the young researcher programme (MK) and research programme P4-0107.