How to Get Started With Data SGP

Data SGP is a free-to-use, publicly available, and community-supported platform for the visualization and analysis of multi-proxy sedimentary geochemical data (iron, carbon, sulfur, major and trace metals) from the Neoproterozoic through the Paleozoic. The project aims to assemble existing geochemical data, or generate new data, from ‘background’ intervals in order to address specific research questions and to give researchers worldwide an opportunity to use the data.

Data services make sense of massive collections of structured and unstructured data from a variety of sources: customer records in online transaction processing databases, property damage data from insurance claims, and images or videos stored in big data lakes. These services apply governance principles, organization, and maintenance to create data that is useful to applications and easily accessible to users.

A successful data steward has both a big-picture view of how an organization uses its data and deep knowledge of the down-to-earth details of how that data is created, managed, manipulated, stored, and used. Data stewards serve as a bridge between technical staff and the business professionals who use data in their daily work.

They are also responsible for creating, promoting, and supporting the organizational data governance policies and processes that make sure that the organization gets the most value out of its data. This includes providing training and guidance to business units so they can use the data to help improve their own businesses. Strong data stewards inspire employees throughout the organization to be smarter, faster, and safer when working with data.

To get started with Data SGP, you’ll need to download the sgpdata R package and familiarize yourself with its features. The package is an add-on to the open-source programming language R. It provides advanced functions for performing SGP analyses and requires some familiarity with R syntax. The SGP Data Analysis Vignette is a good resource for learning how to use the sgpdata package.

When using sgpdata, you’ll need to format your data in either WIDE or LONG format. The lower-level functions, studentGrowthPercentiles and studentGrowthProjections, use the WIDE data format, while the higher-level wrapper functions, prepareSGP and updateSGP, use the LONG format. If you plan on running SGP analyses operationally, it is likely best to use the LONG data format for all of your analyses.
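The difference between the two layouts can be sketched outside of R. The example below is a minimal Python illustration, not part of the package: the records, the column suffix scheme (GRADE_2022, SCALE_SCORE_2022), and the helper name long_to_wide are all assumptions made for this sketch. LONG data has one row per student per year; WIDE data has one row per student with a set of columns for each year.

```python
from collections import defaultdict

# Hypothetical LONG-format records: one row per student per year.
long_rows = [
    {"ID": "1001", "YEAR": "2022", "GRADE": "3", "SCALE_SCORE": 410},
    {"ID": "1001", "YEAR": "2023", "GRADE": "4", "SCALE_SCORE": 432},
    {"ID": "1002", "YEAR": "2022", "GRADE": "3", "SCALE_SCORE": 388},
    {"ID": "1002", "YEAR": "2023", "GRADE": "4", "SCALE_SCORE": 401},
]


def long_to_wide(rows):
    """Pivot LONG rows into one WIDE row per student ID.

    The year-suffixed column names are an assumption of this sketch,
    not a naming rule taken from the package.
    """
    wide = defaultdict(dict)
    for row in rows:
        student = wide[row["ID"]]
        student["ID"] = row["ID"]
        student[f"GRADE_{row['YEAR']}"] = row["GRADE"]
        student[f"SCALE_SCORE_{row['YEAR']}"] = row["SCALE_SCORE"]
    # Return rows in a stable order, sorted by student ID.
    return [wide[student_id] for student_id in sorted(wide)]


wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Adding another year to the LONG layout only appends rows, whereas the WIDE layout needs new columns for every year, which is one reason the LONG format tends to be easier to maintain for repeated analyses.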

The sgpdata package requires the following seven variables in your sgpdata file: VALID_CASE, CONTENT_AREA, YEAR, ID, SCALE_SCORE, GRADE, and ACHIEVEMENT_LEVEL. These variables are required for all SGP analyses. The sgptData_LONG data set contains the same seven variables, plus demographic/student categorization variables, for all five years of assessment data. You need this data set if you plan on running student growth projections and achievement plots. The sgpdata_WIDE data set is identical to sgpdata_LONG except that it omits the demographic/student categorization variables, which is useful if you want to run just a single student growth percentile or achievement plot. Both data sets are available for download from our GitHub repository.
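Before handing a file to any analysis, it can help to confirm that the seven required variables are present. The following is a minimal Python sketch of such a check using only the standard library. The sample CSV text and the function name missing_columns are hypothetical; only the seven variable names come from the list above.

```python
import csv
import io

# The seven required variables named in the text above.
REQUIRED_COLUMNS = (
    "VALID_CASE",
    "CONTENT_AREA",
    "YEAR",
    "ID",
    "SCALE_SCORE",
    "GRADE",
    "ACHIEVEMENT_LEVEL",
)


def missing_columns(header):
    """Return the required column names that are absent from the header."""
    present = set(header)
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical file contents; ACHIEVEMENT_LEVEL is deliberately left out.
sample_csv = (
    "VALID_CASE,CONTENT_AREA,YEAR,ID,SCALE_SCORE,GRADE\n"
    "VALID_CASE,MATHEMATICS,2023,1001,432,4\n"
)

reader = csv.DictReader(io.StringIO(sample_csv))
missing = missing_columns(reader.fieldnames)
if missing:
    print("Missing required columns:", ", ".join(missing))
else:
    print("All required columns are present.")
```

The check looks only at column names. It does not validate the values in each column, so a file that passes can still contain data the analysis functions reject.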