Welcome to the DBCA Data Catalogue

How can I get write access?

First, register. Your username should be firstname_lastname. You can choose any password; it is not linked to your DBCA account. It is safe to let your browser remember the login credentials.

Secondly, become an "editor" or "admin" member of any "organisation" you wish to contribute to. Ask an existing admin of that organisation to add you as admin or editor via Organisation page > Manage > Members:

  • Existing user: find the user by typing in their name
  • Role: Admin or Editor
  • Add Member

If you wish, you can set a profile picture here using your DBCA email address.

If you use R, you can add the following lines to your .Renviron, replacing "xxx" with your own API key from your data catalogue profile:

CKAN_URL="https://data.dbca.wa.gov.au"
CKAN_API_KEY="xxx"

This allows you to configure ckanr with:

ckanr::ckanr_setup(url = Sys.getenv("CKAN_URL"), key = Sys.getenv("CKAN_API_KEY"))

What is the purpose of this Data Catalogue?

This new facility stores information about the Division's digital knowledge assets, including datasets, images, documents and other resources. This facility is in its infancy, but we are working towards fully documenting our digital data assets. Divisional staff are strongly encouraged to get in touch and document their valuable datasets.

Glossary

  • CKAN - the name of the software behind this catalogue
  • A Dataset is really a metadata entry - it defines a title, a description and a set of digital resources. This reflects the idea that a dataset is more than just data (read: numbers in a spreadsheet) - there are also facets of attribution, ownership, QA, geo-reference, update and release life cycle, and licensing, which provide context and preserve the meaning of the dataset.
  • Resources are any digital resources attached to a dataset. Resources can be files or URLs. The main resource of a dataset is of course the actual data, which may reside in a data warehouse such as NatureMap (to be linked as URL resource, see example) or a spreadsheet (to be attached as file resource, see example). Ancillary resources may include a spatial file of surveyed transects (example), a plain text file with an R script (example), an SPSS save file, a SigmaPlot workbook (example), a PDF report (example) or figure (example) or any other derived product or analysis.
  • Organisations own datasets and permissions to edit them. Only members of organisations (added by catalogue admins or organisation admins) can edit datasets. Each Departmental program has been registered as an organisation, so nominated program members can add datasets for their program.
  • Datasets can be allocated to any number of thematic (i.e. cross-cutting) Groups. This enables datasets from different organisations (DPaW programs) to be grouped by any theme like "datasets relevant to oil spill response".

Using the Data Catalogue

Who can access this catalogue?

Everyone in the Department can view this catalogue (it's intranet only); only specifically appointed registered users can edit datasets of their own program.

Can I upload sensitive datasets?

This catalogue is safe to hold sensitive, unreleased data. In due course an external data catalogue will be officially announced that will contain a subset of information suitable for public access.

Is my data secure from unauthorised access?

Data in this catalogue is as secure from unauthorised access as our "T-drive", as it can only be accessed from the Departmental intranet.

Is my data safe from accidental loss?

While you are required by departmental policy to keep your own backups, data on the catalogue is safe from accidental loss (we keep rolling backups). Additionally, thorough documentation of your datasets will protect them from loss of context by preserving the meaning and value of your data for the Department beyond the lifetime of individual projects, and will promote correct attribution of your work.

Are there guidelines and rules?

  • Data should be warehoused in the most appropriate system, e.g. species data should be stored in BioSys and linked from this catalogue. If there is no more appropriate place, the catalogue itself can store data (preferably as CSV spreadsheets).
  • By using this catalogue, you also agree to adhere to relevant Departmental data policies and to not abuse this catalogue or the data contained within.

Contributing datasets

How can I contribute my data?

While any departmental staff member can discover and access data, write access is restricted to authorised and trained contributors. Get in touch to receive training and write access to the catalogue. Contributors are also encouraged to:

Where should data be stored?

Unless a designated place for your data exists (e.g. NatureMap), this catalogue can hold spreadsheets (e.g. CSV) and provide neat previews and API access to them. If your data is stored in a designated data warehouse, such as NatureMap, you can create a "dataset" here which describes and links to your data on e.g. a NatureMap theme page.

Advanced users

Can I access the data from my favourite analytical environment?

Yes you CKAN! All metadata, datasets, resources, and functionality of this catalogue are accessible not only through the GUI with previews and metadata (for humans), but also through an API (for software and programmers). Compatible resources show a green "Data API" button, which displays examples to access and subset the respective resource using HTTP requests, Python, JavaScript and SQL. Any program capable of sending HTTP requests can directly interact with CSV data in this catalogue. A working example in R:

# Read a CSV resource directly from its download URL
data <- read.table("http://internal-data.dpaw.wa.gov.au/dataset/1486857c-ecfa-4501-b4d5-f83fa84a57aa/resource/580a15c6-89ad-4bc9-be78-edf7ba980853/download/simulateddata1.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
# Show the first few rows
head(data)
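For programs other than R, the same kind of access can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the catalogue's official client: it assumes the standard CKAN action API endpoint datastore_search is enabled on this instance, that the resource in question is one of the compatible resources showing the "Data API" button, and that CKAN_URL and CKAN_API_KEY are set as environment variables as described for R users. The function names datastore_search_url and fetch_records are hypothetical helpers invented for this illustration.

```python
import json
import os
import urllib.parse
import urllib.request


def datastore_search_url(base_url, resource_id, limit=5):
    """Build a CKAN datastore_search URL for one resource (no network needed)."""
    query = urllib.parse.urlencode({"resource_id": resource_id, "limit": limit})
    return f"{base_url.rstrip('/')}/api/3/action/datastore_search?{query}"


def fetch_records(base_url, resource_id, api_key=None, limit=5):
    """Fetch up to `limit` rows of a resource as a list of dicts (needs network)."""
    request = urllib.request.Request(
        datastore_search_url(base_url, resource_id, limit)
    )
    if api_key:
        # CKAN reads the API key from the Authorization header.
        request.add_header("Authorization", api_key)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["result"]["records"]


if __name__ == "__main__":
    # Assumption: the same variable names as in the .Renviron example.
    base = os.environ.get("CKAN_URL", "https://data.dbca.wa.gov.au")
    key = os.environ.get("CKAN_API_KEY")
    # Resource ID taken from the R example; it may not exist on every instance.
    rid = "580a15c6-89ad-4bc9-be78-edf7ba980853"
    print(datastore_search_url(base, rid))
    # Uncomment on a machine that can reach the catalogue:
    # for row in fetch_records(base, rid, api_key=key):
    #     print(row)
```

Building the URL is kept separate from sending the request so the query can be inspected or pasted into a browser before any data is fetched.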

Scientific programmers are encouraged to:

How can I benefit from this?

Example: Marine Science have lots of relatively simple, univariate observations of a metric (temperature, number of something, etc.) and often need just a simple (standardised) visualisation. Compatible datasets have been brought into a certain standard (here's a list of them) and labelled with a special keyword. The timeseries explorer lets you visualise those compatible datasets in a standard way and gives you the resulting figure as PDF, and the underlying R code (so you can reproduce the figure at any time) as text files.

Who made this?

The software behind this catalogue is based on the Open Knowledge Foundation's software CKAN. Many governments and NGOs use CKAN, including data.gov.au and data.gov.uk.

Our instance was implemented by the [Marine Science Information Management](https://confluence.dec.wa.gov.au/display/MSIM/Home) Team in collaboration with the Office for Information Management, with particular thanks to Adon Metcalfe and Scott Percival. Preliminary work was funded by Chevron Gorgon Offsets in September 2013.

Our instance is deployed in compliance with the Office of the Auditor General's audit recommendations.

This catalogue has been used successfully since January 2014 by Marine Science, who use it to maintain the datasets, workflows, reports and summaries behind their heavily data-driven annual reports.