-
Notifications
You must be signed in to change notification settings - Fork 22
/
tools.qmd
44 lines (40 loc) · 6.86 KB
/
tools.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Tools
Below are some of the tools and packages used in workflows. R and Python package "Type" is BIO for packages specifically for biological applications, and GEN for generic packages.
## R
Package | Type | Description
-----|-------|------
[bdveRse](https://bdverse.org/) | BIO | A family of R packages for biodiversity data.
[ecocomDP](https://cran.r-project.org/web/packages/ecocomDP/index.html) | BIO | Work with the Ecological Community Data Design Pattern. 'ecocomDP' is a flexible data model for harmonizing ecological community surveys, in a research question agnostic format, from source data published across repositories, and with methods that keep the derived data up-to-date as the underlying sources change.
[EDIorg/EMLasseblyline](https://ediorg.github.io/EMLassemblyline/) | BIO | For scientists and data managers to create high quality EML metadata for dataset publication.
[finch](https://cran.r-project.org/web/packages/finch/index.html) | BIO | Parse Darwin Core Files
[iobis/obistools](https://iobis.github.io/obistools/) | BIO | Tools for data enhancement and quality control.
[robis](https://cran.r-project.org/web/packages/robis/index.html) | BIO | R client for the OBIS API
[ropensci/EML](https://docs.ropensci.org/EML/) | BIO | Provides support for the serializing and parsing of all low-level EML concepts
[taxize](https://cran.r-project.org/web/packages/taxize/index.html) | BIO | Interacts with a suite of web 'APIs' for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more.
[worrms](https://cran.r-project.org/web/packages/worrms/index.html) | BIO | Client for [World Register of Marine Species](http://www.marinespecies.org/). Includes functions for each of the API methods, including searching for names by name, date and common names, searching using external identifiers, fetching synonyms, as well as fetching taxonomic children and taxonomic classification.
[Hmisc](https://www.rdocumentation.org/packages/Hmisc/versions/4.6-0) | GEN | Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, and recoding variables. Particularly check out the [describe()](https://www.rdocumentation.org/packages/Hmisc/versions/4.6-0/topics/describe) function.
[lubridate](https://cran.r-project.org/web/packages/lubridate/index.html) | GEN | Functions to work with date-times and time-spans: fast and user friendly parsing of date-time data, extraction and updating of components of a date-time (years, months, days, hours, minutes, and seconds), algebraic manipulation on date-time and time-span objects.
[stringr](https://cran.r-project.org/web/packages/stringr/index.html) | GEN | Simple, Consistent Wrappers for Common String Operations
[tidyverse](https://cran.r-project.org/web/packages/tidyverse/index.html) | GEN | The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.
[uuid](https://cran.r-project.org/web/packages/uuid/index.html) | GEN | Tools for generating and handling of UUIDs (Universally Unique Identifiers).
## Python
Package | Type | Description
-----|-------|------
[metapype](https://pypi.org/project/metapype/) | BIO | A lightweight Python 3 library for generating EML metadata
[python-dwca-reader](https://python-dwca-reader.readthedocs.io/en/latest/index.html) | BIO | A simple Python package to read and parse Darwin Core Archive (DwC-A) files, as produced by the GBIF website, the IPT and many other biodiversity informatics tools.
[pyworms](https://github.com/iobis/pyworms) | BIO | Python client for the World Register of Marine Species (WoRMS) REST service.
[numpy](https://numpy.org/) | GEN | NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems.
[pandas](https://pandas.pydata.org/) | GEN | pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. Super helpful when manipulating tabular data!
[uuid](https://docs.python.org/3/library/uuid.html) | GEN | This module provides immutable UUID objects (class UUID) and the functions uuid1(), uuid3(), uuid4(), uuid5() for generating version 1, 3, 4, and 5 UUIDs as specified in RFC 4122. Built in -- part of the Python standard library.
[obis-qc](https://github.com/iobis/obis-qc) | BIO | Quality checks on occurrence records. Checks `occurrenceStatus`, `individualCount`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `coordinateUncertaintyInMeters`, `minimumDepthInMeters`, `maximumDepthInMeters`, `scientificName`, `scientificNameID`. Checks from [Vandepitte et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4309024/pdf/bau125.pdf) flags not implemented: 3, 9, 14, 15, 16, 10, 17, 21-30.
[biopython](https://biopython.org/) | BIO | Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
## Google Sheets
Package | Description
--------|------------
[Google Sheet DarwinCore Archive Assistant add-on](https://dwcaassistant.com/) | Google Sheet add-on which assists the creation of Darwin Core Archives (DwCA) and publising to Zenodo. DwCA's are stored into user's Google Drive and can be downloaded for upload into IPT installations or other software which is able to read DwC-archives.
## Validators
Name | Description
-----|------------
[Darwin Core Archive Validator](https://tools.gbif.org/dwca-validator/) | This validator verifies the structural integrity of a Darwin Core Archive. It does not check the data values, such as coordinates, dates or scientific names.
[GBIF DATA VALIDATOR](https://www.gbif.org/tools/data-validator) | The GBIF data validator is a service that allows anyone with a GBIF-relevant dataset to receive a report on the syntactical correctness and the validity of the content contained within the dataset.
[LifeWatch Belgium](https://www.lifewatch.be/data-services/) | Through this interactive section of the LifeWatch.be portal users can upload their own data using a standard data format, and choose from several web services, models and applications to process the data.