For the analysis of new problems, we use our in-house developed library. Following modular design principles, a data pipeline suited to the problem at hand can be built and evaluated. A data pipeline consists of readers, transformers, classifiers, regressors, normalizers, parameter optimizers, output generators and other modules.
For each problem, the best solution is chosen from a set of candidate solutions; the optimized solution is then adapted and tested for production. All components run on the Java Virtual Machine and are therefore platform independent. The software can process plain feature vectors as well as sequences / time series data. Both structured and unstructured data can be processed with a corresponding reader.
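The following Java sketch illustrates the modular pipeline idea: a chain of transformers followed by a final model, where every stage can be swapped out independently. The interfaces and values below are illustrative assumptions for this sketch and do not reproduce the library's actual API.

    import java.util.Arrays;
    import java.util.List;

    // Minimal, self-contained illustration of a modular data pipeline:
    // a chain of transformers followed by a final model, each stage replaceable.
    // The interfaces are hypothetical and do not mirror the library's API.
    public class PipelineSketch {

        interface Transformer { double[] transform(double[] input); }
        interface Model { double predict(double[] features); }

        static class Pipeline implements Model {
            private final List<Transformer> stages;
            private final Model model;

            Pipeline(List<Transformer> stages, Model model) {
                this.stages = stages;
                this.model = model;
            }

            public double predict(double[] features) {
                double[] x = features;
                for (Transformer t : stages) x = t.transform(x); // run all transformers in order
                return model.predict(x);                         // final prediction stage
            }
        }

        public static void main(String[] args) {
            // Stage 1: a simple rescaling transformer (divide every feature by 10).
            Transformer rescale = in -> Arrays.stream(in).map(v -> v / 10.0).toArray();
            // Final stage: a trivial linear model with fixed weights.
            Model linear = x -> 0.5 * x[0] + 2.0 * x[1];

            Pipeline pipeline = new Pipeline(List.of(rescale), linear);
            System.out.println(pipeline.predict(new double[]{4.0, 8.0})); // prints 1.8
        }
    }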
The library is released under an open source license and can be downloaded here: https://github.com/eonum/pipeline
Below you can find a selection of pipeline modules.
Classifiers / Regressors
- Random Forests / Decision Trees / CART
- Neural Nets
- Recurrent Neural Nets (Long Short-Term Memory)
- Support Vector Machines
- Linear Regression
- Logistic Regression
- Gradient Boosting
- Nearest Neighbor (see the sketch after this list)
- Ensemble Methods (Bagging / Boosting)
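As a concrete illustration of the simplest entry in this list, here is a minimal, self-contained 1-nearest-neighbor classifier in plain Java; it is a sketch of the technique with made-up data, not the library's implementation.

    public class NearestNeighborSketch {

        // Squared Euclidean distance between two feature vectors.
        static double distance(double[] a, double[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return sum;
        }

        // Predict the label of the closest training sample (1-NN).
        static int predict(double[][] trainX, int[] trainY, double[] query) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < trainX.length; i++) {
                double d = distance(trainX[i], query);
                if (d < bestDist) {
                    bestDist = d;
                    best = i;
                }
            }
            return trainY[best];
        }

        public static void main(String[] args) {
            double[][] trainX = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
            int[] trainY = {0, 0, 1, 1};
            System.out.println(predict(trainX, trainY, new double[]{5.5, 4.8})); // prints 1
        }
    }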
Optimization
- Genetic Algorithms
- Gradient Descent
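A minimal gradient-descent sketch: it minimizes the one-dimensional function f(x) = (x - 3)^2, whose gradient is 2(x - 3); the learning rate and iteration count are arbitrary illustrative choices, not values used by the library.

    public class GradientDescentSketch {

        public static void main(String[] args) {
            // Minimize f(x) = (x - 3)^2 with gradient f'(x) = 2 * (x - 3).
            double x = 0.0;              // starting point
            double learningRate = 0.1;   // step size (illustrative choice)
            for (int i = 0; i < 100; i++) {
                double gradient = 2.0 * (x - 3.0);
                x -= learningRate * gradient;
            }
            System.out.println("x = " + x); // converges towards 3.0
        }
    }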
Clustering / Data Mining
- Self-Organizing Maps / Kohonen Maps
- Gaussian Mixture Models
- K-Means Clustering (see the sketch after this list)
- EM Fuzzy Clustering
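To illustrate the clustering modules, the following is a compact, self-contained k-means (Lloyd's algorithm) sketch with a fixed initialization and invented sample data; it is not the library's implementation.

    import java.util.Arrays;

    public class KMeansSketch {

        // One Lloyd iteration: assign points to the nearest centroid, then recompute centroids.
        static void step(double[][] points, double[][] centroids, int[] assignment) {
            // Assignment step.
            for (int p = 0; p < points.length; p++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < centroids.length; c++) {
                    double d = 0.0;
                    for (int j = 0; j < points[p].length; j++) {
                        double diff = points[p][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < best) { best = d; assignment[p] = c; }
                }
            }
            // Update step: each centroid becomes the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] == c) {
                        for (int j = 0; j < sum.length; j++) sum[j] += points[p][j];
                        count++;
                    }
                }
                if (count > 0) {
                    for (int j = 0; j < sum.length; j++) centroids[c][j] = sum[j] / count;
                }
            }
        }

        public static void main(String[] args) {
            double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9}};
            double[][] centroids = {{0, 0}, {10, 10}};   // simple fixed initialization
            int[] assignment = new int[points.length];
            for (int i = 0; i < 10; i++) step(points, centroids, assignment);
            System.out.println(Arrays.toString(assignment));    // [0, 0, 1, 1]
            System.out.println(Arrays.deepToString(centroids)); // the two cluster means
        }
    }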
Transformers
- Principal Component Analysis
- Feature extraction and selection
- Dynamic Time Warping
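Dynamic time warping aligns two sequences of possibly different length. The sketch below computes the standard DTW distance with a full cost matrix and no warping window; the example sequences are invented.

    public class DynamicTimeWarpingSketch {

        // DTW distance between two 1-D sequences using the standard O(n*m) recurrence.
        static double dtw(double[] a, double[] b) {
            int n = a.length, m = b.length;
            double[][] cost = new double[n + 1][m + 1];
            for (double[] row : cost) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
            cost[0][0] = 0.0;
            for (int i = 1; i <= n; i++) {
                for (int j = 1; j <= m; j++) {
                    double d = Math.abs(a[i - 1] - b[j - 1]);
                    cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                     Math.min(cost[i - 1][j], cost[i][j - 1]));
                }
            }
            return cost[n][m];
        }

        public static void main(String[] args) {
            double[] a = {0, 1, 2, 3, 2, 1, 0};
            double[] b = {0, 1, 1, 2, 3, 2, 1, 0};
            System.out.println(dtw(a, b)); // 0.0: the sequences align perfectly
        }
    }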
Validation
- k-fold cross-validation (see the sketch after this list)
- Evaluation metrics (RMSE, AUC, recognition rate, log loss)
- Validation of meta-parameters of entire data pipelines
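As a sketch of how validation could look, the following combines a k-fold split with the RMSE metric for a trivial model that always predicts the training mean; the fold scheme and metric are standard, while the model and data are purely illustrative.

    public class CrossValidationSketch {

        // Root mean squared error between predictions and targets.
        static double rmse(double[] predicted, double[] actual) {
            double sum = 0.0;
            for (int i = 0; i < predicted.length; i++) {
                double d = predicted[i] - actual[i];
                sum += d * d;
            }
            return Math.sqrt(sum / predicted.length);
        }

        public static void main(String[] args) {
            double[] targets = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
            int k = 3;
            double totalRmse = 0.0;

            for (int fold = 0; fold < k; fold++) {
                // Indices i with i % k == fold form the test fold, the rest is training data.
                double trainSum = 0.0;
                int trainCount = 0;
                for (int i = 0; i < targets.length; i++) {
                    if (i % k != fold) { trainSum += targets[i]; trainCount++; }
                }
                double mean = trainSum / trainCount; // "model": predict the training mean

                // Evaluate on the held-out fold.
                int testSize = 0;
                for (int i = 0; i < targets.length; i++) if (i % k == fold) testSize++;
                double[] pred = new double[testSize];
                double[] truth = new double[testSize];
                int t = 0;
                for (int i = 0; i < targets.length; i++) {
                    if (i % k == fold) { pred[t] = mean; truth[t] = targets[i]; t++; }
                }
                totalRmse += rmse(pred, truth);
            }
            System.out.println("Average RMSE over " + k + " folds: " + totalRmse / k);
        }
    }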