Questions tagged [unsupervised-learning]

Unsupervised learning refers to machine learning contexts in which there is no prior 'training' period in which the learning agent is trained on objects of known type. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimisation or maximisation of mathematical properties and not on an attempt to classify by understanding the right context.

Unsupervised learning (of which clustering is a common example) refers to machine learning algorithms in which no labels are available for the training data and the model tries to learn the underlying manifold. As such, unsupervised learning includes such disciplines as mathematical clustering, whereby data is segmented into clusters based on the minimization or maximization of mathematical properties and not on an attempt to classify by understanding the right context.
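As a rough illustration of clustering by minimising a mathematical property, here is a minimal sketch of k-means (Lloyd's algorithm), which tries to reduce the within-cluster sum of squared distances. The function names `sq_dist` and `kmeans`, the parameters, and the toy data are all made up for this example; it is not taken from any of the questions listed below and is not meant as a production implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Cluster `points` (a list of tuples) into k groups without any labels.

    Hypothetical helper: alternates an assignment step and a centroid
    update step for a fixed number of iterations.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups, no labels supplied.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels)
```

The cluster numbering (which group ends up as 0 and which as 1) depends on the random initialisation, so only the grouping itself is meaningful.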

547 questions
6
votes
1 answer

principal component analysis (PCA) in R: which function to use?

Can anyone explain what the major differences between the prcomp and princomp functions are? Is there any particular reason why I should choose one over the other? In case this is relevant, the type of application I am looking at is a quality…
AndraD
6
votes
1 answer

Semi-supervised Naive Bayes with NLTK

I have built a semi-supervised version of NLTK's Naive Bayes in Python based on the EM (expectation-maximization) algorithm. However, in some iterations of EM I am getting negative log-likelihoods (the log-likelihoods of EM must be positive in every…
5
votes
1 answer

Passing Target/Label data to Scikit-learn GridSearchCV's fit method for OneClassSVM

From my understanding, One-Class SVMs are trained without target/label data. One answer at Use of OneClassSVM with GridSearchCV suggests passing Target/Label data to GridSearchCV's fit method when the classifier is the OneClassSVM. How does the…
5
votes
1 answer

BERT performing worse than word2vec

I am trying to use BERT for a document ranking problem. My task is pretty straightforward. I have to do a similarity ranking for an input document. The only issue here is that I don’t have labels - so it’s more of a qualitative analysis. I am on my…
5
votes
2 answers

Custom Hebbian Layer Implementation in Keras - input/output dims and lateral node connections

I'm trying to implement an unsupervised ANN using Hebbian updating in Keras. I found a custom Hebbian layer made by Dan Saunders here - https://github.com/djsaunde/rinns_python/blob/master/hebbian/hebbian.py (I hope it is not poor form to ask…
5
votes
2 answers

How to programmatically determine the column indices of principal components using FactoMineR package?

Given a data frame containing mixed variables (i.e. both categorical and continuous) like, digits = 0:9 # set seed for reproducibility set.seed(17) # function to create random string createRandString <- function(n = 5000) { a <- do.call(paste0,…
mnm
5
votes
1 answer

How to prepare a dataset for speech recognition

I need to train a Bidirectional LSTM model to recognize discrete speech (individual numbers from 0 to 9). I have recorded speech from 100 speakers. What should I do next? (Suppose I am splitting them into individual .wav files containing one number…
5
votes
1 answer

scipy.optimize + kmeans clustering

I have the following setup for kmeans clustering algorithm that I am implementing for a project: import numpy as np import scipy import sys import random import matplotlib.pyplot as plt import operator class KMeansClass: #takes in an npArray…
anonuser0428
5
votes
8 answers

K-means algorithm

I'm trying to program a k-means algorithm in Java. I have calculated a number of arrays, each of them containing a number of coefficients. I need to use a k-means algorithm in order to group all this data. Do you know any implementation of this…
dedalo
4
votes
4 answers

Selecting an appropriate similarity metric & assessing the validity of a k-means clustering model

I have implemented k-means clustering for determining the clusters in 300 objects. Each of my objects has about 30 dimensions. The distance is calculated using the Euclidean metric. I need to know how I would determine if my algorithm works…
4
votes
0 answers

Unsupervised clustering of words in R without knowing k

As a beginner in NLP, I am trying to find the best way to cluster single words with unsupervised clustering, specifically where the number of clusters k is not known in advance. I have a group of words that contains clusters of words that are very…
the_darkside
4
votes
1 answer

Implement CVAE for a single image

I have a multi-dimensional, hyper-spectral image (channels, width, height = 15, 2500, 2500). I want to compress its 15 channel dimensions into 5 channels. So, the output would be (channels, width, height = 5, 2500, 2500). One simple way to do this is to…
4
votes
0 answers

Why are the grpreg and gglasso libraries in R giving different results for group LASSO?

I have been trying to do unsupervised feature selection using LASSO (by removing class column). The dataset includes categorical (factor) and continuous (numeric) variables. Here is the link. I built a design matrix using model.matrix() which…
4
votes
2 answers

Clustering images based on their similarity

I am facing a problem of image clustering based on their similarity, without knowing the number of clusters. Ideally I would like to achieve something that resembles this http://cs231n.github.io/assets/cnnvis/tsne.jpeg…
4
votes
1 answer

How to build an unsupervised CNN model with keras/tensorflow?

I'm trying to build a CNN for an image-to-image translation application. The input of the model is an image, and the output is a confidence map. There is no labeled confidence map to serve as ground truth during training, but a loss function is designed to…
Jemma