Questions tagged [weka]

Weka (Waikato Environment for Knowledge Analysis) is an open source machine learning library written in Java.

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

Weka is open source software issued under the GNU General Public License.

Weka's main user interface is the Explorer, but essentially the same functionality can be accessed through the component-based Knowledge Flow interface and from the command line. There is also the Experimenter, which allows the systematic comparison of the predictive performance of Weka's machine learning algorithms on a collection of datasets.

The Explorer interface features several panels providing access to the main components of the workbench:

  • The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and to delete instances and attributes according to specific criteria.
  • The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization, e.g., a decision tree).
  • The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
  • The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.
  • The Select attributes panel provides algorithms for identifying the most predictive attributes in a dataset.
  • The Visualize panel shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.
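To make the Cluster panel's simple k-means algorithm concrete, here is a minimal, dependency-free Java sketch of Lloyd's k-means iteration: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until no assignment changes. It assumes purely numeric points, squared Euclidean distance, and caller-supplied starting centroids. The class and method names are hypothetical; this is not Weka's SimpleKMeans code, which differs in details such as how the initial centroids are chosen.

```java
import java.util.Arrays;

// Hypothetical illustration of Lloyd's k-means; not Weka's SimpleKMeans.
public class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /**
     * Runs Lloyd's iteration until no assignment changes or maxIter rounds pass.
     * Returns the cluster index of every point; centroids are updated in place.
     */
    static int[] kMeans(double[][] points, double[][] centroids, int maxIter) {
        int k = centroids.length;
        int dim = points[0].length;
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = sqDist(points[p], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = sqDist(points[p], centroids[c]);
                    if (d < bestDist) {
                        best = c;
                        bestDist = d;
                    }
                }
                if (assign[p] != best) {
                    assign[p] = best;
                    changed = true;
                }
            }
            if (!changed) break;
            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assign[p]]++;
                for (int d = 0; d < dim; d++) sums[assign[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid where it is
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1.5, 2}, {1, 0.5}, {8, 8}, {9, 8.5}, {8.5, 9} };
        double[][] centroids = { {0, 0}, {10, 10} };
        int[] assign = kMeans(points, centroids, 100);
        System.out.println(Arrays.toString(assign)); // prints [0, 0, 0, 1, 1, 1]
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Running main assigns the three points near (1, 1) to cluster 0 and the three points near (8.5, 8.5) to cluster 1; the second centroid ends up at exactly (8.5, 8.5).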

Online Resources:

Use Weka in your Java Code

Weka on Sourceforge

Weka on GitHub

2896 questions
59 votes, 4 answers

How to interpret weka classification?

How can we interpret the classification result in Weka using Naive Bayes? How are the mean, std deviation, weight sum and precision calculated? How are the kappa statistic, mean absolute error, root mean squared error, etc. calculated? What is the…
user349821
55 votes, 7 answers

How to read a text file with mixed encodings in Scala or Java?

I am trying to parse a CSV file, ideally using weka.core.converters.CSVLoader. However the file I have is not a valid UTF-8 file. It is mostly a UTF-8 file but some of the field values are in different encodings, so there is no encoding in which the…
Daniel Mahler
29 votes, 5 answers

Cross Validation in Weka

I've always thought from what I read that cross validation is performed like this: In k-fold cross-validation, the original sample is randomly partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation…
Titus Pullo
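The k-fold scheme described in this question can be sketched in plain Java: shuffle the instance indices with a fixed seed, deal them into k folds, and let each fold serve once as the validation set while the other k-1 folds form the training set. This is a hypothetical, non-stratified sketch with made-up names; Weka's own cross-validation additionally stratifies the folds when the class attribute is nominal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of plain (non-stratified) k-fold partitioning.
public class KFoldSketch {

    /**
     * Shuffles the indices 0..n-1 with the given seed and deals them round-robin
     * into k folds, so fold sizes differ by at most one.
     */
    static List<List<Integer>> kFolds(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // same seed -> same folds
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(idx.get(i));
        return folds;
    }

    public static void main(String[] args) {
        int n = 10, k = 3;
        List<List<Integer>> folds = kFolds(n, k, 1L);
        for (int f = 0; f < k; f++) {
            // Fold f is the validation set; all other folds together form the training set.
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < k; g++) {
                if (g != f) train.addAll(folds.get(g));
            }
            System.out.println("fold " + f + ": test=" + folds.get(f) + ", train size=" + train.size());
        }
    }
}
```

With n = 10 and k = 3 the folds hold 4, 3 and 3 indices, so the three training sets contain 6, 7 and 7 instances; every index is used for validation exactly once.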
28 votes, 4 answers

How to perform one operation on each executor once in spark

I have a Weka model stored in S3 which is around 400MB in size. Now, I have a set of records on which I want to run the model and perform prediction. For performing prediction, what I have tried is: download and load the model on the driver as a…
Neha
23 votes, 6 answers

Text mining with PHP

I'm doing a project for a college class I'm taking. I'm using PHP to build a simple web app that classifies tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is Naive Bayes…
garyc40
20 votes, 17 answers

Increase heap size in java for weka

I'm trying to increase the heap size in java for weka which keeps crashing. I used the suggested line: > java -Xmx500m -classpath but I get the following error: -classpath requires class path specification I'm not sure what this means. Any…
screechOwl
20 votes, 8 answers

Convert CSV to ARFF using weka

I've been trying to get this dataset http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime+Unnormalized into Weka and no luck at all. I converted it to CSV and then loaded it into Weka and then tried to convert it to ARFF but still giving me…
LumberJack
18 votes, 2 answers

Skip feature when classifying, but show feature in output

I've created a dataset which contains roughly 13000 rows with roughly 50 features. I know how to output every classification result (prediction and actual), but I would like to be able to output some sort of ID with those results. So I've added an ID column…
zeebonk
17 votes, 2 answers

Sentiment analysis with NLTK python for sentences using sample data or webservice?

I am embarking upon an NLP project for sentiment analysis. I have successfully installed NLTK for Python (seems like a great piece of software for this). However, I am having trouble understanding how it can be used to accomplish my task. Here is my…
Ke.
17 votes, 2 answers

How to read the classifier confusion matrix in WEKA

Sorry, I am new to WEKA and just learning. In my decision tree (J48) classifier output, there is a confusion matrix:

     a   b   <----- classified as
   130   8   a = functional
    15 150   b = non-functional

How do I read this matrix? What's the…
JakeSays
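Weka prints the confusion matrix with one row per actual class and one column per predicted class ("classified as"), so in the J48 matrix above 130 functional instances were classified correctly and 8 were misclassified as non-functional, while 15 non-functional instances were misclassified as functional and 150 were classified correctly. The plain-Java sketch below (hypothetical class name, no Weka dependency) derives accuracy, per-class precision and recall, and Cohen's kappa from such a matrix, using the standard definitions rather than Weka's own code.

```java
// Hypothetical helper: metrics from a confusion matrix where m[i][j] counts
// instances of actual class i predicted as class j (rows = actual, columns = "classified as").
public class ConfusionMatrixSketch {

    static double total(int[][] m) {
        double t = 0;
        for (int[] row : m) for (int v : row) t += v;
        return t;
    }

    /** Fraction of instances on the diagonal, i.e. classified correctly. */
    static double accuracy(int[][] m) {
        double correct = 0;
        for (int i = 0; i < m.length; i++) correct += m[i][i];
        return correct / total(m);
    }

    /** Precision of class c: correct predictions of c / all predictions of c (column sum). */
    static double precision(int[][] m, int c) {
        double predicted = 0;
        for (int[] row : m) predicted += row[c];
        return m[c][c] / predicted;
    }

    /** Recall of class c: correct predictions of c / all actual instances of c (row sum). */
    static double recall(int[][] m, int c) {
        double actual = 0;
        for (int v : m[c]) actual += v;
        return m[c][c] / actual;
    }

    /** Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement). */
    static double kappa(int[][] m) {
        double n = total(m);
        double chance = 0;
        for (int c = 0; c < m.length; c++) {
            double rowSum = 0, colSum = 0;
            for (int j = 0; j < m.length; j++) rowSum += m[c][j];
            for (int i = 0; i < m.length; i++) colSum += m[i][c];
            chance += (rowSum / n) * (colSum / n); // agreement expected by chance for class c
        }
        double observed = accuracy(m);
        return (observed - chance) / (1 - chance);
    }

    public static void main(String[] args) {
        int[][] m = { {130, 8}, {15, 150} }; // a = functional, b = non-functional
        System.out.printf("accuracy = %.4f%n", accuracy(m));
        System.out.printf("kappa    = %.4f%n", kappa(m));
        System.out.printf("precision(a) = %.4f, recall(a) = %.4f%n", precision(m, 0), recall(m, 0));
    }
}
```

For this matrix the accuracy is 280/303 (about 0.924) and kappa is about 0.848; precision for class a is 130/145 and recall for class a is 130/138.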
16 votes, 3 answers

Beginner's resources/introductions to classification algorithms

everybody. I am entirely new to the topic of classification algorithms, and need a few good pointers about where to start some "serious reading". I am right now in the process of finding out whether machine learning and automated classification…
16 votes, 2 answers

Learning Weka on the Command Line

I am fairly new to Weka and even newer to Weka on the command line. I find the documentation is poor and I am struggling to figure out a few things. For example, I want to take two .arff files, one for training, one for testing and get an…
Reily Bourne
16 votes, 2 answers

What are data requirements for FP-Growth in Weka?

I'd like to use the FP-Growth association rule algorithm on my dataset (model) in Weka. Unfortunately, this algorithm is greyed out. What preconditions do I have to meet in order to make use of it?
ŁukaszBachman
14 votes, 2 answers

Java Weka: How to specify split percentage?

I have written the code to create the model and save it. It works fine. My understanding is that data, by default, is split into 10 folds. I want data to be split into two sets (training and testing) when I create the model. On Weka UI, I can do it by…
rishi
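A percentage split amounts to shuffling the data with a seed and cutting it at the requested percentage. The stdlib-only Java sketch below illustrates that idea on a generic list; the class and method names are hypothetical, and the rounding of the cut point is an assumption. With Weka itself the same steps would typically use Instances.randomize(Random) followed by the copying constructor Instances(Instances, int, int) to extract the training and test ranges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a seeded percentage (train/test) split.
public class PercentageSplitSketch {

    /**
     * Returns [train, test]: a seeded shuffle of the data, cut after
     * round(size * trainPercent / 100) elements.
     */
    static <T> List<List<T>> split(List<T> data, double trainPercent, long seed) {
        List<T> shuffled = new ArrayList<>(data);         // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed)); // same seed -> same split
        int trainSize = (int) Math.round(shuffled.size() * trainPercent / 100.0);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) data.add(i);
        List<List<Integer>> parts = split(data, 66.0, 42L);
        System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
    }
}
```

With 10 items and a 66% split, the sketch puts 7 items in the training set and 3 in the test set; changing the seed changes which items land where, but not the sizes.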
13 votes, 5 answers

Visualizing Weka classification tree

I am using a few data sets available online and trying to visualize a tree. However, it does not let me use the visualize tree option at all. Could anyone please guide me on how to get the tree diagram in Weka using data sets available online?
Ramakrishna