Questions tagged [standardization]

Standardization (sometimes loosely called normalization) rescales a vector of real values so that it has a mean of zero and a standard deviation of one. The resulting values are also called standard scores or z-scores.
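As a rough illustration of the definition above, here is a minimal sketch using only the Python standard library. The function name `standardize` and the sample data are made up for this example, and the population standard deviation is an assumption; swap in `statistics.stdev` if the sample standard deviation is wanted.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return the z-scores of `values`: (x - mean) / standard deviation."""
    mu = fmean(values)
    # Assumption: population SD (divide by n), not the sample SD (n - 1).
    sigma = pstdev(values, mu)
    if sigma == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(x - mu) / sigma for x in values]


# Hypothetical sample data, chosen so the mean is 5 and the SD is 2.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```

After the transformation, the values in `z` have a mean of zero and a (population) standard deviation of one.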

28 questions
104 votes, 9 answers

Have there ever been silent behavior changes in C++ with new standard versions?

(I'm looking for an example or two to prove the point, not a list.) Has it ever been the case that a change in the C++ standard (e.g. from 98 to 11, 11 to 14 etc.) changed the behavior of existing, well-formed, defined-behavior user code - silently?…
einpoklum
  • 86,754
  • 39
  • 223
  • 453
3 votes, 4 answers

Why don't the authors of the C99 standard specify a standard for the size of floating point types?

I noticed on Windows and Linux x86, float is a 4-byte type, double is 8, but long double is 12 and 16 on x86 and x86_64 respectively. C99 is supposed to be breaking such barriers with the specific integral sizes. The initial technological limitation…
j riv
  • 3,289
  • 6
  • 35
  • 53
2 votes, 0 answers

reverse the scale of the test outcome in the LSTM

I am using standardized predictors in the training set to train an LSTM model. After I predict the outcome in the test set, I need to reverse the predicted score back to the original scale. Normally I could just use the predicted score * SD of the training…
2 votes, 2 answers

Why hasn't C++ standardized overloads of algorithms which operate on entire containers?

Standard ISO C++ has a rich algorithm library including plenty of syntactic sugar like std::max_element, std::fill, std::count, etc. I'm having a hard time understanding why ISO saw fit to standardize many such trivial algorithms, yet not overloads…
Tumbleweed53
  • 1,334
  • 5
  • 9
2 votes, 1 answer

RegEx question: standardization of medical terms

I need to detect words such as 'bot/hersen/levermetastase' and transform them into 'botmetastase, hersenmetastase, levermetastase', and likewise 'lever/botmetastase' into 'levermetastase, botmetastase'. So I need to be sure the "word/word/word metastase" is…
LaureAnne
  • 23
  • 4
2 votes, 1 answer

StandardScaler giving non-uniform standard deviation

My problem setup is as follows: Python 3.7, Pandas version 1.0.3, and sklearn version 0.22.1. I am applying a StandardScaler (to every column of a float matrix) per usual. However, the columns that I get out do not have standard deviation =1, while…
Zhubarb
  • 8,409
  • 17
  • 65
  • 100
1 vote, 1 answer

Standardizing a vector in R so that values shift towards boundaries

I have vector as follows - a <- c(0.211, 0.028, 0.321, 0.072, -0.606, -0.364, -0.066, 0.172, -0.917, 0.062, 0.117, -0.136, -0.296, 0.022, 0.046, -0.19, 0.057, -0.625, -0.01, 0.158, 0.407, -0.328, -0.347, -0.512, -0.101, 0.008, -0.406, -0.014,…
Saurabh
  • 614
  • 1
  • 15
1 vote, 1 answer

How do you remerge the response variable to the data frame after removing it for standardization?

I have a dataset with 61 columns (60 explanatory variables and 1 response variable). All the explanatory variables are numerical, and the response is categorical (Default). Some of the explanatory variables have negative values (financial data), and therefore…
thosed
  • 13
  • 3
1 vote, 1 answer

Standardization Result is different between Patsy & Pandas - Python

I found an interesting question and I would love to hear your interpretation. from patsy import dmatrix,demo_data df = pd.DataFrame(demo_data("a", "b", "x1", "x2", "y", "z column")) Patsy_Standarlize_Output = dmatrix("standardize(x2) +…
vae
  • 87
  • 6
1 vote, 1 answer

What is the correct way to use standardization/normalization in combination with K-Fold Cross Validation?

I have always learned that standardization or normalization should be fit only on the training set, and then be used to transform the test set. So what I'd do is: scaler = StandardScaler() scaler.fit_transform(X_train) scaler.transform(X_test) Now…
1 vote, 1 answer

How to implement PySpark StandardScaler on subset of columns?

I want to use pyspark StandardScaler on 6 out of 10 columns in my dataframe. This will be part of a pipeline. The inputCol parameter seems to expect a vector, which I can pass in after using VectorAssembler on all my features, but this scales all 10…
Insu Q
  • 353
  • 1
  • 9
1 vote, 1 answer

Sklearn.pipeline producing incorrect result

I am trying to construct a pipeline with a StandardScaler() and LogisticRegression(). I get different results when I code it with and without the pipeline. Here's my code without the pipeline: clf_LR = linear_model.LogisticRegression() scalar =…
0 votes, 1 answer

How to find out StandardScaling parameters .mean_ and .scale_ when using Column Transformer from Scikit-learn?

I want to apply StandardScaler only to the numerical parts of my dataset using the function sklearn.compose.ColumnTransformer, (the rest is already one-hot encoded). I would like to see .scale_ and .mean_ parameters fitted to the training data, but…
0 votes, 1 answer

How to standardize city names inserted by user

I need to write a small ETL pipeline because I need to move some data from a source database to a target database (a datawarehouse) to perform some analysis on data. Among those data, I need to clean and conform the name of cities. Cities are…
Ciccio
  • 1,703
  • 3
  • 23
  • 62
0 votes, 0 answers

Median centralization and median standardization

I have doubts about making my samples comparable with each other. I have 3 replicates for each of 2 groups (Test and Control). I want to look at how proteins change. For that, I first did median centralization for each column of my replicates. Then, I…