Difference between revisions of "Logistic regression"

From Libre Pathology
Jump to navigation Jump to search
Line 1: Line 1:
'''Logistic regression''', abbreviated '''LR''', is a type of curve fitting. It used for categorical outcome variables. Examples of categorical outcomes variables are: (1) pass/fail, (2) dead/alive.
'''Logistic regression''', abbreviated '''LR''', is a type of curve fitting. It used for categorical dependent (outcome) variables. Examples of categorical dependent variables are: (1) pass/fail, (2) dead/alive.


==Special types of LR==
==Special types of LR==

Revision as of 16:36, 24 March 2018

Logistic regression, abbreviated LR, is a type of curve fitting. It used for categorical dependent (outcome) variables. Examples of categorical dependent variables are: (1) pass/fail, (2) dead/alive.

Special types of LR

Multinomial LR

If the dependent variable is categorical and the categories are mutually exclusive (e.g. benign, suspicious, malignant, insufficient), multinomial logistic regression is used.

Ordered LR

If the dependent variable is categorical and can be ordered in a meaningful way (e.g. Grade 1, Grade 2, Grade 3), ordered logistic regrssion is used.[1]

GNU/Octave example

% -------------------------------------------------------------------------------------------------
% TO RUN THIS FROM WITHIN OCTAVE
% 	run logistic_regression_example.m
%
% TO RUN FROM THE COMMAND LINE
% 	octave logistic_regression_example.m > tmp.txt
% -------------------------------------------------------------------------------------------------
clear all;
close all;

% Data from example on 'Logistic regression' page of Wikipedia - https://en.wikipedia.org/wiki/Logistic_regression
y = [ 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1 ];
x = [ 0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50 ];

y=transpose(y);
x=transpose(x);

[theta, beta, dev, dl, d2l, gamma] = logistic_regression(y,x,1)

% calculate standard error
se = sqrt (diag (inv (-d2l)))

% create array  
%   [[ alpha 	se_alpha	zvalue_alpha 	zvalue_alpha^2		P_Wald_alpha	]
%    [ beta_1	se_beta_1	zvalue_beta_1	zvalue_beta_1^2		P_Wald_beta_1	]
%    [ beta_2	se_beta_2	zvalue_beta_2	zvalue_beta_2^2		P_Wald_beta_2	]
%    [ ...	...		...		...			...		]
%    [ beta_n	se_beta_n	zvalue_beta_n	zvalue_beta_n^2		P_Wald_beta_n  	]]
logit_arr=zeros(size(beta,1)+1,5);
logit_arr(1,1)=theta;
logit_arr(2:end,1)=beta;
logit_arr(1:end,2)=se;
logit_arr(1:end,3)=logit_arr(1:end,1)./logit_arr(1:end,2);	% zvalue_i = coefficient_i/se_coefficient_i
logit_arr(1:end,4)=logit_arr(1:end,3).*logit_arr(1:end,3);	% zvalue_i^2
logit_arr(1:end,5)=1-chi2cdf(logit_arr(1:end,4),1)		% Wald statistic is calculated as per formula in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1065119/

% calculate the probability for a range of values
studytime_arr= [ 1, 2, 3, 4, 5]
len_studytime_arr=size(studytime_arr,2);

% As per the manual ( https://www.gnu.org/software/octave/doc/v4.0.3/Correlation-and-Regression-Analysis.html ) GNU/Octave function fits:
%	logit (gamma_i (x)) = theta_i - beta' * x,   i = 1 ... k-1

for ctr=1:len_studytime_arr
  P_logit(ctr)=1/(1+exp(theta-studytime_arr(ctr)*beta));
end

% print to screen output
P_logit
logit_arr

% create plot
len_x = size(x,1)

for ctr=1:len_x
  P_fitted(ctr)=1/(1+exp(theta-x(ctr)*beta));
end

plot(x,y,'o')
hold on;
grid on;
plot(x,P_fitted,'-')
xlabel('Study time (hours)');
ylabel('Probability of passing the exam (-)');

See also

References

  1. Ordinal Logistic Regression | R Data Analysis Examples. UCLA. URL: https://stats.idre.ucla.edu/r/dae/ordinal-logistic-regression/. Accessed on: March 24, 2018

External links