Dataset statistics
Number of variables | 17 |
---|---|
Number of observations | 520 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 269 |
Duplicate rows (%) | 51.7% |
Total size in memory | 490.8 KiB |
Average record size in memory | 966.6 B |
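The duplicate-row figures above can be sanity-checked with a short sketch. It assumes that a row counts as a duplicate when it repeats an earlier row, so the first occurrence is not counted; this matches the default behaviour of pandas' `DataFrame.duplicated()`, but whether the profiler counts exactly this way is an assumption here. The function name `count_duplicate_rows` and the toy rows are hypothetical.

```python
from collections import Counter

def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (the first occurrence is not counted)."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values())

# Toy rows (hypothetical, not taken from the profiled dataset)
rows = [
    (30, "Male", "No"),
    (30, "Male", "No"),
    (30, "Male", "No"),
    (45, "Female", "Yes"),
]

n_dup = count_duplicate_rows(rows)
share = 100 * n_dup / len(rows)
print(n_dup, f"{share:.1f}%")  # 2 50.0%

# The report's own figures are consistent with each other: 269 of 520 rows
report_share = f"{100 * 269 / 520:.1f}%"
print(report_share)  # 51.7%
```

Under this counting rule, 269 duplicates among 520 observations leave 251 distinct rows.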
Variable types
BOOL | 14 |
---|---|
CAT | 2 |
NUM | 1 |
Dataset has 269 (51.7%) duplicate rows | Duplicates |
Reproduction
Analysis started | 2020-10-06 21:50:53.252642 |
---|---|
Analysis finished | 2020-10-06 21:50:56.751784 |
Duration | 3.5 seconds |
Software version | pandas-profiling v2.9.0 |
Download configuration | config.yaml |
Age
Real number (ℝ≥0)
Distinct | 51 |
---|---|
Distinct (%) | 9.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 48.02884615 |
---|---|
Minimum | 16 |
Maximum | 90 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 4.2 KiB |
Quantile statistics
Minimum | 16 |
---|---|
5-th percentile | 30 |
Q1 | 39 |
median | 47.5 |
Q3 | 57 |
95-th percentile | 68 |
Maximum | 90 |
Range | 74 |
Interquartile range (IQR) | 18 |
Descriptive statistics
Standard deviation | 12.151466 |
---|---|
Coefficient of variation (CV) | 0.2530034962 |
Kurtosis | -0.1917094141 |
Mean | 48.02884615 |
Median Absolute Deviation (MAD) | 9 |
Skewness | 0.3293593578 |
Sum | 24975 |
Variance | 147.6581258 |
Monotonicity | Not monotonic |
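Several of the Age statistics above are derived from one another. The sketch below recomputes them from the reported values to show the relationships; it uses only the figures printed in this report (not the raw data), and the tolerances are an assumption chosen to absorb the rounding of the printed numbers.

```python
import math

# Values copied from the Age tables in this report
mean = 48.02884615
std = 12.151466
variance = 147.6581258
cv = 0.2530034962
q1, q3 = 39, 57
minimum, maximum = 16, 90
total, n = 24975, 520

checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=1e-8),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=1e-6),
    "CV = std / mean": math.isclose(std / mean, cv, rel_tol=1e-6),
    "IQR = Q3 - Q1": q3 - q1 == 18,
    "range = max - min": maximum - minimum == 74,
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```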
Value | Count | Frequency (%) | |
35 | 30 | 5.8% | |
48 | 28 | 5.4% | |
30 | 25 | 4.8% | |
43 | 25 | 4.8% | |
40 | 24 | 4.6% | |
55 | 22 | 4.2% | |
47 | 21 | 4.0% | |
38 | 20 | 3.8% | |
53 | 20 | 3.8% | |
58 | 18 | 3.5% | |
45 | 18 | 3.5% | |
50 | 18 | 3.5% | |
39 | 16 | 3.1% | |
54 | 16 | 3.1% | |
57 | 15 | 2.9% | |
60 | 15 | 2.9% | |
68 | 10 | 1.9% | |
66 | 9 | 1.7% | |
28 | 9 | 1.7% | |
42 | 9 | 1.7% | |
72 | 9 | 1.7% | |
56 | 8 | 1.5% | |
36 | 8 | 1.5% | |
46 | 8 | 1.5% | |
61 | 8 | 1.5% | |
Other values (26) | 111 | 21.3% |
Value | Count | Frequency (%) | |
16 | 1 | 0.2% | |
25 | 2 | 0.4% | |
26 | 1 | 0.2% | |
27 | 6 | 1.2% | |
28 | 9 | 1.7% | |
29 | 1 | 0.2% | |
30 | 25 | 4.8% | |
31 | 3 | 0.6% | |
32 | 5 | 1.0% | |
33 | 4 | 0.8% |
Value | Count | Frequency (%) | |
90 | 2 | 0.4% | |
85 | 2 | 0.4% | |
79 | 1 | 0.2% | |
72 | 9 | 1.7% | |
70 | 5 | 1.0% | |
69 | 5 | 1.0% | |
68 | 10 | 1.9% | |
67 | 8 | 1.5% | |
66 | 9 | 1.7% | |
65 | 6 | 1.2% |
Gender
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
Male | 328 | 63.1% | |
Female | 192 | 36.9% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 6 |
---|---|
Median length | 4 |
Mean length | 4.738461538 |
Min length | 4 |
Most occurring characters
Value | Count | Frequency (%) | |
e | 712 | 28.9% | |
a | 520 | 21.1% | |
l | 520 | 21.1% | |
M | 328 | 13.3% | |
F | 192 | 7.8% | |
m | 192 | 7.8% |
Most occurring categories
Value | Count | Frequency (%) | |
Lowercase Letter | 1944 | 78.9% | |
Uppercase Letter | 520 | 21.1% |
Most frequent Uppercase Letter characters
Value | Count | Frequency (%) | |
M | 328 | 63.1% | |
F | 192 | 36.9% |
Most frequent Lowercase Letter characters
Value | Count | Frequency (%) | |
e | 712 | 36.6% | |
a | 520 | 26.7% | |
l | 520 | 26.7% | |
m | 192 | 9.9% |
Most occurring scripts
Value | Count | Frequency (%) | |
Latin | 2464 | 100.0% |
Most frequent Latin characters
Value | Count | Frequency (%) | |
e | 712 | 28.9% | |
a | 520 | 21.1% | |
l | 520 | 21.1% | |
M | 328 | 13.3% | |
F | 192 | 7.8% | |
m | 192 | 7.8% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 2464 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
e | 712 | 28.9% | |
a | 520 | 21.1% | |
l | 520 | 21.1% | |
M | 328 | 13.3% | |
F | 192 | 7.8% | |
m | 192 | 7.8% |
Polyuria
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 262 | 50.4% | |
Yes | 258 | 49.6% |
Polydipsia
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 287 | 55.2% | |
Yes | 233 | 44.8% |
SuddenWeightLoss
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 303 | 58.3% | |
Yes | 217 | 41.7% |
Weakness
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
Yes | 305 | 58.7% | |
No | 215 | 41.3% |
Polyphagia
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 283 | 54.4% | |
Yes | 237 | 45.6% |
GenitalThrush
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 404 | 77.7% | |
Yes | 116 | 22.3% |
VisualBlurring
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 287 | 55.2% | |
Yes | 233 | 44.8% |
Itching
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 267 | 51.3% | |
Yes | 253 | 48.7% |
Irritability
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 394 | 75.8% | |
Yes | 126 | 24.2% |
DelayedHealing
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 281 | 54.0% | |
Yes | 239 | 46.0% |
PartialParesis
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 296 | 56.9% | |
Yes | 224 | 43.1% |
MuscleStiffness
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 325 | 62.5% | |
Yes | 195 | 37.5% |
Alopecia
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 341 | 65.6% | |
Yes | 179 | 34.4% |
Obesity
Boolean
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
No | 432 | 83.1% | |
Yes | 88 | 16.9% |
Class
Categorical
Distinct | 2 |
---|---|
Distinct (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.2 KiB |
Value | Count | Frequency (%) | |
Positive | 320 | 61.5% | |
Negative | 200 | 38.5% |
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Length
Max length | 8 |
---|---|
Median length | 8 |
Mean length | 8 |
Min length | 8 |
Most occurring characters
Value | Count | Frequency (%) | |
i | 840 | 20.2% | |
e | 720 | 17.3% | |
t | 520 | 12.5% | |
v | 520 | 12.5% | |
P | 320 | 7.7% | |
o | 320 | 7.7% | |
s | 320 | 7.7% | |
N | 200 | 4.8% | |
g | 200 | 4.8% | |
a | 200 | 4.8% |
Most occurring categories
Value | Count | Frequency (%) | |
Lowercase Letter | 3640 | 87.5% | |
Uppercase Letter | 520 | 12.5% |
Most frequent Uppercase Letter characters
Value | Count | Frequency (%) | |
P | 320 | 61.5% | |
N | 200 | 38.5% |
Most frequent Lowercase Letter characters
Value | Count | Frequency (%) | |
i | 840 | 23.1% | |
e | 720 | 19.8% | |
t | 520 | 14.3% | |
v | 520 | 14.3% | |
o | 320 | 8.8% | |
s | 320 | 8.8% | |
g | 200 | 5.5% | |
a | 200 | 5.5% |
Most occurring scripts
Value | Count | Frequency (%) | |
Latin | 4160 | 100.0% |
Most frequent Latin characters
Value | Count | Frequency (%) | |
i | 840 | 20.2% | |
e | 720 | 17.3% | |
t | 520 | 12.5% | |
v | 520 | 12.5% | |
P | 320 | 7.7% | |
o | 320 | 7.7% | |
s | 320 | 7.7% | |
N | 200 | 4.8% | |
g | 200 | 4.8% | |
a | 200 | 4.8% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 4160 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
i | 840 | 20.2% | |
e | 720 | 17.3% | |
t | 520 | 12.5% | |
v | 520 | 12.5% | |
P | 320 | 7.7% | |
o | 320 | 7.7% | |
s | 320 | 7.7% | |
N | 200 | 4.8% | |
g | 200 | 4.8% | |
a | 200 | 4.8% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r (only the sign of the slope does).
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
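The calculation described for Pearson's r can be sketched in a few lines of plain Python. This is a minimal illustration, not the profiler's own implementation; the function name `pearson_r` and the toy data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Toy data (hypothetical, not taken from the profiled dataset)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r_base = pearson_r(x, y)
# Shifting y and rescaling it by a positive factor leaves r unchanged
r_shifted = pearson_r(x, [3 * v + 7 for v in y])
# Reversing the direction of the relationship flips the sign
r_negative = pearson_r(x, [-v for v in y])

print(round(r_base, 6), round(r_shifted, 6), round(r_negative, 6))  # 1.0 1.0 -1.0
```

The second call illustrates the invariance under changes of location and scale; the third shows that only the direction of the relationship changes the sign.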
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
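As a rough sketch of that recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. Giving tied values the average of their ranks is a common convention and an assumption here; the function names and the toy data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data (hypothetical): y = x ** 3 is monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 6))  # 1.0
print(r < rho)        # True: Pearson's r falls short of 1 on this nonlinear data
```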
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
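The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated above, which is often called tau-a; statistical libraries commonly default to a tie-adjusted variant (tau-b), so results can differ when the data contain ties. The function name `kendall_tau_a` and the toy data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (hypothetical): one swapped pair out of six
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)
print(round(tau, 3))  # 0.667
```

Here five pairs are concordant and one is discordant, giving (5 - 1) / 6.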
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
Age | Gender | Polyuria | Polydipsia | SuddenWeightLoss | Weakness | Polyphagia | GenitalThrush | VisualBlurring | Itching | Irritability | DelayedHealing | PartialParesis | MuscleStiffness | Alopecia | Obesity | Class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 40 | Male | No | Yes | No | Yes | No | No | No | Yes | No | Yes | No | Yes | Yes | Yes | Positive |
1 | 58 | Male | No | No | No | Yes | No | No | Yes | No | No | No | Yes | No | Yes | No | Positive |
2 | 41 | Male | Yes | No | No | Yes | Yes | No | No | Yes | No | Yes | No | Yes | Yes | No | Positive |
3 | 45 | Male | No | No | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | No | No | No | Positive |
4 | 60 | Male | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Positive |
5 | 55 | Male | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Positive |
6 | 57 | Male | Yes | Yes | No | Yes | Yes | Yes | No | No | No | Yes | Yes | No | No | No | Positive |
7 | 66 | Male | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No | Yes | Yes | No | No | Positive |
8 | 67 | Male | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | Positive |
9 | 70 | Male | No | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No | No | Yes | No | Positive |
Last rows
Age | Gender | Polyuria | Polydipsia | SuddenWeightLoss | Weakness | Polyphagia | GenitalThrush | VisualBlurring | Itching | Irritability | DelayedHealing | PartialParesis | MuscleStiffness | Alopecia | Obesity | Class | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
510 | 67 | Male | No | No | No | Yes | No | No | No | Yes | No | Yes | No | No | Yes | No | Negative |
511 | 66 | Male | No | No | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Negative |
512 | 43 | Male | No | No | No | No | No | No | No | No | No | No | No | No | Yes | No | Negative |
513 | 62 | Female | Yes | Yes | Yes | Yes | No | No | Yes | No | No | No | Yes | No | No | Yes | Positive |
514 | 54 | Female | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | Yes | No | No | No | Positive |
515 | 39 | Female | Yes | Yes | Yes | No | Yes | No | No | Yes | No | Yes | Yes | No | No | No | Positive |
516 | 48 | Female | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | No | No | Positive |
517 | 58 | Female | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | No | Yes | Positive |
518 | 32 | Female | No | No | No | Yes | No | No | Yes | Yes | No | Yes | No | No | Yes | No | Negative |
519 | 42 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative |
Most frequent
Age | Gender | Polyuria | Polydipsia | SuddenWeightLoss | Weakness | Polyphagia | GenitalThrush | VisualBlurring | Itching | Irritability | DelayedHealing | PartialParesis | MuscleStiffness | Alopecia | Obesity | Class | count | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 30 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative | 16 |
20 | 38 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative | 7 |
42 | 46 | Male | No | No | No | Yes | No | No | No | Yes | No | Yes | No | No | Yes | No | Negative | 7 |
65 | 53 | Male | No | No | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Negative | 7 |
0 | 27 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative | 6 |
24 | 39 | Female | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | No | No | Positive | 6 |
29 | 40 | Male | No | No | Yes | No | No | No | No | No | No | No | No | No | No | Yes | Negative | 6 |
34 | 43 | Female | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | No | Yes | Positive | 6 |
36 | 43 | Male | No | No | No | Yes | No | Yes | No | Yes | No | Yes | No | No | Yes | No | Negative | 6 |
41 | 45 | Male | No | No | No | No | Yes | Yes | No | No | No | No | No | No | No | No | Negative | 6 |