Overview (Early-Stage Diabetes Detection Dataset)

Dataset statistics

Number of variables: 17
Number of observations: 520
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 269
Duplicate rows (%): 51.7%
Total size in memory: 490.8 KiB
Average record size in memory: 966.6 B
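
The headline figures above can be recomputed from a DataFrame. The following is a minimal sketch, not the profiler's own code: the function name `overview` and the toy frame are illustrative, and it assumes that every repeat of an earlier row counts as one duplicate (which is what `DataFrame.duplicated()` reports by default). The reported percentage is at least consistent with the counts: 269 / 520 ≈ 51.7%.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Recompute overview-style figures for a DataFrame (illustrative helper)."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())
    # Assumption: each repeat of an earlier row counts as one duplicate row.
    duplicates = int(df.duplicated().sum())
    return {
        "variables": n_cols,
        "observations": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_rows if n_rows else 0.0,
    }


# Toy data only; this is not the diabetes dataset.
toy = pd.DataFrame(
    {
        "Age": [30, 30, 45, 30],
        "Gender": ["Male", "Male", "Female", "Male"],
    }
)
stats = overview(toy)
print(stats)
```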

Variable types

BOOL: 14
CAT: 2
NUM: 1

Warnings

Dataset has 269 (51.7%) duplicate rows (warning type: Duplicates)

Reproduction

Analysis started: 2020-10-06 21:50:53.252642
Analysis finished: 2020-10-06 21:50:56.751784
Duration: 3.5 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables (problem dimensions)

Age
Real number (ℝ≥0)

Distinct: 51
Distinct (%): 9.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 48.02884615
Minimum: 16
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 4.2 KiB

Quantile statistics

Minimum: 16
5-th percentile: 30
Q1: 39
Median: 47.5
Q3: 57
95-th percentile: 68
Maximum: 90
Range: 74
Interquartile range (IQR): 18
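
The two derived figures follow directly from the listed quantiles: Range = 90 - 16 = 74 and IQR = Q3 - Q1 = 57 - 39 = 18. As a minimal sketch (the helper name `quantile_summary` and the toy series are illustrative, and pandas' default linear interpolation between order statistics is assumed), such a summary could be produced like this:

```python
import pandas as pd


def quantile_summary(s: pd.Series) -> dict:
    """Quantile statistics for a numeric series (illustrative helper)."""
    # pandas interpolates linearly between order statistics by default.
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    return {
        "minimum": s.min(),
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "maximum": s.max(),
        "range": s.max() - s.min(),
        "iqr": q.loc[0.75] - q.loc[0.25],
    }


# Toy data only: the integers 1..101, chosen so the quantiles are round numbers.
toy_values = pd.Series(range(1, 102))
summary = quantile_summary(toy_values)
print(summary)
```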

Descriptive statistics

Standard deviation: 12.151466
Coefficient of variation (CV): 0.2530034962
Kurtosis: -0.1917094141
Mean: 48.02884615
Median Absolute Deviation (MAD): 9
Skewness: 0.3293593578
Sum: 24975
Variance: 147.6581258
Monotonicity: Not monotonic
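
Several of these figures are tied together by simple identities: the variance is the square of the standard deviation (12.151466² ≈ 147.658), and the coefficient of variation is the standard deviation divided by the mean (12.151466 / 48.02884615 ≈ 0.2530). The sketch below shows one way to compute such statistics with pandas. It is an illustration under assumptions, not the profiler's implementation: the helper name `describe_numeric` is hypothetical, the toy series is not the Age column, and pandas defaults are used (sample standard deviation with ddof=1, bias-adjusted skewness and excess kurtosis).

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Descriptive statistics for a numeric series (illustrative helper)."""
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1), the pandas default
    median = s.median()
    return {
        "mean": mean,
        "std": std,
        "variance": s.var(),
        "cv": std / mean,  # coefficient of variation
        "mad": (s - median).abs().median(),  # median absolute deviation
        "skewness": s.skew(),
        "kurtosis": s.kurt(),  # excess kurtosis (0 for a normal distribution)
        "sum": s.sum(),
        "monotonic": bool(
            s.is_monotonic_increasing or s.is_monotonic_decreasing
        ),
    }


# Toy data only; this is not the Age column.
toy_age = pd.Series([16, 30, 39, 47, 48, 57, 68, 90])
desc = describe_numeric(toy_age)
print(desc)
```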
Histogram with fixed size bins (bins=50)
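Most common values

Value | Count | Frequency (%)
35 | 30 | 5.8%
48 | 28 | 5.4%
30 | 25 | 4.8%
43 | 25 | 4.8%
40 | 24 | 4.6%
55 | 22 | 4.2%
47 | 21 | 4.0%
38 | 20 | 3.8%
53 | 20 | 3.8%
58 | 18 | 3.5%
45 | 18 | 3.5%
50 | 18 | 3.5%
39 | 16 | 3.1%
54 | 16 | 3.1%
57 | 15 | 2.9%
60 | 15 | 2.9%
68 | 10 | 1.9%
66 | 9 | 1.7%
28 | 9 | 1.7%
42 | 9 | 1.7%
72 | 9 | 1.7%
56 | 8 | 1.5%
36 | 8 | 1.5%
46 | 8 | 1.5%
61 | 8 | 1.5%
Other values (26) | 111 | 21.3%

Smallest 10 values

Value | Count | Frequency (%)
16 | 1 | 0.2%
25 | 2 | 0.4%
26 | 1 | 0.2%
27 | 6 | 1.2%
28 | 9 | 1.7%
29 | 1 | 0.2%
30 | 25 | 4.8%
31 | 3 | 0.6%
32 | 5 | 1.0%
33 | 4 | 0.8%

Largest 10 values

Value | Count | Frequency (%)
90 | 2 | 0.4%
85 | 2 | 0.4%
79 | 1 | 0.2%
72 | 9 | 1.7%
70 | 5 | 1.0%
69 | 5 | 1.0%
68 | 10 | 1.9%
67 | 8 | 1.5%
66 | 9 | 1.7%
65 | 6 | 1.2%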

Gender
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
Male | 328 | 63.1%
Female | 192 | 36.9%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 6
Median length: 4
Mean length: 4.738461538
Min length: 4

Overview of Unicode Properties

Unique unicode characters: 6
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
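
The character and category counts reported for Gender can be reproduced from the value counts alone (328 "Male", 192 "Female"). The sketch below uses Python's standard `unicodedata` module; the helper name `char_profile` is illustrative and this is not the profiler's own code. `unicodedata.category` returns two-letter codes, where `Ll` is Lowercase Letter and `Lu` is Uppercase Letter.

```python
import unicodedata
from collections import Counter


def char_profile(values):
    """Count characters and Unicode general categories over all strings."""
    chars = Counter()
    for value in values:
        chars.update(value)  # adds one count per character in the string
    categories = Counter()
    for ch, n in chars.items():
        categories[unicodedata.category(ch)] += n
    return chars, categories


# Rebuild the Gender column from the reported value counts.
gender = ["Male"] * 328 + ["Female"] * 192
chars, categories = char_profile(gender)

print(chars.most_common())
print(categories)
```

With these inputs the totals match the report: 2464 characters overall, of which 1944 are lowercase letters and 520 are uppercase letters.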

Most occurring characters

Value | Count | Frequency (%)
e | 712 | 28.9%
a | 520 | 21.1%
l | 520 | 21.1%
M | 328 | 13.3%
F | 192 | 7.8%
m | 192 | 7.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1944 | 78.9%
Uppercase Letter | 520 | 21.1%

Most frequent Uppercase Letter characters

Value | Count | Frequency (%)
M | 328 | 63.1%
F | 192 | 36.9%

Most frequent Lowercase Letter characters

Value | Count | Frequency (%)
e | 712 | 36.6%
a | 520 | 26.7%
l | 520 | 26.7%
m | 192 | 9.9%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2464 | 100.0%

Most frequent Latin characters

Value | Count | Frequency (%)
e | 712 | 28.9%
a | 520 | 21.1%
l | 520 | 21.1%
M | 328 | 13.3%
F | 192 | 7.8%
m | 192 | 7.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2464 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
e | 712 | 28.9%
a | 520 | 21.1%
l | 520 | 21.1%
M | 328 | 13.3%
F | 192 | 7.8%
m | 192 | 7.8%

Polyuria
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 262 | 50.4%
Yes | 258 | 49.6%

Polydipsia
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 287 | 55.2%
Yes | 233 | 44.8%

SuddenWeightLoss
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 303 | 58.3%
Yes | 217 | 41.7%

Weakness
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
Yes | 305 | 58.7%
No | 215 | 41.3%

Polyphagia
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 283 | 54.4%
Yes | 237 | 45.6%

GenitalThrush
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 404 | 77.7%
Yes | 116 | 22.3%

VisualBlurring
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 287 | 55.2%
Yes | 233 | 44.8%

Itching
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 267 | 51.3%
Yes | 253 | 48.7%

Irritability
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 394 | 75.8%
Yes | 126 | 24.2%

DelayedHealing
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 281 | 54.0%
Yes | 239 | 46.0%

PartialParesis
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 296 | 56.9%
Yes | 224 | 43.1%

MuscleStiffness
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 325 | 62.5%
Yes | 195 | 37.5%

Alopecia
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 341 | 65.6%
Yes | 179 | 34.4%

Obesity
Boolean

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
No | 432 | 83.1%
Yes | 88 | 16.9%

Class
Categorical

Distinct: 2
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 4.2 KiB

Value | Count | Frequency (%)
Positive | 320 | 61.5%
Negative | 200 | 38.5%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Overview of Unicode Properties

Unique unicode characters: 10
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value | Count | Frequency (%)
i | 840 | 20.2%
e | 720 | 17.3%
t | 520 | 12.5%
v | 520 | 12.5%
P | 320 | 7.7%
o | 320 | 7.7%
s | 320 | 7.7%
N | 200 | 4.8%
g | 200 | 4.8%
a | 200 | 4.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3640 | 87.5%
Uppercase Letter | 520 | 12.5%

Most frequent Uppercase Letter characters

Value | Count | Frequency (%)
P | 320 | 61.5%
N | 200 | 38.5%

Most frequent Lowercase Letter characters

Value | Count | Frequency (%)
i | 840 | 23.1%
e | 720 | 19.8%
t | 520 | 14.3%
v | 520 | 14.3%
o | 320 | 8.8%
s | 320 | 8.8%
g | 200 | 5.5%
a | 200 | 5.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4160 | 100.0%

Most frequent Latin characters

Value | Count | Frequency (%)
i | 840 | 20.2%
e | 720 | 17.3%
t | 520 | 12.5%
v | 520 | 12.5%
P | 320 | 7.7%
o | 320 | 7.7%
s | 320 | 7.7%
N | 200 | 4.8%
g | 200 | 4.8%
a | 200 | 4.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4160 | 100.0%

Most frequent ASCII characters

Value | Count | Frequency (%)
i | 840 | 20.2%
e | 720 | 17.3%
t | 520 | 12.5%
v | 520 | 12.5%
P | 320 | 7.7%
o | 320 | 7.7%
s | 320 | 7.7%
N | 200 | 4.8%
g | 200 | 4.8%
a | 200 | 4.8%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the slope of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
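
The definition above can be written out directly. This is a minimal NumPy sketch; the function name `pearson_r` and the toy arrays are illustrative. It uses population (divide-by-n) versions of both the covariance and the standard deviations, and the normalisation cancels in the ratio, so the result agrees with `numpy.corrcoef`.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's r as covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population covariance; np.std also defaults to the population form.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / (x.std() * y.std()))


# Toy data only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = pearson_r(x, y)
# Shifting and positively rescaling x leaves r unchanged.
r_rescaled = pearson_r(3.0 * x + 5.0, y)
print(r, r_rescaled)
```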

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
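
In other words, ρ is Pearson's r applied to ranks. The sketch below illustrates this with pandas; the function name `spearman_rho` and the toy data are illustrative, and ties are assumed to receive average ranks (the default of `Series.rank`).

```python
import pandas as pd


def spearman_rho(x, y):
    """Spearman's rho: covariance of the ranks over the product of their stds."""
    rx = pd.Series(x).rank()  # ties get the average of their ranks
    ry = pd.Series(y).rank()
    return float(rx.cov(ry) / (rx.std() * ry.std()))


# Toy data only: y is a nonlinear but strictly increasing function of x.
x = [10, 20, 30, 40, 50]
y = [1, 8, 27, 64, 125]

rho_up = spearman_rho(x, y)
rho_down = spearman_rho(x, y[::-1])
print(rho_up, rho_down)
```

Because only the ordering matters, the cubic relationship in the toy data still yields a ρ of 1 (up to floating-point rounding), whereas Pearson's r on the same data would be below 1.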

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
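
The pair-counting definition can be sketched in a few lines of plain Python. The function name `kendall_tau_a` and the toy data are illustrative. Note the assumption: this is the simple "tau-a" form described in the text, in which tied pairs count as neither concordant nor discordant; statistical libraries often report a tie-corrected variant (tau-b), which can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it contributes to neither count
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data only: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.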

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
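
A bias-corrected Cramér's V in the style of Bergsma's proposal can be sketched as follows. This is an illustration under assumptions, not the profiler's own code: the function name `cramers_v_corrected` is hypothetical, the chi-squared statistic is computed without continuity correction, and the contingency table is assumed to have at least two rows and two columns with no empty row or column.

```python
import numpy as np


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    r, k = obs.shape
    # Expected counts under independence (assumes no empty row or column).
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2 = ((obs - expected) ** 2 / expected).sum()
    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1)))


# Toy tables only.
independent = [[10, 10], [10, 10]]
perfect = [[50, 0], [0, 50]]
v_independent = cramers_v_corrected(independent)
v_perfect = cramers_v_corrected(perfect)
print(v_independent, v_perfect)
```

For a pair of columns in a DataFrame, a table of this shape could be obtained with `pandas.crosstab`.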

Missing values


Sample

First rows

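Index | Age | Gender | Polyuria | Polydipsia | SuddenWeightLoss | Weakness | Polyphagia | GenitalThrush | VisualBlurring | Itching | Irritability | DelayedHealing | PartialParesis | MuscleStiffness | Alopecia | Obesity | Class
0 | 40 | Male | No | Yes | No | Yes | No | No | No | Yes | No | Yes | No | Yes | Yes | Yes | Positive
1 | 58 | Male | No | No | No | Yes | No | No | Yes | No | No | No | Yes | No | Yes | No | Positive
2 | 41 | Male | Yes | No | No | Yes | Yes | No | No | Yes | No | Yes | No | Yes | Yes | No | Positive
3 | 45 | Male | No | No | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | No | No | No | Positive
4 | 60 | Male | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Positive
5 | 55 | Male | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Positive
6 | 57 | Male | Yes | Yes | No | Yes | Yes | Yes | No | No | No | Yes | Yes | No | No | No | Positive
7 | 66 | Male | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No | Yes | Yes | No | No | Positive
8 | 67 | Male | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | Positive
9 | 70 | Male | No | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No | No | Yes | No | Positive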

Last rows

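Index | Age | Gender | Polyuria | Polydipsia | SuddenWeightLoss | Weakness | Polyphagia | GenitalThrush | VisualBlurring | Itching | Irritability | DelayedHealing | PartialParesis | MuscleStiffness | Alopecia | Obesity | Class
510 | 67 | Male | No | No | No | Yes | No | No | No | Yes | No | Yes | No | No | Yes | No | Negative
511 | 66 | Male | No | No | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Negative
512 | 43 | Male | No | No | No | No | No | No | No | No | No | No | No | No | Yes | No | Negative
513 | 62 | Female | Yes | Yes | Yes | Yes | No | No | Yes | No | No | No | Yes | No | No | Yes | Positive
514 | 54 | Female | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | Yes | No | No | No | Positive
515 | 39 | Female | Yes | Yes | Yes | No | Yes | No | No | Yes | No | Yes | Yes | No | No | No | Positive
516 | 48 | Female | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | No | No | Positive
517 | 58 | Female | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | No | Yes | Positive
518 | 32 | Female | No | No | No | Yes | No | No | Yes | Yes | No | Yes | No | No | Yes | No | Negative
519 | 42 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative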

Duplicate rows

Most frequent

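Index | Age | Gender | Polyuria | Polydipsia | SuddenWeightLoss | Weakness | Polyphagia | GenitalThrush | VisualBlurring | Itching | Irritability | DelayedHealing | PartialParesis | MuscleStiffness | Alopecia | Obesity | Class | count
4 | 30 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative | 16
20 | 38 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative | 7
42 | 46 | Male | No | No | No | Yes | No | No | No | Yes | No | Yes | No | No | Yes | No | Negative | 7
65 | 53 | Male | No | No | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | No | Negative | 7
0 | 27 | Male | No | No | No | No | No | No | No | No | No | No | No | No | No | No | Negative | 6
24 | 39 | Female | Yes | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | No | No | No | Positive | 6
29 | 40 | Male | No | No | Yes | No | No | No | No | No | No | No | No | No | No | Yes | Negative | 6
34 | 43 | Female | Yes | Yes | Yes | Yes | Yes | No | Yes | No | No | No | Yes | Yes | No | Yes | Positive | 6
36 | 43 | Male | No | No | No | Yes | No | Yes | No | Yes | No | Yes | No | No | Yes | No | Negative | 6
41 | 45 | Male | No | No | No | No | Yes | Yes | No | No | No | No | No | No | No | No | Negative | 6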
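
A table of the most frequent duplicate rows like the one above can be built by grouping on every column and counting group sizes. The following is a minimal sketch with a hypothetical helper name (`most_frequent_duplicates`) and toy data. It assumes there are no missing values, as is the case for this dataset, because `groupby` drops rows with missing keys by default.

```python
import pandas as pd


def most_frequent_duplicates(df: pd.DataFrame, top: int = 10) -> pd.DataFrame:
    """Return rows that occur more than once, with their occurrence counts."""
    counts = (
        df.groupby(list(df.columns), sort=False)
        .size()
        .reset_index(name="count")
    )
    duplicated = counts[counts["count"] > 1]
    return (
        duplicated.sort_values("count", ascending=False)
        .head(top)
        .reset_index(drop=True)
    )


# Toy data only; this is not the diabetes dataset.
toy = pd.DataFrame(
    {
        "Age": [30, 30, 30, 45, 45, 60],
        "Gender": ["Male", "Male", "Male", "Female", "Female", "Male"],
        "Class": ["Negative"] * 3 + ["Positive"] * 2 + ["Negative"],
    }
)
dupes = most_frequent_duplicates(toy)
print(dupes)
```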