Essential numerical summaries for missing values

This section introduces the essential functions that summarize missing values as a single number.

Series summaries

Dummy data for demonstration purposes

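import pandas as pd
import pandas_missing  # makes the .missing accessor used below available

s = pd.Series(range(0, 100))
s[40:60] = None  # 20 missing values (positions 40-59)
s[90:95] = None  # 5 missing values (positions 90-94)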

s
0      0.0
1      1.0
2      2.0
3      3.0
4      4.0
      ... 
95    95.0
96    96.0
97    97.0
98    98.0
99    99.0
Length: 100, dtype: float64

Missingness number


PandasMissingSeries.number_missing

 PandasMissingSeries.number_missing ()

Return the number of missing values in the Series.

s.missing.number_missing()
25

PandasMissingSeries.number_complete

 PandasMissingSeries.number_complete ()

Return the number of non-missing values in the Series.

s.missing.number_complete()
75
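
The two counts above can be cross-checked with plain pandas. The sketch below is not the accessor's implementation; it assumes that "missing" means whatever `Series.isna()` flags (NaN or None), and it builds the dummy Series as float64 up front so that assigning NaN does not change the dtype.

```python
import pandas as pd

# Same dummy Series as above: 100 values, 25 of them set to missing.
s = pd.Series(range(100), dtype="float64")
s.iloc[40:60] = float("nan")
s.iloc[90:95] = float("nan")

n_missing = int(s.isna().sum())    # entries flagged as missing
n_complete = int(s.notna().sum())  # entries holding a value

print(n_missing, n_complete)
```

The two counts always add up to the length of the Series, which is a quick sanity check on any missingness summary.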

Missingness proportion


PandasMissingSeries.proportion_missing

 PandasMissingSeries.proportion_missing ()

Return the proportion of missing values in the Series.

s.missing.proportion_missing()
0.25

PandasMissingSeries.proportion_complete

 PandasMissingSeries.proportion_complete ()

Return the proportion of non-missing values in the Series.

s.missing.proportion_complete()
0.75

Missingness percentage


PandasMissingSeries.percentage_missing

 PandasMissingSeries.percentage_missing ()

Return the percentage of missing values in the Series.

s.missing.percentage_missing()
25.0

PandasMissingSeries.percentage_complete

 PandasMissingSeries.percentage_complete ()

Return the percentage of non-missing values in the Series.

s.missing.percentage_complete()
75.0
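
Proportions and percentages are the same quantity on different scales: the proportion is the count divided by the length of the Series, and the percentage is the proportion multiplied by 100. A minimal plain-pandas sketch of that relationship, assuming missingness is defined by `Series.isna()` (the variable names are illustrative, not part of the library):

```python
import pandas as pd

# Same dummy Series as above: 25 of 100 values are missing.
s = pd.Series(range(100), dtype="float64")
s.iloc[40:60] = float("nan")
s.iloc[90:95] = float("nan")

# The mean of a boolean mask is the fraction of True entries.
proportion_missing = float(s.isna().mean())
proportion_complete = 1.0 - proportion_missing

# Percentages rescale the proportions from [0, 1] to [0, 100].
percentage_missing = 100.0 * proportion_missing
percentage_complete = 100.0 * proportion_complete

print(proportion_missing, proportion_complete)
print(percentage_missing, percentage_complete)
```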

DataFrame summaries

Dummy data for demonstration purposes

import pandas as pd
import pandas_missing
df = pd.DataFrame.from_dict(
    {
       "a": range(0, 10),
       "b": range(10, 20),
       "c": range(20, 30),
       "d": range(30, 40),
       "e": range(40, 50)
    }
)

df.iloc[1:4, 0] = None  # rows 1-3 of column "a"
df.iloc[9, 0] = None    # row 9 of column "a"
df.iloc[5:7, 1] = None  # rows 5-6 of column "b"

df
     a     b   c   d   e
0  0.0  10.0  20  30  40
1  NaN  11.0  21  31  41
2  NaN  12.0  22  32  42
3  NaN  13.0  23  33  43
4  4.0  14.0  24  34  44
5  5.0   NaN  25  35  45
6  6.0   NaN  26  36  46
7  7.0  17.0  27  37  47
8  8.0  18.0  28  38  48
9  NaN  19.0  29  39  49

Overall missingness number


PandasMissingDataFrame.number_missing

 PandasMissingDataFrame.number_missing ()

Return the number of missing values in the entire DataFrame.

df.missing.number_missing()
6

PandasMissingDataFrame.number_complete

 PandasMissingDataFrame.number_complete ()

Return the number of non-missing values in the entire DataFrame.

df.missing.number_complete()
44

Overall missingness proportion


PandasMissingDataFrame.proportion_missing

 PandasMissingDataFrame.proportion_missing ()

Return the proportion of missing values in the entire DataFrame.

df.missing.proportion_missing()
0.12

PandasMissingDataFrame.proportion_complete

 PandasMissingDataFrame.proportion_complete ()

Return the proportion of non-missing values in the entire DataFrame.

df.missing.proportion_complete()
0.88

Overall missingness percentage


PandasMissingDataFrame.percentage_missing

 PandasMissingDataFrame.percentage_missing ()

Return the percentage of missing values in the entire DataFrame.

df.missing.percentage_missing()
12.0

PandasMissingDataFrame.percentage_complete

 PandasMissingDataFrame.percentage_complete ()

Return the percentage of non-missing values in the entire DataFrame.

df.missing.percentage_complete()
88.0
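
The overall DataFrame summaries treat every cell as one observation, so the denominator is `df.size` (rows times columns, 50 here). The following is a plain-pandas sketch of those quantities, not the accessor's actual code; it assumes missingness is defined by `DataFrame.isna()` and casts all columns to float64 for simplicity, which differs from the mixed dtypes shown above but not in which cells are missing.

```python
import pandas as pd

# Same layout as the dummy DataFrame above: 10 rows, 5 columns, 6 missing cells.
df = pd.DataFrame(
    {
        "a": range(0, 10),
        "b": range(10, 20),
        "c": range(20, 30),
        "d": range(30, 40),
        "e": range(40, 50),
    }
).astype("float64")
df.iloc[[1, 2, 3, 9], 0] = float("nan")  # four gaps in column "a"
df.iloc[5:7, 1] = float("nan")           # two gaps in column "b"

total_cells = df.size                          # 10 rows * 5 columns
n_missing = int(df.isna().to_numpy().sum())    # missing cells anywhere
n_complete = total_cells - n_missing

proportion_missing = n_missing / total_cells
percentage_missing = 100 * n_missing / total_cells

print(n_missing, n_complete, proportion_missing, percentage_missing)
```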

Missingness number throughout variables


PandasMissingDataFrame.number_variable_missing

 PandasMissingDataFrame.number_variable_missing ()

Return the number of variables (columns) with at least one missing value.

df.missing.number_variable_missing()
2

PandasMissingDataFrame.number_variable_complete

 PandasMissingDataFrame.number_variable_complete ()

Return the number of variables with no missing values.

df.missing.number_variable_complete()
3

Missingness proportion throughout variables


PandasMissingDataFrame.proportion_variable_missing

 PandasMissingDataFrame.proportion_variable_missing ()

Return the proportion of variables with missing values.

df.missing.proportion_variable_missing()
0.4

PandasMissingDataFrame.proportion_variable_complete

 PandasMissingDataFrame.proportion_variable_complete ()

Return the proportion of variables with no missing values.

df.missing.proportion_variable_complete()
0.6

Missingness percentage throughout variables


PandasMissingDataFrame.percentage_variable_missing

 PandasMissingDataFrame.percentage_variable_missing ()

Return the percentage of variables with missing values.

df.missing.percentage_variable_missing()
40.0

PandasMissingDataFrame.percentage_variable_complete

 PandasMissingDataFrame.percentage_variable_complete ()

Return the percentage of variables with no missing values.

df.missing.percentage_variable_complete()
60.0
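
The variable-level summaries change the unit of counting from cells to columns: a variable counts as missing if any of its values is missing, and as complete otherwise. A hedged plain-pandas sketch of that idea, assuming missingness is defined by `DataFrame.isna()` (all columns are cast to float64 here for simplicity):

```python
import pandas as pd

# Same layout as the dummy DataFrame above; only columns "a" and "b" have gaps.
df = pd.DataFrame(
    {
        "a": range(0, 10),
        "b": range(10, 20),
        "c": range(20, 30),
        "d": range(30, 40),
        "e": range(40, 50),
    }
).astype("float64")
df.iloc[[1, 2, 3, 9], 0] = float("nan")
df.iloc[5:7, 1] = float("nan")

# One boolean per column: True if the column has at least one missing value.
variable_has_missing = df.isna().any(axis=0)

n_variable_missing = int(variable_has_missing.sum())
n_variable_complete = int((~variable_has_missing).sum())
proportion_variable_missing = float(variable_has_missing.mean())
percentage_variable_missing = 100 * n_variable_missing / df.shape[1]

print(n_variable_missing, n_variable_complete)
print(proportion_variable_missing, percentage_variable_missing)
```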

Missingness number throughout cases


PandasMissingDataFrame.number_case_missing

 PandasMissingDataFrame.number_case_missing ()

Return the number of cases (rows) with at least one missing value.

df.missing.number_case_missing()
6

PandasMissingDataFrame.number_case_complete

 PandasMissingDataFrame.number_case_complete ()

Return the number of cases with no missing values.

df.missing.number_case_complete()
4

Missingness proportion throughout cases


PandasMissingDataFrame.proportion_case_missing

 PandasMissingDataFrame.proportion_case_missing ()

Return the proportion of cases with missing values.

df.missing.proportion_case_missing()
0.6

PandasMissingDataFrame.proportion_case_complete

 PandasMissingDataFrame.proportion_case_complete ()

Return the proportion of cases with no missing values.

df.missing.proportion_case_complete()
0.4

Missingness percentage throughout cases


PandasMissingDataFrame.percentage_case_missing

 PandasMissingDataFrame.percentage_case_missing ()

Return the percentage of cases with missing values.

df.missing.percentage_case_missing()
60.0

PandasMissingDataFrame.percentage_case_complete

 PandasMissingDataFrame.percentage_case_complete ()

Return the percentage of cases with no missing values.

df.missing.percentage_case_complete()
40.0
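
The case-level summaries apply the same logic along the other axis: a case (row) counts as missing if any of its cells is missing. Note that the 6 missing cases here coincide with the 6 missing cells only because no row has more than one gap. A plain-pandas sketch under the assumption that missingness is defined by `DataFrame.isna()` (all columns are cast to float64 for simplicity):

```python
import pandas as pd

# Same layout as the dummy DataFrame above; rows 1, 2, 3, 5, 6 and 9 have gaps.
df = pd.DataFrame(
    {
        "a": range(0, 10),
        "b": range(10, 20),
        "c": range(20, 30),
        "d": range(30, 40),
        "e": range(40, 50),
    }
).astype("float64")
df.iloc[[1, 2, 3, 9], 0] = float("nan")
df.iloc[5:7, 1] = float("nan")

# One boolean per row: True if the row has at least one missing value.
case_has_missing = df.isna().any(axis=1)

n_case_missing = int(case_has_missing.sum())
n_case_complete = int((~case_has_missing).sum())
proportion_case_missing = float(case_has_missing.mean())
percentage_case_missing = 100 * n_case_missing / df.shape[0]

print(n_case_missing, n_case_complete)
print(proportion_case_missing, percentage_case_missing)
```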