{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "inl5d-ASWTdu"
},
"source": [
"# Big Data seaborn Solution\n",
"## Big Data definition\n",
"\n",
"**Big Data**: data sets that are too large or too complex to process with a conventional computer or conventional software.\n",
"\n",
"## Example\n",
"A table with more than 1,048,576 rows cannot be opened in full in MS Excel, because that is the maximum number of rows per worksheet. Python, R and MATLAB are limited by the available RAM.\n",
"\n",
"## Solution\n",
"Use a specialized environment such as the Metacentrum computers, dedicated servers (Mazlík) or Google Colab!"
]
},
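{
"cell_type": "markdown",
"metadata": {},
"source": [
"When a CSV file is too large to load comfortably, one common workaround is to read it in chunks and keep only the columns that are needed. The following is a minimal sketch, assuming pandas is installed; the helper name `load_columns`, the chunk size and the small demonstration table are hypothetical and are not part of the original analysis.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"def load_columns(source, columns, chunksize=100_000):\n",
"    # Read the CSV piece by piece, keeping only the requested columns,\n",
"    # then join the pieces into a single DataFrame.\n",
"    chunks = pd.read_csv(source, usecols=columns, chunksize=chunksize)\n",
"    return pd.concat(chunks, ignore_index=True)\n",
"\n",
"# Tiny in-memory CSV that stands in for a large file on disk.\n",
"demo_frame = pd.DataFrame({'Label': ['Data1', 'Data1'], 'Area': [2541, 2419], 'AR': [1.06, 1.11]})\n",
"buffer = io.StringIO(demo_frame.to_csv(index=False))\n",
"\n",
"demo = load_columns(buffer, ['Label', 'Area'], chunksize=1)\n",
"print(demo.shape)\n",
"```\n",
"\n",
"For a real file, pass its path instead of the buffer, for example `load_columns('Dataset/Results_MM.csv', ['Label', 'Area'])`."
]
},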
{
"cell_type": "markdown",
"metadata": {
"id": "n-TTp50DX_h7"
},
"source": [
"# Practical example\n",
"\n",
"## MICROSTRUCTURE OF INFANT FORMULA RELATED TO ITS FUNCTION in 2D\n",
"\n",
"* Infant formula \n",
" * pH 7\n",
" * DRAQ5 (Far-Red)\n",
" * processed using StarDist \n",
" * 660 000 rows, 35 columns of data (350+ MB CSV file)\n",
" * Dataset/Results_aptamil.csv\n",
"\n",
"* Human milk \n",
" * pH 7\n",
" * DRAQ5 (Far-Red)\n",
" * processed using StarDist\n",
" * 1 650 000 rows, 35 columns of data (150+ MB CSV file)\n",
" * Dataset/Results_MM.csv\n",
"\n",
"## Goal\n",
"\n",
"\n",
"1. Load both data sets\n",
"2. Remove NaN values\n",
"3. Visualize the Area distribution\n",
"4. Decide whether the distribution calls for a parametric or a nonparametric test\n",
"5. Define the statistical hypothesis\n",
"6. Compare Area distributions\n",
"\n",
"\n",
"\n"
]
},
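{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps 4 to 6 of the goal could be sketched as follows. This is only an illustration on synthetic log-normal data, not the analysis performed in this notebook. It assumes SciPy is available (it is not imported elsewhere in this notebook), and the variable names and the choice of the Mann-Whitney U test are assumptions made for the example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy import stats\n",
"\n",
"rng = np.random.default_rng(0)\n",
"# Synthetic, right-skewed stand-ins for the two Area columns.\n",
"area_a = rng.lognormal(mean=4.0, sigma=1.0, size=5000)\n",
"area_b = rng.lognormal(mean=3.8, sigma=1.0, size=5000)\n",
"\n",
"# Step 4: strong skewness is a hint that a nonparametric test is the safer choice.\n",
"print('skewness:', stats.skew(area_a), stats.skew(area_b))\n",
"\n",
"# Step 5: H0 states that both samples come from the same distribution.\n",
"# Step 6: the two-sided Mann-Whitney U test compares the two samples.\n",
"u_stat, p_value = stats.mannwhitneyu(area_a, area_b, alternative='two-sided')\n",
"print('U =', u_stat, 'p =', p_value)\n",
"```\n",
"\n",
"With more than a million rows, almost any difference becomes statistically significant, so it is worth reporting an effect size (for example, the difference in medians) next to the p-value."
]
},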
{
"cell_type": "markdown",
"metadata": {
"id": "ay3rZRl1nEqx"
},
"source": [
"## Expected results from 3D measurements\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"!pip install -q -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"executionInfo": {
"elapsed": 1591,
"status": "ok",
"timestamp": 1669202735385,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "LjIAEuPbG-Js"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"executionInfo": {
"elapsed": 23080,
"status": "ok",
"timestamp": 1669202820272,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "F1aOBUj0HIwA"
},
"outputs": [],
"source": [
"data1Path = \"Dataset/Results_MM.csv\" #@param {type:\"string\"}\n",
"\n",
"data1=pd.read_csv(data1Path) "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 236
},
"executionInfo": {
"elapsed": 16,
"status": "ok",
"timestamp": 1669202820277,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "jkJuJNFQJMr3",
"outputId": "54228aaf-8dc1-451f-af86-0807f911a03f"
},
"outputs": [
{
"data": {
"text/plain": [
" Label Area Mean StdDev Mode Min Max X Y ... \\\n",
"0 1 Data1 2541 56.0 0.0 56 56 56 598.54486 170.83176 ... \n",
"1 2 Data1 2419 96.0 0.0 96 96 96 2718.95432 624.50827 ... \n",
"2 3 Data1 1855 4.0 0.0 4 4 4 237.42237 661.82938 ... \n",
"3 4 Data1 2596 65.0 0.0 65 65 65 293.91371 820.11710 ... \n",
"4 5 Data1 1409 31.0 0.0 31 31 31 2998.58375 1046.93364 ... \n",
"\n",
" %Area RawIntDen Slice FeretX FeretY FeretAngle MinFeret AR \\\n",
"0 100 142296.0 1 593 142 98.53077 55.75403 1.06228 \n",
"1 100 232224.0 1 2711 653 72.47443 52.97387 1.10986 \n",
"2 100 7420.0 1 224 683 56.30993 48.00000 1.03663 \n",
"3 100 168740.0 1 266 807 163.66396 56.02472 1.05496 \n",
"4 100 43679.0 1 2989 1068 68.42869 39.00000 1.15185 \n",
"\n",
" Round Solidity \n",
"0 0.94137 0.97319 \n",
"1 0.90102 0.97032 \n",
"2 0.96466 0.96867 \n",
"3 0.94790 0.97320 \n",
"4 0.86817 0.96739 \n",
"\n",
"[5 rows x 36 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 364
},
"executionInfo": {
"elapsed": 2465,
"status": "ok",
"timestamp": 1669202822729,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "3_dyCfMCmHpw",
"outputId": "37dd2d1c-1df4-4879-c8c4-7efd4b5a1914"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\schatzm\\Anaconda3\\envs\\julab\\lib\\site-packages\\numpy\\lib\\function_base.py:4527: RuntimeWarning: invalid value encountered in subtract\n",
" diff_b_a = subtract(b, a)\n"
]
},
{
"data": {
"text/plain": [
" Area Mean StdDev Mode \\\n",
"count 1.615323e+06 1.615323e+06 1.615323e+06 1.615323e+06 1.615323e+06 \n",
"mean 8.076620e+05 1.331084e+02 6.671007e+03 2.810688e+02 6.488793e+03 \n",
"std 4.663037e+05 2.516350e+02 6.086050e+03 1.091096e+03 6.101919e+03 \n",
"min 1.000000e+00 1.200000e+01 1.000000e+00 0.000000e+00 1.000000e+00 \n",
"25% 4.038315e+05 3.700000e+01 1.956000e+03 0.000000e+00 1.768000e+03 \n",
"50% 8.076620e+05 5.800000e+01 4.868000e+03 0.000000e+00 4.595000e+03 \n",
"75% 1.211492e+06 1.160000e+02 9.709000e+03 0.000000e+00 9.492000e+03 \n",
"max 1.615323e+06 2.469600e+04 3.233200e+04 1.428012e+04 3.233200e+04 \n",
"\n",
" Min Max X Y XM \\\n",
"count 1.615323e+06 1.615323e+06 1.615323e+06 1.615323e+06 1.615323e+06 \n",
"mean 6.444779e+03 7.109614e+03 1.440901e+03 1.394038e+03 1.440899e+03 \n",
"std 6.101029e+03 6.399855e+03 8.383394e+02 8.243418e+02 8.383402e+02 \n",
"min 1.000000e+00 1.000000e+00 3.409090e+00 3.166670e+00 3.409090e+00 \n",
"25% 1.723000e+03 2.069000e+03 7.161416e+02 6.896124e+02 7.161596e+02 \n",
"50% 4.534000e+03 5.249000e+03 1.434662e+03 1.368470e+03 1.434676e+03 \n",
"75% 9.442000e+03 1.045600e+04 2.148138e+03 2.046322e+03 2.148138e+03 \n",
"max 3.233200e+04 3.233200e+04 3.189167e+03 3.226500e+03 3.189167e+03 \n",
"\n",
" ... %Area RawIntDen Slice FeretX FeretY \\\n",
"count ... 1615323.0 1.615323e+06 1615323.0 1.615323e+06 1.615323e+06 \n",
"mean ... 100.0 4.633171e+05 1.0 1.436129e+03 1.392875e+03 \n",
"std ... 0.0 7.302585e+05 0.0 8.382947e+02 8.243501e+02 \n",
"min ... 100.0 7.000000e+01 1.0 1.000000e+00 1.000000e+00 \n",
"25% ... 100.0 1.616980e+05 1.0 7.120000e+02 6.880000e+02 \n",
"50% ... 100.0 3.169700e+05 1.0 1.430000e+03 1.367000e+03 \n",
"75% ... 100.0 5.530990e+05 1.0 2.143000e+03 2.045000e+03 \n",
"max ... 100.0 1.135496e+08 1.0 3.187000e+03 3.228000e+03 \n",
"\n",
" FeretAngle MinFeret AR Round Solidity \n",
"count 1.615323e+06 1.615323e+06 1.615323e+06 1.615323e+06 1.615323e+06 \n",
"mean 1.049987e+02 1.030235e+01 1.249056e+00 8.442305e-01 8.899525e-01 \n",
"std 4.822601e+01 6.097962e+00 3.688458e-01 1.536807e-01 5.326279e-02 \n",
"min 1.507440e+00 3.000000e+00 1.000000e+00 1.111700e-01 3.585100e-01 \n",
"25% 5.446232e+01 7.000000e+00 1.061510e+00 8.218400e-01 8.709700e-01 \n",
"50% 1.255377e+02 8.000000e+00 1.109590e+00 9.012400e-01 9.014100e-01 \n",
"75% 1.444623e+02 1.100000e+01 1.216780e+00 9.420500e-01 9.230800e-01 \n",
"max 1.791449e+02 1.612912e+02 8.994840e+00 1.000000e+00 1.000000e+00 \n",
"\n",
"[8 rows x 35 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data1.describe()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"executionInfo": {
"elapsed": 4899,
"status": "ok",
"timestamp": 1669202827618,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "XTzOCJ84Ib98"
},
"outputs": [],
"source": [
"data2Path = \"Dataset/Results_aptamil.csv\" #@param {type:\"string\"}\n",
"\n",
"data2=pd.read_csv(data2Path) "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 236
},
"executionInfo": {
"elapsed": 13,
"status": "ok",
"timestamp": 1669202827618,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "14vh_AgwJNUH",
"outputId": "ab098b45-d535-447b-ba59-03d21714a83e"
},
"outputs": [
{
"data": {
"text/plain": [
" Label Area Mean StdDev Mode Min Max X Y ... \\\n",
"0 1 Data2 2776 15.0 0.0 15 15 15 2574.66967 80.45245 ... \n",
"1 2 Data2 2628 14.0 0.0 14 14 14 1015.16629 128.37938 ... \n",
"2 3 Data2 337 13.0 0.0 13 13 13 2602.27448 103.22404 ... \n",
"3 4 Data2 491 16.0 0.0 16 16 16 2038.76069 290.29837 ... \n",
"4 5 Data2 509 5.0 0.0 5 5 5 2717.95972 401.13654 ... \n",
"\n",
" %Area RawIntDen Slice FeretX FeretY FeretAngle MinFeret AR \\\n",
"0 100 41640.0 1 2510 79 0.84876 29.09975 4.99331 \n",
"1 100 36792.0 1 1013 92 95.04245 53.00000 1.28544 \n",
"2 100 4381.0 1 2598 93 105.25512 20.78831 1.07470 \n",
"3 100 7856.0 1 2030 280 127.69424 24.00000 1.05842 \n",
"4 100 2545.0 1 2706 408 33.11134 24.00000 1.09402 \n",
"\n",
" Round Solidity \n",
"0 0.20027 0.94857 \n",
"1 0.77795 0.97010 \n",
"2 0.93050 0.92837 \n",
"3 0.94480 0.96464 \n",
"4 0.91406 0.95140 \n",
"\n",
"[5 rows x 36 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2.head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 364
},
"executionInfo": {
"elapsed": 1849,
"status": "ok",
"timestamp": 1669202829457,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "lkRe2DKvmwsS",
"outputId": "7c5922bf-047b-4c08-bc93-9cbdc9f57a6f"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\schatzm\\Anaconda3\\envs\\julab\\lib\\site-packages\\numpy\\lib\\function_base.py:4527: RuntimeWarning: invalid value encountered in subtract\n",
" diff_b_a = subtract(b, a)\n"
]
},
{
"data": {
"text/plain": [
" Area Mean StdDev \\\n",
"count 666214.000000 666214.000000 666214.000000 666214.000000 \n",
"mean 333107.500000 107.186941 4778.728803 107.668110 \n",
"std 192319.560456 303.942842 4594.148721 620.511333 \n",
"min 1.000000 11.000000 1.000000 0.000000 \n",
"25% 166554.250000 31.000000 873.000000 0.000000 \n",
"50% 333107.500000 46.000000 3257.365065 0.000000 \n",
"75% 499660.750000 75.000000 7718.000000 0.000000 \n",
"max 666214.000000 58667.000000 21305.000000 10052.812470 \n",
"\n",
" Mode Min Max X \\\n",
"count 666214.000000 666214.000000 666214.000000 666214.000000 \n",
"mean 4758.339600 4671.752471 4898.814561 1462.707549 \n",
"std 4627.082933 4580.314488 4698.678348 820.313937 \n",
"min 1.000000 1.000000 1.000000 3.425930 \n",
"25% 848.000000 830.000000 882.000000 765.134703 \n",
"50% 3173.000000 3072.000000 3337.000000 1470.000000 \n",
"75% 7704.000000 7545.000000 7952.000000 2163.673910 \n",
"max 21305.000000 21305.000000 21305.000000 3139.107140 \n",
"\n",
" Y XM ... %Area RawIntDen Slice \\\n",
"count 666214.000000 666214.000000 ... 666214.0 6.662140e+05 666214.0 \n",
"mean 1378.698306 1462.708680 ... 100.0 2.964580e+05 1.0 \n",
"std 786.322358 820.315810 ... 0.0 6.119494e+05 0.0 \n",
"min 3.029410 3.425930 ... 100.0 2.900000e+01 1.0 \n",
"25% 708.711415 765.137985 ... 100.0 4.565525e+04 1.0 \n",
"50% 1378.719670 1470.022270 ... 100.0 1.744200e+05 1.0 \n",
"75% 2044.148023 2163.689600 ... 100.0 3.975745e+05 1.0 \n",
"max 3086.326090 3139.107140 ... 100.0 2.731261e+08 1.0 \n",
"\n",
" FeretX FeretY FeretAngle MinFeret \\\n",
"count 666214.000000 666214.000000 666214.000000 666214.000000 \n",
"mean 1458.731747 1377.911724 97.782395 8.929439 \n",
"std 820.405865 786.316653 47.704262 5.969685 \n",
"min 1.000000 1.000000 0.848760 3.000000 \n",
"25% 761.000000 708.000000 45.000000 6.000000 \n",
"50% 1466.000000 1378.000000 116.565050 7.000000 \n",
"75% 2160.000000 2044.000000 135.000000 9.000000 \n",
"max 3136.000000 3089.000000 179.292680 359.467960 \n",
"\n",
" AR Round Solidity \n",
"count 666214.000000 666214.000000 666214.000000 \n",
"mean 1.235981 0.834769 0.890030 \n",
"std 0.278721 0.120508 0.043218 \n",
"min 1.000000 0.078880 0.371820 \n",
"25% 1.091640 0.791510 0.869570 \n",
"50% 1.158650 0.863080 0.894740 \n",
"75% 1.263408 0.916050 0.918600 \n",
"max 12.677090 1.000000 1.000000 \n",
"\n",
"[8 rows x 35 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data2.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uVAKRsN3I-JH"
},
"source": [
"use"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pIaMXH7tsD6y"
},
"source": [
"Select columns of interest"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"executionInfo": {
"elapsed": 852,
"status": "ok",
"timestamp": 1669202830300,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "BL2PoishI9pP",
"outputId": "19c8fc37-daa0-4a2f-e662-0b3a99c8e13b"
},
"outputs": [
{
"data": {
"text/plain": [
" Label Area Feret AR\n",
"0 Data2 2776 135.01481 4.99331\n",
"1 Data2 2628 68.26419 1.28544\n",
"2 Data2 337 22.80351 1.07470\n",
"3 Data2 491 27.80288 1.05842\n",
"4 Data2 509 27.45906 1.09402\n",
"... ... ... ... ...\n",
"666209 Data2 1787 62.64982 1.59116\n",
"666210 Data2 180 18.78829 1.38055\n",
"666211 Data2 45 8.94427 1.16188\n",
"666212 Data2 75 12.36932 1.28449\n",
"666213 Data2 62 15.52417 2.93601\n",
"\n",
"[666214 rows x 4 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
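"# Keep only the columns of interest and drop rows in which every selected value is NaN.\n",
"# dropna() returns a new DataFrame, so the result has to be assigned back.\n",
"d1Area = data1[['Label', 'Area', 'Feret', 'AR']].dropna(how='all')\n",
"d2Area = data2[['Label', 'Area', 'Feret', 'AR']].dropna(how='all')\n",
"\n",
"d2Area"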
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZHONz0vKsAAw"
},
"source": [
"Rename the entries in the Label column"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1669202832905,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "SIreZBeWsAe7"
},
"outputs": [],
"source": [
"d1Area = d1Area.replace({'Data1':'MM'})\n",
"d2Area = d2Area.replace({'Data2':'Aptamil'})"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1669202834252,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "pH06ItUzRcy2",
"outputId": "17436b84-3af4-471c-c5a3-b56ea9fa59b4"
},
"outputs": [
{
"data": {
"text/plain": [
" Label Area Feret AR\n",
"0 MM 2541 60.67125 1.06228\n",
"1 MM 2419 59.77458 1.10986\n",
"2 MM 1855 50.47772 1.03663\n",
"3 MM 2596 60.44005 1.05496\n",
"4 MM 1409 46.23851 1.15185"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d1Area.head()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 3,
"status": "ok",
"timestamp": 1669202835402,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "EzzVvwvdf26p",
"outputId": "2df4712f-4067-41d1-a95b-270c08f1bfff"
},
"outputs": [
{
"data": {
"text/plain": [
"count 1.615323e+06\n",
"mean 1.331084e+02\n",
"std 2.516350e+02\n",
"min 1.200000e+01\n",
"25% 3.700000e+01\n",
"50% 5.800000e+01\n",
"75% 1.160000e+02\n",
"max 2.469600e+04\n",
"Name: Area, dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d1Area['Area'].describe() "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 3,
"status": "ok",
"timestamp": 1669202837501,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "S-UDEkvCN4ii",
"outputId": "09e0fbad-8ed2-4ac6-d4c4-7fd8e89b29f4"
},
"outputs": [
{
"data": {
"text/plain": [
"58.0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_median = d1Area['Area'].median()\n",
"\n",
"# Take a look\n",
"df_median"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 5,
"status": "ok",
"timestamp": 1669202839022,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "TbU_0QMogDlD",
"outputId": "82308db4-46ed-4fd6-8e23-7ec9baba0c94"
},
"outputs": [
{
"data": {
"text/plain": [
"count 666214.000000\n",
"mean 107.186941\n",
"std 303.942842\n",
"min 11.000000\n",
"25% 31.000000\n",
"50% 46.000000\n",
"75% 75.000000\n",
"max 58667.000000\n",
"Name: Area, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d2Area['Area'].describe() "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 4,
"status": "ok",
"timestamp": 1669202841776,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "8BXqQ4fqO6tD",
"outputId": "abbd5013-935b-46f0-88f5-deee7b98d2fa"
},
"outputs": [
{
"data": {
"text/plain": [
"46.0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_median = d2Area['Area'].median()\n",
"\n",
"# Take a look\n",
"df_median"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1669202843961,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "_EERDAeZWcK4"
},
"outputs": [],
"source": [
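"# ignore_index=True gives the combined frame a unique index; without it the row labels of both frames repeat.\n",
"result = pd.concat([d1Area, d2Area], ignore_index=True)\n",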
"\n",
"#del d1Area, d2Area"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"executionInfo": {
"elapsed": 1346,
"status": "ok",
"timestamp": 1669202845786,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "KIRm1WzEXRTp",
"outputId": "265f1567-ffee-40ca-f54e-214f614ac3fc"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Label | \n",
" Area | \n",
" Feret | \n",
" AR | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" MM | \n",
" 2541 | \n",
" 60.67125 | \n",
" 1.06228 | \n",
"
\n",
" \n",
" 1 | \n",
" MM | \n",
" 2419 | \n",
" 59.77458 | \n",
" 1.10986 | \n",
"
\n",
" \n",
" 2 | \n",
" MM | \n",
" 1855 | \n",
" 50.47772 | \n",
" 1.03663 | \n",
"
\n",
" \n",
" 3 | \n",
" MM | \n",
" 2596 | \n",
" 60.44005 | \n",
" 1.05496 | \n",
"
\n",
" \n",
" 4 | \n",
" MM | \n",
" 1409 | \n",
" 46.23851 | \n",
" 1.15185 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Label Area Feret AR\n",
"0 MM 2541 60.67125 1.06228\n",
"1 MM 2419 59.77458 1.10986\n",
"2 MM 1855 50.47772 1.03663\n",
"3 MM 2596 60.44005 1.05496\n",
"4 MM 1409 46.23851 1.15185"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.head()\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 382
},
"executionInfo": {
"elapsed": 155310,
"status": "ok",
"timestamp": 1669203452247,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "9bbf4oBiQSZy",
"outputId": "decdfcfe-b3e5-4f14-cad8-21d240a97d72"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\schatzm\\Anaconda3\\envs\\julab\\lib\\site-packages\\seaborn\\distributions.py:254: FutureWarning: In a future version, `df.iloc[:, i] = newvals` will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either `df[df.columns[i]] = newvals` or, if columns are non-unique, `df.isetitem(i, newvals)`\n",
" baselines.iloc[:, cols] = (curves\n"
]
}
],
"source": [
"sns.set_theme(style=\"ticks\", palette=\"pastel\") \n",
"sns.displot(result[result.index.duplicated()], x=\"Area\", hue=\"Label\" , multiple=\"stack\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 528
},
"executionInfo": {
"elapsed": 2720,
"status": "ok",
"timestamp": 1669204627122,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "_DUixvzaPjmN",
"outputId": "78b48092-8a17-4c98-8536-c75bb1f97044"
},
"outputs": [],
"source": [
"sns.set_theme(style=\"ticks\", palette=\"pastel\")\n",
"fig = matplotlib.pyplot.gcf()\n",
"fig.set_size_inches(12, 8)\n",
"\n",
"# Load the example tips dataset\n",
"tips = sns.load_dataset(\"tips\")\n",
"\n",
"# Draw a nested boxplot to show bills by day and time\n",
"ax = sns.boxplot(x=\"Area\", y=\"Label\",\n",
" hue=\"Label\", palette=[\"m\", \"g\"],\n",
" data=result, \n",
" showfliers = False) #get rid of outliers\n",
" # data=result[np.mod(np.arange(result.index.size),3)!=0])\n",
"ax.set(xlabel='Area', ylabel='Label', title=\"Milk Area\")\n",
"\n",
"# Improve the legend\n",
"sns.move_legend(\n",
" ax, loc=\"lower right\", ncol=3, frameon=True, columnspacing=1, handletextpad=0\n",
")\n",
"\n",
"sns.despine(offset=10, trim=True)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 543,
"status": "ok",
"timestamp": 1669204789432,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "B6CBSzEPs0dC",
"outputId": "873039ea-0634-46e0-9742-4cf4b38d9ca9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p = 0\n",
"null hypothesis: Data1 (MM) comes from a normal distribution\n",
"The null hypothesis can be rejected\n",
"p = 0\n",
"null hypothesis: Data2 (Aptamil) from a normal distribution\n",
"The null hypothesis can be rejected\n"
]
}
],
"source": [
"from scipy.stats import normaltest\n",
"\n",
"k2, p = normaltest(d1Area['Area'])\n",
"alpha = 1e-3\n",
"print(\"p = {:g}\".format(p))\n",
"print('null hypothesis: Data1 (MM) comes from a normal distribution')\n",
"if p < alpha: # null hypothesis: Data1 (MM) comes from a normal distribution\n",
" print(\"The null hypothesis can be rejected\")\n",
"else:\n",
" print(\"The null hypothesis cannot be rejected\")\n",
"\n",
"k2, p = normaltest(d2Area['Area'])\n",
"alpha = 1e-3\n",
"print(\"p = {:g}\".format(p))\n",
"print('null hypothesis: Data2 (Aptamil) from a normal distribution')\n",
"if p < alpha: # null hypothesis: Data2 (Aptamil) from a normal distribution\n",
" print(\"The null hypothesis can be rejected\")\n",
"else:\n",
" print(\"The null hypothesis cannot be rejected\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e43whHStuq_M"
},
"source": [
"Selecting non parametric test, and testing:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 9,
"status": "ok",
"timestamp": 1669204793360,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "YZwHKnkHgGXb",
"outputId": "6ac7fc9d-2755-436e-eec8-5702fa55db17"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"null hypothesis: data sets are from the same distribution\n",
"Statistics=6272.500, p=0.0018810635846991\n",
"Different distribution (reject H0)\n"
]
}
],
"source": [
"# Mann-Whitney U test\n",
"from numpy.random import seed\n",
"from numpy.random import randn\n",
"from scipy.stats import mannwhitneyu\n",
"# import random \n",
"from random import sample \n",
"data1=d1Area['Area'].sample(n=100, random_state=1)\n",
"data2=d2Area['Area'].sample(n=100, random_state=1)\n",
"print('null hypothesis: data sets are from the same distribution')\n",
"# compare samples\n",
"stat, p = mannwhitneyu(data1, data2)\n",
"print('Statistics=%.3f, p=%.16f' % (stat, p))\n",
"# interpret\n",
"alpha = 0.05\n",
"if p > alpha:\n",
"\tprint('Same distribution (fail to reject H0)')\n",
"else:\n",
"\tprint('Different distribution (reject H0)')"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 5,
"status": "ok",
"timestamp": 1669204842639,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "L_Djt50nXEA6",
"outputId": "bef1b030-7b13-40fc-d673-729de4eac569"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.0018810635846990642]\n"
]
}
],
"source": [
"# pvalues with scipy:\n",
"stat_results = [\n",
" mannwhitneyu(data1, data2, alternative=\"two-sided\"),\n",
" # mannwhitneyu(flight, sound, alternative=\"two-sided\"),\n",
" # mannwhitneyu(robots, sound, alternative=\"two-sided\")\n",
"]\n",
"\n",
"pvalues = [result.pvalue for result in stat_results]\n",
"print(pvalues)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 443
},
"executionInfo": {
"elapsed": 2054,
"status": "ok",
"timestamp": 1669204996810,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "mJGj5nvZXKpc",
"outputId": "000ce20a-fd2f-4050-95e0-9365835c4604"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"with sns.plotting_context(\"notebook\", font_scale=1.4):\n",
" # Create new plot\n",
" fig, ax = plt.subplots(1, 1, figsize=(12, 6))\n",
"\n",
" sns.boxplot(ax=ax, data=result, x='Label', y='Area', \n",
" showfliers = False,\n",
" # palette=subcat_palette,\n",
" # order=subcat_order\n",
" )\n",
" plt.title(\"Aptamil vs MM\", y=1.06)\n",
" # ax.set_ylabel(\"Goal ($)\")\n",
" # ax.set_xlabel(\"Project State\", labelpad=20)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"executionInfo": {
"elapsed": 4,
"status": "ok",
"timestamp": 1669205066379,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "dvu-ustiX1as"
},
"outputs": [],
"source": [
"from statannotations.Annotator import Annotator\n",
"\n",
"subcat_palette = sns.dark_palette(\"#8BF\", reverse=True, n_colors=5)\n",
"states_palette = sns.color_palette(\"YlGnBu\", n_colors=5)\n",
"\n",
"states_order = [\"Successful\", \"Failed\", \"Live\", \"Suspended\", \"Canceled\"]\n",
"subcat_order = ['MM', 'Aptamil']"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 563
},
"executionInfo": {
"elapsed": 3614,
"status": "ok",
"timestamp": 1669205215720,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "d9PHpDQaYAeP",
"outputId": "25342e3c-7501-43c2-e237-099b2d2ceaa1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p-value annotation legend:\n",
" ns: p <= 1.00e+00\n",
" *: 1.00e-02 < p <= 5.00e-02\n",
" **: 1.00e-03 < p <= 1.00e-02\n",
" ***: 1.00e-04 < p <= 1.00e-03\n",
" ****: p <= 1.00e-04\n",
"\n",
"MM vs. Aptamil: p=1.88e-03\n"
]
}
],
"source": [
"# Putting the parameters in a dictionary avoids code duplication\n",
"# since we use the same for `sns.boxplot` and `Annotator` calls\n",
"plotting_parameters = {\n",
" 'data':result, \n",
" 'x':'Label', \n",
" 'y':'Area', \n",
" 'showfliers': False,\n",
"}\n",
"\n",
"pairs = [('MM', 'Aptamil'), # 'Robots' vs 'Flight'\n",
" # ('Flight', 'Sound'), # 'Flight' vs 'Sound'\n",
" # ('Robots', 'Sound') # 'Robots' vs 'Sound'\n",
" ]\n",
"\n",
"formatted_pvalues = [f\"p={p:.2e}\" for p in pvalues]\n",
"\n",
"with sns.plotting_context('notebook', font_scale=1.4):\n",
" # Create new plot\n",
" fig, ax = plt.subplots(1, 1, figsize=(12, 6))\n",
"\n",
" # Plot with seaborn\n",
" sns.boxplot(**plotting_parameters)\n",
"\n",
" # Add annotations\n",
" annotator = Annotator(ax, pairs, **plotting_parameters)\n",
" annotator.set_custom_annotations(formatted_pvalues)\n",
" annotator.annotate()\n",
"\n",
" # Label and show\n",
" plt.title(\"Aptamil vs MM\", y=1.06)\n",
" \n",
" plt.savefig(\"./plot1A.png\", bbox_inches='tight')\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 563
},
"executionInfo": {
"elapsed": 4407,
"status": "ok",
"timestamp": 1669205268773,
"user": {
"displayName": "Martin Schätz",
"userId": "10828352848441153145"
},
"user_tz": -60
},
"id": "c8NTEfbdYorS",
"outputId": "65ee2730-80de-4f6c-c694-2fdb3382c79d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p-value annotation legend:\n",
" ns: p <= 1.00e+00\n",
" *: 1.00e-02 < p <= 5.00e-02\n",
" **: 1.00e-03 < p <= 1.00e-02\n",
" ***: 1.00e-04 < p <= 1.00e-03\n",
" ****: p <= 1.00e-04\n",
"\n",
"MM vs. Aptamil: Custom statistical test, P_val:1.881e-03\n"
]
}
],
"source": [
"with sns.plotting_context(\"notebook\", font_scale=1.4):\n",
" # Create new plot\n",
" fig, ax = plt.subplots(1, 1, figsize=(12, 6))\n",
"\n",
" # Plot with seaborn\n",
" sns.boxplot(ax=ax, **plotting_parameters)\n",
"\n",
" # Add annotations\n",
" annotator = Annotator(ax, pairs, **plotting_parameters)\n",
" annotator.set_pvalues(pvalues)\n",
" annotator.annotate()\n",
"\n",
" # Label and show\n",
" plt.title(\"Aptamil vs MM\", y=1.06)\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last updated: 2023-01-05T13:50:32.141879+01:00\n",
"\n",
"Python implementation: CPython\n",
"Python version : 3.9.15\n",
"IPython version : 8.8.0\n",
"\n",
"Compiler : MSC v.1929 64 bit (AMD64)\n",
"OS : Windows\n",
"Release : 10\n",
"Machine : AMD64\n",
"Processor : Intel64 Family 6 Model 85 Stepping 7, GenuineIntel\n",
"CPU cores : 40\n",
"Architecture: 64bit\n",
"\n",
"watermark : 2.3.1\n",
"numpy : 1.23.5\n",
"scipy : 1.10.0\n",
"pandas : 1.5.2\n",
"matplotlib : 3.6.2\n",
"bokeh : 3.0.3\n",
"statannotations: 0.5.0\n",
"\n"
]
}
],
"source": [
"from watermark import watermark\n",
"watermark(iversions=True, globals_=globals())\n",
"print(watermark())\n",
"print(watermark(packages=\"watermark,numpy,scipy,pandas,matplotlib,bokeh,statannotations\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.15"
}
},
"nbformat": 4,
"nbformat_minor": 4
}