python 3.x - Pandas DataFrames- value counts from many columns and survey multiple answers -
i'm struggling 2 issues:
i have exemplary dataset cars. 20 respondents asked favorite cars. enumerate maximally 5 cars (columns "answer 1"- "answer 5". how can number of mentions of each car?
for cars mentioned in columns
answer 1
-answer 2
each respondent noted 3 advantages of each car (for example columnsadv car 1_1
,adv car 1_2
,adv car 1_3
related first mentioned car,adv car 2_1
,adv car 2_2
,adv car 2_3
related second car etc.).
how can show how many particular advantages related each car mentioned?
need informations this:ferrari
-engine
3 times,color
- 5 times,price
- 3 times etc,audi
-engine
- 4 times,color
- 3 times,price
- 2 times etc.
the second issue related spss , r variables cases, can't translate python. in case related melting dataframes, attempts weren't effective.
i'll thankful help. csv file available download or below:
no;answer 1;answer 2;answer 3;answer 4;answer 5;adv car 1_1;adv car 1_2;adv car 1_3;adv car 2_1;adv car 2_2;adv car 2_3;adv car 3_1;adv car 3_2;adv car 3_3;adv car 4_1;adv car 4_2;adv car 4_3;adv car 5_1;adv car 5_2;adv car 5_3 1;ferrari;volkswagen;audi;renault;toyota;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;color;longevity;brand 2;volkswagen;toyota;audi;;;brand;color;price;engine;color;longevity;price;color;price;;;;;; 3;toyota;renault;;;;manufacturer;brand;engine;color;price;brand;;;;;;;;; 4;audi;;;;;engine;color;manufacturer;;;;;;;;;;;; 5;volkswagen;ferrari;renault;;;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;color;;;;;; 6;renault;toyota;;;;brand;engine;color;price;brand;manufacturer;;;;;;;;; 7;audi;renault;ferrari;toyota;;brand;color;price;engine;color;longevity;price;color;price;longevity;brand;manufacturer;;; 8;toyota;;;;;engine;color;longevity;;;;;;;;;;;; 9;audi;ferrari;volkswagen;;;engine;color;longevity;price;color;price;longevity;brand;manufacturer;;;;;; 10;renault;toyota;;;;color;longevity;price;color;price;;;;;;;;;; 11;ferrari;;;;;longevity;price;color;;;;;;;;;;;; 12;toyota;volkswagen;audi;;;engine;color;manufacturer;engine;color;longevity;price;color;price;;;;;; 13;audi;volkswagen;renault;;;manufacturer;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;;;;;; 14;volkswagen;;;;;color;longevity;price;;;;;;;;;;;; 15;toyota;volkswagen;;;;brand;color;price;engine;color;longevity;;;;;;;;; 16;ferrari;renault;volkswagen;;;price;engine;color;longevity;price;color;price;longevity;brand;;;;;; 17;audi;toyota;renault;;;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;color;;;;;; 18;toyota;audi;volkswagen;ferrari;audi;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;color;longevity;brand;manufacturer;color;longevity;brand 19;ferrari;;;;;engine;;;;;;;;;;;;;; 20;volkswagen;;;;;brand;;;;;;;;;;;;;; ;;;;;;;;;;;;;;;;;;;; ;;;;data2 = ['color','engine','price','manufacturer','brand','longevity',''];;;;;;;;;;;;;;;;
df = pd.read_csv('cars.csv',sep=';') df_car = df.melt(id_vars='no',value_vars=['answer 1','answer 2','answer 3','answer 4','answer 5'], value_name='cars').drop('variable',axis=1) df_advantages = df.melt(id_vars='no',value_vars=['adv car 1_1', 'adv car 1_2', 'adv car 1_3', 'adv car 2_1', 'adv car 2_2', 'adv car 2_3', 'adv car 3_1', 'adv car 3_2', 'adv car 3_3', 'adv car 4_1', 'adv car 4_2', 'adv car 4_3', 'adv car 5_1', 'adv car 5_2', 'adv car 5_3'], value_name='advantages').drop('variable',axis=1) df_car.merge(df_advantages, on='no').groupby(['cars','advantages']).count()
output:
no cars advantages audi brand 20 color 29 engine 6 longevity 22 manufacturer 18 price 10 ferrari brand 14 color 19 engine 4 longevity 17 manufacturer 10 price 9 renault brand 17 color 21 engine 4 longevity 15 manufacturer 13 price 10 toyota brand 18 color 27 engine 8 longevity 17 manufacturer 12 price 13 volkswagen brand 17 color 26 engine 6 longevity 20 manufacturer 13 price 12
Comments
Post a Comment