00:35:35 Pedro Ramos: hi
00:36:00 Martin Misson: heya
01:08:26 Marko Prelevikj: enlarge the font please
01:08:54 Marta Castro: [campus52@viz Apache-Spark-Workshop]$ spark-workshop-env/bin/activate → bash: spark-workshop-env/bin/activate: Permission denied
01:09:12 Marko Prelevikj: you need to source it
01:09:32 Marko Prelevikj: source ./spark-workshop-env/bin/activate
01:09:39 Marta Castro: Yes, got it, thanks
01:10:16 Milena Veneva: Can you please enlarge the font?
01:12:04 Milena Veneva: bin
01:27:42 Emre Perihan: py4j.protocol.Py4JNetworkError: Answer from Java side is empty
01:29:54 Bianca De Saedeleer: How do you stay in the function? After the second line (data = line.split(',')), I get IndentationError: expected an indented block
01:30:19 Betzabeth Leon: dd2.take(2) → Traceback (most recent call last): File "<stdin>", line 1, in <module>
01:30:35 Betzabeth Leon: rdd2.take(2) → Traceback (most recent call last): File "<stdin>", line 1, in <module>
01:31:05 Marko Prelevikj: you have to put at least 2 spaces at the beginning of each line
01:32:26 Bianca De Saedeleer: thank you!
01:37:12 Marko Prelevikj: @Emre were you able to connect?
01:37:32 Marko Prelevikj: @Betzabeth it looks like a wrong path to the text file?
01:38:19 Milena Veneva: Can you please explain what the organize function does once again? I had a connection issue and missed it.
01:39:15 Milena Veneva: Thank you!
01:53:34 Leon Bogdanovic: I keep receiving this error: ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:33193)
01:54:27 Emre Perihan: @Marko unfortunately not
01:54:29 Emre Perihan: py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
01:55:11 Milena Veneva: What does "with schema" mean?
01:55:16 Marko Prelevikj: does this occur while creating a SparkContext object?
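The IndentationError Bianca hit at 01:29:54, and Marko's fix at 01:31:05, come down to Python requiring every line of a function body to be indented. A minimal sketch, assuming a parsing function like the workshop's `organize` (the function name, separator, and sample line are illustrative guesses, not the workshop's actual code):

```python
# Every line inside the function body must share the same indentation
# (at least one space; 4 spaces is the Python convention).
def organize(line):
    data = line.split(',')          # indented: part of the function body
    return (data[0], data[1:])      # same indentation level

# In the interactive shell, a blank line ends the block and
# returns you to the top-level prompt.
print(organize("station1,12.5,0.3"))  # ('station1', ['12.5', '0.3'])
```
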
01:55:24 Emre Perihan: exactly
01:55:44 Emre Perihan: sc = pyspark.SparkContext(appName="SparkWorkshop", master="local[1]")
01:56:00 Leon Bogdanovic: For me, too.
01:56:10 Marko Prelevikj: can you try like this? sc = pyspark.SparkContext.getOrCreate(appName="SparkWorkshop", master="local[1]")
01:56:24 Marko Prelevikj: I think there are too many users at the moment and it cannot allocate resources for you
01:56:35 Leon Bogdanovic: I've tried it on two accounts. The same error occurs.
01:57:01 Emre Perihan: TypeError: getOrCreate() got an unexpected keyword argument 'appName'
01:57:20 Marko Prelevikj: @Milena it means that we know the structure/schema of what the data looks like
01:57:44 Milena Veneva: Thanks! :)
01:58:17 Gianluca De Moro: I'm not connected to the cluster, so maybe I'm wrong, but did you load the module? module load Spark/2.4.0-Hadoop-2.7-Java-1.8
01:58:32 Zacarias Benta: yes
01:58:41 Zacarias Benta: I loaded it like that
01:58:51 Emre Perihan: @Gianluca Yes, I did
01:59:49 Marko Prelevikj: there is an example of how to use getOrCreate in Lab-1.ipynb, could you take a look at it?
02:00:03 Marko Prelevikj: we are going to go through it in the hands-on session after the break
02:01:48 Emre Perihan: how can I access the file?
02:02:09 Marko Prelevikj: it's in the notebooks folder of the repo you cloned
02:02:36 Marko Prelevikj: you have to first run jupyter notebook in the console
02:02:48 Emre Perihan: oh ok, I downloaded it last night; now it's updated
02:04:26 Emre Perihan: @Marko still getting the same error
02:04:28 Emre Perihan: py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:36111)
02:05:02 Emre Perihan: from pyspark import SparkContext, SparkConf
02:05:08 Emre Perihan: conf = SparkConf().setAppName("Spark Intro").setMaster("local[1]")
02:05:13 Emre Perihan: typed these
02:08:02 Marko Prelevikj: will look into it during the break
02:08:35 Marta Castro: Is there any command to know which localdomain my user campus52 is using?
02:09:23 Milena Veneva: Break?
02:09:48 Marko Prelevikj: it should state the port after creating a SparkContext instance in the logs
02:18:45 Milena Veneva: Can we have a break? So our eyes can relax a bit, maybe?
02:19:26 Zacarias Benta: ok
02:19:28 Zacarias Benta: perfect
02:19:29 Marta Castro: Yes
02:19:38 Patricio Puchaicela: ok
02:20:00 Betzabeth Leon: ok
02:49:26 Marta Castro: I got the same partitions as the professor; why is Spark creating the same partitions?
02:51:01 Slavko Zitnik: Partitioning should not be random: if one node goes down, only one partition should be recalculated, and we must know which data was in it.
02:53:07 Slavko Zitnik: hint: reduce() is an action, while reduceByKey() is a transformation
03:08:30 Zacarias Benta: Where id=weather1
03:10:39 Bianca De Saedeleer: Inner join
03:10:51 Marta Castro: Inner?
03:11:07 Zacarias Benta: location
03:17:56 Milena Veneva: Share the screen.
03:19:26 Marta Castro: Yes
03:19:26 Milena Veneva: No.
03:19:30 Miguel Viana: yes
03:19:33 Bianca De Saedeleer: yes
03:19:33 Pedro Ramos: I do
03:19:33 Patricio Puchaicela: yes
03:19:37 Matthieu Salamone: yes
03:34:52 Zacarias Benta: Thanks, it was a great presentation.