Timezone: Europe/Bratislava
PRACE Seasonal School on Bioinformatics
from Monday, April 23, 2018 (8:00 AM) to Thursday, April 26, 2018 (6:00 PM)
Monday, April 23, 2018
9:00 AM - 9:30 AM
Opening
Lukáš Demovič (Computing Center of the SAS)
9:30 AM - 10:30 AM
Introduction to bioinformatics 1/2
Erik Bongcam Rudloff (Swedish University of Agricultural Sciences)
10:30 AM - 11:00 AM
Coffee break
11:00 AM - 12:00 PM
Introduction to bioinformatics 2/2
Erik Bongcam Rudloff (Swedish University of Agricultural Sciences)
12:00 PM - 1:30 PM
Lunch
1:30 PM - 3:15 PM
Parallel programming with R (1/2)
Sirkiä Seija (Centre for Scientific Computing - CSC, Finland)
This lecture is aimed at R users with very limited or no experience in parallel computing. You will learn how and when parallel computing can help you run your R scripts in less time, when it cannot, and how to tell the difference. More importantly, you will get an idea of how to approach parallelising your task in practice. We will consider the Intel Math Kernel Library (MKL) together with Microsoft R Open, and the R packages snow and foreach, both of which are used as backends by various Bioconductor and CRAN packages. The lecture will include live coding demos. Prerequisites: experience in using R for data analysis in research.
3:15 PM - 3:45 PM
Coffee break
3:45 PM - 5:30 PM
Parallel programming with R (2/2)
Sirkiä Seija (Centre for Scientific Computing - CSC, Finland)
Tuesday, April 24, 2018
9:00 AM - 10:30 AM
Analysing large datasets with Apache Spark (1/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
As the volume of data used in data analysis tasks grows rapidly, it becomes more and more challenging to process it with standard methods. Enter Spark, a high-performance distributed computing framework that lets us tackle big-data problems by distributing the workload across a cluster of machines. This two-day course discusses the advantages of cloud computing for big-data workloads, why you should use Spark for big-data analysis, and why you should care about running Spark in the cloud. It then covers the technical architecture and use cases of Spark, some ways to set it up, best practices, and programming aspects. The first day includes the overview, architectural concepts, programming with Spark's fundamental data structure (RDD), and the basics of machine learning with Spark. The second day focuses on Spark's SQL module, which allows the user to analyse data in Spark's distributed collections (DataFrames) using traditional SQL queries, and also covers best practices when using Spark, a demo of a working Spark cluster, and Spark Streaming over live Twitter data. Spark can be an ideal platform for bioinformatics when it comes to building analysis pipelines and workflows. Spark supports languages such as R, Python, and SQL, which eases learning for practising bioinformaticians. Spark is constantly gaining new libraries for bioinformatics analysis, although widespread usage will take some time because traditional methods need some rewriting in Spark. With the community constantly evolving, now is a good chance to learn Spark and implement your own methods in it for large-scale data analysis.
10:30 AM - 11:00 AM
Coffee break
11:00 AM - 12:30 PM
Analysing large datasets with Apache Spark (2/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
12:30 PM - 2:00 PM
Lunch
2:00 PM - 3:30 PM
Analysing large datasets with Apache Spark (3/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
3:30 PM - 4:00 PM
Coffee break
4:00 PM - 5:30 PM
Analysing large datasets with Apache Spark (4/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
7:00 PM - 9:30 PM
Wednesday, April 25, 2018
9:00 AM - 10:30 AM
Analysing large datasets with Apache Spark (5/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
10:30 AM - 11:00 AM
Coffee break
11:00 AM - 12:30 PM
Analysing large datasets with Apache Spark (6/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
12:30 PM - 2:00 PM
Lunch
2:00 PM - 3:30 PM
Analysing large datasets with Apache Spark (7/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
3:30 PM - 4:00 PM
Coffee break
4:00 PM - 5:30 PM
Analysing large datasets with Apache Spark (8/8)
Apurva Nandan (Centre for Scientific Computing - CSC, Finland)
Thursday, April 26, 2018
9:00 AM - 10:30 AM
BLAST (1/2)
Kimmo Mattila (Centre for Scientific Computing - CSC, Finland)
Running BLAST in Clusters
NCBI BLAST is one of the most frequently used bioinformatics tools. BLAST answers the question: “What known sequences are significantly similar to my sample sequence?” Answers to this question are needed in numerous bioinformatics analyses and workflows. As the sequence databases keep growing, along with the sizes of the data sets to be analyzed, an HPC cluster environment is often needed for BLAST analyses. In this half-day session we briefly go through the basic features of BLAST and issues related to maintaining and using BLAST in HPC cluster environments.
10:30 AM - 11:00 AM
Coffee break
11:00 AM - 12:40 PM
BLAST (2/2)
Kimmo Mattila (Centre for Scientific Computing - CSC, Finland)
12:40 PM - 1:00 PM
Closing remarks
Lukáš Demovič (Computing Center of the SAS)
1:00 PM - 2:30 PM
Lunch