Prerequisites and co-requisites |
Advanced Topics in Big Data:
- Introduction to Data Mining Algorithms / Methods and the tool WEKA
- relational databases and SQL
- basic concepts of statistics.
Semantic Web:
- Programming experience
- Mathematical logic
Security:
- Fundamentals of applied cryptology: symmetric vs. asymmetric encryption; hashing; signatures; certificates, etc.
- basic skills in handling the Linux console.
- basic knowledge of and skills in Kerberos
- basic skills in handling a Windows system
- basic knowledge of computer networking
- basic knowledge of security protocols (e.g. TLS)
|
Course content |
Part: Advanced Topics in Big Data
- Big data analytics (e.g. analysis, (b) observation problems, outlier detection)
- Spark ecosystem for data analysis tasks
- Introduction to the differences and similarities between Hadoop and Spark, as well as possibilities for integrating both platforms.
- First experiences in the application of Hadoop and Spark for data analysis tasks.
- Requirements for the documentation of big data projects and communication of the project results.
Part: Semantics and Ontologies
- Restrictions of the Web
- Layers of semantic Web
- Semantic modeling and ontologies
- Knowledge bases
- Linked open data
- Representation of facts using the Resource Description Framework (RDF)
- Different notations in RDF and the representation of statements
- The model theory behind RDFS
- Problematic constructs in RDF
- Modeling semantics using RDF schema
- Logic calculi as the basis for semantic deduction
- Description-logic and decision-making
- The Web Ontology Language (OWL) and its features
- Semantic web engineering
Part: Security
- Motivation
- Infrastructure security: basic "hygiene"
- authentication and authorization
- data protection
- auditing
- identity and access management
- Mandatory Access Control (MAC), Role Based Access Control
- Insecure data sources: problems and solutions, e.g. functional encryption
- attribute-based encryption
- Cloud security
- Information security analytics: IDS, IPS, SIEM, aggregated logs
- Security of Hadoop (Sentry, Kerberos, Knox, Ranger...)
|
Learning outcomes |
Students are able to
- carry out different analysis tasks on structured / unstructured data and document the results.
- assess big data processing platforms with regard to security aspects and implement solutions for security problems.
- scale data horizontally (merge data from different data sources, domains, data models, and structures).
- describe analytical methods for different problems, assess their advantages and disadvantages, select a method based on performance criteria, and implement it using the appropriate tools (Spark, MLlib, MADlib, Weka).
- explain the concepts of semantic web and interpret statements with RDF, RDFS, OWL.
- create and expand knowledge databases.
- explain the importance of semantics in the context of big data projects, especially when combining data from different data sources, domains, data models, and structures.
- exchange data between different knowledge databases.
- explain the principles of knowledge databases and linked open data, and explain the Open World Approach.
- query knowledge databases and interact with them using a framework.
- classify the principles of OWL ontologies, the different types of relationships and the implications of the first-order predicate logic used.
- use Semantic Web tools, e.g. Protégé.
- classify the importance of the security of the underlying infrastructure for data analysis and apply technologies for the improvement of big data systems.
- select and implement a method for authentication and authorization from multiple methods.
- explain the problems of insecure data sources, and select and propose solutions based on cryptographic techniques, e.g. functional encryption.
- reflect on the problems of using public clouds and propose solutions.
- use the potential of data analytics to improve the security of complex networked systems and to analyze heterogeneous data from distributed sensors (e.g. IDS, SIEM).
|
Planned learning activities and teaching methods |
- Advanced Topics in Big Data: integrated lecture
- Semantics and Ontologies: integrated lecture
- Security: lecture and labs
|
Assessment methods and criteria |
Continual assessment.
- Advanced Topics in Big Data: assessment of exercises
- Semantics and Ontologies: written exam, assessment of exercises and project results
- Security: written exam; mandatory attendance for exercises
|
Comment |
Security: mandatory attendance for exercises |
Recommended or required reading |
- Allemang, Dean ; Hendler, James A. (2011): Semantic Web for the working ontologist: effective modeling in RDFS and OWL. Second edition. Waltham, Mass.: Morgan Kaufmann.
- Antoniou, G ; Groth, Paul ; Van Harmelen, Frank (2012): A Semantic Web primer, third edition. Cambridge, Mass.: MIT Press.
- Dietrich, David et al. (eds.) (2015): Data science & big data analytics: discovering, analyzing, visualizing and presenting data. Indianapolis, IN: Wiley.
- DuCharme, Bob (2013): Learning SPARQL.
- Fensel, Dieter (2003): Spinning the semantic Web: bringing the World Wide Web to its full potential. Cambridge, Mass.: MIT Press.
- Hebeler, John (2009): Semantic Web programming. Indianapolis, Ind.: Wiley Pub.
- Hitzler, Pascal (2008): Semantic Web: Grundlagen. Berlin: Springer.
- Hitzler, Pascal ; Krötzsch, Markus ; Rudolph, Sebastian (2010): Foundations of Semantic Web technologies. Boca Raton: CRC Press.
- Karau, Holden et al. (2015): Learning Spark. First edition. Beijing; Sebastopol: O’Reilly.
- Pellegrini, Tassilo ; Blumauer, Andreas (eds.) (2006): Semantic Web: die vernetzte Wissensgesellschaft. Berlin: Springer.
- Ryza, Sandy et al. (2015): Advanced analytics with Spark. First edition. Beijing; Sebastopol, CA: O’Reilly.
- Segaran, Toby ; Taylor, Jamie ; Evans, Colin (2009): Programming the Semantic Web. Beijing; Sebastopol, CA: O’Reilly.
- Spivey, Ben ; Echeverria, Joey (2015): Hadoop security. First edition. Beijing; Sebastopol: O’Reilly.
- Talabis, Mark (2014): Information security analytics: finding security insights, patterns and anomalies in big data. 1st edition. Waltham, MA: Elsevier.
- White, Tom (2015): Hadoop: the definitive guide. Fourth edition. Beijing: O’Reilly.
- Witten, I. H. ; Frank, Eibe ; Hall, Mark A. (2011): Data mining: practical machine learning tools and techniques. 3rd ed. Burlington, MA: Morgan Kaufmann. (= Morgan Kaufmann series in data management systems).
- Yu, Liyang (2011): A developer’s guide to the semantic web. Heidelberg; New York: Springer.
|
Mode of delivery (face-to-face, distance learning) |
Face-to-face |