Sunday, July 28, 2019

The fascinating journey of 28-year-old Maria from Intel to Google, Microsoft, Salesforce, and Netflix!

In her early twenties she found the solution to an unsolved problem in Intel processors
When your passion and your specialty are Artificial Intelligence algorithms, and when you can solve an unsolved problem in Intel processors, then you can work with Google, Microsoft, Salesforce, and Netflix.
That is the case of Maria Dimakopoulou.
In 2009 she entered the School of Electrical and Computer Engineering of the National Technical University of Athens (NTUA) ranked first. From 2012 to 2015, alongside her studies, she worked at Google in Paris and in New York, in the Operations Research and Ad Exchange teams respectively.
In 2013 she was awarded for her solution to an unsolved problem in Intel processors, which improved thousands of computers. In 2014 she completed her studies with a perfect grade of 10, a first in NTUA's history. In 2015 she developed algorithms that significantly increased the revenue of Google's Ad Exchange. That same year she was admitted to the PhD program at Stanford in Reinforcement Learning, with a full scholarship from the university. She completed her PhD in 2018, in 3.5 years, which is rare by American standards.
During her PhD she worked at Salesforce and Microsoft. Her dissertation proposed a cooperative method that enables teams of robots to achieve goals of unprecedented complexity. Since 2018 she has been a Senior Research Scientist on Netflix's research team, responsible for the Reinforcement Learning and Causal Inference algorithms behind the company's Personalized Recommendations to its users. She has been honored with major awards over her career, including the "Intel Innovation Award" and the "Google Anita Borg Memorial Award" in 2014, and the "Stanford Outstanding Academic Achievement Award" in 2016 and 2019.
Methodology: From the 100 initial nominations that Forbes received, either through applications by the candidates themselves or through proposals from institutions and relevant bodies, the editorial team made a first selection and arrived at 50. The jury, consisting of Spyros Theodoropoulos (CEO of Chipita), Nikos Karamouzis (former chairman of Eurobank and of the Hellenic Bank Association, chairman of Grant Thornton), Michail Bletsas (Director of Computing at the MIT Media Lab), and Theocharis Philippopoulos (chairman of Capital.gr and of Attica Publications), scored the 50 nominations on a scale from 1 (lowest) to 4 (highest), from which the "30" emerged. The order of publication is alphabetical, grouped by score.

Maria Dimakopoulou's page at Stanford
Since 2018, I have been a Senior Research Scientist at Netflix, and my research focuses on the reinforcement learning, contextual bandit, and causal inference methods behind Netflix's personalized recommendations. Previously, I obtained my PhD in reinforcement learning and causal inference at Stanford University, where I was advised by Benjamin Van Roy and Susan Athey and received the Stanford Outstanding Academic Achievement Award.
In 2015, I received an MSc in Operations Research from Stanford University, graduating first in my class. In 2014, I received a BSc and an MSc in Electrical Engineering & Computer Science from the National Technical University of Athens (NTUA), Greece, graduating with the highest GPA in NTUA's 200-year history. Between 2012 and 2015, I spent time at Google Research, where I worked on the design and deployment of large-scale optimization algorithms for Google Technical Infrastructure and Google Ad Exchange. In 2016, I led the design and launch of the multi-touch attribution product of Krux (now Salesforce Einstein). In 2018, I joined the Machine Learning group at Microsoft Research NYC, where I worked with Miroslav Dudik and Robert Schapire on reinforcement learning decomposition and off-policy evaluation for high-dimensional contextual bandits.
I have been honored with the Google Anita Borg Memorial Award, the Google Excellence Award, the Intel Honorary Award, the Arvanitidis Stanford Graduate Fellowship in Memory of William K. Linvill, the Onassis Foundation Graduate Fellowship, the Stanford Outstanding Academic Achievement Award (twice), and inclusion in Forbes 30 Under 30 Greece.
I enjoy swimming, travelling across the world with great company, and exploring impressionist and surrealist art.

Research

Marginal Posterior Sampling for Slate Bandits

Maria Dimakopoulou, Nikos Vlassis, Tony Jebara (IJCAI 2019)
We introduce a new Thompson sampling-based algorithm for online slate bandits, called marginal posterior sampling, which is characterized by three key ideas. First, it postulates that the slate-level reward is a monotone function of the marginal unobserved rewards of the actions in the slate's slots, and it does not attempt to estimate this function. Second, instead of maintaining a slate-level reward posterior, it maintains posterior distributions for the marginal reward of each slot's actions. Third, it optimizes at the slot level rather than the slate level, which makes it computationally efficient. We demonstrate substantial advantages of marginal posterior sampling over alternative approaches that are widely used in the domain of web services.
[Paper]
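The second and third ideas above can be pictured with a minimal sketch, not the paper's implementation: Beta-Bernoulli posteriors per slot-action marginal, sampled and optimized slot by slot. For simplicity the simulation assumes observable per-slot feedback, and all probabilities are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n_slots, n_actions, horizon = 3, 5, 2000
# Hypothetical ground truth: marginal success probability per (slot, action).
p_true = rng.uniform(0.1, 0.9, size=(n_slots, n_actions))

# Beta(1, 1) posterior per slot-action marginal reward.
alpha = np.ones((n_slots, n_actions))
beta = np.ones((n_slots, n_actions))

for t in range(horizon):
    # Sample every marginal posterior, then optimize per slot, not per slate.
    theta = rng.beta(alpha, beta)
    slate = theta.argmax(axis=1)                      # one action per slot
    # Simplified feedback: each slot's action succeeds independently.
    slot_rewards = rng.random(n_slots) < p_true[np.arange(n_slots), slate]
    # Update only the chosen slot-action marginals.
    alpha[np.arange(n_slots), slate] += slot_rewards
    beta[np.arange(n_slots), slate] += 1 - slot_rewards

print("chosen per slot:", slate, "best per slot:", p_true.argmax(axis=1))
```

Note the per-round work is linear in slots times actions, rather than exponential in the number of slots as a slate-level posterior would require.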

On the Design of Estimators for Bandit Off-Policy Evaluation

Nikos Vlassis, Aurelien Bibaut, Maria Dimakopoulou, Tony Jebara (ICML 2019)
Off-policy evaluation is the problem of estimating the value of a target policy using data collected under a different policy. We describe a framework for designing estimators for bandit off-policy evaluation. Given a base estimator and a parametrized class of control variates, we seek a control variate in that class that reduces the risk of the base estimator. We derive the population risk as a function of the class parameters and we discuss some approaches for optimizing this function. We present our main results in the context of multi-armed bandits, and we describe a simple design for contextual bandits that gives rise to an estimator shown to perform well on multi-class cost-sensitive classification datasets.
[Paper]
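The control-variate idea can be sketched for a multi-armed bandit as follows. The policies and rewards are made up, and the particular variate, w - 1, whose mean under the logging policy is exactly zero, is one simple member of the kind of class the paper considers, not the estimators designed there.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 10_000, 4
# Illustrative logging policy mu (uniform) and target policy pi.
mu = np.full(k, 1.0 / k)
pi = np.array([0.1, 0.2, 0.3, 0.4])
r_mean = np.array([0.2, 0.5, 0.3, 0.8])   # true mean reward per arm

actions = rng.choice(k, size=n, p=mu)
rewards = (rng.random(n) < r_mean[actions]).astype(float)
w = pi[actions] / mu[actions]             # importance weights

# Base estimator: inverse propensity scoring (IPS).
ips = np.mean(w * rewards)

# Control variate: E_mu[w] = 1 exactly, so (w - 1) has known mean zero.
# Subtracting b * (w - 1) preserves unbiasedness for any fixed b; we pick
# the empirically variance-minimizing coefficient.
cv = w - 1.0
b = np.cov(w * rewards, cv)[0, 1] / np.var(cv)
ips_cv = np.mean(w * rewards - b * cv)

true_value = float(pi @ r_mean)
print(ips, ips_cv, true_value)
```

Both estimators target the same value; the control-variate version trades no bias for lower variance whenever the variate correlates with the IPS terms.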

Balanced Linear Contextual Bandits

Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens (AAAI 2019; oral)
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of bias. We prove that our algorithms match the state of the art regret bound guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data.
[Paper][Poster]
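One way to read "integrate balancing methods in their estimation" is inverse-propensity weighting inside an arm's ridge regression: observations the bandit was unlikely to assign to the arm are up-weighted, so the fitted model targets the full context population rather than the skewed subpopulation the policy created. The sketch below uses invented propensities and a linear outcome model, and illustrates the balancing idea rather than the paper's exact algorithms.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical logged data for one arm: contexts, rewards, and the
# probability with which the bandit assigned this arm (its propensity).
n, d = 500, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -0.5, 0.25])
y = X @ theta_true + 0.1 * rng.normal(size=n)
prop = rng.uniform(0.2, 0.9, size=n)          # assignment probabilities

# Balanced estimation: weight each observation by 1 / propensity in a
# ridge regression, reducing the bias induced by adaptive assignment.
W = np.diag(1.0 / prop)
lam = 1.0
theta_hat = np.linalg.solve(X.T @ W @ X + lam * np.eye(d), X.T @ W @ y)
print(theta_hat)
```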

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

Maria Dimakopoulou, Ian Osband, Benjamin Van Roy (NeurIPS 2018)
We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, the approach is competitive with previously proposed tabular model learning methods (Dimakopoulou and Van Roy, 2018). With a higher-dimensional problem and a neural network value function representation, the approach learns quickly with far fewer agents than alternative exploration schemes.

Coordinated Exploration in Concurrent Reinforcement Learning

Maria Dimakopoulou, Benjamin Van Roy (ICML 2018; long talk)
We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment, while sharing data in real-time. We identify three properties that are essential to efficient coordinated exploration: real-time adaptivity to shared observations, commitment to carry through with action sequences that reveal new information, and diversity across learning opportunities pursued by different agents. We demonstrate that optimism-based approaches fall short with respect to diversity, while naive extensions of Thompson sampling lack commitment. We propose seed sampling that offers a general approach to designing effective coordination algorithms for concurrent reinforcement learning and has substantial advantages over alternative exploration schemes.
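A minimal way to picture seed sampling is in a shared multi-armed bandit: each agent draws one fixed "seed" perturbation up front and then acts greedily with respect to the seed-perturbed posterior computed from data shared by all agents. The fixed seed supplies commitment, distinct seeds supply diversity, and the shared statistics supply real-time adaptivity. The Gaussian model and every number below are illustrative, not the paper's general formulation.

```python
import numpy as np

rng = np.random.default_rng(3)

n_agents, n_arms, rounds = 4, 6, 500
mu_true = rng.normal(size=n_arms)

# Shared sufficient statistics, updated in real time by all agents.
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

# Each agent draws a seed once: a fixed perturbation of the prior.
# The seed, rather than fresh per-step randomness, drives exploration.
seeds = rng.normal(size=(n_agents, n_arms))

for t in range(rounds):
    for a in range(n_agents):
        # Seed-perturbed posterior under a N(0, 1) prior and unit noise.
        post_var = 1.0 / (1.0 + counts)
        post_mean = sums / (1.0 + counts)
        arm = np.argmax(post_mean + np.sqrt(post_var) * seeds[a])
        reward = mu_true[arm] + rng.normal()
        counts[arm] += 1
        sums[arm] += reward

print("best arm:", mu_true.argmax(), "most played:", counts.argmax())
```

As data accumulate the posterior variance shrinks, so all seeds collapse toward the same greedy policy, mirroring how exploration diversity fades once the team has learned.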

Estimation Considerations in Contextual Bandits

Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens
We study a new consideration for the exploration vs. exploitation framework which is that the way exploration is conducted in the present may affect the bias and variance in the potential outcome model estimation in subsequent stages of learning. We show that contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We propose new contextual bandit designs, combining parametric and nonparametric statistical estimation methods with causal inference methods in order to reduce the estimation bias and provide empirical evidence that guides the choice among the alternatives in different scenarios.

Market-based dynamic service mode switching in wireless networks

Maria Dimakopoulou, Nicholas Bambos, Martin Valdez-Vivas, John Apostolopoulos (PIMRC 2017)
We consider a virtualized wireless networking architecture, where infrastructure access points of different carriers form a marketplace of resources and bid service deals to a mobile device. At each point in time the mobile evaluates the available service deals and dynamically decides which one to accept and use in the next transmission interval. Its objective is to minimize the long term cumulative service cost and latency cost to transmit packets in its buffer. We develop a model of this architecture, which allows for the formulation and computation of the optimal control for the mobile to accept an offered deal amongst many and switch into the corresponding service mode. The performance of the optimal and low-complexity heuristic controls is probed via simulation.
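The optimal-control formulation can be illustrated with a toy dynamic program: a buffer state, a menu of deals with a per-slot cost and service rate, a latency (holding) cost per queued packet, and value iteration over the discounted cumulative cost. All parameters below are invented for illustration and are not from the paper's model.

```python
import numpy as np

B = 10                     # max buffer size (states 0..B packets)
deals = [                  # (cost per slot, packets served per slot)
    (0.0, 0),              # decline every offered deal
    (1.0, 1),
    (2.5, 3),
]
holding = 0.5              # latency cost per queued packet per slot
arrival = 1                # deterministic packet arrivals per slot
gamma = 0.95               # discount factor

# Value iteration: V[b] is the optimal discounted cost from buffer level b.
V = np.zeros(B + 1)
for _ in range(500):
    V_new = np.empty_like(V)
    for b in range(B + 1):
        best = np.inf
        for cost, rate in deals:
            nb = min(max(b - rate, 0) + arrival, B)
            best = min(best, cost + holding * b + gamma * V[nb])
        V_new[b] = best
    V = V_new

# Extract the greedy policy: which deal to accept at each buffer level.
policy = [int(np.argmin([cost + holding * b
                         + gamma * V[min(max(b - rate, 0) + arrival, B)]
                         for cost, rate in deals]))
          for b in range(B + 1)]
print(policy)
```

The resulting threshold-like structure, declining service when the buffer is short and buying faster service as it fills, is the kind of behavior the optimal control in the paper formalizes.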

Reliable and Efficient Performance Monitoring in Linux

Maria Dimakopoulou, Stéphane Eranian, Nectarios Koziris, Nicholas Bambos (Supercomputing 2016)
We address a published erratum in the Performance Monitoring Unit (PMU) of Intel Sandy Bridge, Ivy Bridge, and Haswell processors with hyper-threading enabled, which causes cross-hyper-thread hardware counter corruption and may produce unreliable results. We propose a cache-coherence-style protocol, which we implement in the Linux kernel, addressing the issue by introducing cross-hyper-thread dynamic event scheduling. Additionally, we improve event-scheduling efficiency by introducing a bipartite graph matching algorithm that consistently finds an optimal assignment of events to hardware counters. The improvements have been contributed to the upstream Linux kernel v4.1.
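The bipartite-matching step can be sketched in a few lines: events on one side, counters on the other, an edge wherever a counter can host an event, and a standard augmenting-path search for a maximum assignment. The constraint sets below are made up for illustration, not Intel's actual counter constraints, and the kernel implementation is in C, not Python.

```python
def max_matching(allowed, n_counters):
    """Maximum bipartite matching of events to counters.

    allowed[e] is the list of counters event e may be scheduled on.
    Returns (number of events placed, counter -> event assignment).
    """
    match = [-1] * n_counters          # counter -> event, or -1 if free

    def try_assign(e, visited):
        # Try each permitted counter; if occupied, attempt to re-route
        # the occupying event along an augmenting path.
        for c in allowed[e]:
            if c in visited:
                continue
            visited.add(c)
            if match[c] == -1 or try_assign(match[c], visited):
                match[c] = e
                return True
        return False

    placed = sum(try_assign(e, set()) for e in range(len(allowed)))
    return placed, match

# Example: 4 events with counter constraints on a 4-counter PMU.
allowed = [[0], [0, 1], [1, 2], [2, 3]]
placed, match = max_matching(allowed, 4)
print(placed, match)   # all 4 events placed: 4 [0, 1, 2, 3]
```

A greedy first-fit scheduler can strand events (placing event 1 on counter 0 would block event 0 here); the augmenting-path search avoids that, which is what makes the assignment consistent across scheduling rounds.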

News

10/9/2018

I defended my PhD thesis!
The committee of my defense was Benjamin Van Roy, Susan Athey, Emma Brunskill, Balaji Prabhakar and Guido Imbens.

9/4/2018

The paper “Scalable Coordinated Exploration in Concurrent Reinforcement Learning” has been accepted to NIPS 2018.

8/26/2018

A new demo has been uploaded showcasing seed sampling with generalization from the paper
“Scalable Coordinated Exploration in Concurrent Reinforcement Learning”.

7/11/2018

The long talk I gave at ICML 2018 on “Coordinated Exploration in Concurrent Reinforcement Learning”. [Slides] [Video]

5/25/2018

The slides from my seminar talk at Netflix can be found here.

3/5/2018

An animated demo has been uploaded showcasing seed sampling from the paper
“Coordinated Exploration in Concurrent Reinforcement Learning”.

2/2/2018

From June to September 2018, I will join the Machine Learning group at Microsoft Research NYC.

Source: https://kourdistoportocali.com/