Microsoft's Andy Beach Talks Machine Learning and Media

Tim Siglin: Welcome back to Streaming Media West 2017. I'm here with my good friend Andy Beach. Tell me what you've been up to lately.

Andy Beach: I'm still working at Microsoft, and one of the areas that I've been really interested in is how we're exploring machine learning and media, so that's what I wanted to give a talk about this year at Streaming Media, to discuss some of the different options.

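Tim Siglin: As it happens, I had Scott Grizzle from IBM Cloud Video on, and, of course, they've done a lot with Watson, trying to do machine learning, deep learning, that sort of thing. Is the focus primarily to do this through cloud and big data combinations, or how does that work?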

Andy Beach: The approach that we've taken is to make machine learning available and accessible to anybody that needs it. If you're a data scientist and you know R, there are lots of ways to train your own models, but there are also ways that, if you're just a developer and you want to include machine learning, we have specific APIs that perform a function from some sort of trained model, and you can implement that API to get the thing you want, whether it's facial recognition or caption transcription, or something like that.

Tim Siglin: So in those cases, you're doing speech-to-text and computer vision as part of the machine learning?

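Andy Beach: Exactly, and then, if you're not a developer but you still want to get at this kind of information, we've even productized it as a media service: the ability to ingest those elements, upload your content, and give back an interactive player that has the facial recognition widgets in it, or a full transcript of all of your audio that flows next to the player as it's playing, and that lets you translate it into other languages on the fly.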

Tim Siglin: What's fascinating to me, having done work with what we used to call index and search and retrieval way back with some of the companies that did that stuff on stand-alone devices, is that essentially what you're doing now is leveraging the power of the cloud, and also the distributed big data tables that you get from doing a lot of the analytics. Do people have a way to score or correct what the audio transcript shows, because we all know they're not perfect?

Andy Beach: Everything that comes out has some sort of score attached, and there are abilities to tune that over time and confirm how correct it is, or you can go in and edit things within your content that need correcting, and it adapts and learns from those corrections.
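
The review loop Andy describes can be pictured with a small sketch. It assumes the score is a per-segment confidence value between 0 and 1; the `Segment` type, the 0.8 threshold, and both helper functions are hypothetical illustrations, not part of any Microsoft API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment with a hypothetical confidence score in [0, 1]."""
    start_seconds: float
    text: str
    confidence: float


def segments_needing_review(segments, threshold=0.8):
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


def apply_correction(segment, corrected_text):
    """Return a corrected copy, marked fully trusted because a human confirmed it."""
    return Segment(segment.start_seconds, corrected_text, 1.0)


# Made-up sample data, for illustration only.
transcript = [
    Segment(0.0, "Welcome back to Streaming Media West", 0.97),
    Segment(3.2, "I'm here with Andy Beach", 0.62),
    Segment(5.9, "Tell me what you've been up to", 0.91),
]

# Only the low-confidence middle segment is flagged for a human to check.
flagged = segments_needing_review(transcript)
```

In a real service, the corrected segments would presumably be fed back so the model adapts over time; this sketch only shows the flag-and-correct step.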

Tim Siglin: Interesting. Are you targeting a particular vertical market? When I worked in Europe on a Framework Package Six project, we had a bunch of guys from Lernout & Hauspie who did NaturallySpeaking. They could do really well with, say, legal and medical because the terminology was very distinct, but generic or general conversation was much more difficult, so how are you guys approaching that?

Andy Beach: There's just a sort of baseline API when you're talking about the cognitive services piece of it, where it's just trying to contextually make sense of the words that it sees, based on the words around them. So, we're trying to understand what something is in relationship to the paragraph or something else that's there, and that frankly helps a lot with the accuracy, because it's going to understand the difference between certain terminology that might get used, because it's putting it into a context.

Tim Siglin: Are there specific libraries or market verticals that you're going after? Legal, say, or medical?

Andy Beach: You know, it's very open. I think there are enterprise applications and surveillance and educational tracks that are using it. But we have entertainment partners who are also using the same services to create functionality today.

Tim Siglin: OK, nice. What else are you working on? Obviously machine learning isn't the only thing you're doing.

Andy Beach: I finally got to actually do some big video projects in the last couple of months, which were the first sort of transcoding projects that I've worked on in years, and it was like working on old muscle memory, pulling back terminology and things. So that was an exciting thing, but related to that, another one of the big areas that I've discovered has become important with what I'm doing, from a sort of infrastructure perspective to video, is that I'm doing a lot more around high-scale data. Taking all those data points that we pull out through machine learning or through video player interactions and things like that, how do you put it somewhere and then very quickly slice and dice it to expose certain trends that you see? I've had to learn a lot more around how containers fit into this and how you create large-scale databases. It's things that I never imagined I was going to be working with. I was a video jockey at the end of the day. But now I'm learning all these new elements and it's kind of exciting.

Tim Siglin: We all know what metadata is ... You were talking about the context of words in a paragraph; the metadata itself around a container and a format inherently can help you constrain down to particular decision points. If it's an MPEG-2 transport stream, more than likely it's only going to have one or two codecs in there, versus if it's something that's WebM, it probably isn't going to have AVC as part of the format. As always, thanks for coming and stopping by, and have a great show.
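
Tim's point about container metadata narrowing the decision space can be sketched as a simple lookup. The table below is an illustrative, non-exhaustive assumption rather than a statement of what any particular tool reports, and the codec labels and function names are hypothetical. The WebM row lists the codecs that format is specified to carry, which is why AVC (H.264) is ruled out there.

```python
# Illustrative, non-exhaustive mapping from container to codecs it commonly carries.
# The labels are hypothetical identifiers chosen for this sketch.
LIKELY_CODECS = {
    "mpeg-ts": {"mpeg2video", "h264", "hevc", "aac", "mp2", "ac3"},
    "webm": {"vp8", "vp9", "av1", "vorbis", "opus"},
}


def candidate_codecs(container):
    """Return the codecs worth checking for a container, or an empty set if unknown."""
    return LIKELY_CODECS.get(container.lower(), set())


def is_plausible(container, codec):
    """Use the container alone to decide whether a codec guess is worth pursuing."""
    return codec.lower() in candidate_codecs(container)


# Knowing only the container already prunes the options before decoding anything.
webm_has_avc = is_plausible("webm", "h264")
ts_has_avc = is_plausible("mpeg-ts", "h264")
```

A given transport stream would still typically hold only one or two of the listed codecs; the lookup just bounds which ones to test for.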